 Sep 2019


from falsehood you can derive everything: false \leq true
restrict: don't talk about elements, you have to talk about arrows (relations)... "interview the friends"
product types: [pairs, tuples, records, ...]


www.youtube.com

Inside Jeff's Package: Unpacking Jeff Bezos
@fun


mathoverflow.net

Having quick access to definitions and references, which are easily found by search engines and viewable on a regular web page with a smartphone, should help facilitate quicker progress. The conclusion is that funding an online resource whose purpose is to quickly get a new PhD student, or a researcher new to the field, up to speed is probably a good investment.
math wiki


www.cyto.purdue.edu

range, or difference between maximum and minimum intensity values in a neighborhood
@property

Texture can be described as fine, coarse, grained, smooth, etc.
@descriptors

Such features are found in the tone and structure of a texture
@properties

Texture is characterized by the spatial distribution of intensity levels in a neighborhood
distributions


developers.googleblog.com

Text embedding models convert any input text into an output vector of numbers, and in the process map semantically similar words near each other in the embedding space. (Figure 2: Text embeddings convert any text into a vector of numbers (left). Semantically similar pieces of text are mapped near each other in the embedding space (right).) Given a trained text embedding model, we can directly measure the associations the model has between words or phrases. Many of these associations are expected and are helpful for natural language tasks. However, some associations may be problematic or hurtful. For example, the groundbreaking paper by Bolukbasi et al. [4] found that the vector relationship between "man" and "woman" was similar to the relationship between "physician" and "registered nurse" or "shopkeeper" and "housewife"
love that Big Lebowski reference
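A quick sketch of how you can measure these associations yourself: compare direction vectors with cosine similarity. The embeddings below are made-up 3-d toy vectors (real models use hundreds of dimensions); only the geometry is the point.

```python
import math

# Toy 3-d "embeddings" (made-up numbers, purely illustrative).
emb = {
    "man":       [1.0, 0.2, 0.0],
    "woman":     [1.0, 0.2, 1.0],
    "physician": [0.1, 0.9, 0.1],
    "nurse":     [0.1, 0.9, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def diff(a, b):
    return [x - y for x, y in zip(a, b)]

# The kind of association Bolukbasi et al. measured: is the man->woman
# direction similar to the physician->nurse direction?
gender_dir = diff(emb["woman"], emb["man"])
job_dir = diff(emb["nurse"], emb["physician"])
print(cosine(gender_dir, job_dir))  # close to 1.0 for these toy vectors
```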


en.wikipedia.org

Since all neurons in a single depth slice share the same parameters, the forward pass in each depth slice of the convolutional layer can be computed as a convolution of the neuron's weights with the input volume.[nb 2] Therefore, it is common to refer to the sets of weights as a filter (or a kernel), which is convolved with the input. The result of this convolution is an activation map, and the set of activation maps for each different filter are stacked together along the depth dimension to produce the output volume. Parameter sharing contributes to the translation invariance of the CNN architecture. Sometimes, the parameter sharing assumption may not make sense. This is especially the case when the input images to a CNN have some specific centered structure, for which we expect completely different features to be learned at different spatial locations. One practical example is when the inputs are faces that have been centered in the image: we might expect different eye-specific or hair-specific features to be learned in different parts of the image. In that case it is common to relax the parameter sharing scheme, and instead simply call the layer a "locally connected layer".
important terms you hear repeatedly; great visuals and graphics @ https://distill.pub/2018/buildingblocks/
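A minimal sketch of the forward pass described above: one 3x3 filter (9 shared weights) slides over the whole input, so the parameter count is independent of the input size. Shapes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))  # the shared weights ("filter")

def conv2d_valid(x, k):
    """Valid (no-padding) 2-d cross-correlation, as deep-learning libraries do it."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # The SAME weights are applied at every spatial location.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

activation_map = conv2d_valid(image, kernel)
print(activation_map.shape)  # (6, 6): one activation per spatial position
```

Stacking the maps from several filters along a new axis gives the depth dimension of the output volume.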


setosa.io

Here's a playground where you can select different kernel matrices and see how they affect the original image, or build your own kernel. You can also upload your own image or use live video if your browser supports it. (Kernels available: blur, bottom sobel, custom, emboss, identity, left sobel, outline, right sobel, sharpen, top sobel.) The sharpen kernel emphasizes differences in adjacent pixel values. This makes the image look more vivid. The blur kernel de-emphasizes differences in adjacent pixel values. The emboss kernel (similar to the sobel kernel, and sometimes referred to as the same) gives the illusion of depth by emphasizing the differences of pixels in a given direction. In this case, in a direction along a line from the top left to the bottom right. The identity kernel leaves the image unchanged. How boring! The custom kernel is whatever you make it.
I'm all about my custom kernels!
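The standard identity, sharpen, and blur kernels from the playground, applied with a plain convolution loop (exact playground normalization may differ slightly):

```python
import numpy as np

identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], float)
sharpen  = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], float)
blur     = np.full((3, 3), 1 / 9)  # simple box blur

def convolve(img, k):
    """Valid 2-d convolution (no padding, no kernel flip)."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)  # a linear intensity ramp
# The identity kernel reproduces the (interior) pixels unchanged:
print(convolve(img, identity))
```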



We developed a new metric, UAR, which compares the robustness of a model against an attack to adversarial training against that attack. Adversarial training is a strong defense that uses knowledge of an adversary by training on adversarially attacked images.[3] To compute UAR, we average the accuracy of the defense across multiple distortion sizes and normalize by the performance of an adversarially trained model; a precise definition is in our paper. A UAR score near 100 against an unforeseen adversarial attack implies performance comparable to a defense with prior knowledge of the attack, making this a challenging objective.
@metric
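The computation as described, in a few lines (names are mine, and the paper's precise definition differs in details; this is only the average-then-normalize shape of the metric):

```python
def uar(defense_acc, adv_trained_acc):
    """defense_acc, adv_trained_acc: accuracy per distortion size (same sizes)."""
    mean = sum(defense_acc) / len(defense_acc)
    reference = sum(adv_trained_acc) / len(adv_trained_acc)
    return 100.0 * mean / reference

# A defense matching the adversarially trained reference scores near 100.
print(uar([0.8, 0.6, 0.4], [0.8, 0.6, 0.4]))  # 100.0
```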


betterexplained.com

Time for the red pill. A matrix is a shorthand for our diagrams: A matrix is a single variable representing a spreadsheet of inputs or operations.

An operation F is linear if scaling inputs scales the output, and adding inputs adds the outputs:
addition-preserving

Linear algebra gives you mini-spreadsheets for your math equations. We can take a table of data (a matrix) and create updated tables from the original. It’s the power of a spreadsheet written as an equation.
matrix = equationexcel
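The "mini-spreadsheet" view in code: each row of the matrix is one spreadsheet formula over the inputs. The example data (apples/oranges, prices) is made up for illustration.

```python
import numpy as np

# Inputs = (apples, oranges); each matrix row is one "spreadsheet formula".
ops = np.array([
    [1.0, 1.0],   # total items = 1*apples + 1*oranges
    [0.5, 0.8],   # total cost  = 0.5*apples + 0.8*oranges
])
order = np.array([10, 5])
print(ops @ order)  # [15.  9.]

# Linearity: scaling inputs scales outputs, adding inputs adds outputs.
a, b = np.array([1.0, 2.0]), np.array([3.0, 4.0])
assert np.allclose(ops @ (2 * a), 2 * (ops @ a))
assert np.allclose(ops @ (a + b), ops @ a + ops @ b)
```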

 Aug 2019

aatishb.com

So when we look at really tiny solids, energy doesn’t always flow from a hot object to a cold one. It can go the other way sometimes. And entropy doesn’t always increase. This isn't just a theoretical issue: entropy decreases have actually been seen in microscopic experiments.
Boltzmann: it's statistically increasingly unlikely that entropy decreases

The atomic world is a two-way street. But when we get to large collections of atoms, a one-way street emerges for the direction in which events take place.
micro → macro emergence


www.math3ma.com

Good general theory does not search for the maximum generality, but for the right generality.
Especially true in practical programming

It's true that category theory may not help you find a delta for your epsilon, or determine if your group of order 520 is simple, or construct a solution to your PDE. For those endeavors, we do have to put our feet back on the ground.
@CT:limits

hierarchy of questions: "What about the relationships between the relationships between the relationships between the...?" This leads to infinity categories. [And a possible brain freeze.] For more, see here. As pie-in-the-sky as this may sound, these ideas (categories, functors, and natural transformations) lead to a treasure trove of theory that shows up almost everywhere.
Turtles all the way up


bartoszmilewski.com

But there is an alternative. It’s called denotational semantics and it’s based on math. In denotational semantics every programming construct is given its mathematical interpretation. With that, if you want to prove a property of a program, you just prove a mathematical theorem.


phenomenalworld.org

And worst of all, we’ve lost sight of the most salient part about computers: their malleability. We’ve acquiesced the creation of our virtual worlds, where we now spend most of our time, to the select few who can spend the millions to hire enough software engineers. So many of the photons that hit our eyes come from purely fungible pixels, yet for most of us, these pixels are all but carved in stone. Smartphone apps, like the kitchen appliances before them, are polished, single-purpose tools with only the meanest amount of customizability and interoperability. They are monstrosities of code, millions of lines, that only an army of programmers could hope to tame. As soon as they can swipe, our children are given magical rectangles that for all their lives will be as inscrutable as if they were truly magic.
I was a professional web developer for two years and now I can hardly bring myself to even touch CSS or the DOM. Whenever I have to make anything on the web work, I know I'm gonna spend 3 hours in pain on something that should take 5 minutes.


futureofcoding.org

He continued, “All the interesting things people want to do with computers require low level languages. Think about it. You can never make an abstract language that can do the cutting edge in software.” “Ok,” I said, willing to concede the point, “I’m not too worried if my language is too slow for cutting-edge algorithms. It’ll still be fast enough for most people doing most things.” He smirked, having ensnared me in his Socratic trap, “Then it’s not really a programming language. It’s just another limited abstraction, like Squarespace, that people will have to leave if they want to do anything novel.” Eyeroll. Paul Chiusano has a great response to people like Dave: There are a couple unspoken assumptions here that don’t hold up to scrutiny—one is that abstraction is in general not to be trusted. We can blame Spolsky for infecting our industry with that meme. A little reflection on the history of software reveals abstraction as the primary means by which humans make increasingly complex software possible and comprehensible. Hence why we’ve moved away from assembly language in favor of high-level languages. Blanket mistrust of abstraction, to the point where we should actively avoid including means of abstraction in software technology, is absurd. (From http://pchiusano.github.io/20140702/cssisunnecessary.html)
Abstraction to make complex software possible and understandable


colah.github.io

small window
ie. kernel

Using multiple copies of a neuron in different places is the neural network equivalent of using functions. Because there is less to learn, the model learns more quickly and learns a better model. This technique – the technical name for it is ‘weight tying’ – is essential to the phenomenal results we’ve recently seen from deep learning.
This parameter sharing allows CNNs, for example, to need far fewer parameters/weights than fully connected NNs.
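Back-of-the-envelope on how much weight tying saves (shapes are illustrative, not from the article): a dense layer mapping a 28x28 input to a 26x26 output needs a weight for every input-output pair, while one shared 3x3 filter producing the same output size needs only 9.

```python
in_h = in_w = 28    # input image size
out_h = out_w = 26  # output size of a "valid" 3x3 convolution

fc_params = (in_h * in_w) * (out_h * out_w)  # every output sees every input
conv_params = 3 * 3                          # one filter, reused everywhere

print(fc_params, conv_params)  # 529984 vs. 9
```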

The known connection between geometry, logic, topology, and functional programming suggests that the connections between representations and types may be of fundamental significance.
Examples for each?

Representations are Types With every layer, neural networks transform data, molding it into a form that makes their task easier to do. We call these transformed versions of data “representations.” Representations correspond to types.
Interesting.
Like a Queue Type represents a FIFO flow and a Stack a FILO flow, where the space we transformed is the operation space of the type (eg a Queue has a folded operation space compared to an Array)
Just free styling here...

In this view, the representations narrative in deep learning corresponds to type theory in functional programming. It sees deep learning as the junction of two fields we already know to be incredibly rich. What we find seems so beautiful to me, feels so natural, that the mathematician in me could believe it to be something fundamental about reality.
compositional deep learning

Appendix: Functional Names of Common Layers (Deep Learning Name: Functional Name)
Learned Vector: Constant
Embedding Layer: List Indexing
Encoding RNN: Fold
Generating RNN: Unfold
General RNN: Accumulating Map
Bidirectional RNN: Zipped Left/Right Accumulating Maps
Conv Layer: "Window Map"
TreeNet: Catamorphism
Inverse TreeNet: Anamorphism
👌translation. I like to think about embeddings as List lookups
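Two rows of the table, in plain Python. The "cell" here is a toy stand-in for a learned RNN cell; only the shape of the computation is the point.

```python
from functools import reduce

# Encoding RNN = Fold: fold a sequence of inputs into a single state.
def cell(state, x):
    return state + x  # stand-in for something like tanh(W @ state + U @ x + b)

sequence = [1, 2, 3, 4]
final_state = reduce(cell, sequence, 0)  # exactly a left fold with initial state 0
print(final_state)  # 10

# Embedding Layer = List Indexing: a lookup table from token id to vector.
embedding_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
def embed(token_id):
    return embedding_table[token_id]

print(embed(2))  # [0.5, 0.6]
```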


distill.pub

If neurons are not the right way to understand neural nets, what is? In real life, combinations of neurons work together to represent images in neural networks. A helpful way to think about these combinations is geometrically: let’s define activation space to be all possible combinations of neuron activations. We can then think of individual neuron activations as the basis vectors of this activation space. Conversely, a combination of neuron activations is then just a vector in this space.
👌great reframe
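The reframe in code: with n neurons, a layer's state is a vector in R^n, individual neurons are the basis vectors, and any combination of activations is just a weighted sum of them. (Numbers below are arbitrary.)

```python
import numpy as np

n = 4
neuron_basis = np.eye(n)  # neuron i firing alone = the i-th basis vector

activations = np.array([0.2, 0.0, 1.3, 0.7])  # some combination of neurons

# The combination is literally a weighted sum of the basis vectors,
# i.e. a point in "activation space".
recombined = sum(a * e for a, e in zip(activations, neuron_basis))
print(recombined)
```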


distill.pub

Semantic dictionaries are powerful not just because they move away from meaningless indices, but because they express a neural network’s learned abstractions with canonical examples. With image classification, the neural network learns a set of visual abstractions and thus images are the most natural symbols to represent them. Were we working with audio, the more natural symbols would most likely be audio clips. This is important because when neurons appear to correspond to human ideas, it is tempting to reduce them to words. Doing so, however, is a lossy operation — even for familiar abstractions, the network may have learned a deeper nuance. For instance, GoogLeNet has multiple floppy ear detectors that appear to detect slightly different levels of droopiness, length, and surrounding context to the ears. There also may exist abstractions which are visually familiar, yet that we lack good natural language descriptions for: for example, take the particular column of shimmering light where sun hits rippling water.
nuance beyond words


maxgoldste.in

This intentional break from pencil-and-paper notation is meant to emphasize how matrices work. To compute the output vector (i.e. to apply the function), multiply each column of the matrix by the input above it, and then add up the columns (think of squishing them together horizontally).
read while playing with this: http://matrixmultiplication.xyz/

After months of using and learning about matrices, this is the best gist I've come across.
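That column-wise reading, checked numerically: scale each column by the input entry above it, add the scaled columns, and you get the ordinary matrix-vector product.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([10.0, 100.0])

# "Multiply each column by the input above it, then squish them together":
col_view = v[0] * M[:, 0] + v[1] * M[:, 1]
print(col_view)                       # [210. 430.]
print(np.allclose(col_view, M @ v))   # same as the ordinary product
```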


www.sanalabs.com

formalized: knowledge retention

ie. decision tree splits: entropy/information is maximal at a 0.5:0.5 split, so a good split moves the leaves away from it
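The binary entropy function behind that note, H(p) = -p log2 p - (1-p) log2 (1-p): uncertainty peaks at a 0.5:0.5 class mix and is zero for a pure split.

```python
import math

def entropy(p):
    """Binary entropy in bits for class probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(entropy(0.5))  # 1.0 bit, the maximum
print(entropy(0.9))  # less uncertain than the 50:50 mix
```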

 Jul 2019

www.jwilber.me

In statistical testing, we structure experiments in terms of null & alternative hypotheses. Our test will have the following hypothesis schema: H0: μ_treatment <= μ_control; HA: μ_treatment > μ_control. Our null hypothesis claims that the new shampoo does not increase wool quality. The alternative hypothesis claims the opposite: the new shampoo yields superior wool quality.
hypothesis schema; statistics
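One way to run that one-sided test without any distributional assumptions is a permutation test. The wool-quality scores below are made up for illustration; the schema (H0: μ_treatment <= μ_control) is the one from the quote.

```python
import random
import statistics

random.seed(0)
treatment = [8.1, 8.4, 7.9, 8.6, 8.3, 8.5]  # hypothetical wool scores
control   = [7.6, 7.8, 7.5, 8.0, 7.7, 7.9]

observed = statistics.mean(treatment) - statistics.mean(control)

# Under H0 the labels are exchangeable: shuffle and see how often a
# random split produces a difference at least as large as observed.
pooled = treatment + control
n = len(treatment)
count, trials = 0, 10_000
for _ in range(trials):
    random.shuffle(pooled)
    if statistics.mean(pooled[:n]) - statistics.mean(pooled[n:]) >= observed:
        count += 1

p_value = count / trials  # estimate of P(diff >= observed | H0)
print(observed, p_value)  # small p-value -> reject H0
```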


futureofcoding.org

Apparently 77% of Wikipedia is written by 1% of editors, and that’s not even counting users.
hard to believe


mmlbook.github.io

One major idea in mathematics is the idea of “closure”. This is the question: What is the set of all things that can result from my proposed operations? In the case of vectors: What is the set of vectors that can result by starting with a small set of vectors, and adding them to each other and scaling them? This results in a vector space.
closure in mathematics. sounds similar to domain of a function
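A small numerical illustration of closure (vectors chosen arbitrarily): any chain of additions and scalings of two vectors stays inside their span, which we can verify by solving for the coefficients.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])

w = 3 * v1 - 2 * v2   # an arbitrary linear combination
u = 0.5 * (w + v1)    # combining results again...

# ...still lies in span{v1, v2}: recover the coefficients by least squares.
A = np.stack([v1, v2], axis=1)  # 3x2 matrix with v1, v2 as columns
coeffs, *_ = np.linalg.lstsq(A, u, rcond=None)
print(coeffs)  # u = coeffs[0]*v1 + coeffs[1]*v2
```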

We will discuss classification in the context of support vector machines
SVMs aren't used that much in practice anymore. It's more of an academic fling, because they're nice to work with mathematically. Empirically, Tree Ensembles or Neural Nets are almost always better.


en.wikipedia.org

In Hardy's words, "Exposition, criticism, appreciation, is work for second-rate minds. [...] It is a melancholy experience for a professional mathematician to find himself writing about mathematics. The function of a mathematician is to do something, to prove new theorems, to add to mathematics, and not to talk about what he or other mathematicians have done."
similar to Nassim Taleb's "History is written by losers"


www.glamour.com

She's since learned her allergies and asthma are conditions she's had since childhood that have nothing to do with her weight, and her migraines are hormonal. She's now on meds for these conditions.
Ignoring the fact that 'morbidly obese' by itself is a serious health condition

Interesting. I used to think that an early death from obesity hurts more than being told you're fat by a professional.
