 Nov 2019

playbook.samaltman.com

We once tried an experiment where we funded a bunch of promising founding teams with no ideas in the hopes they would land on a promising idea after we funded them. All of them failed. I think part of the problem is that good founders tend to have lots of good ideas (too many, usually). But an even bigger problem is that once you have a startup you have to hurry to come up with an idea, and because it’s already an official company the idea can’t be too crazy. You end up with plausible-sounding but derivative ideas. This is the danger of pivots.
on pre-idea founding


futurefriday.com

But consider the Stanford Marshmallow Experiment, a series of studies that demonstrate both the power and trainability of discipline:
there was recently a follow-up to this and it couldn't be replicated with the same magnitude or conclusions.
Delayed gratification is still important, but not as predictive in small kids as we thought.


www.howtogeek.com

In practice, it’s more accurate to say that “everything is a stream of bytes” than “everything is a file.” /dev/random isn’t a file, but it certainly is a stream of bytes. And, although these things technically aren’t files, they are accessible in the file system – the file system is a universal “name space” where everything is accessible. Want to access a random number generator or read directly from a device? You’ll find both in the file system; no other form of addressing needed. Of course, some things aren’t actually files – processes running on your system aren’t a part of the file system. “Everything is a file” is inaccurate, but lots of things do behave as files.
unix simplicity

 Oct 2019


StubHub, for example, took a big chunk out of eBay by creating a ticket-buying experience that was so in tune with the needs of those specific buyers and sellers—with ticket verification, venue maps, and rapid shipping—that they were able to get strong traction in spite of the fact that their fees were substantially higher than what eBay was charging.
vertical / domain


learnyouahaskell.com

We couldn't write a type declaration of Circle -> Float because Circle is not a type, Shape is. Just like we can't write a function with a type declaration of True -> Int.
instantiated


distill.pub

In this article, we use GoogLeNet, an image classification model, to demonstrate our interface ideas because its neurons seem unusually semantically meaningful.
Any news on why?


en.wikipedia.org

Peirce was born at 3 Phillips Place in Cambridge, Massachusetts. He was the son of Sarah Hunt Mills and Benjamin Peirce, himself a professor of astronomy and mathematics at Harvard University and perhaps the first serious research mathematician in America.[citation needed] At age 12, Charles read his older brother's copy of Richard Whately's Elements of Logic, then the leading English-language text on the subject. So began his lifelong fascination with logic and reasoning.
so normalized for childhood environment he wasn't an outlier at all...


news.ycombinator.com

The smugness of Haskell evangelists is what keeps me from using it. I swear, the first rule of Haskell is to never shut the fuck up about Haskell.
true, happened to me



The Gram matrix must be normalized by dividing each element by the total number of elements in the matrix.
true, after downsampling your gradient will get smaller on later layers
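A minimal NumPy sketch of this normalization (toy sizes and random stand-in activations; note that conventions differ on the exact divisor, and here we divide by the feature map's element count, C * H * W):

```python
import numpy as np

# Toy "feature maps": C channels of H x W activations, standing in for
# one conv layer's output in a style-transfer pipeline.
rng = np.random.default_rng(0)
C, H, W = 3, 4, 4
features = rng.standard_normal((C, H, W))

# Flatten each channel, then take inner products between channels.
F = features.reshape(C, H * W)
gram = F @ F.T                    # shape (C, C)

# Normalize by the total number of elements so later (spatially smaller)
# layers contribute gradients on a comparable scale.
gram_normalized = gram / F.size   # F.size == C * H * W
```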


www.fritz.ai

Style transfer exploits this by running two images through a pretrained neural network, looking at the pretrained network’s output at multiple layers, and comparing their similarity. Images that produce similar outputs at one layer of the pretrained model likely have similar content, while matching outputs at another layer signals similar style.
Style ~ vertical Content ~ horizontal


blog.statebox.org

Pursuit, which can even search for functions based on type signature. As we hinted already, all our diagrams can be neatly translated to text, ultimately corresponding to equations in our underlying categorical syntax. This allows for the development of very advanced search features, even capable of identifying a pattern independently of the way it was expressed graphically. We want to say again that this is not magic! It works like this: Every diagram corresponds to a term in a category. If a term can be rewritten into another and vice versa, then the terms are describing the same thing. This way, when you click on some boxes in a diagram and search for equivalent patterns, the engine in the background is looking for all terms that are equivalent to the one describing whatever you selected.
isomorphisms and Eq relationships


graphicallinearalgebra.net

to prove that the following is true.
Interesting. So we could phrase it as: "Prove that a vertical flip of the unit diagram preserves meaning/semantics"


graphicallinearalgebra.net

he did manage to convince some people, who became known as the algorists. The conservatives, those who preferred to stick with the status quo, were called abacists. They used the abacus, and converted to and from the Roman number system between calculations.
Abacists == Java Developers?

First, ‘+’ is an operation. The left hand side of equation ①, the stuff to the left of the symbol ‘=’, can be understood as a procedure, a computation. It’s the processing that you did with your fingers when the teacher was talking about apples.
also: 3 and 4 are operands


graphicallinearalgebra.net

Australianness. Australians, like the Scots, tend to call a spade a spade. The English aren’t usually so direct. Englishness is all about subtlety and insinuation. Over the centuries, they have refined their public discourse and developed high-level, advanced techniques like (damning with) faint praise.
True, it was very annoying to work with the English on software projects...



categorical formalism should provide a much needed high level language for theory of computation, flexible enough to allow abstracting away the low level implementation details when they are irrelevant, or taking them into account when they are genuinely needed. A salient feature of the approach through monoidal categories is the formal graphical language of string diagrams, which supports visual reasoning about programs and computations. In the present paper, we provide a coalgebraic characterization of monoidal computer. It turns out that the availability of interpreters and specializers, that make a monoidal category into a monoidal computer, is equivalent with the existence of a *universal state space*, that carries a weakly final state machine for any pair of input and output types. Being able to program state machines in monoidal computers allows us to represent Turing machines, to capture their execution, count their steps, as well as, e.g., the memory cells that they use. The coalgebraic view of monoidal computer thus provides a convenient diagrammatic language for studying computability and complexity.
monoidal (category > computer)

 Sep 2019

medium.com

Most of the supervised learning algorithms are inherently discriminative, which means they learn how to model the conditional probability distribution function (p.d.f.) p(y|x) instead, which is the probability of a target (age=35) given an input (purchase=milk). Despite the fact that one could make predictions with this p.d.f., one is not allowed to sample new instances (simulate customers with ages) from the input distribution directly
difference to "normal" supervised algos
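A toy sketch of this distinction, with made-up probabilities (purchases x, age groups y): a generative model stores the joint p(x, y) and can sample inputs, while a discriminative model keeps only p(y | x):

```python
import random

# A generative model knows the joint p(x, y). Made-up numbers.
joint = {
    ("milk", "young"): 0.1, ("milk", "old"): 0.4,
    ("beer", "young"): 0.3, ("beer", "old"): 0.2,
}

def p_y_given_x(x):
    """The conditional p(y | x) that a discriminative model learns directly."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

def sample_x(rng):
    """Sampling new inputs needs the marginal p(x), which a purely
    discriminative model never stores; only the joint supports this."""
    marginal = {}
    for (x, _), p in joint.items():
        marginal[x] = marginal.get(x, 0.0) + p
    xs = list(marginal)
    return rng.choices(xs, weights=[marginal[x] for x in xs])[0]
```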


julien.danjou.info

Which works fine, but makes you use a lambda. You could rather use the operator module:

    import operator
    mylist = list(zip(range(40, 240), range(-100, 100)))
    sorted(mylist, key=operator.itemgetter(1))
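A small self-contained comparison of the two spellings (my own toy data, not the article's):

```python
import operator

pairs = [(3, "c"), (1, "b"), (2, "a")]

# Sort by the second element: lambda version vs operator.itemgetter.
by_lambda = sorted(pairs, key=lambda item: item[1])
by_itemgetter = sorted(pairs, key=operator.itemgetter(1))

assert by_lambda == by_itemgetter == [(2, "a"), (1, "b"), (3, "c")]
```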



from falsehood you can derive everything ** false \leq true
restrict: don't talk about elements -> you have to talk about arrows (relations) ... interview the friends
product types: [pairs, tuples, records, ...]


www.youtube.com

Chem Hung 1 week ago • Next, Inside Jeff's Package: Unpacking Jeff Bezos
@fun


mathoverflow.net

Having quick access to definitions and references, which are easily found by search engines, and viewable on a regular web page with a smartphone, should help facilitate quicker progress. The conclusion is, funding an online resource with the purpose to quickly get a new PhD student or researcher new to the field up to speed, is probably a good investment.
math wiki


www.cyto.purdue.edu

range, or difference between maximum and minimum intensity values in a neighborhood
@property

Texture can be described as fine, coarse, grained, smooth, etc.
@descriptors

Such features are found in the tone and structure of a texture
@properties

Texture is characterized by the spatial distribution of intensity levels in a neighborhood
distributions
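The "range" descriptor above is easy to compute directly; a sketch with a toy image, using NumPy's sliding windows (requires NumPy >= 1.20; sizes are arbitrary):

```python
import numpy as np

# Toy 5x5 intensity image.
image = np.arange(1.0, 26.0).reshape(5, 5)

# All 3x3 neighborhoods at valid positions; result shape (3, 3, 3, 3).
windows = np.lib.stride_tricks.sliding_window_view(image, (3, 3))

# Range descriptor: max minus min intensity within each neighborhood.
local_range = windows.max(axis=(-2, -1)) - windows.min(axis=(-2, -1))
```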


developers.googleblog.com

Text embedding models convert any input text into an output vector of numbers, and in the process map semantically similar words near each other in the embedding space: Figure 2: Text embeddings convert any text into a vector of numbers (left). Semantically similar pieces of text are mapped near each other in the embedding space (right). Given a trained text embedding model, we can directly measure the associations the model has between words or phrases. Many of these associations are expected and are helpful for natural language tasks. However, some associations may be problematic or hurtful. For example, the groundbreaking paper by Bolukbasi et al. [4] found that the vector relationship between "man" and "woman" was similar to the relationship between "physician" and "registered nurse" or "shopkeeper" and "housewife"
love that Big Lebowski reference
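The "vector relationship" idea in miniature, with hand-picked 2-d toy vectors (not a real model's embeddings; chosen so the offset between the two word pairs matches):

```python
import numpy as np

# Hypothetical 2-d embeddings, arranged by hand so the "gender" offset
# is shared between the two word pairs.
emb = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.1]),
    "queen": np.array([3.0, 1.1]),
}

offset = emb["woman"] - emb["man"]   # an association, measured as a vector offset
analogy = emb["king"] + offset       # should land near "queen"
closest = min(emb, key=lambda w: np.linalg.norm(emb[w] - analogy))
```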


en.wikipedia.org

Since all neurons in a single depth slice share the same parameters, the forward pass in each depth slice of the convolutional layer can be computed as a convolution of the neuron's weights with the input volume.[nb 2] Therefore, it is common to refer to the sets of weights as a filter (or a kernel), which is convolved with the input. The result of this convolution is an activation map, and the set of activation maps for each different filter are stacked together along the depth dimension to produce the output volume. Parameter sharing contributes to the translation invariance of the CNN architecture. Sometimes, the parameter sharing assumption may not make sense. This is especially the case when the input images to a CNN have some specific centered structure; for which we expect completely different features to be learned on different spatial locations. One practical example is when the inputs are faces that have been centered in the image: we might expect different eye-specific or hair-specific features to be learned in different parts of the image. In that case it is common to relax the parameter sharing scheme, and instead simply call the layer a "locally connected layer".
important terms you hear repeatedly; great visuals and graphics @ https://distill.pub/2018/buildingblocks/


setosa.io

Here's a playground where you can select different kernel matrices and see how they affect the original image, or build your own kernel. You can also upload your own image or use live video if your browser supports it. (Kernels: blur, bottom sobel, custom, emboss, identity, left sobel, outline, right sobel, sharpen, top sobel.) The sharpen kernel emphasizes differences in adjacent pixel values. This makes the image look more vivid. The blur kernel de-emphasizes differences in adjacent pixel values. The emboss kernel (similar to the sobel kernel and sometimes referred to mean the same) gives the illusion of depth by emphasizing the differences of pixels in a given direction. In this case, in a direction along a line from the top left to the bottom right. The identity kernel leaves the image unchanged. How boring! The custom kernel is whatever you make it.
I'm all about my custom kernels!
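The sharpen kernel's behavior can be checked by hand at a single pixel (toy 3x3 patches; the response is the elementwise product of kernel and patch, summed):

```python
import numpy as np

# The classic sharpen kernel: weights sum to 1, so flat regions pass
# through unchanged while local differences get amplified.
sharpen = np.array([
    [ 0., -1.,  0.],
    [-1.,  5., -1.],
    [ 0., -1.,  0.],
])

flat = np.full((3, 3), 7.0)   # constant patch
spike = flat.copy()
spike[1, 1] = 10.0            # isolated bright pixel

# Kernel response at the center pixel of each patch.
flat_response = (sharpen * flat).sum()    # unchanged: 5*7 - 4*7 = 7
spike_response = (sharpen * spike).sum()  # exaggerated: 5*10 - 4*7 = 22
```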



We developed a new metric, UAR, which compares the robustness of a model against an attack to adversarial training against that attack. Adversarial training is a strong defense that uses knowledge of an adversary by training on adversarially attacked images.[3] To compute UAR, we average the accuracy of the defense across multiple distortion sizes and normalize by the performance of an adversarially trained model; a precise definition is in our paper. A UAR score near 100 against an unforeseen adversarial attack implies performance comparable to a defense with prior knowledge of the attack, making this a challenging objective.
@metric


betterexplained.com

Time for the red pill. A matrix is a shorthand for our diagrams: A matrix is a single variable representing a spreadsheet of inputs or operations.

operation F is linear if scaling inputs scales the output, and adding inputs adds the outputs:
additionpreserving

Linear algebra gives you minispreadsheets for your math equations. We can take a table of data (a matrix) and create updated tables from the original. It’s the power of a spreadsheet written as an equation.
matrix = equation-Excel
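Both quoted claims can be verified in a couple of lines (my own toy matrix and vectors): applying the matrix runs the "mini-spreadsheet", and linearity is exactly the scaling/adding property.

```python
import numpy as np

# A 2x2 "mini-spreadsheet" of operations.
A = np.array([[2.0, 0.0],
              [1.0, 3.0]])

def F(x):
    return A @ x   # applying the matrix = running the spreadsheet

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Linearity: scaling inputs scales outputs; adding inputs adds outputs.
scale_ok = np.allclose(F(5 * x), 5 * F(x))
add_ok = np.allclose(F(x + y), F(x) + F(y))
```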

 Aug 2019

aatishb.com

So when we look at really tiny solids, energy doesn’t always flow from a hot object to a cold one. It can go the other way sometimes. And entropy doesn’t always increase. This isn't just a theoretical issue, entropy decreases have actually been seen in microscopic experiments.
Boltzmann: it's statistically increasingly unlikely that entropy decreases

The atomic world is a twoway street. But when we get to large collections of atoms, a oneway street emerges for the direction in which events take place.
micro -> macro emergence


www.math3ma.com

Good general theory does not search for the maximum generality, but for the right generality.
Especially true in practical programming

It's true that category theory may not help you find a delta for your epsilon, or determine if your group of order 520 is simple, or construct a solution to your PDE. For those endeavors, we do have to put our feet back on the ground.
@CT:limits

hierarchy of questions: "What about the relationships between the relationships between the relationships between the...?" This leads to infinity categories. [And a possible brain freeze.] As pie-in-the-sky as this may sound, these ideas (categories, functors, and natural transformations) lead to a treasure trove of theory that shows up almost everywhere.
Turtles all the way up


bartoszmilewski.com

But there is an alternative. It’s called denotational semantics and it’s based on math. In denotational semantics every programming construct is given its mathematical interpretation. With that, if you want to prove a property of a program, you just prove a mathematical theorem.


phenomenalworld.org

And worst of all, we’ve lost sight of the most salient part about computers: their malleability. We’ve acquiesced the creation of our virtual worlds, where we now spend most of our time, to the select few who can spend the millions to hire enough software engineers. So many of the photons that hit our eyes come from purely fungible pixels, yet for most of us, these pixels are all but carved in stone. Smartphone apps, like the kitchen appliances before them, are polished, single-purpose tools with only the meanest amount of customizability and interoperability. They are monstrosities of code, millions of lines, that only an army of programmers could hope to tame. As soon as they can swipe, our children are given magical rectangles that for all their lives will be as inscrutable as if they were truly magic.
I was a professional web developer for two years and I now can hardly bring myself to even touch CSS or the DOM. Whenever I have to make anything on the web work I know I'm gonna spend 3 hours in pain for something that should take 5 minutes.


futureofcoding.org

He continued, “All the interesting things people want to do with computers require low-level languages. Think about it. You can never make an abstract language that can do the cutting edge in software.” “Ok,” I said, willing to concede the point, “I’m not too worried if my language is too slow for cutting-edge algorithms. It’ll still be fast enough for most people doing most things.” He smirked, having ensnared me in his Socratic trap, “Then it’s not really a programming language. It’s just another limited abstraction, like Squarespace, that people will have to leave if they want to do anything novel.” Eyeroll. Paul Chiusano has a great response to people like Dave: There are a couple unspoken assumptions here that don’t hold up to scrutiny—one is that abstraction is in general not to be trusted. We can blame Spolsky for infecting our industry with that meme. A little reflection on the history of software reveals abstraction as the primary means by which humans make increasingly complex software possible and comprehensible. Hence why we’ve moved away from assembly language in favor of high-level languages. Blanket mistrust of abstraction, to the point where we should actively avoid including means of abstraction in software technology, is absurd. (From http://pchiusano.github.io/20140702/cssisunnecessary.html)
Abstraction to make complex software possible and understandable


colah.github.io

small window
i.e. kernel

Using multiple copies of a neuron in different places is the neural network equivalent of using functions. Because there is less to learn, the model learns more quickly and learns a better model. This technique – the technical name for it is ‘weight tying’ – is essential to the phenomenal results we’ve recently seen from deep learning.
This parameter sharing allows CNNs, for example, to need far fewer parameters/weights than fully connected NNs.
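A back-of-the-envelope count makes the gap concrete (made-up layer sizes, 28x28 input and 26x26 output, chosen only to show the scale):

```python
# One 3x3 conv filter: the same 9 weights (+1 bias) slide over every position.
conv_params = 3 * 3 + 1

# A fully connected layer mapping 28*28 inputs to 26*26 outputs:
# every output unit gets its own weight per input, plus a bias each.
fc_params = (28 * 28) * (26 * 26) + 26 * 26
```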

The known connection between geometry, logic, topology, and functional programming suggests that the connections between representations and types may be of fundamental significance.
Examples for each?

Representations are Types With every layer, neural networks transform data, molding it into a form that makes their task easier to do. We call these transformed versions of data “representations.” Representations correspond to types.
Interesting.
Like a Queue Type represents a FIFO flow and a Stack a FILO flow, where the space we transformed is the operation space of the type (eg a Queue has a folded operation space compared to an Array)
Just free styling here...

In this view, the representations narrative in deep learning corresponds to type theory in functional programming. It sees deep learning as the junction of two fields we already know to be incredibly rich. What we find, seems so beautiful to me, feels so natural, that the mathematician in me could believe it to be something fundamental about reality.
compositional deep learning

Appendix: Functional Names of Common Layers
Learned Vector -> Constant
Embedding Layer -> List Indexing
Encoding RNN -> Fold
Generating RNN -> Unfold
General RNN -> Accumulating Map
Bidirectional RNN -> Zipped Left/Right Accumulating Maps
Conv Layer -> "Window Map"
TreeNet -> Catamorphism
Inverse TreeNet -> Anamorphism
👌translation. I like to think about embeddings as List lookups
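Two rows of that table, written out literally (toy table and a stand-in cell function, purely illustrative): embedding as list indexing, and an encoding RNN as a fold.

```python
from functools import reduce

# Embedding layer as list indexing: token id -> stored vector.
embedding_table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]

def embed(token_id):
    return embedding_table[token_id]

# An encoding RNN as a fold: repeatedly combine state with the next input.
# Elementwise addition stands in for the usual tanh(Wx + Uh + b) cell.
def rnn_cell(state, x):
    return [s + xi for s, xi in zip(state, x)]

tokens = [0, 2, 1]
final_state = reduce(rnn_cell, (embed(t) for t in tokens), [0.0, 0.0])
```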


distill.pub

If neurons are not the right way to understand neural nets, what is? In real life, combinations of neurons work together to represent images in neural networks. A helpful way to think about these combinations is geometrically: let’s define activation space to be all possible combinations of neuron activations. We can then think of individual neuron activations as the basis vectors of this activation space. Conversely, a combination of neuron activations is then just a vector in this space.
👌great reframe


distill.pub

Semantic dictionaries are powerful not just because they move away from meaningless indices, but because they express a neural network’s learned abstractions with canonical examples. With image classification, the neural network learns a set of visual abstractions and thus images are the most natural symbols to represent them. Were we working with audio, the more natural symbols would most likely be audio clips. This is important because when neurons appear to correspond to human ideas, it is tempting to reduce them to words. Doing so, however, is a lossy operation — even for familiar abstractions, the network may have learned a deeper nuance. For instance, GoogLeNet has multiple floppy ear detectors that appear to detect slightly different levels of droopiness, length, and surrounding context to the ears. There also may exist abstractions which are visually familiar, yet that we lack good natural language descriptions for: for example, take the particular column of shimmering light where sun hits rippling water.
nuance beyond words


maxgoldste.in

This intentional break from pencil-and-paper notation is meant to emphasize how matrices work. To compute the output vector (i.e. to apply the function), multiply each column of the matrix by the input above it, and then add up the columns (think of squishing them together horizontally).
read while playing with this: http://matrixmultiplication.xyz/

After months of using and learning about matrices, this is the best gist I've come across.
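The column view can be checked in a few lines (my own toy matrix and input): scale each column by its input entry, add the columns, and compare with `A @ x`.

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])
x = np.array([10.0, 100.0])

# "Multiply each column by the input above it, then add up the columns"...
by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

# ...which is exactly what matrix-vector multiplication computes.
same = np.allclose(by_columns, A @ x)
```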


www.sanalabs.com

formalized: knowledge retention

i.e. decision-tree splits: entropy (information content) is at its maximum at a 0.5:0.5 split

 Jul 2019

www.jwilber.me

In statistical testing, we structure experiments in terms of null & alternative hypotheses. Our test will have the following hypothesis schema:
H0: μ_treatment <= μ_control
HA: μ_treatment > μ_control
Our null hypothesis claims that the new shampoo does not increase wool quality. The alternative hypothesis claims the opposite: the new shampoo yields superior wool quality.
hypothesis schema; statistics
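The schema above, computed on made-up wool-quality scores (six sheep per group, numbers invented for illustration): a large positive Welch t statistic is evidence against H0.

```python
import math

# Made-up wool-quality scores for six sheep per group.
control = [49.1, 50.3, 48.7, 51.0, 49.9, 50.4]
treatment = [52.0, 53.4, 51.8, 52.9, 53.1, 52.5]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    """Sample variance (ddof = 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Welch's t statistic for H0: mu_treatment <= mu_control.
diff = mean(treatment) - mean(control)
se = math.sqrt(var(treatment) / len(treatment) + var(control) / len(control))
t_stat = diff / se
```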


futureofcoding.org

Apparently 77% of Wikipedia is written by 1% of editors  and that’s not even counting users.
hard to believe


mmlbook.github.io

One major idea in mathematics is the idea of “closure”. This is the question: What is the set of all things that can result from my proposed operations? In the case of vectors: What is the set of vectors that can result by starting with a small set of vectors, and adding them to each other and scaling them? This results in a vector space.
closure in mathematics. sounds similar to domain of a function

We will discuss classification in the context of support vector machines.
SVMs aren't used that much in practice anymore. It's more of an academic fling, because they're nice to work with mathematically. Empirically, Tree Ensembles or Neural Nets are almost always better.


en.wikipedia.org

In Hardy's words, "Exposition, criticism, appreciation, is work for second-rate minds. [...] It is a melancholy experience for a professional mathematician to find himself writing about mathematics. The function of a mathematician is to do something, to prove new theorems, to add to mathematics, and not to talk about what he or other mathematicians have done."
similar to Nassim Taleb's "History is written by losers"


www.glamour.com

She's since learned her allergies and asthma are conditions she's had since childhood that have nothing to do with her weight, and her migraines are hormonal. She's now on meds for these conditions.
Ignoring the fact that 'morbidly obese' by itself is a serious health condition

Interesting, I used to think that an early death from obesity hurts more than being told you're fat by a professional.
