85 Matching Annotations
  1. Dec 2019
    1. In other fields, you can get going after a relatively short time. In mathematics, you can find reputation-making problems that you can actually attack after just a couple of years of college. It is very unlikely that you will make a significant contribution to Shakespeare scholarship before 30. And you’ll read a lot more. Which brings me to the question, once you know what your interests are, how much should you read? How much should you slow down your reading as you age? What is the most fertile ratio of reading to creating? The answer can be tricky. As fields mature, and apparently unsolvable controversies start to dominate (such as has happened at the edge of physics, around superstring theory), a high-paradigm field can become low-paradigm. Subfields can differ: “systems engineering” is lower-paradigm than electrical or mechanical engineering. But Einstein is right about one thing: the “living vicariously” part. That, rather than sheer quantity of reading, is actually the critical part. Depending on what problem you are trying to understand or solve, your reading may take you the rest of your life, or be done in two years. But how you read can determine whether you become a pedantic bore who contributes nothing, or somebody who makes new contributions.

      nuance::reading being a lazy habit of mind

    1. Better for what? A top-down approach seems ideal for well-understood problems where correctness is the most important attribute. If you're making accounting software, or the avionics software for a plane, you're not necessarily blazing any trails. You have a list of requirements, and you need to satisfy them with minimal defects -- especially if money and lives are on the line. A bottom-up approach works well for projects with a lot of unknowns, like artificial intelligence or anything artistic. Paul Simon didn't come up with The Sound of Silence from a list of requirements. He sat in his bathroom with the lights off and riffed on his guitar until genius poured out. It's no accident that there are Clojure projects like Overtone and Quil. A REPL session is the programmer's guitar riff.

      haskell vs lisp kobe vs shaq

    1. edn supports a rich set of built-in elements, and the definition of extension elements in terms of the others. Users of data formats without such facilities must rely on either convention or context to convey elements not included in the base set. This greatly complicates application logic, betraying the apparent simplicity of the format. edn is simple, yet powerful enough to meet the demands of applications without convention or complex context-sensitive logic.

      remove context or convention needs

    1. They are processed into existence using the pulp of what already exists, rising like swamp things from the compost of the past. The mulch is turned and tended by many layers of editors who scrub it of anything possibly objectionable before it is fed into a government-run "adoption" system that provides mediocre material to students of all ages.

      textbooks

    1. After the initial steps by MIT and Harvard, many other universities and new institutes received money from the tech industry to work on AI ethics. Most such organizations are also headed by current or former executives of tech firms. For example, the Data & Society Research Institute is directed by a Microsoft researcher and initially funded by a Microsoft grant; New York University’s AI Now Institute was co-founded by another Microsoft researcher and partially funded by Microsoft, Google, and DeepMind; the Stanford Institute for Human-Centered AI is co-directed by a former vice president of Google; University of California, Berkeley’s Division of Data Sciences is headed by a Microsoft veteran; and the MIT Schwarzman College of Computing is headed by a board member of Amazon. During my time at the Media Lab, Ito maintained frequent contact with the executives and planners of all these organizations.

      Big tech uses academia as shield from .gov

    1. 6, we see that they repeat in a very simple pattern--the last digit is always 6.

      why, geometrically?
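
      A short arithmetic answer (my addition; the truncated quote appears to be about powers of 6): the last digit of a product depends only on the last digits of its factors, and 6 reproduces itself under multiplication mod 10.

          6 \times 6 = 36 \equiv 6 \pmod{10}
          6^{n+1} = 6 \cdot 6^{n} \equiv 6 \cdot 6 \equiv 6 \pmod{10}

      So by induction every power of 6 ends in 6.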

    1. Enter Niklas Luhmann. He was an insanely productive sociologist who did his work using the method of "slip-box" (in German, "Zettelkasten"). Making a slip-box is very simple, with many benefits. The slip-box will become a research partner who could "converse" with you, surprise you, lead you down surprising lines of thoughts. It would nudge you to (numbers in parentheses denote the section in the book that talks about the item): find dissenting views (10.2, 12.3); really understand what you learned (10.4, 11.2, 11.3, 12.6); think across contexts (12.5); remember what you learned (11.3, 12.4); be creative (12.5, 12.6, 12.7, 13.2); get the gist, not stuck on details (12.6); be motivated (13.3); implement short feedback loops, which allow rapid improvements (12.6, 13.5)

      Niklas Luhmann. Zettelkasten Effects

    1. When people watch or engage with scheduled content, the experience becomes a ritual and the content becomes a cultural touchpoint. We see this when people join thousands of others for live Peloton rides, or when people check their favorite music streaming service for new music every Friday. And who can forget those few weeks in April and May when it seemed like everyone was talking about the final season of “Game of Thrones”? The conversations about the content often feel as valuable as the content itself. Consistent programming facilitates these conversations — and ultimately fosters a sense of connection — by turning that scheduled content into an event.

      Content Programming

    1. In the 4-part P.A.R.A. series, I described a universal system for organizing any kind of digital information from any source. It is a “good enough” system, maintaining notes according to their actionability (which takes just a moment to determine), instead of their meaning (which is ambiguous and depends on the context).

      PARA sort by action > meaning

    1. The whole point of having your rich text content in Portable Text is to be able to have full control over presentation. This is done through what we call serialization, in other words, specifying what should happen to the data structures that appear in the objects you loop through.

      serializer

    1. Not Coyote. The guy starts his business in his bedroom. And just keeps pushing those constraints until he can’t push any further into his apartment. He doesn’t try to climb a mountain in one attempt. He takes on a little challenge. Can I sell some glasses from my bedroom? Awesome, I did that, can I do a little more? He’s also extremely resourceful. The lens cutting machine I use, her name is Ripley after the lead character in the Alien series of films. She was acquired in an alley in the suburbs of Milwaukee. I love all the things he’s not. Most people when starting a business plan a list of things they need because everyone else has them. You need to at least reach feature parity with the competition, right? Here’s Coyote’s take on his online shop. If you’ve tried to view items in the Online Shop recently, you may have noticed that nothing appears… this is intentional… Long story short, we simply have too many frames in stock to list online; we’d rather devote more time to delivering the excellent face-to-face customer service that Labrabbit has become known for. His FAQ is similar. “Do you offer eye exams?” No. “Do you repair damaged eyewear?” No. “Do you carry Ray-Ban frames?” No. “Do you accept vision insurance?” No. He focuses on what makes his business special compared to all the competition: unique frames you can’t find anywhere else, and you get to deal with him directly.

      If you strive for feature parity you become a commodity

    1. There are many reasons that a working group filled with experts doesn’t consistently produce great results. For example, many of the participants can be humble about their knowledge so they tend to think that a good chunk of the people that will be using their technology will be just as enlightened. Bad feature ideas can be argued for months and rationalized because smart people, lacking any sort of compelling real world data, are great at debating and rationalizing bad decisions.

      :)

    1. “The future is likely to see a shift to machine learning of vocabularies and rulesets, changing the role of developers into that of teachers who instruct and assess the capabilities of systems, and monitor their performance,”

      ontology editing

    1. Economic theory as it exists increasingly resembles a shed full of broken tools. This is not to say there are no useful insights here, but fundamentally the existing discipline is designed to solve another century’s problems. The problem of how to determine the optimal distribution of work and resources to create high levels of economic growth is simply not the same problem we are now facing: i.e., how to deal with increasing technological productivity, decreasing real demand for labor, and the effective management of care work, without also destroying the Earth. This demands a different science. The “microfoundations” of current economics are precisely what is standing in the way of this. Any new, viable science will either have to draw on the accumulated knowledge of feminism, behavioral economics, psychology, and even anthropology to come up with theories based on how people actually behave, or once again embrace the notion of emergent levels of complexity—or, most likely, both

      science with knowledge of feminism? wtf

    1. Stocks and bonds are bets on big bundles of ideas: underlying technology, business strategy, marketing skill, prices of input factors, market demand, etc. You want to bet on just what you think you know about. Derivatives are usually required to have prices predictable from the underlying instruments they derive from. Many believe that the right combination of existing instruments can reproduce any bet, so new instruments only lower transaction costs. They're wrong. Insurance companies can sell arbitrary bets, but only to those with an "insurable interest". Regulators only allow Commodity Futures where someone needs to insure against big risks. British bookies can take any bets, but insist on setting prices instead of being market-makers. So they won't bet on stuff, like science, they don't understand. The British government takes a big bite of off-shore gambling firm profits via a point of consumption tax.

      Bets and their legalization and different forms

    1. features provide the perfect WYSIWYG UX ❤️ for creating semantic content. Written in ES6 with MVC architecture, custom data model, virtual DOM.

      MVC and vDOM? noobs

    1. My suspicion is, a good KPI for a knowledge tool is minimum threshold of time required to make a negentropic update to it, with every halving of the threshold increasing its capacity to hold positive-interest-rate knowledge repos by an order of magnitude.

      some adhoc loss func

    2. This btw is the maker time/manager time problem pg wrote about. Making needs 4 hour chunks because anything less tends to increase entropy rather than decrease it in any non-trivial knowledge work project. So anything that lowers that lower limit is a big win.

      <4h intense focus increases entropy (more stuff, less structure) in your brain

    1. pay-off matrix t

      term

    2. For example, the Director of a firm might tell his sales staff how he wants an advertising campaign to start and what they should do subsequently in response to various actions of competing firms. (Image source: https://xkcd.com/601/)

      strategy is sum of all moves and countermoves? n levels deep?
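
      A toy illustration of the payoff-matrix term above (all numbers invented): the row player's payoffs against each possible move of the competing firm, plus a naive best-response lookup. A strategy in the quote's sense is then the complete plan: a prepared response for every move the other firm might make, not a single move.

          import numpy as np

          # Row player's payoffs: rows = my moves, columns = the competing firm's moves.
          payoff = np.array([
              [3, 0],   # move 0, e.g. "advertise"
              [5, 1],   # move 1, e.g. "cut prices"
          ])

          # Best response to each opponent move: the row maximizing my payoff in that column.
          best_response = payoff.argmax(axis=0)
          print(best_response)  # [1 1]: move 1 is best against either opponent move here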

    1. “Crowdsourcing is a broken labor model,” said Clouse, whose fund led the Series A round. “You can’t build an enterprise-grade business on top of a crowdsourced marketplace, where both the employers and the workers waste most of their time navigating the chaos. Cloud labor is the future of work, but delivering on that requires a new approach and a new type of workforce — with CloudFactory leading the way.” While crowdsourcing marketplaces like Amazon’s Mechanical Turk can work for handling a moderate number of tasks, fast-growing companies relying on these services quickly start hitting a ceiling. Unlike crowdsourcing solutions, CloudFactory lets companies scale rapidly without losing control of quality or costs. CloudFactory replaces the inefficient marketplace model of crowdsourcing with a curated, closely managed workforce and intelligent algorithms that match the right task to the right worker at the right time.

      proposition: the pendulum from open/bottom-up swings back to more top-down and curation

    1. So hackable that whole industries have grown up to hack it. This is the explicit purpose of test-prep companies and admissions counsellors, but it's also a significant part of the function of private schools.

      and fund application writers

  2. Nov 2019
    1. We once tried an experiment where we funded a bunch of promising founding teams with no ideas in the hopes they would land on a promising idea after we funded them. All of them failed. I think part of the problem is that good founders tend to have lots of good ideas (too many, usually). But an even bigger problem is that once you have a startup you have to hurry to come up with an idea, and because it’s already an official company the idea can’t be too crazy. You end up with plausible sounding but derivative ideas. This is the danger of pivots.

      on pre-idea founding

    1. But consider the Stanford Marshmallow Experiment, a series of studies that demonstrate both the power and trainability of discipline:

      there was recently a follow up to this and it couldn't be replicated with the same magnitude or conclusions.

      Delayed gratification is still important but not as far as predictive in small kids as we thought.

    1. In practice, it’s more accurate to say that “everything is a stream of bytes” than “everything is a file.” /dev/random isn’t a file, but it certainly is a stream of bytes. And, although these things technically aren’t files, they are accessible in the file system – the file system is a universal “name space” where everything is accessible. Want to access a random number generator or read directly from a device? You’ll find both in the file system; no other form of addressing needed. Of course, some things aren’t actually files – processes running on your system aren’t a part of the file system. “Everything is a file” is inaccurate, but lots of things do behave as files.

      unix simplicity
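
      A minimal illustration of the quoted point (assuming Linux or macOS): a device such as /dev/urandom is not a regular file, yet it lives in the same namespace and answers to the ordinary file API.

          # Read a few random bytes from a device node using nothing but the file API.
          with open("/dev/urandom", "rb") as f:
              print(f.read(8).hex())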

  3. Oct 2019
    1. StubHub, for example, took a big chunk out of eBay by creating a ticket-buying experience that was so in tune with the needs of those specific buyers and sellers—with ticket verification, venue maps, and rapid shipping—that they were able to get strong traction in spite of the fact that their fees were substantially higher than what eBay was charging

      vertical / domain

    1. We couldn't write a type declaration of Circle -> Float because Circle is not a type, Shape is. Just like we can't write a function with a type declaration of True -> Int

      instantiated

    1. In this article, we use GoogLeNet, an image classification model, to demonstrate our interface ideas because its neurons seem unusually semantically meaningful

      Any news on why?

    1. Peirce was born at 3 Phillips Place in Cambridge, Massachusetts. He was the son of Sarah Hunt Mills and Benjamin Peirce, himself a professor of astronomy and mathematics at Harvard University and perhaps the first serious research mathematician in America. At age 12, Charles read his older brother's copy of Richard Whately's Elements of Logic, then the leading English-language text on the subject. So began his lifelong fascination with logic and reasoning.

      so normalized for child-hood environment he wasn't an outlier at all...

    1. The smugness of Haskell evangelists is what keeps me from using it. I swear, the first rule of Haskell is to never shut the fuck up about Haskell.

      true, happened to me

    1. gram matrix must be normalized by dividing each element by the total number of elements in the matrix.

      true, after downsampling your gradient will get smaller on later layers
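
      A minimal numpy sketch of the normalization the quote describes, dividing each element of the Gram matrix by the total number of elements (the feature-map shapes below are made up):

          import numpy as np

          # Hypothetical feature maps from one layer: C channels, each flattened to H*W positions.
          C, H, W = 64, 32, 32
          features = np.random.randn(C, H * W)

          gram = features @ features.T        # C x C matrix of channel-to-channel correlations
          gram_normalized = gram / gram.size  # divide by the total number of elements (C * C)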

    1. Style transfer exploits this by running two images through a pre-trained neural network, looking at the pre-trained network’s output at multiple layers, and comparing their similarity. Images that produce similar outputs at one layer of the pre-trained model likely have similar content, while matching outputs at another layer signals similar style.

      Style ~ vertical Content ~ horizontal

    1. Pursuit, which can even search for functions based on type signature. As we hinted already, all our diagrams can be neatly translated to text, ultimately corresponding to equations in our underlying categorical syntax. This allows for the development of very advanced search features, even capable of identifying a pattern independently of the way it was expressed graphically. We want to say again that this is not magic! It works like this: Every diagram corresponds to a term in a category. If a term can be rewritten into another and vice-versa, then the terms are describing the same thing. This way, when you click on some boxes in a diagram and search for equivalent patterns, the engine in the background is looking for all terms that are equivalent to the one describing whatever you selected.

      isomorphisms and Eq relationships

    1. to prove that the following is true. W

      Interesting. So we could phrase it as: "Prove that a vertical flip of the unit diagram preserves meaning/semantics"

    1. he did manage to convince some people, who became known as the algorists. The conservatives, those who preferred to stick with the status quo, were called abacists. They used the abacus, and converted to and from the Roman number system between calculations.

      Abacists == Java Developers?

    2. First, ‘+’ is an operation. The left hand side of equation ①, the stuff to the left of the symbol ‘=’, can be understood as a procedure, a computation. It’s the processing that you did with your fingers when the teacher was talking about apples.

      also: 3 and 4 are operands

    1. Australian-ness. Australians, like the Scots, tend to call a spade a spade. The English aren’t usually so direct. Englishness is all about subtlety and insinuation.  Over the centuries, they have refined their public discourse and developed high-level, advanced techniques like (damning with) faint praise.

      True, it was very annoying to work with the English on software projects...

    1. categorical formalism should provide a much needed high level language for theory of computation, flexible enough to allow abstracting away the low level implementation details when they are irrelevant, or taking them into account when they are genuinely needed. A salient feature of the approach through monoidal categories is the formal graphical language of string diagrams, which supports visual reasoning about programs and computations. In the present paper, we provide a coalgebraic characterization of monoidal computer. It turns out that the availability of interpreters and specializers, that make a monoidal category into a monoidal computer, is equivalent with the existence of a *universal state space*, that carries a weakly final state machine for any pair of input and output types. Being able to program state machines in monoidal computers allows us to represent Turing machines, to capture their execution, count their steps, as well as, e.g., the memory cells that they use. The coalgebraic view of monoidal computer thus provides a convenient diagrammatic language for studying computability and complexity.

      monoidal (category -> computer)

  4. Sep 2019
    1. Most of the supervised learning algorithms are inherently discriminative, which means they learn how to model the conditional probability distribution function (p.d.f) p(y|x) instead, which is the probability of a target (age=35) given an input (purchase=milk). Despite the fact that one could make predictions with this p.d.f, one is not allowed to sample new instances (simulate customers with ages) from the input distribution directly

      difference to "normal" supervised algos
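
      The contrast in the quote, written out (my paraphrase): a discriminative model learns only the conditional, while a generative model learns enough of the joint distribution to sample new inputs.

          \text{discriminative: } p(y \mid x), \qquad \hat{y} = \arg\max_y p(y \mid x)
          \text{generative: } p(x, y) = p(x \mid y)\, p(y), \qquad \text{so one can sample } x \sim p(x \mid y)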

    1. Which works fine, but makes you use lambda. You could rather use the operator module:

           import operator
           mylist = list(zip(range(40, 240), range(-100, 100)))
           sorted(mylist, key=operator.itemgetter(1))
    1. From falsehood you can derive everything (false ≤ true). Restriction: don't talk about elements -> you have to talk about arrows (relations), i.e. "interview the friends". Product types: pairs, tuples, records, ...

    1. Having quick access to definitions and references, which are easily found by search engines, and viewable on a regular web page with a smartphone, should help facilitate quicker progress. The conclusion is, funding an online resource with the purpose to quickly get a new PhD student or researcher new to the field up to speed, is probably a good investment.

      math wiki

    1. range or difference between maximum and minimum intensity values in a neighborhood

      @property

    2. Texture can be described as fine, coarse, grained, smooth, etc

      @descriptors

    3. Such features are found in the tone and structure of a texture

      @properties

    4. Texture is characterized by the spatial distribution of intensity levels in a neighborhood

      distributions
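
      A minimal sketch of the "range" texture measure quoted above (maximum minus minimum intensity in a neighborhood), assuming scipy is available; the image and window size are placeholders.

          import numpy as np
          from scipy import ndimage

          image = np.random.randint(0, 256, size=(128, 128))  # placeholder grayscale image

          # Local range: max minus min intensity in a 3x3 neighborhood around each pixel.
          local_range = (ndimage.maximum_filter(image, size=3)
                         - ndimage.minimum_filter(image, size=3))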

    1. Text embedding models convert any input text into an output vector of numbers, and in the process map semantically similar words near each other in the embedding space: Figure 2: Text embeddings convert any text into a vector of numbers (left). Semantically similar pieces of text are mapped nearby each other in the embedding space (right). Given a trained text embedding model, we can directly measure the associations the model has between words or phrases. Many of these associations are expected and are helpful for natural language tasks. However, some associations may be problematic or hurtful. For example, the ground-breaking paper by Bolukbasi et al. [4] found that the vector-relationship between "man" and "woman" was similar to the relationship between "physician" and "registered nurse" or "shopkeeper" and "housewife"

      love that Big Lebowski reference
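
      A sketch of how such an association can be measured, with made-up 3-dimensional vectors standing in for a trained model's embeddings (real embeddings have hundreds of dimensions): compare the offset of one word pair with the offset of another via cosine similarity.

          import numpy as np

          def cosine(u, v):
              return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

          # Toy vectors; only their relative positions matter for this illustration.
          man, woman = np.array([0.9, 0.1, 0.3]), np.array([0.1, 0.9, 0.3])
          physician, nurse = np.array([0.8, 0.2, 0.7]), np.array([0.2, 0.8, 0.7])

          # Near 1.0 means the two offsets point the same way, i.e. an analogous relationship.
          print(cosine(man - woman, physician - nurse))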

    1. Since all neurons in a single depth slice share the same parameters, the forward pass in each depth slice of the convolutional layer can be computed as a convolution of the neuron's weights with the input volume.[nb 2] Therefore, it is common to refer to the sets of weights as a filter (or a kernel), which is convolved with the input. The result of this convolution is an activation map, and the set of activation maps for each different filter are stacked together along the depth dimension to produce the output volume. Parameter sharing contributes to the translation invariance of the CNN architecture. Sometimes, the parameter sharing assumption may not make sense. This is especially the case when the input images to a CNN have some specific centered structure; for which we expect completely different features to be learned on different spatial locations. One practical example is when the inputs are faces that have been centered in the image: we might expect different eye-specific or hair-specific features to be learned in different parts of the image. In that case it is common to relax the parameter sharing scheme, and instead simply call the layer a "locally connected layer".

      important terms you hear repeatedly great visuals and graphics @https://distill.pub/2018/building-blocks/

    1. Here's a playground where you can select different kernel matrices and see how they affect the original image, or build your own kernel. You can also upload your own image or use live video if your browser supports it. The sharpen kernel emphasizes differences in adjacent pixel values. This makes the image look more vivid. The blur kernel de-emphasizes differences in adjacent pixel values. The emboss kernel (similar to the sobel kernel and sometimes taken to mean the same) gives the illusion of depth by emphasizing the differences of pixels in a given direction. In this case, in a direction along a line from the top left to the bottom right. The identity kernel leaves the image unchanged. How boring! The custom kernel is whatever you make it.

      I'm all about my custom kernels!
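
      An offline version of the playground's sharpen kernel, as a minimal sketch assuming scipy is available; the 3x3 values are the common sharpen filter the page describes.

          import numpy as np
          from scipy import ndimage

          sharpen = np.array([
              [ 0, -1,  0],
              [-1,  5, -1],
              [ 0, -1,  0],
          ])  # emphasizes the difference between a pixel and its neighbors

          image = np.random.rand(64, 64)                # placeholder grayscale image
          sharpened = ndimage.convolve(image, sharpen)  # apply the kernel at every pixel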

    1. We developed a new metric, UAR, which compares the robustness of a model against an attack to adversarial training against that attack. Adversarial training is a strong defense that uses knowledge of an adversary by training on adversarially attacked images.[3] To compute UAR, we average the accuracy of the defense across multiple distortion sizes and normalize by the performance of an adversarially trained model; a precise definition is in our paper. A UAR score near 100 against an unforeseen adversarial attack implies performance comparable to a defense with prior knowledge of the attack, making this a challenging objective.

      @metric
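
      A rough paraphrase of the recipe in the quote, not the paper's exact formula: average the defense's accuracy over several distortion sizes and normalize by an adversarially trained model's accuracies (all numbers below are made up).

          import numpy as np

          defense_acc     = np.array([72.0, 60.0, 41.0, 20.0])  # accuracy (%) per distortion size
          adv_trained_acc = np.array([75.0, 63.0, 45.0, 24.0])  # adversarially trained baseline

          uar = 100 * defense_acc.mean() / adv_trained_acc.mean()  # ~100 means comparable robustness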

    1. Time for the red pill. A matrix is a shorthand for our diagrams: A matrix is a single variable representing a spreadsheet of inputs or operations.
    2. operation F is linear if scaling inputs scales the output, and adding inputs adds the outputs:

      addition-preserving

    3. Linear algebra gives you mini-spreadsheets for your math equations. We can take a table of data (a matrix) and create updated tables from the original. It’s the power of a spreadsheet written as an equation.

      matrix = equation-excel
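
      A quick numpy check of the linearity property from the quote, with a matrix playing the role of the operation F: scaling and adding inputs scales and adds the outputs.

          import numpy as np

          F = np.array([[2.0, 0.0],
                        [1.0, 3.0]])  # the "mini-spreadsheet" of operations
          x, y = np.array([1.0, 2.0]), np.array([4.0, -1.0])
          a, b = 3.0, -2.0

          # F(a*x + b*y) == a*F(x) + b*F(y)
          print(np.allclose(F @ (a * x + b * y), a * (F @ x) + b * (F @ y)))  # True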

  5. Aug 2019
    1. So when we look at really tiny solids, energy doesn’t always flow from a hot object to a cold one. It can go the other way sometimes. And entropy doesn’t always increase. This isn't just a theoretical issue, entropy decreases have actually been seen in microscopic experiments.

      Boltzmann: it's statistically increasingly unlikely that entropy decreases

    2. The atomic world is a two-way street. But when we get to large collections of atoms, a one-way street emerges for the direction in which events take place.

      micro -> macro emergence

    1. Good general theory does not search for the maximum generality, but for the right generality.

      Especially true in practical programming

    2. It's true that category theory may not help you find a delta for your epsilon, or determine if your group of order 520 is simple, or construct a solution to your PDE. For those endeavors, we do have to put our feet back on the ground.

      @CT:limits

    3. hierarchy of questions: "What about the relationships between the relationships between the relationships between the...?" This leads to infinity categories. [And a possible brain freeze.] For more, see here.)  As pie-in-the-sky as this may sound, these ideas---categories, functors, and natural transformations---lead to a treasure trove of theory that shows up almost everywhere.

      Turtles all the way up

    1. But there is an alternative. It’s called denotational semantics and it’s based on math. In denotational semantics every programming construct is given its mathematical interpretation. With that, if you want to prove a property of a program, you just prove a mathematical theorem
    1. And worst of all, we’ve lost sight of the most salient part about computers: their malleability. We’ve acquiesced the creation of our virtual worlds, where we now spend most of our time, to the select few who can spend the millions to hire enough software engineers. So many of the photons that hit our eyes come from purely fungible pixels, yet for most of us, these pixels are all but carved in stone. Smartphone apps, like the kitchen appliances before them, are polished, single-purpose tools with only the meanest amount of customizability and interoperability. They are monstrosities of code, millions of lines, that only an army of programmers could hope to tame. As soon as they can swipe, our children are given magical rectangles that for all their lives will be as inscrutable as if they were truly magic.

      I was a professional web developer for two years and I now have to bring myself to even touch CSS or the DOM. Whenever I have to make anything on the web work I know I'm gonna spend 3 hours in pain for something that should take 5 minutes.

    1. He continued, “All the interesting things people want to do with computers require low level languages. Think about it. You can never make an abstract language that can do the cutting edge in software.” “Ok,” I said, willing to concede the point, “I’m not too worried if my language is too slow for cutting-edge algorithms. It’ll still be fast enough for most people doing most things.” He smirked, having ensnared me in his Socratic trap, “Then it’s not really a programming language. It’s just another limited abstraction, like Squarespace, that people will have to leave if they want to do anything novel.” Eyeroll. Paul Chiusano has a great response to people like Dave: There are a couple unspoken assumptions here that don’t hold up to scrutiny—one is that abstraction is in general not to be trusted. We can blame Spolsky for infecting our industry with that meme. A little reflection on the history of software reveals abstraction as the primary means by which humans make increasingly complex software possible and comprehensible. Hence why we’ve moved away from assembly language in favor of high-level languages. Blanket mistrust of abstraction, to the point where we should actively avoid including means of abstraction in software technology, is absurd. (From http://pchiusano.github.io/2014-07-02/css-is-unnecessary.html)

      Abstraction to make complex software possible and understandable

    1. small window

      ie. kernel

    2. Using multiple copies of a neuron in different places is the neural network equivalent of using functions. Because there is less to learn, the model learns more quickly and learns a better model. This technique – the technical name for it is ‘weight tying’ – is essential to the phenomenal results we’ve recently seen from deep learning.

      This parameter sharing allows CNNs, for example, to need much less params/weights than Fully Connected NNs.
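
      A back-of-the-envelope comparison of what weight tying buys (sizes chosen for illustration, biases ignored): a conv layer reuses one small filter at every position, while a fully connected layer learns a separate weight per input-output pair.

          # 32x32 RGB input, 64 output channels / units, 3x3 filters.
          H, W, C_in, C_out, k = 32, 32, 3, 64, 3

          conv_params = C_out * (k * k * C_in)  # filter reused everywhere: 64 * 27 = 1,728
          fc_params   = (H * W * C_in) * C_out  # one weight per pixel per unit: 3,072 * 64 = 196,608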

    3. The known connection between geometry, logic, topology, and functional programming suggests that the connections between representations and types may be of fundamental significance.

      Examples for each?

    4. Representations are Types With every layer, neural networks transform data, molding it into a form that makes their task easier to do. We call these transformed versions of data “representations.” Representations correspond to types.

      Interesting.

      Like a Queue Type represents a FIFO flow and a Stack a FILO flow, where the space we transformed is the operation space of the type (eg a Queue has a folded operation space compared to an Array)

      Just free styling here...

    5. In this view, the representations narrative in deep learning corresponds to type theory in functional programming. It sees deep learning as the junction of two fields we already know to be incredibly rich. What we find, seems so beautiful to me, feels so natural, that the mathematician in me could believe it to be something fundamental about reality.

      compositional deep learning

    6. Appendix: Functional Names of Common Layers
       Deep Learning Name -> Functional Name
       Learned Vector -> Constant
       Embedding Layer -> List Indexing
       Encoding RNN -> Fold
       Generating RNN -> Unfold
       General RNN -> Accumulating Map
       Bidirectional RNN -> Zipped Left/Right Accumulating Maps
       Conv Layer -> “Window Map”
       TreeNet -> Catamorphism
       Inverse TreeNet -> Anamorphism

      👌translation. I like to think about embeddings as List lookups
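
      The "embeddings as list lookups" reading from the note above, as a minimal numpy sketch: the embedding matrix is the list, token ids are the indices.

          import numpy as np

          vocab_size, dim = 10_000, 8
          embedding = np.random.randn(vocab_size, dim)  # the learned lookup table (random here)

          token_ids = [42, 7, 42]
          vectors = embedding[token_ids]                # list indexing: one row per token, shape (3, 8)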

    1. If neurons are not the right way to understand neural nets, what is? In real life, combinations of neurons work together to represent images in neural networks. A helpful way to think about these combinations is geometrically: let’s define activation space to be all possible combinations of neuron activations. We can then think of individual neuron activations as the basis vectors of this activation space. Conversely, a combination of neuron activations is then just a vector in this space.

      👌great reframe

    1. Semantic dictionaries are powerful not just because they move away from meaningless indices, but because they express a neural network’s learned abstractions with canonical examples. With image classification, the neural network learns a set of visual abstractions and thus images are the most natural symbols to represent them. Were we working with audio, the more natural symbols would most likely be audio clips. This is important because when neurons appear to correspond to human ideas, it is tempting to reduce them to words. Doing so, however, is a lossy operation — even for familiar abstractions, the network may have learned a deeper nuance. For instance, GoogLeNet has multiple floppy ear detectors that appear to detect slightly different levels of droopiness, length, and surrounding context to the ears. There also may exist abstractions which are visually familiar, yet that we lack good natural language descriptions for: for example, take the particular column of shimmering light where sun hits rippling water.

      nuance beyond words

    1. This intentional break from pencil-and-paper notation is meant to emphasize how matrices work. To compute the output vector (i.e. to apply the function), multiply each column of the matrix by the input above it, and then add up the columns (think of squishing them together horizontally).

      read while playing with this: http://matrixmultiplication.xyz/
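
      The column-wise reading from the quote, spelled out in numpy: scale each column of the matrix by the input entry above it, then add the columns; it agrees with the usual matrix-vector product.

          import numpy as np

          M = np.array([[1.0, 4.0],
                        [2.0, 5.0],
                        [3.0, 6.0]])
          x = np.array([10.0, 100.0])

          column_view = x[0] * M[:, 0] + x[1] * M[:, 1]  # squish the scaled columns together
          print(np.allclose(column_view, M @ x))         # True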

    2. After months of using and learning about matrices, this is the best gist I've come across.

  6. Jul 2019
    1. In statistical testing, we structure experiments in terms of null & alternative hypotheses. Our test will have the following hypothesis schema: H0: μ_treatment <= μ_control; HA: μ_treatment > μ_control. Our null hypothesis claims that the new shampoo does not increase wool quality. The alternative hypothesis claims the opposite; new shampoo yields superior wool quality.

      hypothesis schema; statistics
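
      A sketch of this schema with made-up wool-quality scores, assuming a reasonably recent scipy (the one-sided alternative keyword needs scipy >= 1.6):

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(0)
          control   = rng.normal(50, 5, size=30)  # wool quality, old shampoo (made-up data)
          treatment = rng.normal(53, 5, size=30)  # wool quality, new shampoo (made-up data)

          # H0: mu_treatment <= mu_control   vs.   HA: mu_treatment > mu_control
          t_stat, p_value = stats.ttest_ind(treatment, control, alternative="greater")
          print(p_value)  # a small p-value is evidence against H0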

  7. futureofcoding.org
    1. Apparently 77% of Wikipedia is written by 1% of editors - and that’s not even counting users.

      hard to believe

    1. One major idea in mathematics is the idea of “closure”. This is the question: What is the set of all things that can result from my proposed operations? In the case of vectors: What is the set of vectors that can result by starting with a small set of vectors, and adding them to each other and scaling them? This results in a vector space

      closure in mathematics. sounds similar to domain of a function

    2. We will discuss classification in the context of support vector machines

      SVMs aren't used that much in practice anymore. It's more of an academic fling, because they're nice to work with mathematically. Empirically, Tree Ensembles or Neural Nets are almost always better.

    1. In Hardy's words, "Exposition, criticism, appreciation, is work for second-rate minds. [...] It is a melancholy experience for a professional mathematician to find himself writing about mathematics. The function of a mathematician is to do something, to prove new theorems, to add to mathematics, and not to talk about what he or other mathematicians have done."

      similar to Nassim Taleb's "History is written by losers"

    1. She's since learned her allergies and asthma are conditions she's had since childhood that have nothing to do with her weight, and her migraines are hormonal. She's now on meds for these conditions.

      Ignoring the fact that 'morbidly obese' by itself is a serious health condition

    2. Interesting, I used to think that an early death from obesity hurts more than being told you're fat by a professional.