1,145 Matching Annotations
  1. Last 7 days
    1. Letters of transit signed by General de Gaulle. Cannot be rescinded, not even questioned. Rick appears ready to take them from Ugarte.

      Heavy price tag. Visas are the currency in Casablanca.

    1. As a prototype it hits a sweet spot: it's challenging - it's no small feat to recognize handwritten digits - but it's not so difficult as to require an extremely complicated solution, or tremendous computational power. Furthermore, it's a great way to develop more advanced techniques, such as deep learning. And so throughout the book we'll return repeatedly to the problem of handwriting recognition. Later in the book, we'll discuss how these ideas may be applied to other problems in computer vision, and also in speech, natural language processing, and other domains.

Of course, if the point of the chapter was only to write a computer program to recognize handwritten digits, then the chapter would be much shorter! But along the way we'll develop many key ideas about neural networks, including two important types of artificial neuron (the perceptron and the sigmoid neuron), and the standard learning algorithm for neural networks, known as stochastic gradient descent. Throughout, I focus on explaining why things are done the way they are, and on building your neural networks intuition. That requires a lengthier discussion than if I just presented the basic mechanics of what's going on, but it's worth it for the deeper understanding you'll attain. Amongst the payoffs, by the end of the chapter we'll be in a position to understand what deep learning is, and why it matters.

Perceptrons

What is a neural network? To get started, I'll explain a type of artificial neuron called a perceptron. Perceptrons were developed in the 1950s and 1960s by the scientist Frank Rosenblatt, inspired by earlier work by Warren McCulloch and Walter Pitts. Today, it's more common to use other models of artificial neurons - in this book, and in much modern work on neural networks, the main neuron model used is one called the sigmoid neuron. We'll get to sigmoid neurons shortly. 
But to understand why sigmoid neurons are defined the way they are, it's worth taking the time to first understand perceptrons.

So how do perceptrons work? A perceptron takes several binary inputs, $x_1, x_2, \ldots$, and produces a single binary output: In the example shown the perceptron has three inputs, $x_1, x_2, x_3$. In general it could have more or fewer inputs. Rosenblatt proposed a simple rule to compute the output. He introduced weights, $w_1, w_2, \ldots$, real numbers expressing the importance of the respective inputs to the output. The neuron's output, $0$ or $1$, is determined by whether the weighted sum $\sum_j w_j x_j$ is less than or greater than some threshold value. Just like the weights, the threshold is a real number which is a parameter of the neuron. To put it in more precise algebraic terms:

$$
\text{output} = \begin{cases} 0 & \text{if } \sum_j w_j x_j \leq \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases} \tag{1}
$$

That's all there is to how a perceptron works!

That's the basic mathematical model. A way you can think about the perceptron is that it's a device that makes decisions by weighing up evidence. Let me give an example. It's not a very realistic example, but it's easy to understand, and we'll soon get to more realistic examples. Suppose the weekend is coming up, and you've heard that there's going to be a cheese festival in your city. You like cheese, and are trying to decide whether or not to go to the festival. You might make your decision by weighing up three factors: Is the weather good? Does your boyfriend or girlfriend want to accompany you? Is the festival near public transit? (You don't own a car.) 
We can represent these three factors by corresponding binary variables $x_1, x_2$, and $x_3$. For instance, we'd have $x_1 = 1$ if the weather is good, and $x_1 = 0$ if the weather is bad. Similarly, $x_2 = 1$ if your boyfriend or girlfriend wants to go, and $x_2 = 0$ if not. And similarly again for $x_3$ and public transit.

Now, suppose you absolutely adore cheese, so much so that you're happy to go to the festival even if your boyfriend or girlfriend is uninterested and the festival is hard to get to. But perhaps you really loathe bad weather, and there's no way you'd go to the festival if the weather is bad. You can use perceptrons to model this kind of decision-making. One way to do this is to choose a weight $w_1 = 6$ for the weather, and $w_2 = 2$ and $w_3 = 2$ for the other conditions. The larger value of $w_1$ indicates that the weather matters a lot to you, much more than whether your boyfriend or girlfriend joins you, or the nearness of public transit. Finally, suppose you choose a threshold of $5$ for the perceptron. With these choices, the perceptron implements the desired decision-making model, outputting $1$ whenever the weather is good, and $0$ whenever the weather is bad. It makes no difference to the output whether your boyfriend or girlfriend wants to go, or whether public transit is nearby.

By varying the weights and the threshold, we can get different models of decision-making. For example, suppose we instead chose a threshold of $3$. Then the perceptron would decide that you should go to the festival whenever the weather was good or when both the festival was near public transit and your boyfriend or girlfriend was willing to join you. In other words, it'd be a different model of decision-making. Dropping the threshold means you're more willing to go to the festival.

Obviously, the perceptron isn't a complete model of human decision-making! 
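You can try out the threshold rule in Equation (1) directly. Below is a minimal Python sketch applied to the cheese-festival example. It uses the weights $w_1 = 6$, $w_2 = w_3 = 2$ and the thresholds $5$ and $3$ from the text. The function and variable names are illustrative, not taken from any library.

```python
def perceptron(weights, threshold, inputs):
    """Threshold form of the perceptron rule, Equation (1)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# x1 = weather is good, x2 = partner wants to go, x3 = festival near public transit
weights = [6, 2, 2]

def decisions(threshold):
    """Perceptron output for all 8 combinations of the three binary factors."""
    return {
        (x1, x2, x3): perceptron(weights, threshold, [x1, x2, x3])
        for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)
    }

strict = decisions(5)   # go exactly when the weather is good
relaxed = decisions(3)  # also go in bad weather if partner AND transit are both 1

for inputs in sorted(strict):
    print(inputs, strict[inputs], relaxed[inputs])
```

Each printed row shows the three inputs, followed by the outputs at thresholds 5 and 3. The only row where the second output is 1 but the first is 0 is the bad-weather case that the lower threshold lets through.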
But what the example illustrates is how a perceptron can weigh up different kinds of evidence in order to make decisions. And it should seem plausible that a complex network of perceptrons could make quite subtle decisions: In this network, the first column of perceptrons - what we'll call the first layer of perceptrons - is making three very simple decisions, by weighing the input evidence. What about the perceptrons in the second layer? Each of those perceptrons is making a decision by weighing up the results from the first layer of decision-making. In this way a perceptron in the second layer can make a decision at a more complex and more abstract level than perceptrons in the first layer. And even more complex decisions can be made by the perceptron in the third layer. In this way, a many-layer network of perceptrons can engage in sophisticated decision making.

Incidentally, when I defined perceptrons I said that a perceptron has just a single output. In the network above the perceptrons look like they have multiple outputs. In fact, they're still single output. The multiple output arrows are merely a useful way of indicating that the output from a perceptron is being used as the input to several other perceptrons. It's less unwieldy than drawing a single output line which then splits.

Let's simplify the way we describe perceptrons. The condition $\sum_j w_j x_j > \text{threshold}$ is cumbersome, and we can make two notational changes to simplify it. The first change is to write $\sum_j w_j x_j$ as a dot product, $w \cdot x \equiv \sum_j w_j x_j$, where $w$ and $x$ are vectors whose components are the weights and inputs, respectively. The second change is to move the threshold to the other side of the inequality, and to replace it by what's known as the perceptron's bias, $b \equiv -\text{threshold}$. 
Using the bias instead of the threshold, the perceptron rule can be rewritten:

$$
\text{output} = \begin{cases} 0 & \text{if } w \cdot x + b \leq 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases} \tag{2}
$$

You can think of the bias as a measure of how easy it is to get the perceptron to output a $1$. Or to put it in more biological terms, the bias is a measure of how easy it is to get the perceptron to fire. For a perceptron with a really big bias, it's extremely easy for the perceptron to output a $1$. But if the bias is very negative, then it's difficult for the perceptron to output a $1$. Obviously, introducing the bias is only a small change in how we describe perceptrons, but we'll see later that it leads to further notational simplifications. Because of this, in the remainder of the book we won't use the threshold; we'll always use the bias.

I've described perceptrons as a method for weighing evidence to make decisions. Another way perceptrons can be used is to compute the elementary logical functions we usually think of as underlying computation, functions such as AND, OR, and NAND. For example, suppose we have a perceptron with two inputs, each with weight $-2$, and an overall bias of $3$. Here's our perceptron: Then we see that input $00$ produces output $1$, since $(-2)*0+(-2)*0+3 = 3$ is positive. Here, I've introduced the $*$ symbol to make the multiplications explicit. Similar calculations show that the inputs $01$ and $10$ produce output $1$. But the input $11$ produces output $0$, since $(-2)*1+(-2)*1+3 = -1$ is negative. And so our perceptron implements a NAND gate!

The NAND example shows that we can use perceptrons to compute simple logical functions. In fact, we can use networks of perceptrons to compute any logical function at all. 
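The NAND perceptron can be checked directly against the bias form of the rule, Equation (2). The sketch below does that, then wires the same unit into a two-bit adder of the kind the text goes on to describe. It also confirms that merging the doubled connection into a single weight of $-4$ leaves the carry bit unchanged. The adder wiring is an assumption, since the circuit diagram is not reproduced here: it uses a standard four-gate NAND construction of XOR plus one gate for the carry. All names are illustrative.

```python
def perceptron(weights, bias, inputs):
    """Bias form of the perceptron rule, Equation (2): output 1 iff w.x + b > 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

def nand(a, b):
    # Two inputs, each with weight -2, and an overall bias of 3, as in the text.
    return perceptron([-2, -2], 3, [a, b])

def half_adder(x1, x2):
    """Add two bits using only NAND perceptrons; returns (sum_bit, carry_bit)."""
    n1 = nand(x1, x2)        # its output is reused by three later gates
    n2 = nand(x1, n1)
    n3 = nand(x2, n1)
    sum_bit = nand(n2, n3)   # equals x1 XOR x2
    carry = nand(n1, n1)     # n1 feeds the same perceptron twice; equals x1 AND x2
    return sum_bit, carry

def carry_merged(x1, x2):
    # Same carry perceptron, but the two -2 connections are merged into one -4 connection.
    n1 = nand(x1, x2)
    return perceptron([-4], 3, [n1])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b), half_adder(a, b), carry_merged(a, b))
```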
The reason is that the NAND gate is universal for computation, that is, we can build any computation up out of NAND gates. For example, we can use NAND gates to build a circuit which adds two bits, $x_1$ and $x_2$. This requires computing the bitwise sum, $x_1 \oplus x_2$, as well as a carry bit which is set to $1$ when both $x_1$ and $x_2$ are $1$, i.e., the carry bit is just the bitwise product $x_1 x_2$: To get an equivalent network of perceptrons we replace all the NAND gates by perceptrons with two inputs, each with weight $-2$, and an overall bias of $3$. Here's the resulting network. Note that I've moved the perceptron corresponding to the bottom right NAND gate a little, just to make it easier to draw the arrows on the diagram: One notable aspect of this network of perceptrons is that the output from the leftmost perceptron is used twice as input to the bottommost perceptron. When I defined the perceptron model I didn't say whether this kind of double-output-to-the-same-place was allowed. Actually, it doesn't much matter. If we don't want to allow this kind of thing, then it's possible to simply merge the two lines into a single connection with a weight of $-4$ instead of two connections with $-2$ weights. (If you don't find this obvious, you should stop and prove to yourself that this is equivalent.) With that change, the network looks as follows, with all unmarked weights equal to $-2$, all biases equal to $3$, and a single weight of $-4$, as marked:

Up to now I've been drawing inputs like $x_1$ and $x_2$ as variables floating to the left of the network of perceptrons. In fact, it's conventional to draw an extra layer of perceptrons - the input layer - to encode the inputs: This notation for input perceptrons, in which we have an output, but no inputs, is a shorthand. It doesn't actually mean a perceptron with no inputs. To see this, suppose we did have a perceptron with no inputs. 
Then the weighted sum $\sum_j w_j x_j$ would always be zero, and so the perceptron would output $1$ if $b > 0$, and $0$ if $b \leq 0$. That is, the perceptron would simply output a fixed value, not the desired value ($x_1$, in the example above). It's better to think of the input perceptrons as not really being perceptrons at all, but rather special units which are simply defined to output the desired values, $x_1, x_2, \ldots$.

The adder example demonstrates how a network of perceptrons can be used to simulate a circuit containing many NAND gates. And because NAND gates are universal for computation, it follows that perceptrons are also universal for computation.

The computational universality of perceptrons is simultaneously reassuring and disappointing. It's reassuring because it tells us that networks of perceptrons can be as powerful as any other computing device. But it's also disappointing, because it makes it seem as though perceptrons are merely a new type of NAND gate. That's hardly big news!

However, the situation is better than this view suggests. It turns out that we can devise learning algorithms which can automatically tune the weights and biases of a network of artificial neurons. This tuning happens in response to external stimuli, without direct intervention by a programmer. These learning algorithms enable us to use artificial neurons in a way which is radically different to conventional logic gates. Instead of explicitly laying out a circuit of NAND and other gates, our neural networks can simply learn to solve problems, sometimes problems where it would be extremely difficult to directly design a conventional circuit.

Sigmoid neurons

Learning algorithms sound terrific. But how can we devise such algorithms for a neural network? Suppose we have a network of perceptrons that we'd like to use to learn to solve some problem. 
For example, the inputs to the network might be the raw pixel data from a scanned, handwritten image of a digit. And we'd like the network to learn weights and biases so that the output from the network correctly classifies the digit. To see how learning might work, suppose we make a small change in some weight (or bias) in the network. What we'd like is for this small change in weight to cause only a small corresponding change in the output from the network. As we'll see in a moment, this property will make learning possible. Schematically, here's what we want (obviously this network is too simple to do handwriting recognition!): If it were true that a small change in a weight (or bias) causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. For example, suppose the network was mistakenly classifying an image as an "8" when it should be a "9". We could figure out how to make a small change in the weights and biases so the network gets a little closer to classifying the image as a "9". And then we'd repeat this, changing the weights and biases over and over to produce better and better output. The network would be learning.

The problem is that this isn't what happens when our network contains perceptrons. In fact, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from $0$ to $1$. That flip may then cause the behaviour of the rest of the network to completely change in some very complicated way. So while your "9" might now be classified correctly, the behaviour of the network on all the other images is likely to have completely changed in some hard-to-control way. That makes it difficult to see how to gradually modify the weights and biases so that the network gets closer to the desired behaviour. Perhaps there's some clever way of getting around this problem. 
But it's not immediately obvious how we can get a network of perceptrons to learn.

We can overcome this problem by introducing a new type of artificial neuron called a sigmoid neuron. Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output. That's the crucial fact which will allow a network of sigmoid neurons to learn.

Okay, let me describe the sigmoid neuron. We'll depict sigmoid neurons in the same way we depicted perceptrons: Just like a perceptron, the sigmoid neuron has inputs, $x_1, x_2, \ldots$. But instead of being just $0$ or $1$, these inputs can also take on any values between $0$ and $1$. So, for instance, $0.638\ldots$ is a valid input for a sigmoid neuron. Also just like a perceptron, the sigmoid neuron has weights for each input, $w_1, w_2, \ldots$, and an overall bias, $b$. But the output is not $0$ or $1$. Instead, it's $\sigma(w \cdot x + b)$, where $\sigma$ is called the sigmoid function (incidentally, $\sigma$ is sometimes called the logistic function, and this new class of neurons called logistic neurons; it's useful to remember this terminology, since these terms are used by many people working with neural nets, but we'll stick with the sigmoid terminology), and is defined by:

$$
\sigma(z) \equiv \frac{1}{1+e^{-z}}. \tag{3}
$$

To put it all a little more explicitly, the output of a sigmoid neuron with inputs $x_1, x_2, \ldots$, weights $w_1, w_2, \ldots$, and bias $b$ is

$$
\frac{1}{1+\exp\left(-\sum_j w_j x_j - b\right)}. \tag{4}
$$

At first sight, sigmoid neurons appear very different to perceptrons. The algebraic form of the sigmoid function may seem opaque and forbidding if you're not already familiar with it. 
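Written out in code, Equations (3) and (4) are short. This is a minimal Python sketch. The names `sigmoid` and `sigmoid_neuron` and the example numbers are illustrative assumptions. The two-branch form of `sigmoid` is an algebraically equivalent rearrangement of Equation (3) that avoids overflow in `math.exp` for large negative $z$.

```python
import math

def sigmoid(z):
    """Equation (3): sigma(z) = 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # for z < 0, 1/(1 + e^{-z}) == e^z / (1 + e^z)
    return e / (1.0 + e)

def sigmoid_neuron(weights, bias, inputs):
    """Equation (4): output = 1 / (1 + exp(-sum_j w_j x_j - b))."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical weights, bias, and inputs; inputs may take any value in [0, 1].
weights = [0.5, -1.2, 2.0]
bias = -0.3
inputs = [0.638, 0.1, 0.9]
print(sigmoid_neuron(weights, bias, inputs))  # a value strictly between 0 and 1
```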
In fact, there are many similarities between perceptrons and sigmoid neurons, and the algebraic form of the sigmoid function turns out to be more of a technical detail than a true barrier to understanding.

To understand the similarity to the perceptron model, suppose $z \equiv w \cdot x + b$ is a large positive number. Then $e^{-z} \approx 0$ and so $\sigma(z) \approx 1$. In other words, when $z = w \cdot x + b$ is large and positive, the output from the sigmoid neuron is approximately $1$, just as it would have been for a perceptron. Suppose on the other hand that $z = w \cdot x + b$ is very negative. Then $e^{-z} \rightarrow \infty$, and $\sigma(z) \approx 0$. So when $z = w \cdot x + b$ is very negative, the behaviour of a sigmoid neuron also closely approximates a perceptron. It's only when $w \cdot x + b$ is of modest size that there's much deviation from the perceptron model.

What about the algebraic form of $\sigma$? How can we understand that? In fact, the exact form of $\sigma$ isn't so important - what really matters is the shape of the function when plotted. Here's the shape:

[Plot: "sigmoid function", $\sigma(z)$ against $z$, for $z$ from $-5$ to $5$]

This shape is a smoothed out version of a step function:

[Plot: "step function", the step function against $z$, for $z$ from $-5$ to $5$]

If $\sigma$ had in fact been a step function, then the sigmoid neuron would be a perceptron, since the output would be $1$ or $0$ depending on whether $w \cdot x + b$ was positive or negative. (Actually, when $w \cdot x + b = 0$ the perceptron outputs $0$, while the step function outputs $1$. So, strictly speaking, we'd need to modify the step function at that one point. But you get the idea.) By using the actual $\sigma$ function we get, as already implied above, a smoothed out perceptron. Indeed, it's the smoothness of the $\sigma$ function that is the crucial fact, not its detailed form. 
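Two claims can be checked numerically: that $\sigma$ approaches the step function for large $|z|$, and that a small parameter change can flip a perceptron but only nudges a sigmoid neuron. In this sketch the `step` helper returns $0$ at $z = 0$, matching the perceptron rule rather than the plotted step function. The particular values of $w$, $x$, $b$ and the bias change are hypothetical.

```python
import math

def sigmoid(z):
    # Overflow-safe form of 1 / (1 + e^{-z}).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def step(z):
    # Perceptron-style step: 1 if z > 0, else 0 (agrees with Equation (2) at z = 0).
    return 1 if z > 0 else 0

# For large |z| the sigmoid is very close to the step function.
for z in (-10.0, -5.0, -1.0, 1.0, 5.0, 10.0):
    print(f"z={z:+5.1f}  step={step(z)}  sigmoid={sigmoid(z):.5f}")

# A single neuron whose weighted input z = w*x + b sits just below 0.
w, x, b = 1.0, 1.0, -1.01
delta_b = 0.02  # small change to the bias
before_step, after_step = step(w * x + b), step(w * x + b + delta_b)
before_sig, after_sig = sigmoid(w * x + b), sigmoid(w * x + b + delta_b)
print("perceptron:", before_step, "->", after_step)   # flips from 0 to 1
print("sigmoid:   ", round(before_sig, 4), "->", round(after_sig, 4))
```

The perceptron's output jumps by a full unit, while the sigmoid neuron's output moves by well under $0.01$. That is the behaviour the learning argument needs.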
The smoothness of $\sigma$ means that small changes $\Delta w_j$ in the weights and $\Delta b$ in the bias will produce a small change $\Delta \text{output}$ in the output from the neuron. In fact, calculus tells us that $\Delta \text{output}$ is well approximated by

$$
\Delta \text{output} \approx \sum_j \frac{\partial \, \text{output}}{\partial w_j} \Delta w_j + \frac{\partial \, \text{output}}{\partial b} \Delta b, \tag{5}
$$

where the sum is over all the weights, $w_j$, and $\partial \, \text{output} / \partial w_j$ and $\partial \, \text{output} / \partial b$ denote partial derivatives of the output with respect to $w_j$ and $b$, respectively. Don't panic if you're not comfortable with partial derivatives! While the expression above looks complicated, with all the partial derivatives, it's actually saying something very simple (and which is very good news): $\Delta \text{output}$ is a linear function of the changes $\Delta w_j$ and $\Delta b$ in the weights and bias. This linearity makes it easy to choose small changes in the weights and biases to achieve any desired small change in the output. So while sigmoid neurons have much of the same qualitative behaviour as perceptrons, they make it much easier to figure out how changing the weights and biases will change the output.

If it's the shape of $\sigma$ which really matters, and not its exact form, then why use the particular form used for $\sigma$ in Equation (3)? 
In fact, later in the book we will occasionally consider neurons where the output is $f(w \cdot x + b)$ for some other activation function $f(\cdot)$. The main thing that changes when we use a different activation function is that the particular values for the partial derivatives in Equation (5) change. It turns out that when we compute those partial derivatives later, using $\sigma$ will simplify the algebra, simply because exponentials have lovely properties when differentiated. In any case, $\sigma$ is commonly used in work on neural nets, and is the activation function we'll use most often in this book.

How should we interpret the output from a sigmoid neuron? Obviously, one big difference between perceptrons and sigmoid neurons is that sigmoid neurons don't just output $0$ or $1$. They can have as output any real number between $0$ and $1$, so values such as $0.173\ldots$ and $0.689\ldots$ are legitimate outputs. This can be useful, for example, if we want to use the output value to represent the average intensity of the pixels in an image input to a neural network. But sometimes it can be a nuisance. Suppose we want the output from the network to indicate either "the input image is a 9" or "the input image is not a 9". Obviously, it'd be easiest to do this if the output was a $0$ or a $1$, as in a perceptron. But in practice we can set up a convention to deal with this, for example, by deciding to interpret any output of at least $0.5$ as indicating a "9", and any output less than $0.5$ as indicating "not a 9". 
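The sketch below checks Equation (5) numerically for a single sigmoid neuron. It also shows the "lovely property" in concrete form: the derivative of $\sigma$ can be written in terms of $\sigma$ itself, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. With $z = w \cdot x + b$, this gives $\partial\,\text{output}/\partial w_j = \sigma'(z)\,x_j$ and $\partial\,\text{output}/\partial b = \sigma'(z)$. The weights, inputs and changes are hypothetical values chosen for illustration. The final lines apply a reading convention of the kind just described, using a cutoff of $0.5$ (an assumption of this sketch).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Derivative of Equation (3): sigma'(z) = sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

def neuron_output(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical neuron and small parameter changes.
weights, bias, inputs = [0.4, -0.7], 0.1, [0.9, 0.3]
delta_w, delta_b = [0.01, -0.02], 0.005

z = sum(w * x for w, x in zip(weights, inputs)) + bias
# Right-hand side of Equation (5), using the analytic partial derivatives.
predicted = (sum(sigmoid_prime(z) * x * dw for x, dw in zip(inputs, delta_w))
             + sigmoid_prime(z) * delta_b)

new_weights = [w + dw for w, dw in zip(weights, delta_w)]
actual = neuron_output(new_weights, bias + delta_b, inputs) - neuron_output(weights, bias, inputs)
print(f"Equation (5) estimate: {predicted:.6f}, actual change: {actual:.6f}")

# Reading the output as a yes/no answer with a 0.5 cutoff.
out = neuron_output(weights, bias, inputs)
print("9" if out >= 0.5 else "not a 9")
```

The estimate and the actual change agree closely, because the neglected terms are second order in the small changes.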
I'll always explicitly state when we're using such a convention, so it shouldn't cause any confusion.

Exercises

Sigmoid neurons simulating perceptrons, part I. Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant, $c > 0$. Show that the behaviour of the network doesn't change.

Sigmoid neurons simulating perceptrons, part II. Suppose we have the same setup as the last problem - a network of perceptrons. Suppose also that the overall input to the network of perceptrons has been chosen. We won't need the actual input value; we just need the input to have been fixed. Suppose the weights and biases are such that $w \cdot x + b \neq 0$ for the input $x$ to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant $c > 0$. Show that in the limit as $c \rightarrow \infty$ the behaviour of this network of sigmoid neurons is exactly the same as the network of perceptrons. How can this fail when $w \cdot x + b = 0$ for one of the perceptrons?

The architecture of neural networks

In the next section I'll introduce a neural network that can do a pretty good job classifying handwritten digits. In preparation for that, it helps to explain some terminology that lets us name different parts of a network. Suppose we have the network: As mentioned earlier, the leftmost layer in this network is called the input layer, and the neurons within the layer are called input neurons. The rightmost or output layer contains the output neurons, or, as in this case, a single output neuron. The middle layer is called a hidden layer, since the neurons in this layer are neither inputs nor outputs. 
The term "hidden" perhaps sounds a little mysterious - the first time I heard the term I thought it must have some deep philosophical or mathematical significance - but it really means nothing more than "not an input or an output". The network above has just a single hidden layer, but some networks have multiple hidden layers. For example, the following four-layer network has two hidden layers: Somewhat confusingly, and for historical reasons, such multiple layer networks are sometimes called multilayer perceptrons or MLPs, despite being made up of sigmoid neurons, not perceptrons. I'm not going to use the MLP terminology in this book, since I think it's confusing, but wanted to warn you of its existence.

The design of the input and output layers in a network is often straightforward. For example, suppose we're trying to determine whether a handwritten image depicts a "9" or not. A natural way to design the network is to encode the intensities of the image pixels into the input neurons. If the image is a $64$ by $64$ greyscale image, then we'd have $4{,}096 = 64 \times 64$ input neurons, with the intensities scaled appropriately between $0$ and $1$. The output layer will contain just a single neuron, with output values of less than $0.5$ indicating "input image is not a 9", and values greater than $0.5$ indicating "input image is a 9".

While the design of the input and output layers of a neural network is often straightforward, there can be quite an art to the design of the hidden layers. In particular, it's not possible to sum up the design process for the hidden layers with a few simple rules of thumb. Instead, neural networks researchers have developed many design heuristics for the hidden layers, which help people get the behaviour they want out of their nets. For example, such heuristics can be used to help determine how to trade off the number of hidden layers against the time required to train the network. 
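To make the input and output design concrete, here is a sketch of a forward pass for the "9 or not" task described above. The network has $4{,}096$ input activations, one hidden layer and a single sigmoid output. Several choices are illustrative assumptions: the hidden-layer size of 30, the random initialization (scaled by $1/\sqrt{n_\text{in}}$ to keep $z$ moderate), the random stand-in image and the $0.5$ cutoff. The weights are untrained, so the printed decision only demonstrates the data flow.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_in, n_out, rng):
    """Random (untrained) weights and biases for one fully connected sigmoid layer."""
    weights = [[rng.gauss(0.0, 1.0) / math.sqrt(n_in) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return weights, biases

def feedforward(layers, activations):
    """Each layer's output becomes the next layer's input; nothing is fed back."""
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

rng = random.Random(0)
n_hidden = 30  # hypothetical hidden-layer size; the text leaves this open
layers = [make_layer(64 * 64, n_hidden, rng), make_layer(n_hidden, 1, rng)]

# A stand-in 64 x 64 greyscale image, flattened to 4,096 intensities in [0, 1].
pixels = [rng.random() for _ in range(64 * 64)]
(output,) = feedforward(layers, pixels)
print(output, "input image is a 9" if output > 0.5 else "input image is not a 9")
```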
We'll meet several such design heuristics later in this book.

Up to now, we've been discussing neural networks where the output from one layer is used as input to the next layer. Such networks are called feedforward neural networks. This means there are no loops in the network - information is always fed forward, never fed back. If we did have loops, we'd end up with situations where the input to the $\sigma$ function depended on the output. That'd be hard to make sense of, and so we don't allow such loops.

However, there are other models of artificial neural networks in which feedback loops are possible. These models are called recurrent neural networks. The idea in these models is to have neurons which fire for some limited duration of time, before becoming quiescent. That firing can stimulate other neurons, which may fire a little while later, also for a limited duration. That causes still more neurons to fire, and so over time we get a cascade of neurons firing. Loops don't cause problems in such a model, since a neuron's output only affects its input at some later time, not instantaneously.

Recurrent neural nets have been less influential than feedforward networks, in part because the learning algorithms for recurrent nets are (at least to date) less powerful. But recurrent networks are still extremely interesting. They're much closer in spirit to how our brains work than feedforward networks. And it's possible that recurrent networks can solve important problems which can only be solved with great difficulty by feedforward networks. However, to limit our scope, in this book we're going to concentrate on the more widely used feedforward networks.

A simple network to classify handwritten digits

Having defined neural networks, let's return to handwriting recognition. We can split the problem of recognizing handwritten digits into two sub-problems. 
First, we'd like a way of breaking an image containing many digits into a sequence of separate images, each containing a single digit. For example, we'd like to break the image into six separate images. We humans solve this segmentation problem with ease, but it's challenging for a computer program to correctly break up the image. Once the image has been segmented, the program then needs to classify each individual digit. So, for instance, we'd like our program to recognize that the first digit above is a 5.

We'll focus on writing a program to solve the second problem, that is, classifying individual digits. We do this because it turns out that the segmentation problem is not so difficult to solve, once you have a good way of classifying individual digits. There are many approaches to solving the segmentation problem. One approach is to trial many different ways of segmenting the image, using the individual digit classifier to score each trial segmentation. A trial segmentation gets a high score if the individual digit classifier is confident of its classification in all segments, and a low score if the classifier is having a lot of trouble in one or more segments. The idea is that if the classifier is having trouble somewhere, then it's probably having trouble because the segmentation has been chosen incorrectly. This idea and other variations can be used to solve the segmentation problem quite well. So instead of worrying about segmentation we'll concentrate on developing a neural network which can solve the more interesting and difficult problem, namely, recognizing individual handwritten digits.

To recognize individual digits we will use a three-layer neural network: The input layer of the network contains neurons encoding the values of the input pixels. 
As discussed in the next section, our training data for the network will consist of many $28$ by $28$ pixel images of scanned handwritten digits, and so the input layer contains $784 = 28 \times 28$ neurons. For simplicity I've omitted most of the $784$ input neurons in the diagram above. The input pixels are greyscale, with a value of $0.0$ representing white, a value of $1.0$ representing black, and in-between values representing gradually darkening shades of grey.

The second layer of the network is a hidden layer. We denote the number of neurons in this hidden layer by $n$, and we'll experiment with different values for $n$. The example shown illustrates a small hidden layer, containing just $n = 15$ neurons.

The output layer of the network contains 10 neurons. If the first neuron fires, i.e., has an output $\approx 1$, then that will indicate that the network thinks the digit is a $0$. If the second neuron fires then that will indicate that the network thinks the digit is a $1$. And so on. A little more precisely, we number the output neurons from $0$ through $9$, and figure out which neuron has the highest activation value. If that neuron is, say, neuron number $6$, then our network will guess that the input digit was a $6$. And so on for the other output neurons.

You might wonder why we use $10$ output neurons. After all, the goal of the network is to tell us which digit ($0, 1, 2, \ldots, 9$) corresponds to the input image. A seemingly natural way of doing that is to use just $4$ output neurons, treating each neuron as taking on a binary value, depending on whether the neuron's output is closer to $0$ or to $1$. Four neurons are enough to encode the answer, since $2^4 = 16$ is more than the 10 possible values for the input digit. Why should our network use $10$ neurons instead? Isn't that inefficient?
The ultimate justification is empirical: we can try out both network designs, and it turns out that, for this particular problem, the network with $10$ output neurons learns to recognize digits better than the network with $4$ output neurons. But that leaves us wondering why using $10$ output neurons works better. Is there some heuristic that would tell us in advance that we should use the $10$-output encoding instead of the $4$-output encoding?

To understand why we do this, it helps to think about what the neural network is doing from first principles. Consider first the case where we use $10$ output neurons. Let's concentrate on the first output neuron, the one that's trying to decide whether or not the digit is a $0$. It does this by weighing up evidence from the hidden layer of neurons. What are those hidden neurons doing? Well, just suppose for the sake of argument that the first neuron in the hidden layer detects whether or not an image like the following is present:

[Figure: one component shape of a handwritten 0]

It can do this by heavily weighting input pixels which overlap with the image, and only lightly weighting the other inputs. In a similar way, let's suppose for the sake of argument that the second, third, and fourth neurons in the hidden layer detect whether or not the following images are present:

[Figure: three further component shapes]

As you may have guessed, these four images together make up the $0$ image that we saw in the line of digits shown earlier:

[Figure: the handwritten 0 assembled from the four component shapes]

So if all four of these hidden neurons are firing then we can conclude that the digit is a $0$. Of course, that's not the only sort of evidence we can use to conclude that the image was a $0$ - we could legitimately get a $0$ in many other ways (say, through translations of the above images, or slight distortions). But it seems safe to say that at least in this case we'd conclude that the input was a $0$.

Supposing the neural network functions in this way, we can give a plausible explanation for why it's better to have $10$ outputs from the network, rather than $4$.
If we had $4$ outputs, then the first output neuron would be trying to decide what the most significant bit of the digit was. And there's no easy way to relate that most significant bit to simple shapes like those shown above. It's hard to imagine that there's any good historical reason the component shapes of the digit will be closely related to (say) the most significant bit in the output.

Now, with all that said, this is all just a heuristic. Nothing says that the three-layer neural network has to operate in the way I described, with the hidden neurons detecting simple component shapes. Maybe a clever learning algorithm will find some assignment of weights that lets us use only $4$ output neurons. But as a heuristic the way of thinking I've described works pretty well, and can save you a lot of time in designing good neural network architectures.

Exercise

There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first $3$ layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least $0.99$, and incorrect outputs have activation less than $0.01$.

[Figure: the network with an extra output layer giving the binary representation of the digit]

Learning with gradient descent

Now that we have a design for our neural network, how can it learn to recognize digits? The first thing we'll need is a data set to learn from - a so-called training data set. We'll use the MNIST data set, which contains tens of thousands of scanned images of handwritten digits, together with their correct classifications. MNIST's name comes from the fact that it is a modified subset of two data sets collected by NIST, the United States' National Institute of Standards and Technology.
Here are a few images from MNIST:

[Figure: sample handwritten digits from MNIST]

As you can see, these digits are, in fact, the same as those shown at the beginning of this chapter as a challenge to recognize. Of course, when testing our network we'll ask it to recognize images which aren't in the training set!

The MNIST data comes in two parts. The first part contains 60,000 images to be used as training data. These images are scanned handwriting samples from 250 people, half of whom were US Census Bureau employees, and half of whom were high school students. The images are greyscale and 28 by 28 pixels in size. The second part of the MNIST data set is 10,000 images to be used as test data. Again, these are 28 by 28 greyscale images. We'll use the test data to evaluate how well our neural network has learned to recognize digits. To make this a good test of performance, the test data was taken from a different set of 250 people than the original training data (albeit still a group split between Census Bureau employees and high school students). This helps give us confidence that our system can recognize digits from people whose writing it didn't see during training.

We'll use the notation $x$ to denote a training input. It'll be convenient to regard each training input $x$ as a $28 \times 28 = 784$-dimensional vector. Each entry in the vector represents the grey value for a single pixel in the image. We'll denote the corresponding desired output by $y = y(x)$, where $y$ is a $10$-dimensional vector. For example, if a particular training image, $x$, depicts a $6$, then $y(x) = (0, 0, 0, 0, 0, 0, 1, 0, 0, 0)^T$ is the desired output from the network. Note that $T$ here is the transpose operation, turning a row vector into an ordinary (column) vector.

What we'd like is an algorithm which lets us find weights and biases so that the output from the network approximates $y(x)$ for all training inputs $x$.
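As a concrete illustration of this encoding, here is a minimal plain-Python sketch (not code from the book) that flattens a $28 \times 28$ image into a $784$-entry vector and builds the desired output $y(x)$ for a given digit label. The helper names `flatten_image` and `desired_output` are hypothetical, and ordinary Python lists stand in for column vectors.

```python
def flatten_image(image):
    """Turn a 28x28 greyscale image (a list of 28 rows of 28 floats)
    into a single 784-entry vector, reading the pixels row by row."""
    return [pixel for row in image for pixel in row]


def desired_output(digit, num_classes=10):
    """Return y(x) for an image of `digit`: 1.0 at index `digit`, 0.0 elsewhere."""
    if not 0 <= digit < num_classes:
        raise ValueError(f"digit must be in 0..{num_classes - 1}, got {digit}")
    y = [0.0] * num_classes
    y[digit] = 1.0
    return y


blank = [[0.0] * 28 for _ in range(28)]  # an all-white image
x = flatten_image(blank)
print(len(x))             # 784
print(desired_output(6))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

The list returned for a $6$ matches the example $y(x)$ above, with the single $1$ in position $6$ (counting from $0$).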
To quantify how well we're achieving this goal we define a cost function (sometimes referred to as a loss or objective function; we use the term cost function throughout this book, but you should note the other terminology, since it's often used in research papers and other discussions of neural networks):

\begin{eqnarray} C(w,b) \equiv \frac{1}{2n} \sum_x \| y(x) - a\|^2. \tag{6}\end{eqnarray}

Here, $w$ denotes the collection of all weights in the network, $b$ all the biases, $n$ is the total number of training inputs, $a$ is the vector of outputs from the network when $x$ is input, and the sum is over all training inputs, $x$. Of course, the output $a$ depends on $x$, $w$ and $b$, but to keep the notation simple I haven't explicitly indicated this dependence. The notation $\| v \|$ just denotes the usual length function for a vector $v$. We'll call $C$ the quadratic cost function; it's also sometimes known as the mean squared error or just MSE. Inspecting the form of the quadratic cost function, we see that $C(w,b)$ is non-negative, since every term in the sum is non-negative. Furthermore, the cost $C(w,b)$ becomes small, i.e., $C(w,b) \approx 0$, precisely when $y(x)$ is approximately equal to the output, $a$, for all training inputs, $x$. So our training algorithm has done a good job if it can find weights and biases so that $C(w,b) \approx 0$. By contrast, it's not doing so well when $C(w,b)$ is large - that would mean that $y(x)$ is not close to the output $a$ for a large number of inputs. So the aim of our training algorithm will be to minimize the cost $C(w,b)$ as a function of the weights and biases. In other words, we want to find a set of weights and biases which make the cost as small as possible. We'll do that using an algorithm known as gradient descent.

Why introduce the quadratic cost?
After all, aren't we primarily interested in the number of images correctly classified by the network? Why not try to maximize that number directly, rather than minimizing a proxy measure like the quadratic cost? The problem with that is that the number of images correctly classified is not a smooth function of the weights and biases in the network. For the most part, making small changes to the weights and biases won't cause any change at all in the number of training images classified correctly. That makes it difficult to figure out how to change the weights and biases to get improved performance. If we instead use a smooth cost function like the quadratic cost it turns out to be easy to figure out how to make small changes in the weights and biases so as to get an improvement in the cost. That's why we focus first on minimizing the quadratic cost, and only after that will we examine the classification accuracy.

Even given that we want to use a smooth cost function, you may still wonder why we choose the quadratic function used in Equation (6). Isn't this a rather ad hoc choice? Perhaps if we chose a different cost function we'd get a totally different set of minimizing weights and biases? This is a valid concern, and later we'll revisit the cost function, and make some modifications.
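To make Equation (6) concrete, the following is a minimal sketch in plain Python (lists standing in for vectors; the name `quadratic_cost` is our own, not the book's) that computes $C(w,b)$ from the desired outputs $y(x)$ and the network outputs $a$, assuming those outputs have already been computed for every training input.

```python
def quadratic_cost(targets, outputs):
    """Quadratic cost of Equation (6): C = (1 / 2n) * sum_x ||y(x) - a||^2.

    targets: list of desired output vectors y(x), one per training input
    outputs: list of network output vectors a, in the same order
    """
    n = len(targets)
    if n == 0 or n != len(outputs):
        raise ValueError("need the same, non-zero number of targets and outputs")
    total = 0.0
    for y, a in zip(targets, outputs):
        # Squared length ||y(x) - a||^2 for this training input.
        total += sum((yj - aj) ** 2 for yj, aj in zip(y, a))
    return total / (2 * n)


targets = [[0.0, 1.0], [1.0, 0.0]]
print(quadratic_cost(targets, targets))                   # 0.0
print(quadratic_cost(targets, [[0.0, 1.0], [0.0, 1.0]]))  # 0.5
```

Nudging an output slightly changes this cost slightly, whereas a count of correct classifications only changes when some output crosses a decision boundary; that is the smoothness argument made above.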
However, the quadratic cost function of Equation (6) works perfectly well for understanding the basics of learning in neural networks, so we'll stick with it for now.

Recapping, our goal in training a neural network is to find weights and biases which minimize the quadratic cost function $C(w, b)$. This is a well-posed problem, but it's got a lot of distracting structure as currently posed - the interpretation of $w$ and $b$ as weights and biases, the $\sigma$ function lurking in the background, the choice of network architecture, MNIST, and so on. It turns out that we can understand a tremendous amount by ignoring most of that structure, and just concentrating on the minimization aspect. So for now we're going to forget all about the specific form of the cost function, the connection to neural networks, and so on. Instead, we're going to imagine that we've simply been given a function of many variables and we want to minimize that function. We're going to develop a technique called gradient descent which can be used to solve such minimization problems. Then we'll come back to the specific function we want to minimize for neural networks.

Okay, let's suppose we're trying to minimize some function, $C(v)$. This could be any real-valued function of many variables, $v = v_1, v_2, \ldots$. Note that I've replaced the $w$ and $b$ notation by $v$ to emphasize that this could be any function - we're not specifically thinking in the neural networks context any more. To minimize $C(v)$ it helps to imagine $C$ as a function of just two variables, which we'll call $v_1$ and $v_2$:

[Figure: surface plot of C as a function of v_1 and v_2]

What we'd like is to find where $C$ achieves its global minimum.
Now, of course, for the function plotted above, we can eyeball the graph and find the minimum. In that sense, I've perhaps shown slightly too simple a function! A general function, $C$, may be a complicated function of many variables, and it won't usually be possible to just eyeball the graph to find the minimum.

One way of attacking the problem is to use calculus to try to find the minimum analytically. We could compute derivatives and then try using them to find places where $C$ is an extremum. With some luck that might work when $C$ is a function of just one or a few variables. But it'll turn into a nightmare when we have many more variables. And for neural networks we'll often want far more variables - the biggest neural networks have cost functions which depend on billions of weights and biases in an extremely complicated way. Using calculus to minimize that just won't work!

(After asserting that we'll gain insight by imagining $C$ as a function of just two variables, I've turned around twice in two paragraphs and said, "hey, but what if it's a function of many more than two variables?" Sorry about that. Please believe me when I say that it really does help to imagine $C$ as a function of two variables. It just happens that sometimes that picture breaks down, and the last two paragraphs were dealing with such breakdowns. Good thinking about mathematics often involves juggling multiple intuitive pictures, learning when it's appropriate to use each picture, and when it's not.)

Okay, so calculus doesn't work. Fortunately, there is a beautiful analogy which suggests an algorithm which works pretty well. We start by thinking of our function as a kind of a valley. If you squint just a little at the plot above, that shouldn't be too hard. And we imagine a ball rolling down the slope of the valley. Our everyday experience tells us that the ball will eventually roll to the bottom of the valley. Perhaps we can use this idea as a way to find a minimum for the function?
We'd randomly choose a starting point for an (imaginary) ball, and then simulate the motion of the ball as it rolled down to the bottom of the valley. We could do this simulation simply by computing derivatives (and perhaps some second derivatives) of $C$ - those derivatives would tell us everything we need to know about the local "shape" of the valley, and therefore how our ball should roll.

Based on what I've just written, you might suppose that we'll be trying to write down Newton's equations of motion for the ball, considering the effects of friction and gravity, and so on. Actually, we're not going to take the ball-rolling analogy quite that seriously - we're devising an algorithm to minimize $C$, not developing an accurate simulation of the laws of physics! The ball's-eye view is meant to stimulate our imagination, not constrain our thinking. So rather than get into all the messy details of physics, let's simply ask ourselves: if we were declared God for a day, and could make up our own laws of physics, dictating to the ball how it should roll, what law or laws of motion could we pick that would make it so the ball always rolled to the bottom of the valley?

To make this question more precise, let's think about what happens when we move the ball a small amount $\Delta v_1$ in the $v_1$ direction, and a small amount $\Delta v_2$ in the $v_2$ direction. Calculus tells us that $C$ changes as follows:

\begin{eqnarray} \Delta C \approx \frac{\partial C}{\partial v_1} \Delta v_1 + \frac{\partial C}{\partial v_2} \Delta v_2. \tag{7}\end{eqnarray}

We're going to find a way of choosing $\Delta v_1$ and $\Delta v_2$ so as to make $\Delta C$ negative; i.e., we'll choose them so the ball is rolling down into the valley.
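A quick numerical check of Equation (7) can help build intuition. The sketch below uses a hypothetical bowl-shaped function $C(v_1, v_2) = v_1^2 + 2 v_2^2$, chosen only because its partial derivatives ($2v_1$ and $4v_2$) are easy to write down, and compares the exact change in $C$ with the first-order estimate for one small move.

```python
def C(v1, v2):
    # Hypothetical bowl-shaped "valley", used only for illustration.
    return v1 ** 2 + 2 * v2 ** 2


def partials(v1, v2):
    # Exact partial derivatives of C: dC/dv1 = 2*v1, dC/dv2 = 4*v2.
    return 2 * v1, 4 * v2


def change_in_C(v1, v2, dv1, dv2):
    """Return (exact change in C, first-order estimate from Equation (7))."""
    exact = C(v1 + dv1, v2 + dv2) - C(v1, v2)
    d1, d2 = partials(v1, v2)
    estimate = d1 * dv1 + d2 * dv2
    return exact, estimate


exact, estimate = change_in_C(1.0, 1.0, 0.01, -0.02)
print(f"exact = {exact:.4f}, estimate = {estimate:.4f}")
```

Here both values are negative (about $-0.0591$ and $-0.0600$), so this particular move rolls the ball downhill. The gap of roughly $0.0009$ is exactly the second-order term $\Delta v_1^2 + 2\Delta v_2^2$ that Equation (7) leaves out, and it shrinks quickly as the move gets smaller.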
To figure out how to make such a choice it helps to define $\Delta v$ to be the vector of changes in $v$, $\Delta v \equiv (\Delta v_1, \Delta v_2)^T$, where $T$ is again the transpose operation, turning row vectors into column vectors. We'll also define the gradient of $C$ to be the vector of partial derivatives, $\left(\frac{\partial C}{\partial v_1}, \frac{\partial C}{\partial v_2}\right)^T$. We denote the gradient vector by $\nabla C$, i.e.:

\begin{eqnarray} \nabla C \equiv \left( \frac{\partial C}{\partial v_1}, \frac{\partial C}{\partial v_2} \right)^T. \tag{8}\end{eqnarray}

In a moment we'll rewrite the change $\Delta C$ in terms of $\Delta v$ and the gradient, $\nabla C$. Before getting to that, though, I want to clarify something that sometimes gets people hung up on the gradient. When meeting the $\nabla C$ notation for the first time, people sometimes wonder how they should think about the $\nabla$ symbol. What, exactly, does $\nabla$ mean? In fact, it's perfectly fine to think of $\nabla C$ as a single mathematical object - the vector defined above - which happens to be written using two symbols. In this point of view, $\nabla$ is just a piece of notational flag-waving, telling you "hey, $\nabla C$ is a gradient vector".
There are more advanced points of view where $\nabla$ can be viewed as an independent mathematical entity in its own right (for example, as a differential operator), but we won't need such points of view.

With these definitions, the expression (7) for $\Delta C$ can be rewritten as

\begin{eqnarray} \Delta C \approx \nabla C \cdot \Delta v. \tag{9}\end{eqnarray}

This equation helps explain why $\nabla C$ is called the gradient vector: $\nabla C$ relates changes in $v$ to changes in $C$, just as we'd expect something called a gradient to do. But what's really exciting about the equation is that it lets us see how to choose $\Delta v$ so as to make $\Delta C$ negative. In particular, suppose we choose

\begin{eqnarray} \Delta v = -\eta \nabla C, \tag{10}\end{eqnarray}

where $\eta$ is a small, positive parameter (known as the learning rate). Then Equation (9) tells us that $\Delta C \approx -\eta \nabla C \cdot \nabla C = -\eta \|\nabla C\|^2$. Because $\| \nabla C \|^2 \geq 0$, this guarantees that $\Delta C \leq 0$, i.e., $C$ will always decrease, never increase, if we change $v$ according to the prescription in (10).
(Within, of course, the limits of the approximation in Equation (9).) This is exactly the property we wanted! And so we'll take Equation (10) to define the "law of motion" for the ball in our gradient descent algorithm. That is, we'll use Equation (10) to compute a value for $\Delta v$, then move the ball's position $v$ by that amount:

\begin{eqnarray} v \rightarrow v' = v -\eta \nabla C. \tag{11}\end{eqnarray}

Then we'll use this update rule again, to make another move. If we keep doing this, over and over, we'll keep decreasing $C$ until - we hope - we reach a global minimum.

Summing up, the way the gradient descent algorithm works is to repeatedly compute the gradient $\nabla C$, and then to move in the opposite direction, "falling down" the slope of the valley. We can visualize it like this:

[Figure: the ball's path stepping down the valley under gradient descent]

Notice that with this rule gradient descent doesn't reproduce real physical motion. In real life a ball has momentum, and that momentum may allow it to roll across the slope, or even (momentarily) roll uphill. It's only after the effects of friction set in that the ball is guaranteed to roll down into the valley. By contrast, our rule for choosing $\Delta v$ just says "go down, right now".
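The update rule (11) translates almost directly into a loop. The following is a minimal sketch in plain Python, not the book's own code; it uses a hypothetical bowl $C(v_1, v_2) = v_1^2 + 2v_2^2$ with gradient $(2v_1, 4v_2)^T$, and the function names are our own. It also compares one step's actual change in $C$ against the estimate $-\eta \|\nabla C\|^2$ that follows from Equations (9) and (10).

```python
def gradient_descent(grad, v, eta, steps):
    """Apply the update v -> v' = v - eta * grad(v) (Equation (11)) `steps` times.

    grad: function returning the gradient of C at v, as a list of floats
    v:    starting position, a list of floats
    eta:  learning rate, a small positive number
    """
    v = list(v)
    for _ in range(steps):
        g = grad(v)
        v = [vj - eta * gj for vj, gj in zip(v, g)]
    return v


def C(v):
    # Hypothetical valley with its minimum at (0, 0).
    return v[0] ** 2 + 2 * v[1] ** 2


def grad_C(v):
    return [2 * v[0], 4 * v[1]]


start = [1.0, 1.0]

# One small step: actual change vs. the estimate -eta * ||grad C||^2.
eta = 0.01
actual = C(gradient_descent(grad_C, start, eta, 1)) - C(start)
predicted = -eta * sum(g * g for g in grad_C(start))
print(actual, predicted)  # both negative and close to each other

# Many steps with a larger (but still stable) learning rate.
final = gradient_descent(grad_C, start, 0.1, 50)
print(final, C(final))    # very close to the minimum at (0, 0)
```

For this particular $C$, a learning rate that is too large breaks the approximation: with $\eta = 0.6$ each step multiplies $v_2$ by $1 - 4\eta = -1.4$, so $C$ grows instead of shrinking.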
That's still a pretty good rule for finding the minimum!

To make gradient descent work correctly, we need to choose the learning rate $\eta$ to be small enough that Equation (9) is a good approximation. If we don't, we might end up with $\Delta C > 0$, which obviously would not be good! At the same time, we don't want $\eta$ to be too small, since that will make the changes $\Delta v$ tiny, and thus the gradient descent algorithm will work very slowly. In practical implementations, $\eta$ is often varied so that Equation (9) remains a good approximation, but the algorithm isn't too slow. We'll see later how this works.

I've explained gradient descent when $C$ is a function of just two variables. But, in fact, everything works just as well even when $C$ is a function of many more variables. Suppose in particular that $C$ is a function of $m$ variables, $v_1,\ldots,v_m$. Then the change $\Delta C$ in $C$ produced by a small change $\Delta v = (\Delta v_1, \ldots, \Delta v_m)^T$ is

\begin{eqnarray} \Delta C \approx \nabla C \cdot \Delta v, \tag{12}\end{eqnarray}

where the gradient $\nabla C$ is the vector

\begin{eqnarray} \nabla C \equiv \left(\frac{\partial C}{\partial v_1}, \ldots, \frac{\partial C}{\partial v_m}\right)^T. \tag{13}\end{eqnarray}

Just as for the two-variable case, we can choose

\begin{eqnarray} \Delta v = -\eta \nabla C, \tag{14}\end{eqnarray}

and we're guaranteed that our (approximate) expression (12) for $\Delta C$ will be negative. This gives us a way of following the gradient to a minimum, even when $C$ is a function of many variables, by repeatedly applying the update rule

\begin{eqnarray} v \rightarrow v' = v-\eta \nabla C. \tag{15}\end{eqnarray}

You can think of this update rule as defining the gradient descent algorithm. It gives us a way of repeatedly changing the position $v$ in order to find a minimum of the function $C$. The rule doesn't always work - several things can go wrong and prevent gradient descent from finding the global minimum of $C$, a point we'll return to explore in later chapters. But, in practice gradient descent often works extremely well, and in neural networks we'll find that it's a powerful way of minimizing the cost function, and so helping the net learn.

Indeed, there's even a sense in which gradient descent is the optimal strategy for searching for a minimum. Let's suppose that we're trying to make a move $\Delta v$ in position so as to decrease $C$ as much as possible. This is equivalent to minimizing $\Delta C \approx \nabla C \cdot \Delta v$. We'll constrain the size of the move so that $\| \Delta v \| = \epsilon$ for some small fixed $\epsilon > 0$. In other words, we want a move that is a small step of a fixed size, and we're trying to find the movement direction which decreases $C$ as much as possible.
It can be proved that the choice of $\Delta v$ which minimizes $\nabla C \cdot \Delta v$ is $\Delta v = - \eta \nabla C$, where $\eta = \epsilon / \|\nabla C\|$ is determined by the size constraint $\|\Delta v\| = \epsilon$. So gradient descent can be viewed as a way of taking small steps in the direction which does the most to immediately decrease $C$.

Exercises

Prove the assertion of the last paragraph. Hint: If you're not already familiar with the Cauchy-Schwarz inequality, you may find it helpful to familiarize yourself with it.

I explained gradient descent when $C$ is a function of two variables, and when it's a function of more than two variables. What happens when $C$ is a function of just one variable? Can you provide a geometric interpretation of what gradient descent is doing in the one-dimensional case?

People have investigated many variations of gradient descent, including variations that more closely mimic a real physical ball. These ball-mimicking variations have some advantages, but also have a major disadvantage: it turns out to be necessary to compute second partial derivatives of $C$, and this can be quite costly. To see why it's costly, suppose we want to compute all the second partial derivatives $\partial^2 C/ \partial v_j \partial v_k$. If there are a million such $v_j$ variables then we'd need to compute something like a trillion (i.e., a million squared) second partial derivatives (actually, more like half a trillion, since $\partial^2 C/ \partial v_j \partial v_k = \partial^2 C/ \partial v_k \partial v_j$; still, you get the point)! That's going to be computationally costly. With that said, there are tricks for avoiding this kind of problem, and finding alternatives to gradient descent is an active area of investigation.
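The counting in the last paragraph is easy to check. This small sketch (a hypothetical helper, not from the book) counts the second partial derivatives $\partial^2 C/\partial v_j \partial v_k$ for $m$ variables, both as ordered pairs ($m^2$) and as distinct values once the symmetry is used ($m(m+1)/2$, the diagonal plus one triangle), and sets them beside the $m$ first derivatives that gradient descent needs.

```python
def derivative_counts(m):
    """Counts of derivatives for a function C of m variables v_1, ..., v_m.

    Returns (first derivatives, all ordered second derivatives,
             distinct second derivatives using d2C/dvj dvk = d2C/dvk dvj).
    """
    first = m
    second_all = m * m
    second_distinct = m * (m + 1) // 2  # diagonal plus one off-diagonal triangle
    return first, second_all, second_distinct


first, second_all, second_distinct = derivative_counts(1_000_000)
print(f"{first:,} first derivatives")
print(f"{second_all:,} second derivatives (all ordered pairs)")
print(f"{second_distinct:,} distinct second derivatives")
```

For a million variables this gives $10^{12}$ ordered pairs and $500{,}000{,}500{,}000$ distinct entries, matching the "trillion" and "half a trillion" figures, against only a million first derivatives.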
But in this book we'll use gradient descent (and variations) as our main approach to learning in neural networks.

How can we apply gradient descent to learn in a neural network? The idea is to use gradient descent to find the weights $w_k$ and biases $b_l$ which minimize the cost in Equation (6). To see how this works, let's restate the gradient descent update rule, with the weights and biases replacing the variables $v_j$. In other words, our "position" now has components $w_k$ and $b_l$, and the gradient vector $\nabla C$ has corresponding components $\partial C / \partial w_k$ and $\partial C / \partial b_l$. Writing out the gradient descent update rule in terms of components, we have

\begin{eqnarray} w_k & \rightarrow & w_k' = w_k-\eta \frac{\partial C}{\partial w_k} \tag{16}\\ b_l & \rightarrow & b_l' = b_l-\eta \frac{\partial C}{\partial b_l}. \tag{17}\end{eqnarray}

By repeatedly applying this update rule we can "roll down the hill", and hopefully find a minimum of the cost function. In other words, this is a rule which can be used to learn in a neural network.

There are a number of challenges in applying the gradient descent rule. We'll look into those in depth in later chapters. But for now I just want to mention one problem. To understand what the problem is, let's look back at the quadratic cost in Equation (6).
Notice that this cost function has the form $C = \frac{1}{n} \sum_x C_x$, that is, it's an average over costs $C_x \equiv \frac{\|y(x)-a\|^2}{2}$ for individual training examples. In practice, to compute the gradient $\nabla C$ we need to compute the gradients $\nabla C_x$ separately for each training input, $x$, and then average them, $\nabla C = \frac{1}{n} \sum_x \nabla C_x$. Unfortunately, when the number of training inputs is very large this can take a long time, and learning thus occurs slowly.

An idea called stochastic gradient descent can be used to speed up learning. The idea is to estimate the gradient $\nabla C$ by computing $\nabla C_x$ for a small sample of randomly chosen training inputs. By averaging over this small sample it turns out that we can quickly get a good estimate of the true gradient $\nabla C$, and this helps speed up gradient descent, and thus learning.

To make these ideas more precise, stochastic gradient descent works by picking out a small number $m$ of randomly chosen training inputs. We'll label those random training inputs $X_1, X_2, \ldots, X_m$, and refer to them as a mini-batch. Provided the sample size $m$ is large enough we expect that the average value of the $\nabla C_{X_j}$ will be roughly equal to the average over all $\nabla C_x$, that is,

\begin{eqnarray} \frac{\sum_{j=1}^m \nabla C_{X_{j}}}{m} \approx \frac{\sum_x \nabla C_x}{n} = \nabla C, \tag{18}\end{eqnarray}

where the second sum is over the entire set of training data. Swapping sides we get

\begin{eqnarray} \nabla C \approx \frac{1}{m} \sum_{j=1}^m \nabla C_{X_{j}}, \tag{19}\end{eqnarray}

confirming that we can estimate the overall gradient by computing gradients just for the randomly chosen mini-batch.
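To see Equation (19) at work, here is a minimal sketch with a deliberately simple, hypothetical per-example cost $C_x(v) = (v - x)^2/2$ of a single variable $v$, so that $\nabla C_x = v - x$. The synthetic "training inputs" are random numbers rather than MNIST images; only the size $n = 60{,}000$ mirrors MNIST's training set, and the function names are our own.

```python
import random


def per_example_grad(v, x):
    # Hypothetical per-example cost C_x(v) = (v - x)^2 / 2, so dC_x/dv = v - x.
    return v - x


def full_gradient(v, data):
    # grad C = (1/n) * sum over every training input x of grad C_x.
    return sum(per_example_grad(v, x) for x in data) / len(data)


def minibatch_gradient(v, data, m, rng):
    # Equation (19): average grad C_{X_j} over m randomly chosen inputs X_j.
    batch = rng.sample(data, m)
    return sum(per_example_grad(v, x) for x in batch) / m


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(3.0, 1.0) for _ in range(60_000)]  # synthetic inputs
v = 0.0

print(full_gradient(v, data))                  # close to -3.0
print(minibatch_gradient(v, data, 10, rng))    # noisy estimate from 10 inputs
print(minibatch_gradient(v, data, 1000, rng))  # less noisy estimate
```

Each mini-batch estimate costs $m$ per-example gradient evaluations instead of $n$, and its typical error shrinks roughly like $1/\sqrt{m}$, which is the trade-off stochastic gradient descent exploits.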
To connect this explicitly to learning in neural networks, suppose $w_k$ and $b_l$ denote the weights and biases in our neural network. Then stochastic gradient descent works by picking out a randomly chosen mini-batch of training inputs, and training with those,

\begin{eqnarray} w_k & \rightarrow & w_k' = w_k-\frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial w_k} \tag{20}\\ b_l & \rightarrow & b_l' = b_l-\frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial b_l}, \tag{21}\end{eqnarray}

where the sums are over all the training examples $X_j$ in the current mini-batch. Then we pick out another randomly chosen mini-batch and train with those. And so on, until we've exhausted the training inputs, which is said to complete an epoch of training. At that point we start over with a new training epoch.

Incidentally, it's worth noting that conventions vary about scaling of the cost function and of mini-batch updates to the weights and biases. In Equation (6) we scaled the overall cost function by a factor $\frac{1}{n}$. People sometimes omit the $\frac{1}{n}$, summing over the costs of individual training examples instead of averaging. This is particularly useful when the total number of training examples isn't known in advance. This can occur if more training data is being generated in real time, for instance.
And, in a similar way, the mini-batch update rules (20) and (21) sometimes omit the $\frac{1}{m}$ term in front of the sums. Conceptually this makes little difference, since it's equivalent to rescaling the learning rate $\eta$. But when doing detailed comparisons of different work it's worth watching out for.

We can think of stochastic gradient descent as being like political polling: it's much easier to sample a small mini-batch than it is to apply gradient descent to the full batch, just as carrying out a poll is easier than running a full election. For example, if we have a training set of size $n = 60{,}000$, as in MNIST, and choose a mini-batch size of (say) $m = 10$, this means we'll get a factor of $6{,}000$ speedup in estimating the gradient! Of course, the estimate won't be perfect (there will be statistical fluctuations), but it doesn't need to be perfect: all we really care about is moving in a general direction that will help decrease $C$, and that means we don't need an exact computation of the gradient. In practice, stochastic gradient descent is a commonly used and powerful technique for learning in neural networks, and it's the basis for most of the learning techniques we'll develop in this book.

Exercise: An extreme version of gradient descent is to use a mini-batch size of just 1.
That is, given a training input, $x$, we update our weights and biases according to the rules $w_k \rightarrow w_k' = w_k - \eta \partial C_x / \partial w_k$ and $b_l \rightarrow b_l' = b_l - \eta \partial C_x / \partial b_l$. Then we choose another training input, and update the weights and biases again. And so on, repeatedly. This procedure is known as online, on-line, or incremental learning. In online learning, a neural network learns from just one training input at a time (just as human beings do). Name one advantage and one disadvantage of online learning, compared to stochastic gradient descent with a mini-batch size of, say, $20$.

Let me conclude this section by discussing a point that sometimes bugs people new to gradient descent. In neural networks the cost $C$ is, of course, a function of many variables (all the weights and biases) and so in some sense defines a surface in a very high-dimensional space. Some people get hung up thinking: "Hey, I have to be able to visualize all these extra dimensions". And they may start to worry: "I can't think in four dimensions, let alone five (or five million)". Is there some special ability they're missing, some ability that "real" supermathematicians have? Of course, the answer is no. Even most professional mathematicians can't visualize four dimensions especially well, if at all. The trick they use, instead, is to develop other ways of representing what's going on. That's exactly what we did above: we used an algebraic (rather than visual) representation of $\Delta C$ to figure out how to move so as to decrease $C$. People who are good at thinking in high dimensions have a mental library containing many different techniques along these lines; our algebraic trick is just one example.
Those techniques may not have the simplicity we're accustomed to when visualizing three dimensions, but once you build up a library of such techniques, you can get pretty good at thinking in high dimensions. I won't go into more detail here, but if you're interested then you may enjoy reading this discussion of some of the techniques professional mathematicians use to think in high dimensions. While some of the techniques discussed are quite complex, much of the best content is intuitive and accessible, and could be mastered by anyone.

Implementing our network to classify digits

Alright, let's write a program that learns how to recognize handwritten digits, using stochastic gradient descent and the MNIST training data. We'll do this with a short Python (2.7) program, just 74 lines of code! The first thing we need is to get the MNIST data. If you're a git user then you can obtain the data by cloning the code repository for this book,

git clone https://github.com/mnielsen/neural-networks-and-deep-learning.git

If you don't use git then you can download the data and code here.

Incidentally, when I described the MNIST data earlier, I said it was


    1. A wireless keyboard available in a classroom can work just as well [as a smartboard], as can simply having people talk to each other and write down the upshot of their conversation.

      This is reminding me of the trichotomy theory (I think that's what they called it?) of something being neither greater than nor less than nor equal to--which strikes me as an appropriate description of the three different technologies mentioned in this sentence (the last being a pencil and paper summary).

    1. 902

      The 902 stamps are removed. Maybe they were stored in the backfile before the tag was added? We will need to adapt the import profile to keep this.

    1. Cite error: Invalid <ref> tag; no text was provided for refs named Lensu17 ↑ Cite error: Invalid <ref> tag; no text was provided for refs named Lensu2011 ↑ Cite error: Invalid <ref> tag; no text was provided for refs named Kattainen01 ↑ Miettinen, HM; Sorvari, R; Alaluusua, S; Murtomaa, M; Tuukkanen, J; Viluksela, M (June 2006). "The effect of perinatal TCDD exposure on caries susceptibility in rats.". Toxicological sciences : an official journal of the Society of Toxicology 91 (2): 568-75. doi:10.1093/toxsci/kfj158. PMID 16543294.

      ref errors

    1. The FBI said it has stopped using the "Black Identity Extremist" tag and acknowledged that white supremacist violence is the biggest terrorist threat this country faces.

      AMEN!! About time somebody mentions this!

  2. Oct 2019
    1. mapping (bytes32 => uint256) public tag; // cage price [ray]

      Cage price / price per collateral type at time of settlement.

    2. function cage(bytes32 ilk) external note {

      Tags the Ilk prices / sets the settlement price for an ilk (tag).

    1. you

      Wow! Just wow! Is all I can say. If you do not understand this poem, or nothing about it resonates with you, allow me to translate. Imagine: it’s 3am, and you hear the loud crash of someone who has broken into your home. There is more than one of them, they are coming through doors, windows, sky-lights. They are bearing guns, batons, stones, chains. They snatch you and your family out of bed and force you outside. You look up and down your street and see all your neighbors outside, screaming, crying. Children in one section, mothers in another, and fathers in another. They separate you and your family in the same fashion. They force the fathers to watch as they violate their wives and daughters. They say to sons, “Your father cannot protect you, he can’t protect your women, he can’t even protect himself!” Then, when they are done violating, killing, and battering the many people in your neighborhood, when they feel they have “made their point”—whatever point that is—they ship you all off in separate directions. They don’t care where you go, or what happens. They don’t care if you die, but you are worth more alive—so they won’t kill ALL of you. That’s bad for business. They take you to a new place, where you do not know anyone. You are a million miles from home, from family. You do not understand this new tongue. You are stripped naked and inspected (from head to toe) by other human beings, just like you, but a different shade. And you are given a price tag. You are bought and sold and bought again. You are forced to work tirelessly, until you physically can’t—you are at their service, whatever service they require. Where is your voice? Imagine, after all this, after a few decades, they release you in the middle of nowhere. They release you into a society that is not your own, and tell you to go back to where you came from, as if that were even an option. That place is not the same home you left. 
The people are different, your home is not there, you wouldn’t even know where to go if you could. You have been uprooted, assaulted, shipped off, sold, insulted, degraded, and nearly-broken. Are you still with me? All this, and those who did this to you, play the victim—as if it hurt them to starve, rape, whip, work, shoot...I mean lynch your families. They play stupid, like they don’t know what you’re talking about, like what you went through, was simply a bad dream, or a horror film played for Halloween. Now, imagine those few decades were 400 years. How would you feel?

    1. Notice that the Child component class has no state field. The component object created by the custom JSX tag has values of the JSX tag properties. If you define a state field in the component, your state field overwrites the JSX properties.

      no state field

    1. The FBI said it has stopped using the "Black Identity Extremist" tag and acknowledged that white supremacist violence is the biggest terrorist threat this country faces. https://trib.al/OepGw2S

      I love the way she looks, because it's like she's saying, "Wow, what took you sooooo long?" The Root uses a data report from California State University, San Bernardino. The 137-page report helps support the allegation that white supremacy has been a threat to national security for many years.



    1. it may be possible to tag specific organisms and use these as monitor systems to estimate local chemical composition directly in the biofilms
    1. Create a Git repository for every new project. Learn more about what a Git repo is in this beginner Learning Git with GitKraken tutorial. Always create a new branch for every new feature and bug. Regularly commit and push changes to the remote branch to avoid loss of work. Include a gitignore file in your project to avoid unwanted files being committed. Always commit changes with a concise and useful commit message.  Utilize git-submodule for large projects. Keep your branch up to date with development branches. Follow a workflow like Gitflow. There are many workflows available, so choose the one that best suits your needs. Always create a pull request for merging changes from one branch to another. Learn more about what a pull request is and how to create them in this intermediate Learning Git with GitKraken tutorial. Always create one pull request addressing one issue. Always review your code once by yourself before creating a pull request. Have more than one person review a pull request. It’s not necessary, but is a best practice. Enforce standards by using pull request templates and adding continuous integrations. Learn more about enhancing the pull request process with templates.  Merge changes from the release branch to master after each release. Tag the master sources after every release. Delete branches if a feature or bug fix is merged to its intended branches and the branch is no longer required. Automate general workflow checks using Git hooks. Learn more about how to trigger Git hooks in this intermediate Learning Git with GitKraken tutorial. Include read/write permission access control to repositories to prevent unauthorized access. Add protection for special branches like master and development to safeguard against accidental deletion.

      Git Dos

  3. Sep 2019
    1. use the REPOSITORY:TAG combination rather than IMAGE ID

      Error response from daemon: conflict: unable to delete c565603bc87f (cannot be forced) - image has dependent child images

      I really feel like this should be the accepted answer here but it does depend on the root cause of the problem. When you create a tag it creates a dependency and thus you have to delete the tag and the image in that order. If you delete the image by using the tag rather than the id then you are effectively doing just that.

    1. 5) The selective function. This refers not to human choice at all but to Darwin's theory of natural selection as applied to what he called "the favored races." In short, the idea is to help things along by consciously attempting to improve the breeding stock. Schools are meant to tag the unfit - with poor grades, remedial placement, and other punishments - clearly enough that their peers will accept them as inferior and effectively bar them from the reproductive sweepstakes. That's what all those little humiliations from first grade onward were intended to do: wash the dirt down the drain.


    1. The FBI said it has stopped using the "Black Identity Extremist" tag and acknowledged that white supremacist violence is the biggest terrorist threat this country faces.

      The term was broadened to, "racially motivated violent extremism," which could still permit abuse. But, at least it is a start.

      Here is link to a WSJ article about the issue: https://www.wsj.com/articles/fbi-abandons-use-of-terms-black-identity-extremism-11563921355

      -M. Lewis

    1. "Occupation: At Home."[a] No one claimed that she was "Poet" or "Writer."

      How completely infuriating, and disrespectful to the legacy of one of the world's most brilliant writers. It's incredible how badly society wants to hold on to this image of Dickinson as an outcast / recluse instead of recognizing her brilliance; the fact that her sister-in-law had to tag on some information about Dickinson's achievements and intellect (almost as an afterthought) is really unfortunate.

    1. An aggregate describes a word form that stands for two or more word forms (the components of the aggregate), and it usually cannot simply be assigned a part of speech.

      Saša and I believe it would be desirable, within NovaMorf, to also define for aggregates the (possible) division of orthographic words into syntactic words. The orthographic word would remain the "main" one, but it would be stated clearly that, for the purposes of syntactic processing, the aggregate can be split into syntactic words corresponding to the lemma+tag combinations listed later in the chapter: naň: na|ň očs: o|č|s dělals: dělal|s abyste: a|byste kdybychom: kdy|bychom. For "abyste" and "kdybychom" this is somewhat unnatural, but it is probably the best solution. We are not asking for "interpreted tokens" as in UD.
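As a rough illustration of the proposal, a tokenizer could expose the split as a lookup from orthographic word to syntactic words while keeping the orthographic word as the primary unit. The table and the function name `syntactic_words` are hypothetical, built only from the examples quoted above, and are not part of NovaMorf:

```python
# Hypothetical lookup built from the lemma+tag combinations quoted above;
# "|" marks the boundary between syntactic words inside one orthographic word.
AGGREGATES = {
    "naň": "na|ň",
    "očs": "o|č|s",
    "dělals": "dělal|s",
    "abyste": "a|byste",
    "kdybychom": "kdy|bychom",
}

def syntactic_words(token):
    """Return the syntactic words for one orthographic word.

    Aggregates are split for syntactic processing; every other token
    passes through unchanged as a single syntactic word.
    """
    split = AGGREGATES.get(token)
    return split.split("|") if split else [token]

print(syntactic_words("očs"), syntactic_words("dům"))
```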

    1. show only
      1. The typeface 'Avenir Roman' should be used.
      2. The premium tag should be right aligned to the container below.
      3. The premium tag's rectangle should be of dimensions 81x22px.
      4. The check box used is not the same one.
    1. we injected Cre recombinase–inducible adeno-associated viruses (AAV) expressing the optogenetic channelrhodopsin-like ChIEF fused with a tdTomato reporter [AAVdj-CAG-DIO-ChIEF-tdTomato (driven by the CAG promoter) (10, 11)] bilaterally into the rostral ZI of vesicular GABA transporter (VGAT)–Cre mice that express Cre recombinase in GABA neurons

      To target a neuron population of interest, e.g. those that express GABA, scientists use genetically modified viruses (AAVs) to deliver proteins into the brain (such as optogenetic tools).

      This is achieved by using two tools: 1) a mouse line that expresses the enzyme Cre recombinase in a specific population of neurons (e.g. those that express the GABA transporter VGAT) and 2) an AAV that expresses an optogenetic protein only in the presence of Cre. The AAV is injected into the brain region of interest in the Cre mice. This AAV has a tdTomato tag, which allows the injection site to be visualized under a fluorescence microscope.

      For further information on these tools, see this video on how optogenetics is used in mice.

      The ZI in both hemispheres of the brain was injected with the AAV (bilaterally), with the region lying towards the front of the brain (rostral) being targeted. The optogenetic tool used (ChIEF) activates neurons when blue light is shone on the cells.

    1. Recently I was thrilled to learn that the web platform offers such an affordance in the form of the <details> tag.
    1. The Task Annotation Project in Science (TAPS) provides K-12 educators with annotated assessment tasks, aligned to the Next Generation Science Standards, that help guide teachers in more equitably monitoring their students’ learning.37 Osmosis is a repository of open educational resources (OER) created to crowdsource the future of medical education.38 Undergraduate and graduate medical students have access to thousands of digital resources, and they have also used annotation - through comments, feedback forms, and ratings - to improve the quality of these learning materials.39 The National Science Digital Library (NSDL), created in 2000, is an archive of open access teaching and learning resources for learners of all ages across science, technology, engineering, and mathematics disciplines.40 Annotation has been used to tag the NSDL’s resources and improve information accessibility, support student interaction with multimedia content through a digital notebook, and educators have annotated NSDL resources to design online learning activities for their students.41 And research about the digital annotation tool Perusall, often used in conjunction with science textbooks, has shown that college students’ pre-reading and annotation practices can subsequently improve exam performance.
    1. The Path-Goal model emphasizes the importance of the leader’s ability to interpret follower’s needs accurately and to respond flexibly to the requirements of a situation.


      I had a hard time as well. Here is the link. I hope it works.


      Free online word cloud generator and tag cloud creator. (2019). Retrieved 5 September 2019, from https://www.wordclouds.com/

      Russell, R., & Gregory Stone, A. (2002). A review of servant leadership attributes: developing a practical model. Leadership & Organization Development Journal, 23(3), 145-157. doi: 10.1108/01437730210424

    1. an iPhone was out of reach for even the richest man on earth.

      I'm not sure I get this. The first iPhone was released with a price tag of $499.00, which, with inflation calculated in, was $617.48. The richest man in 2007 was Bill Gates, who had $62.29 billion. He could literally buy over 100 million iPhones.
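The annotation's arithmetic can be checked directly, using only the figures it quotes (taken as given here, not independently verified):

```python
# Figures quoted in the annotation above, in USD
iphone_price_adjusted = 617.48   # first iPhone's $499.00 launch price, adjusted for inflation
gates_net_worth_2007 = 62.29e9   # Bill Gates's 2007 net worth

# How many inflation-adjusted iPhones that net worth would buy
phones = gates_net_worth_2007 / iphone_price_adjusted
print(f"about {phones / 1e6:.1f} million iPhones")
```

The result is just under 101 million, so "over 100 million" holds.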

    1. A utility function to safely escape JSON for embedding in a <script> tag function safeStringify(obj)
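The quoted function is JavaScript and its body is not shown. As a hedged illustration of the same idea in Python (the name `safe_stringify` is ours, not from the quoted source), one common approach is to escape the characters that would let a JSON string break out of the surrounding script element, while keeping the output valid JSON:

```python
import json

def safe_stringify(obj):
    """Serialize obj to JSON that is safe to embed inside an HTML <script> element.

    json.dumps escapes non-ASCII characters, including U+2028 and U+2029, by
    default (ensure_ascii=True). Replacing <, > and & with \\u escapes stops a
    string such as "</script>" from closing the element early. These characters
    can only occur inside JSON strings, so the result still parses to the same object.
    """
    return (
        json.dumps(obj)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )

print(safe_stringify({"html": "</script><b>&</b>"}))
```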
    1. September 1939 from the Wehrmacht's perspective. From the very first day, German soldiers photographed how they experienced the beginning of the Second World War, and the suffering they brought upon their neighboring country. The Berlin "Haus der Wannsee-Konferenz" (House of the Wannsee Conference) shows photos by Captain (Hauptmann) Kurt Seeliger on its website. His pictures, and those of other soldiers from the project "Stumme Zeugnisse 1939" (Silent Testimonies 1939), give a sense of how strong the will to annihilate often was.

      Three men are standing in front of houses. The houses are burned because the Second World War has begun. In my opinion, it is very interesting to see a photo from 1939.

  4. Aug 2019
    1. bough    .

      Sonnenizio does have a capping couplet

    2. 1913

      An Italian sonnet has one octave with the rhyme scheme abba abba and a sestet with a rhyme scheme of the poet's choice.

    3. ; Petals

      The poem is so short that there isn't really a true Volta (Turn of phrase), but if there was one, it would be between the two lines

    4. crowd

      This poem has a rhyme scheme, crowd and bough

    5. The

      100% not a sonnenizio

    6. crowd

      An English Sonnet/Elizabethan Sonnet is a type of sonnet with 3 quatrains and 1 couplet (usually a capping couplet). The rhyme scheme is abab cdcd efef gg

    7. Brief Poems by Ezra Pound

      Here is small collection. Thanks DP!

    1. No, I’m not very good in school. This is my second year in the seventh grade, and I’m bigger and taller than the other kids. They like me all right, though, even if I don’t say much in the classroom, because outside I can tell them how to do a lot of things. They tag me around and that sort of makes up for what goes on in school.

      This definitely relates to today's situation in American education. He has many different interests and talents, but none of them are being put to use because of the school curriculum. In some cases, the subjects in school relate to his situation at home. But the teachers only have one way to teach the students, and that's why students who can't learn that way become bored and lose the desire to learn.

    1. Self-driving cars would not merely represent a private luxury

      But it would be, though. Currently, having autonomous driving abilities in your car adds about $100,000 to the price tag, according to Esurance. Semi-autonomous features would add between $5,000 and $10,000 to the base vehicle cost. Where there is a way to profit off something, it'll be made more and more expensive, and fully autonomous vehicles won't be available to the general public because of the cost of producing self-driving cars.

    1. Another field where XSL is expected to be important is printing XML data with layout and outputting it to PDF. For example, when we convert form data into XML, it is natural to input data on a browser form, specify that the data format is XML, and output it to paper by specifying the XML format. Since XML is a tag-mapped form, a format type must be given so that people can understand and check the contents when printing XML or outputting digitized paper such as PDF.


    1. 15 Mutations

      If I understand it correctly, mutations (unlike all the grammatical categories listed above) cannot be treated as a single character in the positional tagset. It would probably be good to describe how the tagset should deal with this: whether you propose that mutation values be given separately, outside the tag, in the positional tagset, or whether you have some other solution...

    1. by $f = f_R(\frac{1+\frac{v}{c}}{1-\frac{v}{c}})$

      to fix, put \left tag before opening parenthesis and \right before closing parenthesis $latex f={f}_{R}\left(\frac{1+\frac{v}{c}}{1-\frac{v}{c}}\right) $

    1. Update the P and I values using the values recorded on the blue tag that shipped with the module.

      I don't think this information is on the blue tag but I could be wrong.

  5. Jul 2019
    1. Anyone in the Slack group interested in sharing comments directly on this page via this hypothesis tool? We can use the tag "r4ds_slackers"

    1. Your teacher may require you to use tags for a variety of reasons.

      Tags would also be a great way to practice metacognition! One technique would be to have a list of categories you use when annotating (such as "Evidence", "Central Idea", and "Vocab") and apply one of these as a tag to each annotation.

      The goal is to go beyond analyzing the text and identify why you're analyzing that part of the text. You can read more about metacognition here.

    1. locate the blue tag that shipped with the replacement centrifuge

      Perhaps this would be better to state after removing the new centrifuge from the box?

    1. I was viewed, initially, as a tag-along in this process, not an equal partner.

      Pregnant mother = two lives, partner = one life, so no partner can ever expect to be “equal”.

    1. In Python, every object has a unique identification tag. Likewise, there is a built-in function that can be called on any object to return its unique id. The function is appropriately called id and takes a single parameter, the object that you are interested in knowing about. You can see in the example below that a real id is usually a very large integer value (corresponding to an address in memory).
    1. ear again. Sometimes he would come up unexpectedly on the opposite side of me, having apparently passed directly und

      It's as if they were playing a game of tag, and Thoreau was horribly underqualified.

    1. Highlights are private, visible only to you when you’re logged in (even if they are made in the Public layer).

      Adding a tag or note appears to convert a highlight to a full-fledged annotation in that you may then choose Post to Public (or to group visibility). Without these, Post to Only Me is the only available option for bare highlights.

    1. It is very important to keep the differences in mind between head, header element, and heading element.

      • The head goes at the very top of the file and is not visible on the page. It is the section where the title tag as well as external files such as CSS files go.

      • The header element is used at the top but inside the body element. It is visible on the page. It is most often used once but some pages include separate headers.

      • The heading element is used to create titles that identify the content or purpose of a section of a page like h1, h2, and so on.
    1. By Product

      If your header is in a non-Western language, you need to set an id tag. For example:


      should be


      This is because the tab's hyperlink cannot take Chinese characters, so you have to specify its non-Chinese link name yourself.

    1. , and LinkedIn.

      I see that it's crossed out. Does it mean that you're not on LinkedIn anymore? How do I tag you?

    1. A class examining and exploring a series of online texts (websites, blogs, wikis) to aid in comprehension and synthesis. Hypothesis allows for a common tag to organize readings across a group.

      Sounds exciting! Looking forward to seeing my classmates annotations!

    1. Nikkei Asian Review

      Hello, I am the Citerpress bot :) I think this sentence is mentioning a news article without an explicit link. I looked in my news database and here is what I found:

      Hit #1 (score of 39.0)

      Hit #2 (score of 37.9)

      Hit #3 (score of 37.2)

      Hit #4 (score of 34.3)

      I did my best! My annotations will get better and better with time, as I index new pages every day.

    1. Many who cannot afford half the items listed above include themselves in that definition. Also included are most of the nation’s millionaires, who consider themselves middle or upper-middle class despite their obviously outsized income[11] and their small representation (3.3%) among the country’s overall population.
  6. Jun 2019
    1. These are turbulent times for Annegret Kramp-Karrenbauer. When the CDU federal chairwoman, also known as AKK, comes on stage on Monday evening at the Ständehaustreff, a live talk show of the "Rheinische Post", in Düsseldorf, she has a long day of committee meetings in Berlin, with many important clarifications, behind her.

      "These are disturbing times for Annegret Kramp-Karrenbauer. On Monday evening, after a long day of committee meetings in Berlin with important clarifications, she appeared on Ständehaustreff, a live talk show hosted by the Rheinische Post in Düsseldorf."

    1. It is “undermining behaviour from managers” that is forcing women out of the tech industry.

    2. Noble describes entering the term “beautiful,” and shows a screen of pictures of white people. She entered “ugly”, and the results were a racial mix.

    3. She searched for “three black teenagers” in 2010 and got mug shots as the result. Searching “black girls” in that same year brought the viewer to porn sites.

    4. Noble focuses on degrading stereotypes of women of African descent as a prime example of these prejudices, which translate to overt racism.

    1. The FBI said it has stopped using the "Black Identity Extremist" tag and acknowledged that white supremacist violence is the biggest terrorist threat this country faces.

      I am not surprised that the article stated that white supremacist violence is the biggest terrorist threat this country faces.

    1. Hi everyone! Thanks for opening the PDF of the evidence for the case study! My hope is that everyone will be able to tag/highlight between 3-5 important pieces of evidence using hypothesis. I know that 185 pages is a lot to go through, so I highlighted a couple of important places for you all to look at for evidence to make it easier for you.

      While looking at the evidence, please look for the following information:

      - Highlight points where Conrad Roy III shows warning signs, risk factors, or protective factors.
      - Highlight points where Michelle Carter attempted to help Conrad Roy.
      - Highlight points where you would have said or done something differently than Michelle Carter did.


    1. tag or mention district schools or communities.

      I think that this is something important to highlight as well. There are a lot of people who look at a district online. If people are posting wrong or offensive things about the district, this could create potential problems for people who are looking at the district. I think the true image of the district should be able to be reflected, and not false allegations or posts from kids on social media. Therefore, monitoring public posts for the district's own well-being is important.

    1. If the answers cannot be given and written down live, I usually create and place an "Attente spécifications" (awaiting specifications) tag on the user stories (US) in question. A second review will allow it to be removed once all doubts have been cleared up.

      It is also possible to use the more visible "Blocked" status (red dot).

    1. its unrealistic vision of the synthesis of knowledge was never able to keep pace with the ever faster progress of the sciences

      Many in the Information and Communication Sciences believe in this "unrealistic vision" ;-) See, for example, the recent Hyper Otlet project: https://hyperotlet.hypotheses.org/tag/otlet

    1. Thus, you could use a spreadsheet program to create bar charts of the count of documents with an ‘architecture’ tag or a ‘union history’ tag, or ‘children’, ‘women’, ‘agriculture,’ etc. We might wonder how plaques concerned with ‘children’, ‘women’, ‘agriculture’, ‘industry’, etc might be grouped, so we could use Overview’s search function to identify these plaques by search for a word or phrase, and applying that word or phrase as a tag to everything that is found. One could then visually explore the way various tags correspond with particular folders of similar documents.[5]
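Before reaching for a spreadsheet, the per-tag document counts described above can be tallied in a few lines of Python. The document list is made-up sample data, and its shape is an assumption, not Overview's actual export format:

```python
from collections import Counter

# Hypothetical export: each document (plaque) with the tags applied to it
documents = [
    {"id": 1, "tags": ["architecture", "women"]},
    {"id": 2, "tags": ["union history"]},
    {"id": 3, "tags": ["agriculture", "children"]},
    {"id": 4, "tags": ["architecture"]},
]

# Count documents per tag; set() avoids counting a tag twice within one document
tag_counts = Counter(tag for doc in documents for tag in set(doc["tags"]))

# A plain-text bar chart standing in for a spreadsheet chart
for tag, count in tag_counts.most_common():
    print(f"{tag:15} {'#' * count}")
```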
    1. hypothes.is has developed some prototypes that allow Hypothesis to tag with controlled vocabularies and ontologies.

  7. May 2019
    1. Then we accept UDP traffic if the value of the udpserver tag is 1 when both sender and receiver tags are ORed together, or if UDP traffic is multicast. This allows multicast mDNS and Netbios announcements and allows UDP traffic to and from UDP servers, but prohibits other horizontal UDP traffic.

      Is this the statement indicating mDNS support in ZeroTier?
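The decision the quoted rule describes can be sketched in Python. This assumes the udpserver tag takes the values 0 or 1, and the function and parameter names are hypothetical, not ZeroTier's rules-engine syntax:

```python
def accept_udp(sender_tag, receiver_tag, is_multicast):
    """Accept UDP when the sender's and receiver's udpserver tags ORed
    together equal 1 (i.e. one side is a UDP server), or when the traffic
    is multicast (e.g. mDNS or NetBIOS announcements); otherwise drop it."""
    return (sender_tag | receiver_tag) == 1 or is_multicast

# Client -> server: allowed; multicast announcement: allowed; client -> client: dropped
print(accept_udp(0, 1, False), accept_udp(0, 0, True), accept_udp(0, 0, False))
```

Read this way, mDNS passes because it is multicast, not because of the udpserver tag.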

    1. We hypothesized that the good performance in the "exec" condition, but not in the "tag" condition,

      Tag condition did better though?

    1. Go forth and annotate! Enable the sidebar via the button in the location bar.

      some tag

    1. If innerHTML inserts a <script> tag into the document – it doesn’t execute. It becomes a part of HTML, just as a script that has already run.

      key point!

    1. computerized tools which automatically tag, index, transfer, and navigate through that information

      second inversion

    1. Gracechurch Street

      Gracechurch Street is a main road in London, historic for its shops and restaurants, and serves as an entrance to London's Leadenhall Market, which has been open since the 14th century. https://janeaustensworld.wordpress.com/tag/gracechurch-street/

    1. Menu items are links <a>, not buttons. There are several benefits, for instance: Many people like to use “right click” – “open in a new window”. If we use <button> or <span>, that doesn’t work. Search engines follow <a href="..."> links while indexing.

      key point on why use 'a' tag instead of buttons for navigation purposes

    1. They’ve learned, and that’s more dangerous than caring, because that means they’re rationally pricing these harms. The day that 20% of consumers put a price tag on privacy, freemium is over and privacy is back.

      Google want you to say yes, not because they're inviting positivity more than ever, but because they want you to purchase things and make them richer. This is the essence of capitalism.

    1. Suppose college tuition was free and every first-year had a guaranteed job lined up for after graduation. This parallel universe does exist at military-service academies—and at West Point, Annapolis, and Colorado Springs, humanities majors are at about the same level as they were in 2008.

      Huge clue re: precarity's effect. Also a bit of a portent in the possible effect of a Universal Basic Income. Which has to now become a new tag I use...

    1. yes there are many threads

      Yes. Every time one of these accounts appears, I tag it as a throwaway account. I am doing that with all immediately registered accounts in those threads, no matter who they look like. This allows later analysis, because, without this, the accounts become [deleted] and cannot then be correlated. Interesting patterns appear when this is done. Sometimes it becomes possible to correlate accounts across multiple platforms.

    1. vector, under the phage T7 promoter, in BL21 (DE3) cells, and under the T5 phage promoter, in the pQE30 vector for expression in SG13009[pREP4] and M15[pREP4] cell strains. For cloning in pRSET B, the full length bZP3 initially subcloned in the pBacPAK8 vector at the Kpn I and Sac I sites was released after digestion with Kpn I and EcoR I and cloned in a similarly restricted pRSETB vector in frame with an N-terminal His6 tag. For cloning in the pQE30 vector, the pBacPAK8 carrying the full length bZP3 was initially digested with Not I, filled in with Klenow and then digested with Kpn I. The purified bZP3 fragment was then cloned in the vector digested with Kpn I and Sma I in frame with an N-terminal His6 tag. Though transformants positive for the bZP3 insert in the right reading frame were recovered, no expression could be detected by SDS-PAGE or immunoblots in either case. An alternate strategy was then devised in which an internal fragment of the gene, excluding the signal sequence and the transmembrane-like domain, following the putative furin cleavage site, was amplified by PCR using the forward primer 5'-CGGGATCCCAACCCTTCTGGCTCTTG-3' incorporating a BamH I site and the reverse primer 5'-CCGAGCTCAGAAGCAGACCTGGACCA-3' incorporating a Sac I site. The PCR was done in a 50 µl volume using 50 pM of each primer and Vent polymerase for extension. The pBluescript-bZP3 (10 ng) having a full length bZP3 insert was used as the template and was initially denatured at 95°C for 10 min. Amplification was carried out for 35 cycles of denaturation at 95°C for 2 min, primer annealing at 60°C for 2 min and extension at 72°C for 3 min followed by a final extension at 72°C for 15 min. The amplified bZP3 fragment was digested with BamH I and Sac I and cloned in frame downstream of a His6 tag under the T5 promoter-lac operator control in the pQE30 vector. The authenticity of the construct was confirmed by N-terminal sequencing using an upstream sequencing primer GGCGTATCACGAGGCCCTTTCG.
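      Two details of this protocol can be checked mechanically: that each primer carries the restriction site it claims (BamHI recognises GGATCC and SacI recognises GAGCTC, both palindromic, so searching one strand suffices), and how long the stated thermal-cycling holds take in total. The sketch below does both; the total counts only the hold times given in the text and ignores ramping between temperatures, which is an assumption.

```python
# Recognition sequences (5'->3') for the two enzymes named in the protocol.
SITES = {"BamHI": "GGATCC", "SacI": "GAGCTC"}

FORWARD = "CGGGATCCCAACCCTTCTGGCTCTTG"  # forward primer from the text
REVERSE = "CCGAGCTCAGAAGCAGACCTGGACCA"  # reverse primer from the text

def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based position} for each recognition site found in the primer."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}

def program_minutes(initial: float, cycles: int, steps: tuple, final: float) -> float:
    """Total hold time: initial denaturation + cycles * (sum of step times) + final extension."""
    return initial + cycles * sum(steps) + final

print(find_sites(FORWARD))  # {'BamHI': 2}
print(find_sites(REVERSE))  # {'SacI': 2}
# 10 min at 95 °C, 35 x (2 min at 95 °C + 2 min at 60 °C + 3 min at 72 °C), 15 min at 72 °C
print(program_minutes(10, 35, (2, 2, 3), 15))  # 270
```

      Each site sits two bases from the 5' end, leaving a short overhang that helps the enzyme cut near the end of the PCR product; the program totals 270 min, or 4.5 h, of holds.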
    1. carriage was sent to meet them at — , and they were to return in it by dinner-time.

      Hackneys, or public carriages for hire, made their first significant appearance in the early 17th century. By 1694, this method of transportation was very popular so the Hackney Coach Commission was established in London.


    2. dressing-room

      A room primarily used during one's morning routine for dressing and washing. A woman's dressing room was made to be private and comfortable, and the intimacy of these small places allowed women to entertain small parties of other female guests. The wealthier the woman, the more luxurious her dressing room.


    1. argP+, argPd-S94L, argPd-P108S, argPd-P274S fragment downstream of the phage T7 promoter, such that the encoded proteins bear a C-terminal His6-tag provided by the vector DNA sequence. The resultant plasmid was transformed into strain BL21(DE3), which has the T7 RNA Polymerase under the isopropyl thio-β-D-galactoside (IPTG) inducible lacUV5 promoter. The resultant strains were grown in LB (500-1000 ml) to an A600 of around 0.6 and were then induced with 1 mM IPTG and harvested after 4 h of induction. Bacterial cells were recovered by centrifugation, resuspended in 20 ml of lysis buffer (20 mM Tris-Cl, pH 8; 300 mM NaCl; 10 mM DTT and 10 mM imidazole) containing 20 μg/ml lysozyme, and lysed by sonication with 30-sec pulses for 10 min. The protocol for His6-ArgP (ArgPds) protein purification involved (i) passing the lysate through a 5 ml Ni-NTA (Qiagen) chromatographic column equilibrated with lysis buffer, (ii) washing the column with 100 ml of washing buffer (20 mM Tris-Cl, pH 8; 300 mM NaCl; 10 mM DTT; 30 mM imidazole), and (iii) elution of His6-ArgP (ArgPds) from the column with elution buffer (20 mM Tris-Cl, pH 8; 300 mM NaCl; 10 mM DTT and 250 mM imidazole) and collection of 1.5 ml eluate fractions (10 fractions). The fractions were tested for protein by the Bradford method, and the protein-carrying fractions (generally tubes 2 to 5) were pooled and dialysed in a 1:200 volume ratio against 20 mM Tris-Cl, pH 8 with 10 mM DTT, 300 mM NaCl for 5 h, followed by a change to buffer of composition 20 mM Tris-Cl, pH 8 with 10 mM DTT, 300 mM NaCl and 40% glycerol for 24 h. The proteins were concentrated by centrifugation to around 1 mg/ml using an Amicon filter (10-kDa molecular-weight cutoff) and stored at −20°C or −70°C
    1. The TEI's adoption as a model in digital library projects raised some interesting issues about the whole philosophy of the TEI, which had been designed mostly by scholars who wanted to be as flexible as possible. Any TEI tag can be redefined and tags can be added where appropriate

      Question - What were some of the issues that arose with the TEI's adoption as a model in digital library projects?

      The ability to add tags where needed throughout the texts seems to be a positive aspect, because it makes key words easier to search for and ensures they are included in those searches.

    1. After 2-3 h exposure, the phosphorimager screen was scanned on a Fuji FLA-9000 to acquire hybridization images. Next, signal intensity for each spot on the membrane for both input and output samples was quantified using Fuji Multi Gauge V3.0 software, and percentage intensity for each spot relative to the whole signal intensity of the membrane was determined. To identify mutants with altered survival profiles, the ratio of output (Op) to input (Ip) signal for each spot (oligonucleotide tag) present on the membrane was calculated. Mutants displaying at least 6-fold higher and 10-fold lower survival were selected as ‘up’ (Op/Ip ≥ 6.0) and ‘down’ (Op/Ip ≤ 0.1) mutants, respectively
    2. 24 h post infection, THP-1 macrophages were washed thrice with PBS and lysed in water, and recovered yeast cells were used to infect THP-1 cells at an MOI of 1:10. Three rounds of macrophage infection for each mutant pool were carried out to enrich for the desired mutants in the final population. The lysate of the 3rd round of infection was inoculated in YPD medium overnight (output). Cells were harvested, genomic DNA was isolated from each input and output cell pellet, and unique signature tags were PCR-amplified with P32-labeled α-dCTP using primers complementary to the invariant region flanking each unique tag sequence. Labeled PCR products were denatured at 95°C for 10 min, chilled on ice and hybridized to nylon membranes carrying immobilized plasmid DNA containing 96 unique tags for 14-16 h at 42°C. Membranes were washed twice with 0.1X SSC buffer and exposed to a phosphorimager screen for 2-4 h. Radioactive counts for each spot were quantified using Image Quant and Fuji Multi Gauge V3.0 software. Relative percentage intensity for each spot was calculated with respect to all spots present on each hybridized membrane
    3. YPD-grown cultures (0.05 OD600) of each mutant pool (96 mutants, each carrying a unique signature tag) were either inoculated in YPD medium overnight (input) or used to infect differentiated THP-1 cells (1 × 10⁶). After 2 h incubation, non-cell-associated yeast cells were removed by washing THP-1 cells thrice with PBS. At
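      The selection step in these methods (each spot's share of the membrane's total signal, then the output-to-input ratio, then cutoffs of 6.0 and 0.1) can be sketched as below. The tag names and counts are made-up illustrative values, and `call_mutants` is a hypothetical helper, not the authors' analysis software.

```python
def percent_of_total(signals: dict) -> dict:
    """Express each spot's signal as a percentage of the whole membrane's signal."""
    total = sum(signals.values())
    return {tag: 100.0 * value / total for tag, value in signals.items()}

def call_mutants(input_signals: dict, output_signals: dict,
                 up: float = 6.0, down: float = 0.1) -> dict:
    """Label each tag 'up' if Op/Ip >= up and 'down' if Op/Ip <= down."""
    ip = percent_of_total(input_signals)
    op = percent_of_total(output_signals)
    calls = {}
    for tag, ip_pct in ip.items():
        if ip_pct == 0:
            continue  # no input signal: ratio undefined, so skip this spot
        ratio = op.get(tag, 0.0) / ip_pct
        if ratio >= up:
            calls[tag] = "up"
        elif ratio <= down:
            calls[tag] = "down"
    return calls

# Illustrative counts for four tags (not data from the study).
inputs = {"tag01": 5, "tag02": 30, "tag03": 30, "tag04": 35}
outputs = {"tag01": 40, "tag02": 2, "tag03": 28, "tag04": 30}
print(call_mutants(inputs, outputs))  # {'tag01': 'up', 'tag02': 'down'}
```

      Normalising to percentages first means the ratio compares each mutant's share of the pool before and after infection, so differences in total labelling between the input and output membranes cancel out.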
    1. After activating Hypothes.is in the browser, a learner can annotate any piece of text on a webpage with her own ideas, to which other learners could respond (Principle 1). Tags can be attached to a Hypothes.is annotation, enabling the aggregation of web annotations that are scattered across webpages through a tag (Principle 2). Because Hypothes.is adheres to the Open Annotation Model, annotations are uniquely identifiable and retrievable, allowing portability of ideas not widely supported in threaded discussion forums (Principle 2). The search functionality of Hypothes.is offers means to aggregate annotations according to a variety of criteria such as by tags, users, user groups, and annotated web URLs. Users are thus able to enter the discourse not only from a specific webpage (e.g., the textbook), but also from the search page that provides another view of the discourse (Principle 3)

      Alignment statement for Hypo
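      The search criteria listed above (tags, users, groups, annotated URLs) map onto query parameters of the public Hypothesis search API. A minimal sketch that only builds the request URL and sends nothing, assuming the `https://api.hypothes.is/api/search` endpoint with its `tag`, `user`, `group`, `uri` and `limit` parameters; the tag value and the `acct:example_user@hypothes.is` user are placeholders.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.hypothes.is/api/search"

def build_search_url(tag=None, user=None, group=None, uri=None, limit=20):
    """Build a Hypothesis search URL from whichever criteria are given.

    Criteria left as None are omitted, so they do not narrow the search.
    """
    params = {"tag": tag, "user": user, "group": group, "uri": uri, "limit": limit}
    query = {key: value for key, value in params.items() if value is not None}
    return f"{SEARCH_ENDPOINT}?{urlencode(query)}"

url = build_search_url(tag="how_to", user="acct:example_user@hypothes.is")
print(url)
```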

    1. The following antibodies were used in the present study: Primary antibodies against GAPDH (anti-rabbit), FLAG (anti-mouse), immunoglobulin (IgG, anti-rabbit or anti-mouse), profilin-1 (anti-rabbit), tubulin (anti-mouse) and ubiquitin (anti-rabbit) were obtained from Sigma Aldrich Chemicals (St Louis, MO, USA). Antibodies against AKT (anti-rabbit), cleaved caspases-3, 8 and 9 (anti-rabbit), HA-tag (anti-rabbit), Myc-tag (anti-rabbit), p21 (anti-rabbit), phospho-p53 (anti-mouse), PTEN (anti-mouse), phospho-AKT (Ser473; anti-rabbit), phospho-GSK-3β (Ser9; anti-rabbit), phospho-IKKα/β (Ser177/181; anti-rabbit), phospho-IκBα (Ser32; anti-rabbit), and phospho-p65 (Ser276; anti-rabbit) were obtained from Cell Signaling Technologies (Danvers, MA, USA), whereas antibodies for cox-2 (anti-rabbit), c-Rel (anti-rabbit), ICAM-1 (anti-rabbit), IKKα/β (anti-rabbit), IκBα (anti-rabbit), Mdm2 (anti-rabbit), PARP-1/2 (anti-rabbit), Rel-B (anti-rabbit), p50 (anti-rabbit), p53 (anti-mouse), p65 (anti-rabbit) were obtained from Santa Cruz Biotechnology (Santa Cruz, CA, USA). HRP (horseradish peroxidase)-conjugated secondary antibodies (anti-mouse and anti-rabbit) were obtained from Bangalore Genie (Peenya, India). For immunofluorescence studies, secondary antibodies conjugated to Alexa Fluor (488 and 594, anti-mouse and anti-rabbit) were obtained from Molecular Probes, Invitrogen (Eugene, OR, USA)
    1. immunoprecipitation by using substrate specific antibody or pull-down by affinity trapping the substrate tag. The IP/pull-down complexes were analyzed by detecting the ubiquitination of substrate protein by using either substrate specific antibody or ubiquitin antibody through western blotting
    2. using gateway cloning method (Invitrogen). p73 domain deletions were cloned in SFB destination vector. WWP2, WWP1, HACE1, E6AP, and PPM1G were cloned into SFB (S-protein/Flag/streptavidin binding protein (SBP) triple tag), GFP, and Myc mammalian destination vectors using the Gateway cloning technology (Invitrogen). WWP2 domain deletions were cloned into Myc-destination vector. WWP1 domain deletions were cloned into SFB-destination vector. PPM1G domain deletions were cloned into SFB mammalian destination vector using Gateway cloning. Bacterial expression constructs for GST-p73, GST-∆Np73, GST-PPM1G, MBP-WWP1, MBP-WWP2, GST-WWP2, GST-WWP1 and GST-HACE1 were generated using Gateway technology. Ubiquitin WT and all the mutants were cloned into hemagglutinin (HA) mammalian destination vector. Flag-tagged Dvl2 was purchased from Addgene. Dvl2 domain deletions were cloned into SFB-destination vectors. All the plasmid constructs generated in the present study are listed in Table 2.
       Table 2: Plasmids used in the study
    1. ability to take notes anchored to specific passages of text, and have them tagged and searchable across papers, are such major benefits

      For this to happen, start with:

      1. Open an article in a web browser (Google Chrome or Firefox) that supports pdf.js or epub.js, or as HTML if that is available.
      2. Open the Hypothesis app and start annotating passages (as in this annotation).
      3. Add a consistent tag (see the other articles tagged how_to).
      4. Then search all related Hypothesis annotations and page notes using those tags, and organise knowledge that way.
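      Step 4, organising knowledge by tag, can be sketched once annotations have been fetched. This assumes each annotation is a dict with `uri`, `tags` and `text` keys, roughly the shape of the rows a Hypothesis search returns; the sample rows below are invented for illustration.

```python
from collections import defaultdict

def group_by_tag(annotations):
    """Map each tag to the list of (uri, text) pairs annotated with it."""
    index = defaultdict(list)
    for row in annotations:
        for tag in row.get("tags", []):
            index[tag].append((row["uri"], row["text"]))
    return dict(index)

rows = [
    {"uri": "https://example.org/paper-a", "tags": ["how_to", "annotation"], "text": "Setup steps"},
    {"uri": "https://example.org/paper-b", "tags": ["how_to"], "text": "Tagging workflow"},
]
for tag, notes in group_by_tag(rows).items():
    print(tag, len(notes))
```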
    1. agree on a standard set of tags to classify sets of resources

      This is step one: set up a standard set of tags, which will then classify resources. Let's say I want to classify all studies on polygenic risk scores on the web. These documents come in the form of HTML or PDF. If, on top of that, you'd also like to tag something as a tutorial or as software to do the job, add those tags as well when you come across a resource and tag it. So, a list of tags to learn about polygenic risk scores could be something like:

      • polygenic risk score
      • polygenic_risk_score
      • tutorial
      • utility
      • why_how

      So, all resources that pertain to polygenic risk scores can now be divided into a set of main document types: some discuss tutorials and how-tos, others discuss their utility and the related debates. Later, these could be reassembled.
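      A free-form list like this one already holds two spellings of the same concept (`polygenic risk score` and `polygenic_risk_score`), so a small normalisation step helps keep the vocabulary standard before resources are reassembled by type. A minimal sketch, assuming a hypothetical convention (chosen here for illustration) that tags are lower-case with underscores in place of spaces and hyphens:

```python
def normalise_tag(tag: str) -> str:
    """Lower-case, trim, and join words with single underscores."""
    return "_".join(tag.strip().lower().replace("-", " ").split())

def resources_with(resources, *required):
    """Return the resources that carry every required tag (after normalisation)."""
    wanted = {normalise_tag(t) for t in required}
    return [r for r in resources if wanted <= {normalise_tag(t) for t in r["tags"]}]

library = [
    {"title": "PRS tutorial", "tags": ["polygenic risk score", "tutorial"]},
    {"title": "PRS software", "tags": ["polygenic_risk_score", "utility"]},
    {"title": "Unrelated note", "tags": ["tutorial"]},
]
print([r["title"] for r in resources_with(library, "polygenic risk score")])
# ['PRS tutorial', 'PRS software']
```

      Combining tags then gives the document types described above, for example `resources_with(library, "polygenic risk score", "tutorial")` for tutorials only.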
  8. Apr 2019
    1. seminary

      Austen is referring to a boarding school attended by young women from wealthy families who, for some reason, were not educated at home by a governess. https://judeknightauthor.com/tag/girls-education-in-regency-england/

    2. any complaint which asses' milk could possibly relieve

      Donkey milk was considered a viable medical treatment from antiquity (Cleopatra bathed in it) until the turn of the 19th century, when it largely went out of fashion. It was considered a generic cure for a variety of conditions, including gout, scurvy, coughs, colds and asthma. For many, donkey milk caused stomach problems and "lactose intolerance."


    1. Amdo Group 654. For more information about this term, see Full Entry below. Subjects; Language Tree: Sino-Tibetan, Tibeto-Burman, Bodic, Bodish, Tibetan; Full Entry; Related Subjects (7); Related Texts (12); Related Audio-video (1864)

      KMaps tag: bold, with popover.

    1. Annotation Profile Follow learners as they bookmark content, highlight selected text, and tag digital resources. Analyze annotations to better assess learner engagement, comprehension and satisfaction with the materials assigned.

      There is already a Caliper profile for "annotation." Do we have any suggestions about the model?

    1. 1) We identify and investigate two major kinds of linguistic activities on Quora: user level [e.g., basic activities like posting a question/answer/comment as well as linguistic styles that involve word/char usage, and part-of-speech (POS) tag usage] and question level [e.g., content, topic associations, and edits for a question]. Remarkably, many of these activities are found to have a natural correspondence to the qualities that human judges would consider while deciding if a question would remain unanswered (see Table I for a set of motivating examples). 2) We perform an extensive measurement study to show that answerability can be indeed characterized based on the above-mentioned linguistic activities. 3) A central finding is that the language use patterns of the users are one of the most effective mechanisms to characterize answerability.


    1. appears

      The link is to my blog, but is defective. (a link that works is this, but this is to all occurrences of a tag.) Elsewhere, I found the page that the author was attempting to cite. Here I am not. The text is confused. I have never seen Oliver write "under his brother's name," but he has mostly claimed his brother is different, and then he claimed that he was lying about that and had been lying for years, then he took it back. Liars lie, no way around it. If a liar says "I'm a liar," a sane answer is "Not always." After all, stopped clock.

    2. claim he made this up.

      This is a link to my blog, but the URL is broken. The core of the URL points to this, a tag. Searching that display for "schizo," I find eventually material copied from this page: Authentic Darryl Smith on himself That page is Darryl making the claim of "made up." Recently, socks on Reddit -- probably also Darryl -- have repeated the claim that Oliver denied "schizophrenia," but I have never seen Oliver openly deny it. And it makes sense.

    1. If this is a production situation, and security and stability are important, then just "convenience" is likely not the best deciding factor (any more than leaving your house unlocked all the time might be "convenient").


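      • You could consider every push to the registry a release, in some form (after all, you published a new version of your code and made it accessible to others).
      • :latest is comparable to the master branch in a Git repository. Is every push to master considered ready for production?
      • Releases (usually) go through a validation process (CI/QA/acceptance/etc.). Shouldn't changes in master be validated first, and only be (tagged and) deployed to production after validation?
      • Releases carry a version; this can be an explicit version (a tag) or an implicit one (an immutable tag: the image's digest).

      Explicit version: image tag
      Implicit version: immutable tag, i.e. the image digest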
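      The explicit/implicit distinction shows up in how an image reference is written: `name:tag`, `name@sha256:...`, or both. Below is a simplified parser, a sketch rather than a full implementation of Docker's reference grammar (it does no validation), which also applies Docker's default of `latest` when neither a tag nor a digest is given. `is_pinned` is a hypothetical helper name.

```python
def parse_reference(ref: str):
    """Split an image reference into (name, tag, digest)."""
    name, digest = ref, None
    if "@" in ref:
        name, digest = ref.split("@", 1)
    tag = None
    # Only the last path component can carry a tag; an earlier ':' is a registry port.
    if ":" in name.rsplit("/", 1)[-1]:
        name, tag = name.rsplit(":", 1)
    if tag is None and digest is None:
        tag = "latest"  # Docker's implicit default
    return name, tag, digest

def is_pinned(ref: str) -> bool:
    """A reference is immutable only if it names a digest."""
    return parse_reference(ref)[2] is not None

print(parse_reference("localhost:5000/app:1.0"))  # ('localhost:5000/app', '1.0', None)
print(parse_reference("ubuntu"))                  # ('ubuntu', 'latest', None)
print(is_pinned("nginx:1.25@sha256:abc123"))      # True
```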

    2. However, there is not a 1:1 relation of digests to tags, so when pulling an image by digest, only the digest is known. If you happen to have an image pulled (manually) with a tag that matches that digest, the tag is shown, but not otherwise

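      However, there is no 1:1 relationship between digests and tags, so when pulling an image by digest, only the digest is known. If you happen to have (manually) pulled an image with a tag that matches that digest, the tag is shown; otherwise it is not.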
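      The many-to-one relation described here can be illustrated with a toy in-memory registry. This is a simplified, hypothetical model, not the Docker registry API: a digest is taken as the SHA-256 of the manifest bytes, tags are mutable pointers to digests, and a reverse lookup from a digest can return zero, one, or several tags.

```python
import hashlib

class ToyRegistry:
    """Hypothetical model: tags point at digests; digests identify content."""

    def __init__(self):
        self.manifests = {}  # digest -> manifest bytes
        self.tags = {}       # tag -> digest

    def push(self, manifest: bytes, tag: str) -> str:
        digest = "sha256:" + hashlib.sha256(manifest).hexdigest()
        self.manifests[digest] = manifest
        self.tags[tag] = digest  # re-pushing a tag silently moves it
        return digest

    def tags_for(self, digest: str) -> list:
        """Reverse lookup: every tag currently pointing at this digest."""
        return sorted(t for t, d in self.tags.items() if d == digest)

reg = ToyRegistry()
d1 = reg.push(b"manifest v1", "1.0")
reg.push(b"manifest v1", "latest")       # same content, second tag
print(reg.tags_for(d1))                  # ['1.0', 'latest']
d2 = reg.push(b"manifest v2", "latest")  # 'latest' moves to the new content
print(reg.tags_for(d1))                  # ['1.0']
print(reg.tags_for(d2))                  # ['latest']
```

      Pulling by `d1` always yields the same bytes, whereas pulling `latest` yields different content before and after the second push, which is why the digest is the immutable, implicit version.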

    1. Impact of the Legion 9 plain 2019-04-16T02:18:41+00:00 Joseph Brown 1 2019-04-11T22:00:35+00:00 Joseph Brown Organization of the Legion

      As far as I can tell, you are using the tag function for the purpose of forward and backward movement. I'd reconsider using it for this purpose.

    1. “the expression or application of human creative skill and imagination […] producing works to be appreciated primarily for their beauty or emotional power.”

      Who is the speaker on this quote? Braidotti? I would tag the source or do what you did above (Author last name, page #)

    1. Alexander McQueen: Relationality between human and non-human Charles Sirisawat-Larouche 6 plain 2019-04-10T17:03:44+00:00 Charles Sirisawat-Larouche Contents of this tag: 1 2019-04-04T13:19:50+00:00 Deterritorialize the Fashion Practices 11 vistag 2019-04-10T17:15:57+00:00 1 2019-04-01T13:48:01+00:00 Charles Sirisawat-Larouche The Deconstructivist Approach Charles Sirisawat-Larouche 22 plain 2019-04-10T16:51:05+00:00 Charles Sirisawat-Larouche Contents of this tag: 1 2019-04-10T11:43:52+00:00 Alexander McQueen: Relationality between human and non-human 6 plain 2019-04-10T17:03:44+00:00 1 2019-04-04T13:19:50+00:00 Deterritorialize the Fashion Practices 11 vistag 2019-04-10T17:15:57+00:00 1 2019-04-10T13:46:55+00:00 The display of its fashion is done toward alien like figure or cyborg incarnation which allow a certain extent of dis-identification. 3 plain 2019-04-10T13:48:27+00:00 1 2019-04-10T12:26:35+00:00 Fashion and Anti-fashion Dichotomy 2 plain 2019-04-10T16:34:01+00:00 1 2019-04-10T12:22:36+00:00 Phallogocentrism 3 plain 2019-04-10T16:38:25+00:00 1 2019-04-10T12:26:06+00:00 Self-Styling and Self-Fashion 3 plain 2019-04-10T16:39:08+00:00 1 media/Fashion-Looks-Forward_Utopian-Bodies_Exhibition_Stockholm-_dezeen_936_12.jpg 2019-04-01T15:44:50+00:00 Charles Sirisawat-Larouche Post-Gender Charles Sirisawat-Larouche 15 visual_path 2019-04-10T17:26:43+00:00 Charles Sirisawat-Larouche Contents of this tag: 1 2019-04-04T13:19:50+00:00 Deterritorialize the Fashion Practices 11 vistag 2019-04-10T17:15:57+00:00 1 2019-04-10T00:30:04+00:00 Rei Kawakubo and the excessive use of material as a strategy to reassert the conceptual embodied entity toward subversive site. 5 Rei Kawakubo and the excessive use of material as a strategy to reassert the conceptual embodied entity toward subversive site. plain 2019-04-10T16:17:23+00:00 1 media/IMG_5473 2.jpg media/Animated GIF-original.mp4 2019-04-01T13:55:26+00:00 Thierry Mugler: The contradictory abjection 27 visual_path 2019-04-10T17:32:51+00:00 1 media/2000px-Polka_dots.svg.png media/Animated GIF-original.mp4 2019-04-04T13:16:11+00:00 Charles Sirisawat-Larouche Posthuman Museum Practices: From Runway to exhibition Charles Sirisawat-Larouche 34 Preceding the essence of fashion tags 2019-04-10T17:47:08+00:00 Charles Sirisawat-Larouche Contents of this tag: 1 2019-04-04T13:19:50+00:00 Deterritorialize the Fashion Practices 11 vistag 2019-04-10T17:15:57+00:00 1 2019-04-08T15:25:59+00:00 Charles Sirisawat-Larouche Self-organizing: Curator, Curatorial. Curating... Charles Sirisawat-Larouche 10 visual_path 2019-04-10T14:09:14+00:00 Charles Sirisawat-Larouche Contents of this tag: 1 2019-04-04T13:19:50+00:00 Deterritorialize the Fashion Practices 11 vistag 2019-04-10T17:15:57+00:00 1 media/IMG_5473 2.jpg media/Animated GIF-original.mp4 2019-04-01T13:55:26+00:00 Charles Sirisawat-Larouche Thierry Mugler: The contradictory abjection

      Nice idea to use tags. Relevant to the style of assignment.

    1. Can we at this point have the option of also adding a "TAG" function? It's much faster to be able to tag as new questions are being input.

      Also, if the question TAG is shown here, new quizzes can be created on the spot with a specific question TAG (if that makes sense).