5 Matching Annotations
  1. May 2019
    1. we represent that to the model as that feature taking on its expected value over the whole dataset

      Kind of similar to occlusion if you take the simplistic view in terms of pixels as opposed to super-pixels.
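
      A minimal sketch of the difference (toy linear model and made-up feature values, assumed for illustration; not the article's code):

      ```python
      import numpy as np

      # "Removing" a feature SHAP-style: replace it with its expected value over
      # the dataset. Occlusion-style: just zero it out (the pixel view).
      def predict(x, w=np.array([0.5, -1.0, 2.0]), b=0.1):
          return x @ w + b        # stand-in model; any model could go here

      X = np.array([[1.0, 2.0, 3.0],
                    [2.0, 0.0, 1.0],
                    [0.0, 4.0, 5.0]])
      x = X[0].copy()
      j = 1                              # feature to "remove"

      x_expected = x.copy()
      x_expected[j] = X[:, j].mean()     # expected value over the dataset
      x_occluded = x.copy()
      x_occluded[j] = 0.0                # occlusion: set the pixel to zero

      print(predict(x), predict(x_expected), predict(x_occluded))
      ```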

    2. In the case where the model is linear, or where the features are truly independent, this problem is a trivial one: no matter the values of other features, or the sequence in which features are added to the model, the contribution of a given feature is the same.

      I guess this is in agreement with the equation in the paper, so basically inp*grad, i.e. input times gradient (from the paper).
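
      A quick check of why order doesn't matter in the linear case (my own notation, not the paper's): for $f(x) = b + \sum_k w_k x_k$, with absent features set to their expected values, adding feature $i$ to any coalition $S$ changes the prediction by

      $f(x_{S \cup \{i\}}) - f(x_S) = w_i \, (x_i - \mathbb{E}[x_i]),$

      which is independent of $S$ and of the order in which features are added; since $w_i$ is the gradient of $f$ with respect to $x_i$, this is exactly gradient × (input − expected value).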

    3. finding each

      This also takes care of finding the marginal contribution compared to when all the players are absent (the empty set), i.e. subtraction of the expectation over the dataset, as in https://christophm.github.io/interpretable-ml-book/shapley.html.
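
      A rough sketch of that computation (toy linear model and random data, assumed for illustration; not the book's code):

      ```python
      import math
      from itertools import permutations
      import numpy as np

      # Shapley values by averaging marginal contributions over all feature
      # orderings; the empty coalition is valued at the mean prediction over
      # the dataset, which is the "base value" being subtracted.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 3))
      w, b = np.array([0.5, -1.0, 2.0]), 0.1
      model = lambda Z: Z @ w + b
      x = X[0]

      def value(coalition):
          Z = X.copy()
          Z[:, list(coalition)] = x[list(coalition)]   # fix coalition features at x
          return model(Z).mean()                       # marginalise the rest

      n = len(x)
      phi = np.zeros(n)
      for order in permutations(range(n)):
          coalition = []
          for i in order:
              before = value(coalition)
              coalition.append(i)
              phi[i] += (value(coalition) - before) / math.factorial(n)

      # efficiency: the contributions sum to f(x) minus the dataset expectation
      print(phi, phi.sum(), model(x) - model(X).mean())
      ```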

  2. Apr 2018
    1. Where does the "softmax" name come from?

      This one's quite interesting. The output of the maximum function would look something like [0, 0, ..., 1, 0, ..., 0] (1 at the position of the maximum value). With the constant c in the exponent, the softmax output approaches exactly this one-hot vector as c → ∞; at c = 1 we get a smoothed ("soft") version of it, hence the name softmax.

      Another interesting article explains why to use softmax over simple normalization.
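
      A quick numerical check of this (my own sketch): softmax with a constant c in the exponent interpolates between a nearly flat distribution and the hard maximum.

      ```python
      import numpy as np

      def softmax(z, c=1.0):
          e = np.exp(c * (z - z.max()))   # subtract the max for numerical stability
          return e / e.sum()

      z = np.array([1.0, 2.0, 5.0, 3.0])
      for c in (0.1, 1.0, 10.0, 100.0):
          print(c, np.round(softmax(z, c), 3))
      # as c grows the output tends to [0, 0, 1, 0], i.e. the output of the
      # maximum function; c = 1 is the usual "softened" version of it
      ```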

    2. There are other insights along these lines which can be obtained from (BP1)-(BP4), where (BP1) is $\delta^L_j = \frac{\partial C}{\partial a^L_j} \sigma'(z^L_j)$ and (BP4) is $\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j$. Let's start by looking at the output layer. Consider the term $\sigma'(z^L_j)$ in (BP1). Recall from the graph of the sigmoid function in the last chapter that the $\sigma$ function becomes very flat when $\sigma(z^L_j)$ is approximately $0$ or $1$. When this occurs we will have $\sigma'(z^L_j) \approx 0$. And so the lesson is that a weight in the final layer will learn slowly if the output neuron is either low activation ($\approx 0$) or high activation ($\approx 1$). In this case it's common to say the output neuron has saturated and, as a result, the weight has stopped learning (or is learning slowly). Similar remarks hold also for the biases of output neuron.

      True for a sigmoid output layer.
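
      A small numerical check of the saturation argument (my own sketch, not from the book): $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ is tiny whenever $\sigma(z)$ is near 0 or 1, so the (BP1)/(BP4) gradients for output-layer weights are correspondingly tiny.

      ```python
      import numpy as np

      sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

      for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
          a = sigmoid(z)
          print(f"z={z:6.1f}  sigma(z)={a:.5f}  sigma'(z)={a * (1 - a):.5f}")
      # at z = +-10 the neuron has saturated and sigma'(z) is ~5e-5, so the
      # update delta^L_j = (dC/da^L_j) * sigma'(z^L_j) is nearly zero
      ```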