Hypothesis

44 Matching Annotations

May 2023
writings.stephenwolfram.com

What Is ChatGPT Doing … and Why Does It Work?

2
1. siva82kb 31 May 2023
  
  in Public
  
  “secondary pathway” that takes the sequence of (integer) positions for the tokens, and from these integers creates another embedding vector
  
  What is this position? Why are we embedding this? What does this embedding mean?
2. siva82kb 31 May 2023
  
  in Public
  
  The input is a vector of n tokens (represented as in the previous section by integers from 1 to about 50,000).
  
  What is 'n' here? Is this the number of tokens identified in the given sentence? Once we've found the embedding, can't we use a lookup instead of a single-layer NN?
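A minimal sketch of how both notes above could fit together, assuming toy sizes and random weights (all names and numbers here are illustrative, not taken from the article). It treats n as the number of tokens in the input, shows that a row lookup gives the same vector as a bias-free single layer applied to a one-hot input, and adds a second vector looked up by position, so the same token at two positions ends up with two different embeddings.

```python
import random

random.seed(0)

VOCAB_SIZE = 10   # toy value; the article quotes about 50,000 tokens
MAX_LEN = 6       # toy maximum sequence length (assumption)
EMBED_DIM = 4     # toy embedding width (assumption)

# Two weight tables (random here, learned in practice):
# one row per token id and one row per position.
token_table = [[random.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)]
position_table = [[random.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(MAX_LEN)]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def linear_layer(vector, weights):
    """Bias-free single layer: vector (one entry per row) times weights (rows x cols)."""
    cols = len(weights[0])
    return [sum(vector[r] * weights[r][c] for r in range(len(weights))) for c in range(cols)]

def embed(token_ids):
    """Row lookup for each token plus the row for its position, added elementwise."""
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]

tokens = [3, 7, 7, 1]          # n = 4 tokens in this toy input
embedded = embed(tokens)

# The "single layer NN" on a one-hot input just selects row 3 of the table.
via_layer = linear_layer(one_hot(3, VOCAB_SIZE), token_table)
```

Here `via_layer` matches `token_table[3]`, and `embedded[1]` differs from `embedded[2]` even though both hold token 7, because their position vectors differ.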
Visit annotations in context

Annotators

siva82kb

URL

writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
Nov 2021
www.analyticsvidhya.com

Class Imbalance | Handling Imbalanced Data Using Python

1
1. siva82kb 10 Nov 2021
  
  in Public
  
  Cluster-Based Over Sampling
  
  Not sure how this will help with the imbalance issue. How does equal representation of subclasses lead to better results?
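One possible reading, sketched under assumptions: the samples of a class are taken as already clustered (for example by k-means), and each cluster is resampled with replacement up to a common size. The function name, the data, and the target size are hypothetical. Plain random oversampling would draw mostly from the large cluster, so a small subclass would stay rare; equalising clusters gives the classifier as many examples of the small subclass as of the large one.

```python
import random
from collections import defaultdict

def oversample_clusters(samples, cluster_ids, target_size, rng):
    """Resample each cluster with replacement until it holds target_size samples."""
    clusters = defaultdict(list)
    for sample, cid in zip(samples, cluster_ids):
        clusters[cid].append(sample)
    balanced = {}
    for cid, members in clusters.items():
        # Clusters already at or above the target are left unchanged.
        extra = [rng.choice(members) for _ in range(max(0, target_size - len(members)))]
        balanced[cid] = members + extra
    return balanced

rng = random.Random(0)
# Hypothetical minority class with one large and one small sub-cluster.
minority = ["a1", "a2", "a3", "a4", "a5", "a6", "b1", "b2"]
minority_clusters = [0, 0, 0, 0, 0, 0, 1, 1]
balanced = oversample_clusters(minority, minority_clusters, target_size=6, rng=rng)
```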
Visit annotations in context

Annotators

siva82kb

URL

analyticsvidhya.com/blog/2017/03/imbalanced-data-classification/
Jul 2021
christophm.github.io

5.8 Scoped Rules (Anchors) | Interpretable Machine Learning

1
1. siva82kb 21 Jul 2021
  
  in Public
  
  $\mathbb{E}_{\mathcal{D}_x(z|A)}[1_{f(x)=f(z)}] \geq \tau, \quad A(x)=1$
  
  Not clear to me what this means.
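A rough way to read the condition: draw perturbed neighbours z that all satisfy the anchor A, and check how often the model gives z the same prediction as x. That fraction is the precision, and it must reach the threshold τ. The sketch below estimates it by Monte Carlo; the model, the anchor rule, and the sampler are all hypothetical stand-ins, not the chapter's setup.

```python
import random

def model(z):
    """Hypothetical black-box classifier on (age, income) pairs."""
    age, income = z
    return 1 if income > 50 or age > 60 else 0

def anchor(z):
    """Hypothetical anchor A: True (A(z) = 1) when the rule's predicate holds."""
    return z[1] > 55

def sample_given_anchor(rng):
    """Stand-in for D_x(z | A): random points that are forced to satisfy the anchor."""
    while True:
        z = (rng.uniform(18, 90), rng.uniform(0, 100))
        if anchor(z):
            return z

def estimated_precision(x, n_samples, rng):
    """Monte Carlo estimate of E[1_{f(x) = f(z)}] over z drawn given A."""
    target = model(x)
    hits = sum(model(sample_given_anchor(rng)) == target for _ in range(n_samples))
    return hits / n_samples

rng = random.Random(0)
x = (40, 70)        # instance being explained; anchor(x) is True
tau = 0.95          # precision threshold
precision = estimated_precision(x, 2000, rng)
is_valid_anchor = anchor(x) and precision >= tau
```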
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/anchors.html
christophm.github.io

5.7 Local Surrogate (LIME) | Interpretable Machine Learning

1
1. siva82kb 20 Jul 2021
  
  in Public
  
  The x-axis shows the feature effect: The weight times the actual feature value.
  
  I do not understand this.
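A small illustration of the quoted sentence, with made-up weights and feature values: for a local linear surrogate, each feature's effect on this one prediction is its weight multiplied by the instance's actual value for that feature.

```python
# Hypothetical local linear surrogate fitted around one instance.
weights = {"temperature": 2.5, "humidity": -1.0, "windspeed": -0.5}
instance = {"temperature": 20.0, "humidity": 60.0, "windspeed": 10.0}

# Feature effect = weight * actual feature value of the explained instance.
effects = {name: weights[name] * instance[name] for name in weights}
```

A large weight on a feature whose value is near zero therefore contributes little, which is why the plot shows the product rather than the weight alone.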
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/lime.html
christophm.github.io

5.3 Accumulated Local Effects (ALE) Plot | Interpretable Machine Learning

3
1. siva82kb 14 Jul 2021
  
  in Public
  
  You can subtract the lower-order effects in a partial dependence plot to get the pure main or second-order effects
  
  How do we do this?
2. siva82kb 14 Jul 2021
  
  in Public
  
  Well, that sounds stupid. Derivation and integration usually cancel each other out, like first subtracting, then adding the same number. Why does it make sense here? The derivative (or interval difference) isolates the effect of the feature of interest and blocks the effect of correlated features.
  
  Say what? How does it remove the correlation? It removes the offset, but the correlation?
3. siva82kb 14 Jul 2021
  
  in Public
  
  ALE plots are a faster and unbiased alternative to partial dependence plots (PDPs).
  
  Why are the PDPs biased?
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/ale.html
christophm.github.io

Interpretable Machine Learning

2
1. siva82kb 14 Jul 2021
  
  in Public
  
  For each of the categories, we get a PDP estimate by forcing all data instances to have the same category. For example, if we look at the bike rental dataset and are interested in the partial dependence plot for the season, we get 4 numbers, one for each season. To compute the value for "summer", we replace the season of all data instances with "summer" and average the predictions.
  
  Why would we change the season for all? This does not make sense. We simply have to take the average of all instances corresponding to a particular season.
  
  Update: I got it now. You do replace every instance by that value and simply run all modified instances through the ML model and average across its output.
2. siva82kb 14 Jul 2021
  
  in Public
  
  An assumption of the PDP is that the features in C are not correlated with the features in S. If this assumption is violated, the averages calculated for the partial dependence plot will include data points that are very unlikely or even impossible (see disadvantages).
  
  I do not follow this.
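A sketch of the procedure described in the update above, next to the "average within the season" alternative from the first reading. The model and the four data rows are hypothetical. Note that the forced data set contains a "summer" row with a temperature of 2, which is the kind of unlikely point the correlation assumption is about.

```python
def model(row):
    """Hypothetical fitted model for bike rentals; any predict function would do."""
    season_bonus = {"spring": 0, "summer": 30, "fall": 20, "winter": -10}
    return 100 + season_bonus[row["season"]] + 2 * row["temp"]

data = [
    {"season": "winter", "temp": 2},
    {"season": "summer", "temp": 28},
    {"season": "summer", "temp": 30},
    {"season": "fall", "temp": 12},
]

def partial_dependence(rows, feature, value):
    """Set `feature` to `value` in every row, then average the model's predictions."""
    modified = [{**row, feature: value} for row in rows]
    return sum(model(row) for row in modified) / len(modified)

def group_average(rows, feature, value):
    """Average prediction over only the rows that actually have `value`."""
    subset = [row for row in rows if row[feature] == value]
    return sum(model(row) for row in subset) / len(subset)

pdp_summer = partial_dependence(data, "season", "summer")   # uses all four rows
avg_summer = group_average(data, "season", "summer")        # uses the two summer rows
```

The two numbers differ because the group average also reflects that summer rows are warmer, while the partial dependence keeps the temperature distribution fixed and varies only the season.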
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/pdp.html
christophm.github.io

5.2 Individual Conditional Expectation (ICE) | Interpretable Machine Learning

1
1. siva82kb 14 Jul 2021
  
  in Public
  
  $\hat{f}(x)=\hat{f}(x_S,x_C)=g(x_S)+h(x_C)$
  
  How do we know we can express it like this?
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/ice.html
nba.uth.tmc.edu

Motor Units and Muscle Receptors (Section 3, Chapter 1) Neuroscience Online: An Electronic Textbook for the Neurosciences | Department of Neurobiology and Anatomy - The University of Texas Medical School at Houston

1
1. siva82kb 12 Jul 2021
  
  in Public
  
  innervation ratio
  
  How is this a ratio?
Visit annotations in context

Annotators

siva82kb

URL

nba.uth.tmc.edu/neuroscience/m/s3/chapter01.html
christophm.github.io

4.4 Decision Tree | Interpretable Machine Learning

3
1. siva82kb 12 Jul 2021
  
  in Public
  
  A tree with a depth of three requires a maximum of three features and split points to create the explanation for the prediction of an individual instance.
  
  This means that predicting the value for any instance requires at most three features, even though the tree as a whole can use up to 7 features.
2. siva82kb 12 Jul 2021
  
  in Public
  
  $\sum_{j=1}^{p} \text{feat.contrib}(j, x)$
  
  How do we get the feature contribution?
3. siva82kb 12 Jul 2021
  
  in Public
  
  Feature importance
  
  I do not follow this.
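One way the feature contributions could be computed, sketched on a hypothetical two-level regression tree: follow the instance from the root to its leaf, and at each split credit the change in node mean to the feature that was split on. The tree structure, node means, and feature names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical regression tree; "mean" is the mean training target in each node.
tree = {
    "mean": 100.0, "feature": "temp", "threshold": 15,
    "left": {"mean": 60.0},
    "right": {
        "mean": 140.0, "feature": "humidity", "threshold": 70,
        "left": {"mean": 160.0},
        "right": {"mean": 110.0},
    },
}

def feature_contributions(node, x):
    """Walk x's path and credit each change in node mean to the split feature."""
    contributions = defaultdict(float)
    while "feature" in node:  # leaves have no "feature" key
        child = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        contributions[node["feature"]] += child["mean"] - node["mean"]
        node = child
    return dict(contributions), node["mean"]

x = {"temp": 20, "humidity": 80}
contribs, prediction = feature_contributions(tree, x)
# Root mean plus the summed contributions reproduces the leaf prediction.
reconstructed = tree["mean"] + sum(contribs.values())
```

For this instance the path goes right twice, so `temp` contributes +40, `humidity` contributes -30, and the root mean of 100 plus both contributions gives the leaf value of 110.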
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/tree.html
christophm.github.io

4.3 GLM, GAM and more | Interpretable Machine Learning

1
1. siva82kb 10 Jul 2021
  
  in Public
  
  How will this help with comparison? Are we assuming the other model uses categorization?
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/extend-lm.html
christophm.github.io

4.2 Logistic Regression | Interpretable Machine Learning

2
1. siva82kb 10 Jul 2021
  
  in Public
  
  Logistic regression can suffer from complete separation. If there is a feature that would perfectly separate the two classes, the logistic regression model can no longer be trained. This is because the weight for that feature would not converge, because the optimal weight would be infinite. This is really a bit unfortunate, because such a feature is really useful. But you do not need machine learning if you have a simple rule that separates both classes. The problem of complete separation can be solved by introducing penalization of the weights or defining a prior probability distribution of weights.
  
  Cannot understand this.
2. siva82kb 10 Jul 2021
  
  in Public
  
  But usually you do not deal with the odds and interpret the weights only as the odds ratios. Because for actually calculating the odds you would need to set a value for each feature, which only makes sense if you want to look at one specific instance of your dataset.
  
  I do not follow this.
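A toy demonstration of the complete-separation passage, under simplifying assumptions (one feature, no intercept, plain gradient ascent, hand-picked data and learning rate). With perfectly separable data the likelihood keeps improving as the weight grows, so the weight never settles; adding an L2 penalty gives it a finite optimum.

```python
import math

# One feature that separates the classes perfectly: x < 0 is class 0, x > 0 is class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_weight(steps, l2_penalty=0.0, lr=0.5):
    """Gradient ascent on the (optionally L2-penalised) log-likelihood of a single weight."""
    w = 0.0
    for _ in range(steps):
        grad = sum((y - sigmoid(w * x)) * x for x, y in zip(xs, ys)) - l2_penalty * w
        w += lr * grad
    return w

# Without a penalty the weight is still growing after 10, 100, and 1000 steps.
unpenalised = [fit_weight(steps) for steps in (10, 100, 1000)]
# With a penalty the weight stops moving once it reaches its finite optimum.
penalised = [fit_weight(steps, l2_penalty=0.1) for steps in (10, 100, 1000)]
```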
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/logistic.html
christophm.github.io

4.1 Linear Regression | Interpretable Machine Learning

2
1. siva82kb 09 Jul 2021
  
  in Public
  
  days_since_2011 4.9 0.2 28.5
  
  It looks like there was a steady increase in the number of bikes rented every single day.
2. siva82kb 09 Jul 2021
  
  in Public
  
  $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$
  
  I do not follow this. Is this always guaranteed to be within 0 and 1?
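A quick numerical check of the range question, with made-up values for R², n (instances), and p (features). Since the factor (n-1)/(n-p-1) is at least 1, the adjusted value never exceeds the plain R², but it can drop below 0 when a weak fit is combined with many features and few instances.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n instances and p features (requires n > p + 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

good_fit = adjusted_r2(0.90, n=100, p=5)   # stays close to the plain R-squared
weak_fit = adjusted_r2(0.10, n=10, p=5)    # weak fit, many features, few instances
```

In this example `good_fit` is slightly below 0.90 and `weak_fit` is negative, so the adjusted measure is bounded above by 1 but not below by 0.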
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/limo.html
christophm.github.io

2.4 Evaluation of Interpretability | Interpretable Machine Learning

2
1. siva82kb 08 Jul 2021
  
  in Public
  
  2.4 Evaluation of Interpretability
  
  I do not follow this section.
2. siva82kb 08 Jul 2021
  
  in Public
  
  Application level evaluation (real task)
  
  I do not understand this. Where is the interpretability here? The software could perform as well as the radiologist, but we might still have no idea how it is doing it.
Visit annotations in context

Annotators

siva82kb

URL

christophm.github.io/interpretable-ml-book/evaluation-of-interpretability.html
May 2021
www.gammon.com.au

Gammon Forum : Electronics : Microprocessors : Interrupts

1
1. siva82kb 18 May 2021
  
  in Public
  
  This article discusses interrupts on the Arduino Uno (Atmega328) and similar processors, using the Arduino IDE. The concepts however are very general. The code examples provided should compile on the Arduino IDE (Integrated Development Environment).
  
  This is such a great resource!
  
  embedded; arduinio;
Visit annotations in context

Tags

embedded; arduinio;

Annotators

siva82kb

URL

gammon.com.au/interrupts
Apr 2021
qz.com

The idea that everything from spoons to stones is conscious is gaining academic credibility

1
1. siva82kb 21 Apr 2021
  
  in Public
  
  It’s very hard to get consciousness out of non-consciousness
  
  If particles can come in and go out of existence from nothing, why can't consciousness?
Visit annotations in context

Annotators

siva82kb

URL

qz.com/1184574/the-idea-that-everything-from-spoons-to-stones-are-conscious-is-gaining-academic-credibility/
Mar 2021
www.scholarpedia.org

Muscle Physiology and Modeling

4
1. siva82kb 17 Mar 2021
  
  in Public
  
  low firing rates,
  
  What is low firing rate?
2. siva82kb 17 Mar 2021
  
  in Public
  
  fenvi=fmin+⎛⎝1−UthMUifmax−fmin⎞⎠∗U
  
  I do not follow this. Does this mean the neuron is always firing at f_min? How does f_0.5 come into the picture?
3. siva82kb 17 Mar 2021
  
  in Public
  
  model is an assembly of phenomenological models,
  
  Not sure how this avoids the issues of over-fitting.
4. siva82kb 17 Mar 2021
  
  in Public
  
  All of the muscle fibers in all of the motor units of a given muscle tend to move together, experiencing the same sarcomere lengths and velocities
  
  How do we know this? What about motor units that are not activated? What about motor units that are activated with different time delays and different rates?
Visit annotations in context

Annotators

siva82kb

URL

scholarpedia.org/article/Muscle_Physiology_and_Modeling
Jul 2020
en.wikipedia.org

Intention-to-treat analysis - Wikipedia

2
1. siva82kb 17 Jul 2020
  
  in Public
  
  effects of crossover and dropout
  
  I understand dropout, but how does crossover have an effect?
2. siva82kb 17 Jul 2020
  
  in Public
  
  Randomized clinical trials analyzed by the intention-to-treat (ITT) approach provide unbiased comparisons among the treatment groups
  
  What is the proof for this? Is there a statistical proof?
Visit annotations in context

Annotators

siva82kb

URL

en.wikipedia.org/wiki/Intention-to-treat_analysis
Jan 2020
rdgao.github.io

The (Epi)Phenomenal Oscillation, Spike, and LFP

2
1. siva82kb 01 Jan 2020
  
  in Public
  
  Additionally, one of my all time favorite papers (Fröhlich & McCormick, 2010) showed that an applied external field can entrain spiking in ferret cortical slice, parametrically to the oscillatory field frequency.
  
  What are the sources of LFP? If it's only the current induced by EPSPs and IPSPs, then it is not clear that entrainment through an external field makes a strong argument. If LFPs can be modified by other factors, as pointed out (Ca++ currents and glial cell currents), then it is possible that these are not epiphenomenal. But we still need to show that they actually play a causal role in the observed system output.
2. siva82kb 01 Jan 2020
  
  in Public
  
  But spikes do not compute! The cells “compute”, dendrites “compute”, the axon hillock “computes”. In that sense, spikes are epiphenomenal: they are the secondary consequences of dendritic computation, of which you can fully infer by knowing the incoming synaptic inputs and biophysical properties of the neuron.
  
  This is correct. But in that case, LFPs and any other oscillations that we are recording are also epiphenomenal.
Visit annotations in context

Annotators

siva82kb

URL

rdgao.github.io/epiphenomenal-oscillations/
Dec 2019
link.springer.com

Which Reference Should We Use for EEG and ERP practice?

2
1. siva82kb 25 Dec 2019
  
  in Public
  
  estimators is the prior covariance $\Sigma_{\phi\phi}$
  
  How do we know this covariance?
2. siva82kb 25 Dec 2019
  
  in Public
  
  If the sensor noise has an independent identical distribution (IID) across channels, the covariance of the sensor noise in the referenced data will be $\Sigma_{\varepsilon_r \varepsilon_r} = \sigma^2 T_r T_r^T$
  
  I do not understand this.
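A possible reading of the noise-covariance statement: if re-referencing is a linear map, so the referenced noise is T_r times the raw noise, and the raw noise has covariance σ²I, then the referenced noise has covariance T_r (σ²I) T_r^T = σ² T_r T_r^T. The sketch below checks this for the average reference, which is an assumed example of T_r; the channel count and variance are made up.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def average_reference_matrix(n_channels):
    """T_r for the average reference: identity minus the channel mean (assumed example)."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n_channels for j in range(n_channels)]
            for i in range(n_channels)]

n_channels = 4
sigma2 = 2.0  # variance of the IID sensor noise, so its covariance is sigma2 * I
t_r = average_reference_matrix(n_channels)
noise_cov_referenced = [[sigma2 * v for v in row] for row in matmul(t_r, transpose(t_r))]
```

With these numbers the diagonal entries come out as 1.5 and the off-diagonal entries as -0.5, so noise that was independent across channels becomes correlated after re-referencing.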
Visit annotations in context

Annotators

siva82kb

URL

link.springer.com/article/10.1007/s10548-019-00707-x
Mar 2019
docs.microsoft.com

Managing the Lifetime of an Object - Windows applications

1
1. siva82kb 15 Mar 2019
  
  in Public
  
  When the number of references drops to zero, the object deletes itself. The last part is worth repeating: The object deletes itself
  
  How does that work? How does an object delete itself?
Visit annotations in context

Annotators

siva82kb

URL

docs.microsoft.com/en-us/windows/desktop/learnwin32/managing-the-lifetime-of-an-object
docs.microsoft.com

Writing the Window Procedure - Windows applications

1
1. siva82kb 14 Mar 2019
  
  in Public
  
  CALLBACK is the calling convention for the function
  
  What is a calling convention?
Visit annotations in context

Annotators

siva82kb

URL

docs.microsoft.com/en-us/windows/desktop/learnwin32/writing-the-window-procedure
Feb 2019
colah.github.io

Calculus on Computational Graphs: Backpropagation -- colah's blog

1
1. siva82kb 05 Feb 2019
  
  in Public
  
  Calculus on Computational Graphs: Backpropagation
  
  Good article on computational graphs and their role in back propagation.
Visit annotations in context

Annotators

siva82kb

URL

colah.github.io/posts/2015-08-Backprop/
stats.stackexchange.com

Batch gradient descent versus stochastic gradient descent

1
1. siva82kb 05 Feb 2019
  
  in Public
  
  One benefit of SGD is that it's computationally a whole lot faster. Large datasets often can't be held in RAM, which makes vectorization much less efficient. Rather, each sample or batch of samples must be loaded, worked with, the results stored, and so on. Minibatch SGD, on the other hand, is usually intentionally made small enough to be computationally tractable. Usually, this computational advantage is leveraged by performing many more iterations of SGD, making many more steps than conventional batch gradient descent. This usually results in a model that is very close to that which would be found via batch gradient descent, or better.
  
  Good explanation of why SGD is computationally better. I was confused about the benefits of repeatedly performing mini-batch GD, and why it might be better than batch GD. But I guess the advantage comes from being able to get better performance by vectorizing computation.
  
  NeuralNetworks ML
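A toy comparison of the point in the quoted answer, on synthetic one-parameter regression data (the data, learning rate, and batch size are all assumptions chosen for illustration). Both methods make the same number of passes over the data, but minibatch SGD takes many small steps per pass while batch gradient descent takes one.

```python
import random

rng = random.Random(0)
# Synthetic 1-D regression data: y = 3x + small noise.
data = [(x, 3.0 * x + rng.gauss(0, 0.1)) for x in (rng.uniform(-1, 1) for _ in range(1000))]

def gradient(w, batch):
    """Gradient of the mean squared error of y ~ w * x over the given batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def batch_gd(epochs, lr=0.1):
    w = 0.0
    for _ in range(epochs):
        w -= lr * gradient(w, data)          # one update per pass over all data
    return w

def minibatch_sgd(epochs, batch_size=20, lr=0.1):
    w = 0.0
    shuffled = data[:]
    for _ in range(epochs):
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            # 50 updates per pass, each computed from only 20 samples
            w -= lr * gradient(w, shuffled[start:start + batch_size])
    return w

w_batch = batch_gd(epochs=3)        # 3 updates in total
w_sgd = minibatch_sgd(epochs=3)     # 150 updates in total
```

After three passes the minibatch estimate is close to the true slope of 3, while the batch estimate has only taken three steps away from its starting value of 0.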
Visit annotations in context

Tags

NeuralNetworks

ML

Annotators

siva82kb

URL

stats.stackexchange.com/questions/49528/batch-gradient-descent-versus-stochastic-gradient-descent
neuralnetworksanddeeplearning.com

Neural Networks and Deep Learning

6
1. siva82kb 05 Feb 2019
  
  in Public
  
  And so it makes most sense to regard epoch 280 as the point beyond which overfitting is dominating learning in our neural network.
  
  I do not get this. Epoch 15 indicates that we are already over-fitting to the training data set, no? Assuming both training and test set come from the same population that we are trying to learn from.
  
  NeuralNetworks mathematics ML questions
2. siva82kb 05 Feb 2019
  
  in Public
  
  If we see that the accuracy on the test data is no longer improving, then we should stop training
  
  This contradicts the earlier statement about epoch 280 being the point where there is over-training.
  
  NeuralNetworks mathematics ML questions
3. siva82kb 05 Feb 2019
  
  in Public
  
  It might be that accuracy on the test data and the training data both stop improving at the same time
  
  Can this happen? Can the accuracy on the training data set ever increase with the training epoch?
  
  NeuralNetworks mathematics ML questions
4. siva82kb 05 Feb 2019
  
  in Public
  
  test data
  
  Shouldn't this be "training data"?
5. siva82kb 05 Feb 2019
  
  in Public
  
  What is the limiting value for the output activations $a^L_j$
  
  When c is large, small differences in z_j^L are magnified and the function jumps between 0 and 1, depending on the sign of the differences. On the other hand, when c is very small, all activation values will be close to 1/N; where N is the number of neurons in layer L.
  
  NeuralNetworks mathematics ML
6. siva82kb 05 Feb 2019
  
  in Public
  
  $z^L_j = \ln a^L_j + C$
  
  How can the constant C be independent of j? It will have an e^{z_j^L} term in it. This is not correct.
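A small numerical check of the last two softmax notes, using made-up weighted inputs. It scales the inputs by c to look at the large-c and small-c limits, and then computes z_j - ln(a_j) for every j with c = 1. Taking logs of the softmax gives ln a_j = z_j - ln Σ_k e^{z_k}, so the constant would be C = ln Σ_k e^{z_k}: the sum does contain e^{z_j}, but because it runs over every k it has the same value for each j.

```python
import math

def softmax(z, c=1.0):
    """Softmax of c * z, shifted by the maximum for numerical stability."""
    m = max(c * v for v in z)
    exps = [math.exp(c * v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [1.0, 2.0, 4.0]           # hypothetical weighted inputs z_j^L, so N = 3

sharp = softmax(z, c=50.0)    # large c: nearly all mass on the largest z_j
flat = softmax(z, c=1e-6)     # c near 0: every activation is close to 1/N

# With c = 1, z_j - ln(a_j) is the same number for every j ...
a = softmax(z)
constants = [zj - math.log(aj) for zj, aj in zip(z, a)]
# ... namely the log of the normalising sum, which runs over all k.
log_sum = math.log(sum(math.exp(v) for v in z))
```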
Visit annotations in context

Tags

mathematics

ML

questions

NeuralNetworks

Annotators

siva82kb

URL

neuralnetworksanddeeplearning.com/chap3.html
