7 Matching Annotations
  1. Feb 2017
    1. start with a very low learning rate to avoid jumping to an undesirable minimum, and then increase once we're no longer at risk of getting stuck there

      ?

    1. many recent papers use vanilla SGD without momentum and a simple learning rate annealing schedule.

      So does annealing still help with Adam and other adaptive learning-rate methods?
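
      A minimal sketch of what such a schedule could look like, combining the warmup idea above with step annealing (plain Python; every hyperparameter value here is an assumption, not from the article). Annealing is orthogonal to the optimizer, so the same schedule can sit on top of vanilla SGD or an adaptive method like Adam.

      ```python
      def lr_at_step(step, base_lr=0.1, warmup_steps=500,
                     decay_every=10_000, decay_factor=0.5):
          """Hypothetical schedule: linear warmup to base_lr, then step annealing."""
          if step < warmup_steps:
              # start very low and ramp up, to avoid jumping to an undesirable minimum early on
              return base_lr * (step + 1) / warmup_steps
          # after warmup, multiply the rate by decay_factor every decay_every steps
          n_decays = (step - warmup_steps) // decay_every
          return base_lr * decay_factor ** n_decays

      # e.g. a vanilla SGD update at step t: w = w - lr_at_step(t) * grad
      ```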

    2. performs redundant computations for large datasets, as it recomputes gradients for similar examples before each parameter update.

      Could not follow the redundancy claim here
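
      My attempt at unpacking the redundancy claim with a toy example (made-up data and step sizes): when many training examples are near-duplicates, batch gradient descent spends a full pass summing nearly identical gradients before making a single update, while SGD gets one update out of every example for the same amount of computation.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      X = np.repeat(rng.normal(size=(10, 3)), 100, axis=0)  # 1000 rows, mostly duplicates
      y = X @ np.array([1.0, -2.0, 0.5])

      def grad(w, Xb, yb):
          # gradient of the mean squared error of a linear model
          return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

      # Batch GD: one update costs a pass over all 1000 (largely redundant) examples
      w_batch = np.zeros(3) - 0.1 * grad(np.zeros(3), X, y)

      # SGD: the same pass yields 1000 updates, one per example
      w_sgd = np.zeros(3)
      for i in range(len(X)):
          w_sgd -= 0.01 * grad(w_sgd, X[i:i+1], y[i:i+1])
      ```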

    3. Batch gradient descent is guaranteed to converge to the global minimum for convex error surfaces and to a local minimum for non-convex surfaces.

      How?
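
      One standard way to see it (my sketch, not from the article), assuming the loss $f$ is $L$-smooth and the step size satisfies $\eta \le 1/L$: the descent lemma gives

      $$ f(w_{t+1}) \;\le\; f(w_t) - \tfrac{\eta}{2}\,\lVert \nabla f(w_t) \rVert^2, $$

      so every full-batch update decreases the loss until the gradient vanishes. For convex $f$ every stationary point is a global minimum, which gives the first claim; without convexity the same argument only reaches a stationary point, i.e. at best a local minimum (possibly just a saddle point).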

  2. Jan 2017
    1. Due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides “fit”). In case of FC layers, this still holds true because FC layers can be converted to a Convolutional Layer: For example, in an AlexNet, the final pooling volume before the first FC layer is of size [6x6x512]. Therefore, the FC layer looking at this volume is equivalent to having a Convolutional Layer that has receptive field size 6x6, and is applied with padding of 0.

      Not sure how FC layers can be made size-agnostic?
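
      A small PyTorch sketch of the conversion the quote describes (the 4096-unit FC size is my own assumption; the spatial shapes follow the [6x6x512] example). Reshaping the FC weight matrix into a 6x6 kernel gives the same outputs on the original input size, and the conv version also slides over larger inputs.

      ```python
      import torch
      import torch.nn as nn

      # FC layer that expects a flattened [512 x 6 x 6] volume (4096 units assumed)
      fc = nn.Linear(512 * 6 * 6, 4096)

      # Equivalent conv layer: receptive field 6x6, padding 0, 4096 output channels
      conv = nn.Conv2d(512, 4096, kernel_size=6, padding=0)
      with torch.no_grad():
          conv.weight.copy_(fc.weight.view(4096, 512, 6, 6))  # same weights, reshaped
          conv.bias.copy_(fc.bias)

      x = torch.randn(1, 512, 6, 6)
      out_fc = fc(x.flatten(1))        # shape [1, 4096]
      out_conv = conv(x).flatten(1)    # shape [1, 4096], numerically the same
      print(torch.allclose(out_fc, out_conv, atol=1e-4))  # True

      # On a larger input the conv version simply slides, giving a spatial map of "FC" outputs
      print(conv(torch.randn(1, 512, 8, 8)).shape)  # torch.Size([1, 4096, 3, 3])
      ```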

    1. Tmux notes:

      • Terminal Multiplexer
      • Multiple panes
      • Attach and detach
      • Share sessions

      Steps:

      • Install tmux on the server
      • Start tmux: tmux
      • Run tmux commands => Ctrl+b, then <command>
        • c => new window
        • , => rename window
        • n, p => next/previous window
        • w => list windows
        • % => split vertical
        • " => horizantal
        • o => switch between panes
        • space => rearrange panes (gives different layouts)
  3. Dec 2016
    1. Key points:

      1. Scale of data is especially beneficial for large NNs
      2. Having a combination of HPC and AI skills is important for optimal impact (handling scale challenges and bigger/more complex NNs)
      3. Most of the value right now comes from CNNs, FCs, RNNs. Unsupervised learning, GANs, and others might be the future, but they are research topics right now.
      4. E2E DL might be relevant for some cases in the future, like speech -> transcript, image -> caption, text -> image
      5. Self-driving cars might also move to E2E, but no one has enough image -> steering data yet

      Workflow:

      1. Bias = training error - human error. Try a bigger model, training longer, or a new model architecture.
      2. Variance = dev error - training error. Try more data, regularization, or a new model architecture.
      3. The conflict between bias and variance is weaker in DL: we can have a bigger model with more data. (See the sketch after this list.)
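
      A toy decision helper for the split above (the error values and thresholding rule are my own illustration, not from the talk):

      ```python
      def diagnose(human_err, train_err, dev_err):
          """Compare avoidable bias against the generalization gap."""
          bias = train_err - human_err      # avoidable bias
          variance = dev_err - train_err    # generalization gap
          if bias > variance:
              return "high bias: try a bigger model, training longer, or a new architecture"
          return "high variance: try more data, regularization, or a new architecture"

      print(diagnose(human_err=0.01, train_err=0.08, dev_err=0.10))  # high bias
      print(diagnose(human_err=0.01, train_err=0.02, dev_err=0.12))  # high variance
      ```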

      More data:

      1. Data synthesis/augmentation is becoming useful and popular: OCR (superimpose characters on various backgrounds), speech (superimpose various background noises), NLP (?). But it has drawbacks if the synthesized data is not representative. (See the sketch after this list.)
      2. A unified data warehouse helps leverage data across the company
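
      A minimal sketch of the speech-style augmentation (toy numpy arrays; the mixing level is an arbitrary choice): superimpose a random slice of background noise on a clean waveform to synthesize a new training example.

      ```python
      import numpy as np

      def augment_speech(clean, noise, noise_level=0.1, seed=0):
          """Superimpose a random slice of background noise on a clean waveform."""
          rng = np.random.default_rng(seed)
          start = rng.integers(0, len(noise) - len(clean))
          mixed = clean + noise_level * noise[start:start + len(clean)]
          return np.clip(mixed, -1.0, 1.0)

      # toy data: 1 s of "speech" and 10 s of background noise at 16 kHz
      clean = np.sin(np.linspace(0.0, 2 * np.pi * 440, 16_000))
      noise = np.random.default_rng(1).normal(0.0, 0.3, 160_000)
      augmented = augment_speech(clean, noise)
      ```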

      Data set breakdown:

      1. Dev and test sets should come from the same distribution, since we spend a lot of time optimizing for dev accuracy. (See the sketch below.)
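
      One simple way to enforce this (my own splitting sketch, not from the talk): carve dev and test out of a single shuffled pool so that both share a distribution by construction.

      ```python
      import random

      def dev_test_split(examples, dev_frac=0.5, seed=0):
          """Split one pool into dev/test so both come from the same distribution."""
          pool = list(examples)
          random.Random(seed).shuffle(pool)
          cut = int(len(pool) * dev_frac)
          return pool[:cut], pool[cut:]

      dev, test = dev_test_split(range(1000))
      ```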

      Progress plateaus above human-level performance:

      • But there is still a theoretical optimal error rate (the Bayes rate)

      What to do when bias is high:

      • Look at examples the machine got wrong
      • Get labels from humans?
      • Error analysis: segment the training set and identify segments where training error is higher than human error (see the sketch after this list)
      • Estimate bias/variance effect?
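
      A toy sketch of the error-analysis step (segment labels, error counts, and human error rates are all hypothetical): group training predictions by segment and flag segments whose training error exceeds the human error.

      ```python
      from collections import Counter

      # hypothetical (segment, was_correct) records for training-set predictions
      records = [("blurry", False), ("blurry", False), ("blurry", True),
                 ("clear", True), ("clear", True), ("clear", False)]
      human_error = {"blurry": 0.10, "clear": 0.01}  # assumed per-segment human error

      totals, errors = Counter(), Counter()
      for segment, was_correct in records:
          totals[segment] += 1
          errors[segment] += (not was_correct)

      for segment in totals:
          train_err = errors[segment] / totals[segment]
          if train_err > human_error[segment]:
              print(f"{segment}: training error {train_err:.2f} vs human {human_error[segment]:.2f}")
      ```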

      How do you define human-level performance? Example: the error of a panel of experts

      Size of data:

      1. How do you define an NN as small vs. medium vs. large?
      2. Is the reason large NNs can leverage bigger data that it would not cause overfitting, unlike on smaller NNs?