- Mar 2015
www.youtube.com
Too long; didn't watch: The talk starts with an introduction to reinforcement learning. From 20:30 he formalises the problem and, in the last ten minutes (from 45:00), derives how to compute policy gradients using backpropagation, supposedly the same method DeepMind used to learn to play arcade games.
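For readers who want a feel for what "policy gradients" means before watching, here is a minimal sketch of the score-function (REINFORCE) update on a toy two-armed bandit. It is not taken from the talk: the function name `train_bandit`, the reward probabilities, the learning rate, and the running-average baseline are all illustrative assumptions. The policy is a plain softmax over two logits, so the gradient of the log-probability is written out by hand; in a neural-network policy, backpropagation would compute that same quantity.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def train_bandit(reward_probs, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a bandit (hypothetical toy setup, not from the talk).

    reward_probs[i] is the assumed chance that arm i pays a reward of 1.
    Returns the learned action probabilities.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(reward_probs))  # policy logits
    baseline = 0.0                       # running average of rewards

    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choice(len(probs), p=probs)
        reward = float(rng.random() < reward_probs[action])

        # Gradient of log pi(action) w.r.t. the logits of a softmax
        # policy is one_hot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0

        # Score-function update: push up the log-probability of the
        # sampled action in proportion to (reward - baseline).
        theta += lr * (reward - baseline) * grad_log_pi
        baseline += 0.01 * (reward - baseline)

    return softmax(theta)

if __name__ == "__main__":
    print(train_bandit([0.2, 0.8]))
```

With these assumed settings the policy should end up favouring the second arm, since it pays out more often. The baseline only reduces the variance of the update; it does not change its expected direction.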