  1. Mar 2015
    1. Too long; didn't watch: Starts with an introduction to reinforcement learning. From 20:30 he starts formalising the problem, and in the last ten minutes (from 45:00) he derives how to compute policy gradients using backprop, supposedly the same method DeepMind used to learn to play arcade games.