7 Matching Annotations
 Dec 2017

medium.com

Each episode/game is relatively short, of approximately 200 actions
It's a show action.

 Jun 2017

www.alexirpan.com

What ideas from this work are applicable to actor-critic RL? At a first glance, I’m now very interested in investigating the magnitude of the actor gradients. If they tend to be very large or very small, we may have a similar saturation problem, and adding a Lipschitz bound through weight clamping could help.
Good question.

The weights $w$ are constrained to lie within $[-c, c]$, by clipping $w$ after every update to $w$.
Tanh and sigmoid are allowed, but exp is not. The nonlinear function itself should be K-Lipschitz.
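A minimal sketch of the two points above, in plain Python. The clipping constant `c = 0.01` and the helper names `clip_weights` and `max_slope` are assumptions made for illustration, not taken from the annotated post. The first helper clamps weights into $[-c, c]$; the second estimates the steepest slope of an activation on a grid, which hints at why tanh and sigmoid are K-Lipschitz while exp is not.

```python
import math

def clip_weights(weights, c):
    """Clamp every weight into [-c, c], as one would after each update."""
    return [max(-c, min(c, w)) for w in weights]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def max_slope(f, lo, hi, steps=1000):
    """Largest finite-difference slope of f on an even grid over [lo, hi]."""
    h = (hi - lo) / steps
    return max(abs(f(lo + (i + 1) * h) - f(lo + i * h)) / h for i in range(steps))

c = 0.01  # hypothetical clipping constant
clipped = clip_weights([0.5, -0.02, 0.004, -1.7], c)

# tanh and sigmoid have bounded slopes (at most 1 and 0.25); exp does not:
# its steepest slope keeps growing as the interval widens.
slopes = {
    "tanh": max_slope(math.tanh, -10.0, 10.0),
    "sigmoid": max_slope(sigmoid, -10.0, 10.0),
    "exp": max_slope(math.exp, -10.0, 10.0),
}
```

The grid check is only a numerical hint, not a proof: a bounded derivative is what gives the Lipschitz constant.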

Directly learn the probability density function $P_\theta$. Meaning, $P_\theta$ is some differentiable function such that $P_\theta(x) \ge 0$ and $\int_x P_\theta(x)\, dx = 1$. We optimize $P_\theta$ through maximum likelihood estimation
It's more like a classification model.
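As a rough illustration of maximum likelihood estimation for a density $P_\theta$, the sketch below assumes a one-dimensional Gaussian with $\theta = (\mu, \sigma)$, for which the maximizer has a closed form. The Gaussian family, the toy data, and the function names are assumptions for the example only.

```python
import math

def gaussian_log_pdf(x, mu, sigma):
    """Log of the Gaussian density P_theta(x) with theta = (mu, sigma)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def avg_log_likelihood(data, mu, sigma):
    """Average log-likelihood of the data under P_theta."""
    return sum(gaussian_log_pdf(x, mu, sigma) for x in data) / len(data)

def fit_gaussian_mle(data):
    """Closed-form maximum likelihood estimate: sample mean and (biased) std."""
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return mu, math.sqrt(var)

data = [1.2, 0.7, 1.9, 1.4, 0.8]  # toy samples, made up for the example
mu_hat, sigma_hat = fit_gaussian_mle(data)
best = avg_log_likelihood(data, mu_hat, sigma_hat)
```

Any other choice of $\mu$ or $\sigma$ should give an average log-likelihood no higher than `best`, which is what "optimizing through maximum likelihood" means here.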

$KL(P_r \| P_\theta)$.
Code $P_r$ with $P_\theta$
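One way to read the note above: $KL(P_r \| P_\theta)$ is the extra code length paid when samples from $P_r$ are encoded with a code built for $P_\theta$. The sketch below assumes small discrete distributions, made up for the example, and that $P_\theta(x) > 0$ wherever $P_r(x) > 0$.

```python
import math

def entropy(p):
    """Optimal average code length (bits) for samples from p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average code length (bits) when samples from p use a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_r = [0.5, 0.25, 0.25]      # toy "real" distribution
p_theta = [0.25, 0.25, 0.5]  # toy model distribution

extra_bits = cross_entropy(p_r, p_theta) - entropy(p_r)
kl = kl_divergence(p_r, p_theta)  # 0.25 bits, equal to extra_bits
```

Swapping the arguments generally gives a different value, since the KL divergence is not symmetric.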


offconvex.github.io

Trust region algorithms
Also see TRPO

One explanation of nonconvex optimization.
