7 Matching Annotations
1. Dec 2017
2. medium.com
1. Each episode/game is relatively short, of approximately 200 actions

It's a show action.

3. Jun 2017
4. www.alexirpan.com
1. What ideas from this work are applicable to actor-critic RL? At a first glance, I’m now very interested in investigating the magnitude of the actor gradients. If they tend to be very large or very small, we may have a similar saturation problem, and adding a Lipschitz bound through weight clamping could help.

Good question.

2. The weights $w$ are constrained to lie within $[-c, c]$, by clipping $w$ after every update to $w$.

Tanh and sigmoid are allowed, but exp is not: the nonlinearity itself should be K-Lipschitz.
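
A minimal sketch of the two points above, in plain Python. The names `clip_weights`, `sgd_step_with_clipping` and `max_slope`, the plain gradient step, and the values of `lr` and `c` are all assumptions for illustration, not taken from the highlighted post. The first part clamps weights into [-c, c] after an update; the second estimates the largest slope of a nonlinearity by finite differences, which stays at most 1 for tanh but keeps growing for exp as the interval widens.

```python
import math


def clip_weights(weights, c):
    """Clamp every weight into [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


def sgd_step_with_clipping(weights, grads, lr, c):
    """One plain gradient step followed by clipping (optimizer choice is an assumption)."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return clip_weights(updated, c)


def max_slope(f, lo, hi, steps=1000):
    """Largest finite-difference slope of f on [lo, hi]: a rough Lipschitz estimate."""
    h = (hi - lo) / steps
    return max(abs(f(lo + (i + 1) * h) - f(lo + i * h)) / h for i in range(steps))


# Hypothetical weights and gradients, chosen only to exercise the clipping.
clipped = sgd_step_with_clipping([0.5, -0.02, 0.009], [-10.0, 4.0, 0.1], lr=0.1, c=0.01)

tanh_slope = max_slope(math.tanh, -5.0, 5.0)        # bounded by 1
exp_slope_small = max_slope(math.exp, -5.0, 5.0)    # already large
exp_slope_large = max_slope(math.exp, -10.0, 10.0)  # larger still: no global bound
```

Clipping only bounds the weights; the overall Lipschitz constant of the network also depends on every nonlinearity having a bounded slope, which is why exp is ruled out.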

3. Directly learn the probability density function $P_\theta$. Meaning, $P_\theta$ is some differentiable function such that $P_\theta(x) \ge 0$ and $\int_x P_\theta(x)\, dx = 1$. We optimize $P_\theta$ through maximum likelihood estimation

It's more like a classification model.
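
To make "optimize the density through maximum likelihood" concrete, here is a small sketch that assumes a one-dimensional Gaussian family for the density (my choice for illustration; the highlighted text does not fix a family). The helper names and the toy data are hypothetical. For a Gaussian the maximum likelihood estimate has a closed form: the sample mean and the biased sample standard deviation.

```python
import math


def gaussian_log_density(x, mu, sigma):
    """log N(x; mu, sigma^2): a valid density, non-negative and integrating to 1."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def avg_log_likelihood(data, mu, sigma):
    """Average log-likelihood of the data under the Gaussian with parameters (mu, sigma)."""
    return sum(gaussian_log_density(x, mu, sigma) for x in data) / len(data)


def fit_gaussian_mle(data):
    """Closed-form maximum likelihood estimate: sample mean and biased std."""
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return mu, math.sqrt(var)


data = [1.0, 2.0, 2.5, 3.0, 4.5]  # toy samples, made up for the example
mu_hat, sigma_hat = fit_gaussian_mle(data)
best = avg_log_likelihood(data, mu_hat, sigma_hat)
```

Any other parameter setting, for example `avg_log_likelihood(data, mu_hat + 0.5, sigma_hat)`, gives a lower average log-likelihood than `best`.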

4. $KL(P_r \| P_\theta)$.

Code $P_r$ with $P_\theta$: the extra cost of coding samples from $P_r$ with a code optimized for $P_\theta$.
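
The coding reading of KL can be checked on a small discrete example. This is a sketch with made-up distributions `p_r` and `p_theta`. It computes the entropy of P_r, the cross-entropy of P_r under P_theta, and their difference, which equals the KL divergence in bits.

```python
import math


def entropy(p):
    """Average code length (bits) for samples from p with a code optimal for p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Average code length (bits) for samples from p with a code optimal for q.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) in bits, under the same support assumption."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p_r = [0.5, 0.25, 0.25]      # hypothetical "real" distribution
p_theta = [0.25, 0.25, 0.5]  # hypothetical model distribution

extra_bits = cross_entropy(p_r, p_theta) - entropy(p_r)
kl = kl_divergence(p_r, p_theta)
```

Here the entropy is 1.5 bits and the cross-entropy is 1.75 bits, so coding P_r with P_theta costs 0.25 extra bits per sample, which is exactly `kl`.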

5. offconvex.github.io
1. Trust region algorithms

Also see TRPO

2. One explanation of non-convex optimization