Hypothesis

1 Matching Annotations

Jul 2023
arxiv.org

1707.06347.pdf

1. mark.crowley 10 Jul 2023
  
  in Public
  
  This is the paper that introduced the Proximal Policy Optimization (PPO) algorithm. PPO is, in a way, a response to Trust Region Policy Optimization (TRPO): it keeps TRPO's core idea but implements it as a simpler, more efficient algorithm.
  
  TRPO frames the problem as a straight optimization problem; no learning is actually involved.
  
  Tags: ppo, reinforcement-learning, policy-gradients, trpo
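
To make the note above concrete, here is a minimal sketch of the clipped surrogate objective that the paper proposes. TRPO keeps each policy update close to the old policy through an explicit KL-divergence constraint; PPO approximates that behaviour by clipping the probability ratio between the new and old policies. The function name, argument names, and the use of plain Python lists are assumptions made for illustration, not code from the paper, and a real implementation would use an autodiff library so the objective can be maximized by gradient ascent.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    new and old policies. advantages: advantage estimates for those actions.
    All names here are illustrative assumptions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # The ratio is e (about 2.718) for both samples, well outside the clip range.
    print(ppo_clip_objective([1.0, 1.0], [0.0, 0.0], [1.0, -1.0]))
```

With a positive advantage, the gain from raising an action's probability is capped at 1 + clip_eps. With a negative advantage, the unclipped (larger) penalty is kept. This asymmetry removes the incentive to move far from the old policy without solving a constrained problem at every step.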

URL

arxiv.org/pdf/1707.06347