- Jul 2024
-
en.wikipedia.org en.wikipedia.org
-
Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT
The UCB algorithm for bandits comes back again as UCT to form the basis for model estimation via MCTS
-
The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations.
Tree search makes this tradeoff very clear, how many paths will you explore before you stop and use the knowledge you already have?
-
-
-
The summary paper for AlphaGo.
-
-
-
A realy nice visual history of the development of Deep Learning, the cornerstone of modern AI and ML.
-
-
en.wikipedia.org en.wikipedia.org
-
An illustration of alpha–beta pruning. The grayed-out subtrees don't need to be explored (when moves are evaluated from left to right), since it is known that the group of subtrees as a whole yields the value of an equivalent subtree or worse, and as such cannot influence the final result. The max and min levels represent the turn of the player and the adversary, respectively.
Alpha-Beta pruning comes down to being smart about searching the tree of possible future game states to be more efficient about rollouts.
Tags
Annotators
URL
-
-
en.wikipedia.org en.wikipedia.org
-
For example, the chess computer Deep Blue (the first one to beat a reigning world champion, Garry Kasparov at that time) looked ahead at least 12 plies, then applied a heuristic evaluation function.[6]
Deep Blue used a kind of minimax algorithm to beat Garry Kasparov at chess, 12 step lookehead.
-
-
en.wikipedia.org en.wikipedia.org
-
Wikipedia: AlphaZero
-
-
arxiv.org arxiv.org
-
2024 paper arguing that other methods beyond PPO could be better for "value alignment" of LLMs
Tags
Annotators
URL
-
-
arxiv.org arxiv.org
-
Paper "Deep Reinforcement Learning that Matters" on evaluating RL algorithms.
Tags
Annotators
URL
-
- Oct 2023
-
arxiv.org arxiv.org
-
[Kapturowski, DeepMind, Sep 2022] "Human-level Atari 200x Faster"
Improving the 2020 Agent57 performance to be more efficeint.
-