It's a little disingenuous to say that the 4000x speedup is due to Jax. I'm a huge Jax fanboy (one of the biggest) but the speedup here is thanks to running the simulation environment on a GPU. But as much as I love Jax, it's still extraordinarily difficult to implement even simple environments purely on a GPU.
My long-term ambition is to replicate OpenAI's Dota 2 reinforcement learning work, since it's one of the most impactful (or at least most entertaining) uses of RL. It would be more or less impossible to translate the game logic into Jax, short of transpiling C++ to Jax somehow. Which isn't a bad idea – someone should make that.
It should also be noted that there's a long history of RL being done on accelerators. AlphaZero's chess evaluations ran entirely on TPUs. PyTorch CUDA graphs also make this kind of thing easier to implement nowadays, since (again, as much as I love Jax) some PyTorch constructs are simply easier to use than rewriting everything in a functional style.
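For anyone who hasn't seen them, the capture-and-replay pattern for PyTorch CUDA graphs looks roughly like this – a minimal sketch with a toy stand-in model, not tied to any particular RL codebase:

```python
import torch

# A tiny stand-in model; any fixed-shape forward pass is captured the same way.
model = torch.nn.Linear(64, 64).cuda()
static_input = torch.zeros(8, 64, device="cuda")

# Warm up on a side stream before capture, as the CUDA graphs API requires.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        static_output = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass into a graph...
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# ...then replay it on new data by copying into the captured input buffer.
static_input.copy_(torch.randn(8, 64, device="cuda"))
g.replay()
print(static_output.shape)
```

The whole captured region replays as a single launch, which is exactly the regime where small RL networks spend most of their time on Python and kernel-launch overhead.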
All that said, you should really try out Jax. Being able to take gradients of any arbitrary function is just amazing, and you have complete control over what's JIT'ed into a GPU graph and what's not. It's a wonderful feeling compared to PyTorch's accursed .backward() accumulation scheme.
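If you haven't tried it, the whole grad-plus-jit workflow is only a few lines – a minimal sketch with made-up shapes, just to show the shape of the API:

```python
import jax
import jax.numpy as jnp

# Any plain Python function of arrays can be differentiated directly.
def loss(params, x, y):
    pred = jnp.dot(x, params["w"]) + params["b"]
    return jnp.mean((pred - y) ** 2)

# jax.grad returns a new function computing d(loss)/d(params).
grad_fn = jax.grad(loss)

# You decide explicitly what gets compiled into one fused graph.
@jax.jit
def update(params, x, y, lr=1e-2):
    grads = grad_fn(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

params = {"w": jnp.zeros((3,)), "b": jnp.zeros(())}
params = update(params, jnp.ones((8, 3)), jnp.ones((8,)))
```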
Can't wait for a framework that feels closer to pure arbitrary Python. Maybe AI can figure out how to do it.
Author here! I didn't realize this got posted on HN. While we do indeed get a speedup from putting the environments on the GPU, most of it comes from how easily Jax lets us parallelize RL training.
While there is work on putting RL environments on accelerators, the main speedup from this work comes from also training many RL agents in parallel. This is largely because the neural networks we use in RL are relatively small and thus don't utilize the GPU very efficiently.
While this was always possible to do, Jax makes it way easier because we just need to call `jax.vmap` to get it to work.
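Concretely, the pattern looks something like this – a minimal sketch with hypothetical agent and network sizes rather than our actual training code. A single-agent update becomes a many-agent update by vmapping over a leading "agent" axis in both the parameters and the data:

```python
import jax
import jax.numpy as jnp

# One agent's loss on its own batch (shapes chosen just for illustration).
def loss(params, batch):
    obs, target = batch
    pred = jnp.tanh(obs @ params["w"]) @ params["v"]
    return jnp.mean((pred - target) ** 2)

# A single-agent SGD step...
def train_step(params, batch, lr=1e-3):
    grads = jax.grad(loss)(params, batch)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# ...turned into a many-agent step by vmapping over the leading agent axis,
# then jitting the whole thing so it runs as one program on the GPU.
parallel_step = jax.jit(jax.vmap(train_step))

n_agents, obs_dim, hidden = 128, 4, 32
keys = jax.random.split(jax.random.PRNGKey(0), 2)
params = {
    "w": jax.random.normal(keys[0], (n_agents, obs_dim, hidden)),
    "v": jax.random.normal(keys[1], (n_agents, hidden)),
}
batch = (jnp.ones((n_agents, 64, obs_dim)), jnp.ones((n_agents, 64)))
params = parallel_step(params, batch)
```

Since the per-agent networks are small, stacking many of them along one axis is what finally keeps the GPU busy.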
> Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks. Further details of the training procedure are provided in the Methods.