Reinforcement Learning Implementation Template
This project is meant to provide a quick and easy template to implement or develop a wide variety of RL algorithms. When starting a new RL project, I would always find myself rewriting the same code and having to spend unecessary time debugging code I had already written dozens of times before. This project provides a simple and adjustable implementation of policy gradient in both 1D and 2D environments. Environments are simulated over multiple processes using mpi4py making the process very efficient.
First you need to install a working version of MPI. Many different versions exist, but I personally recommend OpenMPI, Instructions can be found here.
The template has only been tested with Python3.6, so versions 3.6 and above should all work. Other versions may also work, but are not guaranteed. Clone the repository then run a pip install on the
requirements.txt file to get the necessary modules installed.
git clone https://github.com/ejmejm/RL-template.git cd ./RL-template pip install -r requirements.txt
* It is recommended that you use Tensorflow with GPU support for faster training times, but the demos can also be run on just a CPUs.
To run the CartPole demo on 4 cores run
mpiexec -n 4 python3 run_cart_pole.py
To run the Breakout demo on 4 cores run
mpiexec -n 4 python3 run_breakout.py
By changing the
-n flag, you can adjust the number of cores used to run the program.
The implementation is meant to be simple and easily understandable while still being maximally efficient. Simulation of environments is done over multiple processes in parallel using mpi4py. To achieve this, the main network is copied to all processes, and weights are synced between all networks after each training epoch. Below is a list of individual files and corresponding functions.
models.py- Implements a
BaseModelclass that can used as a basis to implement most policy value based RL algorithms and sync weights with MPI. Implements a
BaseModeland implements vanilla policy gradient in a 1D environment with a discrete action-space. Implements a
BaseModeland implements vanilla policy gradient in a 2D environment with a discrete action-space.
utils.py- Defines utility functions for logging, filtering 2D observations, and calculating GAEs + dicounted rewards.
run_breakout.py- Both implement worker functions that define how an environment is simulated. They then run a main loop that consists of running environments in parallel, gathering + formatting the data, using the data to train the main network, and then propagating the weights from the main network to all copied networks.
- Edan Meyer - GitHub Profile
This project is licensed under the MIT License - see the LICENSE.md file for details