Current Setup: DQN and Q-learning are already implemented with discrete control actions.

To be introduced: Policy gradient algorithm with continuous control actions.

Major Changes:
1. Add a continuous-control environment, derived from the DQN environment by changing the action space.
2. Add a brain to the f1rl module to execute the actions.
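
The two changes could look roughly like the sketch below. It is a toy illustration, not the f1rl code: `DiscreteEnv`, `ContinuousEnv`, `GaussianBrain`, the one-step reward, and all hyperparameters are hypothetical stand-ins. The continuous environment subclasses the discrete one and overrides only how an action is interpreted, and the brain is a basic REINFORCE-style policy gradient with a Gaussian policy of fixed standard deviation.

```python
import random


class DiscreteEnv:
    """Hypothetical stand-in for the existing DQN environment."""
    ACTIONS = (-1.0, 0.0, 1.0)  # e.g. steer left, straight, right

    def __init__(self, target=0.5):
        self.target = target  # steering value that earns the best reward
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def _apply(self, steering):
        # Toy reward: higher the closer the steering command is to the target.
        reward = -(steering - self.target) ** 2
        return self.state, reward, True  # one-step episodes

    def step(self, action_index):
        # Discrete control: the agent picks an index into ACTIONS.
        return self._apply(self.ACTIONS[action_index])


class ContinuousEnv(DiscreteEnv):
    """Same dynamics and reward; only the action space changes."""
    LOW, HIGH = -1.0, 1.0

    def step(self, steering):
        # Continuous control: the agent passes a real value, clipped to bounds.
        clipped = max(self.LOW, min(self.HIGH, float(steering)))
        return self._apply(clipped)


class GaussianBrain:
    """Hypothetical policy-gradient agent: Gaussian policy, learned mean."""

    def __init__(self, lr=0.01, sigma=0.2, seed=0):
        self.mu = 0.0          # policy mean (the only learned parameter)
        self.sigma = sigma     # fixed exploration noise
        self.lr = lr
        self.baseline = 0.0    # running average reward, reduces variance
        self.rng = random.Random(seed)

    def act(self):
        return self.rng.gauss(self.mu, self.sigma)

    def learn(self, action, reward):
        advantage = reward - self.baseline
        # d/d(mu) of log N(action; mu, sigma) = (action - mu) / sigma^2
        grad_log_prob = (action - self.mu) / self.sigma ** 2
        self.mu += self.lr * advantage * grad_log_prob
        self.baseline += 0.1 * (reward - self.baseline)


def train(episodes=2000):
    env = ContinuousEnv(target=0.5)
    brain = GaussianBrain()
    for _ in range(episodes):
        env.reset()
        action = brain.act()
        _, reward, _ = env.step(action)
        brain.learn(action, reward)
    return brain.mu
```

Calling `train()` returns the learned policy mean, which should drift toward the toy target steering value. Subclassing keeps the reward and dynamics shared between the discrete and continuous variants, so the existing DQN and Q-learning agents are unaffected by the new action space.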