graph LR;
A[Conv Block] --> B[Conv Block]
B --> C[FF block]
C --> D[Value head]
C --> G[Policy head]
Where:
Conv Block:
- Conv: 4n in, 4xn out, 5x5 convolution with stride 1, padding 2
- Activation: Selu
- MaxPool: 2x2 max pooling
- Dropout: 0.1

FF block:
- LazyLinear with output 256, Selu activation, and dropout

Value head:
- Linear: 256 in, 64 out
- Activation: Selu
- Dropout: 0.5
- Linear: 64 in, 1 out
- Activation: Tanh

Policy head:
- Linear: 256 in, 128 out
- Activation: Selu
- Linear: 128 in, 128 out
- Activation: Selu
- Linear: 128 in, 64 out
- Activation: Selu
- Dropout: 0.5
- Linear: 64 in, 8 out
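A minimal PyTorch sketch of this architecture, assuming a 4-channel board input and a channel multiplier `n` (the class name, default sizes, and exact wiring below are illustrative, not the repository's actual code):

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Conv Block: 5x5 conv (stride 1, padding 2) -> SELU -> 2x2 max pool -> dropout
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1, padding=2),
        nn.SELU(),
        nn.MaxPool2d(2),
        nn.Dropout(0.1),
    )

class PolicyValueNet(nn.Module):
    def __init__(self, in_channels: int = 4, n: int = 16):
        super().__init__()
        # Two stacked conv blocks; the channel counts here are assumptions.
        self.backbone = nn.Sequential(
            conv_block(in_channels, 4 * n),
            conv_block(4 * n, 4 * n),
            nn.Flatten(),
        )
        # FF block: LazyLinear infers its input size from the flattened conv output.
        self.ff = nn.Sequential(nn.LazyLinear(256), nn.SELU(), nn.Dropout(0.1))
        # Value head: scalar position evaluation squashed into [-1, 1].
        self.value_head = nn.Sequential(
            nn.Linear(256, 64), nn.SELU(), nn.Dropout(0.5),
            nn.Linear(64, 1), nn.Tanh(),
        )
        # Policy head: one logit per move (8 here).
        self.policy_head = nn.Sequential(
            nn.Linear(256, 128), nn.SELU(),
            nn.Linear(128, 128), nn.SELU(),
            nn.Linear(128, 64), nn.SELU(),
            nn.Dropout(0.5),
            nn.Linear(64, 8),
        )

    def forward(self, x: torch.Tensor):
        features = self.ff(self.backbone(x))
        return self.policy_head(features), self.value_head(features)
```

Using `LazyLinear` lets the FF block infer its input size from the flattened convolutional output, so the board dimensions do not need to be hard-coded.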
To train the model, the following sequence of steps is applied (a sketch of this loop follows the list):
- For each episode, do the following:
  - Create two agents and randomly choose one to start.
  - Play the game until it is over.
  - Record the choices of each player.
  - The winner receives a positive score and the loser a negative score; draws score 0.
- Run around 50 episodes in parallel and record the results.
- Train the model on the recorded results.
- Repeat the process.
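A minimal sketch of this self-play loop, assuming a game environment exposing `reset`/`is_over`/`legal_moves`/`step`/`winner` and the two-headed network above; the helper names and the exact loss are assumptions, not the repository's trainer:

```python
import random
import torch

def play_episode(model, game):
    """Self-play one game; return (state, move, score) samples for training.

    `game` is an assumed environment interface; both players share one network here.
    """
    history = {1: [], -1: []}           # moves recorded separately for each player
    player = random.choice([1, -1])     # randomly choose which agent starts
    state = game.reset()
    while not game.is_over():
        with torch.no_grad():
            logits, _value = model(state.unsqueeze(0))
        # Mask illegal moves before sampling a move from the policy.
        mask = torch.full_like(logits, float("-inf"))
        mask[0, game.legal_moves()] = 0.0
        move = torch.distributions.Categorical(logits=logits + mask).sample().item()
        history[player].append((state, move))
        state = game.step(move)
        player = -player
    winner = game.winner()              # +1, -1, or 0 for a draw
    # Winner takes a positive score, loser a negative score, draws give 0.
    return [(s, m, 0.0 if winner == 0 else (1.0 if p == winner else -1.0))
            for p, moves in history.items() for s, m in moves]

def train_step(model, optimizer, batch):
    """One gradient step on a batch of recorded (states, moves, scores) tensors."""
    states, moves, scores = batch
    logits, values = model(states)
    policy_loss = torch.nn.functional.cross_entropy(logits, moves)
    value_loss = torch.nn.functional.mse_loss(values.squeeze(-1), scores)
    loss = policy_loss + value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```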
- To train the model, run the `trainer_main.py` file. If you want to use the recorded model, use the `load` option to load the saved model.
- To test the model in an actual game, use the normal main file. You can use the `load` option to load the latest checkpoint of the model.
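A minimal sketch of how such a checkpoint could be saved and restored with PyTorch (the file name and dictionary keys are assumptions, not the repository's actual format):

```python
import torch

CHECKPOINT_PATH = "model_checkpoint.pt"   # assumed file name

def save_checkpoint(model, optimizer, path=CHECKPOINT_PATH):
    # Persist both model weights and optimizer state so training can resume.
    torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(model, optimizer, path=CHECKPOINT_PATH):
    # Restore the latest saved state into existing model/optimizer objects.
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
```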