This was a Deep Learning project for the "NDM-08-02 - Computational Intelligence - Deep Reinforcement Learning" class of the School of Informatics of AUTh.
In this project you will find a Deep Q-Learning Algorithm Implementation and all the code for the experiments I ran.
The requirements.txt file can be used to install all the important Python packages for this project if you want to run the code. Use this command:
pip install -r requirements.txt
For each experiment there is a .py file with already set parameters.
I use the Farama Foundation Gymnasium Python package to manage the environment of the game.
The Game State is the 210x160 screenshot of the game's current frame. I grayscale, crop, normalize and downscale it to an 84x84 image.
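To make the preprocessing concrete, here is a minimal sketch of what such a pipeline could look like with OpenCV; the exact crop region and ordering are assumptions, not necessarily what the repo's code does:

```python
import cv2
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Turn a 210x160x3 RGB frame into a normalized 84x84 grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)        # grayscale, 210x160
    cropped = gray[34:194, :]                             # drop score bar / bottom strip (assumed region)
    small = cv2.resize(cropped, (84, 84), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0               # normalize to [0, 1]
```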
I use PyTorch to define and manage the Neural Network architectures. I used 2 kinds of architectures for this project, because I wanted to check whether the second one was indeed better for this kind of problem, as stated in the scientific papers I read. For both networks the input is the last 4 preprocessed frames of the game state, while the output is a list of 6 numbers (as many as the actions the agent can take in the game).
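A common way to keep the last 4 frames together is a bounded deque; this is an illustrative sketch, not necessarily the exact mechanism used in the experiment scripts:

```python
from collections import deque
import numpy as np

frames = deque(maxlen=4)                      # always holds the 4 most recent frames

def stack_frames(new_frame: np.ndarray) -> np.ndarray:
    """Append the newest 84x84 frame and return a (4, 84, 84) network input."""
    if not frames:                            # at episode start, repeat the first frame
        frames.extend([new_frame] * 4)
    else:
        frames.append(new_frame)
    return np.stack(frames, axis=0)           # channel-first layout for PyTorch
```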
As you can see in the image, I use 2 Convolutional Layers and 1 Fully Connected one before the output layer.
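As a rough sketch, the plain CNN could look like the following in PyTorch; the filter sizes follow the classic DQN paper and are assumptions about this repo's exact hyperparameters:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, num_actions: int = 6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4 stacked frames in
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                  # 9x9 feature maps for an 84x84 input
            nn.ReLU(),
            nn.Linear(256, num_actions),                 # one Q-value per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x))
```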
The only difference from the CNN is that I use 2 parallel Fully Connected layers (streams) instead of one. One is a Value Layer and the other is an Advantage Layer. I aggregate those 2 streams at the output layer. This architecture tends to give better results, which is why I chose to use it after the first 2 experiments where I compared the 2 architectures.
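Here is a hedged sketch of the dueling variant, reusing the same convolutional trunk and aggregating the two streams as Q = V + (A - mean(A)); the layer sizes are again assumptions:

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    def __init__(self, num_actions: int = 6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.value = nn.Sequential(nn.Linear(32 * 9 * 9, 256), nn.ReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(32 * 9 * 9, 256), nn.ReLU(), nn.Linear(256, num_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.conv(x)
        v = self.value(features)                      # state value V(s)
        a = self.advantage(features)                  # action advantages A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)    # aggregate into Q(s, a)
```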
I implemented the Deep Q-Learning Algorithm using 2 Neural Networks, a main one and a target one. I also use an epsilon-greedy policy.
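The main/target split and the epsilon-greedy policy could be wired up roughly like this (the names and constants are illustrative, not the repo's exact ones):

```python
import random
import torch

policy_net = DuelingDQN(num_actions=6)                 # network that is trained
target_net = DuelingDQN(num_actions=6)                 # network that provides the TD targets
target_net.load_state_dict(policy_net.state_dict())   # start the target as an exact copy
target_net.eval()

def select_action(state: torch.Tensor, epsilon: float) -> int:
    """With probability epsilon explore randomly, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(6)
    with torch.no_grad():
        return int(policy_net(state.unsqueeze(0)).argmax(dim=1).item())

# Every fixed number of steps the target network is refreshed:
# target_net.load_state_dict(policy_net.state_dict())
```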
I implemented 2 different kinds of memory to save experiences that will help the agent during the training.
This is just a simple queue where I save recent experiences and sample a mini-batch from it whenever one is needed. It is used in Experiments 1 and 2.
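A minimal sketch of such a memory (a bounded queue with uniform sampling); the capacity value is an arbitrary placeholder:

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)           # oldest experiences fall off automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)  # uniform mini-batch

    def __len__(self):
        return len(self.buffer)
```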
This is a better memory implementation where I save and sample experiences in a smarter way, using their TD-error. Experiences with a higher TD-error tend to be more useful for the training process, but those with a lower TD-error still need to be sampled as well. That's why I use a Sum Tree data structure to implement this kind of memory. It is used in Experiments 3 and 4.
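The core of such a memory is the Sum Tree itself: leaves hold the priorities (e.g. |TD-error| plus a small constant), internal nodes hold their sums, and sampling a value in [0, total) walks down to a leaf. The sketch below is illustrative, not necessarily the repo's exact implementation:

```python
import numpy as np

class SumTree:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)    # internal sum nodes + priority leaves
        self.data = [None] * capacity             # stored experiences
        self.write = 0                            # next leaf slot to overwrite

    def add(self, priority: float, experience):
        idx = self.write + self.capacity - 1      # leaf index in the tree array
        self.data[self.write] = experience
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx: int, priority: float):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                           # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def get(self, value: float):
        """Return (tree index, priority, experience) for the leaf covering `value`."""
        idx = 0
        while idx < self.capacity - 1:            # descend until a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = right
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    def total(self) -> float:
        return self.tree[0]                       # sum of all priorities
```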
If you want to train one of the models I implemented on your own (or maybe you want to edit one of those and try it out), you can edit the code of the .py file to change some parameters.
If you want to test one of the Saved Models, simply change the TRAINING boolean to False and set the num_episodes_model variable to choose the saved model you want.
In the saliency_maps.py file, I load a model, let it play, and plot the Saliency Maps to see which parts of the input are most important for the agent's decision-making process.
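One standard way to compute such a map is the gradient of the greedy action's Q-value with respect to the input frames; the sketch below shows that idea, though the plotting details in saliency_maps.py may differ:

```python
import torch
import matplotlib.pyplot as plt

def saliency_map(model: torch.nn.Module, state: torch.Tensor) -> torch.Tensor:
    """state: (4, 84, 84) float tensor of stacked frames."""
    inp = state.unsqueeze(0).clone().requires_grad_(True)
    q_values = model(inp)
    q_values.max().backward()                     # gradient of the greedy action's Q-value
    return inp.grad.abs().max(dim=1)[0][0]        # strongest gradient across the 4 frames

# Example: plt.imshow(saliency_map(model, state).cpu(), cmap="hot"); plt.show()
```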
Models from all experiments can be found in the Saved Models folder in .pth format. Each model name is as follows:
architecture_algorithm_model_numberOfTrainingEpisodes.pth
For example, dueling_dqn_model_200.pth refers to the saved model of a dueling network architecture that was trained for 200 episodes with the Deep Q-Learning Algorithm.
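For completeness, loading one of these checkpoints could look like this, assuming the .pth file stores a state_dict and that the architecture class matches the one used for training:

```python
import torch

model = DuelingDQN(num_actions=6)                 # must match the saved architecture
model.load_state_dict(torch.load("Saved Models/dueling_dqn_model_200.pth", map_location="cpu"))
model.eval()                                      # inference mode for testing/playing
```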
The English presentation of this project is available in this repo in PDF form. There is also a full report in PDF format, but it is currently only available in Greek.
In the links below you can find 2 videos that showcase how Agents from Experiment 3 and Experiment 4 play one match (3 lives) of the game on Hard Mode.
You can also read my article on Deep Learning on Medium, where I use this project to explain the topic to people outside Computer Science. Currently, it's only available in Greek, but you can use an auto-translation tool to read it.
[1] Mark Towers et al. “Gymnasium”. In: (Mar. 2023).
[2] M. G. Bellemare et al. “The Arcade Learning Environment: An Evaluation Platform for General Agents”. In: Journal of Artificial Intelligence Research 47 (June 2013), pp. 253–279.
[3] Ziyu Wang et al. “Dueling Network Architectures for Deep Reinforcement Learning”. In: International Conference on Machine Learning. 2015. URL: https://api.semanticscholar.org/CorpusID:5389801
[4] H. V. Hasselt, Arthur Guez, and David Silver. “Deep Reinforcement Learning with Double Q-Learning”. In: AAAI Conference on Artificial Intelligence. 2015. URL: https://api.semanticscholar.org/CorpusID:5389801