Reinforcement Learning Bomberman
For the final project of my Artificial Intelligence for Robotics class, I worked in a team of three to code an AI capable of playing Bomberman across five different variants. The five variants are as follows:
Variant 1: No monsters
Variant 2: 1 stupid monster that moves randomly and is not aware of the character
Variant 3: 1 self-preserving monster that avoids explosions and attacks the character when they're within a 1-block distance
Variant 4: 1 aggressive monster, same as variant 3 but with the detection range increased to 2 blocks
Variant 5: 1 stupid monster and 1 aggressive monster
Approximate Q-Learning
We experimented with various techniques such as minimax, expectimax, and A*. I was responsible for the final implementation of approximate Q-Learning, which proved to be the most effective and met the survival rate the project required.
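At its core, approximate Q-Learning estimates Q(s, a) as a weighted sum of feature values and nudges each weight toward the observed reward after every step. Below is a minimal sketch of that update in Python; the feature dictionary and the learning-rate/discount values are illustrative, not our actual code.

```python
def q_value(weights, features):
    """Q(s, a) as a linear combination of feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def update_weights(weights, features, reward, next_q_max, alpha=0.1, gamma=0.9):
    """One approximate Q-Learning update:

    difference = (r + gamma * max_a' Q(s', a')) - Q(s, a)
    w_i       <- w_i + alpha * difference * f_i(s, a)
    """
    difference = (reward + gamma * next_q_max) - q_value(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * difference * value
    return weights
```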
The features I used for the algorithm include the following (see the sketch after this list):
Euclidean distance between the character and the monsters.
Shortest A* distance between the character and the exit.
If no A* path exists because of the walls, the Euclidean distance is used instead.
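Put together, the feature extraction looks roughly like the sketch below. The grid representation, the a_star_distance helper, and the feature names are assumptions made for illustration rather than our exact implementation.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two (x, y) grid cells."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def extract_features(character, monsters, exit_cell, a_star_distance):
    """Features fed to the approximate Q-learner.

    a_star_distance(start, goal) is assumed to return None when the walls
    block every path between the two cells.
    """
    features = {}

    # Euclidean distance from the character to the nearest monster.
    if monsters:
        features["dist_to_monster"] = min(
            euclidean_distance(character, m) for m in monsters
        )

    # Shortest A* distance to the exit, falling back to the Euclidean
    # distance when no path exists.
    path_length = a_star_distance(character, exit_cell)
    features["dist_to_exit"] = (
        path_length if path_length is not None
        else euclidean_distance(character, exit_cell)
    )
    return features
```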
We ran this algorithm on each variant 100 times with different seeds, updating the weight values after each run to arrive at the final optimized weights.
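The training procedure itself is a simple loop over seeds, along these lines; run_episode here is a hypothetical helper that plays one full game with the current weights and returns the updated ones.

```python
import random


def train(variant, run_episode, num_runs=100):
    """Train on one variant: num_runs games with different seeds,
    carrying the learned weights from one run into the next."""
    weights = {"dist_to_monster": 0.0, "dist_to_exit": 0.0}
    for run in range(num_runs):
        random.seed(run)                         # a different seed per run
        weights = run_episode(variant, weights)  # play one game, return updated weights
    return weights
```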
Variant 1
The easiest variant, 100% survival rate
Variant 3
A bit harder with the self-preserving monster, 90% survival rate
Variant 5
The hardest variant with 2 monsters, 80% survival rate
Future Improvements
Due to time constraints, there are ideas we haven't experimented with that would improve performance. Some improvements I'd like to implement in the future include:
By default, the character places a bomb only when it's next to a wall. The character could be improved to also place a bomb when it gets close to a monster.
After placing a bomb, the character runs away in a random diagonal direction. This could be changed so that it moves in the direction furthest away from the monsters instead (see the sketch below for both ideas).
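Both ideas come down to small decision rules. A rough sketch of what they could look like (the helper names, parameters, and the 2-block threshold are hypothetical):

```python
def should_place_bomb(next_to_wall, character, monsters, euclidean_distance,
                      danger_range=2):
    """Current rule: bomb only when next to a wall.
    Proposed addition: also bomb when a monster gets close."""
    near_monster = any(
        euclidean_distance(character, m) <= danger_range for m in monsters
    )
    return next_to_wall or near_monster


def escape_move(legal_moves, monsters, euclidean_distance):
    """Instead of a random diagonal, pick the legal move that ends up
    furthest from the nearest monster."""
    def nearest_monster_distance(cell):
        return min(euclidean_distance(cell, m) for m in monsters)
    return max(legal_moves, key=nearest_monster_distance)
```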