Trackmania is a video game, and though it can be played without a monitor, or blindfolded, it was designed on the assumption that players can see what they’re doing. So our first step, now that the nightmares of setup were over and the actual work could begin, was to give our nascent network a way to see the game. The goal throughout this project has been to make a bot that learns as fast as a human does, so we wanted the program to learn from the same kind of graphics that human players use. We eventually settled on an image input with a resolution of 1600x900, at 30 frames per second. The program uses those images in two ways:
Driving: The program needs to know where it is going, because it has to decide which way to steer the car once per frame, roughly every 33,000 μs at 30 frames per second. The image therefore has to reach the neural network in much less than 33,000 μs, because the program also needs time to process the image, decide which keys to press, and send that keypress decision to Trackmania.
Training: The program needs to learn from prior runs, minutes or hours or days after they happened. If we ever want to generalize and have a program that can learn new maps, the network will need periodic reminders of previous maps, or else it risks overfitting to whatever map it happens to be playing right this second. You can think of this as the memory of past races.
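To make the two uses concrete, here is a rough Python sketch, not our actual code. The first part works out the per-frame time budget for driving at 30 frames per second. The second part is one possible shape for the memory of past races. The names `remaining_budget_us` and `RaceMemory`, the per-map capacity, and the choice to pick a map uniformly before picking a frame are all assumptions made for illustration.

```python
import random
from collections import defaultdict, deque

FPS = 30
FRAME_BUDGET_US = 1_000_000 // FPS  # 33,333 μs per frame at 30 frames per second


def remaining_budget_us(capture_us, inference_us, keypress_us):
    """Microseconds left in one frame after all three stages (negative = frame missed).

    The three stage timings are hypothetical inputs you would measure yourself.
    """
    return FRAME_BUDGET_US - (capture_us + inference_us + keypress_us)


class RaceMemory:
    """Hypothetical store of past frames, kept per map so old maps can be revisited."""

    def __init__(self, frames_per_map=10_000):
        # Each map gets its own bounded queue; the oldest frames drop off first.
        self._frames = defaultdict(lambda: deque(maxlen=frames_per_map))

    def add(self, map_name, frame, keys):
        """Remember one frame and the keys that were pressed on it."""
        self._frames[map_name].append((frame, keys))

    def sample(self, batch_size, rng=random):
        """Return a list of (map_name, frame, keys) tuples for training."""
        maps = list(self._frames)
        if not maps:
            raise ValueError("no frames stored yet")
        batch = []
        for _ in range(batch_size):
            # Assumption: choose the map first, then a frame from it, so the map
            # being driven right now cannot crowd out the older ones.
            name = rng.choice(maps)
            frame, keys = rng.choice(self._frames[name])
            batch.append((name, frame, keys))
        return batch
```

For example, `remaining_budget_us(5_000, 20_000, 1_000)` leaves 7,333 μs of slack in the frame, and `RaceMemory.sample` keeps returning frames from every stored map no matter which one was played most recently.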