Training a Simple Transformer Neural Net on Conway's Game of Life

2024-07-07 22:30:06

The pattern that emerges is the model learning to attend to just the 8 neighbours of each cell. The attention of the model becomes nearly equivalent to a 3x3 average pool, as is used in convolutional neural networks, although unlike an average pool, it excludes the middle cell from the average. It is vastly more efficient to directly use an average pool, rather than an attention layer, but it’s interesting to show that the attention layer can learn to approximate it. (We found that average pooling does also work, even with the middle cell included.)
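The equivalence can be checked directly. The sketch below (a hypothetical illustration, not the post's code) shows that attending uniformly with weight 1/8 to each of a cell's 8 neighbours computes exactly the same values as a 3x3 average pool with the middle cell excluded; it assumes a toroidal grid, so the edges wrap around.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(8, 8)).astype(float)

# 3x3 average pool, middle cell excluded: sum the 8 shifted copies of the
# grid (wrapping at the edges) and divide by 8.
neigh_sum = sum(
    np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
    for dy in (-1, 0, 1)
    for dx in (-1, 0, 1)
    if (dy, dx) != (0, 0)
)
pooled = neigh_sum / 8.0

# "Attention" version: each cell attends with weight 1/8 to its 8 neighbours
# and weight 0 to everything else, including itself.
attn_out = np.zeros_like(grid)
for dy in (-1, 0, 1):
    for dx in (-1, 0, 1):
        if (dy, dx) != (0, 0):
            attn_out += 0.125 * np.roll(np.roll(grid, dy, axis=0), dx, axis=1)

assert np.allclose(pooled, attn_out)
```

The two computations agree to floating-point precision, which is why the learned attention map ends up looking like a pooling kernel.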

This means the model takes a life_grid as input and outputs next_life_grid, the state of the grid at the next step.

In order to train our model, we show it many examples of (life_grid, next_life_grid) pairs. We can generate a practically limitless amount of these, by randomly initialising grids and running the Game of Life on them. The following plot shows some examples, where each row represents a pair.
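Generating such pairs only requires a Game of Life step function. Here is a minimal sketch (my own illustration, assuming a toroidal grid and the function names life_step and make_pairs, which are not from the post):

```python
import numpy as np

def life_step(grid):
    """One Game of Life step on a toroidal (wrap-around) grid of 0s and 1s."""
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next step if it has exactly 3 live neighbours,
    # or exactly 2 live neighbours and is currently alive.
    return ((neighbours == 3) | ((neighbours == 2) & (grid == 1))).astype(grid.dtype)

def make_pairs(n_pairs, size=16, p_alive=0.5, seed=0):
    """Randomly initialise grids and step them once to get training pairs."""
    rng = np.random.default_rng(seed)
    grids = (rng.random((n_pairs, size, size)) < p_alive).astype(np.int8)
    next_grids = np.stack([life_step(g) for g in grids])
    return grids, next_grids
```

Since the grids are sampled randomly and the step function is cheap, this really does yield a practically limitless stream of training examples.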

Our model uses embeddings to represent a Life grid as a set of tokens, with one token per grid cell. These tokens then go through single-head attention, a hidden layer, and a classifier head, which classifies each token/grid cell as dead or alive in the next step.
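As a rough picture of that pipeline, here is a forward-pass sketch in plain NumPy. All sizes, weight initialisations, and layer widths are illustrative assumptions, not the post's actual hyperparameters, and training (loss, optimiser) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, d = 8, 8, 16          # grid size and embedding dimension (assumed)
n = H * W                   # one token per grid cell

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Embeddings: a learned vector per cell state (dead=0, alive=1)
# plus a positional embedding per grid position.
cell_embed = rng.normal(size=(2, d))
pos_embed = rng.normal(size=(n, d))

grid = rng.integers(0, 2, size=(H, W))
tokens = cell_embed[grid.reshape(n)] + pos_embed    # (n, d)

# Single-head self-attention.
Wq = rng.normal(size=(d, d)) / np.sqrt(d)
Wk = rng.normal(size=(d, d)) / np.sqrt(d)
Wv = rng.normal(size=(d, d)) / np.sqrt(d)
q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
attended = softmax(q @ k.T / np.sqrt(d)) @ v        # (n, d)

# Hidden layer (ReLU), then a 2-way classifier head per token:
# each cell is classified as dead or alive in the next step.
W1 = rng.normal(size=(d, 4 * d)) / np.sqrt(d)
W2 = rng.normal(size=(4 * d, 2)) / np.sqrt(4 * d)
logits = np.maximum(attended @ W1, 0.0) @ W2        # (n, 2)

pred_next = logits.argmax(axis=-1).reshape(H, W)    # predicted next grid
```

With random weights the predictions are of course meaningless; the point is only the shape of the computation, one token per cell flowing through attention to a per-cell dead/alive classification.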
