Hi! I’m @naklecha & I love learning through examples and jumping right into things. It works well for me, it’s fun and imo it’s the best way to learn anything. So, that’s what I’m going to do. :) Fuck it! Let’s start by trying to solve chess.
In the above board state, white is winning because white has an extra rook! But let's look at it from a computer's pov. This is what a computer would know without calculations or lookahead:
It doesn’t matter if the player is 1 move away from winning, 30 moves away, or losing in 3 moves. Without additional calculation, the computer has no idea whether the position is good, let alone by how much. Let’s try to calculate how good the position is.
Now, for any given chess board, let’s say we have a magical function that tells us whether the position is winning or losing. In reinforcement learning this is called the value function. Let’s call it V(s): it takes in a state of the board and returns True if the position is winning and False otherwise.
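To make the interface concrete, here’s a minimal sketch. The board state is represented as a FEN string (just one convenient stand-in for "a state"), and the material-count heuristic inside is NOT the real magical V — it’s a toy placeholder so the signature has something to return:

```python
# Toy piece values (pawn=1, knight/bishop=3, rook=5, queen=9).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def V(fen: str) -> bool:
    """Sketch of the value function: True if 'winning', else False.
    Placeholder logic: white is 'winning' if white has more material.
    (The real V would need to understand the whole game, not just count pieces.)"""
    placement = fen.split()[0]  # first FEN field: piece placement
    score = 0
    for ch in placement:
        if ch.lower() in PIECE_VALUES:
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value  # uppercase = white
    return score > 0

# Starting position, but with an extra white rook on a3:
# white is up a rook, so even this toy V says "winning".
print(V("rnbqkbnr/pppppppp/8/8/8/R7/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
```

Of course, a material counter is exactly the kind of shallow heuristic the section above is complaining about — it says nothing about checkmates, tactics, or who actually wins. The question is whether the real V can exist at all.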
Okay, but realistically can this function ever exist for the game of chess? The branching factor in chess (the number of possible moves at each turn) is around 30-35 on average. After just 5 plies, you’re already looking at tens of millions of positions. Now think of a 20-30 move game, which is 40-60 plies. With ~35 choices at every step, that’s on the order of 35^50, roughly 10^77 possible games. That’s fucked up.
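The back-of-envelope math checks out in a few lines, assuming an average branching factor of 35 (the exact figure varies by position, so treat these as order-of-magnitude estimates):

```python
# Rough game-tree sizes, assuming ~35 legal moves per position.
b = 35

after_5_plies = b ** 5   # positions reachable in 5 half-moves
full_game = b ** 50      # a 50-ply game (~25 moves per side)

print(f"after 5 plies: {after_5_plies:,}")      # ~52 million
print(f"full game: ~10^{len(str(full_game)) - 1}")  # ~10^77
```

For comparison, 10^77 is in the same ballpark as the number of atoms in the observable universe, which is why enumerating every game is a non-starter.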