In this post we introduce the fundamental ideas underlying Markov Chains and show how PageRank leverages them to measure the relative importance of each node, giving us a new view of the structure of our network and of the role each node plays in it.
We hope you enjoy today’s post. We’re looking forward to your comments, feedback and suggestions for follow ups.
We introduced the concept of a Random Walk in a previous post when we discussed Language Generation. The idea is a simple one: imagine you live on the nodes of a Graph and that at each time step you must follow a randomly chosen outgoing edge of your current node.
In this simple case, your next step depends only on your current position on the graph and not on other factors like how many times you’ve visited this node, how long it has been since your last visit, etc. This is perhaps the simplest example of a class of memoryless processes, where the next step depends only on your current location and not on any previous history, known as a Markov Chain.
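The memoryless walk described above can be sketched in a few lines. The graph below is a hypothetical example, and `random_walk` is an illustrative helper name, not code from the original post:

```python
import random

# Hypothetical directed graph: each node maps to its outgoing neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def random_walk(graph, start, steps):
    """Walk the graph, following a random outgoing edge at each step."""
    path = [start]
    node = start
    for _ in range(steps):
        # The next step depends only on the current node -- no memory
        # of how we got here is used.
        node = random.choice(graph[node])
        path.append(node)
    return path
```

For example, `random_walk(graph, "A", 10)` returns a list of 11 nodes in which every consecutive pair is an edge of the graph.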
Markov Chains are a well-known class of stochastic process, first studied by Andrey Markov in the early 20th century. You can represent a Markov Chain as a matrix where each element pij represents the probability of moving from node i to node j. As a consequence, the sum of the elements of each row is exactly one (by the definition of probability):
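One way to build such a matrix is to start from the graph's adjacency matrix and normalize each row. The small adjacency matrix below is a made-up example for illustration:

```python
import numpy as np

# Hypothetical adjacency matrix of a small directed graph:
# adjacency[i, j] == 1 when there is an edge from node i to node j.
adjacency = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
], dtype=float)

# Divide each row by its sum so that transition[i, j] is the probability
# of moving from node i to node j; every row then sums to exactly one.
transition = adjacency / adjacency.sum(axis=1, keepdims=True)
```

Starting the walk at node i and multiplying its probability distribution by this matrix gives the distribution over positions after one step.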