If you remember the  stone age of the internet, there was a tool similar to Google Trends but for finding correlated search patterns. You can see in t

Google Correlate alternative: Similiarity search of Wikipedia Pageview Statistics in Python

submited by
Style Pass
2024-09-16 15:00:07

If you remember the stone age of the internet, there was a tool similar to Google Trends but for finding correlated search patterns. You can see in the video below, how each search term would be correlated. Its mechanism was even able to predict flu trends for a period of time. Google’s white paper explains how it’s done at scale. Alphabet being Alphabet of course shut down the tool in 2019. So let’s build our own with the power of open data.

There is no real alternative since no one has as much search data like Google. But with a bit of creativity you can build your own in a weekend project. The idea builds on the previous post for trend prediction using Wikipedia views to see for example the demand for political parties.

We can think of the time series data (X page, Y clicks per day) as a vector. We want to find the closest vector according to for example ‘Cosine distance’ metric. Python provides simple tooling for this with scikit-learn.

Now we have our dataset we can use similarity search. To understand how nearest neighbors works I can recommend the explanation by statquest:

Leave a Comment