
No Cost Data Scraping With GitHub Actions And Neo4j Aura – William Lyon

submitted by
Style Pass
2021-07-21 15:30:08

When working with data, a common task is fetching data from an external source on a recurring basis and importing it into a database for further analysis or as part of an application. Setting up servers to handle this can be time-consuming and error-prone. I recently came across a workflow using GitHub Actions and Neo4j Aura that makes this a breeze. Thanks to the free tiers of both GitHub Actions and Neo4j Aura, it costs nothing to set up or to keep running - great for side projects!

In this post we'll take a look at setting up this workflow to scrape data from the Lobsters news aggregator and import it into a Neo4j Aura Free instance using GitHub Actions. We built this on the Neo4j livestream, so check out the recording if you prefer video.

Lobsters is a social news aggregator. Users post links to articles, and the community votes and comments on them in a discussion thread. An algorithm determines the ranking of the submissions, with the most recent and noteworthy articles floating to the front page. To submit and comment, users must be invited by an existing user. This user-invite graph is publicly available, which helps keep discussions civil and discourages fraudulent upvoting to game submission rankings.

I've been thinking about social networks recently and want to build an application to help explore relevant news articles using graph visualization, so building something with data from Lobsters seems like a great fit. We want to import data about users and article submissions into Neo4j as the basis of our application, but how do we get started? Fortunately, Lobsters makes two JSON endpoints available for fetching data about the newest and the "hottest" submissions. Both return submissions in a similar format.
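As a rough illustration of the fetching step, here is a minimal Python sketch that downloads both feeds and merges them into one de-duplicated list. The endpoint URLs, the `short_id` key, and the helper names `fetch_submissions` and `merge_submissions` are assumptions made for this example and are not taken from the post, so check them against the real payload before relying on them.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint URLs: the post only says that "newest" and "hottest" JSON feeds exist.
NEWEST_URL = "https://lobste.rs/newest.json"
HOTTEST_URL = "https://lobste.rs/hottest.json"


def fetch_submissions(url, timeout=10):
    """Download one feed and return the parsed JSON (expected to be a list of objects)."""
    request = Request(url, headers={"User-Agent": "lobsters-import-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def merge_submissions(*feeds, key="short_id"):
    """Combine several feeds, keeping the first copy seen of each submission.

    `key` is an assumed unique identifier field; adjust it to match the real payload.
    """
    merged = {}
    for feed in feeds:
        for item in feed:
            # setdefault keeps the earliest entry, so feed order decides which copy wins.
            merged.setdefault(item[key], item)
    return list(merged.values())
```

Calling `merge_submissions(fetch_submissions(NEWEST_URL), fetch_submissions(HOTTEST_URL))` would then give a single list to hand to the import step. De-duplicating matters here because a submission can plausibly appear in both feeds at once.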
