Simon Willison’s Weblog

submitted by
Style Pass
2021-06-21 07:00:08

I prepared a lightning talk about Git scraping for the NICAR 2021 data journalism conference. In the talk I explain the idea of running scheduled scrapers in GitHub Actions, show some examples and then live code a new scraper for the CDC’s vaccination data using the GitHub web interface. Here’s the video.

Here’s the PG&E outage map that I scraped. The trick here is to open the browser developer tools network tab, then order resources by size and see if you can find the JSON resource that contains the most interesting data.
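Once you have found such a JSON resource, the scraping step can be small. What follows is a minimal sketch, not the actual scraper from this project: `DATA_URL` is a hypothetical placeholder for whatever URL the network tab reveals, and `normalize` and `save_snapshot` are names made up for illustration. It re-serializes the JSON with sorted keys so that successive snapshots committed to Git produce readable diffs.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical placeholder: substitute the JSON URL found in the network tab.
DATA_URL = "https://example.com/outages.json"


def normalize(raw: bytes) -> str:
    """Re-serialize JSON with sorted keys and fixed indentation so that
    successive snapshots produce small, readable Git diffs."""
    data = json.loads(raw)
    return json.dumps(data, indent=2, sort_keys=True) + "\n"


def save_snapshot(url: str, path: Path) -> bool:
    """Fetch url, write the normalized JSON to path, and report whether
    the file content changed."""
    with urllib.request.urlopen(url, timeout=30) as response:
        text = normalize(response.read())
    changed = not path.exists() or path.read_text(encoding="utf-8") != text
    if changed:
        path.write_text(text, encoding="utf-8")
    return changed


if __name__ == "__main__":
    # Assumed output filename; a scheduled job would commit this file.
    save_snapshot(DATA_URL, Path("outages.json"))
```

Run on a schedule, a script like this leaves the file untouched when nothing has changed, so a commit is only needed when the data is different from the previous snapshot.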

The scraper code itself is here. I wrote about the project in detail in Tracking PG&E outages by scraping to a git repo. My database of outages is at pge-outages.simonwillison.net and the animation I made of outages over time is attached to this tweet.

Here’s a video animation of PG&E’s outages from October 5th up until just a few minutes ago pic.twitter.com/50K3BrROZR
