Scraping Reddit and writing data to the Datasette write API

Submitted by Style Pass, 2023-03-17 22:00:05

It then repeats the same thing for datasette.io and commits the results. The code lives in this scheduled GitHub Actions file: https://github.com/simonw/scrape-reddit-by-domain/blob/main/.github/workflows/scrape.yml

Using the /-/create-token interface, I created a signed Datasette API token with full permissions that would expire after five minutes.

This uses jq to extract the bits of the JSON I care about and reformat them into a smaller set of columns. It constructs a JSON document that matches the format expected by the /-/create API, documented here.
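For readers who prefer Python to jq, the same reshaping step might look roughly like the sketch below. It assumes Reddit's usual listing layout (`data.children[].data`) and a hypothetical choice of columns; the function name `build_create_payload` and the use of `id` as the primary key are illustrative assumptions, not taken from the original command.

```python
import json


def build_create_payload(reddit_listing, table="reddit_posts"):
    """Reshape a Reddit listing into a JSON body for a table-creation request.

    The column list is an assumption made for illustration; keep whichever
    fields you actually care about.
    """
    rows = []
    for child in reddit_listing["data"]["children"]:
        post = child["data"]
        rows.append(
            {
                "id": post["id"],
                "subreddit": post["subreddit"],
                "title": post["title"],
                "url": post["url"],
                "ups": post["ups"],
                "num_comments": post["num_comments"],
                "created_utc": post["created_utc"],
            }
        )
    # "table", "rows" and "pk" are the keys the create endpoint is given here;
    # using "id" as the primary key is an assumption.
    return {"table": table, "rows": rows, "pk": "id"}


# A tiny hand-written listing standing in for Reddit's real response.
sample_listing = {
    "data": {
        "children": [
            {
                "data": {
                    "id": "abc123",
                    "subreddit": "Python",
                    "title": "Example post",
                    "url": "https://simonwillison.net/",
                    "ups": 3,
                    "num_comments": 1,
                    "created_utc": 1679000000.0,
                    "selftext": "dropped by the reshaping step",
                }
            }
        ]
    }
}

print(json.dumps(build_create_payload(sample_listing), indent=2))
```

Fields that are not listed explicitly, such as `selftext` in the sample, are simply left out, which is what keeps the resulting table narrow.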

It pipes the resulting JSON to curl and makes an authenticated POST request to the /-/create API. This created the reddit_posts table and populated it with the initial data.
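As a rough Python counterpart to the curl step, the sketch below builds the authenticated POST using only the standard library. The base URL, database name, and token value are placeholders, and the helper names are hypothetical. It assumes the create endpoint lives at `/<database>/-/create` and that the token is sent as a `Bearer` value in the `Authorization` header.

```python
import json
import urllib.request


def build_create_request(base_url, database, token, payload):
    """Build (but do not send) an authenticated POST to the create endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{database}/-/create",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response.

    Non-2xx responses raise urllib.error.HTTPError.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values: substitute your own instance, database and token.
request = build_create_request(
    "https://example.com",
    "data",
    "dstok_placeholder",
    {"table": "reddit_posts", "rows": [{"id": "abc123"}], "pk": "id"},
)
print(request.get_method(), request.full_url)
# send(request) would perform the actual network call.
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before a short-lived token is spent on a real call.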

I used the /-/create-token interface again, but this time I created a token that would never expire and that only had write and alter permissions for the new reddit_posts table I had just created.

This uses the /-/insert API instead, which is a little different from the /-/create API. It doesn't take a table argument, instead expecting the table name to be part of the URL.
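The difference between the two endpoints can be sketched in Python as follows. The table name moves out of the JSON body and into the URL path, so the body only carries the rows. The helper name, base URL, and token are placeholders. The optional `replace` flag reflects my understanding of Datasette's JSON write API, so treat it as an assumption to check against the documentation, since it may need more permissions than a plain insert.

```python
import json
import urllib.request


def build_insert_request(base_url, database, table, token, rows, replace=False):
    """Build (but do not send) a POST to /<database>/<table>/-/insert.

    Assumes simple database and table names that need no URL escaping.
    """
    payload = {"rows": rows}  # note: no "table" key, unlike the create call
    if replace:
        # Assumed option: overwrite rows whose primary key already exists.
        payload["replace"] = True
    return urllib.request.Request(
        f"{base_url}/{database}/{table}/-/insert",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute your own instance, database and token.
request = build_insert_request(
    "https://example.com",
    "data",
    "reddit_posts",
    "dstok_placeholder",
    [{"id": "abc123", "title": "Example post"}],
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

For a scraper that runs on a schedule, some way of handling rows that already exist matters, because the same posts will show up in consecutive runs.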
