
Dataset: nixiesearch/hackernews-comments

Submitted by Style Pass, 2024-10-11 16:00:12

A dataset of all HN API items from id=0 to id=41422887 (that is, from 2006 through 2 Sep 2024). The dataset was built by scraping the HN API according to its official schema and docs. The scraper code is also available on GitHub: nixiesearch/hnscrape
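Scraping the API by id comes down to requesting one URL per item and writing each response out unchanged. The sketch below is not the hnscrape code. It is a minimal illustration that assumes only the public endpoints `https://hacker-news.firebaseio.com/v0/item/<id>.json` and `https://hacker-news.firebaseio.com/v0/maxitem.json` from the official docs. The helper names (`item_url`, `fetch_json`, `max_item_id`, `scrape_range`) are hypothetical. A real scrape of roughly 41M ids would also need concurrency, retries and rate limiting, all omitted here.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator

# Public HN API base URL (Firebase-hosted, v0).
HN_API = "https://hacker-news.firebaseio.com/v0"

# Hypothetical helpers for illustration; not taken from nixiesearch/hnscrape.


def item_url(item_id: int) -> str:
    """URL of a single item, e.g. .../v0/item/8863.json."""
    return f"{HN_API}/item/{item_id}.json"


def fetch_json(url: str) -> Any:
    """GET a URL and decode its JSON body (may be `null` for missing items)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def max_item_id(fetch: Callable[[str], Any] = fetch_json) -> int:
    """Largest item id currently known to the API."""
    return int(fetch(f"{HN_API}/maxitem.json"))


def scrape_range(start: int, end: int,
                 fetch: Callable[[str], Any] = fetch_json) -> Iterator[str]:
    """Yield one raw JSON line per id in [start, end], with no cleaning or filtering."""
    for item_id in range(start, end + 1):
        payload = fetch(item_url(item_id))
        # Re-serialize as-is so each output line is one JSONL record.
        yield json.dumps(payload)
```

Because `fetch` is a parameter, the loop can be exercised offline with a stub: `list(scrape_range(1, 2, fetch=lambda url: None))` gives `['null', 'null']`.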

No cleaning, validation or filtering was performed. The resulting data files are raw JSON API response dumps in zstd-compressed JSONL files. An example payload:
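Reading the files back needs a zstd decoder plus a line-by-line JSON parser. What follows is a minimal sketch, assuming the third-party `zstandard` package and exactly one API response per line. The function names and the comment filter are hypothetical illustrations, not part of the dataset's tooling. Since no filtering was done, the reader skips blank lines and JSON `null`s (which the API returns for missing ids), and `comments` drops items flagged `deleted` or `dead`, both of which are fields in the HN API item schema.

```python
import io
import json
from typing import IO, Iterable, Iterator

# Hypothetical helpers for illustration; not part of the dataset's tooling.


def iter_items(lines: IO[str]) -> Iterator[dict]:
    """Parse one JSON document per line; skip blank lines and `null` responses."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        if item is not None:  # raw dumps may contain API `null`s for missing ids
            yield item


def read_zst_jsonl(path: str) -> Iterator[dict]:
    """Stream-decompress a zstd-compressed JSONL file and yield its items."""
    import zstandard  # assumption: third-party package, `pip install zstandard`

    with open(path, "rb") as fh:
        reader = zstandard.ZstdDecompressor().stream_reader(fh)
        yield from iter_items(io.TextIOWrapper(reader, encoding="utf-8"))


def comments(items: Iterable[dict]) -> Iterator[dict]:
    """Keep live comments: type == "comment" and not flagged deleted or dead."""
    for item in items:
        if item.get("type") == "comment" and not item.get("deleted") and not item.get("dead"):
            yield item
```

Decompression is streamed, so a file never has to fit in memory. Filtering is left to the consumer, e.g. `comments(read_zst_jsonl("part.jsonl.zst"))`, where the file name is hypothetical.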
