Hacker News is a popular “hacker” news board. One thing I love about HN is that the moderation generally does an excellent job. The site is free of spam and the conversations are usually respectful and meaningful (if pessimistic at times). However, there is always room for improvement, and moderation on Hacker News is no exception.
Notice: on 2017-10-19 this article was updated to incorporate feedback the Hacker News moderators sent to me to clarify some of the points herein. You may view a diff of these changes here.
For some time now, I’ve been scraping the HN API and website to learn how the moderators work, and to gather some interesting statistics about posts there in general. Every 5 minutes, I take a sample of the front page, and every 30 minutes, I sample the top 500 posts (note that HN may return fewer than this number). During each sample, I record the ID, author, title, URL, status (dead/flagged/dupe/alive), score, number of comments, and observed rank, and I also compute the rank that HN’s published algorithm would give the post. A note is made when the title, URL, or status changes.
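To make the sampling a bit more concrete, here is a minimal sketch of what one sample record, the computed ranking score, and the change check might look like. The `Sample` class, its field names, and the helper functions are hypothetical and are not taken from the scraper's actual code. The formula is the commonly cited form of HN's ranking, `(points - 1) / (age_hours + 2) ** 1.8`, which I am assuming is the "published algorithm" referred to above; the real site also applies penalties that this ignores.

```python
from dataclasses import dataclass
from typing import List

# Assumption: gravity exponent from the commonly cited form of HN's ranking formula.
GRAVITY = 1.8

# Fields whose changes are worth noting between two samples of the same post.
WATCHED_FIELDS = ("title", "url", "status")


@dataclass(frozen=True)
class Sample:
    """One observation of a post (hypothetical layout, for illustration only)."""
    post_id: int
    author: str
    title: str
    url: str
    status: str    # "alive", "dead", "flagged", or "dupe"
    score: int
    comments: int
    rank: int      # observed position on the page, 1-based


def ranking_score(points: int, age_hours: float) -> float:
    """Ranking score under the assumed formula.

    Sorting posts by this value, highest first, gives the computed rank.
    """
    return (points - 1) / (age_hours + 2) ** GRAVITY


def changed_fields(prev: Sample, curr: Sample) -> List[str]:
    """Return the watched fields that differ between two samples."""
    return [name for name in WATCHED_FIELDS
            if getattr(prev, name) != getattr(curr, name)]
```

Comparing the observed `rank` against the order implied by `ranking_score` is what makes it possible to spot posts that sit higher or lower than their score and age alone would suggest.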
The information gathered is publicly available at hn.0x2237.club (sorry about the stupid domain, I just picked one at random). You can search for most posts there going back to 2017-04-14, as well as view recent title and URL changes or deleted posts (score > 10). Raw data is available as JSON for any post at https://hn.0x2237.club/post/:id/json. Feel free to explore the site later, or its shitty code. For now, let’s dive into what I’ve learned from this data.