A machine learning librarian at Hugging Face just released a dataset composed of one million Bluesky posts, complete with when they were posted and who posted them, intended for machine learning research.
First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts
1M public posts from Bluesky's firehose API
Includes text, metadata, and language predictions
Perfect to experiment with using ML for Bluesky
huggingface.co/datasets/blu...
“This dataset contains 1 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data,” the dataset description says. “Each post contains text content, metadata, and information about media attachments and reply relationships.”