
Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

submitted by
Style Pass
2024-11-27 07:00:06

A machine learning librarian at Hugging Face just released a dataset composed of one million Bluesky posts, complete with when they were posted and who posted them, intended for machine learning research.

First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts 🦋 📊 1M public posts from Bluesky's firehose API 🔍 Includes text, metadata, and language predictions 🔬 Perfect to experiment with using ML for Bluesky 🤗 huggingface.co/datasets/blu...

“This dataset contains 1 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data,” the dataset description says. “Each post contains text content, metadata, and information about media attachments and reply relationships.”
