YARS is a Python package that simplifies scraping Reddit for posts, comments, user data, and other media, and it also includes a few utility functions. It relies on the requests module to fetch data from Reddit’s public API through simple .json requests, so no official Reddit API keys are needed, which keeps it lightweight and easy to use.
Use with rotating proxies, or Reddit might gift you with an IP ban. The most I could extract at once from 'r/all' using this was 2552 posts. Here is a 7.1 MB JSON file containing the top 100 posts from 'r/nosleep', which includes post titles, body text, all comments and their replies, post scores, upload times, and more.
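As a rough illustration of the rotating-proxy advice, the sketch below cycles through a list of proxies and produces the mapping format that the requests module accepts through its `proxies` argument. The proxy addresses and the `proxy_pool` helper are placeholders made up for this example. Whether and how YARS itself accepts proxy settings is not shown here.

```python
from itertools import cycle

# Placeholder proxy addresses (assumption); replace with proxies you control.
PROXY_URLS = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]


def proxy_pool(proxy_urls):
    """Yield requests-style proxy mappings in round-robin order, forever."""
    for url in cycle(proxy_urls):
        yield {"http": url, "https": url}


pool = proxy_pool(PROXY_URLS)
first = next(pool)
second = next(pool)
third = next(pool)  # wraps around to the first proxy again

# Each mapping can then be handed to requests, for example:
#   requests.get(url, proxies=next(pool), timeout=10)
```

Round-robin is the simplest rotation policy. A real setup would probably also drop proxies that start failing.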
The search_reddit method searches Reddit using a query string. Here, we search for posts containing "OpenAI" and limit the results to 3 posts. The display_results function then presents the results in a formatted way.
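To make the idea concrete without guessing at YARS internals, here is a minimal sketch of what a .json search request of this kind might look like. It assumes Reddit's public `search.json` endpoint and its standard "Listing" response shape. The helper names (`build_search_url`, `extract_posts`, `search_sketch`) and the User-Agent string are hypothetical and are not part of YARS.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.reddit.com/search.json"


def build_search_url(query, limit=3):
    """Return the search.json URL for a query (hypothetical helper)."""
    return f"{SEARCH_URL}?{urlencode({'q': query, 'limit': limit})}"


def extract_posts(listing):
    """Pull the title and permalink out of each child of a Listing payload."""
    children = listing.get("data", {}).get("children", [])
    return [
        {
            "title": child["data"].get("title"),
            "permalink": child["data"].get("permalink"),
        }
        for child in children
    ]


def search_sketch(query, limit=3):
    """Fetch and parse search results. Needs network access and requests."""
    import requests  # imported here so the helpers above work without it

    headers = {"User-Agent": "yars-readme-sketch/0.1"}  # assumed value
    response = requests.get(build_search_url(query, limit), headers=headers, timeout=10)
    response.raise_for_status()
    return extract_posts(response.json())


# Offline demonstration with a hand-written payload in the Listing shape.
sample_listing = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "A", "permalink": "/r/x/comments/1/a/"}},
        ]
    },
}
posts = extract_posts(sample_listing)
```

Calling `search_sketch("OpenAI", limit=3)` would perform the live request. The parsing is kept separate so it can be checked without touching the network.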
Next, we scrape details of a specific Reddit post by passing its permalink. If the post details are successfully retrieved, they are displayed using display_results. Otherwise, an error message is printed.
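The success-or-error flow described above can be sketched as follows. This assumes the usual shape of a Reddit post's .json response: a two-element list holding the post listing first and the comment listing second. `post_json_url` and `parse_post_details` are hypothetical helpers written for this example, not the YARS API, and the sample payload is hand-written.

```python
def post_json_url(permalink):
    """Turn a permalink such as '/r/sub/comments/id/slug/' into its .json URL."""
    return "https://www.reddit.com" + permalink.rstrip("/") + ".json"


def parse_post_details(payload):
    """Extract title, body, and top-level comments from a post payload.

    Returns None when the payload lacks the expected two-listing shape.
    """
    try:
        post = payload[0]["data"]["children"][0]["data"]
        comment_nodes = payload[1]["data"]["children"]
    except (IndexError, KeyError, TypeError):
        return None
    comments = [
        {"author": node["data"].get("author"), "body": node["data"].get("body")}
        for node in comment_nodes
        if node.get("kind") == "t1"  # keep comments, skip "more" placeholders
    ]
    return {
        "title": post.get("title"),
        "body": post.get("selftext"),
        "comments": comments,
    }


# Hand-written sample payload in the assumed shape.
sample_payload = [
    {"data": {"children": [{"kind": "t3", "data": {"title": "T", "selftext": "B"}}]}},
    {"data": {"children": [
        {"kind": "t1", "data": {"author": "u1", "body": "hi"}},
        {"kind": "more", "data": {}},
    ]}},
]

details = parse_post_details(sample_payload)
if details:
    print(details["title"], len(details["comments"]))
else:
    print("Failed to scrape post details")
```

Returning None on an unexpected payload keeps the caller's check to a single `if`, which mirrors the display-or-error behaviour described above.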