
Stealthy Browsing and Scraping with Ferrum


2025-01-21 11:30:02

I cover everything, from setting basic headers to proxying requests, including extra stealth evasions, rotating your user agents and more.

I've been using Ferrum to do a lot of web scraping lately (I'm building a link checker tool), and I wanted to share some tips and best practices I've stumbled on or developed along the way.

Ferrum is a headless browser driver, similar to Playwright and Puppeteer, which you can use to automatically visit, interact with, and scrape data from websites. It has been gaining popularity in the Ruby on Rails community for being fast and Ruby-native. I recently needed to do some web scraping for Affimon, so I decided to give Ferrum a go.

In this article, I share everything I've learned about stealthy scraping with Ferrum — how to avoid basic blocks, preserve bandwidth, rotate user agents and integrate with proxies. I've also included a few sites in this article's appendix to test your bot detection.

We start by setting a bunch of "normal-looking" browser headers, which I copied directly from the Chrome Network tab on my MacBook. You should match these headers as closely as possible to the actual machine you're scraping from, so consider adjusting the User-Agent and the client hints if you use a different version of Chrome, a different OS, etc.
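Here's a minimal sketch of what that can look like. The header values below are illustrative stand-ins for a recent Chrome on macOS, not the exact set from my Network tab, and `stealth_headers` and `apply_stealth_headers` are hypothetical helper names I made up for this example. The only Ferrum call it relies on is `browser.headers.set`.

```ruby
# Illustrative values only: copy the real ones from your own Chrome
# Network tab so they match the machine you scrape from.
def stealth_headers(chrome_major: "131")
  {
    # The User-Agent and the Sec-CH-UA hints must agree on the Chrome version.
    "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \
                    "AppleWebKit/537.36 (KHTML, like Gecko) " \
                    "Chrome/#{chrome_major}.0.0.0 Safari/537.36",
    "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9," \
                "image/avif,image/webp,*/*;q=0.8",
    "Accept-Language" => "en-US,en;q=0.9",
    # The "Not_A Brand" entry changes between Chrome releases; treat it as a placeholder.
    "Sec-CH-UA" => %("Google Chrome";v="#{chrome_major}", "Chromium";v="#{chrome_major}", "Not_A Brand";v="24"),
    "Sec-CH-UA-Mobile" => "?0",
    # Change this (and the OS token in the User-Agent) if you are not on macOS.
    "Sec-CH-UA-Platform" => %("macOS")
  }
end

# Hypothetical helper: hands the headers to anything that exposes
# Ferrum's `headers.set` interface, then returns them.
def apply_stealth_headers(browser, headers = stealth_headers)
  browser.headers.set(headers)
  headers
end

# Usage (needs the ferrum gem and a local Chrome install):
#   require "ferrum"
#   browser = Ferrum::Browser.new(headless: true)
#   apply_stealth_headers(browser)
#   browser.go_to("https://example.com")
#   browser.quit
```

Building the hash in one place keeps the Chrome version consistent across the User-Agent and the client hints, since a mismatch between the two is an easy signal for bot detection to pick up.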