I scraped 500+ real estate projects from private portals

submitted by
Style Pass
2024-10-01 10:00:08

I scraped over 500 real estate projects from private portals. Here’s an overview of the technical challenges encountered and how this data can turn into business opportunities for my client.

The first challenge was getting access to these portals. Luckily, with a few good contacts, we managed to get accounts for 6 major French real estate developers’ portals without too much trouble.

Next, we had to deal with the tech behind these portals: old systems with no public API. I had to write some "homemade" scripts, but that’s also what makes scraping interesting.

On top of that, the sites were often slow and would go "offline" at certain times, like nights and weekends. It’s a bit frustrating when you want to make progress and everything is at a standstill.
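One common way to cope with slow or temporarily unreachable sites is to retry with an increasing delay. The sketch below is only an illustration of that idea, not the scripts used on this project: `http_get`, `fetch_with_retry`, the timeout, and the delay values are all assumptions. Retries help with short outages; for planned downtime such as nights and weekends, running the job at other hours is probably the simpler fix.

```python
import time
import urllib.request


def http_get(url, timeout=60):
    """Plain GET with a generous timeout, since the portals can be slow."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_retry(url, fetch=http_get, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url); on a network error, wait and try again.

    The wait doubles after each failure (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller can log it or reschedule.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # OSError covers urllib's URLError, socket timeouts and connection resets.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `fetch` and `sleep` as parameters is a design choice that lets the retry logic be exercised without touching the network or actually waiting.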

I always prefer direct HTTP requests over a headless browser for scraping. Why? Simply because browser automation tools (like Puppeteer or Selenium) are very resource-intensive. They run a full browser engine to reproduce a real user’s navigation, which is useful when a page depends on JavaScript but overkill for most scraping tasks. In this case, I managed to do everything with HTTP requests and avoided the overhead of a headless browser.
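To give an idea of what the HTTP-only approach can look like, here is a minimal sketch using only Python's standard library. Everything specific in it is hypothetical: the `/login` and `/projects` paths, the `username` and `password` form fields, and the `project` CSS class are placeholders, since each portal has its own pages and markup.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class ProjectLinkParser(HTMLParser):
    """Collect (href, text) pairs for <a> tags carrying a 'project' class (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.projects = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "project" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a matching link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.projects.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_projects(html):
    parser = ProjectLinkParser()
    parser.feed(html)
    parser.close()
    return parser.projects


def make_session():
    """Build an opener that stores cookies, so a login persists across requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_and_list(opener, base_url, username, password):
    # Hypothetical endpoints and form field names; adapt to the actual portal.
    form = urllib.parse.urlencode({"username": username, "password": password}).encode()
    opener.open(base_url + "/login", data=form, timeout=30).close()
    with opener.open(base_url + "/projects", timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return parse_projects(html)
```

The parsing step can be tried on its own with a string, for example `parse_projects('<a class="project" href="/p/1">Residence A</a>')`, without starting a browser or even opening a connection.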
