Data on web pages don't always pre-exist on the HTML. Instead, they often require steps before they are rendered using browser actions. One of such

How to Scrape Forms

submited by
Style Pass
2024-05-10 15:30:36

Data on web pages don't always pre-exist on the HTML. Instead, they often require steps before they are rendered using browser actions. One of such action is forms, which render data behind clicks or entering credentials.

Note that httpx can be replaced with any other HTTP client, such as requests. As for parsel, another alternative is BeautifulSoup.

Both of the above form scraping methods are similar, except for the way they handle the required request configuration. The first approach is ideal for requests that have static request configuration. However, in most cases, these values expire over time, making the second approach more suitable.

To scrape the form of our target web page, we'll replicate the above two requests within our script. The easiest way to replicate them is by copying as cURL or importing them to Postman:

In the above code, we start by initiating an httpx client with the request headers and use it to send the form POST request for the auth token cookie retrieval. Then, we utilize the obtained auth token while requesting the target web page itself.

Leave a Comment