Can AI agents do scraping without guidance? Yes

Submitted by
Style Pass
2024-09-24 10:30:04

Now that agents can run code safely, I wanted to test their ability to scrape data from websites without my specifying any input requirements such as elements or classes. The result is shockingly good, considering I spent less than 4 hours on it.

Not all LLMs give good results. I tried DeepSeek Coder, Llama 3 70B, Mixtral Instruct 22B and GPT-4o. Only GPT-4o produced code that ran successfully within a few attempts; the others still had not generated correct code after 5 attempts.
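As a rough sketch of what an "attempt" means here: the agent generates code, runs it, and feeds any error back into the next generation. The post does not show the actual agent code, so `generate_code` and `run_code` below are hypothetical callables standing in for the LLM call and the sandboxed runner.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures, not taken from the original post:
#   generate_code(task, feedback) -> source code as a string
#   run_code(source)              -> (succeeded, output_or_error_text)
GenerateFn = Callable[[str, Optional[str]], str]
RunFn = Callable[[str], Tuple[bool, str]]


def solve_with_retries(task: str, generate_code: GenerateFn, run_code: RunFn,
                       max_attempts: int = 5) -> Tuple[Optional[str], int]:
    """Generate code, run it, and retry with the error text until it works.

    Returns (output, attempts_used); output is None if every attempt failed.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = generate_code(task, feedback)
        succeeded, output = run_code(source)
        if succeeded:
            return output, attempt
        # Hand the error text to the next generation attempt.
        feedback = output
    return None, max_attempts
```

With a cap of 5, a model that never produces working code returns `(None, 5)`, which matches how the comparison above was scored.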

Running the code in Docker requires setting up the right Docker environment for it. The job is not only generating the code but also ensuring the environment has the required libraries and dependencies.
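One way to run a generated script in a throwaway container is to shell out to the Docker CLI. This is a minimal sketch, not the setup used in the post: the function names are made up for illustration, and it assumes you have already built an image that contains Python plus whatever libraries the generated code imports.

```python
import subprocess
from pathlib import Path
from typing import List


def docker_command(script: Path, image: str) -> List[str]:
    """Build a `docker run` command that executes one script in a fresh container."""
    workdir = script.resolve().parent
    return [
        "docker", "run", "--rm",       # remove the container when it exits
        "-v", f"{workdir}:/work:ro",   # mount the script's folder read-only
        "-w", "/work",                 # run from the mounted folder
        image,                         # assumed to have Python and the needed libraries
        "python", script.name,
    ]


def run_in_docker(script: Path, image: str, timeout_s: int = 120) -> subprocess.CompletedProcess:
    """Run the script and capture stdout/stderr so errors can be sent back to the LLM."""
    return subprocess.run(docker_command(script, image),
                          capture_output=True, text=True, timeout=timeout_s)
```

For example, `run_in_docker(Path("scrape.py"), "scraper-env:latest")` would run the script in a hypothetical `scraper-env` image. If a library is missing from the image, the import error shows up in `stderr`, which is exactly the environment problem described above.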

I had to tweak the prompt to point the LLM at the technical requirements. For instance, I asked it to use Playwright, and after a few attempts I noticed that Chromium kept timing out, so I added an instruction to the prompt (always use Firefox).
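To make the idea concrete, here is a sketch of appending such constraints to the task prompt, followed by the kind of Playwright code they steer the model towards. The prompt wording and the function names are illustrative assumptions; the post does not give the exact prompt.

```python
from typing import List

# Illustrative wording only; the exact prompt text is not given in the post.
TECH_REQUIREMENTS = [
    "Use Playwright for browser automation.",
    "Always use Firefox.",
]


def build_prompt(task: str, requirements: List[str]) -> str:
    """Append explicit technical constraints to the scraping task."""
    rules = "\n".join(f"- {r}" for r in requirements)
    return f"{task}\n\nTechnical requirements:\n{rules}"


def fetch_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page with Playwright driving Firefox and return its HTML."""
    # Imported here so build_prompt works even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Keeping the constraints in a list makes it easy to add another rule each time a failure pattern shows up, which is how the Firefox rule came about.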

All this was done with a single agent. Adding more agents to spec the requirements, do QA, analyse the output and so on would go a lot further.
