2024-04-24 05:30:03

We are thrilled to release Llama-3-8B-Web, the most capable agent built with 🦙 Llama 3 and finetuned for web navigation with dialogue.

The model can be downloaded from the 🤗 Hugging Face Model Hub as McGill-NLP/Llama-3-8B-Web, and the training and evaluation data are available on the Hub as McGill-NLP/WebLINX.

Our first agent is a finetuned version of Meta-Llama-3-8B-Instruct, which was recently released by the Meta GenAI team. We finetuned this model on the WebLINX dataset, which contains over 100K instances of web navigation and dialogue, each collected and verified by expert annotators. For training, we use a curated subset of 24K instances.

It surpasses GPT-4V (zero-shot *) by over 18 percentage points on the WebLINX benchmark, achieving an overall score of 28.8% on the out-of-domain test splits (compared to 10.5% for GPT-4V). It chooses more useful links (34.1% vs. 18.9% seg-F1), clicks on more relevant elements (27.1% vs. 13.6% IoU), and formulates more aligned responses (37.5% vs. 3.1% chr-F1).
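As a quick sanity check on these figures, the short Python sketch that follows recomputes the gap between Llama-3-8B-Web and zero-shot GPT-4V for each reported metric, both as an absolute difference in percentage points and as a ratio. It uses only the numbers quoted in this post; the `REPORTED` table and the `gaps` helper are hypothetical names introduced for illustration, not part of any released code.

```python
# Scores quoted in this post, in percent:
# metric -> (Llama-3-8B-Web, GPT-4V zero-shot).
REPORTED = {
    "overall": (28.8, 10.5),
    "seg-F1 (links)": (34.1, 18.9),
    "IoU (elements)": (27.1, 13.6),
    "chr-F1 (responses)": (37.5, 3.1),
}


def gaps(scores):
    """Return {metric: (absolute gap in percentage points, ratio ours/baseline)}."""
    result = {}
    for metric, (ours, baseline) in scores.items():
        diff = round(ours - baseline, 1)    # absolute difference, in points
        ratio = round(ours / baseline, 2)   # relative improvement factor
        result[metric] = (diff, ratio)
    return result


if __name__ == "__main__":
    for metric, (diff, ratio) in gaps(REPORTED).items():
        print(f"{metric:20s} +{diff:5.1f} points  ({ratio:.2f}x)")
```

The overall gap works out to 18.3 points (about 2.74x the GPT-4V score), so the "over 18" figure is an absolute difference in percentage points rather than a relative increase.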

We believe that short demo videos showing how well an agent performs are NOT enough to judge it. Simply put, we cannot know whether we have a good agent without good benchmarks. We need to systematically evaluate agents on a wide range of tasks, from simple instruction-following web navigation to complex dialogue-guided browsing.