This is best-in-class performance among web agents, giving advanced closed-source web agents like Google Mariner a run for their money.
Achieving this SOTA result required expanding Skyvern’s original architecture from a single actor prompt to a planner-actor-validator agent loop.
This approach was a good starting point, but it scored only ~45% on the WebVoyager benchmark. It was well suited to simple, single-objective goals, and could handle complex objectives only if the website provided enough feedback to the client.
❌ If you asked Skyvern to "Go to Amazon.com and add an iPhone 16, a screen protector, and a case to cart", it would add an iPhone 16 to the cart, then rely on visual feedback to determine what to do next. You would sometimes end up with 3 iPhone 16s in the cart, or 1 iPhone 16 and 2 screen protectors.
To solve this problem, we added a “Planner” phase that decomposes complex objectives into smaller, achievable goals.
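To make the idea concrete, here is a minimal sketch of a planner-actor-validator loop applied to the cart example above. It is not Skyvern's actual implementation: `Goal`, `Cart`, `plan`, `act`, `validate`, and `run` are hypothetical names, the cart is a toy in-memory list rather than a real browser session, and the planner is a simple rule instead of an LLM call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Goal:
    """One small, checkable sub-goal (hypothetical structure)."""
    item: str


@dataclass
class Cart:
    """Toy stand-in for browser state: the items currently in the cart."""
    items: List[str] = field(default_factory=list)


def plan(items: List[str]) -> List[Goal]:
    """Planner (assumed): split one large objective into one goal per item."""
    return [Goal(item=item) for item in items]


def act(goal: Goal, cart: Cart) -> None:
    """Actor (assumed): work on a single goal; here it just updates the toy cart."""
    cart.items.append(goal.item)


def validate(goal: Goal, cart: Cart) -> bool:
    """Validator (assumed): confirm the goal's item is in the cart exactly once."""
    return cart.items.count(goal.item) == 1


def run(items: List[str], max_attempts: int = 3) -> Cart:
    """Run each planned goal through the actor until the validator accepts it."""
    cart = Cart()
    for goal in plan(items):
        for _ in range(max_attempts):
            act(goal, cart)
            if validate(goal, cart):
                break  # goal confirmed, move on to the next one
        else:
            raise RuntimeError(f"could not complete goal: {goal.item}")
    return cart


if __name__ == "__main__":
    cart = run(["iPhone 16", "screen protector", "case"])
    print(cart.items)
```

Because the actor only ever sees one goal at a time and the validator checks that goal before the loop advances, the sketch cannot drift into adding the same item three times, which is the failure mode described above.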