
Claude plays GeoGuessr


Constantly seeking ways to evaluate and benchmark the capabilities of our AI models helps us understand their current performance and drives future development and improvements. In this spirit, I recently examined a fun task: how well can Claude play the popular web game GeoGuessr?

For those who are unfamiliar, GeoGuessr is an online game that challenges players to guess the location of a randomly selected Google Street View image. Players can pan the camera and navigate along roads to gather more context before placing their guess on a world map. The closer the guess is to the actual location, the higher the score.
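The exact scoring curve GeoGuessr applies isn't important for what follows; the underlying quantity is simply the distance between the guessed and the actual location. As a reference point, here is a minimal sketch (in Python, my choice) of the standard haversine great-circle distance between a guessed and a true coordinate pair:

import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: guessing Paris for an image actually taken in Berlin is off by roughly 878 km.
print(haversine_km(48.8566, 2.3522, 52.5200, 13.4050))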

Traditionally, GeoGuessr is a test of geographical knowledge, visual perception, and deductive reasoning. But could an AI excel at this task? I decided to find out.

While the full GeoGuessr game allows for camera movement and map-based guessing, I simplified the task for my experiments. I presented Claude with a single static Google Street View image and asked it to directly output its guess as a latitude–longitude coordinate pair. I used imagery from the OpenStreetView-5M dataset, which contains 5.1 million images spanning 225 countries; each image is annotated with latitude–longitude coordinates representing its geolocation. For these experiments, I used the test set containing 210,122 image–coordinate pairs.
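To make the setup concrete, here is a minimal sketch of how a single image can be sent to Claude and the reply parsed into a coordinate pair, using the Anthropic Python SDK's Messages API. The model name, prompt wording, and regex-based parsing are my own assumptions for illustration, not necessarily what was used in these experiments:

import base64
import re

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def guess_location(image_path: str) -> tuple[float, float] | None:
    """Send one Street View image to Claude and parse a 'latitude, longitude' guess."""
    with open(image_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed model; the post does not name one
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64}},
                {"type": "text",
                 "text": "Where was this Street View image taken? "
                         "Answer with only 'latitude, longitude' in decimal degrees."},
            ],
        }],
    )

    # Pull the first 'number, number' pair out of the reply (hypothetical parsing).
    text = message.content[0].text
    match = re.search(r"(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)", text)
    return (float(match.group(1)), float(match.group(2))) if match else None

Each parsed guess can then be scored against the image's annotated coordinates, for example with the haversine distance shown earlier.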
