WebMarker is a JavaScript library for adding visual markers and labels to elements on a web page. It can be used for Set-of-Mark prompting, which improves the visual grounding abilities of vision-language models such as GPT-4o, Claude 3.5, and Google Gemini 1.5. This library aims to:
This marks the interactive elements on the page and returns an object containing the marked elements. Each key is a mark label string, and each value is an object with the following properties:
A CSS style to apply to the bounding box element. You can also specify a function that returns a CSS style object. Bounding boxes are only shown if showBoundingBoxes is true.
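The "style object or function returning a style object" pattern described above can be illustrated with a small sketch. This is not WebMarker's internal code: `resolveStyle` is a hypothetical helper, and the assumption that the function form receives the marked element as its argument is ours, not something stated here. Plain objects stand in for DOM elements so the snippet runs outside a browser.

```javascript
// Hypothetical helper (not part of WebMarker): normalizes an option that may be
// either a style object or a function returning a style object.
function resolveStyle(styleOption, element) {
  const style =
    typeof styleOption === "function" ? styleOption(element) : styleOption;
  // Copy so callers cannot mutate the original option object.
  return { ...(style ?? {}) };
}

// Form 1: a fixed style object applied to every bounding box.
const staticStyle = { outline: "2px dashed red", backgroundColor: "transparent" };

// Form 2: a function that derives the style from the element.
// Assumption: the function is called with the marked element.
const dynamicStyle = (element) => ({
  outline: element.tagName === "BUTTON" ? "2px solid green" : "2px dashed red",
});

// Stand-ins for DOM elements; in a browser these would be real Element nodes.
const fakeButton = { tagName: "BUTTON" };
const fakeLink = { tagName: "A" };

const buttonStyle = resolveStyle(dynamicStyle, fakeButton);
const linkStyle = resolveStyle(dynamicStyle, fakeLink);
const fixedStyle = resolveStyle(staticStyle, fakeLink);

console.log(buttonStyle, linkStyle, fixedStyle);
```

The function form is useful when different kinds of elements should stand out differently, for example buttons versus links. Either form only has a visible effect when showBoundingBoxes is true.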