These methods all accept a natural-language prompt as their parameter, which should greatly reduce the cost of maintaining scripts.
To quickly experience the main features of Midscene, you can use the Midscene Chrome extension. It allows you to use Midscene on any webpage without writing any code.
Maintaining automation scripts with Midscene can be a brand-new experience. For example, a search for headphones on a website can be written as a few natural-language instructions.
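A minimal sketch of that headphone search might look like the following. It assumes an agent object exposing `aiAction`, `aiQuery`, and `aiAssert`. The `PromptAgent` interface, the `Item` type, and the `RecordingAgent` stub are hypothetical and exist only so the example runs without a browser or a model key. The prompt wording is illustrative.

```typescript
// Assumed minimal shape of the agent methods used below. The interface is
// hypothetical; it is declared here only to keep the sketch self-contained.
interface PromptAgent {
  aiAction(prompt: string): Promise<void>;
  aiQuery<T>(prompt: string): Promise<T>;
  aiAssert(prompt: string): Promise<void>;
}

// Hypothetical result type for the data extracted from the page.
interface Item {
  itemTitle: string;
  price: number;
}

// Each step is described in natural language instead of selectors.
async function searchHeadphones(agent: PromptAgent): Promise<Item[]> {
  // Interact: type the keyword and run the search.
  await agent.aiAction('type "Headphones" in search box, hit Enter');

  // Extract: describe the data shape wanted from the result list.
  const items = await agent.aiQuery<Item[]>(
    "{itemTitle: string, price: number}[], find the items in the list and their prices",
  );

  // Verify: state the expected page condition in plain language.
  await agent.aiAssert("There is a category filter on the left");

  return items;
}

// Hypothetical stand-in that only records prompts; it does not drive a
// browser or call a model, and its query always returns an empty list.
class RecordingAgent implements PromptAgent {
  readonly prompts: string[] = [];

  async aiAction(prompt: string): Promise<void> {
    this.prompts.push(prompt);
  }

  async aiQuery<T>(prompt: string): Promise<T> {
    this.prompts.push(prompt);
    return [] as unknown as T;
  }

  async aiAssert(prompt: string): Promise<void> {
    this.prompts.push(prompt);
  }
}
```

With a real Midscene agent in place of `RecordingAgent`, the same `searchHeadphones` function should work unchanged, provided the agent exposes methods with these names.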
Midscene provides a visual report after each run. With this report, you can review the animated replay and view the details of each step in the process. What's more, there is a playground in the report file for you to adjust your prompt without re-running all your scripts.
By default, Midscene currently uses the OpenAI GPT-4o model. If needed, you can switch to a different multimodal model, such as Gemini or Qwen.
All data gathered from pages is sent directly to OpenAI or to the custom model provider you have configured, so no other third-party platform has access to it.