The idea is to have images taken at a set interval, which are then described using an AI model, and read back to the user using voice synthesis. Since

Search code, repositories, users, issues, pull requests...

submited by
Style Pass
2025-01-04 11:00:04

The idea is to have images taken at a set interval, which are then described using an AI model, and read back to the user using voice synthesis.

Since I was going for low cost (<30$), and wanted to learn more about software development on arduino, I bought a ESP32-CAM with built-in WiFi to capture the images.

To describe the image I selected the gpt-4o-mini model. I didn't think much about which model to use, but this seemed like a good start.

One driving force for creating this tool is how expensive the alternatives are. However alternatives seem to be emerging at the moment so there is hope.

Use a cell phone with internet sharing enabled that the ESP32-CAM connects to. Then software on the ESP32-CAM uploads images to a HTTP server, which then provides the image the OpenAI API which returns the description of the image. The description is updated on a page in the HTTP server which the cellphone has open.

The source code for the ESP32-CAM is located in the "esp32" folder. Open it using the Arduino IDE. Update the WiFi credentials and target HTTP server before flashing. Some links that may help with installing the ESP32-CAM drivers.

Leave a Comment