Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust. We ship daily; make suggestions, post bugs, give feedback.
Building a reliable stream of screenshot and audio data, where a user clicks a button once and a process runs in the background 24/7 capturing and extracting data from screen and audio input/output, is a frustrating problem.
There are numerous use cases that can be built on top of this layer. To simplify life for other developers, we decided to solve this non-trivial problem. It's still in its early stages, but it works end-to-end. We're working on this full-time and would love to hear your feedback and suggestions.
by default screenpipe uses Silero-VAD to separate speech from non-speech segments and improve audio transcription, but you can switch to WebRTC VAD if needed.
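a minimal sketch of the invocation, assuming the `--vad-engine` flag from recent screenpipe CLI builds (run `screenpipe --help` to confirm the flag name on your version):

```bash
# switch from the default silero VAD to WebRTC VAD
screenpipe --vad-engine webrtc
```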
by default screenpipe uses whisper-large, which runs locally for better quality; if you want lower local compute you can use a cloud model instead (we use Deepgram) via its cloud api.
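again a sketch, assuming the `--audio-transcription-engine` flag from recent screenpipe CLI builds (run `screenpipe --help` to confirm the exact name and accepted values):

```bash
# offload transcription to deepgram's cloud API instead of the local whisper-large model
# note: cloud transcription may require API credentials; see the screenpipe docs
screenpipe --audio-transcription-engine deepgram
```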