We were invited to speak at OpenAI DevDay Singapore today (video), and as part of our talk we worked on building a coding voice AI agent (demo). Since its release at DevDay SF, we've been building lots of ideas with the Realtime API, and have benefited greatly from the experience and advice of Kwindla Hultman Kramer, cofounder of Daily.co, which has been in the realtime voice business since before it was cool.
So today, as part of our talk, we are releasing this guest post, vetted by OpenAI staff, in which he discusses his learnings from building Pipecat, the open source project started by Daily. Pipecat has since grown into a full vendor-neutral Realtime API framework, with more non-Daily users than Daily ones (including us!).
But first we wanted to share a couple of tips that *we* have learned working with the raw Realtime API (no frameworks, no external dependencies), especially while prepping for our talk at DevDay Singapore. The standard OpenAI reference application comes with a lot of batteries included, so we stripped out as much as possible while keeping the focus on VAD and function calling, producing our `simple-realtime-console` demo.
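For the curious, here is a minimal sketch of what that stripped-down core looks like over a raw WebSocket: configure server-side VAD with `session.update`, register one function tool, and answer the model's tool calls. The model snapshot, the `run_code` tool, and its placeholder output are illustrative assumptions of ours, not part of the reference app:

```ts
import WebSocket from "ws";

// Connect directly to the Realtime API (server-side; never expose your key in a browser).
const url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01";
const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", () => {
  // Turn on server-side VAD and register a single (hypothetical) coding tool.
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      turn_detection: { type: "server_vad", threshold: 0.5, silence_duration_ms: 500 },
      tools: [{
        type: "function",
        name: "run_code", // hypothetical tool for a coding voice agent
        description: "Execute a JavaScript snippet and return the result",
        parameters: {
          type: "object",
          properties: { code: { type: "string" } },
          required: ["code"],
        },
      }],
      tool_choice: "auto",
    },
  }));
});

ws.on("message", (raw) => {
  const event = JSON.parse(raw.toString());
  // When the model finishes streaming a tool call, send back a result
  // and explicitly ask for the follow-up spoken response.
  if (event.type === "response.function_call_arguments.done") {
    ws.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify({ result: "ok" }), // placeholder tool result
      },
    }));
    ws.send(JSON.stringify({ type: "response.create" }));
  }
});
```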
Practically speaking, Voice Activity Detection (VAD) is still sometimes buggy, and you will usually be demoing voice applications in imperfect environments (it's rare to actually be in a quiet room). Hence we recommend always having "mute" and "force reply" buttons, as we show in the demo. The demo also shows simple patterns for inserting memory into the conversation and displaying transcripts of both sides.
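At the event level, these escape hatches are only a few lines each. The sketch below shows one plausible implementation, assuming the `ws` connection from the sketch above and an audio capture pipeline that hands you base64-encoded PCM chunks (the function names are ours; the demo's actual code may differ):

```ts
let muted = false;

// Mute by simply not forwarding audio: server VAD can't false-trigger
// on room noise it never receives.
function sendAudioChunk(pcmChunkBase64: string) {
  if (muted) return;
  ws.send(JSON.stringify({ type: "input_audio_buffer.append", audio: pcmChunkBase64 }));
}

function toggleMute() {
  muted = !muted;
}

// Force a reply right now instead of waiting for VAD to decide
// the user has stopped talking.
function forceReply() {
  ws.send(JSON.stringify({ type: "response.create" }));
}

// Insert "memory" as a plain conversation item, so the model treats it
// as prior context on its next turn.
function insertMemory(text: string) {
  ws.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text }],
    },
  }));
}
```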