I built Speech-to-Note, an innovative web application that combines speech recognition and musical note detection. The application allows users to record audio (either speech or singing) and processes it in two ways:
The application features a modern, responsive UI built with React and TailwindCSS, and a robust backend powered by FastAPI. It's particularly useful for musicians, music teachers, and anyone interested in analyzing the musical properties of their voice or instruments.
AssemblyAI's Universal-2 Speech-to-Text Model was integrated into the application through their Python SDK. The implementation can be found in the upload_audio endpoint of our FastAPI backend:
This creates a unique tool that bridges the gap between spoken word and musical notation, making it valuable for various musical applications, from education to composition.
The project demonstrates how AssemblyAI's technology can be combined with custom audio processing to create innovative applications that go beyond simple speech-to-text conversion.