WellSaid Labs, a leading artificial intelligence (AI) voice company, unveiled a new technology today that lets users direct the performance of AI voices in a more natural, nuanced way. The technology, called HINTS (Highly Intuitive Naturally Tailored Speech), enables content creators to shape AI voices by adding contextual annotations, such as tempo or loudness adjustments, much as a movie director guides an actor's performance.
“We have long heard from our customers that they would like to have more direction in shaping our AI’s vocal outputs,” Michael Petrochuk, co-founder and CTO of WellSaid Labs, said in an exclusive interview with VentureBeat. “We wanted to develop a system that is intuitive and natural, that allows our model to predict natural performances based on the users’ production context, so that creatives can better see their artistic vision through.”
Unlike current methods of controlling AI voices through rigid markup languages or prompts, HINTS allows fine-grained, interpolable adjustments. For example, users can slow a specific passage to precisely 0.7x speed or raise its loudness by 5 dB, and the AI voice responds naturally. Because the system is context-aware, annotations can be nested and layered across long scripts.
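To make the idea of nested, layered annotations concrete, here is a minimal sketch of how such adjustments might compose. The real HINTS syntax and API are not public, so the class names, fields, and composition rules below (tempo multipliers multiplying, decibel offsets adding) are assumptions for illustration only, not WellSaid Labs' actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int                # first word index the annotation covers
    end: int                  # last word index covered (inclusive)
    tempo: float = 1.0        # relative speed multiplier (e.g. 0.7 = slower)
    loudness_db: float = 0.0  # gain adjustment in decibels

def effective_settings(annotations, word_index):
    """Combine every annotation covering a word: tempo multipliers
    multiply and decibel offsets add, so nested layers compose."""
    tempo, loudness = 1.0, 0.0
    for a in annotations:
        if a.start <= word_index <= a.end:
            tempo *= a.tempo
            loudness += a.loudness_db
    return {"tempo": tempo, "loudness_db": loudness}

# A script-wide annotation with a narrower, more emphatic span layered on top.
layers = [
    Annotation(start=0, end=99, tempo=0.9),                    # whole passage slightly slower
    Annotation(start=10, end=20, tempo=0.7, loudness_db=5.0),  # emphasized inner span
]

print(effective_settings(layers, 15))  # inside both layers: effects stack
print(effective_settings(layers, 5))   # inside only the outer layer
```

Composing adjustments this way is what makes them interpolable: a word inside both layers above ends up at 0.9 × 0.7 of normal speed, while a word outside the inner span keeps only the script-wide 0.9x setting.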