Assessing ASR performance with meaning preservation

We report progress on using large language models (LLMs) to assess whether ASR transcripts preserve the meaning of the original utterance, and propose meaning preservation as an alternative metric to WER, especially for low-resource scenarios and atypical speech.
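One simple way to frame such an LLM-based check is as a binary judgment of whether a transcript preserves the reference meaning. The sketch below is a minimal illustration under that assumption; the prompt wording and the `ask_llm` helper are hypothetical placeholders, not the evaluation setup described in this post.

```python
# A minimal sketch of an LLM-as-judge check for meaning preservation.
# The prompt wording and the `ask_llm` helper are illustrative assumptions,
# not the evaluation setup used in the work described above.

MEANING_PROMPT = """\
You will see a reference sentence and an ASR transcript of the same utterance.
Answer YES if the transcript preserves the meaning of the reference, even if
it contains minor grammatical or word-level errors. Otherwise answer NO.

Reference: {reference}
Transcript: {transcript}
Answer:"""


def preserves_meaning(reference: str, transcript: str, ask_llm) -> bool:
    """`ask_llm` is any callable mapping a prompt string to a model reply."""
    reply = ask_llm(MEANING_PROMPT.format(reference=reference,
                                          transcript=transcript))
    return reply.strip().upper().startswith("YES")
```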

Word error rate (WER) and its complement, word accuracy (WACC = 1 − WER), are established metrics for assessing the word-level accuracy of automatic speech recognition (ASR) models. However, these metrics do not measure a critical aspect of ASR performance: comprehensibility. This limitation is especially pronounced for users with atypical speech patterns, for whom WER often exceeds 20% and can surpass 60% at certain severity levels. Despite this, individuals with atypical speech patterns can still benefit from ASR models with relatively high WER, as long as the comprehensibility, or meaning, of their speech is preserved. This is particularly relevant for use cases such as live conversations, voice input for text messages, home automation, and other applications that tolerate minor grammatical errors. Indeed, these users and use cases stand to benefit most from meaning-preserving ASR models, which can greatly improve communication.
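For concreteness, here is a minimal sketch of how WER and WACC are typically computed, using word-level Levenshtein (edit) distance. This is the standard formulation, not code from the post.

```python
# WER = (substitutions + insertions + deletions) / reference length,
# computed via word-level Levenshtein distance; WACC = 1 - WER.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def wacc(reference: str, hypothesis: str) -> float:
    return 1.0 - wer(reference, hypothesis)
```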

Below are a few examples demonstrating that WACC does not accurately reflect the severity of a transcription error. For each of several error types, we present two examples with similar WACC: the first contains a rather benign error, while the second contains a more severe one.
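To make the pattern concrete, consider a hypothetical pair (constructed for illustration, not taken from the post's examples): two transcripts of the same utterance with identical WACC, where one error is benign and the other reverses the intended meaning. The snippet scores both with the `wer()`/`wacc()` sketch above.

```python
# Hypothetical pair, constructed for illustration and scored with the
# wer()/wacc() sketch above: identical WACC, very different severity.

reference = "please turn on the kitchen lights"
benign = "please turn on the kitchen light"    # lights -> light: meaning kept
severe = "please turn off the kitchen lights"  # on -> off: meaning reversed

for hyp in (benign, severe):
    print(f"{hyp!r}: WACC = {wacc(reference, hyp):.2f}")
# Both transcripts score WACC = 0.83, yet only the first preserves meaning.
```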
