How do I determine if a random string sounds like English?

submitted by
Style Pass
2024-09-23 15:30:37

I have an algorithm that generates strings based on a list of input words. How do I keep only the strings that sound like English words, i.e., discard RDLO while keeping LORD?

EDIT: To clarify, they do not need to be actual dictionary words. They just need to sound like English. For example, KEAL would be accepted.

In a nutshell: a Markov chain stores, for each character, the probabilities of which character will follow it. You can extend this idea to contexts of the previous two or three characters if you have enough memory.
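A minimal sketch of the first-order Markov-chain idea in Python, assuming you have a list of English words to train on. The tiny `sample_words` list is hypothetical and only illustrative, and the helper names `train_markov` and `log_score` are made up for this example. Padding words with `^` and `$` so word starts and ends are modelled, add-one smoothing, and averaging log-probabilities per transition are design choices of this sketch, not part of the answer. The score cutoff that counts as "sounds English" is not specified and would have to be tuned on real data.

```python
import math
from collections import defaultdict

def train_markov(words, order=1):
    """Count which character follows each context of `order` characters."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        # '^' marks the start of a word and '$' its end, so the model also
        # learns which letters tend to begin and end English words.
        padded = "^" * order + word.lower() + "$"
        for i in range(order, len(padded)):
            context = padded[i - order:i]
            counts[context][padded[i]] += 1
    return counts

def log_score(counts, candidate, order=1, alphabet_size=27):
    """Average log-probability per transition (higher = more English-like)."""
    padded = "^" * order + candidate.lower() + "$"
    total = 0.0
    steps = 0
    for i in range(order, len(padded)):
        context = padded[i - order:i]
        following = counts.get(context, {})
        seen = sum(following.values())
        # Add-one (Laplace) smoothing over 26 letters plus '$', so an unseen
        # transition is penalised instead of making the probability zero.
        prob = (following.get(padded[i], 0) + 1) / (seen + alphabet_size)
        total += math.log(prob)
        steps += 1
    return total / steps

# Hypothetical, deliberately tiny training list; use a real word list in practice.
sample_words = ["load", "lore", "read", "real", "deal", "meal", "keel", "old", "role"]
model = train_markov(sample_words)

for s in ["LORD", "KEAL", "RDLO"]:
    print(s, round(log_score(model, s), 3))
```

With this sample list, KEAL and LORD (neither of which appears in it) both score higher than RDLO. Passing `order=2` or `order=3` to both functions gives the longer contexts mentioned above, at the cost of a larger table.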

You could approach this by tokenizing a candidate string into bigrams (pairs of adjacent letters) and checking each bigram against a table of English bigram frequencies, rejecting strings that contain bigrams which are rare or absent in English.
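A minimal sketch of the bigram check in Python. It builds the frequency table from a word list rather than using published English bigram frequencies; the `sample_words` list, the helper names, and the `min_count` threshold are all hypothetical choices for illustration. A real table would come from a large corpus, and the threshold would need tuning.

```python
from collections import Counter

def bigrams(s):
    """Split a string into overlapping pairs of adjacent letters."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]

def build_bigram_table(words):
    """Count how often each bigram occurs across a list of English words."""
    table = Counter()
    for word in words:
        table.update(bigrams(word))
    return table

def sounds_english(candidate, table, min_count=1):
    """Accept the candidate only if every bigram in it occurs at least
    `min_count` times in the table (a Counter returns 0 for unseen bigrams)."""
    return all(table[bg] >= min_count for bg in bigrams(candidate))

# Hypothetical sample vocabulary; a real table would come from a large corpus.
sample_words = ["load", "lore", "read", "real", "deal", "meal", "keel", "old", "role", "word"]
table = build_bigram_table(sample_words)

for s in ["LORD", "KEAL", "RDLO"]:
    print(s, sounds_english(s, table))
```

Here LORD and KEAL pass, while RDLO is rejected because "dl" never occurs in the sample. Raising `min_count` makes the filter stricter. Unlike the Markov-chain approach, this check ignores which letters tend to start or end a word.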