John Battelle’s Searchblog points to Nexidia, a multimedia search company with an innovative approach to searching video: instead of relying on text transcripts, it indexes the phonetic content of the audio that accompanies the footage.
As archives of digital audio expand, and people need to find specific information within those archives, it becomes clear that a highly efficient method of searching recorded media is required. The metadata that currently tags audio information (such as title, date of recording, subject, or person) is not sufficient for the accurate and rapid retrieval of specifically requested data.
There are compelling reasons why using the Phonetic Search Engine (PSE) is preferable to using speech-to-text searches. The PSE has a completely open vocabulary. No base lexicon is required. In contrast, the speech-to-text method must map all words into lexicon entries. For example, if a word is not in the dictionary, the speech-to-text solution will not find it in the audio.
Another advantage of the PSE is that accuracy is not compromised for speed. Speech-to-text must limit its search and must make hard decisions about word bindings - else searches are too slow and unpredictable. This is why speech-to-text lexicons are never large enough and seldom contain enough key search terms, which are often proper names or unusual phrases.
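The open-vocabulary point can be illustrated with a toy sketch. This is not Nexidia's algorithm, which the material quoted here does not describe. Both the phoneme track and the transcript below are hand-written, hypothetical recognizer outputs, the function names are invented for this example, and exact matching stands in for the probabilistic scoring a real phonetic engine would presumably use.

```python
# Toy illustration only. Phonemes are ARPAbet-style symbols, and both
# "recognizer" outputs below are made up by hand for this example.

# Hypothetical phoneme stream from indexing audio of "call nexidia today".
PHONEME_TRACK = [
    "K", "AO", "L",                            # call
    "N", "EH", "K", "S", "IH", "D", "IY", "AH",  # nexidia
    "T", "AH", "D", "EY",                      # today
]

# Hypothetical speech-to-text output for the same audio. The lexicon has no
# entry for "nexidia", so the recognizer fell back on in-vocabulary words.
TRANSCRIPT = ["call", "next", "idea", "today"]

# Assumed pronunciations for the query terms. A real system could derive
# these from spelling rules, so no fixed lexicon would be needed.
QUERY_PHONEMES = {
    "nexidia": ["N", "EH", "K", "S", "IH", "D", "IY", "AH"],
    "today": ["T", "AH", "D", "EY"],
}


def transcript_search(word, transcript):
    """Return word positions where the transcript contains the query word."""
    return [i for i, w in enumerate(transcript) if w == word]


def phonetic_search(phonemes, track):
    """Return phoneme offsets where the query sequence occurs in the track."""
    n = len(phonemes)
    return [i for i in range(len(track) - n + 1) if track[i:i + n] == phonemes]


if __name__ == "__main__":
    for term, phonemes in QUERY_PHONEMES.items():
        print(term,
              "transcript hits:", transcript_search(term, TRANSCRIPT),
              "phonetic hits:", phonetic_search(phonemes, PHONEME_TRACK))
```

In this sketch the proper name is lost in the transcript because it was forced onto lexicon words, while the phoneme track still contains the sound sequence and can be matched directly.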














