How It Works: Speech Recognition

Speech recognition software is better and less expensive than ever. Find out how your words go from voice to text on the screen.

Speech recognition: a technology that transforms spoken words into alphanumeric text and navigational commands that can be recognized by a PC.

For years, speech recognition has been the poster child for technology that never lived up to its promise. Only three years ago, the products were expensive, inaccurate, and hard to use. That's changing. Fast PCs and ingenious software improvements mean that speech recognition technology finally offers real benefits. And it's appearing in places you might not have expected, including your mobile phone. Want to compose e-mail or surf the Web? All you'll have to do is talk.

Here's what you need to know:

  • You can dictate text into applications and control your desktop with up to 95 percent accuracy.

  • Speech recognition software requires a fast CPU, plenty of RAM, a good microphone, and a good sound card.

  • New developments let you take speech recognition to the Internet and even beyond your PC.

A computer doesn't speak your language, so it must transform your words into something it can understand. A microphone converts your voice into an analog signal and feeds it to your PC's sound card. An analog-to-digital converter takes the signal and converts it to a stream of digital data (ones and zeros). Then the software goes to work.
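The analog-to-digital step can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `digitize` function, the 16 kHz sample rate, the 16-bit depth, and the 440 Hz test tone standing in for a voice. A real sound card does this in hardware.

```python
import math

def digitize(analog, duration_s, sample_rate=16000, bits=16):
    """Sample a continuous signal and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1             # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                     # time of this sample, in seconds
        value = max(-1.0, min(1.0, analog(t)))  # clip to the converter's input range
        samples.append(round(value * max_level))
    return samples

def tone(t):
    """Stand-in for the microphone signal: a 440 Hz tone at half volume."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)           # 10 milliseconds of "audio"
```

Each entry in `pcm` is one number in the stream of digital data that the software receives.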

While each of the leading speech recognition companies has its own proprietary methods, the two primary components of speech recognition are common across products. The first piece, called the acoustic model, analyzes the sounds of your voice and converts them to phonemes, the basic elements of speech. English has roughly 40 to 50 phonemes, depending on the dialect and on how they are counted.

Here's how it breaks down your voice: First, the acoustic model removes noise and unneeded information such as changes in volume. Then, using mathematical calculations, it reduces the data to a spectrum of frequencies (the pitches of the sounds), analyzes the data, and converts the sounds into digital representations of phonemes.
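To make the "spectrum of frequencies" idea concrete, here is a rough sketch that uses a plain discrete Fourier transform on one short frame of samples. It is not any vendor's method. The function names, the 8 kHz sample rate, and the 500 Hz test tone are assumptions for illustration, and the sketch handles only volume normalization, not noise removal or the mapping from spectra to phonemes.

```python
import cmath
import math

def normalize(frame):
    """Scale a frame so its loudest sample has magnitude 1 (discard volume)."""
    peak = max(abs(s) for s in frame) or 1
    return [s / peak for s in frame]

def magnitude_spectrum(frame):
    """Naive discrete Fourier transform: signal strength per frequency bin."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):                     # bins up to half the sample rate
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total))
    return spectrum

def dominant_frequency(frame, sample_rate):
    """Return the frequency (in Hz) of the strongest non-constant bin."""
    spectrum = magnitude_spectrum(normalize(frame))
    peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(frame)

SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(256)]
quiet = [0.1 * s for s in frame]                # same sound, spoken softly
```

Calling `dominant_frequency` on `frame` and on `quiet` gives the same pitch, which is the point of discarding volume before the analysis.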

For example, look at this sentence, which has been broken down into phonemes:

[Figure: a sample sentence broken down into phonemes]

Putting It in Context

Now the second major component of speech recognition software, the language model, kicks in. The language model analyzes the content of your speech. It compares the combinations of phonemes to the words in its digital dictionary, a huge database of the most common words in the English language. Most of today's packages come with dictionaries containing about 150,000 words. The language model quickly decides which words you said and displays them on the screen (in theory).
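The dictionary lookup might be sketched as a table that maps phoneme sequences to words. The six entries and the phoneme symbols below are a toy assumption, nowhere near a 150,000-word commercial dictionary, and `build_index` and `candidates` are hypothetical names.

```python
from collections import defaultdict

# Toy pronunciation entries, assumed for illustration only.
PRONUNCIATIONS = {
    "there": ("DH", "EH", "R"),
    "their": ("DH", "EH", "R"),
    "they're": ("DH", "EH", "R"),
    "let's": ("L", "EH", "T", "S"),
    "go": ("G", "OW"),
    "house": ("HH", "AW", "S"),
}

def build_index(pronunciations):
    """Invert the dictionary: phoneme sequence -> every word pronounced that way."""
    index = defaultdict(list)
    for word, phonemes in pronunciations.items():
        index[phonemes].append(word)
    return index

INDEX = build_index(PRONUNCIATIONS)

def candidates(phonemes):
    """Return all dictionary words that match a recognized phoneme sequence."""
    return sorted(INDEX.get(tuple(phonemes), []))
```

A sequence such as `["G", "OW"]` matches a single word, while `["DH", "EH", "R"]` matches three, so sound alone cannot always settle which word was spoken.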

Unfortunately, the English language complicates things. For example, "there," "their," and "they're" all sound the same. A key to the power of today's speech recognition is its use of trigrams, sequences of three consecutive words that let the software analyze the context in which a word is used. In many cases, the software can recognize a word by looking at the two words that come before it. If you say, "let's go there," for example, the "let's go" helps the software decide to use "there" instead of "their."
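A minimal sketch of the trigram idea follows. It counts three-word sequences in a tiny made-up corpus, then picks the sound-alike word seen most often after the two preceding words. The corpus, `count_trigrams`, and `pick_word` are all assumptions for illustration. Commercial products train on far more text and use probabilities rather than raw counts.

```python
from collections import Counter

# Tiny made-up training text, assumed for illustration only.
CORPUS = [
    "let's go there now",
    "let's go there tomorrow",
    "we saw their house",
    "we like their dog",
    "i think they're late",
]

def count_trigrams(sentences):
    """Count every run of three consecutive words in the training text."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts

TRIGRAMS = count_trigrams(CORPUS)

def pick_word(previous_two, candidates):
    """Choose the candidate seen most often after the two preceding words."""
    w1, w2 = previous_two
    return max(candidates, key=lambda word: TRIGRAMS[(w1, w2, word)])

homophones = ["their", "there", "they're"]
choice = pick_word(("let's", "go"), homophones)
```

With this corpus, "let's go" leads to "there" and "we saw" leads to "their". When no candidate has been seen in that context, the sketch simply returns the first one in the list.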

Speech recognition packages also tune themselves to the individual user. The software customizes itself based on your voice, your unique speech patterns, and your accent. To improve dictation accuracy, it creates a supplementary dictionary of the words you use.
