Microsoft Research Spawns a New Era in Speech Technology:

Simpler, Faster, and Easier Speech Application Development

By Don Barker

Just as Kokanee salmon steadfastly progress up the Wallowa River to spawn, Microsoft is determined to move speech technology into the mainstream and make it a widespread industry. Kokanee is the codename for a major research and development effort at Microsoft designed to grow the speech industry. Its focus, the delivery of the .Net Speech Platform, will make speech-enabled application development and deployment simpler, faster, and easier. If Kokanee succeeds in stimulating the creation of killer speech applications, we may soon find ourselves talking with computers more often and actually enjoying the experience.
Before delving into the details of Kokanee, it is helpful to briefly review the origins of speech recognition and its history at Microsoft. The former lays the groundwork for understanding the technologies underlying this ambitious initiative, while the latter reveals how the key players arrived at Microsoft and their essential roles in shaping the field’s future.

Brief History of Speech Recognition
The first machine to recognize speech was likely a commercial toy dog called Radio Rex, manufactured in 1920. Designed to respond to its name, Rex actually responded to almost any sound with sufficient 500-Hz energy. Rex’s inability to reject out-of-vocabulary sounds foreshadowed a problem that continues to plague speech recognizers (speech recognition engines) to this day.
Although the 1930s and 1940s saw some advances in speech recognition, in the areas of vocoding (voice compression) and speech analysis, the first computer system came from Bell Labs in the 1950s. This speech recognition system may have been the first actual “word” recognizer. It was able to discern digits (numbers) spoken by a single speaker as isolated (discrete) words separated by long pauses.

In general, the speech recognizers developed in the 1950s relied on acoustic input alone to identify a few words, syllables, vowels, or digits. However, in 1959, University College London (www.rdg.ac.uk/EPU) demonstrated the use of one linguistic unit to predict the probability of the next linguistic unit, a concept that would one day lead to major breakthroughs in speech research.
In the 1960s, automatic speech recognition systems made minor improvements in vocabulary and accuracy. However, the focus was on acoustic models, with little attention given to speech understanding. Consequently, accuracy and vocabularies remained quite limited, and systems required discretely spoken words (i.e., one at a time, with pauses in between).
In the tumultuous spring of 1968, the movie classic 2001: A Space Odyssey gave voice to the calm but psychologically unbalanced HAL 9000 computer. For the public, HAL set exceedingly high expectations for speech recognition and understanding (and speech synthesis). The smooth-talking HAL sounded entirely human in its conversations with crewmembers, although HAL’s voice synthesis resembled a computerized version of Mr. Rogers.

Government Funds Speech Research
In 1971, with speech technology at the forefront of public consciousness, the U.S. Department of Defense’s Advanced Research Projects Agency (ARPA, later renamed DARPA) funded a five-year study, dubbed the Speech Understanding Research (SUR) project (www.nvrc.org/Conferences %20and%20Workshops/ 1999/auto_speech recognition_systems.htm). Its goal was to achieve a breakthrough in continuous (connected) speech recognition. The SUR project Advisory Board, which included Allen Newell (a founding father of artificial intelligence), specified that the systems should recognize normally spoken English in a quiet room, with a restricted vocabulary



PC AI Magazine - PO Box 30130 Phoenix, AZ 85046 - Voice: 602.971.1869 Fax: 602.971.2321

e-mail: info@pcai.com - Comments? webmaster@pcai.com
