Tutorial 7: Speech Recognition in Noisy Environments on Mobile Devices

Instructors

Yifan Gong; Texas Instruments

Time & Location

Monday Afternoon, May 17, 13:30 - 16:30, Location: Jolliet

Abstract

One of the most technically challenging issues in speech recognition is handling additive noise (from background sources) and convolutive noise (from, e.g., the microphone and the data acquisition line). In typical acoustic environments for mobile devices, such as an automobile with a hands-free microphone, such noises may render speech recognition on the device unusable unless they are properly compensated for. Over the past several years, robust speech recognition has been the most popular topic in the speech processing area of ICASSP. Like other breakthrough concepts in speech recognition technology, such as hidden Markov modeling, context-dependent modeling and speaker adaptation, recent techniques that compensate for these noises substantially enhance recognition performance. This tutorial provides attendees with an understanding of modern techniques that compensate for the two types of noise in automatic speech recognition.
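As a rough illustration of the two distortion types named above (not taken from the tutorial materials), the commonly used signal model treats the observed signal as the clean speech filtered by a channel plus background noise, y[t] = (x * h)[t] + n[t]. The sketch below assumes NumPy; the function names `distort` and `log_power_spectrum`, the toy channel, and the random stand-in for speech are all hypothetical choices made for illustration.

```python
import numpy as np

def distort(clean, channel, noise):
    """Apply y[t] = (x * h)[t] + n[t]: channel filtering, then additive noise."""
    # Convolutive distortion: filter the clean signal with the channel response.
    filtered = np.convolve(clean, channel)[: len(clean)]
    # Additive distortion: add background noise sample by sample.
    return filtered + noise[: len(clean)]

def log_power_spectrum(signal, n_fft=256):
    """Log power spectrum of one frame (small floor avoids log(0))."""
    spectrum = np.fft.rfft(signal, n=n_fft)
    return np.log(np.abs(spectrum) ** 2 + 1e-12)

# Illustrative data only: white noise stands in for a frame of speech.
rng = np.random.default_rng(0)
clean = rng.standard_normal(256)
channel = np.array([1.0, 0.5, 0.25])      # hypothetical short channel response
noise = 0.1 * rng.standard_normal(256)    # hypothetical background noise

noisy = distort(clean, channel, noise)

# Difference between noisy and clean frames in the log-spectral domain.
spectral_change = log_power_spectrum(noisy) - log_power_spectrum(clean)
```

In this model, a channel that is a pure gain shifts every log-spectral bin by the same constant, whereas additive noise changes each bin by an amount that depends on the local speech-to-noise ratio. This difference is one reason the two distortions are usually treated separately.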

The tutorial is designed to give attendees an overview of important techniques, and the relationships among them, that have proven effective in practical applications at improving recognition performance under additive and convolutive noises. We first characterize the acoustic conditions for speech recognizers on mobile devices and identify additive and convolutive distortions as the major sources of performance degradation. We will discuss why, how and to what degree speech recognition performance degrades when these two types of distortion corrupt the speech signal simultaneously. We will then show that most techniques developed for speaker adaptation are generally not adequate for compensating additive and convolutive noises, and that more dedicated modeling of the noises is needed.

Techniques for recognizing speech in noisy conditions will then be introduced under two categories: (A) modeling the effects of distortion and (B) modeling the sources of distortion. The first category addresses the joint effect of the distortion sources, and includes techniques for adaptation to noisy environments and training with noisy speech data. The second category addresses the physical laws that cause the distortions, and covers techniques for noise compensation in the observation feature space or in the recognizer's speech model set. Experimental evaluations of these techniques will be presented and discussed, and feature-based and model-based approaches will be compared. Finally, a speech recognizer designed for mobile devices will be outlined in terms of architecture, computation platform and resource requirements.

A review of the fundamentals of speech recognition and the principles of some parameter estimation algorithms will also be provided. Some background in signal processing and statistical modeling is assumed. A copy of the presentation materials (about 120 slides) will be made available to attendees.

Presenter Information

Yifan Gong received his B.Sc. degree from the Department of Communication Engineering, South-East University (China), his M.Sc. degree in Electrical Engineering and Instrumentation from the Department of Electronics, University of Paris (France), and his Ph.D. degree (with highest honor) in Computer Science from the Department of Mathematics and Computer Science, University of Henri Poincaré (France).

As an Associate Lecturer, he taught Computer Programming and Digital Signal Processing at the Department of Computer Science of the University of Henri Poincaré. He served the National Scientific Research Center (CNRS, France) and INRIA-Lorraine as a Research Engineer and then joined CNRS as a Senior Research Scientist. He also worked as a Visiting Research Fellow at the Communications Research Center of Canada.

Dr. Gong joined Texas Instruments in 1996. He is currently a Senior Member of Technical Staff at the Speech Technologies Lab of TI's DSP Solutions R&D Center. At Texas Instruments he has developed speech and speaker recognition technologies robust against noisy environments, designed systems, algorithms and software for speech and speaker recognition, and delivered memory- and CPU-efficient speech recognizers for mobile devices. His research interests include mathematical models, software tools and systems for signal processing, speech and speaker recognition, speech recognition in noisy conditions, and pattern recognition.

Dr. Gong has authored more than one hundred publications in journals, IEEE transactions, books, and conferences, and has been awarded nine patents. He has been a Senior Member of the IEEE since 1993 and served on the IEEE Signal Processing Society Speech Technical Committee from 1998 to 2002. He is an Associate Editor of the Pattern Recognition journal. Dr. Gong has been selected to give tutorials and other invited presentations at international conferences, and has served as a technical committee member and session chair for many international conferences.
