Technical Program

Paper Detail

Paper: SS-2.4
Session: Multi-Sensory Processing for Context-Aware Computing
Time: Tuesday, May 18, 14:00 - 14:20
Presentation: Special Session Lecture
Topic: Special Sessions: Multi-sensory Processing for Context-Aware Computing
Title: AUDIO VISUAL WORD SPOTTING
Authors: Ming Liu; University of Illinois at Urbana-Champaign 
 Ziyou Xiong; University of Illinois at Urbana-Champaign 
 Zhenqiu Zhang; University of Illinois at Urbana-Champaign 
 Thomas S. Huang; University of Illinois at Urbana-Champaign 
 Stephen Chu; IBM T. J. Watson Research Center 
Abstract: Word spotting is the task of detecting and verifying specific words embedded in unconstrained speech. Most Hidden Markov Model (HMM)-based word spotters suffer from the same noise-robustness problem as speech recognizers: performance drops significantly in noisy environments. Visual speech information has been shown to improve the noise robustness of speech recognizers. In this paper, we incorporate visual speech information to improve the noise robustness of a word spotter. In the visual front-end processing, the Information-Based Maximum Discrimination (IBMD) algorithm is used to detect the face and mouth corners. For audio-visual fusion, feature-level fusion is adopted. We compare the audio-visual word spotter with an audio-only spotter and show the advantage of the former approach over the latter.
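The abstract adopts feature-level (early) fusion, in which audio and visual feature vectors are combined into a single observation stream before modeling. The paper does not specify its feature dimensions or alignment method, so the following is only a minimal sketch of the general idea: the visual stream (typically at a lower frame rate) is aligned to the audio frame rate by nearest-frame indexing, and the two vectors are concatenated per frame. The frame rates and dimensions shown are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fuse_features(audio_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """Feature-level (early) fusion sketch.

    audio_feats:  (T_a, D_a), e.g. MFCC frames at ~100 fps (assumed)
    visual_feats: (T_v, D_v), e.g. mouth-region features at ~30 fps (assumed)
    Returns a (T_a, D_a + D_v) array: one fused vector per audio frame.
    """
    T_a = audio_feats.shape[0]
    T_v = visual_feats.shape[0]
    # Nearest-frame alignment: map each audio frame to a visual frame index.
    idx = np.minimum(np.arange(T_a) * T_v // T_a, T_v - 1)
    aligned_visual = visual_feats[idx]
    # Concatenate along the feature axis.
    return np.hstack([audio_feats, aligned_visual])

# Example with illustrative sizes: 100 audio frames of 13-dim features,
# 30 video frames of 6-dim features.
fused = fuse_features(np.zeros((100, 13)), np.zeros((30, 6)))
print(fused.shape)  # (100, 19)
```

The fused vectors would then be fed to a single HMM, in contrast to decision-level fusion, where separate audio and visual models are combined only at the score stage.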



©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004