Paper: SS-2.4
Session: Multi-Sensory Processing for Context-Aware Computing
Time: Tuesday, May 18, 14:00 - 14:20
Presentation: Special Session Lecture
Topic: Special Sessions: Multi-sensory Processing for Context-Aware Computing
Title: AUDIO VISUAL WORD SPOTTING
Authors: Ming Liu, University of Illinois at Urbana-Champaign; Ziyou Xiong, University of Illinois at Urbana-Champaign; Zhenqiu Zhang, University of Illinois at Urbana-Champaign; Thomas S. Huang, University of Illinois at Urbana-Champaign; Stephen Chu, IBM T. J. Watson Research Center
Abstract: The task of word spotting is to detect and verify specific words embedded in unconstrained speech. Most Hidden Markov Model (HMM)-based word spotters suffer from the same noise-robustness problem as speech recognizers: their performance drops significantly in noisy environments. Visual speech information has been shown to improve the noise robustness of speech recognizers. In this paper, we incorporate visual speech information to improve the noise robustness of the word spotter. In the visual front-end processing, the Information-Based Maximum Discrimination (IBMD) algorithm is used to detect the face/mouth corners. For audio-visual fusion, feature-level fusion is adopted. We compare the audio-visual word spotter with the audio-only spotter and show the advantage of the former approach over the latter.
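The feature-level fusion named in the abstract amounts to concatenating the per-frame audio and visual feature vectors into a single observation stream before HMM decoding. Below is a minimal sketch of that step; the feature dimensions, frame rates, interpolation-based synchronization, and the helper name fuse_features are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fuse_features(audio_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """Feature-level (early) fusion: concatenate per-frame audio and
    visual feature vectors into one observation stream for the HMM.

    audio_feats:  (T_a, D_a), e.g. MFCCs at 100 frames/s
    visual_feats: (T_v, D_v), e.g. mouth-region features at 30 frames/s
    Returns:      (T_a, D_a + D_v) fused observations.

    Assumption: the visual stream is upsampled to the audio frame rate
    by linear interpolation; the paper may synchronize the streams differently.
    """
    t_a, _ = audio_feats.shape
    t_v, d_v = visual_feats.shape
    # Interpolate each visual feature dimension onto the audio time axis.
    audio_times = np.linspace(0.0, 1.0, t_a)
    visual_times = np.linspace(0.0, 1.0, t_v)
    visual_upsampled = np.stack(
        [np.interp(audio_times, visual_times, visual_feats[:, d]) for d in range(d_v)],
        axis=1,
    )
    return np.concatenate([audio_feats, visual_upsampled], axis=1)

# Example: 100 audio frames of 13-dim MFCCs, 30 video frames of 10-dim features.
fused = fuse_features(np.random.randn(100, 13), np.random.randn(30, 10))
print(fused.shape)  # (100, 23)
```

The fused vectors then serve as observations for the keyword and filler HMMs, so the spotter can exploit the visual cues that remain informative when acoustic noise degrades the audio stream.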