Technical Program

Paper Detail

Paper:	MLSP-P3.8
Session:	Speech and Audio Processing
Time:	Wednesday, May 19, 15:30 - 17:30
Presentation:	Poster
Topic:	Machine Learning for Signal Processing: Speech and Audio Processing Applications
Title:	AUDIO-VISUAL GRAPHICAL MODELS FOR SPEECH PROCESSING
Authors:	John Hershey; University of California, San Diego
	Hagai Attias; Microsoft Research
	Nebojša Jojic; Microsoft Research
	Trausti Kristjansson; Microsoft Research
Abstract:	Perceiving sounds in a noisy environment is a challenging problem. Visual lip-reading can provide relevant information but is also challenging because lips are moving and a tracker must deal with a variety of conditions. Typically audio-visual systems have been assembled from individually engineered modules. We propose to fuse audio and video in a probabilistic generative model that implements cross-model self-supervised learning, enabling adaptation to audio-visual data. The video model features a Gaussian mixture model embedded in a linear subspace of a sprite which translates in the video. The system can learn to detect and enhance speech in noise given only a short sequence of audio-visual data. We show some results for speech enhancement, and discuss extensions to the model that are under investigation.


©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004