Technical Program

Paper Detail

Paper: SP-P14.9
Session: Acoustic Modeling: Tone, Prosody, and Features
Time: Thursday, May 20, 15:30 - 17:30
Presentation: Poster
Topic: Speech Processing: Acoustic Modeling for Speech Recognition
Title: PARSING SPEECH INTO ARTICULATORY EVENTS
Authors: Kadri Hacioglu; University of Colorado, Boulder 
 Bryan Pellom; University of Colorado, Boulder 
 Wayne Ward; University of Colorado, Boulder 
Abstract: In this paper, the state of speech production is defined by a number of categorical articulatory features. We describe a detector that outputs a stream (a sequence of classes) for each articulatory feature, given the Mel-frequency cepstral coefficient (MFCC) representation of the input speech. The detector consists of a bank of recurrent neural network (RNN) classifiers, a dynamic N-best lattice generator, and a Viterbi decoder. Many researchers have previously used a bank of classifiers for articulatory feature detection. We extend this work, first by creating dynamic N-best lattices for each feature and then by combining them into product lattices that are rescored with the Viterbi algorithm. During rescoring, we incorporate language and duration constraints along with the class posterior probabilities provided by the RNN classifiers. We present results on the TIMIT data for place and manner features and compare them to a baseline system. We report performance improvements at both the frame and segment levels.
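As a rough illustration of the rescoring idea in the abstract, the sketch below runs a plain frame-level Viterbi pass over the class posteriors of a single feature stream. It is not the paper's method: there are no N-best or product lattices, and the language and duration constraints are collapsed into one hypothetical transition matrix whose high self-transition score discourages very short segments. The names `viterbi_smooth` and `to_segments`, the two-class setup, and all numbers are assumptions made up for this example.

```python
import numpy as np


def viterbi_smooth(log_post, log_trans, log_init):
    """Most likely class sequence for one articulatory feature stream.

    log_post:  (T, K) frame-level log posteriors (e.g. from a classifier)
    log_trans: (K, K) log transition scores, log_trans[i, j] scores i -> j
    log_init:  (K,)   log scores for the class of the first frame
    """
    T, K = log_post.shape
    score = log_init + log_post[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score of a path that is in class i at t-1 and moves to j
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_post[t]
    # Trace back from the best final class
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def to_segments(path):
    """Collapse a frame-level path into (class, start_frame, end_frame) segments."""
    segs, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segs.append((path[start], start, t))
            start = t
    return segs


# Hypothetical posteriors for a 2-class feature over 5 frames;
# frame 2 is a one-frame "glitch" that favors class 1.
post = np.array([[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]])
# Assumed "sticky" transitions standing in for language/duration constraints.
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
init = np.array([0.5, 0.5])

framewise = np.argmax(post, axis=1).tolist()
smoothed = viterbi_smooth(np.log(post), np.log(trans), np.log(init))
print(framewise, to_segments(framewise))
print(smoothed, to_segments(smoothed))
```

In this toy case the frame-wise decision splits the stream into three segments, while the Viterbi pass removes the one-frame excursion and yields a single segment. This is one way to see how sequence-level constraints can affect both frame-level and segment-level results.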
 

©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004