Technical Program

Paper Detail

Session: Acoustic Modeling: New Search Features and Supervised Training
Time: Friday, May 21, 11:10 - 11:30
Presentation: Lecture
Topic: Speech Processing: Acoustic Modeling for Speech Recognition
Authors: Langzhou Chen; LIMSI-CNRS 
 Lori Lamel; LIMSI-CNRS 
 Jean-Luc Gauvain; LIMSI-CNRS 
Abstract: This paper presents some recent work on using consensus networks to improve lightly supervised acoustic model training for the LIMSI Mandarin BN system. Lightly supervised acoustic model training has been attracting growing interest, since it can help to substantially reduce the development costs for speech recognition systems. Compared to supervised training with accurate transcriptions, the key problem in lightly supervised training is getting the approximate transcripts to be as close as possible to manually produced detailed ones, i.e. finding a proper way to provide the information for supervision. Previous work using a language model to provide supervision has been quite successful. This paper extends the original method by presenting a new way to get the information needed for supervision during training. Studies are carried out using the TDT4 Mandarin audio corpus and associated closed-captions. After automatically recognizing the training data, the closed-captions are aligned with a consensus network derived from the hypothesized lattices. As is the case with closed-caption filtering, this method can remove speech segments whose automatic transcripts contain errors, but it can also recover from errors in the hypothesis if the information is present in the lattice. Experimental results show that, compared with simply training on all of the data, consensus network based lightly supervised acoustic model training results in a small reduction in the character error rate on the DARPA/NIST RT'03 development and evaluation data.
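
The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout (a consensus network as a list of slots mapping each candidate token to a posterior), the unit edit costs, the names `align_caption`, `EPS` and `Slot`, and the English toy tokens are all assumptions made for illustration. The sketch aligns a closed-caption token sequence to the network by dynamic programming, counting a caption token as matched if it appears anywhere in a slot, not only as the top-scoring hypothesis.

```python
from typing import Dict, List, Tuple

EPS = ""  # hypothetical marker for the empty (skip) arc in a slot
Slot = Dict[str, float]  # candidate token -> posterior probability


def align_caption(network: List[Slot], caption: List[str]) -> Tuple[int, List[str]]:
    """Align caption tokens to consensus-network slots by edit distance.

    A caption token matches a slot at zero cost if it is any of the slot's
    candidates. Returns (mismatch cost, transcript along the best path).
    """
    n, m = len(network), len(caption)
    # cost[i][j]: best cost aligning the first i slots with the first j tokens
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + (0 if EPS in network[i - 1] else 1)
        back[i][0] = "skip"
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + 1
        back[0][j] = "extra"
    for i in range(1, n + 1):
        slot = network[i - 1]
        for j in range(1, m + 1):
            tok = caption[j - 1]
            options = [
                (cost[i - 1][j - 1] + (0 if tok in slot else 1), "pair"),
                (cost[i - 1][j] + (0 if EPS in slot else 1), "skip"),
                (cost[i][j - 1] + 1, "extra"),
            ]
            # Ties resolve to the earliest option, so "pair" is preferred.
            cost[i][j], back[i][j] = min(options, key=lambda o: o[0])

    # Trace back to build the transcript used for training.
    transcript: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "pair":
            slot, tok = network[i - 1], caption[j - 1]
            best = max(slot, key=slot.get)
            # Use the caption token when the network supports it; otherwise
            # fall back to the slot's top-scoring candidate.
            chosen = tok if tok in slot else best
            if chosen != EPS:
                transcript.append(chosen)
            i, j = i - 1, j - 1
        elif move == "skip":
            slot = network[i - 1]
            if EPS not in slot:
                transcript.append(max(slot, key=slot.get))
            i -= 1
        else:  # "extra": caption token with no support in the network
            j -= 1
    transcript.reverse()
    return cost[n][m], transcript


# Toy example: the 1-best path reads "we saw the whether".
network = [
    {"we": 0.9, "he": 0.1},
    {"saw": 0.6, "sought": 0.4},
    {"the": 0.7, EPS: 0.3},
    {"whether": 0.55, "weather": 0.45},
]
cost_ok, text_ok = align_caption(network, ["we", "saw", "weather"])
cost_bad, _ = align_caption(network, ["we", "ate", "weather"])
```

In the toy example, the caption "we saw weather" aligns at zero cost even though the top-scoring path contains "the whether", which is the sense in which information present in the lattice can repair the hypothesis. The second caption contains a token found in no slot, so its cost is nonzero; under this sketch such a segment could be dropped, in the spirit of closed-caption filtering.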

