| Paper: | SP-L8.5 |
| Session: | Acoustic Modeling: New Search Features and Supervised Training |
| Time: | Friday, May 21, 10:50 - 11:10 |
| Presentation: | Lecture |
| Topic: | Speech Processing: Acoustic Modeling for Speech Recognition |
| Title: | LIGHT SUPERVISION IN ACOUSTIC MODEL TRAINING |
| Authors: | Long Nguyen; BBN Technologies |
| | Bing Xiang; BBN Technologies |
| Abstract: | In this paper, we present a new light supervision method to automatically derive additional acoustic training data for broadcast news transcription systems. In this method, a subset of the TDT corpus, which consists of broadcast audio with corresponding closed-caption (CC) transcripts, is identified by aligning the CC transcripts with the hypotheses generated by lightly supervised decoding. Phrases of three or more contiguous words on which the CC transcripts and the decoder's hypotheses agree are selected. The selection yields 702 hours, or 72% of the captioned data. When 700 hours of the selected data were added to the baseline 141-hour broadcast news training set, we achieved a 13% relative reduction in word error rate. The key to the effectiveness of this light supervision method is the use of a biased language model (LM) in the lightly supervised decoding. The biased LM, in which the CC transcripts are added with a heavy weight, helps select words that the recognizer might have misrecognized with a fair (unbiased) LM. |
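The abstract describes keeping phrases of three or more contiguous words on which the CC transcript and the decoder hypothesis agree, but it does not say how the two word sequences are aligned. The sketch below assumes a standard minimum-edit-distance word alignment (the same kind used for word error rate scoring) and keeps every run of at least three consecutive exact matches. The function names `align_words` and `select_agreed_phrases` are hypothetical, and a real pipeline would also carry the decoder's word time stamps so the matching audio segments can be cut out for training.

```python
from typing import List, Optional, Tuple


def align_words(ref: List[str], hyp: List[str]) -> List[Tuple[int, int]]:
    """Return (ref_index, hyp_index) pairs for exact word matches on a
    minimum-edit-distance alignment of ref against hyp (assumed method)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # word only in ref (deletion)
                             cost[i][j - 1] + 1)        # word only in hyp (insertion)
    # Trace back from the bottom-right corner, collecting exact matches.
    matches: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if ref[i - 1] == hyp[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return matches


def select_agreed_phrases(cc: List[str], hyp: List[str],
                          min_len: int = 3) -> List[Tuple[int, int]]:
    """Return [start, end) spans of hyp covering runs of at least min_len
    consecutive words that align exactly to consecutive CC words."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: Optional[Tuple[int, int]] = None
    for r, h in align_words(cc, hyp):
        contiguous = prev is not None and r == prev[0] + 1 and h == prev[1] + 1
        if not contiguous:
            # Close the previous run and keep it only if it is long enough.
            if start is not None and prev is not None and prev[1] + 1 - start >= min_len:
                spans.append((start, prev[1] + 1))
            start = h
        prev = (r, h)
    if start is not None and prev is not None and prev[1] + 1 - start >= min_len:
        spans.append((start, prev[1] + 1))
    return spans


cc = "the president said today that the talks will resume next week".split()
hyp = "the president said to day that talks will resume next weak".split()
for s, e in select_agreed_phrases(cc, hyp):
    print(" ".join(hyp[s:e]))
```

On this made-up caption/hypothesis pair, the isolated match on "that" is discarded because it forms a run of only one word, so only "the president said" and "talks will resume next" would be kept as training material. The three-word minimum trades away some usable data in exchange for confidence that the caption and the recognizer really agree on that stretch of audio.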