| Paper: | SP-P11.2 |
| Session: | Topics in Large Vocabulary Continuous Speech Recognition |
| Time: | Thursday, May 20, 09:30 - 11:30 |
| Presentation: | Poster |
| Topic: | Speech Processing: Large Vocabulary Recognition/Search |
| Title: | ADVANCES IN UNSUPERVISED AUDIO SEGMENTATION FOR THE BROADCAST NEWS AND NGSW CORPORA |
| Authors: | Rongqing Huang; University of Colorado, Boulder |
| | John H. L. Hansen; University of Colorado, Boulder |
| Abstract: | Unsupervised audio segmentation remains a challenging research problem that significantly affects Automatic Speech Recognition (ASR) and Spoken Document Retrieval (SDR) performance. This paper presents advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be more appropriate for segmentation: PMVDR (Perceptual Minimum Variance Distortionless Response), SZCR (Smoothed Zero-Crossing Rate), and FBLC (FilterBank Log Coefficients). Next, we consider a new distance metric, T2-mean, which is intended to improve segmentation for short segments (<5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We establish a more effective evaluation procedure for segmentation than the more traditional EER and Frame Accuracy approaches. Employing these advances within our new scheme yields more than a 30% improvement in segmentation performance on the 3-hour Hub4 Broadcast News 1997 evaluation data. Evaluations are also presented for audio from the NGSW corpus. |