Paper: | SP-P6.2 | ||
Session: | Feature Analysis for ASR, TTS, and Verification | ||
Time: | Wednesday, May 19, 09:30 - 11:30 | ||
Presentation: | Poster | ||
Topic: | Speech Processing: Speech Analysis | ||
Title: | AN AUTOMATIC PROSODY LABELING SYSTEM USING ANN-BASED SYNTACTIC-PROSODIC MODEL AND GMM-BASED ACOUSTIC-PROSODIC MODEL | ||
Authors: | Ken Chen; University of Illinois at Urbana-Champaign | ||
Mark Hasegawa-Johnson; University of Illinois at Urbana-Champaign | |||
Aaron Cohen; University of Illinois at Urbana-Champaign | |||
Abstract: | Automatic prosody labeling is important for both speech synthesis and automatic speech understanding. Humans use both syntactic and acoustic cues to predict the prosody of a given utterance. This process can be effectively modeled by an ANN-based syntactic-prosodic model that predicts prosody from syntax and a GMM-based acoustic-prosodic model that predicts prosody from acoustic-prosodic observations. Our experiments on the Radio News Corpus show that the ANN is effective in learning the stochastic mapping from the syntactic representation of word strings to prosody labels, with an accuracy of 82.7% for pitch accent labeling and 90.5% for intonational phrase boundary (IPB) labeling. When acoustic observations and reasonably accurate phoneme transcriptions are given, a GMM-based acoustic-prosodic model, coupled with the syntactic-prosodic model, can achieve 84% pitch accent recognition accuracy and 93% IPB recognition accuracy. These results are obtained using different speakers for training and testing, and they considerably exceed all previously reported results on the same corpus. | ||