| Paper: | SP-L11.5 |
| Session: | Language Modeling and Search |
| Time: | Friday, May 21, 16:50 - 17:10 |
| Presentation: | Lecture |
| Topic: | Speech Processing: Language Modeling |
| Title: | CROSS-LINGUAL LATENT SEMANTIC ANALYSIS FOR LANGUAGE MODELING |
| Authors: | Woosung Kim; Johns Hopkins University |
| | Sanjeev Khudanpur; Johns Hopkins University |
| Abstract: | Statistical language model estimation requires large amounts of domain-specific text, which is difficult to obtain in many languages. We propose techniques which exploit domain-specific text in a resource-rich language to adapt a language model in a resource-deficient language. A primary advantage of our technique is that in the process of cross-lingual language model adaptation, we do not rely on the availability of any machine translation capability. Instead, we assume that only a modest-sized collection of story-aligned document-pairs in the two languages is available. We use ideas from cross-lingual latent semantic analysis to develop a single low-dimensional representation shared by words and documents in both languages, which enables us to (i) find documents in the resource-rich language pertaining to a specific story in the resource-deficient language, and (ii) extract statistics from the pertinent documents to adapt a language model to the story of interest. We demonstrate significant reductions in perplexity and error rates in a Mandarin speech recognition task using this technique. |
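
The two steps named in the abstract, (i) retrieving resource-rich documents for a story and (ii) extracting statistics from them, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a generic cross-lingual LSA recipe: stack the term counts of each aligned document pair into one column, take a truncated SVD, and fold monolingual documents into the shared space using only that language's rows of the left singular vectors. The toy counts, the rank, the function names (`fit_cl_lsa`, `fold_in`, `cosine`), and the choice of a plain unigram as the extracted statistic are all assumptions made for illustration. How such statistics are carried over into the resource-deficient language model is not specified in the abstract and is left out here.

```python
import numpy as np

def fit_cl_lsa(zh_counts, en_counts, rank):
    """Fit a shared space from aligned pairs (hypothetical helper).

    zh_counts: (V_zh, N) term counts, en_counts: (V_en, N) term counts,
    where column j of each matrix comes from the same aligned document pair.
    """
    W = np.vstack([zh_counts, en_counts]).astype(float)
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    n_zh = zh_counts.shape[0]
    # Split the left singular vectors into per-language blocks.
    return U[:n_zh, :rank], U[n_zh:, :rank], s[:rank]

def fold_in(counts, U_lang, s):
    """Project a monolingual count vector into the shared space."""
    return (np.asarray(counts, dtype=float) @ U_lang) / s

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy aligned collection (assumed data): pairs 0-1 share one topic, pairs 2-3 another.
zh_pairs = np.array([[2, 3, 0, 0],
                     [1, 2, 0, 0],
                     [0, 0, 3, 2]])
en_pairs = np.array([[3, 2, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 2, 4]])
U_zh, U_en, s = fit_cl_lsa(zh_pairs, en_pairs, rank=2)

# (i) Rank unaligned English documents against a Mandarin story.
story = [1, 1, 0]                      # Mandarin term counts for the story
en_docs = np.array([[4, 1, 0],         # one row per English document
                    [0, 0, 5],
                    [2, 2, 0]])
story_vec = fold_in(story, U_zh, s)
scores = [cosine(story_vec, fold_in(d, U_en, s)) for d in en_docs]
top = sorted(np.argsort(scores)[::-1][:2].tolist())

# (ii) Extract statistics from the retrieved documents: here, a unigram
# distribution over English terms, pooled across the top-ranked documents.
unigram = en_docs[top].sum(axis=0) / en_docs[top].sum()
```

In this toy setup the story and the English documents never share vocabulary, so any similarity between them comes entirely from the space learned on the aligned pairs, which is the property that lets the approach avoid machine translation.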