Tutorial 4: Hidden Markov Models in Multimedia Signal Processing and Multimodal Human Machine Communication

Instructors

Gerhard Rigoll; Munich University of Technology

Time & Location

Monday Morning, May 17, 9:00 - 12:00, Location: Duluth

Abstract

Hidden Markov Models (HMMs) have emerged during the last 20 years as probably the most powerful paradigm for processing dynamic patterns, such as time series, speech signals, and other pattern sequences. In speech recognition especially, HMMs have become the dominant technology. In multimedia signal processing applications, however, which mostly involve image processing and computer vision problems, HMMs are still used far less often. At the same time, multimedia signal processing has become increasingly important to the ICASSP community in recent years and is now one of the most important topics in the ICASSP call for papers. The goal of this tutorial is therefore not to explain HMMs in speech recognition, but to:

demonstrate the suitability of HMMs for a large variety of problems in Multimedia Signal Processing and Human-Computer Interaction, mostly image processing and computer vision applications, covering both dynamic and static patterns as well as the fusion of different modalities
introduce the participants to the theory and foundations of HMMs
show the close relation between HMMs and other popular statistical pattern recognition techniques, such as Bayes classifiers and neural networks
The tutorial is roughly subdivided into a section concerned with some limited fundamental issues of HMMs and a section addressing typical applications in multimedia signal processing.

General section: General structure of HMMs, topology and notations. Basic principle of probabilistic pattern recognition with HMMs. Modeling of emission probabilities. Discrete, continuous density, and semi-continuous HMMs. Calculation of observation probabilities. Training of HMMs. EM-Algorithm, Forward-Backward Algorithm, Baum-Welch and Viterbi training. Training criteria, Maximum Likelihood and Mutual Information objective functions. Recognition and decoding issues. Context-dependent units. Higher-level modeling of classes and recognition units with n-grams. HMMs and neural networks. Neural implementation of HMMs. Two-dimensional HMMs, Pseudo-2D-HMMs, Pseudo-3D-HMMs.
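
As a rough illustration of what "calculation of observation probabilities" refers to for a discrete HMM, the following minimal Python sketch computes the likelihood of an observation sequence with the forward algorithm. It is not taken from the tutorial material: the function name, the two-state model, and all parameter values (`pi`, `A`, `B`) are assumptions chosen purely for illustration.

```python
def forward_likelihood(pi, A, B, obs):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    pi  : initial state probabilities, pi[i]
    A   : state transition probabilities, A[i][j] = P(state j at t+1 | state i at t)
    B   : emission probabilities, B[i][k] = P(symbol k | state i)
    obs : non-empty sequence of observed symbol indices
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: P(obs | model) = sum_i alpha_T(i)
    return sum(alpha)


# Hypothetical two-state model with three output symbols (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]

likelihood = forward_likelihood(pi, A, B, [0, 1, 2])
```

For long sequences the unscaled `alpha` values underflow, so a practical implementation would normally rescale them at each step or work in the log domain.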

Applications and case study section: Very brief review of HMMs in speech recognition. HMMs for character, handwriting and formula recognition. Context-dependent on- and off-line character recognition. Image sequence processing with HMMs. HMMs for gesture recognition. Video-indexing with HMMs and stochastic video models. HMM-based audio-visual information processing. Circular 1D- and 2D-HMMs for rotation-invariant recognition of symbols. Recognition of deformed and occluded objects. HMMs in image databases and image retrieval. Pseudo-2D-HMMs for face recognition. Pseudo-2D-HMMs for pictogram recognition and spotting. HMM applications for person detection and object tracking. Speech-based emotion and facial expression recognition with 1D- and Pseudo-3D-HMMs.

Presenter Information

Gerhard Rigoll obtained the Dipl.-Ing. degree from Stuttgart University, Germany, in 1982. He joined the Fraunhofer-Institute (IAO) in Stuttgart as a researcher in the department of advanced information and communication technologies and received the Dr.-Ing. degree in 1986 in the area of automatic speech recognition. From 1986 to 1988 he worked as a postdoctoral fellow at the IBM T.J. Watson Research Center in Yorktown Heights, USA, on acoustic modelling and speaker adaptation for the IBM Tangora speech recognition system. He received the Dr.-Ing. habil. degree in 1991 from Stuttgart University with a thesis in speech synthesis. From 1991 to 1993 he worked as a guest researcher in the framework of the EC Scientific Training Programme in Japan for the NTT Human Interface Laboratories in Tokyo, Japan, in the area of neural networks and hybrid speech recognition systems. In 1993 he was appointed full professor of computer science at Gerhard-Mercator-University in Duisburg, Germany. In 2002, Prof. Rigoll joined Munich University of Technology, where he now heads the Institute for Human-Machine Communication.

His research interests are in the field of multimodal human-machine communication and multimedia information processing, covering areas such as speech and handwriting recognition, gesture recognition, face detection & identification, emotion recognition, person tracking, image retrieval and video-indexing. Dr. Rigoll is a Senior Member of the IEEE and is the author or co-author of more than 200 refereed papers in the field of pattern recognition, covering all the previously mentioned application areas. Among these are about 30 papers presented at ICASSP conferences since 1986. He is a member of the editorial board of the journal “Pattern Analysis & Applications”, serves as a reviewer for many scientific journals, including the IEEE Transactions on Speech and Audio Processing, on Neural Networks, on Systems, Man & Cybernetics, and on Pattern Analysis and Machine Intelligence, and has served as session chairman and member of the programme committee for numerous international conferences. He holds a personal patent in the area of neural vector quantizers for distributed speech recognition. He has been involved in a large number of research projects in the above-mentioned areas and has been active in recent years as a project reviewer and proposal evaluator in a variety of national and international projects sponsored by the European Commission, the German National Science Foundation (DFG), the German Ministry for Research and Education (BMBF), and other research foundations in the UK, the Netherlands and Switzerland.
