Technical Program

Paper Detail

Paper: IMDSP-P11.12
Session: Video Analysis
Time: Friday, May 21, 09:30 - 11:30
Presentation: Poster
Topic: Image and Multidimensional Signal Processing: Image and Video Indexing and Retrieval
Title: NEWS VIDEO STORY SEGMENTATION USING FUSION OF MULTI-LEVEL MULTI-MODAL FEATURES IN TRECVID 2003
Authors: Winston Hsu; Columbia University 
 Lyndon Kennedy; Columbia University 
 Chih-wei Huang; Columbia University 
 Shih-Fu Chang; Columbia University 
 Ching-Yung Lin; IBM T. J. Watson Research Center 
 Giridharan Iyengar; IBM T. J. Watson Research Center 
Abstract: We present new results in news video story segmentation and classification from the TRECVID 2003 video retrieval benchmark. We apply and extend the Maximum Entropy statistical model to fuse diverse features from multiple levels and modalities, including visual, audio, and text. The features include motion, face, music/speech types, prosody, and high-level text segmentation information. The statistical fusion model automatically discovers the features relevant to detecting story boundaries. One novel aspect of our method is a feature wrapper that handles different types of features: asynchronous, discrete, continuous, and delta. We also developed several novel prosody-related features. On the large news video set from the TRECVID 2003 benchmark, we demonstrate satisfactory performance (F1 measure up to 0.77) and, more importantly, observe an interesting opportunity for further improvement.
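
For readers unfamiliar with the F1 measure quoted in the abstract, the following is a minimal sketch of how it is commonly computed for boundary detection: the harmonic mean of precision and recall. The function name and the counts in the usage note are hypothetical illustrations, not values or code from the paper, and the paper's exact matching rule for deciding when a detected boundary counts as correct is not stated here.

```python
def f1_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for detected story boundaries.

    All three arguments are hypothetical counts supplied by the caller:
      true_positives  - detected boundaries that match a reference boundary
      false_positives - detected boundaries with no matching reference boundary
      false_negatives - reference boundaries that were not detected
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    # Precision and recall are undefined with no detections or no references;
    # this sketch returns 0.0 in those cases.
    if detected == 0 or actual == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As an illustration with made-up counts, calling f1_measure(3, 1, 1) gives precision 0.75 and recall 0.75, so the F1 measure is also 0.75.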
 

©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004