A compact shot representation for video semantic indexing

Authors :: Ronggang Wang; Jinzhuo Wang; Wenmin Wang; Wen Gao
Source :: ICIP
Publication Year :: 2015
Publisher :: IEEE, 2015.
Abstract :: This paper presents a compact shot representation for video semantic indexing (SIN). The proposed representation consists of visual cues from only two frames, a key frame (KF) and a difference frame (DF), both of which are constructed with a spatial pyramid. The KF describes static information, while the generated DF captures non-static information. Each region of the DF is taken from the same location in a selected frame, namely the frame that differs most saliently from the key frame in that region. We also introduce a variation of the DF to further enhance our model. Experimental results on TRECVID SIN demonstrate that our method obtains better accuracy than the state of the art while requiring less storage space and less time.
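
The region-wise construction of the DF described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes grayscale frames stored as nested lists, a single pyramid level given by a hypothetical `grid` parameter, and the sum of absolute pixel differences as the saliency measure; the abstract does not specify the measure, and the function names are illustrative only.

```python
def cell_bounds(size, parts):
    """Split range(size) into `parts` contiguous (start, stop) intervals."""
    return [(i * size // parts, (i + 1) * size // parts) for i in range(parts)]


def region_difference(frame, key_frame, rows, cols):
    """Sum of absolute pixel differences in one region (assumed saliency measure)."""
    return sum(
        abs(frame[r][c] - key_frame[r][c])
        for r in range(rows[0], rows[1])
        for c in range(cols[0], cols[1])
    )


def build_difference_frame(frames, key_index, grid=2):
    """Assemble a DF: each grid cell is copied from the shot frame that
    differs most from the key frame inside that cell."""
    key_frame = frames[key_index]
    height, width = len(key_frame), len(key_frame[0])
    df = [row[:] for row in key_frame]  # start from a copy of the key frame
    candidates = [f for i, f in enumerate(frames) if i != key_index]
    if not candidates:
        return df  # a one-frame shot has no other frame to draw from
    for rows in cell_bounds(height, grid):
        for cols in cell_bounds(width, grid):
            # Pick the frame with the largest difference in this cell.
            best = max(
                candidates,
                key=lambda f: region_difference(f, key_frame, rows, cols),
            )
            for r in range(rows[0], rows[1]):
                df[r][cols[0]:cols[1]] = best[r][cols[0]:cols[1]]
    return df
```

Calling `build_difference_frame(frames, key_index)` on the frames of one shot returns a single composite frame of the same size as the key frame. A full spatial pyramid would repeat the selection at several grid resolutions, which this sketch leaves out.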