1. Activity representation with motion hierarchies
- Author
Adrien Gaidon, Zaid Harchaoui, Cordelia Schmid
Affiliations: Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF), Université Joseph Fourier - Grenoble 1 (UJF), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Centre National de la Recherche Scientifique (CNRS); Microsoft Research - Inria Joint Centre (MSR - INRIA), Microsoft Research Laboratory Cambridge, Microsoft Corporation [Redmond, Wash.]; Computer Vision, Xerox Research Centre Europe [Meylan], Xerox Company
Projects and grants: ERC_Allegro; MSR-Inria; AXES; ANR; ANR-11-LABX-0025, PERSYVAL-lab, Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique (2011); European Project: 269980, EC:FP7:ICT, FP7-ICT-2009-6, AXES (2011); European Project: 320559, EC:FP7:ERC, ERC-2012-ADG_20120216, ALLEGRO (2013); European Project: 216886, EC:FP7:ICT, FP7-ICT-2007-1, PASCAL2 (2008)
- Subjects
Context (language use); Engineering and technology; Video analysis; Action recognition; Activity recognition; Artificial intelligence; Spectral clustering; Electrical engineering, electronic engineering, information engineering; Cluster analysis; Mathematics; Binary tree; Kernel methods; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; Networking & telecommunications; Pattern recognition; Tree (data structure); Artificial intelligence & image processing; Computer Vision and Pattern Recognition; Motion decomposition; Software
- Abstract
Complex activities, e.g. pole vaulting, are composed of a variable number of sub-events connected by complex spatio-temporal relations, whereas simple actions can be represented as sequences of short temporal parts. In this paper, we learn hierarchical representations of activity videos in an unsupervised manner. These hierarchies of mid-level motion components are data-driven decompositions specific to each video. We introduce a spectral divisive clustering algorithm to efficiently extract a hierarchy over a large number of tracklets (i.e. local trajectories). We use this structure to represent a video as an unordered binary tree. We model this tree using nested histograms of local motion features. We provide an efficient positive definite kernel that computes the structural and visual similarity of two hierarchical decompositions by relying on models of their parent-child relations. We present experimental results on four recent challenging benchmarks: the High Five dataset (Patron-Perez et al., High five: recognising human interactions in TV shows, 2010), the Olympic Sports dataset (Niebles et al., Modeling temporal structure of decomposable motion segments for activity classification, 2010), the Hollywood 2 dataset (Marszalek et al., Actions in context, 2009), and the HMDB dataset (Kuehne et al., HMDB: A large video database for human motion recognition, 2011). We show that per-video hierarchies provide additional information for activity recognition. Our approach improves over unstructured activity models, baselines using other motion decomposition algorithms, and the state of the art.
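The pipeline the abstract outlines (divisive spectral clustering of tracklets, an unordered binary tree, nested histograms, and a kernel over parent-child relations) can be illustrated with a toy sketch. This is not the authors' implementation: the Fiedler-vector sign split, the `min_size` stopping rule, the dictionary node layout, and the averaged linear kernel over concatenated parent and child histograms are all assumptions chosen for brevity, and the names `fiedler_split`, `build_tree`, `edge_features`, and `tree_kernel` are hypothetical.

```python
import numpy as np


def fiedler_split(affinity):
    """Bipartition items by the sign of the Fiedler vector (assumed split rule)."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt    # second-smallest eigenvector, rescaled
    return fiedler >= 0


def build_tree(affinity, words, n_words, idx=None, min_size=2):
    """Recursively split tracklets; each node stores a histogram of its tracklets' words."""
    if idx is None:
        idx = np.arange(len(affinity))
    hist = np.bincount(words[idx], minlength=n_words).astype(float)
    node = {"hist": hist, "children": []}
    if len(idx) >= 2 * min_size:
        mask = fiedler_split(affinity[np.ix_(idx, idx)])
        left, right = idx[mask], idx[~mask]
        if len(left) >= min_size and len(right) >= min_size:
            # Histograms are nested: the parent's histogram is the sum of its children's.
            node["children"] = [
                build_tree(affinity, words, n_words, part, min_size)
                for part in (left, right)
            ]
    return node


def edge_features(node):
    """One L1-normalized [parent, child] histogram pair per tree edge."""
    feats = []
    for child in node["children"]:
        pair = np.concatenate([node["hist"], child["hist"]])
        feats.append(pair / max(pair.sum(), 1e-12))
        feats.extend(edge_features(child))
    return feats


def tree_kernel(tree_a, tree_b):
    """Mean linear kernel over all pairs of edges (assumed form, positive semi-definite)."""
    feats_a, feats_b = edge_features(tree_a), edge_features(tree_b)
    if not feats_a or not feats_b:
        return 0.0
    return float(np.mean(np.array(feats_a) @ np.array(feats_b).T))
```

Here `affinity` stands for a symmetric non-negative tracklet similarity matrix and `words` for an integer array assigning each tracklet to a quantized motion feature; both are placeholders for whatever tracklet similarity and local descriptors are actually used. Averaging a linear kernel over edge pairs keeps the result positive semi-definite, which is the property the abstract requires of its kernel, but the specific form here is only one simple way to obtain it.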
- Published
- 2014