1. Egocentric activity recognition using gaze
- Author
Hipiny, Irwandi
- Subjects
006.3
- Abstract
When coupled with an egocentric camera, a gaze tracker provides the image point at which the person is fixating. While performing a familiar task, we tend to fixate on activity-relevant objects at the points in time when the task at hand requires them. The resulting sequence of gaze regions is therefore very useful for inferring the subject's activity and action class. This thesis addresses the problem of visual recognition of human activity and action from an egocentric point of view. The higher-level task of activity recognition is based on processing the entire sequence of gaze regions as users perform tasks such as cooking or assembling objects, while the mid-level task of action recognition, such as pouring into a cup, is addressed via the automatic segmentation of mutually exclusive sequences prior to recognition. Temporal segmentation is performed by tracking two motion-based features inside successive gaze regions; these features model the underlying structure of image motion data at natural temporal cuts. The segmentation is further improved by incorporating a 2D color-histogram-based detection of human hands inside gaze regions. The proposed method learns activity and action models from the sequence of gaze regions. Activities are learned as a bag of visual words; however, we introduce a multi-voting scheme to reduce the effect of noisy matching. Actions are, in addition, modeled as a string of visual words, which enforces the structural constraints of an action. We also introduce contextual information in the form of location-based priors.

Furthermore, this thesis addresses the problem of measuring task performance from gaze-region modeling. The hypothesis is that subjects with higher task-performance scores exhibit distinctive gaze patterns as they carry out the task, which is expected to indicate the presence of domain knowledge. This may be reflected, for example, in requiring minimal visual feedback during the completion of a task. Such consistent and strategic use of gaze produces nearly identical activity models among those who score higher, whilst greater variation is observed between models learned from subjects who performed less well on the given task.

Results are shown on datasets captured using an egocentric gaze tracker with two cameras: a forward-facing camera that captures the scene, and an inward-facing camera that tracks the movement of the pupil to estimate the subject's gaze fixation. Our activity and action recognition results are comparable to those reported in the current literature on egocentric activity recognition, and, to the best of our knowledge, the results from the task-performance evaluation are the first steps towards automatically modeling user performance from gaze patterns.
- Published
2013
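The abstract above describes activities being learned as a bag of visual words over gaze regions, with a multi-voting scheme to reduce the effect of noisy matching. Below is a minimal, hypothetical Python sketch of that general idea, not the thesis's actual implementation: the use of scikit-learn's KMeans for the visual vocabulary, the histogram distance, and the top-k voting rule are all illustrative assumptions.

```python
# Hypothetical sketch: bag-of-visual-words activity model over gaze regions,
# with a simple multi-voting match step. Function names and parameters are
# illustrative assumptions, not the thesis's code.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, n_words=100, seed=0):
    """Cluster local descriptors pooled from all training gaze regions into visual words."""
    kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=seed)
    kmeans.fit(descriptors)  # descriptors: (N, D) array of local features
    return kmeans

def bag_of_words(kmeans, descriptors):
    """Normalized histogram of visual-word occurrences for one sequence of gaze regions."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def classify(query_hist, activity_models, top_k=3):
    """Multi-voting: let the top-k nearest training models vote on the activity label,
    rather than trusting a single (possibly noisy) nearest match."""
    dists = [(np.linalg.norm(query_hist - hist), label) for hist, label in activity_models]
    votes = [label for _, label in sorted(dists, key=lambda t: t[0])[:top_k]]
    return max(set(votes), key=votes.count)
```

In this sketch, `activity_models` is a list of (histogram, label) pairs built from training sequences; letting several near matches vote, instead of the single nearest one, is one simple way to dampen the effect of a noisy match.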