
Keyframe summarisation of egocentric video

Authors :
Yousefi, Paria
Kuncheva, Ludmila
Publication Year :
2019
Publisher :
Bangor University, 2019.

Abstract

Egocentric data refers to collections of images taken by a user wearing a camera over a period of time. The pictures provide considerable potential for knowledge mining related to the user's life, and consequently open up a wide range of opportunities for new applications in healthcare, protection and security, law enforcement and training, leisure, and self-monitoring. As a result, large volumes of egocentric data are collected every day, which highlights the importance of developing video analysis techniques that facilitate browsing the recorded material. Generating a condensed yet informative version of the original unstructured egocentric frame stream eases comprehension of the content and browsing of the narrative.

Given the great interest in creating keyframe summaries from video, it is surprising how little has been done to formalise their evaluation and comparison. The thesis first carries out a series of investigations related to the automatic evaluation and comparison of video summaries. A discrimination capacity measure is proposed as a formal way to quantify the improvement over the uniform baseline, assuming that one or more ground-truth summaries are available. Subsequently, a formal protocol for comparing summaries when ground truth is available is proposed.

We note that the most commonly used benchmark summarisation methods (random, uniform, and mid-event selection) are weak competitors. We therefore propose a new benchmark method for creating a keyframe summary, called "closest-to-centroid", and examine it with 20 different image descriptors to demonstrate its performance against the typical choices of baseline method.

Thereafter, the problem of selecting a keyframe summary is cast as a problem of prototype (instance) selection for the nearest neighbour classifier (1-nn). Assuming that the video is already segmented into events of interest (classes) and represented as a data set in some feature space, we propose a Greedy Tabu Selector algorithm which picks one frame to represent each class. Summaries generated by the algorithm are evaluated on a widely used egocentric video database and compared against the proposed closest-to-centroid baseline. The Greedy Tabu Selector leads to an improved match to the user ground truth compared to this baseline.

Next, a method for selective summarisation of egocentric video is introduced. It extracts multiple summaries from the same stream based on different user queries, producing a time-tagged summary of keyframes related to the query concept. The method is evaluated on two commonly used egocentric and lifelog databases.

Finally, it is noted that despite the large number of existing approaches for generating summaries from egocentric video, on-line video summarisation has not yet been fully explored. This type of summary is useful where memory constraints make it impractical to wait for the full video to be available for processing. We propose a classification (taxonomy) of on-line video summarisation methods based on their descriptive and distinguishing properties, and then develop an on-line summarisation algorithm that generates keyframe summaries during video capture. Evaluated on an egocentric database, the summaries generated by the proposed method outperform those generated by two competitor methods.
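To illustrate the closest-to-centroid baseline, here is a minimal Python sketch. It assumes the video is already segmented into events and each frame is represented by a feature vector; the descriptor choice (the thesis evaluates 20 descriptors) and the Euclidean distance are assumptions made for illustration, and the function name is hypothetical rather than the thesis's own code.

    import numpy as np

    def closest_to_centroid_summary(features, event_labels):
        # features: (n_frames, n_dims) array of frame descriptors
        # event_labels: length-n_frames array of event ids
        # Returns one keyframe index per event: the frame nearest
        # (in Euclidean distance) to its event's centroid.
        features = np.asarray(features, dtype=float)
        event_labels = np.asarray(event_labels)
        keyframes = []
        for event in np.unique(event_labels):
            idx = np.flatnonzero(event_labels == event)
            centroid = features[idx].mean(axis=0)
            dists = np.linalg.norm(features[idx] - centroid, axis=1)
            keyframes.append(int(idx[np.argmin(dists)]))
        return sorted(keyframes)

    # Toy usage: 100 frames with 64-d descriptors, 5 events of 20 frames
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 64))
    labels = np.repeat(np.arange(5), 20)
    print(closest_to_centroid_summary(X, labels))  # one keyframe per event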
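The 1-nn prototype-selection framing can be illustrated in the same setting. Continuing the sketch above, the snippet below scores a candidate summary (one frame per event) by the accuracy of a 1-nn classifier that uses the selected frames as prototypes and all frames as test points. A search procedure such as the thesis's Greedy Tabu Selector would aim to improve an objective of this kind; the specific objective and search details are assumptions here, not the thesis algorithm.

    def one_nn_accuracy(features, event_labels, keyframes):
        # Fraction of frames whose nearest selected keyframe
        # (Euclidean) belongs to the same event.
        features = np.asarray(features, dtype=float)
        event_labels = np.asarray(event_labels)
        protos = features[keyframes]                 # (k, n_dims)
        proto_labels = event_labels[keyframes]
        # Distance of every frame to every prototype
        d = np.linalg.norm(features[:, None, :] - protos[None, :, :], axis=2)
        pred = proto_labels[np.argmin(d, axis=1)]
        return float(np.mean(pred == event_labels))

    # Score the closest-to-centroid summary from the sketch above
    summary = closest_to_centroid_summary(X, labels)
    print(one_nn_accuracy(X, labels, summary))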

Details

Language :
English
Database :
British Library EThOS
Publication Type :
Dissertation/Thesis
Accession number :
edsble.793140
Document Type :
Electronic Thesis or Dissertation