Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment
- Publication Year :
- 2024
Abstract
- User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of $N$ frames in a UGV as an $M$-hop path graph $\mathcal{G}^o$ for $M \ll N$, where the similarity between two frames within $M$ time instants is encoded as a positive edge based on feature similarity. For efficient sampling, we then "unfold" $\mathcal{G}^o$ to a $1$-hop path graph $\mathcal{G}$, specified by a generalized graph Laplacian matrix $\mathcal{L}$, via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue $\lambda_{\min}(\mathbf{B})$ of a coefficient matrix $\mathbf{B} = \textit{diag}\left(\mathbf{h}\right) + \mu \mathcal{L}$, where $\mathbf{h}$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound $\lambda^-_{\min}(\mathbf{B})$ by choosing $\mathbf{h}$ via a new fast graph sampling algorithm that iteratively aligns the left-ends of Gershgorin discs for all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm matches or exceeds the video summarization performance of state-of-the-art methods at substantially reduced complexity.
- Comment: 13 pages, 5 figures
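The relation between $\lambda_{\min}(\mathbf{B})$ and its GCT lower bound described in the abstract can be illustrated with a small numerical sketch. This is not the authors' sampling algorithm; it only builds a toy $1$-hop path graph and checks the bound. Everything named here is an assumption made for illustration: the helper functions `path_laplacian`, `coefficient_matrix`, and `gershgorin_left_ends`, the edge weights, the value of `mu`, the selection vector `h`, the scaling vector `s`, and the use of a plain combinatorial Laplacian in place of the generalized Laplacian $\mathcal{L}$.

```python
import numpy as np

def path_laplacian(weights):
    """Combinatorial Laplacian of a 1-hop path graph (illustrative assumption).

    weights[i] is the positive edge weight between node i and node i + 1.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size + 1
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = w
    W[idx + 1, idx] = w
    return np.diag(W.sum(axis=1)) - W

def coefficient_matrix(h, L, mu):
    """B = diag(h) + mu * L, with h a binary selection vector."""
    return np.diag(np.asarray(h, dtype=float)) + mu * L

def gershgorin_left_ends(B, s=None):
    """Left-ends (center minus radius) of the Gershgorin discs of S B S^{-1}.

    S = diag(s) with s > 0 is a similarity transform, so the eigenvalues of B
    are unchanged while the disc radii are rescaled. With s=None, S = I.
    """
    n = B.shape[0]
    s = np.ones(n) if s is None else np.asarray(s, dtype=float)
    scaled = (s[:, None] * B) / s[None, :]   # entry (i, j) is s_i * B_ij / s_j
    centers = np.diag(scaled)
    radii = np.abs(scaled).sum(axis=1) - np.abs(centers)
    return centers - radii

# Toy data: 6 frames, hypothetical similarity weights, 2 selected keyframes.
weights = [0.9, 0.8, 0.2, 0.7, 0.6]
h = [0, 1, 0, 0, 1, 0]
mu = 0.5

L = path_laplacian(weights)
B = coefficient_matrix(h, L, mu)
lam_min = np.linalg.eigvalsh(B)[0]           # smallest eigenvalue of symmetric B

left_unscaled = gershgorin_left_ends(B)      # S = I
s = np.array([1.2, 0.8, 1.1, 1.1, 0.8, 1.2]) # arbitrary positive scaling
left_scaled = gershgorin_left_ends(B, s)

print("lambda_min(B):", lam_min)
print("GCT lower bound, unscaled:", left_unscaled.min())
print("GCT lower bound, scaled:  ", left_scaled.min())
```

With a combinatorial Laplacian and no scaling, the left-end of disc $i$ equals $h_i$, so the unscaled bound is $0$ whenever some frame is not selected, even though $\lambda_{\min}(\mathbf{B}) > 0$ for a connected graph with at least one sample. A diagonal similarity transform moves the left-ends without changing the eigenvalues, which is why aligning them can tighten the bound.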
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2408.01859
- Document Type :
- Working Paper