Author: "Lin, Jianzhe" / Language: undetermined - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Lin, Jianzhe"' showing total 5 results

Start Over Author "Lin, Jianzhe" Language undetermined

5 results on '"Lin, Jianzhe"'

1. IntentVizor: Towards Generic Query Guided Interactive Video Summarization

Author: Wu, Guande, Lin, Jianzhe, and Silva, Claudio T.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The target of automatic video summarization is to create a short skim of the original long video while preserving the major content/events. There is a growing interest in the integration of user queries into video summarization or query-driven video summarization. This video summarization method predicts a concise synopsis of the original video based on the user query, which is commonly represented by the input text. However, two inherent problems exist in this query-driven way. First, the text query might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit once the summaries are produced, while we assume the needs of the user should be subtle and need to be adjusted interactively. To solve these two problems, we propose IntentVizor, an interactive video summarization framework guided by generic multi-modality queries. The input query that describes the user's needs are not limited to text but also the video snippets. We further represent these multi-modality finer-grained queries as user `intent', which is interpretable, interactable, editable, and can better quantify the user's needs. In this paper, we use a set of the proposed intents to represent the user query and design a new interactive visual analytic interface. Users can interactively control and adjust these mixed-initiative intents to obtain a more satisfying summary through the interface. Also, to improve the summarization quality via video understanding, a novel Granularity-Scalable Ego-Graph Convolutional Networks (GSE-GCN) is proposed. We conduct our experiments on two benchmark datasets. Comparisons with the state-of-the-art methods verify the effectiveness of the proposed framework. Code and dataset are available at https://github.com/jnzs1836/intent-vizor., Comment: 10 pages and 4 figures, CVPR 2022
Published: 2022
Full Text: View/download PDF

2. ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN

Author: Wu, Guande, Lin, Jianzhe, and Silva, Claudio T.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Video summarization aims to simplify large scale video browsing by generating concise, short summaries that diver from but well represent the original video. Due to the scarcity of video annotations, recent progress for video summarization concentrates on unsupervised methods, among which the GAN based methods are most prevalent. This type of methods includes a summarizer and a discriminator. The summarized video from the summarizer will be assumed as the final output, only if the video reconstructed from this summary cannot be discriminated from the original one by the discriminator. The primary problems of this GAN based methods are two folds. First, the summarized video in this way is a subset of original video with low redundancy and contains high priority events/entities. This summarization criterion is not enough. Second, the training of the GAN framework is not stable. This paper proposes a novel Entity relationship Aware video summarization method (ERA) to address the above problems. To be more specific, we introduce an Adversarial Spatio Temporal network to construct the relationship among entities, which we think should also be given high priority in the summarization. The GAN training problem is solved by introducing the Wasserstein GAN and two newly proposed video patch/score sum losses. In addition, the score sum loss can also relieve the model sensitivity to the varying video lengths, which is an inherent problem for most current video analysis tasks. Our method substantially lifts the performance on the target benchmark datasets and exceeds the current leaderboard Rank 1 state of the art CSNet (2.1% F1 score increase on TVSum and 3.1% F1 score increase on SumMe). We hope our straightforward yet effective approach will shed some light on the future research of unsupervised video summarization., Comment: 8 pages, 3 figures
Published: 2021
Full Text: View/download PDF

3. IntentVizor: Towards Generic Query Guided Interactive Video Summarization Using Slow-Fast Graph Convolutional Networks

Author: Wu, Guande, Lin, Jianzhe, and Silva, Claudio T.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV)
Abstract: The target of automatic Video summarization is to create a short skim of the original long video while preserving the major content/events. There is a growing interest in the integration of user's queries into video summarization, or query-driven video summarization. This video summarization method predicts a concise synopsis of the original video based on the user query, which is commonly represented by the input text. However, two inherent problems exist in this query-driven way. First, the query text might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit once the summaries are produced, limiting this summarization technique's practical value. We assume the needs of the user should be subtle and need to be adjusted interactively. To solve these two problems, we propose a novel IntentVizor framework, which is an interactive video summarization framework guided by genric multi-modality queries. The input query that describes the user's needs is not limited to text but also the video snippets. We further conclude these multi-modality finer-grained queries as user `intent', which is a newly proposed concept in this paper. This intent is interpretable, interactable, and better quantifies/describes the user's needs. To be more specific, We use a set of intents to represent the inputs of users to design our new interactive visual analytic interface. Users can interactively control and adjust these mixed-initiative intents to obtain a more satisfying summary of this newly proposed interface. Also, as algorithms help users achieve their summarization goal via video understanding, we propose two novel intent/scoring networks based on the slow-fast feature for our algorithm part. We conduct our experiments on two benchmark datasets. The comparison with the state-of-the-art methods verifies the effectiveness of the proposed framework., 11 pages and 3 figures for main paper, 8 pages and 6 figures for the appendix
Published: 2021
Full Text: View/download PDF

4. Aerial scene understanding in the wild: Multi-scene recognition via prototype-based memory networks

Author: Hua, Yuansheng, Mou, Lichao, Lin, Jianzhe, Heidler, Konrad, and Zhu, Xiao Xiang
Subjects: ddc
Published: 2020

5. DT-LET: Deep Transfer Learning by Exploring where to Transfer

Author: Lin, Jianzhe, Wang, Qi, Ward, Rabab, and Wang, Z. Jane
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Previous transfer learning methods based on deep network assume the knowledge should be transferred between the same hidden layers of the source domain and the target domains. This assumption doesn't always hold true, especially when the data from the two domains are heterogeneous with different resolutions. In such case, the most suitable numbers of layers for the source domain data and the target domain data would differ. As a result, the high level knowledge from the source domain would be transferred to the wrong layer of target domain. Based on this observation, "where to transfer" proposed in this paper should be a novel research frontier. We propose a new mathematic model named DT-LET to solve this heterogeneous transfer learning problem. In order to select the best matching of layers to transfer knowledge, we define specific loss function to estimate the corresponding relationship between high-level features of data in the source domain and the target domain. To verify this proposed cross-layer model, experiments for two cross-domain recognition/classification tasks are conducted, and the achieved superior results demonstrate the necessity of layer correspondence searching., Comment: Conference paper submitted to AAAI 2019
Published: 2018
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

5 results on '"Lin, Jianzhe"'

1. IntentVizor: Towards Generic Query Guided Interactive Video Summarization

2. ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN

3. IntentVizor: Towards Generic Query Guided Interactive Video Summarization Using Slow-Fast Graph Convolutional Networks

4. Aerial scene understanding in the wild: Multi-scene recognition via prototype-based memory networks

5. DT-LET: Deep Transfer Learning by Exploring where to Transfer

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Database

Publisher

5 results on '"Lin, Jianzhe"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources