Author: "Hou, Jingwen" / Language: undetermined - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hou, Jingwen"' showing total 7 results

Start Over Author "Hou, Jingwen" Language undetermined

7 results on '"Hou, Jingwen"'

1. Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion

Author: Liu, Weide, Zhong, Xiaoyang, Hou, Jingwen, Li, Shaohua, Huang, Haozhe, and Fang, Yuming
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal Named Entity Recognition (MNER) is a crucial task for information extraction from social media platforms such as Twitter. Most current methods rely on attention weights to extract information from both text and images but are often unreliable and lack interpretability. To address this problem, we propose incorporating uncertainty estimation into the MNER task, producing trustworthy predictions. Our proposed algorithm models the distribution of each modality as a Normal-inverse Gamma distribution, and fuses them into a unified distribution with an evidential fusion mechanism, enabling hierarchical characterization of uncertainties and promotion of prediction accuracy and trustworthiness. Additionally, we explore the potential of pre-trained large foundation models in MNER and propose an efficient fusion approach that leverages their robust feature representations. Experiments on two datasets demonstrate that our proposed method outperforms the baselines and achieves new state-of-the-art performance.
Published: 2023
Full Text: View/download PDF

2. Towards Explainable In-the-Wild Video Quality Assessment: a Database and a Language-Prompted Approach

Author: Wu, Haoning, Zhang, Erli, Liao, Liang, Chen, Chaofeng, Hou, Jingwen, Wang, Annan, Sun, Wenxiu, Yan, Qiong, and Lin, Weisi
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computation and Language (cs.CL), Computer Science - Multimedia, Multimedia (cs.MM)
Abstract: The proliferation of in-the-wild videos has greatly expanded the Video Quality Assessment (VQA) problem. Unlike early definitions that usually focus on limited distortion types, VQA on in-the-wild videos is especially challenging as it could be affected by complicated factors, including various distortions and diverse contents. Though subjective studies have collected overall quality scores for these videos, how the abstract quality scores relate with specific factors is still obscure, hindering VQA methods from more concrete quality evaluations (e.g. sharpness of a video). To solve this problem, we collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors, including in-capture authentic distortions (e.g. motion blur, noise, flicker), errors introduced by compression and transmission, and higher-level experiences on semantic contents and aesthetic issues (e.g. composition, camera trajectory), to establish the multi-dimensional Maxwell database. Specifically, we ask the subjects to label among a positive, a negative, and a neural choice for each dimension. These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings, and to benchmark different categories of VQA algorithms on each dimension, so as to more comprehensively analyze their strengths and weaknesses. Furthermore, we propose the MaxVQA, a language-prompted VQA approach that modifies vision-language foundation model CLIP to better capture important quality issues as observed in our analyses. The MaxVQA can jointly evaluate various specific quality factors and final quality scores with state-of-the-art accuracy on all dimensions, and superb generalization ability on existing datasets. Code and data available at \url{https://github.com/VQAssessment/MaxVQA}., Comment: 12 pages (with appendix). Under review, non-finalised version
Published: 2023
Full Text: View/download PDF

3. Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion

Author: Wu, Haoning, Liao, Liang, Hou, Jingwen, Chen, Chaofeng, Zhang, Erli, Wang, Annan, Sun, Wenxiu, Yan, Qiong, and Lin, Weisi
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Multimedia (cs.MM)
Abstract: Recent learning-based video quality assessment (VQA) algorithms are expensive to implement due to the cost of data collection of human quality opinions, and are less robust across various scenarios due to the biases of these opinions. This motivates our exploration on opinion-unaware (a.k.a zero-shot) VQA approaches. Existing approaches only considers low-level naturalness in spatial or temporal domain, without considering impacts from high-level semantics. In this work, we introduce an explicit semantic affinity index for opinion-unaware VQA using text-prompts in the contrastive language-image pre-training (CLIP) model. We also aggregate it with different traditional low-level naturalness indexes through gaussian normalization and sigmoid rescaling strategies. Composed of aggregated semantic and technical metrics, the proposed Blind Unified Opinion-Unaware Video Quality Index via Semantic and Technical Metric Aggregation (BUONA-VISTA) outperforms existing opinion-unaware VQA methods by at least 20% improvements, and is more robust than opinion-aware approaches.
Published: 2023
Full Text: View/download PDF

4. Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment

Author: Wu, Haoning, Chen, Chaofeng, Liao, Liang, Hou, Jingwen, Sun, Wenxiu, Yan, Qiong, Gu, Jinwei, and Lin, Weisi
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Multimedia (cs.MM)
Abstract: The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA). On the one hand, keeping the original resolution will lead to unacceptable computational costs. On the other hand, existing practices, such as resizing and cropping, will change the quality of original videos due to the loss of details and contents, and are therefore harmful to quality assessment. With the obtained insight from the study of spatial-temporal redundancy in the human visual system and visual coding theory, we observe that quality information around a neighbourhood is typically similar, motivating us to investigate an effective quality-sensitive neighbourhood representatives scheme for VQA. In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS) to get a novel type of sample, named fragments. Full-resolution videos are first divided into mini-cubes with preset spatial-temporal grids, then the temporal-aligned quality representatives are sampled to compose the fragments that serve as inputs for VQA. In addition, we design the Fragment Attention Network (FANet), a network architecture tailored specifically for fragments. With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks while requiring only 1/1612 FLOPs compared to the current state-of-the-art. Codes, models and demos are available at https://github.com/timothyhtimothy/FAST-VQA-and-FasterVQA.
Published: 2022
Full Text: View/download PDF

5. Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

Author: Wu, Haoning, Zhang, Erli, Liao, Liang, Chen, Chaofeng, Hou, Jingwen, Wang, Annan, Sun, Wenxiu, Yan, Qiong, and Lin, Weisi
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Multimedia, Machine Learning (cs.LG), Multimedia (cs.MM)
Abstract: The rapid increase in user-generated-content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the technical perspective, measuring the perception of distortions; and the aesthetic perspective, which relates to preference and recommendation on contents. To understand how these two perspectives affect overall subjective opinions in UGC-VQA, we conduct a large-scale subjective study to collect human quality opinions on overall quality of videos as well as perceptions from aesthetic and technical perspectives. The collected Disentangled Video Quality Database (DIVIDE-3k) confirms that human quality opinions on UGC videos are universally and inevitably affected by both aesthetic and technical perspectives. In light of this, we propose the Disentangled Objective Video Quality Evaluator (DOVER) to learn the quality of UGC videos based on the two perspectives. The DOVER proves state-of-the-art performance in UGC-VQA under very high efficiency. With perspective opinions in DIVIDE-3k, we further propose DOVER++, the first approach to provide reliable clear-cut quality evaluations from a single aesthetic or technical perspective. Code at https://github.com/VQAssessment/DOVER.
Published: 2022
Full Text: View/download PDF

6. FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

Author: Wu, Haoning, Chen, Chaofeng, Hou, Jingwen, Liao, Liang, Wang, Annan, Sun, Wenxiu, Yan, Qiong, and Lin, Weisi
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Multimedia (cs.MM)
Abstract: Current deep video quality assessment (VQA) methods are usually with high computational costs when evaluating high-resolution videos. This cost hinders them from learning better video-quality-related representations via end-to-end training. Existing approaches typically consider naive sampling to reduce the computational cost, such as resizing and cropping. However, they obviously corrupt quality-related information in videos and are thus not optimal for learning good representations for VQA. Therefore, there is an eager need to design a new quality-retained sampling scheme for VQA. In this paper, we propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution and covers global quality with contextual relations via mini-patches sampled in uniform grids. These mini-patches are spliced and aligned temporally, named as fragments. We further build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs. Consisting of fragments and FANet, the proposed FrAgment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations. It improves state-of-the-art accuracy by around 10% while reducing 99.5% FLOPs on 1080P high-resolution videos. The newly learned video-quality-related representations can also be transferred into smaller VQA datasets, boosting performance in these scenarios. Extensive experiments show that FAST-VQA has good performance on inputs of various resolutions while retaining high efficiency. We publish our code at https://github.com/timothyhtimothy/FAST-VQA., Comment: Will appear on ECCV 2022. 14 Pages
Published: 2022
Full Text: View/download PDF

7. Learning Image Aesthetic Assessment from Object-level Visual Components

Author: Hou, Jingwen, Yang, Sheng, Lin, Weisi, Zhao, Baoquan, and Fang, Yuming
Subjects: FOS: Computer and information sciences, Computer Science - Multimedia, Multimedia (cs.MM)
Abstract: As it is said by Van Gogh, great things are done by a series of small things brought together. Aesthetic experience arises from the aggregation of underlying visual components. However, most existing deep image aesthetic assessment (IAA) methods over-simplify the IAA process by failing to model image aesthetics with clearly-defined visual components as building blocks. As a result, the connection between resulting aesthetic predictions and underlying visual components is mostly invisible and hard to be explicitly controlled, which limits the model in both performance and interpretability. This work aims to model image aesthetics from the level of visual components. Specifically, object-level regions detected by a generic object detector are defined as visual components, namely object-level visual components (OVCs). Then generic features representing OVCs are aggregated for the aesthetic prediction based upon proposed object-level and graph attention mechanisms, which dynamically determines the importance of individual OVCs and relevance between OVC pairs, respectively. Experimental results confirm the superiority of our framework over previous relevant methods in terms of SRCC and PLCC on the aesthetic rating distribution prediction. Besides, quantitative analysis is done towards model interpretation by observing how OVCs contribute to aesthetic predictions, whose results are found to be supported by psychology on aesthetics and photography rules. To the best of our knowledge, this is the first attempt at the interpretation of a deep IAA model.
Published: 2021
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

7 results on '"Hou, Jingwen"'

1. Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion

2. Towards Explainable In-the-Wild Video Quality Assessment: a Database and a Language-Prompted Approach

3. Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion

4. Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment

5. Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

6. FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

7. Learning Image Aesthetic Assessment from Object-level Visual Components

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Database

7 results on '"Hou, Jingwen"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources