6 results for "Transformer Network"
Search Results
2. GraphFusion: Integrating multi-level semantic information with graph computing for enhanced 3D instance segmentation.
- Authors: Pan, Lei, Luan, Wuyang, Zheng, Yuan, Li, Junhui, Tao, Linwei, and Xu, Chang
- Subjects: TRANSFORMER models; POINT cloud; SOCIAL dominance
- Abstract
Graph computing has emerged as a focal point in recent research across various fields, including 3D instance segmentation, where it aids in detecting and segmenting objects within volumetric data. Our study introduces GraphFusion, a state-of-the-art network that harnesses graph computing to enhance the segmentation of 3D point clouds. GraphFusion is equipped with a Multi-Level Semantic Aggregation Module, architected like a graph, to capture comprehensive features from 3D point clouds. Using graph-based methodologies, this module aggregates multi-scale semantic information, drawing on both global and local contexts. Additionally, our Parallel Feature Fusion Transformer Module leverages graph-transformer techniques to process complex spatial relationships within point clouds, yielding a more cohesive feature representation. Rigorous experiments on the ScanNetv2 dataset confirm the strength of GraphFusion, which surpasses current methods by 2.2% in mean Average Precision (mAP) on the hidden test set. The model's code is accessible at https://github.com/3171228612/GraphFusion. [ABSTRACT FROM AUTHOR] (An illustrative sketch of multi-scale graph aggregation follows this entry.)
- Published: 2024
- Full Text: View/download PDF
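As a concrete illustration of the multi-scale, graph-style aggregation the GraphFusion abstract describes, here is a minimal PyTorch sketch: neighbour features from k-NN graphs at several scales are mean-pooled and concatenated. The function names (knn_indices, multi_scale_aggregate) and the scale choices are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch: multi-scale neighbour aggregation over a point cloud,
# in the spirit of a multi-level semantic aggregation module.
import torch

def knn_indices(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbours for each point. xyz: (N, 3)."""
    dists = torch.cdist(xyz, xyz)                # (N, N) pairwise distances
    return dists.topk(k, largest=False).indices  # (N, k); includes the point itself

def multi_scale_aggregate(xyz, feats, scales=(8, 16, 32)):
    """Concatenate mean-pooled neighbour features at several graph scales.
    feats: (N, C) per-point features -> returns (N, C * len(scales))."""
    outs = []
    for k in scales:
        idx = knn_indices(xyz, k)                # graph at scale k
        outs.append(feats[idx].mean(dim=1))      # local context at scale k
    return torch.cat(outs, dim=-1)

xyz = torch.randn(1024, 3)
feats = torch.randn(1024, 64)
fused = multi_scale_aggregate(xyz, feats)        # (1024, 192)
```

Small scales capture local geometry while large scales approximate global context; concatenation lets a downstream head weigh both, which is the rough intuition behind fusing multi-level semantics.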
3. Bi-syntax guided transformer network for aspect sentiment triplet extraction.
- Authors: Hao, Shufeng, Zhou, Yu, Liu, Ping, and Xu, Shuang
- Subjects: SENTIMENT analysis; USER-generated content; END-to-end delay
- Abstract
Aspect Sentiment Triplet Extraction is an emerging and challenging task that attempts to present a complete picture of aspect-based sentiment analysis. Prior research mostly leverages various tagging schemes to extract the three elements of a triplet. However, these methods fail to explicitly model the complicated relations between aspects and opinions, as well as the boundaries of multi-word aspects and opinions. In this paper, we propose an end-to-end bi-syntax guided transformer network to address these challenges. First, we devise three types of representations, namely sequence distance representation, constituency distance representation, and dependency distance representation, to learn a comprehensive language representation. Specifically, the sequence distance representation uses the sequence distance between words to enhance the contextual representation. The constituency distance representation adopts the constituency distance between words in a constituency tree to capture intra-span relations between words. The dependency distance representation employs the dependency distance between words in a dependency tree to capture long-distance relations between aspects and opinions. Extensive experiments on four benchmark datasets validate the effectiveness of our method. The results demonstrate that the proposed approach outperforms baseline methods. Further detailed analysis shows that our method effectively handles multi-word terms and overlapping triplets. [ABSTRACT FROM AUTHOR] (A small sketch of the three word-pair distances follows this entry.)
- Published: 2024
- Full Text: View/download PDF
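To make the three distances named in the abstract concrete, here is a hedged pure-Python sketch: sequence distance is just positional offset, while constituency and dependency distances are shortest-path lengths over the respective tree's edges. The edges here are hand-written toy values; a real system would obtain them from a parser.

```python
# Sketch of the word-pair distances: sequence distance and tree distance
# (the latter serves for both constituency and dependency trees).
from collections import deque

def sequence_distance(i: int, j: int) -> int:
    """Distance between token positions i and j in the sentence."""
    return abs(i - j)

def tree_distance(edges, n, i, j):
    """Shortest-path distance between tokens i and j over tree edges (BFS)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [-1] * n
    dist[i] = 0
    q = deque([i])
    while q:
        u = q.popleft()
        if u == j:
            return dist[u]
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    return -1  # unreachable (should not happen in a tree)

# Toy example: "the food was great" (tokens 0..3) with hypothetical
# dependency edges; a real parser would supply these.
dep_edges = [(3, 1), (1, 0), (3, 2)]
print(sequence_distance(0, 3))            # 3
print(tree_distance(dep_edges, 4, 0, 3))  # 2 (path 0 -> 1 -> 3)
```

The point of the dependency distance is visible even in the toy case: an aspect and opinion far apart in the sequence can be only a hop or two apart in the tree.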
4. Blind face restoration: Benchmark datasets and a baseline model.
- Authors: Zhang, Puyang, Zhang, Kaihao, Luo, Wenhan, Li, Changsheng, and Wang, Guoren
- Subjects: TRANSFORMER models; JPEG (Image coding standard)
- Abstract
Blind Face Restoration (BFR) aims to generate high-quality face images from low-quality inputs. However, existing BFR methods often use private datasets for training and evaluation, making it difficult for subsequent approaches to compare fairly. To address this issue, we introduce two benchmark datasets, BFRBD128 and BFRBD512, for evaluating state-of-the-art methods in five scenarios: blur, noise, low resolution, JPEG compression artifacts, and full degradation. We use seven standard quantitative metrics and two task-specific metrics, AFLD and AFICS. Additionally, we propose an efficient baseline model called Swin Transformer U-Net (STUNet), which outperforms state-of-the-art methods across BFR tasks. The codes, datasets, and trained models are publicly available at: https://github.com/bitzpy/Blind-Face-Restoration-Benchmark-Datasets-and-a-Baseline-Model. [ABSTRACT FROM AUTHOR] (A sketch of the degradation pipeline follows this entry.)
- Published: 2024
- Full Text: View/download PDF
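The abstract names four synthetic degradations plus their combination ("full degradation"). Below is a minimal PIL/NumPy sketch of such a pipeline; the specific kernel sizes, noise levels, and JPEG quality are assumptions for illustration, not the published BFRBD128/BFRBD512 settings.

```python
# Sketch: synthesizing a low-quality face image via blur, downscaling,
# additive noise, and JPEG re-encoding (a generic "full degradation").
import io
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image, scale=4, sigma=10, quality=30) -> Image.Image:
    img = img.filter(ImageFilter.GaussianBlur(radius=2))        # blur
    w, h = img.size
    img = img.resize((w // scale, h // scale), Image.BICUBIC)   # low resolution
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0, sigma, arr.shape)                # additive noise
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)               # JPEG artifacts
    buf.seek(0)
    return Image.open(buf)

lq = degrade(Image.new("RGB", (512, 512), "gray"))  # full degradation example
```

Isolating each stage (blur only, noise only, and so on) yields the five evaluation scenarios the benchmark describes.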
5. Boundary-guided part reasoning network for human parsing.
- Authors: Su, Zhuo, Guan, Huiqiang, Lai, Yuntian, Zhou, Fan, and Liang, Yun
- Subjects: TRANSFORMER models; HUMAN body; HUMAN beings
- Abstract
The task of human parsing aims to segment the human body into different semantic regions. Despite advancements in this field, two issues remain in current work: boundary indistinction and parsing inconsistency. In this paper, we investigate how to utilize structural information and auxiliary information to jointly solve these two problems. Drawing inspiration from the Transformer architecture, a Boundary-guided Part Reasoning Network (BPRNet) is proposed that combines edge information and the associated semantics of body parts for human parsing. Specifically, we design a part representation module to represent human body parts as part features. Building on the Transformer decoder, multi-head self-attention is used to capture semantic correlations among human body parts. Moreover, we propose a boundary-guided module consisting of absolute boundary attention and reinforced boundary attention, which exploit edge information and multi-scale image features to jointly constrain cross-attention when extracting global features. Experiments on three public datasets show that the proposed method performs favorably against state-of-the-art methods. [ABSTRACT FROM AUTHOR] (A minimal sketch of self-attention over part features follows this entry.)
- Published: 2023
- Full Text: View/download PDF
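A minimal PyTorch sketch of the mechanism the abstract describes for part reasoning: multi-head self-attention applied to a set of per-part features so every part can attend to every other. The part count, embedding dimension, and use of learned queries are assumptions, not BPRNet's actual configuration.

```python
# Sketch: multi-head self-attention over learned body-part queries,
# modelling correlations such as "arm adjoins torso".
import torch
import torch.nn as nn

num_parts, dim = 20, 256                    # hypothetical: 20 part classes
part_queries = nn.Parameter(torch.randn(1, num_parts, dim))
self_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

# Each part feature attends to all others; attn_weights exposes the
# learned part-to-part correlation matrix.
parts, attn_weights = self_attn(part_queries, part_queries, part_queries)
print(parts.shape, attn_weights.shape)      # (1, 20, 256), (1, 20, 20)
```

In a full model these refined part features would then cross-attend to image features, with boundary attention constraining where that cross-attention looks.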
6. A comprehensive empirical review of modern voice activity detection approaches for movies and TV shows.
- Authors: Sharma, Mayank, Joshi, Sandeep, Chatterjee, Tamojit, and Hamid, Raffay
- Subjects: TELEVISION programs; ENVIRONMENTAL music; SIGNAL-to-noise ratio; MUSIC scores; SPEECH
- Abstract
A robust and language-agnostic Voice Activity Detection (VAD) system is crucial for Digital Entertainment Content (DEC), whose primary examples are movies and TV series. VAD systems are used in DEC creation for tasks such as augmenting subtitle creation, detecting and correcting subtitle drift, and audio diarisation. The majority of previous work on VAD focuses on scenarios that: (a) have minimal background noise, and (b) deliver the audio content in English. However, movies and TV shows can: (a) contain substantial amounts of non-voice background signal (e.g. musical score and environmental sounds), and (b) are released worldwide in a variety of languages. This makes most standard VAD approaches not readily applicable to DEC-related applications. Furthermore, no comprehensive analysis exists of Deep Neural Network (DNN) performance on VAD applied to DEC. In this work, we present a thorough survey of DNN-based VADs on DEC data in terms of their accuracy, Area Under Curve (AUC), noise sensitivity, and language-agnostic behaviour. For our analysis we use 1100 proprietary DEC videos spanning 450 h of content in 9 languages and 5+ genres, making our study the largest of its kind published to date. The key findings of our analysis are: (a) even high-quality timed-text or subtitle files (the terms subtitles and timed-text are used interchangeably in the manuscript) contain significant levels of label noise (up to 15%); despite this, deep networks are robust and retain high AUCs (∼0.94). (b) Using a larger labelled dataset can substantially increase a neural VAD model's True Positive Rate (TPR), with up to 1.3% and 18% relative improvement over the current state-of-the-art methods of Hebbar et al. (2019) and Chaudhuri et al. (2018) respectively; this effect is more pronounced in noisy environments such as music and environmental sounds, and is particularly instructive when prioritizing domain-specific labelled data acquisition versus exploring model structure and complexity. (c) Currently available sequence-based neural models show similar levels of competence in their language-agnostic behaviour for VAD at high Signal-to-Noise Ratios (SNRs) and for clean speech. (d) Deep models exhibit varied performance across SNRs, with CLDNN (Zazo et al., 2016) being the most robust. (e) Models with a comparatively larger number of parameters (∼2M) are less robust to input noise than models with a smaller number of parameters (∼0.5M). [ABSTRACT FROM AUTHOR] (A small sketch of the reported frame-level metrics follows this entry.)
- Published: 2022
- Full Text: View/download PDF
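The survey evaluates VAD models by frame-level AUC and TPR. Here is a hedged sketch of how those two metrics are typically computed from per-frame speech scores; the synthetic labels, score model, and 0.5 threshold are assumptions for illustration, not the paper's evaluation protocol.

```python
# Sketch: frame-level AUC and TPR for a VAD model, on synthetic data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 10_000)                 # 1 = speech frame
scores = 0.4 * labels + rng.random(10_000) * 0.6    # noisy, overlapping scores

auc = roc_auc_score(labels, scores)                 # threshold-free quality
preds = scores >= 0.5                               # fixed operating point
tpr = (preds & (labels == 1)).sum() / (labels == 1).sum()
print(f"AUC={auc:.3f}  TPR@0.5={tpr:.3f}")
```

The paper's label-noise finding can be probed with the same harness by randomly flipping a fraction of `labels` and observing how slowly AUC degrades.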