Author: "Zhang, Meishan" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhang, Meishan"' showing total 389 results

Start Over Author "Zhang, Meishan"

389 results on '"Zhang, Meishan"'

1. Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

Author: Zhao, Yu, Fei, Hao, Li, Xiangtai, Qin, Libo, Ji, Jiayi, Zhu, Hongyuan, Zhang, Meishan, Zhang, Min, and Wei, Jianguo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understanding, due to the difficulty of 3D-wise spatial feature modeling. In this work, we consider modeling the SI2T and ST2I together under a dual learning framework. During the dual framework, we then propose to represent the 3D spatial scene features with a novel 3D scene graph (3DSG) representation that can be shared and beneficial to both tasks. Further, inspired by the intuition that the easier 3D$\to$image and 3D$\to$text processes also exist symmetrically in the ST2I and SI2T, respectively, we propose the Spatial Dual Discrete Diffusion (SD$^3$) framework, which utilizes the intermediate features of the 3D$\to$X processes to guide the hard X$\to$3D processes, such that the overall ST2I and SI2T will benefit each other. On the visual spatial understanding dataset VSD, our system outperforms the mainstream T2I and I2T methods significantly. Further in-depth analysis reveals how our dual learning strategy advances.
Published: 2024

2. Grammar Induction from Visual, Speech and Text

Author: Zhao, Yu, Fei, Hao, Wu, Shengqiong, Zhang, Meishan, Zhang, Min, and Chua, Tat-seng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Grammar Induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics. In the process, features from distinct modalities essentially serve complementary roles to each other. With such intuition, this work introduces a novel \emph{unsupervised visual-audio-text grammar induction} task (named \textbf{VAT-GI}), to induce the constituent grammar trees from parallel images, text, and speech inputs. Inspired by the fact that language grammar natively exists beyond the texts, we argue that the text has not to be the predominant modality in grammar induction. Thus we further introduce a \emph{textless} setting of VAT-GI, wherein the task solely relies on visual and auditory inputs. To approach the task, we propose a visual-audio-text inside-outside recursive autoencoder (\textbf{VaTiora}) framework, which leverages rich modal-specific and complementary features for effective grammar parsing. Besides, a more challenging benchmark data is constructed to assess the generalization ability of VAT-GI system. Experiments on two benchmark datasets demonstrate that our proposed VaTiora system is more effective in incorporating the various multimodal signals, and also presents new state-of-the-art performance of VAT-GI.
Published: 2024

3. SpeechEE: A Novel Benchmark for Speech Event Extraction

Author: Wang, Bin, Zhang, Meishan, Fei, Hao, Zhao, Yu, Li, Bobo, Wu, Shengqiong, Ji, Wei, and Zhang, Min
Subjects: Computer Science - Multimedia
Abstract: Event extraction (EE) is a critical direction in the field of information extraction, laying an important foundation for the construction of structured knowledge bases. EE from text has received ample research and attention for years, yet there can be numerous real-world applications that require direct information acquisition from speech signals, online meeting minutes, interview summaries, press releases, etc. While EE from speech has remained under-explored, this paper fills the gap by pioneering a SpeechEE, defined as detecting the event predicates and arguments from a given audio speech. To benchmark the SpeechEE task, we first construct a large-scale high-quality dataset. Based on textual EE datasets under the sentence, document, and dialogue scenarios, we convert texts into speeches through both manual real-person narration and automatic synthesis, empowering the data with diverse scenarios, languages, domains, ambiences, and speaker styles. Further, to effectively address the key challenges in the task, we tailor an E2E SpeechEE system based on the encoder-decoder architecture, where a novel Shrinking Unit module and a retrieval-aided decoding mechanism are devised. Extensive experimental results on all SpeechEE subsets demonstrate the efficacy of the proposed model, offering a strong baseline for the task. At last, being the first work on this topic, we shed light on key directions for future research.
Published: 2024

4. An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation

Author: Guo, Peiming, Liu, Sinuo, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: Photo-Sharing Multi-modal dialogue generation requires a dialogue agent not only to generate text responses but also to share photos at the proper moment. Using image text caption as the bridge, a pipeline model integrates an image caption model, a text generation model, and an image generation model to handle this complex multi-modal task. However, representing the images with text captions may loss important visual details and information and cause error propagation in the complex dialogue system. Besides, the pipeline model isolates the three models separately because discrete image text captions hinder end-to-end gradient propagation. We propose the first end-to-end model for photo-sharing multi-modal dialogue generation, which integrates an image perceptron and an image generator with a large language model. The large language model employs the Q-Former to perceive visual images in the input end. For image generation in the output end, we propose a dynamic vocabulary transformation matrix and use straight-through and gumbel-softmax techniques to align the large language model and stable diffusion model and achieve end-to-end gradient propagation. We perform experiments on PhotoChat and DialogCC datasets to evaluate our end-to-end model. Compared with pipeline models, the end-to-end model gains state-of-the-art performances on various metrics of text and image generation. More analysis experiments also verify the effectiveness of the end-to-end model for photo-sharing multi-modal dialogue generation., Comment: Work in progress
Published: 2024

5. mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

Author: Zhang, Xin, Zhang, Yanzhao, Long, Dingkun, Xie, Wen, Dai, Ziqi, Tang, Jialong, Lin, Huan, Yang, Baosong, Xie, Pengjun, Huang, Fei, Zhang, Meishan, Li, Wenjie, and Zhang, Min
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: We present systematic efforts in building long-context multilingual text representation model (TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than 512 of previous multilingual encoders). Then we construct a hybrid TRM and a cross-encoder reranker by contrastive learning. Evaluations show that our text encoder outperforms the same-sized previous state-of-the-art XLM-R. Meanwhile, our TRM and reranker match the performance of large-sized state-of-the-art BGE-M3 models and achieve better results on long-context retrieval benchmarks. Further analysis demonstrate that our proposed models exhibit higher efficiency during both training and inference. We believe their efficiency and effectiveness could benefit various researches and industrial applications., Comment: Camera-ready version of EMNLP 2024: Industry Track
Published: 2024

6. Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

Author: Fei, Hao, Wu, Shengqiong, Zhang, Meishan, Zhang, Min, Chua, Tat-Seng, and Yan, Shuicheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning , under-modeling of temporal dynamics, detached video-language view. In this work, we target enhancing VLMs with a fine-grained structural spatio-temporal alignment learning method (namely Finsta). First of all, we represent the input texts and videos with fine-grained scene graph (SG) structures, both of which are further unified into a holistic SG (HSG) for bridging two modalities. Then, an SG-based framework is built, where the textual SG (TSG) is encoded with a graph Transformer, while the video dynamic SG (DSG) and the HSG are modeled with a novel recurrent graph Transformer for spatial and temporal feature propagation. A spatial-temporal Gaussian differential graph Transformer is further devised to strengthen the sense of the changes in objects across spatial and temporal dimensions. Next, based on the fine-grained structural features of TSG and DSG, we perform object-centered spatial alignment and predicate-centered temporal alignment respectively, enhancing the video-language grounding in both the spatiality and temporality. We design our method as a plug&play system, which can be integrated into existing well-trained VLMs for further representation augmentation, without training from scratch or relying on SG annotations in downstream applications. On 6 representative VL modeling tasks over 12 datasets in both standard and long-form video scenarios, Finsta consistently improves the existing 13 strong-performing VLMs persistently, and refreshes the current state-of-the-art end task performance significantly in both the fine-tuning and zero-shot settings., Comment: Accepted by IEEE TPAMI 2024
Published: 2024
Full Text: View/download PDF

7. LLM-Driven Multimodal Opinion Expression Identification

Author: Jia, Bonian, Chen, Huiyao, Sun, Yueheng, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world scenarios. Utilizing CMU MOSEI and IEMOCAP datasets, we construct the CI-MOEI dataset. Additionally, Text-to-Speech (TTS) technology is applied to the MPQA dataset to obtain the CIM-OEI dataset. We design a template for the OEI task to take full advantage of the generative power of large language models (LLMs). Advancing further, we propose an LLM-driven method STOEI, which combines speech and text modal to identify opinion expressions. Our experiments demonstrate that MOEI significantly improves the performance while our method outperforms existing methods by 9.20\% and obtains SOTA results., Comment: 5 pages, 3 Figures, Accept by Interspeech 2024
Published: 2024
Full Text: View/download PDF

8. Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification

Author: Chen, Huiyao, Zhao, Yu, Chen, Zulong, Wang, Mengjia, Li, Liangyue, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely-ambiguous labels. In this work, we introduce the first ICL-based framework with LLM for few-shot HTC. We exploit a retrieval database to identify relevant demonstrations, and an iterative policy to manage multi-layer hierarchical labels. Particularly, we equip the retrieval database with HTC label-aware representations for the input texts, which is achieved by continual training on a pretrained language model with masked language modeling (MLM), layer-wise classification (CLS, specifically for HTC), and a novel divergent contrastive learning (DCL, mainly for adjacent semantically-similar labels) objective. Experimental results on three benchmark datasets demonstrate superior performance of our method, and we can achieve state-of-the-art results in few-shot HTC., Comment: 17 pages
Published: 2024

9. AutoSurvey: Large Language Models Can Automatically Write Surveys

Author: Wang, Yidong, Guo, Qi, Yao, Wenjin, Zhang, Hongbo, Zhang, Xin, Wu, Zhen, Zhang, Meishan, Dai, Xinyu, Zhang, Min, Wen, Qingsong, Ye, Wei, Zhang, Shikun, and Zhang, Yue
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in automating this process, challenges such as context window limitations, parametric knowledge constraints, and the lack of evaluation benchmarks remain. AutoSurvey addresses these challenges through a systematic approach that involves initial retrieval and outline generation, subsection drafting by specialized LLMs, integration and refinement, and rigorous evaluation and iteration. Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.We open our resources at \url{https://github.com/AutoSurveys/AutoSurvey}.
Published: 2024

10. Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction

Author: Zhang, Meishan, Fei, Hao, Wang, Bin, Wu, Shengqiong, Cao, Yixin, Li, Fei, and Zhang, Min
Subjects: Computer Science - Multimedia
Abstract: In the field of information extraction (IE), tasks across a wide range of modalities and their combinations have been traditionally studied in isolation, leaving a gap in deeply recognizing and analyzing cross-modal information. To address this, this work for the first time introduces the concept of grounded Multimodal Universal Information Extraction (MUIE), providing a unified task framework to analyze any IE tasks over various modalities, along with their fine-grained groundings. To tackle MUIE, we tailor a multimodal large language model (MLLM), Reamo, capable of extracting and grounding information from all modalities, i.e., recognizing everything from all modalities at once. Reamo is updated via varied tuning strategies, equipping it with powerful capabilities for information recognition and fine-grained multimodal grounding. To address the absence of a suitable benchmark for grounded MUIE, we curate a high-quality, diverse, and challenging test set, which encompasses IE tasks across 9 common modality combinations with the corresponding multimodal groundings. The extensive comparison of Reamo with existing MLLMs integrated into pipeline approaches demonstrates its advantages across all evaluation dimensions, establishing a strong benchmark for the follow-up research. Our resources are publicly released at https://haofei.vip/MUIE.
Published: 2024

11. Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training

Author: Zhang, Longhui, Long, Dingkun, Zhang, Meishan, Zhang, Yanzhao, Xie, Pengjun, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: Chinese sequence labeling tasks are heavily reliant on accurate word boundary demarcation. Although current pre-trained language models (PLMs) have achieved substantial gains on these tasks, they rarely explicitly incorporate boundary information into the modeling process. An exception to this is BABERT, which incorporates unsupervised statistical boundary information into Chinese BERT's pre-training objectives. Building upon this approach, we input supervised high-quality boundary information to enhance BABERT's learning, developing a semi-supervised boundary-aware PLM. To assess PLMs' ability to encode boundaries, we introduce a novel ``Boundary Information Metric'' that is both simple and effective. This metric allows comparison of different PLMs without task-specific fine-tuning. Experimental results on Chinese sequence labeling datasets demonstrate that the improved BABERT variant outperforms the vanilla version, not only on these tasks but also more broadly across a range of Chinese natural language understanding tasks. Additionally, our proposed metric offers a convenient and accurate means of evaluating PLMs' boundary awareness., Comment: Accepted to COLING 2024
Published: 2024

12. Cross-domain Chinese Sentence Pattern Parsing

Author: Yu, Jingsi, Kong, Cunliang, Yang, Liner, Zhang, Meishan, Zhu, Lin, Wang, Yujie, Lin, Haozhe, Sun, Maosong, and Yang, Erhong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching.Existing SPS parsers rely heavily on textbook corpora for training, lacking cross-domain capability.To overcome this constraint, this paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework. Partial syntactic rules from a source domain are combined with target domain sentences to dynamically generate training data, enhancing the adaptability of the parser to diverse domains.Experiments conducted on textbook and news domains demonstrate the effectiveness of the proposed method, outperforming rule-based baselines by 1.68 points on F1 metrics.
Published: 2024

13. In-Context Learning for Few-Shot Nested Named Entity Recognition

Author: Zhang, Meishan, Wang, Bin, Fei, Hao, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: In nested Named entity recognition (NER), entities are nested with each other, and thus requiring more data annotations to address. This leads to the development of few-shot nested NER, where the prevalence of pretrained language models with in-context learning (ICL) offers promising solutions. In this work, we introduce an effective and innovative ICL framework for the setting of few-shot nested NER. We improve the ICL prompt by devising a novel example demonstration selection mechanism, EnDe retriever. In EnDe retriever, we employ contrastive learning to perform three types of representation learning, in terms of semantic similarity, boundary similarity, and label similarity, to generate high-quality demonstration examples. Extensive experiments over three nested NER and four flat NER datasets demonstrate the efficacy of our system., Comment: 5 figures
Published: 2024

14. A Two-Stage Adaptation of Large Language Models for Text Ranking

Author: Zhang, Longhui, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Information Retrieval
Abstract: Text ranking is a critical task in information retrieval. Recent advances in pre-trained language models (PLMs), especially large language models (LLMs), present new opportunities for applying them to text ranking. While supervised fine-tuning (SFT) with ranking data has been widely explored to better align PLMs with text ranking goals, previous studies have focused primarily on encoder-only and encoder-decoder PLMs. Research on leveraging decoder-only LLMs for text ranking remains scarce. An exception to this is RankLLaMA, which uses direct SFT to explore LLaMA's potential for text ranking. In this work, we propose a two-stage progressive paradigm to better adapt LLMs to text ranking. First, we conduct continual pre-training (CPT) of LLMs on a large weakly-supervised corpus. Second, we perform SFT, and propose an improved optimization strategy building upon RankLLaMA. Our experimental results on multiple benchmarks show that our approach outperforms previous methods in both in-domain and out-domain scenarios., Comment: Accepted to Findings of ACL 2024. Code and models available at https://github.com/Alibaba-NLP/RankingGPT
Published: 2023

15. LLM-enhanced Self-training for Cross-domain Constituency Parsing

Author: Li, Jianling, Zhang, Meishan, Guo, Peiming, Zhang, Min, and Zhang, Yue
Subjects: Computer Science - Computation and Language
Abstract: Self-training has proven to be an effective approach for cross-domain tasks, and in this study, we explore its application to cross-domain constituency parsing. Traditional self-training methods rely on limited and potentially low-quality raw corpora. To overcome this limitation, we propose enhancing self-training with the large language model (LLM) to generate domain-specific raw corpora iteratively. For the constituency parsing, we introduce grammar rules that guide the LLM in generating raw corpora and establish criteria for selecting pseudo instances. Our experimental results demonstrate that self-training for constituency parsing, equipped with an LLM, outperforms traditional methods regardless of the LLM's performance. Moreover, the combination of grammar rules and confidence criteria for pseudo-data selection yields the highest performance in the cross-domain constituency parsing., Comment: Accepted by EMNLP 2023 main conf
Published: 2023

16. Language Models are Universal Embedders

Author: Zhang, Xin, Li, Zehan, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: In the large language model (LLM) revolution, embedding is a key component of various systems. For example, it is used to retrieve knowledge or memories for LLMs, to build content moderation filters, etc. As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is desirable to build a unified embedding model rather than dedicated ones for each scenario. In this work, we make an initial step towards this goal, demonstrating that multiple languages (both natural and programming) pre-trained transformer decoders can embed universally when finetuned on limited English data. We provide a comprehensive practice with thorough evaluations. On English MTEB, our models achieve competitive performance on different embedding tasks by minimal training data. On other benchmarks, such as multilingual classification and code search, our models (without any supervision) perform comparably to, or even surpass heavily supervised baselines and/or APIs. These results provide evidence of a promising path towards building powerful unified embedders that can be applied across tasks and languages., Comment: 13 pages, in progress
Published: 2023

17. Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation

Author: Cui, Yibo, Xie, Liang, Zhang, Yakun, Zhang, Meishan, Yan, Ye, and Yin, Erwei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Cross-modal alignment is one key challenge for Vision-and-Language Navigation (VLN). Most existing studies concentrate on mapping the global instruction or single sub-instruction to the corresponding trajectory. However, another critical problem of achieving fine-grained alignment at the entity level is seldom considered. To address this problem, we propose a novel Grounded Entity-Landmark Adaptive (GELA) pre-training paradigm for VLN tasks. To achieve the adaptive pre-training paradigm, we first introduce grounded entity-landmark human annotations into the Room-to-Room (R2R) dataset, named GEL-R2R. Additionally, we adopt three grounded entity-landmark adaptive pre-training objectives: 1) entity phrase prediction, 2) landmark bounding box prediction, and 3) entity-landmark semantic alignment, which explicitly supervise the learning of fine-grained cross-modal alignment between entity phrases and environment landmarks. Finally, we validate our model on two downstream benchmarks: VLN with descriptive instructions (R2R) and dialogue instructions (CVDN). The comprehensive experiments show that our GELA model achieves state-of-the-art results on both tasks, demonstrating its effectiveness and generalizability., Comment: ICCV 2023 Oral
Published: 2023

18. Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling

Author: Zhao, Yu, Fei, Hao, Cao, Yixin, Li, Bobo, Zhang, Meishan, Wei, Jianguo, Zhang, Min, and Chua, Tat-Seng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Video Semantic Role Labeling (VidSRL) aims to detect the salient events from given videos, by recognizing the predict-argument event structures and the interrelationships between events. While recent endeavors have put forth methods for VidSRL, they can be mostly subject to two key drawbacks, including the lack of fine-grained spatial scene perception and the insufficiently modeling of video temporality. Towards this end, this work explores a novel holistic spatio-temporal scene graph (namely HostSG) representation based on the existing dynamic scene graph structures, which well model both the fine-grained spatial semantics and temporal dynamics of videos for VidSRL. Built upon the HostSG, we present a nichetargeting VidSRL framework. A scene-event mapping mechanism is first designed to bridge the gap between the underlying scene structure and the high-level event semantic structure, resulting in an overall hierarchical scene-event (termed ICE) graph structure. We further perform iterative structure refinement to optimize the ICE graph, such that the overall structure representation can best coincide with end task demand. Finally, three subtask predictions of VidSRL are jointly decoded, where the end-to-end paradigm effectively avoids error propagation. On the benchmark dataset, our framework boosts significantly over the current best-performing model. Further analyses are shown for a better understanding of the advances of our methods., Comment: Accepted by ACM MM 2023
Published: 2023

19. Towards General Text Embeddings with Multi-stage Contrastive Learning

Author: Li, Zehan, Zhang, Xin, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun, and Zhang, Meishan
Subjects: Computer Science - Computation and Language
Abstract: We present GTE, a general-purpose text embedding model trained with multi-stage contrastive learning. In line with recent advancements in unifying various NLP tasks into a single format, we train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources. By significantly increasing the number of training data during both unsupervised pre-training and supervised fine-tuning stages, we achieve substantial performance gains over existing embedding models. Notably, even with a relatively modest parameter count of 110M, GTE$_\text{base}$ outperforms the black-box embedding API provided by OpenAI and even surpasses 10x larger text embedding models on the massive text embedding benchmark. Furthermore, without additional fine-tuning on each programming language individually, our model outperforms previous best code retrievers of similar size by treating code as text. In summary, our model achieves impressive results by effectively harnessing multi-stage contrastive learning, offering a powerful and efficient text embedding model with broad applicability across various NLP and code-related tasks.
Published: 2023

20. XNLP: An Interactive Demonstration System for Universal Structured NLP

Author: Fei, Hao, Zhang, Meishan, Zhang, Min, and Chua, Tat-Seng
Subjects: Computer Science - Computation and Language
Abstract: Structured Natural Language Processing (XNLP) is an important subset of NLP that entails understanding the underlying semantic or syntactic structure of texts, which serves as a foundational component for many downstream applications. Despite certain recent efforts to explore universal solutions for specific categories of XNLP tasks, a comprehensive and effective approach for unifying all XNLP tasks long remains underdeveloped. In the meanwhile, while XNLP demonstration systems are vital for researchers exploring various XNLP tasks, existing platforms can be limited to, e.g., supporting few XNLP tasks, lacking interactivity and universalness. To this end, we propose an advanced XNLP demonstration platform, where we propose leveraging LLM to achieve universal XNLP, with one model for all with high generalizability. Overall, our system advances in multiple aspects, including universal XNLP modeling, high performance, interpretability, scalability, and interactivity, providing a unified platform for exploring diverse XNLP tasks in the community. XNLP is online: https://xnlp.haofei.vip, Comment: ACL 2024 Demonstration Paper
Published: 2023

21. Holistic Exploration on Universal Decompositional Semantic Parsing: Architecture, Data Augmentation, and LLM Paradigm

Author: Deng, Hexuan, Zhang, Xin, Zhang, Meishan, Liu, Xuebo, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: In this paper, we conduct a holistic exploration of the Universal Decompositional Semantic (UDS) Parsing. We first introduce a cascade model for UDS parsing that decomposes the complex parsing task into semantically appropriate subtasks. Our approach outperforms the prior models, while significantly reducing inference time. We also incorporate syntactic information and further optimized the architecture. Besides, different ways for data augmentation are explored, which further improve the UDS Parsing. Lastly, we conduct experiments to investigate the efficacy of ChatGPT in handling the UDS task, revealing that it excels in attribute parsing but struggles in relation parsing, and using ChatGPT for data augmentation yields suboptimal results. Our code is available at https://github.com/hexuandeng/HExp4UDS., Comment: 12 pages, 7 figures, 3 tables
Published: 2023

22. A Pilot Study on Dialogue-Level Dependency Parsing for Chinese

Author: Jiang, Gongyao, Liu, Shuang, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: Dialogue-level dependency parsing has received insufficient attention, especially for Chinese. To this end, we draw on ideas from syntactic dependency and rhetorical structure theory (RST), developing a high-quality human-annotated corpus, which contains 850 dialogues and 199,803 dependencies. Considering that such tasks suffer from high annotation costs, we investigate zero-shot and few-shot scenarios. Based on an existing syntactic treebank, we adopt a signal-based method to transform seen syntactic dependencies into unseen ones between elementary discourse units (EDUs), where the signals are detected by masked language modeling. Besides, we apply single-view and multi-view data selection to access reliable pseudo-labeled instances. Experimental results show the effectiveness of these baselines. Moreover, we discuss several crucial points about our dataset and approach., Comment: Accepted by Findings of ACL 2023 (Camera-ready version)
Published: 2023

23. Constructing Code-mixed Universal Dependency Forest for Unbiased Cross-lingual Relation Extraction

Author: Fei, Hao, Zhang, Meishan, Zhang, Min, and Chua, Tat-Seng
Subjects: Computer Science - Computation and Language
Abstract: Latest efforts on cross-lingual relation extraction (XRE) aggressively leverage the language-consistent structural features from the universal dependency (UD) resource, while they may largely suffer from biased transfer (e.g., either target-biased or source-biased) due to the inevitable linguistic disparity between languages. In this work, we investigate an unbiased UD-based XRE transfer by constructing a type of code-mixed UD forest. We first translate the sentence of the source language to the parallel target-side language, for both of which we parse the UD tree respectively. Then, we merge the source-/target-side UD structures as a unified code-mixed UD forest. With such forest features, the gaps of UD-based XRE between the training and predicting phases can be effectively closed. We conduct experiments on the ACE XRE benchmark datasets, where the results demonstrate that the proposed code-mixed UD forests help unbiased UD-based XRE transfer, with which we achieve significant XRE performance gains.
Published: 2023

24. Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination

Author: Fei, Hao, Liu, Qian, Zhang, Meishan, Zhang, Min, and Chua, Tat-Seng
Subjects: Computer Science - Computation and Language
Abstract: In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup, inference-time image-free UMMT, where the model is trained with source-text image pairs, and tested with only source-text inputs. First, we represent the input images and texts with the visual and language scene graphs (SG), where such fine-grained vision-language features ensure a holistic understanding of the semantics. To enable pure-text input during inference, we devise a visual scene hallucination mechanism that dynamically generates pseudo visual SG from the given textual SG. Several SG-pivoting based learning objectives are introduced for unsupervised translation training. On the benchmark Multi30K data, our SG-based method outperforms the best-performing baseline by significant BLEU scores on the task and setup, helping yield translations with better completeness, relevance and fluency without relying on paired images. Further in-depth analyses reveal how our model advances in the task setting., Comment: ACL 2023
Published: 2023

25. Generating Visual Spatial Description via Holistic 3D Scene Understanding

Author: Zhao, Yu, Fei, Hao, Ji, Wei, Wei, Jianguo, Zhang, Meishan, Zhang, Min, and Chua, Tat-Seng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images. Existing VSD work merely models the 2D geometrical vision features, thus inevitably falling prey to the problem of skewed spatial understanding of target objects. In this work, we investigate the incorporation of 3D scene features for VSD. With an external 3D scene extractor, we obtain the 3D objects and scene features for input images, based on which we construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes. Besides, we propose a scene subgraph selecting mechanism, sampling topologically-diverse subgraphs from Go3D-S2G, where the diverse local structure features are navigated to yield spatially-diversified text generation. Experimental results on two VSD datasets demonstrate that our framework outperforms the baselines significantly, especially improving on the cases with complex visual spatial relations. Meanwhile, our method can produce more spatially-diversified generation. Code is available at https://github.com/zhaoyucs/VSD., Comment: ACL 2023
Published: 2023

26. On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training

Author: Fei, Hao, Chua, Tat-Seng, Li, Chenliang, Ji, Donghong, Zhang, Meishan, and Ren, Yafeng
Subjects: Computer Science - Computation and Language
Abstract: Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind the social media texts or reviews, which has been a fundamental application to the real-world society. Since the early 2010s, ABSA has achieved extraordinarily high accuracy with various deep neural models. However, existing ABSA models with strong in-house performances may fail to generalize to some challenging cases where the contexts are variable, i.e., low robustness to real-world environments. In this study, we propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training. First, we strengthen the current best-robust syntax-aware models by further incorporating the rich external syntactic dependencies and the labels with aspect simultaneously with a universal-syntax graph convolutional network. In the corpus perspective, we propose to automatically induce high-quality synthetic training data with various types, allowing models to learn sufficient inductive bias for better robustness. Last, we based on the rich pseudo data perform adversarial training to enhance the resistance to the context perturbation and meanwhile employ contrastive learning to reinforce the representations of instances with contrastive sentiments. Extensive robustness evaluations are conducted. The results demonstrate that our enhanced syntax-aware model achieves better robustness performances than all the state-of-the-art baselines. By additionally incorporating our synthetic corpus, the robust testing results are pushed with around 10% accuracy, which are then further improved by installing the advanced training strategies. In-depth analyses are presented for revealing the factors influencing the ABSA robustness., Comment: Accepted in ACM Transactions on Information Systems
Published: 2023
Full Text: View/download PDF

27. LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

Author: Fei, Hao, Wu, Shengqiong, Li, Jingye, Li, Bobo, Li, Fei, Qin, Libo, Zhang, Meishan, Zhang, Min, and Chua, Tat-Seng
Subjects: Computer Science - Computation and Language
Abstract: Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has revealed great potential by the latest study, where various IE predictions are unified into a linearized hierarchical expression under a GLM. Syntactic structure information, a type of effective feature which has been extensively utilized in IE community, should also be beneficial to UIE. In this work, we propose a novel structure-aware GLM, fully unleashing the power of syntactic knowledge for UIE. A heterogeneous structure inductor is explored to unsupervisedly induce rich heterogeneous structural representations by post-training an existing GLM. In particular, a structural broadcaster is devised to compact various latent trees into explicit high-order forests, helping to guide a better generation during decoding. We finally introduce a task-oriented structure fine-tuning mechanism, further adjusting the learned structures to most coincide with the end-task's need. Over 12 IE benchmarks across 7 tasks our system shows significant improvements over the baseline UIE system. Further in-depth analyses show that our GLM learns rich task-adaptive structural bias that greatly resolves the UIE crux, the long-range dependence issue and boundary identifying. Source codes are open at https://github.com/ChocoWu/LasUIE., Comment: NeurIPS2022 conference paper
Published: 2023

28. ChatIE: Zero-Shot Information Extraction via Chatting with ChatGPT

Author: Wei, Xiang, Cui, Xingyu, Cheng, Ning, Wang, Xiaobin, Zhang, Xin, Huang, Shen, Xie, Pengjun, Xu, Jinan, Chen, Yufeng, Zhang, Meishan, Jiang, Yong, and Han, Wenjuan
Subjects: Computer Science - Computation and Language
Abstract: Zero-shot information extraction (IE) aims to build IE systems from the unannotated text. It is challenging due to involving little human intervention. Challenging but worthwhile, zero-shot IE reduces the time and effort that data labeling takes. Recent efforts on large language models (LLMs, e.g., GPT-3, ChatGPT) show promising performance on zero-shot settings, thus inspiring us to explore prompt-based methods. In this work, we ask whether strong IE models can be constructed by directly prompting LLMs. Specifically, we transform the zero-shot IE task into a multi-turn question-answering problem with a two-stage framework (ChatIE). With the power of ChatGPT, we extensively evaluate our framework on three IE tasks: entity-relation triple extract, named entity recognition, and event extraction. Empirical results on six datasets across two languages show that ChatIE achieves impressive performance and even surpasses some full-shot models on several datasets (e.g., NYT11-HRL). We believe that our work could shed light on building IE models with limited resources.
Published: 2023

29. Improving Simultaneous Machine Translation with Monolingual Data

Author: Deng, Hexuan, Ding, Liang, Liu, Xuebo, Zhang, Meishan, Tao, Dacheng, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model. However, there is still a significant performance gap between NMT and SiMT. In this work, we propose to leverage monolingual data to improve SiMT, which trains a SiMT student on the combination of bilingual data and external monolingual data distilled by Seq-KD. Preliminary experiments on En-Zh and En-Ja news domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh). Inspired by the behavior of human simultaneous interpreters, we propose a novel monolingual sampling strategy for SiMT, considering both chunk length and monotonicity. Experimental results show that our sampling strategy consistently outperforms the random sampling strategy (and other conventional typical NMT monolingual sampling strategies) by avoiding the key problem of SiMT -- hallucination, and has better scalability. We achieve +0.72 BLEU improvements on average against random sampling on En-Zh and En-Ja. Data and codes can be found at https://github.com/hexuandeng/Mono4SiMT., Comment: Accepted by AAAI 2023. Extended version includes supplementary material. 10 pages, 4 figures, 8 tables
Published: 2022

30. Unsupervised Boundary-Aware Language Model Pretraining for Chinese Sequence Labeling

Author: Jiang, Peijie, Long, Dingkun, Zhang, Yanzhao, Xie, Pengjun, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Boundary information is critical for various Chinese language processing tasks, such as word segmentation, part-of-speech tagging, and named entity recognition. Previous studies usually resorted to the use of a high-quality external lexicon, where lexicon items can offer explicit boundary information. However, to ensure the quality of the lexicon, great human effort is always necessary, which has been generally ignored. In this work, we suggest unsupervised statistical boundary information instead, and propose an architecture to encode the information directly into pre-trained language models, resulting in Boundary-Aware BERT (BABERT). We apply BABERT for feature induction of Chinese sequence labeling tasks. Experimental results on ten benchmarks of Chinese sequence labeling demonstrate that BABERT can provide consistent improvements on all datasets. In addition, our method can complement previous supervised lexicon exploration, where further improvements can be achieved when integrated with external lexicon information., Comment: 12 pages, 2 figures, 7 tables, EMNLP 2022
Published: 2022

31. Extending Phrase Grounding with Pronouns in Visual Dialogues

Author: Lu, Panzhong, Zhang, Xin, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Conventional phrase grounding aims to localize noun phrases mentioned in a given caption to their corresponding image regions, which has achieved great success recently. Apparently, sole noun phrase grounding is not enough for cross-modal visual language understanding. Here we extend the task by considering pronouns as well. First, we construct a dataset of phrase grounding with both noun phrases and pronouns to image regions. Based on the dataset, we test the performance of phrase grounding by using a state-of-the-art literature model of this line. Then, we enhance the baseline grounding model with coreference information which should help our task potentially, modeling the coreference structures with graph convolutional networks. Experiments on our dataset, interestingly, show that pronouns are easier to ground than noun phrases, where the possible reason might be that these pronouns are much less ambiguous. Additionally, our final model with coreference information can significantly boost the grounding performance of both noun phrases and pronouns., Comment: Accepted by EMNLP 2022
Published: 2022

32. Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation

Author: Zhao, Yu, Wei, Jianguo, Lin, Zhichao, Sun, Yueheng, Zhang, Meishan, and Zhang, Min
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Image-to-text tasks, such as open-ended image captioning and controllable image description, have received extensive attention for decades. Here, we further advance this line of work by presenting Visual Spatial Description (VSD), a new perspective for image-to-text toward spatial semantics. Given an image and two objects inside it, VSD aims to produce one description focusing on the spatial perspective between the two objects. Accordingly, we manually annotate a dataset to facilitate the investigation of the newly-introduced task and build several benchmark encoder-decoder models by using VL-BART and VL-T5 as backbones. In addition, we investigate pipeline and joint end-to-end architectures for incorporating visual spatial relationship classification (VSRC) information into our model. Finally, we conduct experiments on our benchmark dataset to evaluate all our models. Results show that our models are impressive, providing accurate and human-like spatial-oriented text descriptions. Meanwhile, VSRC has great potential for VSD, and the joint end-to-end architecture is the better choice for their integration. We make the dataset and codes public for research purposes.
Published: 2022

33. Conversational Semantic Role Labeling with Predicate-Oriented Latent Graph

Author: Fei, Hao, Wu, Shengqiong, Zhang, Meishan, Ren, Yafeng, and Ji, Donghong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Conversational semantic role labeling (CSRL) is a newly proposed task that uncovers the shallow semantic structures in a dialogue text. Unfortunately several important characteristics of the CSRL task have been overlooked by the existing works, such as the structural information integration, near-neighbor influence. In this work, we investigate the integration of a latent graph for CSRL. We propose to automatically induce a predicate-oriented latent graph (POLar) with a predicate-centered Gaussian mechanism, by which the nearer and informative words to the predicate will be allocated with more attention. The POLar structure is then dynamically pruned and refined so as to best fit the task need. We additionally introduce an effective dialogue-level pre-trained language model, CoDiaBERT, for better supporting multiple utterance sentences and handling the speaker coreference issue in CSRL. Our system outperforms best-performing baselines on three benchmark CSRL datasets with big margins, especially achieving over 4% F1 score improvements on the cross-utterance argument detection. Further analyses are presented to better understand the effectiveness of our proposed methods.
Published: 2022

34. Domain-Specific NER via Retrieving Correlated Samples

Author: Zhang, Xin, Jiang, Yong, Wang, Xiaobin, Hu, Xuming, Sun, Yueheng, Xie, Pengjun, and Zhang, Meishan
Subjects: Computer Science - Computation and Language
Abstract: Successful Machine Learning based Named Entity Recognition models could fail on texts from some special domains, for instance, Chinese addresses and e-commerce titles, where requires adequate background knowledge. Such texts are also difficult for human annotators. In fact, we can obtain some potentially helpful information from correlated texts, which have some common entities, to help the text understanding. Then, one can easily reason out the correct answer by referencing correlated samples. In this paper, we suggest enhancing NER models with correlated samples. We draw correlated samples by the sparse BM25 retriever from large-scale in-domain unlabeled data. To explicitly simulate the human reasoning process, we perform a training-free entity type calibrating by majority voting. To capture correlation features in the training stage, we suggest to model correlated samples by the transformer-based multi-instance cross-encoder. Empirical results on datasets of the above two domains show the efficacy of our methods., Comment: Accepted by COLING 2022, added dev results of the address data
Published: 2022

35. Robust Self-Augmentation for Named Entity Recognition with Meta Reweighting

Author: Wu, Linzhi, Xie, Pengjun, Zhou, Jie, Zhang, Meishan, Ma, Chunping, Xu, Guangwei, and Zhang, Min
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Self-augmentation has received increasing research interest recently to improve named entity recognition (NER) performance in low-resource scenarios. Token substitution and mixup are two feasible heterogeneous self-augmentation techniques for NER that can achieve effective performance with certain specialized efforts. Noticeably, self-augmentation may introduce potentially noisy augmented data. Prior research has mainly resorted to heuristic rule-based constraints to reduce the noise for specific self-augmentation methods individually. In this paper, we revisit these two typical self-augmentation methods for NER, and propose a unified meta-reweighting strategy for them to achieve a natural integration. Our method is easily extensible, imposing little effort on a specific self-augmentation method. Experiments on different Chinese and English NER benchmarks show that our token substitution and mixup method, as well as their integration, can achieve effective performance improvement. Based on the meta-reweighting mechanism, we can enhance the advantages of the self-augmentation techniques without much extra effort., Comment: Accepted by NAACL 2022
Published: 2022

36. Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations

Author: Zhang, Xin, Xu, Guangwei, Sun, Yueheng, Zhang, Meishan, Wang, Xiaobin, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: Recent works of opinion expression identification (OEI) rely heavily on the quality and scale of the manually-constructed training corpus, which could be extremely difficult to satisfy. Crowdsourcing is one practical solution for this problem, aiming to create a large-scale but quality-unguaranteed corpus. In this work, we investigate Chinese OEI with extremely-noisy crowdsourcing annotations, constructing a dataset at a very low cost. Following zhang et al. (2021), we train the annotator-adapter model by regarding all annotations as gold-standard in terms of crowd annotators, and test the model by using a synthetic expert, which is a mixture of all annotators. As this annotator-mixture for testing is never modeled explicitly in the training phase, we propose to generate synthetic training samples by a pertinent mixup strategy to make the training and testing highly consistent. The simulation experiments on our constructed dataset show that crowdsourcing is highly promising for OEI, and our proposed annotator-mixup can further enhance the crowdsourcing modeling., Comment: Accepted by ACL 2022 main conf
Published: 2022

37. On the Role of Pre-trained Language Models in Word Ordering: A Case Study with BART

Author: Ou, Zebin, Zhang, Meishan, and Zhang, Yue
Subjects: Computer Science - Computation and Language
Abstract: Word ordering is a constrained language generation task taking unordered words as input. Existing work uses linear models and neural networks for the task, yet pre-trained language models have not been studied in word ordering, let alone why they help. We use BART as an instance and show its effectiveness in the task. To explain why BART helps word ordering, we extend analysis with probing and empirically identify that syntactic dependency knowledge in BART is a reliable explanation. We also report performance gains with BART in the related partial tree linearization task, which readily extends our analysis., Comment: COLING 2022
Published: 2022

38. ILP1 and NTR1 affect the stability of U6 snRNA during spliceosome complex disassembly in Arabidopsis

Author: Wu, Jiaming, Chen, Wei, Ge, Shengchao, Liu, Xueliang, Shan, Junling, Zhang, Meishan, Su, Yuan, and Liu, Yunfeng
Published: 2024
Full Text: View/download PDF

39. AISHELL-NER: Named Entity Recognition from Chinese Speech

Author: Chen, Boli, Xu, Guangwei, Wang, Xiaobin, Xie, Pengjun, Zhang, Meishan, and Huang, Fei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Named Entity Recognition (NER) from speech is among Spoken Language Understanding (SLU) tasks, aiming to extract semantic information from the speech signal. NER from speech is usually made through a two-step pipeline that consists of (1) processing the audio using an Automatic Speech Recognition (ASR) system and (2) applying an NER tagger to the ASR outputs. Recent works have shown the capability of the End-to-End (E2E) approach for NER from English and French speech, which is essentially entity-aware ASR. However, due to the many homophones and polyphones that exist in Chinese, NER from Chinese speech is effectively a more challenging task. In this paper, we introduce a new dataset AISEHLL-NER for NER from Chinese speech. Extensive experiments are conducted to explore the performance of several state-of-the-art methods. The results demonstrate that the performance could be improved by combining entity-aware ASR and pretrained NER tagger, which can be easily applied to the modern SLU pipeline. The dataset is publicly available at github.com/Alibaba-NLP/AISHELL-NER.
Published: 2022

40. Quantifying Robustness to Adversarial Word Substitutions

Author: Yang, Yuting, Huang, Pei, Ma, FeiFei, Cao, Juan, Zhang, Meishan, Zhang, Jian, and Li, Jintao
Subjects: Computer Science - Computation and Language
Abstract: Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations. Before they are widely adopted, the fundamental issues of robustness need to be addressed. Along this line, we propose a formal framework to evaluate word-level robustness. First, to study safe regions for a model, we introduce robustness radius which is the boundary where the model can resist any perturbation. As calculating the maximum robustness radius is computationally hard, we estimate its upper and lower bound. We repurpose attack methods as ways of seeking upper bound and design a pseudo-dynamic programming algorithm for a tighter upper bound. Then verification method is utilized for a lower bound. Further, for evaluating the robustness of regions outside a safe radius, we reexamine robustness from another view: quantification. A robustness metric with a rigorous statistical guarantee is introduced to measure the quantification of adversarial examples, which indicates the model's susceptibility to perturbations outside the safe radius. The metric helps us figure out why state-of-the-art models like BERT can be easily fooled by a few word substitutions, but generalize well in the presence of real-world noises.
Published: 2022

41. Unified Named Entity Recognition as Word-Word Relation Classification

Author: Li, Jingye, Fei, Hao, Liu, Jiang, Wu, Shengqiong, Zhang, Meishan, Teng, Chong, Ji, Donghong, and Li, Fei
Subjects: Computer Science - Computation and Language
Abstract: So far, named entity recognition (NER) has been involved with three major types, including flat, overlapped (aka. nested), and discontinuous NER, which have mostly been studied individually. Recently, a growing interest has been built for unified NER, tackling the above three jobs concurrently with one single model. Current best-performing methods mainly include span-based and sequence-to-sequence models, where unfortunately the former merely focus on boundary identification and the latter may suffer from exposure bias. In this work, we present a novel alternative by modeling the unified NER as word-word relation classification, namely W^2NER. The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words with Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. Based on the W^2NER scheme we develop a neural framework, in which the unified NER is modeled as a 2D grid of word pairs. We then propose multi-granularity 2D convolutions for better refining the grid representations. Finally, a co-predictor is used to sufficiently reason the word-word relations. We perform extensive experiments on 14 widely-used benchmark datasets for flat, overlapped, and discontinuous NER (8 English and 6 Chinese datasets), where our model beats all the current top-performing baselines, pushing the state-of-the-art performances of unified NER., Comment: Accepted by AAAI'22
Published: 2021

42. Mastering the Explicit Opinion-role Interaction: Syntax-aided Neural Transition System for Unified Opinion Role Labeling

Author: Wu, Shengqiong, Fei, Hao, Li, Fei, Ji, Donghong, Zhang, Meishan, Liu, Yijiang, and Teng, Chong
Subjects: Computer Science - Computation and Language
Abstract: Unified opinion role labeling (ORL) aims to detect all possible opinion structures of 'opinion-holder-target' in one shot, given a text. The existing transition-based unified method, unfortunately, is subject to longer opinion terms and fails to solve the term overlap issue. Current top performance has been achieved by employing the span-based graph model, which however still suffers from both high model complexity and insufficient interaction among opinions and roles. In this work, we investigate a novel solution by revisiting the transition architecture, and augmenting it with a pointer network (PointNet). The framework parses out all opinion structures in linear-time complexity, meanwhile breaks through the limitation of any length of terms with PointNet. To achieve the explicit opinion-role interactions, we further propose a unified dependency-opinion graph (UDOG), co-modeling the syntactic dependency structure and the partial opinion-role structure. We then devise a relation-centered graph aggregator (RCGA) to encode the multi-relational UDOG, where the resulting high-order representations are used to promote the predictions in the vanilla transition system. Our model achieves new state-of-the-art results on the MPQA benchmark. Analyses further demonstrate the superiority of our methods on both efficacy and efficiency., Comment: AAAI2022
Published: 2021

43. A Graph-Based Neural Model for End-to-End Frame Semantic Parsing

Author: Lin, Zhichao, Sun, Yueheng, and Zhang, Meishan
Subjects: Computer Science - Computation and Language
Abstract: Frame semantic parsing is a semantic analysis task based on FrameNet which has received great attention recently. The task usually involves three subtasks sequentially: (1) target identification, (2) frame classification and (3) semantic role labeling. The three subtasks are closely related while previous studies model them individually, which ignores their intern connections and meanwhile induces error propagation problem. In this work, we propose an end-to-end neural model to tackle the task jointly. Concretely, we exploit a graph-based method, regarding frame semantic parsing as a graph construction problem. All predicates and roles are treated as graph nodes, and their relations are taken as graph edges. Experiment results on two benchmark datasets of frame semantic parsing show that our method is highly competitive, resulting in better performance than pipeline models.
Published: 2021

44. A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition

Author: Li, Fei, Lin, Zhichao, Zhang, Meishan, and Ji, Donghong
Subjects: Computer Science - Computation and Language
Abstract: Research on overlapped and discontinuous named entity recognition (NER) has received increasing attention. The majority of previous work focuses on either overlapped or discontinuous entities. In this paper, we propose a novel span-based model that can recognize both overlapped and discontinuous entities jointly. The model includes two major steps. First, entity fragments are recognized by traversing over all possible text spans, thus, overlapped entities can be recognized. Second, we perform relation classification to judge whether a given pair of entity fragments to be overlapping or succession. In this way, we can recognize not only discontinuous entities, and meanwhile doubly check the overlapped entities. As a whole, our model can be regarded as a relation extraction paradigm essentially. Experimental results on multiple benchmark datasets (i.e., CLEF, GENIA and ACE05) show that our model is highly competitive for overlapped and discontinuous NER., Comment: Accepted in the main conference of ACL 2021
Published: 2021

45. Crowdsourcing Learning as Domain Adaptation: A Case Study on Named Entity Recognition

Author: Zhang, Xin, Xu, Guangwei, Sun, Yueheng, Zhang, Meishan, and Xie, Pengjun
Subjects: Computer Science - Computation and Language, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: Crowdsourcing is regarded as one prospective solution for effective supervised learning, aiming to build large-scale annotated training data by crowd workers. Previous studies focus on reducing the influences from the noises of the crowdsourced annotations for supervised models. We take a different point in this work, regarding all crowdsourced annotations as gold-standard with respect to the individual annotators. In this way, we find that crowdsourcing could be highly similar to domain adaptation, and then the recent advances of cross-domain methods can be almost directly applied to crowdsourcing. Here we take named entity recognition (NER) as a study case, suggesting an annotator-aware representation learning model that inspired by the domain adaptation methods which attempt to capture effective domain-aware features. We investigate both unsupervised and supervised crowdsourcing learning, assuming that no or only small-scale expert annotations are available. Experimental results on a benchmark crowdsourced NER dataset show that our method is highly effective, leading to a new state-of-the-art performance. In addition, under the supervised setting, we can achieve impressive performance gains with only a very small scale of expert annotations., Comment: ACL-IJCNLP 2021 main conf, long paper; corrected the wrong reference for "argument retrieval" in first paragraph of Introduction
Published: 2021
Full Text: View/download PDF

46. End-to-end Semantic Role Labeling with Neural Transition-based Model

Author: Fei, Hao, Zhang, Meishan, Li, Bobo, and Ji, Donghong
Subjects: Computer Science - Computation and Language
Abstract: End-to-end semantic role labeling (SRL) has been received increasing interest. It performs the two subtasks of SRL: predicate identification and argument role labeling, jointly. Recent work is mostly focused on graph-based neural models, while the transition-based framework with neural networks which has been widely used in a number of closely-related tasks, has not been studied for the joint task yet. In this paper, we present the first work of transition-based neural models for end-to-end SRL. Our transition model incrementally discovers all sentential predicates as well as their arguments by a set of transition actions. The actions of the two subtasks are executed mutually for full interactions. Besides, we suggest high-order compositions to extract non-local features, which can enhance the proposed transition model further. Experimental results on CoNLL09 and Universal Proposition Bank show that our final model can produce state-of-the-art performance, and meanwhile keeps highly efficient in decoding. We also conduct detailed experimental analysis for a deep understanding of our proposed model., Comment: Accepted at AAAI 2021
Published: 2021

47. Joint Cross-Domain and Cross-Lingual Adaptation for Chinese Opinion Element Extraction

Author: Zhang, Ruoding, Zhang, Meishan, Song, Xiaoyi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Fei, editor, Duan, Nan, editor, Xu, Qingting, editor, and Hong, Yu, editor
Published: 2023
Full Text: View/download PDF

48. Cross-lingual Semantic Role Labeling with Model Transfer

Author: Fei, Hao, Zhang, Meishan, Li, Fei, and Ji, Donghong
Subjects: Computer Science - Computation and Language
Abstract: Prior studies show that cross-lingual semantic role labeling (SRL) can be achieved by model transfer under the help of universal features. In this paper, we fill the gap of cross-lingual SRL by proposing an end-to-end SRL model that incorporates a variety of universal features and transfer methods. We study both the bilingual transfer and multi-source transfer, under gold or machine-generated syntactic inputs, pre-trained high-order abstract features, and contextualized multilingual word representations. Experimental results on the Universal Proposition Bank corpus indicate that performances of the cross-lingual SRL can vary by leveraging different cross-lingual features. In addition, whether the features are gold-standard also has an impact on performances. Precisely, we find that gold syntax features are much more crucial for cross-lingual SRL, compared with the automatically-generated ones. Moreover, universal dependency structure features are able to give the best help, and both pre-trained high-order features and contextualized word representations can further bring significant improvements., Comment: Accepted at TASLP
Published: 2020

49. A Survey of Syntactic-Semantic Parsing Based on Constituent and Dependency Structures

Author: Zhang, Meishan
Subjects: Computer Science - Computation and Language
Abstract: Syntactic and semantic parsing has been investigated for decades, which is one primary topic in the natural language processing community. This article aims for a brief survey on this topic. The parsing community includes many tasks, which are difficult to be covered fully. Here we focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing. Constituent parsing is majorly targeted to syntactic analysis, and dependency parsing can handle both syntactic and semantic analysis. This article briefly reviews the representative models of constituent parsing and dependency parsing, and also dependency graph parsing with rich semantics. Besides, we also review the closely-related topics such as cross-domain, cross-lingual and joint parsing models, parser application as well as corpus development of parsing in the article., Comment: SCIENCE CHINA Technological Sciences
Published: 2020

50. APPCorp: A Corpus for Android Privacy Policy Document Structure Analysis

Author: Liu, Shuang, Guo, Renjie, Zhao, Baiyang, Chen, Tao, and Zhang, Meishan
Subjects: Computer Science - Computers and Society
Abstract: With the increasing popularity of mobile devices and the wide adoption of mobile Apps, an increasing concern of privacy issues is raised. Privacy policy is identified as a proper medium to indicate the legal terms, such as GDPR, and to bind legal agreement between service providers and users. However, privacy policies are usually long and vague for end users to read and understand. It is thus important to be able to automatically analyze the document structures of privacy policies to assist user understanding. In this work we create a manually labelled corpus containing $167$ privacy policies (of more than $447$K words and $5,276$ annotated paragraphs). We report the annotation process and details of the annotated corpus. We also benchmark our data corpus with $4$ document classification models, thoroughly analyze the results and discuss challenges and opportunities for the research committee to use the corpus. We release our labelled corpus as well as the classification models for public access., Comment: 10 pages
Published: 2020

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

389 results on '"Zhang, Meishan"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources