Author: "Chen, Jiajun" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Chen, Jiajun"' showing total 2,382 results

Start Over Author "Chen, Jiajun"

2,382 results on '"Chen, Jiajun"'

1. Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge

Author: Li, Jiahuan, Cao, Yiqing, Huang, Shujian, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Having been trained on massive pretraining data, large language models have shown excellent performance on many knowledge-intensive tasks. However, pretraining data tends to contain misleading and even conflicting information, and it is intriguing to understand how LLMs handle these noisy data during training. In this study, we systematically analyze LLMs' learning preferences for data with conflicting knowledge. We find that pretrained LLMs establish learning preferences similar to humans, i.e., preferences towards formal texts and texts with fewer spelling errors, resulting in faster learning and more favorable treatment of knowledge in data with such features when facing conflicts. This finding is generalizable across models and languages and is more evident in larger models. An in-depth analysis reveals that LLMs tend to trust data with features that signify consistency with the majority of data, and it is possible to instill new preferences and erase old ones by manipulating the degree of consistency with the majority data., Comment: accepted by EMNLP 2024, main conference
Published: 2024

2. Dynamic compensation for pump-induced frequency shift in Kerr-cat qubit initialization

Author: Xu, Yifang, Hua, Ziyue, Wang, Weiting, Ma, Yuwei, Li, Ming, Chen, Jiajun, Zhou, Jie, Pan, Xiaoxuan, Xiao, Lintao, Huang, Hongwei, Cai, Weizhou, Ai, Hao, Liu, Yu-xi, Zou, Chang-Ling, and Sun, Luyan
Subjects: Quantum Physics
Abstract: The noise-biased Kerr-cat qubit is an attractive candidate for fault-tolerant quantum computation; however, its initialization faces challenges due to the squeezing pump-induced frequency shift (PIFS). Here, we propose and demonstrate a dynamic compensation method to mitigate the effect of PIFS during the Kerr-cat qubit initialization. Utilizing a novel nonlinearity-engineered triple-loop SQUID device, we realize a stabilized Kerr-cat qubit and validate the advantages of the dynamic compensation method by improving the initialization fidelity from 57% to 78%, with a projected fidelity of 91% after excluding state preparation and measurement errors. Our results not only advance the practical implementation of Kerr-cat qubits, but also provide valuable insights into the fundamental adiabatic dynamics of these systems. This work paves the way for scalable quantum processors that leverage the bias-preserving properties of Kerr-cat qubits.
Published: 2024

3. MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

Author: Zhou, Hao, Wang, Zhijun, Huang, Shujian, Huang, Xin, Han, Xue, Feng, Junlan, Deng, Chao, Luo, Weihua, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of the ability of original languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, indicating the challenge of balancing language expansion while preventing forgetting. In this paper, we propose a method called MoE-LPR (Mixture-of-Experts with Language Priors Routing) to alleviate this problem. MoE-LPR employs a two-stage training approach to enhance the multilingual capability. First, the model is post-pretrained into a Mixture-of-Experts (MoE) architecture by upcycling, where all the original parameters are frozen and new experts are added. In this stage, we focus improving the ability on expanded languages, without using any original language data. Then, the model reviews the knowledge of the original languages with replay data amounting to less than 1% of post-pretraining, where we incorporate language priors routing to better recover the abilities of the original languages. Evaluations on multiple benchmarks show that MoE-LPR outperforms other post-pretraining methods. Freezing original parameters preserves original language knowledge while adding new experts preserves the learning ability. Reviewing with LPR enables effective utilization of multilingual knowledge within the parameters. Additionally, the MoE architecture maintains the same inference overhead while increasing total model parameters. Extensive experiments demonstrate MoE-LPR's effectiveness in improving expanded languages and preserving original language proficiency with superior scalability. Code and scripts are freely available at https://github.com/zjwang21/MoE-LPR.git.
Published: 2024

4. MDM: Advancing Multi-Domain Distribution Matching for Automatic Modulation Recognition Dataset Synthesis

Author: Xu, Dongwei, Chen, Jiajun, Lu, Yao, Xia, Tianhao, Xuan, Qi, Wang, Wei, Lin, Yun, and Yang, Xiaoniu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Recently, deep learning technology has been successfully introduced into Automatic Modulation Recognition (AMR) tasks. However, the success of deep learning is all attributed to the training on large-scale datasets. Such a large amount of data brings huge pressure on storage, transmission and model training. In order to solve the problem of large amount of data, some researchers put forward the method of data distillation, which aims to compress large training data into smaller synthetic datasets to maintain its performance. While numerous data distillation techniques have been developed within the realm of image processing, the unique characteristics of signals set them apart. Signals exhibit distinct features across various domains, necessitating specialized approaches for their analysis and processing. To this end, a novel dataset distillation method--Multi-domain Distribution Matching (MDM) is proposed. MDM employs the Discrete Fourier Transform (DFT) to translate timedomain signals into the frequency domain, and then uses a model to compute distribution matching losses between the synthetic and real datasets, considering both the time and frequency domains. Ultimately, these two losses are integrated to update the synthetic dataset. We conduct extensive experiments on three AMR datasets. Experimental results show that, compared with baseline methods, our method achieves better performance under the same compression ratio. Furthermore, we conduct crossarchitecture generalization experiments on several models, and the experimental results show that our synthetic datasets can generalize well on other unseen models.
Published: 2024

5. Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs

Author: Ding, Peng, Wu, Jingyu, Kuang, Jun, Ma, Dan, Cao, Xuezhi, Cai, Xunliang, Chen, Shi, Chen, Jiajun, and Huang, Shujian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Multi-modal Large Language Models (MLLMs) have demonstrated remarkable performance on various visual-language understanding and generation tasks. However, MLLMs occasionally generate content inconsistent with the given images, which is known as "hallucination". Prior works primarily center on evaluating hallucination using standard, unperturbed benchmarks, which overlook the prevalent occurrence of perturbed inputs in real-world scenarios-such as image cropping or blurring-that are critical for a comprehensive assessment of MLLMs' hallucination. In this paper, to bridge this gap, we propose Hallu-PI, the first benchmark designed to evaluate Hallucination in MLLMs within Perturbed Inputs. Specifically, Hallu-PI consists of seven perturbed scenarios, containing 1,260 perturbed images from 11 object types. Each image is accompanied by detailed annotations, which include fine-grained hallucination types, such as existence, attribute, and relation. We equip these annotations with a rich set of questions, making Hallu-PI suitable for both discriminative and generative tasks. Extensive experiments on 12 mainstream MLLMs, such as GPT-4V and Gemini-Pro Vision, demonstrate that these models exhibit significant hallucinations on Hallu-PI, which is not observed in unperturbed scenarios. Furthermore, our research reveals a severe bias in MLLMs' ability to handle different types of hallucinations. We also design two baselines specifically for perturbed scenarios, namely Perturbed-Reminder and Perturbed-ICL. We hope that our study will bring researchers' attention to the limitations of MLLMs when dealing with perturbed inputs, and spur further investigations to address this issue. Our code and datasets are publicly available at https://github.com/NJUNLP/Hallu-PI., Comment: Acccepted by ACM MM 2024, 14 pages, 11 figures, 9 tables
Published: 2024

6. Vector Dark Matter Halo: From Polarization Dynamics to Direct Detection

Author: Chen, Jiajun, Nguyen, Le Hoang, and Marsh, David J. E.
Subjects: Astrophysics - Cosmology and Nongalactic Astrophysics
Abstract: This study investigates the characteristic polarization formation and evolution of vector dark matter (VDM) in the outer halo of galaxies. By employing numerical simulations, we analyze the behavior of VDM under different initial conditions -- homogeneous, isotropic, and partially polarized. The simulations solve the Schr\"odinger-Poisson equations, examining the spin density distribution and its evolution during gravitational collapse and halo formation. Our results reveal that VDM forms halos and central Proca stars from homogeneous and isotropic conditions, with the polarization density fluctuation amplitude mirroring VDM matter density. In scenarios with no initial polarization, spin density remains stable in the halo core but fluctuates in outer regions. Partially polarized initial conditions lead to a conservation of total polarization, with increased core polarization resulting in opposite polarization in the periphery. We examine the novel consequences of the partially polarized state for direct detection of dark photons, i.e., VDM kinetically mixed with ordinary photons., Comment: 12 pages,12 figures
Published: 2024

7. PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment

Author: Li, Jiahuan, Huang, Shujian, Ching, Aarron, Dai, Xinyu, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining. However, the spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory cross-lingual transfer and knowledge sharing. Previous works attempt to address this issue by explicitly injecting multilingual alignment information during or after pretraining. Thus for the early stage in pretraining, the alignment is weak for sharing information or knowledge across languages. In this paper, we propose PreAlign, a framework that establishes multilingual alignment prior to language model pretraining. PreAlign injects multilingual alignment by initializing the model to generate similar representations of aligned words and preserves this alignment using a code-switching strategy during pretraining. Extensive experiments in a synthetic English to English-Clone setting demonstrate that PreAlign significantly outperforms standard multilingual joint training in language modeling, zero-shot cross-lingual transfer, and cross-lingual knowledge application. Further experiments in real-world scenarios further validate PreAlign's effectiveness across various model sizes.
Published: 2024

8. Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping

Author: Zhu, Wenhao, Liu, Sizhe, Huang, Shujian, She, Shuaijie, Wendler, Chris, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Decoding by contrasting layers (DoLa), is designed to improve the generation quality of large language models (LLMs) by contrasting the prediction probabilities between an early exit output (amateur logits) and the final output (expert logits). However, we find that this approach does not work well on non-English tasks. Inspired by previous interpretability work on language transition during the model's forward pass, we discover that this issue arises from a language mismatch between early exit output and final output. In this work, we propose an improved contrastive decoding algorithm that is effective for diverse languages beyond English. To obtain more helpful amateur logits, we devise two strategies to skip a set of bottom, language-agnostic layers based on our preliminary analysis. Experimental results on multilingual reasoning benchmarks demonstrate that our proposed method outperforms previous contrastive decoding baselines and substantially improves LLM's chain-of-thought reasoning accuracy across 11 languages. The project will be available at: https://github.com/NJUNLP/SkipLayerCD.
Published: 2024

9. Novel structures and collapse of solitons in nonminimally gravitating dark matter halos

Author: Chen, Jiajun and Zhang, Hong-Yi
Subjects: High Energy Physics - Phenomenology
Abstract: Ultralight dark matter simulations predict Bose-Einstein condensations with short-range correlation, known as solitons or boson stars, at the centers of dark matter halos. This paper investigates the formation and collapse of dark matter solitons influenced by nonminimal gravitational effects, characterized by gradient-dependent self-interactions of dark matter and an additional source in Poisson's equation for gravity. Our simulations suggest that the initial evolution of dark matter resembles that without nonminimal gravitational effects. However, regions with negative mass density may develop, and solitons will collapse when their densities reach certain critical values for both positive and negative coupling constants. With strong nonminimal coupling, structure growth could be significantly enhanced., Comment: 10 pages, 7 big figures
Published: 2024

10. Large Language Models are Limited in Out-of-Context Knowledge Reasoning

Author: Hu, Peng, Gao, Changjiang, Gao, Ruiqi, Chen, Jiajun, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data, instead of from the context or prompt. This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which is to combine multiple knowledge to infer new knowledge. We designed a synthetic dataset with seven representative OCKR tasks to systematically assess the OCKR capabilities of LLMs. Using this dataset, we evaluated several LLMs and discovered that their proficiency in this aspect is limited, regardless of whether the knowledge is trained in a separate or adjacent training settings. Moreover, training the model to reason with reasoning examples does not result in significant improvement, while training the model to perform explicit knowledge retrieval helps for retrieving attribute knowledge but not the relation knowledge, indicating that the model's limited OCKR capabilities are due to difficulties in knowledge retrieval. Furthermore, we treat cross-lingual knowledge transfer as a distinct form of OCKR, and evaluate this ability. Our results show that the evaluated model also exhibits limited ability in transferring knowledge across languages.
Published: 2024

11. Why Not Transform Chat Large Language Models to Non-English?

Author: Geng, Xiang, Zhu, Ming, Li, Jiahuan, Lai, Zhejian, Zou, Wei, She, Shuaijie, Guo, Jiaxin, Zhao, Xiaofeng, Li, Yinglu, Li, Yuang, Su, Chang, Zhao, Yanqing, Lyu, Xinglin, Zhang, Min, Chen, Jiajun, Yang, Hao, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized for advanced abilities, e.g. multi-turn conversation and human preference alignment, and thus more powerful in both helpfulness and safety. However, transforming a chat LLM involves two critical issues: (1) How can we effectively transfer advanced abilities without their supervised data? (2) How can we prevent the original knowledge from catastrophic forgetting during transformation? We target these issues by introducing a simple framework called TransLLM. For the first issue, TransLLM divides the transfer problem into some common sub-tasks with the translation chain-of-thought, which uses the translation as the bridge between English and non-English step-by-step. We further enhance the performance of sub-tasks with publicly available data. For the second issue, we propose a method comprising two synergistic components: low-rank adaptation for training to maintain the original LLM parameters, and recovery KD, which utilizes data generated by the chat LLM itself to recover the original knowledge from the frozen parameters. In the experiments, we transform the LLaMA-2-chat-7B to the Thai language. Our method, using only single-turn data, outperforms strong baselines and ChatGPT on multi-turn benchmark MT-bench. Furthermore, our method, without safety data, rejects more harmful queries of safety benchmark AdvBench than both ChatGPT and GPT-4.
Published: 2024

12. Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners

Author: Zhang, Shimao, Gao, Changjiang, Zhu, Wenhao, Chen, Jiajun, Huang, Xin, Han, Xue, Feng, Junlan, Deng, Chao, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: Recently, Large Language Models (LLMs) have shown impressive language capabilities. While most of the existing LLMs have very unbalanced performance across different languages, multilingual alignment based on translation parallel data is an effective method to enhance the LLMs' multilingual capabilities. In this work, we discover and comprehensively investigate the spontaneous multilingual alignment improvement of LLMs. We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages, even including those unseen during instruction-tuning. Additionally, we utilize different settings and mechanistic interpretability methods to analyze the LLM's performance in the multilingual scenario comprehensively. Our work suggests that LLMs have enormous potential for improving multilingual alignment efficiently with great language and task generalization.
Published: 2024

13. The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights

Author: Zhu, Wenhao, Huang, Shujian, Yuan, Fei, Chen, Cheng, Chen, Jiajun, and Birch, Alexandra
Subjects: Computer Science - Computation and Language
Abstract: Bridging the significant gap between large language model's English and non-English performance presents a great challenge. While some previous studies attempt to mitigate this gap with translated training data, the recently proposed question alignment framework leverages the model's English expertise to improve multilingual performance with minimum usage of expensive, error-prone translation. In this paper, we explore how broadly this method can be applied by examining its effects in reasoning with and without chain-of-thought, as well as with program-of-thought. We also explore applying this framework to extremely large language models in an efficient manner, such as through proxy-tuning. Experiment results on multilingual reasoning benchmarks mGSM, mSVAMP, xCSQA and xNLI demonstrate that we can extend question alignment framework to boost multilingual performance across diverse reasoning scenarios, model families, and sizes. For instance, when applied to the LLaMA2 models, it brings an average accuracy improvements of 12.2% on mGSM even with the 70B model. To understand the mechanism of its success, we analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs and shapes their working patterns.
Published: 2024

14. Enforcing Paraphrase Generation via Controllable Latent Diffusion

Author: Zou, Wei, Zhuang, Ziyuan, Huang, Shujian, Liu, Jia, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Paraphrase generation aims to produce high-quality and diverse utterances of a given text. Though state-of-the-art generation via the diffusion model reconciles generation quality and diversity, textual diffusion suffers from a truncation issue that hinders efficiency and quality control. In this work, we propose \textit{L}atent \textit{D}iffusion \textit{P}araphraser~(LDP), a novel paraphrase generation by modeling a controllable diffusion process given a learned latent space. LDP achieves superior generation efficiency compared to its diffusion counterparts. It facilitates only input segments to enforce paraphrase semantics, which further improves the results without external features. Experiments show that LDP achieves improved and diverse paraphrase generation compared to baselines. Further analysis shows that our method is also helpful to other similar text generations and domain adaptations. Our code and data are available at https://github.com/NIL-zhuang/ld4pg.
Published: 2024

15. Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly

Author: Gao, Changjiang, Hu, Hongda, Hu, Peng, Chen, Jiajun, Li, Jixing, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: Despite their strong ability to retrieve knowledge in English, current large language models show imbalance abilities in different languages. Two approaches are proposed to address this, i.e., multilingual pretraining and multilingual instruction tuning. However, whether and how do such methods contribute to the cross-lingual knowledge alignment inside the models is unknown. In this paper, we propose CLiKA, a systematic framework to assess the cross-lingual knowledge alignment of LLMs in the Performance, Consistency and Conductivity levels, and explored the effect of multilingual pretraining and instruction tuning on the degree of alignment. Results show that: while both multilingual pretraining and instruction tuning are beneficial for cross-lingual knowledge alignment, the training strategy needs to be carefully designed. Namely, continued pretraining improves the alignment of the target language at the cost of other languages, while mixed pretraining affect other languages less. Also, the overall cross-lingual knowledge alignment, especially in the conductivity level, is unsatisfactory for all tested LLMs, and neither multilingual pretraining nor instruction tuning can substantially improve the cross-lingual knowledge conductivity.
Published: 2024

16. General Meshing Modeling and Dynamic Characteristics Analysis of Helical Gear Pair with Tooth Surface Deviation

Author: Chen, Jiajun, Zhu, Rupeng, Chen, Weifang, Li, Miaomiao, Yin, Xunmin, and Zhang, Xiaoxu
Published: 2024
Full Text: View/download PDF

17. Investigation of TSRP reverses imatinib resistance through the PI3K / Akt pathway in chronic myeloid leukemia

Author: He, Ying, Ding, Jiyuan, Liu, Liqin, Chen, Jiajun, Zhong, Hong, Li, Changyu, and Xu, Xiaofeng
Published: 2024
Full Text: View/download PDF

18. Effect of calcination temperature and superplasticizer on the properties of anhydrite II from phosphogypsum

Author: Liu, Dezhi, Chen, Jiajun, Ma, Xiaoling, Tan, Hongbin, Yang, Feihua, Yang, Guozhen, Yang, Fei, Kamarou, Maksim, Moskovskikh, Dmitry, and Ramanovski, Valentin
Published: 2024
Full Text: View/download PDF

19. MixRED: A Mix-lingual Relation Extraction Dataset

Author: Kong, Lingxing, Chu, Yougang, Ma, Zheng, Zhang, Jianbing, He, Liang, and Chen, Jiajun
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Relation extraction is a critical task in the field of natural language processing with numerous real-world applications. Existing research primarily focuses on monolingual relation extraction or cross-lingual enhancement for relation extraction. Yet, there remains a significant gap in understanding relation extraction in the mix-lingual (or code-switching) scenario, where individuals intermix contents from different languages within sentences, generating mix-lingual content. Due to the lack of a dedicated dataset, the effectiveness of existing relation extraction models in such a scenario is largely unexplored. To address this issue, we introduce a novel task of considering relation extraction in the mix-lingual scenario called MixRE and constructing the human-annotated dataset MixRED to support this task. In addition to constructing the MixRED dataset, we evaluate both state-of-the-art supervised models and large language models (LLMs) on MixRED, revealing their respective advantages and limitations in the mix-lingual scenario. Furthermore, we delve into factors influencing model performance within the MixRE task and uncover promising directions for enhancing the performance of both supervised models and LLMs in this novel task.
Published: 2024

20. Ultra-Long Homochiral Graphene Nanoribbons Grown Within h-BN Stacks for High-Performance Electronics

Author: Lyu, Bosai, Chen, Jiajun, Wang, Sen, Lou, Shuo, Shen, Peiyue, Xie, Jingxu, Qiu, Lu, Mitchell, Izaac, Li, Can, Hu, Cheng, Zhou, Xianliang, Watanabe, Kenji, Taniguchi, Takashi, Wang, Xiaoqun, Jia, Jinfeng, Liang, Qi, Chen, Guorui, Li, Tingxin, Wang, Shiyong, Ouyang, Wengen, Hod, Oded, Ding, Feng, Urbakh, Michael, and Shi, Zhiwen
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
Abstract: Van der Waals encapsulation of two-dimensional materials within hexagonal boron nitride (h-BN) stacks has proven to be a promising way to create ultrahigh-performance electronic devices. However, contemporary approaches for achieving van der Waals encapsulation, which involve artificial layer stacking using mechanical transfer techniques, are difficult to control, prone to contamination, and unscalable. Here, we report on the transfer-free direct growth of high-quality graphene nanoribbons (GNRs) within h-BN stacks. The as-grown embedded GNRs exhibit highly desirable features being ultralong (up to 0.25 mm), ultranarrow ( < 5 nm), and homochiral with zigzag edges. Our atomistic simulations reveal that the mechanism underlying the embedded growth involves ultralow GNR friction when sliding between AA'-stacked h-BN layers. Using the grown structures, we demonstrate the transfer-free fabrication of embedded GNR field-effect devices that exhibit excellent performance at room temperature with mobilities of up to 4,600 $cm^{2} V^{-1} s^{-1}$ and on-off ratios of up to $10^{6}$. This paves the way to the bottom-up fabrication of high-performance electronic devices based on embedded layered materials.
Published: 2024

21. MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation

Author: Li, Jiahuan, Cheng, Shanbo, Huang, Shujian, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLM) have demonstrated their strong ability in the field of machine translation (MT), yet they suffer from high computational cost and latency. Therefore, transferring translation knowledge from giant LLMs to medium-sized machine translation models is a promising research direction. However, traditional knowledge distillation methods do not take the capability of student and teacher models into consideration, therefore repeatedly teaching student models on the knowledge they have learned, and failing to extend to novel contexts and knowledge. In this paper, we propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive and proactive manner. Considering the current translation ability of student MT models, we only identify and correct their translation errors, instead of distilling the whole translation from the teacher. Leveraging the strong language abilities of LLMs, we instruct LLM teachers to synthesize diverse contexts and anticipate more potential errors for the student. Experiment results on translating both specific language phenomena and general MT benchmarks demonstrate that finetuning the student MT model on about 10% examples can achieve comparable results to the traditional knowledge distillation method, and synthesized potential errors and diverse contexts further improve translation performances on unseen contexts and words., Comment: Accepted to NAACL-2024 main conference
Published: 2024

22. Measuring Meaning Composition in the Human Brain with Composition Scores from Large Language Models

Author: Gao, Changjiang, Li, Jixing, Chen, Jiajun, and Huang, Shujian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The process of meaning composition, wherein smaller units like morphemes or words combine to form the meaning of phrases and sentences, is essential for human sentence comprehension. Despite extensive neurolinguistic research into the brain regions involved in meaning composition, a computational metric to quantify the extent of composition is still lacking. Drawing on the key-value memory interpretation of transformer feed-forward network blocks, we introduce the Composition Score, a novel model-based metric designed to quantify the degree of meaning composition during sentence comprehension. Experimental findings show that this metric correlates with brain clusters associated with word frequency, structural processing, and general sensitivity to words, suggesting the multifaceted nature of meaning composition during human sentence comprehension., Comment: Accepted by ACL 2024 (main conference, long paper)
Published: 2024

23. EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

Author: Lin, Jiacheng, Chen, Jiajun, Peng, Kunyu, He, Xuan, Li, Zhiyong, Stiefelhagen, Rainer, and Yang, Kailun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cost of tracking quality, interaction efficiency, and even the safety of assistance systems, limiting the application of such methods in autonomous driving. In this paper, we delve into the problem of AR-MOT from the perspective of audio-video fusion and audio-video tracking. We put forward EchoTrack, an end-to-end AR-MOT framework with dual-stream vision transformers. The dual streams are intertwined with our Bidirectional Frequency-domain Cross-attention Fusion Module (Bi-FCFM), which bidirectionally fuses audio and video features from both frequency- and spatiotemporal domains. Moreover, we propose the Audio-visual Contrastive Tracking Learning (ACTL) regime to extract homogeneous semantic features between expressions and visual objects by learning homogeneous features between different audio and video objects effectively. Aside from the architectural design, we establish the first set of large-scale AR-MOT benchmarks, including Echo-KITTI, Echo-KITTI+, and Echo-BDD. Extensive experiments on the established benchmarks demonstrate the effectiveness of the proposed EchoTrack and its components. The source code and datasets are available at https://github.com/lab206/EchoTrack., Comment: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code and datasets are available at https://github.com/lab206/EchoTrack
Published: 2024

24. Cobra Effect in Reference-Free Image Captioning Metrics

Author: Ma, Zheng, Wang, Changxin, Ouyang, Yawen, Zhao, Fei, Zhang, Jianbing, Huang, Shujian, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Evaluating the compatibility between textual descriptions and corresponding images represents a core endeavor within multi-modal research. In recent years, a proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged. Empirical evidence has substantiated that these innovative approaches exhibit a higher correlation with human judgment, marking a significant advancement in the field. However, does a higher correlation with human evaluations alone sufficiently denote the complete of a metric? In response to this question, in this paper, we study if there are any deficiencies in reference-free metrics. Specifically, inspired by the Cobra Effect, we utilize metric scores as rewards to direct the captioning model toward generating descriptions that closely align with the metric's criteria. If a certain metric has flaws, it will be exploited by the model and reflected in the generated sentences. Our findings reveal that descriptions guided by these metrics contain significant flaws, e.g. incoherent statements and excessive repetition. Subsequently, we propose a novel method termed Self-Improving to rectify the identified shortcomings within these metrics. We employ GPT-4V as an evaluative tool to assess generated sentences and the result reveals that our approach achieves state-of-the-art (SOTA) performance. In addition, we also introduce a challenging evaluation benchmark called Flaws Caption to evaluate reference-free image captioning metrics comprehensively. Our code is available at https://github.com/aaronma2020/robust_captioning_metric, Comment: pre-print version
Published: 2024

25. Question Translation Training for Better Multilingual Reasoning

Author: Zhu, Wenhao, Huang, Shujian, Yuan, Fei, She, Shuaijie, Chen, Jiajun, and Birch, Alexandra
Subjects: Computer Science - Computation and Language
Abstract: Large language models show compelling performance on reasoning tasks but they tend to perform much worse in languages other than English. This is unsurprising given that their training data largely consists of English text and instructions. A typical solution is to translate instruction data into all languages of interest, and then train on the resulting multilingual data, which is called translate-training. This approach not only incurs high cost, but also results in poorly translated data due to the non-standard formatting of mathematical chain-of-thought. In this paper, we explore the benefits of question alignment, where we train the model to translate reasoning questions into English by finetuning on X-English parallel question data. In this way we perform targeted, in-domain language alignment which makes best use of English instruction data to unlock the LLMs' multilingual reasoning abilities. Experimental results on LLaMA2-13B show that question alignment leads to consistent improvements over the translate-training approach: an average improvement of 11.3% and 16.1% accuracy across ten languages on the MGSM and MSVAMP multilingual reasoning benchmarks. The project will be available at: https://github.com/NJUNLP/QAlign., Comment: Accepted to Findings of ACL 2024
Published: 2024

26. MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

Author: She, Shuaijie, Zou, Wei, Huang, Shujian, Zhu, Wenhao, Liu, Xiang, Geng, Xiang, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language. Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages, which we adopt as the preference for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all three benchmarks (MSVAMP +16.2%, MGSM +6.1%, and MNumGLUESub +13.3%), with improved reasoning consistency across languages., Comment: The project is available at https://github.com/NJUNLP/MAPO
Published: 2024

27. Multi-Candidate Speculative Decoding

Author: Yang, Sen, Huang, Shujian, Dai, Xinyu, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Large language models have shown impressive capabilities across a variety of NLP tasks, yet their generating text autoregressively is time-consuming. One way to speed them up is speculative decoding, which generates candidate segments (a sequence of tokens) from a fast draft model that is then verified in parallel by the target model. However, the acceptance rate of candidate tokens receives limitations from several factors, such as the model, the dataset, and the decoding setup. This paper proposes sampling multiple candidates from a draft model and then organising them in batches for verification. We design algorithms for efficient multi-candidate verification while maintaining the distribution of the target model. Our approach shows significant improvements in acceptance rates on multiple datasets and models, consistently outperforming standard speculative decoding.
Published: 2024

28. Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation

Author: Huang, Xu, Zhang, Zhirui, Geng, Xiang, Du, Yichao, Chen, Jiajun, and Huang, Shujian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This study investigates how Large Language Models (LLMs) leverage source and reference data in machine translation evaluation task, aiming to better understand the mechanisms behind their remarkable performance in this task. We design the controlled experiments across various input modes and model types, and employ both coarse-grained and fine-grained prompts to discern the utility of source versus reference information. We find that reference information significantly enhances the evaluation accuracy, while surprisingly, source information sometimes is counterproductive, indicating LLMs' inability to fully leverage the cross-lingual capability when evaluating translations. Further analysis of the fine-grained evaluation and fine-tuning experiments show similar results. These findings also suggest a potential research direction for LLMs that fully exploits the cross-lingual capability of LLMs to achieve better performance in machine translation evaluation tasks., Comment: Accepted by ACL2024 Findings
Published: 2024

29. Monitoring and engineering interface coupling between monolayer WS2 and substrate through controllably introducing interfacial strain

Author: Yue, Xiaofei, Chen, Jiajun, Han, Jinkun, Shan, Yabing, Shen, Shuwen, Wu, Wenxuan, Liu, Bingjie, Li, Lijia, Chen, Yu, Zhang, Rongjun, Hu, Laigui, Liu, Ran, Qiu, Zhijun, and Cong, Chunxiao
Published: 2024
Full Text: View/download PDF

30. Zn0.76Co0.24S Embedded in Multiwalled Carbon Nanotubes as Anode Material for Sodium-Ion Batteries

Author: Yuan, Jing, Zhang, Jingjing, Zhao, Jiachang, Zhang, Lijuan, Jin, Jun, and Chen, Jiajun
Published: 2024
Full Text: View/download PDF

31. Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description

Author: Pan, Mianzhi, Li, Jianfei, Yu, Mingyue, Ma, Zheng, Cheng, Kanzhi, Zhang, Jianbing, and Chen, Jiajun
Subjects: Computer Science - Multimedia
Abstract: Commonsense reasoning, the ability to make logical assumptions about daily scenes, is one core intelligence of human beings. In this work, we present a novel task and dataset for evaluating the ability of text-to-image generative models to conduct commonsense reasoning, which we call PAINTaboo. Given a description with few visual clues of one object, the goal is to generate images illustrating the object correctly. The dataset was carefully hand-curated and covered diverse object categories to analyze model performance comprehensively. Our investigation of several prevalent text-to-image generative models reveals that these models are not proficient in commonsense reasoning, as anticipated. We trust that PAINTaboo can improve our understanding of the reasoning abilities of text-to-image generative models., Comment: It is an incomplete work
Published: 2023

32. A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily

Author: Ding, Peng, Kuang, Jun, Ma, Dan, Cao, Xuezhi, Xian, Yunsen, Chen, Jiajun, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide useful and safe responses. However, adversarial prompts known as 'jailbreaks' can circumvent safeguards, leading LLMs to generate potentially harmful content. Exploring jailbreak prompts can help to better reveal the weaknesses of LLMs and further steer us to secure them. Unfortunately, existing jailbreak methods either suffer from intricate manual design or require optimization on other white-box models, which compromises either generalization or efficiency. In this paper, we generalize jailbreak prompt attacks into two aspects: (1) Prompt Rewriting and (2) Scenario Nesting. Based on this, we propose ReNeLLM, an automatic framework that leverages LLMs themselves to generate effective jailbreak prompts. Extensive experiments demonstrate that ReNeLLM significantly improves the attack success rate while greatly reducing the time cost compared to existing baselines. Our study also reveals the inadequacy of current defense methods in safeguarding LLMs. Finally, we analyze the failure of LLMs defense from the perspective of prompt execution priority, and propose corresponding defense strategies. We hope that our research can catalyze both the academic community and LLMs developers towards the provision of safer and more regulated LLMs. The code is available at https://github.com/NJUNLP/ReNeLLM., Comment: Acccepted by NAACL 2024, 18 pages, 7 figures, 13 tables
Published: 2023

33. Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models

Author: She, Shuaijie, Huang, Shujian, Wang, Xingyun, Zhou, Yanke, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: LLMs (Large Language Models) usually interact with users in the form of dialogue and generate responses following their instructions, which naturally require dialogue comprehension abilities. However, dialogue comprehension is a general language ability which is hard to be evaluated directly. In this work, we propose to perform the evaluation focusing on the factual consistency issue with the help of the dialogue summarization task. Besides evaluating and analyzing the dialogue summarization performance (DIAC-Sum) of different LLMs, we also derive factual questions from the generated summaries and use them as a more flexible measurement of dialogue comprehension (DIAC-QA). Our evaluation shows that, on average, 26.8% of the summaries generated by LLMs contain factual inconsistency. Even ChatGPT, the strongest model evaluated, has such errors in 16% of its summaries. For answering the factual questions, which is more challenging, the average error rate of all evaluated LLMs is 36.1%. Both results indicate serious deficiencies. Detailed analysis shows that the understanding of subject/object of the conversation is still challenging for LLMs. Furthermore, to stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data, which achieved a relative error rate reduction of 11% on DIAC-QA., Comment: Accepted at NAACL2024 Main
Published: 2023

34. Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention

Author: Gao, Changjiang, Huang, Shujian, Li, Jixing, and Chen, Jiajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent large language models (LLMs) have revealed strong abilities to understand natural language. Since most of them share the same basic structure, i.e. the transformer block, possible contributors to their success in the training process are scaling and instruction tuning. However, how these factors affect the models' language perception is unclear. This work compares the self-attention of several existing LLMs (LLaMA, Alpaca and Vicuna) in different sizes (7B, 13B, 30B, 65B), together with eye saccade, an aspect of human reading attention, to assess the effect of scaling and instruction tuning on language perception. Results show that scaling enhances the human resemblance and improves the effective attention by reducing the trivial pattern reliance, while instruction tuning does not. However, instruction tuning significantly enhances the models' sensitivity to instructions. We also find that current LLMs are consistently closer to non-native than native speakers in attention, suggesting a sub-optimal language perception of all models. Our code and data used in the analysis is available on GitHub.
Published: 2023

35. IMTLab: An Open-Source Platform for Building, Evaluating, and Diagnosing Interactive Machine Translation Systems

Author: Huang, Xu, Zhang, Zhirui, Gao, Ruize, Du, Yichao, Liu, Lemao, Huang, Gouping, Shi, Shuming, Chen, Jiajun, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: We present IMTLab, an open-source end-to-end interactive machine translation (IMT) system platform that enables researchers to quickly build IMT systems with state-of-the-art models, perform an end-to-end evaluation, and diagnose the weakness of systems. IMTLab treats the whole interactive translation process as a task-oriented dialogue with a human-in-the-loop setting, in which human interventions can be explicitly incorporated to produce high-quality, error-free translations. To this end, a general communication interface is designed to support the flexible IMT architectures and user policies. Based on the proposed design, we construct a simulated and real interactive environment to achieve end-to-end evaluation and leverage the framework to systematically evaluate previous IMT systems. Our simulated and manual experiments show that the prefix-constrained decoding approach still gains the lowest editing cost in the end-to-end evaluation, while BiTIIMT achieves comparable editing cost with a better interactive experience., Comment: Accepted by EMNLP2023
Published: 2023

36. Towards Better Chain-of-Thought Prompting Strategies: A Survey

Author: Yu, Zihan, He, Liang, Wu, Zhen, Dai, Xinyu, and Chen, Jiajun
Subjects: Computer Science - Computation and Language
Abstract: Chain-of-Thought (CoT), a step-wise and coherent reasoning chain, shows its impressive strength when used as a prompting strategy for large language models (LLM). Recent years, the prominent effect of CoT prompting has attracted emerging research. However, there still lacks of a systematic summary about key factors of CoT prompting and comprehensive guide for prompts utilizing. For a deeper understanding about CoT prompting, we survey on a wide range of current research, presenting a systematic and comprehensive analysis on several factors that may influence the effect of CoT prompting, and introduce how to better apply it in different applications under these discussions. We further analyze the challenges and propose some future directions about CoT prompting. This survey could provide an overall reference on related research.
Published: 2023

37. Integrated analysis of single-cell RNA sequencing and bulk transcriptome data identifies a pyroptosis-associated diagnostic model for Parkinson’s disease

Author: Wang, Lin, Qin, Yidan, Song, Jia, Xu, Jing, Quan, Wei, Su, Hang, Zeng, Huibin, Zhang, Jian, Li, Jia, and Chen, Jiajun
Published: 2024
Full Text: View/download PDF

38. Effects of clear aligners and traditional removable appliances on oral microbiome in mixed dentition: a comparative study

Author: Chen, Wanxi, Chen, Jiajun, Bai, Ding, Wang, Peiqi, and Shu, Rui
Published: 2024
Full Text: View/download PDF

39. Data-driven solutions and parameter estimations of a family of higher-order KdV equations based on physics informed neural networks

Author: Chen, Jiajun, Shi, Jianping, He, Ao, and Fang, Hui
Published: 2024
Full Text: View/download PDF

40. LINC00461 promotes bladder cancer cells EMT through miR-518b/HNRNPUL1 axis

Author: Zhou, Yijie, Zhao, Keyuan, Li, Junlong, Peng, Chao, Jin, Jing, Chen, Jiajun, Li, Yulei, Xu, Gang, and Pan, Shouhua
Published: 2024
Full Text: View/download PDF

41. Urinary microbiome dysbiosis is associated with an inflammatory environment and perturbed fatty acids metabolism in the pathogenesis of bladder cancer

Author: Wu, Cen, Wei, Xiaoyu, Huang, Zhiyang, Zheng, Zhixiong, Zhang, Wei, Chen, Jiajun, Hong, Hongchang, and Li, Weili
Published: 2024
Full Text: View/download PDF

42. Divergent and gram-scale syntheses of (–)-veratramine and (–)-cyclopamine

Author: Hou, Wenlong, Lin, Hao, Wu, Yanru, Li, Chuang, Chen, Jiajun, Liu, Xiao-Yu, and Qin, Yong
Published: 2024
Full Text: View/download PDF

43. Collapse of carbon nanotubes due to local high-pressure from van der Waals encapsulation

Author: Hu, Cheng, Chen, Jiajun, Zhou, Xianliang, Xie, Yufeng, Huang, Xinyue, Wu, Zhenghan, Ma, Saiqun, Zhang, Zhichun, Xu, Kunqi, Wan, Neng, Zhang, Yueheng, Liang, Qi, and Shi, Zhiwen
Published: 2024
Full Text: View/download PDF

44. Graphene nanoribbons: current status, challenges and opportunities

Author: Lou, Shuo, Lyu, Bosai, Zhou, Xianliang, Shen, Peiyue, Chen, Jiajun, and Shi, Zhiwen
Published: 2024
Full Text: View/download PDF

45. Exploring the causal relationship between B lymphocytes and Parkinson’s disease: a bidirectional, two-sample Mendelian randomization study

Author: Song, Jia, Qin, Yidan, Wang, Lin, Quan, Wei, Xu, Jing, Li, Jia, and Chen, Jiajun
Published: 2024
Full Text: View/download PDF

46. Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task

Author: Geng, Xiang, Lai, Zhejian, Zhang, Yu, Tao, Shimin, Yang, Hao, Chen, Jiajun, and Huang, Shujian
Subjects: Computer Science - Computation and Language
Abstract: We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on all two sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on NJUQE framework (https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel data from the WMT translation task. We pre-train the XLMR large model on pseudo QE data, then fine-tune it on real QE data. At both stages, we jointly learn sentence-level scores and word-level tags. Empirically, we conduct experiments to find the key hyper-parameters that improve the performance. Technically, we propose a simple method that covert the word-level outputs to fine-grained error span results. Overall, our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks by a considerable margin., Comment: WMT2023 System Paper
Published: 2023

47. Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction

Author: Wang, Jie, Chen, Hanzhu, Lv, Qitan, Shi, Zhihao, Chen, Jiajun, He, Huarui, Xie, Hongtao, Lian, Defu, Chen, Enhong, and Wu, Feng
Subjects: Computer Science - Artificial Intelligence
Abstract: Inductive link prediction -- where entities during training and inference stages can be different -- has shown great potential for completing evolving knowledge graphs in an entity-independent manner. Many popular methods mainly focus on modeling graph-level features, while the edge-level interactions -- especially the semantic correlations between relations -- have been less explored. However, we notice a desirable property of semantic correlations between relations is that they are inherently edge-level and entity-independent. This implies the great potential of the semantic correlations for the entity-independent inductive link prediction task. Inspired by this observation, we propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations that are highly correlated to their topological structures within subgraphs. Specifically, we prove that semantic correlations between any two relations can be categorized into seven topological patterns, and then proposes Relational Correlation Network (RCN) to learn the importance of each pattern. To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph that can effectively preserve complete topological patterns within the subgraph. Extensive experiments demonstrate that TACO effectively unifies the graph-level information and edge-level interactions to jointly perform reasoning, leading to a superior performance over existing state-of-the-art methods for the inductive link prediction task., Comment: arXiv admin note: text overlap with arXiv:2103.03642
Published: 2023

48. Extrapolating Large Language Models to Non-English by Aligning Languages

Author: Zhu, Wenhao, Lv, Yunzhe, Dong, Qingxiu, Yuan, Fei, Xu, Jingjing, Huang, Shujian, Kong, Lingpeng, Chen, Jiajun, and Li, Lei
Subjects: Computer Science - Computation and Language
Abstract: Existing large language models show disparate capability across different languages, due to the imbalance in the training data. Their performances on English tasks are often stronger than on tasks of other languages. In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages. We start from targeting individual languages by performing cross-lingual instruction-tuning (CoIT) on LLaMA, i.e. tuning it with translation task data and cross-lingual general task data to obtain cross-lingual models (x-LLaMAs), and formulate underlying scaling laws to investigate the advantages of using scalable translation data. Then we perform multilingual instruction-tuning (MuIT) with mixed resources to build multilingual m-LLaMA. We also illustrate how we leverage the scaling laws to optimize data allocation in a resource-constrained setting. Experiment results on cross-lingual benchmarks XQUAD and MLQA show that x-LLaMAs surpass the English instruction-tuned counterpart (Alpaca) by an average of 27.83% across six non-English languages. Evaluation results on translation dataset Flores-101 show that x-LLaMAs outperform previous LLaMA-based models by an average of 18.89%. Encouragingly, m-LLaMA achieves comparable performance to x-LLaMAs on individual languages and demonstrates the ability to follow multilingual instructions. Further analysis on response content and representation space reveals the alignment of the multilingual semantic space within the middle layers of m-LLaMA.
Published: 2023

49. EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation

Author: Chen, Jiajun, Lin, Jiacheng, Xiao, Zhiqiang, Fu, Haolong, Nai, Ke, Yang, Kailun, and Li, Zhiyong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Audio-guided Video Object Segmentation (A-VOS) and Referring Video Object Segmentation (R-VOS) are two highly-related tasks, which both aim to segment specific objects from video sequences according to user-provided expression prompts. However, due to the challenges in modeling representations for different modalities, contemporary methods struggle to strike a balance between interaction flexibility and high-precision localization and segmentation. In this paper, we address this problem from two perspectives: the alignment representation of audio and text and the deep interaction among audio, text, and visual features. First, we propose a universal architecture, the Expression Prompt Collaboration Transformer, herein EPCFormer. Next, we propose an Expression Alignment (EA) mechanism for audio and text expressions. By introducing contrastive learning for audio and text expressions, the proposed EPCFormer realizes comprehension of the semantic equivalence between audio and text expressions denoting the same objects. Then, to facilitate deep interactions among audio, text, and video features, we introduce an Expression-Visual Attention (EVA) mechanism. The knowledge of video object segmentation in terms of the expression prompts can seamlessly transfer between the two tasks by deeply exploring complementary cues between text and audio. Experiments on well-recognized benchmarks demonstrate that our universal EPCFormer attains state-of-the-art results on both tasks. The source code of EPCFormer will be made publicly available at https://github.com/lab206/EPCFormer., Comment: The source code will be made publicly available at https://github.com/lab206/EPCFormer
Published: 2023

50. Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models

Author: Ma, Zheng, Pan, Mianzhi, Wu, Wenhan, Cheng, Kanzhi, Zhang, Jianbing, Huang, Shujian, and Chen, Jiajun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Vision-language models (VLMs) have shown impressive performance in substantial downstream multi-modal tasks. However, only comparing the fine-tuned performance on downstream tasks leads to the poor interpretability of VLMs, which is adverse to their future improvement. Several prior works have identified this issue and used various probing methods under a zero-shot setting to detect VLMs' limitations, but they all examine VLMs using general datasets instead of specialized ones. In practical applications, VLMs are usually applied to specific scenarios, such as e-commerce and news fields, so the generalization of VLMs in specific domains should be given more attention. In this paper, we comprehensively investigate the capabilities of popular VLMs in a specific field, the food domain. To this end, we build a food caption dataset, Food-500 Cap, which contains 24,700 food images with 494 categories. Each image is accompanied by a detailed caption, including fine-grained attributes of food, such as the ingredient, shape, and color. We also provide a culinary culture taxonomy that classifies each food category based on its geographic origin in order to better analyze the performance differences of VLM in different regions. Experiments on our proposed datasets demonstrate that popular VLMs underperform in the food domain compared with their performance in the general domain. Furthermore, our research reveals severe bias in VLMs' ability to handle food items from different geographic regions. We adopt diverse probing methods and evaluate nine VLMs belonging to different architectures to verify the aforementioned observations. We hope that our study will bring researchers' attention to VLM's limitations when applying them to the domain of food or culinary cultures, and spur further investigations to address this issue., Comment: Accepted at ACM Multimedia (ACMMM) 2023
Published: 2023

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

2,382 results on '"Chen, Jiajun"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources