46,000 results for "ZHANG, Yue"
Search Results
2. Wang Anshi and Song Poetic Culture by Xiaoshan Yang (review)
- Author: Zhang, Yue
- Published: 2023
3. Reading Philosophy, Writing Poetry: Intertextual Modes of Making Meaning in Early Medieval China by Wendy Swartz (review)
- Author: Zhang, Yue
- Published: 2021
4. The Halberd at Red Cliff: Jian’an and the Three Kingdoms by Xiaofei Tian (review)
- Author: Zhang, Yue
- Published: 2021
5. Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement
- Author: Ma, Yingwei, Cao, Rongyu, Cao, Yongchang, Zhang, Yue, Chen, Jue, Liu, Yibo, Liu, Yuchen, Li, Binhua, Huang, Fei, and Li, Yongbin
- Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
- Abstract:
Recent advancements in LLM-based agents have led to significant progress in automatic software engineering, particularly in software maintenance and evolution. Despite these encouraging advances, current research faces two major challenges. First, SOTA performance primarily depends on closed-source models, which significantly limits the technology's accessibility and potential for customization in diverse SE tasks. Second, these models are predominantly trained on static code data, lacking a deep understanding of the dynamic interactions, iterative problem-solving processes, and evolutionary characteristics inherent in software development. To address these challenges, our study adopts a software engineering perspective. We recognize that real-world software maintenance and evolution processes encompass not only static code data but also developers' thought processes, utilization of external tools, and the interaction between different functional personnel. Consequently, we introduce the Lingma SWE-GPT series, comprising Lingma SWE-GPT 7B and 72B. By learning from and simulating real-world code submission activities, Lingma SWE-GPT systematically incorporates the dynamic interactions and iterative problem-solving inherent in the software development process, thereby achieving a more comprehensive understanding of software improvement processes. We conducted experimental evaluations using the SWE-bench Verified benchmark. The results demonstrate that Lingma SWE-GPT 72B successfully resolves 30.20% of the GitHub issues, marking a significant improvement in automatic issue resolution (a 22.76% relative improvement compared to Llama 3.1 405B) and approaching the performance of closed-source models (31.80% of issues resolved by GPT-4o). Notably, Lingma SWE-GPT 7B resolves 18.20% of the issues, highlighting the potential for applying smaller models to ASE tasks.
- Published: 2024
6. Dark Matter Candidates and Searches
- Author: Bozorgnia, Nassim, Bramante, Joseph, Cline, James M., Curtin, David, McKeen, David, Morrissey, David E., Ritz, Adam, Viel, Simon, Vincent, Aaron C., and Zhang, Yue
- Subjects: High Energy Physics - Phenomenology
- Abstract:
Astrophysical observations suggest that most of the matter in the cosmos consists of a new form that has not been observed on Earth. The nature and origin of this mysterious dark matter are among the most pressing questions in fundamental science. In this review we summarize the current state of dark matter research from two perspectives. First, we provide an overview of the leading theoretical proposals for dark matter. Second, we describe how these proposals have driven a broad and diverse global search program for dark matter involving direct laboratory searches and astrophysical observations. This review is based on a Green Paper on dark matter prepared as part of the 2020 Astroparticle Community Planning initiative undertaken by the Canadian Subatomic Physics community but has been significantly updated to reflect recent advances.
- Comment: 40 pages, 15 figures, invited review article accepted for publication in the Canadian Journal of Physics for a special issue on "Particle Astrophysics in Canada," figures from other works reproduced with permission
- Published: 2024
7. SVIP: Towards Verifiable Inference of Open-source Large Language Models
- Author: Sun, Yifan, Li, Yuhang, Zhang, Yue, Jin, Yuchen, and Zhang, Huan
- Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Cryptography and Security
- Abstract:
Open-source Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language understanding and generation, leading to widespread adoption across various domains. However, their increasing model sizes render local deployment impractical for individual users, pushing many to rely on computing service providers for inference through a black-box API. This reliance introduces a new risk: a computing provider may stealthily substitute the requested LLM with a smaller, less capable model without consent from users, thereby delivering inferior outputs while benefiting from cost savings. In this paper, we formalize the problem of verifiable inference for LLMs. Existing verifiable computing solutions based on cryptographic or game-theoretic techniques are either computationally uneconomical or rest on strong assumptions. We introduce SVIP, a secret-based verifiable LLM inference protocol that leverages intermediate outputs from the LLM as unique model identifiers. By training a proxy task on these outputs and requiring the computing provider to return both the generated text and the processed intermediate outputs, users can reliably verify whether the computing provider is acting honestly. In addition, the integration of a secret mechanism further enhances the security of our protocol. We thoroughly analyze our protocol under multiple strong and adaptive adversarial scenarios. Our extensive experiments demonstrate that SVIP is accurate, generalizable, computationally efficient, and resistant to various attacks. Notably, SVIP achieves false negative rates below 5% and false positive rates below 3%, while requiring less than 0.01 seconds per query for verification.
- Comment: 20 pages
- Published: 2024
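The proxy-task idea in SVIP admits a compact illustration. Below is a minimal, hypothetical numpy sketch of the verification flow: the proxy classes, the linear head, and the Gaussian stand-ins for the model's intermediate outputs are all assumptions for illustration, not the paper's protocol (which additionally involves a secret mechanism and adversarial hardening).

```python
# Hypothetical sketch: verify a provider via a proxy head on intermediate outputs.
import numpy as np

rng = np.random.default_rng(0)
H, C = 256, 8            # hidden width and number of proxy classes (toy sizes)

# Offline phase (user, run once with enough compute): collect the genuine
# model's intermediate outputs, simulated here as class-dependent Gaussians,
# and fit a small linear proxy head on them.
centers = rng.normal(size=(C, H))
X = np.vstack([centers[c] + 0.1 * rng.normal(size=(200, H)) for c in range(C)])
y = np.repeat(np.arange(C), 200)
W = np.linalg.lstsq(X, np.eye(C)[y], rcond=None)[0]     # one-vs-rest head

def proxy_label(hidden: np.ndarray) -> int:
    """Label the returned intermediate outputs with the proxy head."""
    return int(np.argmax(hidden @ W))

# Online phase: the user knows which proxy class this prompt should induce.
expected = 3
honest = centers[expected] + 0.1 * rng.normal(size=H)   # genuine model's outputs
cheater = centers[5] + 0.1 * rng.normal(size=H)         # substituted model drifts

assert proxy_label(honest) == expected                  # accepted
assert proxy_label(cheater) != expected                 # flagged as dishonest
```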
8. Task Calibration: Calibrating Large Language Models on Inference Tasks
- Author: Li, Yingjie, Luo, Yun, Xie, Xiaotian, and Zhang, Yue
- Subjects: Computer Science - Computation and Language
- Abstract:
Large language models (LLMs) have exhibited impressive zero-shot performance on inference tasks. However, LLMs may suffer from spurious correlations between input texts and output labels, which limits their ability to reason based purely on general language understanding. In other words, LLMs may make predictions primarily based on the premise or the hypothesis alone, rather than on both components. To address this problem, which may lead to unexpected performance degradation, we propose task calibration (TC), a zero-shot and inference-only calibration method inspired by mutual information, which recovers LLM performance through task reformulation. TC encourages LLMs to reason based on both premise and hypothesis, while mitigating the models' over-reliance on either the premise or the hypothesis alone for inference. Experimental results show that TC achieves a substantial improvement on 13 inference tasks in the zero-shot setup. We further validate the effectiveness of TC in few-shot setups and on various natural language understanding tasks. Further analysis indicates that TC is also robust to prompt templates and has the potential to be integrated with other calibration methods.
- Published: 2024
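One plausible reading of the mutual-information intuition behind TC can be sketched in a few lines: discount labels that the premise alone or the hypothesis alone already predicts. The combination rule below is an illustrative assumption, not the paper's exact formula, and the probabilities are toy numbers standing in for a real LLM's label distributions.

```python
# Illustrative sketch: penalize predictions recoverable from a single input.
import numpy as np

LABELS = ("entailment", "neutral", "contradiction")

def task_calibrated_probs(p_full, p_premise_only, p_hypothesis_only):
    """Combine three zero-shot predictions; labels that the premise-only or
    hypothesis-only runs already favor get discounted (PMI-style)."""
    scores = np.log(p_full) - 0.5 * (np.log(p_premise_only) + np.log(p_hypothesis_only))
    scores = np.exp(scores - scores.max())              # softmax for stability
    return scores / scores.sum()

# Toy numbers: the raw prediction leans "entailment", but so does the
# hypothesis-only run, a classic spurious-correlation signature.
p_full = np.array([0.70, 0.20, 0.10])
p_prem = np.array([0.40, 0.35, 0.25])
p_hyp  = np.array([0.75, 0.15, 0.10])
print(dict(zip(LABELS, task_calibrated_probs(p_full, p_prem, p_hyp).round(3))))
```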
9. Building Dialogue Understanding Models for Low-resource Language Indonesian from Scratch
- Author: Di, Donglin, Zhang, Weinan, Zhang, Yue, and Wang, Fanglin
- Subjects: Computer Science - Computation and Language
- Abstract:
Making use of off-the-shelf resources of resource-rich languages to transfer knowledge to low-resource languages has recently attracted much attention. The requirements for enabling a model to reach reliable performance, such as the scale of annotated data required or an effective framework, remain poorly understood. To investigate the first question, we empirically study the cost-effectiveness of several methods for training intent classification and slot-filling models for Indonesian (ID) from scratch by utilizing English data. Confronting the second challenge, we propose a Bi-Confidence-Frequency Cross-Lingual transfer framework (BiCF), composed of "BiCF Mixing", "Latent Space Refinement", and "Joint Decoder", to tackle the obstacle of lacking low-resource-language dialogue data. Extensive experiments demonstrate that our framework performs reliably and cost-efficiently on different scales of manually annotated Indonesian data. We release a large-scale fine-labeled dialogue dataset (ID-WOZ) and ID-BERT of Indonesian for further research.
- Published: 2024
10. Attention Is All You Need for LLM-based Code Vulnerability Localization
- Author: Li, Yue, Li, Xiao, Wu, Hao, Zhang, Yue, Cheng, Xiuzhen, Zhong, Sheng, and Xu, Fengyuan
- Subjects: Computer Science - Cryptography and Security
- Abstract:
The rapid expansion of software systems and the growing number of reported vulnerabilities have emphasized the importance of accurately identifying vulnerable code segments. Traditional methods for vulnerability localization, such as manual code audits or rule-based tools, are often time-consuming and limited in scope, typically focusing on specific programming languages or types of vulnerabilities. In recent years, the introduction of large language models (LLMs) such as GPT and LLaMA has opened new possibilities for automating vulnerability detection. However, while LLMs show promise in this area, they face challenges, particularly in maintaining accuracy over longer code contexts. This paper introduces LOVA, a novel framework leveraging the self-attention mechanisms inherent in LLMs to enhance vulnerability localization. Our key insight is that self-attention mechanisms assign varying importance to different parts of the input, making it possible to track how much attention the model focuses on specific lines of code. In the context of vulnerability localization, the hypothesis is that vulnerable lines of code will naturally attract higher attention weights because they have a greater influence on the model's output. By systematically tracking changes in attention weights and focusing on specific lines of code, LOVA improves the precision of identifying vulnerable lines across various programming languages. Through rigorous experimentation and evaluation, we demonstrate that LOVA significantly outperforms existing LLM-based approaches, achieving up to a 5.3x improvement in F1-scores. LOVA also demonstrated strong scalability, with up to a 14.6x improvement in smart contract vulnerability localization across languages like C, Python, Java, and Solidity. Its robustness was proven through consistent performance across different LLM architectures.
- Published: 2024
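The core mechanism described above, scoring source lines by the attention mass they receive, can be illustrated with synthetic data. In this sketch the attention matrix is random and the token-to-line map is hand-made; a real implementation would pull attention tensors from the LLM and, as the abstract notes, track changes in them far more carefully.

```python
# Illustrative sketch: rank code lines by attention received from the last token.
import numpy as np

code = [
    "def login(user, pwd):",
    "    query = \"SELECT * FROM users WHERE name='%s'\" % user",
    "    return db.execute(query)",
]

# token_line[i] = source line of token i (a toy tokenization, assumed).
token_line = np.array([0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2])

rng = np.random.default_rng(1)
# attn[i, j]: attention from token i to token j, averaged over heads/layers.
attn = rng.dirichlet(np.ones(len(token_line)), size=len(token_line))

# Score each line by the attention mass the final token assigns to it.
line_scores = np.zeros(len(code))
np.add.at(line_scores, token_line, attn[-1])

for idx in np.argsort(-line_scores):
    print(f"line {idx}: score={line_scores[idx]:.3f}  {code[idx]}")
```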
11. EchoApex: A General-Purpose Vision Foundation Model for Echocardiography
- Author: Amadou, Abdoul Aziz, Zhang, Yue, Piat, Sebastien, Klein, Paul, Schmuecking, Ingo, Passerini, Tiziano, and Sharma, Puneet
- Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract:
Quantitative evaluation of echocardiography is essential for precise assessment of cardiac condition, monitoring disease progression, and guiding treatment decisions. The diverse nature of echo images, including variations in probe types, manufacturers, and pathologies, poses challenges for developing artificial intelligence models that can generalize across different clinical practices. We introduce EchoApex, the first general-purpose vision foundation model for echocardiography, with applications across a variety of clinical practices. Leveraging self-supervised learning, EchoApex is pretrained on over 20 million echo images from 11 clinical centres. By incorporating task-specific decoders and adapter modules, we demonstrate the effectiveness of EchoApex on 4 different kinds of clinical applications with 28 sub-tasks, including view classification, interactive structure segmentation, left ventricle hypertrophy detection, and automated ejection fraction estimation from view sequences. Compared to state-of-the-art task-specific models, EchoApex attains improved performance with a unified image encoding architecture, demonstrating the benefits of model pretraining at scale with in-domain data. Furthermore, EchoApex illustrates the potential of developing a general-purpose vision foundation model tailored specifically for echocardiography, capable of addressing a diverse range of clinical applications with high efficiency and efficacy.
- Published: 2024
12. Locking Down the Finetuned LLMs Safety
- Author: Zhu, Minjun, Yang, Linyi, Wei, Yifan, Zhang, Ningyu, and Zhang, Yue
- Subjects: Computer Science - Computation and Language
- Abstract:
Fine-tuning large language models (LLMs) on additional datasets is often necessary to optimize them for specific downstream tasks. However, existing safety alignment measures, which restrict harmful behavior during inference, are insufficient to mitigate safety risks during fine-tuning. Alarmingly, fine-tuning with just 10 toxic sentences can make models comply with harmful instructions. We introduce SafetyLock, a novel alignment intervention method that maintains robust safety post-fine-tuning through efficient and transferable mechanisms. SafetyLock leverages our discovery that fine-tuned models retain similar safety-related activation representations to their base models. This insight enables us to extract what we term the Meta-SafetyLock, a set of safety bias directions representing key activation patterns associated with safe responses in the original model. We can then apply these directions universally to fine-tuned models to enhance their safety. By searching for activation directions across multiple token dimensions, SafetyLock achieves enhanced robustness and transferability. SafetyLock re-aligns fine-tuned models in under 0.01 seconds without additional computational cost. Our experiments demonstrate that SafetyLock can reduce the harmful instruction response rate from 60% to below 1% in toxic fine-tuned models. It surpasses traditional methods in both performance and efficiency, offering a scalable, non-invasive solution for ensuring the safety of customized LLMs. Our analysis across various fine-tuning scenarios confirms SafetyLock's robustness, advocating its integration into safety protocols for aligned LLMs. The code is released at https://github.com/zhu-minjun/SafetyLock.
- Published: 2024
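SafetyLock's central observation, that safety-related activation directions extracted from the base model transfer to fine-tuned descendants, is a form of activation steering and can be sketched as follows. The shapes, the synthetic activations, and the single global steering direction are assumptions for illustration; the paper's Meta-SafetyLock operates across multiple token dimensions.

```python
# Illustrative sketch: derive a safety direction from base-model activations
# and steer a fine-tuned model's hidden states with it at inference.
import numpy as np

rng = np.random.default_rng(0)
D = 512                                              # residual width (toy)

safe_acts = rng.normal(0.2, 1.0, size=(100, D))      # base model, safe responses
harm_acts = rng.normal(-0.2, 1.0, size=(100, D))     # base model, harmful ones

safety_dir = safe_acts.mean(axis=0) - harm_acts.mean(axis=0)
safety_dir /= np.linalg.norm(safety_dir)

def safety_lock(hidden: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Nudge a fine-tuned model's hidden state along the safety direction."""
    return hidden + alpha * safety_dir

h = rng.normal(size=D)                               # a fine-tuned activation
before = float(h @ safety_dir)
after = float(safety_lock(h) @ safety_dir)
print(f"alignment with safety direction: before={before:+.2f}, after={after:+.2f}")
```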
13. Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings
- Author: Wu, Di, Li, Siyuan, Feng, Chen, Cao, Lu, Zhang, Yue, Yang, Jie, and Sawan, Mohamad
- Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Quantitative Biology - Neurons and Cognition
- Abstract:
Recent advancements in brain-computer interfaces (BCIs) have enabled the decoding of lexical tones from intracranial recordings, offering the potential to restore the communication abilities of speech-impaired tonal language speakers. However, data heterogeneity induced by both physiological and instrumental factors poses a significant challenge for unified invasive brain tone decoding. Traditional subject-specific models, which operate under a heterogeneous decoding paradigm, fail to capture generalized neural representations and cannot effectively leverage data across subjects. To address these limitations, we introduce Homogeneity-Heterogeneity Disentangled Learning for neural Representations (H2DiLR), a novel framework that disentangles and learns both the homogeneity and heterogeneity from intracranial recordings across multiple subjects. To evaluate H2DiLR, we collected stereoelectroencephalography (sEEG) data from multiple participants reading Mandarin materials comprising 407 syllables, representing nearly all Mandarin characters. Extensive experiments demonstrate that H2DiLR, as a unified decoding paradigm, significantly outperforms the conventional heterogeneous decoding approach. Furthermore, we empirically confirm that H2DiLR effectively captures both homogeneity and heterogeneity during neural representation learning.
- Comment: Preprint V1 with 10 pages main text
- Published: 2024
14. MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
- Author: Zhang, Yue, Liu, Minhao, Chen, Zhaokang, Wu, Bin, Zeng, Yubin, Zhan, Chao, He, Yingjie, Huang, Junxin, and Zhou, Wenjiang
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract:
Achieving high resolution, identity consistency, and accurate lip-speech synchronization in face visual dubbing presents significant challenges, particularly for real-time applications like live video streaming. We propose MuseTalk, which generates lip-sync targets in a latent space encoded by a Variational Autoencoder, enabling high-fidelity talking-face video generation with efficient inference. Specifically, we project the occluded lower half of the face image, together with the image itself as a reference, into a low-dimensional latent space and use a multi-scale U-Net to fuse audio and visual features at various levels. We further propose a novel sampling strategy during training, which selects reference images with head poses closely matching the target, allowing the model to focus on precise lip movement by filtering out redundant information. Additionally, we analyze the mechanism of lip-sync loss and reveal its relationship with input information volume. Extensive experiments show that MuseTalk consistently outperforms recent state-of-the-art methods in visual fidelity and achieves comparable lip-sync accuracy. As MuseTalk supports the online generation of 256x256 faces at more than 30 FPS with negligible starting latency, it paves the way for real-time applications.
- Comment: 15 pages, 4 figures
- Published: 2024
15. ELICIT: LLM Augmentation via External In-Context Capability
- Author: Wang, Futing, Yan, Jianhao, Zhang, Yue, and Lin, Tao
- Subjects: Computer Science - Computation and Language
- Abstract:
Enhancing the adaptive capabilities of large language models is a critical pursuit in both research and application. Traditional fine-tuning methods require substantial data and computational resources, especially for enhancing specific capabilities, while in-context learning is limited by the need for appropriate demonstrations and efficient token usage. Inspired by the expression of in-context learned capabilities through task vectors and the concept of modularization, we propose ELICIT, a framework consisting of two modules designed to effectively store and reuse task vectors to elicit the diverse capabilities of models without additional training or inference tokens. Our comprehensive experiments and analysis demonstrate that our pipeline is highly transferable across different input formats, tasks, and model architectures. ELICIT serves as a plug-and-play performance booster to enable adaptive elicitation of model capabilities. By externally storing and reusing vectors that represent in-context learned capabilities, ELICIT not only demonstrates the potential to operate modular capabilities but also significantly enhances the performance, versatility, adaptability, and scalability of large language models. Our code will be publicly available at https://github.com/LINs-lab/ELICIT.
- Comment: Work in progress
- Published: 2024
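The store-and-reuse loop described above can be sketched with a toy task-vector library. Everything here is an illustrative assumption (random vectors standing in for real task vectors, dot-product retrieval, a single injection site); the actual framework's two modules are more elaborate.

```python
# Illustrative sketch: retrieve a stored task vector and inject it into a
# hidden state, eliciting a capability without in-prompt demonstrations.
import numpy as np

rng = np.random.default_rng(0)
D = 768

# Offline: task vectors distilled from in-context demonstrations, e.g.
# hidden-state deltas with vs. without demonstrations (assumed precomputed).
library = {"sentiment": rng.normal(size=D), "arithmetic": rng.normal(size=D)}
keys = {name: rng.normal(size=D) for name in library}   # retrieval keys

def retrieve(query_emb: np.ndarray) -> np.ndarray:
    """Pick the stored task vector whose key best matches the query."""
    best = max(keys, key=lambda name: float(query_emb @ keys[name]))
    return library[best]

def inject(hidden: np.ndarray, query_emb: np.ndarray, scale: float = 1.0):
    """Add the retrieved capability vector to a chosen layer's hidden state."""
    return hidden + scale * retrieve(query_emb)

hidden = rng.normal(size=D)                   # some layer's hidden state
query = keys["sentiment"] + 0.1 * rng.normal(size=D)
steered = inject(hidden, query)
print(np.linalg.norm(steered - hidden))       # nonzero: a vector was injected
```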
16. Keys to Robust Edits: from Theoretical Insights to Practical Advances
- Author: Yan, Jianhao, Wang, Futing, Luo, Yun, Li, Yafu, and Zhang, Yue
- Subjects: Computer Science - Computation and Language
- Abstract:
Large language models (LLMs) have revolutionized knowledge storage and retrieval, but face challenges with conflicting and outdated information. Knowledge editing techniques have been proposed to address these issues, yet they struggle with robustness tests involving long contexts, paraphrased subjects, and continuous edits. This work investigates the cause of these failures in locate-and-edit methods, offering theoretical insights into their key-value modeling and deriving mathematical bounds for robust and specific edits, leading to a novel 'group discussion' conceptual model for locate-and-edit methods. Empirical analysis reveals that keys used by current methods fail to meet robustness and specificity requirements. To address this, we propose a Robust Edit Pathway (REP) that disentangles editing keys from LLMs' inner representations. Evaluations on LLaMA2-7B and Mistral-7B using the CounterFact dataset show that REP significantly improves robustness across various metrics, both in-domain and out-of-domain, with minimal trade-offs in success rate and locality. Our findings advance the development of reliable and flexible knowledge updating in LLMs.
- Comment: Work in progress
- Published: 2024
17. A generic Branch-and-Cut algorithm for bi-objective binary linear programs
- Author: Fouilhoux, Pierre, Létocart, Lucas, and Zhang, Yue
- Subjects: Computer Science - Discrete Mathematics, Mathematics - Optimization and Control
- Abstract:
This paper presents the first generic bi-objective binary linear branch-and-cut algorithm. Studying the impact of valid inequalities in the solution and objective spaces, two cutting frameworks are proposed. The multi-point separation problem is introduced, together with a cutting algorithm to efficiently generate valid inequalities violating multiple points simultaneously. The other main idea is to invoke a state-of-the-art integer linear programming solver's internal advanced techniques, such as cut separators. Aggregation techniques are proposed to use these frameworks with a trade-off among efficient cut separation, tight lower and upper bound sets, and advanced branching strategies. Experiments on various types of instances from the literature exhibit the promising efficiency of the algorithm, which solves instances with up to 2800 binary variables in less than one hour of CPU time. Our algorithms are easy to extend to more than two objectives and to integer variables.
- Published: 2024
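For readers unfamiliar with the problem class, the setting can be stated compactly. The formulation below uses generic notation, not the paper's, with minimization understood in the Pareto sense.

```latex
% Bi-objective binary linear program (generic notation, an assumed form):
\begin{equation*}
\min_{x \in \{0,1\}^n} \; \bigl( c_1^{\top} x,\; c_2^{\top} x \bigr)
\quad \text{s.t.} \quad A x \le b,
\end{equation*}
% A feasible x is (Pareto) efficient if no feasible x' satisfies
% c_i^T x' <= c_i^T x for i = 1, 2 with strict inequality for some i.
```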
18. BBN Constraint on Heavy Neutrino Production and Decay
- Author: Chen, Yu-Ming and Zhang, Yue
- Subjects: High Energy Physics - Phenomenology, Astrophysics - Cosmology and Nongalactic Astrophysics, High Energy Physics - Experiment
- Abstract:
We explore the big-bang nucleosynthesis (BBN) constraint on a heavy neutrino that is a mixture of a gauge-singlet fermion and the active neutrinos of the Standard Model. We work in the minimal model with only two parameters, the heavy neutrino mass $m_4$ and the mixing parameter $|U_{a4}|^2$, where $a=e$, $\mu$, or $\tau$ stands for the active neutrino flavor. We show that both the early-universe production mechanism and the decay products of the heavy neutrino are determined by $m_4$ and $|U_{a4}|^2$, with little room for further assumptions. This predictivity allows us to present a portrait of the entire BBN-excluded parameter space. Our analysis includes various effects, including temporary matter domination and energy injections in the form of pions, photons, and light neutrinos. The BBN constraint is complementary to terrestrial searches for heavy neutrinos (heavy neutral leptons) behind the origin of neutrino masses and the portal to the dark sector.
- Comment: 25 pages, 9 figures
- Published: 2024
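The abstract's point that everything is fixed by m_4 and |U_{a4}|^2 can be made concrete with the standard schematic scaling of a heavy-neutral-lepton decay width; the expression below is textbook phenomenology, not a formula quoted from the paper.

```latex
% Schematic decay width of a heavy neutral lepton N (channel factors omitted):
\begin{equation*}
\Gamma_N \;\sim\; \frac{G_F^{2}\, m_4^{5}}{96\pi^{3}}\, |U_{a4}|^{2},
\qquad \tau_N = \Gamma_N^{-1},
\end{equation*}
% so requiring N to decay before it disturbs the light-element abundances,
% roughly tau_N below O(1) s, carves out the BBN-excluded region of the
% (m_4, |U_{a4}|^2) plane.
```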
19. ECon: On the Detection and Resolution of Evidence Conflicts
- Author: Jiayang, Cheng, Chan, Chunkit, Zhuang, Qianqian, Qiu, Lin, Zhang, Tianhang, Liu, Tengxiao, Song, Yangqiu, Zhang, Yue, Liu, Pengfei, and Zhang, Zheng
- Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract:
The rise of large language models (LLMs) has significantly influenced the quality of information in decision-making systems, leading to the prevalence of AI-generated content and challenges in detecting misinformation and managing conflicting information, or "inter-evidence conflicts." This study introduces a method for generating diverse, validated evidence conflicts to simulate real-world misinformation scenarios. We evaluate conflict detection methods, including Natural Language Inference (NLI) models, factual consistency (FC) models, and LLMs, on these conflicts (RQ1) and analyze LLMs' conflict resolution behaviors (RQ2). Our key findings include: (1) NLI and LLM models exhibit high precision in detecting answer conflicts, though weaker models suffer from low recall; (2) FC models struggle with lexically similar answer conflicts, while NLI and LLM models handle these better; and (3) stronger models like GPT-4 show robust performance, especially with nuanced conflicts. For conflict resolution, LLMs often favor one piece of conflicting evidence without justification and rely on internal knowledge if they have prior beliefs.
- Comment: Accepted by EMNLP 2024 main conference
- Published: 2024
20. SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
- Author: Zhang, Yue, Xu, Zhiyang, Shen, Ying, Kordjamshidi, Parisa, and Huang, Lifu
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract:
Integrating the 3D world into large language models (3D-based LLMs) has been a promising research direction for 3D scene understanding. However, current 3D-based LLMs fall short in situated understanding due to two key limitations: 1) existing 3D datasets are constructed from a global perspective of the 3D scenes and lack situated context; and 2) the architectures of existing 3D-based LLMs lack explicit alignment between the spatial representations of 3D scenes and natural language, limiting their performance in tasks requiring precise spatial reasoning. We address these issues by introducing a scalable situated 3D dataset, named Spartun3D, that incorporates various situated spatial reasoning tasks. Furthermore, we propose Spartun3D-LLM, built on an existing 3D-based LLM but integrated with a novel situated spatial alignment module, aiming to enhance the alignment between 3D visual representations and their corresponding textual descriptions. Experimental results demonstrate that both our proposed dataset and alignment module significantly enhance the situated spatial understanding of 3D-based LLMs.
- Published: 2024
21. Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
- Author: Zhang, Chong, Tu, Yi, Zhao, Yixi, Yuan, Chenshu, Chen, Huan, Zhang, Yue, Chai, Mingxu, Guo, Ya, Zhu, Huijia, Zhang, Qi, and Gui, Tao
- Subjects: Computer Science - Computation and Language, Computer Science - Multimedia
- Abstract:
Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence as it captures the rich structure semantics within documents. Previous works typically formulated layout reading order as a permutation of layout elements, i.e. a sequence containing all the layout elements. However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may potentially lead to performance decline in downstream VrD tasks. To address this issue, we propose to model the layout reading order as ordering relations over the set of layout elements, which have sufficient expressive capability for the complete reading order information. To enable empirical evaluation on methods towards the improved form of reading order prediction (ROP), we establish a comprehensive benchmark dataset including the reading order annotation as relations over layout elements, together with a relation-extraction-based method that outperforms previous methods. Moreover, to highlight the practical benefits of introducing the improved form of layout reading order, we propose a reading-order-relation-enhancing pipeline to improve model performance on any arbitrary VrD task by introducing additional reading order relation inputs. Comprehensive results demonstrate that the pipeline generally benefits downstream VrD tasks: (1) with utilizing the reading order relation information, the enhanced downstream models achieve SOTA results on both task settings of the targeted dataset; (2) with utilizing the pseudo reading order information generated by the proposed ROP model, the performance of the enhanced models has improved across all three models and eight cross-domain VrD-IE/QA task settings without targeted optimization.
- Comment: Accepted as a long paper in the main conference of EMNLP 2024
- Published: 2024
22. A Unified Hallucination Mitigation Framework for Large Vision-Language Models
- Author: Chang, Yue, Jing, Liqiang, Zhang, Xiaopeng, and Zhang, Yue
- Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract:
Hallucination, generation that is partially inconsistent with the image content, is a common problem for Large Vision-Language Models (LVLMs) with long generations and is difficult to eradicate. To mitigate hallucination, current studies either focus on the process of model inference or on the results of model generation, but the solutions they design sometimes fail to deal appropriately with various types of queries and with the hallucinations in the generations for these queries. To accurately deal with various hallucinations, we present a unified framework, Dentist, for hallucination mitigation. The core step is to first classify the queries, then perform different processes of hallucination mitigation based on the classification result, just like a dentist who first observes the teeth and then makes a plan. In a simple deployment, Dentist can classify queries as perception or reasoning and easily mitigate potential hallucinations in answers, as demonstrated in our experiments. On MMbench, we achieve a 13.44%/10.2%/15.8% improvement in accuracy on Image Quality, a Coarse Perception visual question answering (VQA) task, over the baselines InstructBLIP/LLaVA/VisualGLM.
- Comment: Accepted by TMLR
- Published: 2024
23. Two Distinct Oxidation Dispersion Mechanisms in Pd-CeO2 Mediated by Thermodynamic and Kinetic Behaviors of Single Pd Species
- Author: Zou, Chen, Liu, Wen, Chen, Shiyuan, Li, Songda, Yang, Fangwen, Yu, Linjiang, Zeng, Chaobin, Zhang, Yue-Yu, Hu, Xiaojuan, Han, Zhong-Kang, Jiang, Ying, Yuan, Wentao, Yang, Hangsheng, and Wang, Yong
- Subjects: Condensed Matter - Materials Science
- Abstract:
Understanding the dispersion process of supported catalysts is crucial for synthesizing atomic-level dispersed catalysts and precisely manipulating their chemical state. However, the underlying dispersion mechanism remains elusive due to the lack of atomic-level evidence during the dispersion process. Herein, by employing spherical aberration-corrected environmental scanning transmission electron microscopy (ESTEM), first-principles calculations, and a global optimization algorithm, we unraveled the pre-oxidation dispersion and direct dispersion mechanisms in the Pd/CeO2 (100) system, mediated by the thermodynamic and kinetic behaviors of single Pd species. We discovered that at lower temperatures, the Pd nanoparticles first undergo oxidation followed by the dispersion of PdO, while at higher temperatures, the entire dispersion process of Pd remains in a metallic state. The distinct dispersion mechanisms at different temperatures are driven by the thermodynamic and kinetic differences of environment-dependent single Pd species. The nonmobile Pd1O4 species stabilized at lower temperatures obstructs the direct dispersion of Pd nanoparticles, instead triggering a sequence of pre-oxidation followed by limited dispersion. In contrast, the highly mobile Pd1O2 species at higher temperatures facilitates the complete and direct dispersion of Pd nanoparticles. This research illuminates the essential physical mechanisms of oxidative dispersion from both thermodynamic and kinetic perspectives, potentially enabling strategies for precisely controlling the state of highly dispersed catalysts.
- Published: 2024
24. Learning Task Planning from Multi-Modal Demonstration for Multi-Stage Contact-Rich Manipulation
- Author: Chen, Kejia, Shen, Zheng, Zhang, Yue, Chen, Lingyun, Wu, Fan, Bing, Zhenshan, Haddadin, Sami, and Knoll, Alois
- Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
- Abstract:
Large Language Models (LLMs) have gained popularity in task planning for long-horizon manipulation tasks. To enhance the validity of LLM-generated plans, visual demonstrations and online videos have been widely employed to guide the planning process. However, for manipulation tasks involving subtle movements but rich contact interactions, visual perception alone may be insufficient for the LLM to fully interpret the demonstration. Additionally, visual data provides limited information on force-related parameters and conditions, which are crucial for effective execution on real robots. In this paper, we introduce an in-context learning framework that incorporates tactile and force-torque information from human demonstrations to enhance LLMs' ability to generate plans for new task scenarios. We propose a bootstrapped reasoning pipeline that sequentially integrates each modality into a comprehensive task plan. This task plan is then used as a reference for planning in new task configurations. Real-world experiments on two different sequential manipulation tasks demonstrate the effectiveness of our framework in improving LLMs' understanding of multi-modal demonstrations and enhancing the overall planning performance.
- Published: 2024
25. Semformer: Transformer Language Models with Semantic Planning
- Author: Yin, Yongjing, Ding, Junran, Song, Kai, and Zhang, Yue
- Subjects: Computer Science - Computation and Language
- Abstract:
Next-token prediction serves as the dominant component in current neural language models. During the training phase, the model employs teacher forcing, which predicts tokens based on all preceding ground-truth tokens. However, this approach has been found to create shortcuts, utilizing the revealed prefix to spuriously fit future tokens, potentially compromising the accuracy of the next-token predictor. In this paper, we introduce Semformer, a novel method of training a Transformer language model that explicitly models the semantic planning of the response. Specifically, we incorporate a sequence of planning tokens into the prefix, guiding the planning token representations to predict the latent semantic representations of the response, which are induced by an autoencoder. In a minimal planning task (i.e., graph path-finding), our model exhibits near-perfect performance and effectively mitigates shortcut learning, a feat that standard training methods and baseline models have been unable to accomplish. Furthermore, we pretrain Semformer from scratch with 125M parameters, demonstrating its efficacy through measures of perplexity, in-context learning, and fine-tuning on summarization tasks.
- Published: 2024
26. Gated Slot Attention for Efficient Linear-Time Sequence Modeling
- Author: Zhang, Yu, Yang, Songlin, Zhu, Ruijie, Zhang, Yue, Cui, Leyang, Wang, Yiqiao, Wang, Bolun, Shi, Freda, Wang, Bailin, Bi, Wei, Zhou, Peng, and Fu, Guohong
- Subjects: Computer Science - Computation and Language
- Abstract:
Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for training from scratch. This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-memory-Control (ABC) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA). Essentially, GSA comprises a two-layer GLA linked via softmax, utilizing context-aware memory reading and adaptive forgetting to improve memory capacity while maintaining a compact recurrent state size. This design greatly enhances both training and inference efficiency through GLA's hardware-efficient training algorithm and reduced state size. Additionally, retaining the softmax operation is particularly beneficial in "finetuning pretrained Transformers to RNNs" (T2R) settings, reducing the need for extensive training from scratch. Extensive experiments confirm GSA's superior performance in scenarios requiring in-context recall and in T2R settings.
- Comment: NeurIPS 2024
- Published: 2024
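A single-head toy version of a gated-slot recurrence conveys the flavor of the bounded-memory design: a fixed number of slots, a sigmoid forget gate, and a softmax read over slots. The update rule below is a simplified assumption in the spirit of the abstract, not the paper's exact two-layer GLA parameterization.

```python
# Toy gated-slot recurrence: bounded memory with gated update, softmax read.
import numpy as np

def gsa_step(K, V, k_t, v_t, alpha_t):
    """One recurrent step. K, V: (m, d) slot keys/values;
    k_t, v_t: (d,) current key/value; alpha_t: (m,) forget gate in (0, 1)."""
    K = alpha_t[:, None] * K + (1 - alpha_t)[:, None] * k_t
    V = alpha_t[:, None] * V + (1 - alpha_t)[:, None] * v_t
    return K, V

def gsa_read(K, V, q_t):
    """Read memory with softmax attention over the m slots."""
    logits = K @ q_t
    w = np.exp(logits - logits.max())
    return (w / w.sum()) @ V

rng = np.random.default_rng(0)
m, d = 16, 64                      # slots, head dimension (toy sizes)
K, V = np.zeros((m, d)), np.zeros((m, d))
for _ in range(10):                # scan a toy sequence
    k_t, v_t, q_t = rng.normal(size=(3, d))
    alpha_t = 1.0 / (1.0 + np.exp(-rng.normal(size=m)))    # sigmoid gate
    K, V = gsa_step(K, V, k_t, v_t, alpha_t)
print(gsa_read(K, V, q_t).shape)   # (64,)
```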
27. Deep Self-Cleansing for Medical Image Segmentation with Noisy Labels
- Author: Dong, Jiahua, Zhang, Yue, Wang, Qiuli, Tong, Ruofeng, Ying, Shihong, Gong, Shaolin, Zhang, Xuanpu, Lin, Lanfen, Chen, Yen-Wei, and Zhou, S. Kevin
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract:
Medical image segmentation is crucial in the field of medical imaging, aiding in disease diagnosis and surgical planning. Most established segmentation methods rely on supervised deep learning, in which clean and precise labels are essential for supervision and significantly impact the performance of models. However, manually delineated labels often contain noise, such as missing labels and inaccurate boundary delineation, which can hinder networks from correctly modeling target characteristics. In this paper, we propose a deep self-cleansing segmentation framework that can preserve clean labels while cleansing noisy ones in the training phase. To achieve this, we devise a Gaussian mixture model-based label filtering module that distinguishes noisy labels from clean labels. Additionally, we develop a label cleansing module to generate pseudo low-noise labels for identified noisy samples. The preserved clean labels and pseudo-labels are then used jointly to supervise the network. Validated on a clinical liver tumor dataset and a public cardiac diagnosis dataset, our method can effectively suppress the interference from noisy labels and achieve prominent segmentation performance.
- Comment: 31 pages, 7 figures
- Published: 2024
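The Gaussian-mixture filtering step has a well-known minimal form in the noisy-label literature: fit a two-component mixture to per-sample losses and treat the low-loss mode as clean. The sketch below uses synthetic losses and scikit-learn; the paper's filtering module and its pseudo-label generation are more involved.

```python
# Minimal sketch: two-component GMM over per-sample losses flags noisy labels.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
losses = np.concatenate([
    rng.normal(0.2, 0.05, 900),        # clean samples: low training loss
    rng.normal(1.0, 0.20, 100),        # noisy labels: high training loss
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
clean_comp = int(np.argmin(gmm.means_.ravel()))
p_clean = gmm.predict_proba(losses)[:, clean_comp]

is_clean = p_clean > 0.5               # keep these labels; cleanse the rest
print(f"flagged {int((~is_clean).sum())} of {len(losses)} samples as noisy")
```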
28. Helicopter Parenting, Parental Psychological and Behavioral Control Revisited: Assessing Constructs Across the United States and South Korea
- Author: Zhang, Yue, Hwang, Woosang, Jung, Eunjoo, Kim, Seong Hee, and Sin, Hye Lim
- Published: 2020
29. Epidemiological characteristics and spatiotemporal patterns of visceral leishmaniasis in Xinjiang, China, during 2004-2021
- Author: Zhao, Jiangshan, Zhang, Yue, Zhang, Haiting, Wang, Shuo, He, Haibo, Shi, Guangzhong, Wumaier, Maimaitijiang, Hou, Yanyan, Zhang, Ling, Yin, Jianhai, Wang, Yi, and Cao, Jianping
- Published: 2024
30. An Evolutionary Task Scheduling Algorithm Using Fuzzy Fitness Evaluation Method for Communication Satellite Network
- Author: Jiang, Xuemei, Guo, Yangyang, Zhang, Yue, Song, Yanjie, Pedrycz, Witold, and Xing, Lining
- Subjects: Mathematics - Optimization and Control
- Abstract:
Communications satellite networks (CSNs), as an integral component of the next generation of communication systems, have the capability to offer services globally. Data transmission in this network primarily relies on two modes: inter-satellite communication and satellite-to-ground station communication. The latter directly impacts the successful reception of data by users. However, due to resource and task limitations, finding a satisfactory solution poses a significant challenge. The communication satellite-ground station network scheduling problem (CS-GSNSP) aims to optimize CSN effectiveness by devising a plan that maximizes link construction time while considering constraints associated with satellite operation modes. The large number of tasks and numerous constraints in the problem result in a time-consuming evaluation of fitness function values. To address this issue, we propose a fuzzy fitness evaluation method (FFEA) that employs fuzzy or real evaluation methods based on individual similarity degrees. Additionally, we introduce an evolutionary algorithm based on FFEA (FFEEA) for iteratively searching high-quality network construction schemes. In FFEEA, an adaptive crossover approach is used for efficient population search. Finally, extensive experiments are conducted to demonstrate that our proposed fuzzy fitness evaluation method and other improvement strategies significantly enhance satellite network service time.
- Comment: 14 pages
- Published: 2024
31. LAKD-Activation Mapping Distillation Based on Local Learning
- Author: Zhang, Yaoze, Zhang, Yuming, Zhao, Yu, Zhang, Yue, and Zhu, Feiyu
- Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract:
Knowledge distillation is widely applied in various fundamental vision models to enhance the performance of compact models. Existing knowledge distillation methods focus on designing different distillation targets to acquire knowledge from teacher models. However, these methods often overlook the efficient utilization of distilled information, crudely coupling different types of information, making it difficult to explain how the knowledge from the teacher network aids the student network in learning. This paper proposes a novel knowledge distillation framework, Local Attention Knowledge Distillation (LAKD), which more efficiently utilizes the distilled information from teacher networks, achieving higher interpretability and competitive performance. The framework establishes an independent interactive training mechanism through a separation-decoupling mechanism and non-directional activation mapping. LAKD decouples the teacher's features and facilitates progressive interaction training from simple to complex. Specifically, the student network is divided into local modules with independent gradients to decouple the knowledge transferred from the teacher. The non-directional activation mapping helps the student network integrate knowledge from different local modules by learning coarse-grained feature knowledge. We conducted experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets, and the results show that our LAKD method significantly outperforms existing methods, consistently achieving state-of-the-art performance across different datasets.
- Comment: 8 pages, 7 figures
- Published: 2024
32. RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation
- Author: Zhang, Xuanwang, Song, Yunze, Wang, Yidong, Tang, Shuyun, Li, Xinfeng, Zeng, Zhengran, Wu, Zhen, Ye, Wei, Xu, Wenyuan, Zhang, Yue, Dai, Xinyu, Zhang, Shikun, and Wen, Qingsong
- Subjects: Computer Science - Computation and Language
- Abstract:
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG). However, two key issues have constrained the development of RAG. First, there is a growing lack of comprehensive and fair comparisons between novel RAG algorithms. Second, open-source tools such as LlamaIndex and LangChain employ high-level abstractions, which results in a lack of transparency and limits the ability to develop novel algorithms and evaluation metrics. To close this gap, we introduce RAGLAB, a modular and research-oriented open-source library. RAGLAB reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms. Leveraging RAGLAB, we conduct a fair comparison of 6 RAG algorithms across 10 benchmarks. With RAGLAB, researchers can efficiently compare the performance of various algorithms and develop novel algorithms.
- Comment: 6 pages, 3 figures
- Published: 2024
33. Personality Alignment of Large Language Models
- Author: Zhu, Minjun, Yang, Linyi, and Zhang, Yue
- Subjects: Computer Science - Computation and Language
- Abstract:
Current methods for aligning large language models (LLMs) typically aim to reflect general human values and behaviors, but they often fail to capture the unique characteristics and preferences of individual users. To address this gap, we introduce the concept of Personality Alignment. This approach tailors LLMs' responses and decisions to match the specific preferences of individual users or closely related groups. Inspired by psychometrics, we created the Personality Alignment with Personality Inventories (PAPI) dataset, which includes data from 300,000 real subjects, each providing behavioral preferences based on the Big Five Personality Factors. This dataset allows us to quantitatively evaluate the extent to which LLMs can align with each subject's behavioral patterns. Recognizing the challenges of personality alignment, such as limited personal data, diverse preferences, and scalability requirements, we developed an activation intervention optimization method. This method enhances LLMs' ability to efficiently align with individual behavioral preferences using minimal data and computational resources. Remarkably, our method, PAS, achieves superior performance while requiring only 1/5 of the optimization time compared to DPO, offering practical value for personality alignment. Our work paves the way for future AI systems to make decisions and reason in truly personalized ways, enhancing the relevance and meaning of AI interactions for each user and advancing human-centered artificial intelligence. The code has been released at https://github.com/zhu-minjun/PAlign.
- Published: 2024
34. Narrowing the Gap between Vision and Action in Navigation
- Author: Zhang, Yue and Kordjamshidi, Parisa
- Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract:
The existing methods for Vision and Language Navigation in the Continuous Environment (VLN-CE) commonly incorporate a waypoint predictor to discretize the environment. This simplifies the navigation actions into a view selection task and improves navigation performance significantly compared to direct training using low-level actions. However, the VLN-CE agents are still far from real robots, since there are gaps between their visual perception and executed actions. First, VLN-CE agents that discretize the visual environment are primarily trained with high-level view selection, which causes them to ignore crucial spatial reasoning within the low-level action movements. Second, in these models, the existing waypoint predictors neglect object semantics and their attributes related to passability, which can be informative in indicating the feasibility of actions. To address these two issues, we introduce a low-level action decoder jointly trained with high-level action prediction, enabling the current VLN agent to learn and ground the selected visual view to the low-level controls. Moreover, we enhance the current waypoint predictor by utilizing visual representations containing rich semantic information and explicitly masking obstacles based on humans' prior knowledge about the feasibility of actions. Empirically, our agent can improve navigation performance metrics compared to the strong baselines on both high-level and low-level actions.
- Published: 2024
35. See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
- Author: Chen, Yulong, Liu, Yang, Yan, Jianhao, Bai, Xuefeng, Zhong, Ming, Yang, Yinghao, Yang, Ziyi, Zhu, Chenguang, and Zhang, Yue
- Subjects: Computer Science - Computation and Language
- Abstract:
The impressive performance of Large Language Models (LLMs) has consistently surpassed numerous human-designed benchmarks, presenting new challenges in assessing the shortcomings of LLMs. Designing tasks and finding LLMs' limitations are becoming increasingly important. In this paper, we investigate the question of whether an LLM can discover its own limitations from the errors it makes. To this end, we propose a Self-Challenge evaluation framework with a human in the loop. Starting from seed instances that GPT-4 fails to answer, we prompt GPT-4 to summarize error patterns that can be used to generate new instances and incorporate human feedback on them to iteratively refine these patterns for generating more challenging data. We end up with 8 diverse patterns, such as text manipulation and questions with assumptions. We then build a benchmark, SC-G4, consisting of 1,835 instances generated by GPT-4 using these patterns, with human-annotated gold responses. SC-G4 serves as a challenging benchmark that allows for a detailed assessment of LLMs' abilities. Our results show that only 44.96% of instances in SC-G4 can be answered correctly by GPT-4. Interestingly, our pilot study indicates that these error patterns also challenge other LLMs, such as Claude-3 and Llama-3, and cannot be fully resolved through fine-tuning. Our work takes the first step toward demonstrating that LLMs can autonomously identify their inherent flaws and provides insights for future dynamic and automatic evaluation.
- Comment: COLM 2024
- Published: 2024
36. RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
- Author: Ru, Dongyu, Qiu, Lin, Hu, Xiangkun, Zhang, Tianhang, Shi, Peng, Chang, Shuaichen, Jiayang, Cheng, Wang, Cunxiang, Sun, Shichao, Li, Huanyu, Zhang, Zizhao, Wang, Binjie, Jiang, Jiarong, He, Tong, Wang, Zhiguo, Liu, Pengfei, Zhang, Yue, and Zhang, Zheng
- Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract:
Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, the evaluation of long-form responses, and the reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the retrieval and generation modules. A meta-evaluation verifies that RAGChecker has significantly better correlations with human judgments than other evaluation metrics. Using RAGChecker, we evaluate 8 RAG systems and conduct an in-depth analysis of their performance, revealing insightful patterns and trade-offs in the design choices of RAG architectures. The metrics of RAGChecker can guide researchers and practitioners in developing more effective RAG systems. This work has been open-sourced at https://github.com/amazon-science/RAGChecker.
- Comment: Under Review. Github Repo: https://github.com/amazon-science/RAGChecker
- Published: 2024
37. LLM-based MOFs Synthesis Condition Extraction using Few-Shot Demonstrations
- Author: Shi, Lei, Liu, Zhimeng, Yang, Yi, Wu, Weize, Zhang, Yuyang, Zhang, Hongbo, Lin, Jing, Wu, Siyu, Chen, Zihan, Li, Ruiming, Wang, Nan, Liu, Zipeng, Tan, Huobin, Gao, Hongyi, Zhang, Yue, and Wang, Ge
- Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract:
The extraction of Metal-Organic Framework (MOF) synthesis conditions from literature text has been challenging but crucial for the logical design of new MOFs with desirable functionality. The recent advent of large language models (LLMs) provides a disruptively new solution to this long-standing problem, and the latest research has reported over 90% F1 in extracting correct conditions from MOFs literature. We argue in this paper that most existing synthesis extraction practices with LLMs stay with primitive zero-shot learning, which can lead to downgraded extraction and application performance due to the lack of specialized knowledge. This work pioneers and optimizes the few-shot in-context learning paradigm for LLM extraction of material synthesis conditions. First, we propose a human-AI joint data curation process to secure high-quality ground-truth demonstrations for few-shot learning. Second, we apply a BM25 algorithm based on the retrieval-augmented generation (RAG) technique to adaptively select few-shot demonstrations for each MOF's extraction. Over a dataset randomly sampled from 84,898 well-defined MOFs, the proposed few-shot method achieves much higher average F1 performance (0.93 vs. 0.81, +14.8%) than the native zero-shot LLM using the same GPT-4 model, under a fully automatic evaluation that is more objective than the previous human evaluation. The proposed method is further validated through real-world material experiments: compared with the baseline zero-shot LLM, the proposed few-shot approach increases the MOFs structural inference performance (R^2) by 29.4% on average.
- Published: 2024
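The BM25-based demonstration selection is straightforward to prototype. The sketch below uses the rank_bm25 package; the three curated demonstrations and the prompt format are invented for illustration.

```python
# Illustrative sketch: BM25-selected demonstrations for few-shot extraction.
# Requires `pip install rank-bm25`; corpus and prompt are invented examples.
from rank_bm25 import BM25Okapi

demos = [
    "MOF-5 was synthesized from zinc nitrate and H2BDC in DEF at 100 C for 24 h.",
    "ZIF-8 forms from zinc nitrate and 2-methylimidazole in methanol at room temperature.",
    "UiO-66 was obtained from ZrCl4 and terephthalic acid in DMF at 120 C.",
]
bm25 = BM25Okapi([d.lower().split() for d in demos])

paper_text = "The zirconium-based MOF was prepared from ZrCl4 in DMF at 120 C."
top = bm25.get_top_n(paper_text.lower().split(), demos, n=2)

# Assemble the few-shot prompt: most similar curated demonstrations first.
prompt = "\n\n".join(f"Example: {d}" for d in top)
prompt += f"\n\nExtract the synthesis conditions:\n{paper_text}"
print(prompt)
```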
38. DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training
- Author: Xie, Yu, Qiao, Qian, Gao, Jun, Wu, Tianxiang, Fan, Jiaqing, Zhang, Yue, Zhang, Jielei, and Sun, Huyang
- Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract:
More and more end-to-end text spotting methods based on the Transformer architecture have demonstrated superior performance. These methods utilize a bipartite graph matching algorithm to perform one-to-one optimal matching between predicted objects and actual objects. However, the instability of bipartite graph matching can lead to inconsistent optimization targets, thereby affecting the training performance of the model. Existing literature applies denoising training to solve the problem of bipartite graph matching instability in object detection tasks. Unfortunately, this denoising training method cannot be directly applied to text spotting tasks, as these tasks need to perform irregular-shape detection and text recognition tasks more complex than classification. To address this issue, we propose a novel denoising training method (DNTextSpotter) for arbitrary-shaped text spotting. Specifically, we decompose the queries of the denoising part into noised positional queries and noised content queries. We use the four Bezier control points of the Bezier center curve to generate the noised positional queries. For the noised content queries, considering that the output of the text in a fixed positional order is not conducive to aligning position with content, we employ a masked character sliding method to initialize noised content queries, thereby assisting in the alignment of text content and position. To improve the model's perception of the background, we further utilize an additional loss function for background character classification in the denoising training part. Although DNTextSpotter is conceptually simple, it outperforms the state-of-the-art methods on four benchmarks (Total-Text, SCUT-CTW1500, ICDAR15, and Inverse-Text), especially yielding an improvement of 11.3% over the best approach on the Inverse-Text dataset.
- Comment: Accepted by ACM'MM2024
- Published: 2024
39. Nonlinearity-induced dynamical self-organized twisted-bilayer lattices in Bose-Einstein condensates
- Author: Tian, Rui, Zhang, Yue, Wu, Tianhao, Liu, Min, Zhang, Yong-Chang, Li, Shuai, and Liu, Bo
- Subjects: Condensed Matter - Quantum Gases, Condensed Matter - Mesoscale and Nanoscale Physics, Nonlinear Sciences - Pattern Formation and Solitons, Quantum Physics
- Abstract:
Creating crystal bilayers twisted with respect to each other would lead to large periodic supercell structures, which can support a wide range of novel electron-correlated phenomena whose full understanding is still under debate. Here, we propose a new scheme to realize a nonlinearity-induced dynamical self-organized twisted-bilayer lattice in an atomic Bose-Einstein condensate (BEC). The key idea is to utilize the nonlinear effect from the intrinsic atomic interactions to couple different layers and induce a dynamical self-organized supercell structure, dramatically distinct from the conventional wisdom of achieving static twisted-bilayer lattices. To illustrate this, we study the dynamics of a two-component BEC and show that the nonlinear interaction effect that naturally emerges in the Gross-Pitaevskii equation of interacting bosonic ultracold atoms can dynamically induce both periodic (commensurable) and aperiodic (incommensurable) moiré structures. One of the interesting moiré phenomena, i.e., flat-band physics, is shown through investigating the dynamics of the wave packet of the BEC. Our proposal can be implemented using available state-of-the-art experimental techniques and reveals a profound connection between nonlinearity and twistronics in cold-atom quantum simulators.
- Comment: 6 pages, 5 figures
- Published
- 2024
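For reference, the nonlinearity invoked in the abstract above enters through the coupled Gross-Pitaevskii equations for a two-component BEC, written here in their generic textbook form (the paper's specific lattice potentials and parameter values are not reproduced):

i\hbar\,\partial_t \psi_j = \Big[-\frac{\hbar^2}{2m}\nabla^2 + V_j(\mathbf{r}) + \sum_{k=1}^{2} g_{jk}\,|\psi_k|^2\Big]\psi_j, \qquad j = 1,2,

where the interaction strengths $g_{jk}$ are fixed by the s-wave scattering lengths. It is the cross terms $g_{12}|\psi_2|^2$ and $g_{21}|\psi_1|^2$ that couple the two components (the "layers") and drive the self-organized moir\'{e} structure.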
40. Anomalous symmetry protected blockade of skin effect in one-dimensional non-Hermitian lattice systems
- Author
-
Li, Shuai, Liu, Min, Zhang, Yue, Tian, Rui, Arzamasovs, Maksims, and Liu, Bo
- Subjects
Quantum Physics ,Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Quantum Gases ,Condensed Matter - Superconductivity - Abstract
The non-Hermitian skin effect (NHSE), an anomalous localization behavior of the bulk states, is an inherently non-Hermitian phenomenon with no counterpart in Hermitian systems. However, the fragility of the NHSE, such as its sensitivity to boundary conditions, has recently been revealed, stimulating numerous studies on its fate. Here we present a theorem showing that a combined spatial reflection symmetry can serve as a criterion in one-dimensional non-Hermitian systems for determining whether the NHSE can exist. In contrast to previous studies, our criterion relies only on analyzing the symmetry of the system, free of other requirements such as knowledge of the energy spectrum. Furthermore, taking the non-Hermitian Kitaev chain as an example, we verify our theorem both through a mathematical proof via non-Bloch band theory and through exact diagonalization numerics. Our results reveal a profound connection between symmetry and the fate of the NHSE., Comment: 7 pages, 2 figures, including Supplementary Material
- Published
- 2024
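The paper above supplies a symmetry criterion for the existence of the NHSE; the effect itself is easy to reproduce numerically. The sketch below diagonalizes the Hatano-Nelson chain, a standard textbook model (not the paper's Kitaev-chain example), under open boundaries and checks that the eigenstates pile up at one edge.

import numpy as np

# Hatano-Nelson chain: asymmetric left/right hoppings under open boundaries.
N, tR, tL = 40, 1.0, 0.5            # tR != tL makes the chain non-Hermitian
H = np.diag(np.full(N - 1, tR), k=-1) + np.diag(np.full(N - 1, tL), k=1)

vals, vecs = np.linalg.eig(H)
weight = np.abs(vecs) ** 2          # |psi_n|^2 for each right eigenvector
center = (np.arange(N)[:, None] * weight).sum(0) / weight.sum(0)

# With tR > tL, every eigenstate localizes near site N-1: the skin effect.
print(f"mean center of mass: {center.mean():.1f} (chain midpoint: {(N - 1) / 2})")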
41. Structure-Aware Simplification for Hypergraph Visualization
- Author
-
Oliver, Peter, Zhang, Eugene, and Zhang, Yue
- Subjects
Computer Science - Graphics - Abstract
Hypergraphs provide a natural way to represent polyadic relationships in network data. For large hypergraphs, it is often difficult to visually detect structures within the data. Recently, a scalable polygon-based visualization approach was developed that allows hypergraphs with thousands of hyperedges to be simplified and examined at different levels of detail. However, this approach is not guaranteed to eliminate all of the visual clutter caused by unavoidable overlaps. Furthermore, meaningful structures can be lost at simplified scales, making their interpretation unreliable. In this paper, we define hypergraph structures using the bipartite graph representation, allowing us to decompose the hypergraph into a union of structures including topological blocks, bridges, and branches, and to identify exactly where unavoidable overlaps must occur. We also introduce a set of topology-preserving and topology-altering atomic operations, enabling the preservation of important structures while reducing unavoidable overlaps to improve visual clarity and interpretability at simplified scales. We demonstrate our approach in several real-world applications., Comment: 15 pages, 14 figures, to be published in VIS 2024
- Published
- 2024
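The decomposition described above starts from the bipartite graph representation of a hypergraph. A minimal sketch of constructing that representation with networkx follows; the toy hypergraph is illustrative, and the paper's block/bridge/branch decomposition itself is not reproduced.

import networkx as nx

# A hypergraph as a dict: hyperedge id -> set of member vertices.
hyperedges = {"e1": {"a", "b", "c"}, "e2": {"b", "c", "d"}, "e3": {"d"}}

# Bipartite representation: one node class for vertices, one for hyperedges.
B = nx.Graph()
B.add_nodes_from(hyperedges, bipartite="hyperedge")
for e, verts in hyperedges.items():
    B.add_nodes_from(verts, bipartite="vertex")
    B.add_edges_from((e, v) for v in verts)

# Graph-theoretic bridges of B (edges whose removal disconnects it) are one
# of the structures the decomposition keys on.
print(sorted(nx.bridges(B)))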
42. Topology Optimization of Random Memristors for Input-Aware Dynamic SNN
- Author
-
Wang, Bo, Wang, Shaocong, Lin, Ning, Li, Yi, Yu, Yifei, Zhang, Yue, Yang, Jichang, Wu, Xiaoshan, He, Yangu, Wang, Songqi, Chen, Rui, Li, Guoqi, Qi, Xiaojuan, Wang, Zhongrui, and Shang, Dashan
- Subjects
Computer Science - Emerging Technologies ,Computer Science - Artificial Intelligence ,Computer Science - Neural and Evolutionary Computing - Abstract
There has been unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot match the human brain in energy efficiency or in streamlined adaptability to inputs of varying difficulty, owing to differences in signal representation, optimization, run-time reconfigurability, and hardware architecture. To address these fundamental challenges, we introduce pruning optimization for input-aware dynamic memristive spiking neural networks (PRIME). In terms of signal representation, PRIME employs leaky integrate-and-fire neurons to emulate the brain's inherent spiking mechanism. Drawing inspiration from the brain's structural plasticity, PRIME optimizes the topology of a random memristive spiking neural network without expensive fine-tuning of memristor conductances. For run-time reconfigurability, inspired by the brain's dynamic adjustment of computational depth, PRIME employs an input-aware dynamic early-stop policy to minimize latency during inference, thereby boosting energy efficiency without compromising performance. Architecturally, PRIME leverages memristive in-memory computing, mirroring the brain and mitigating the von Neumann bottleneck. We validated our system using a 40 nm 256 Kb memristor-based in-memory computing macro on neuromorphic image classification and image inpainting. Our results demonstrate classification accuracy and Inception Score comparable to the software baseline, while achieving up to 62.50-fold improvement in energy efficiency and up to 77.0% savings in computational load. The system is also robust against the stochastic synaptic noise of analogue memristors. Our software-hardware co-designed model paves the way for future brain-inspired neuromorphic computing with brain-like energy efficiency and adaptivity., Comment: 15 pages, 5 figures
- Published
- 2024
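PRIME's signal representation rests on leaky integrate-and-fire (LIF) neurons. For readers unfamiliar with them, a generic discrete-time LIF update in Python/NumPy is sketched below; the leak factor and threshold are illustrative assumptions, not the paper's values.

import numpy as np

def lif_step(v, x, beta=0.9, v_th=1.0):
    # Leak the membrane potential, integrate the input current, emit a
    # spike wherever the threshold is crossed, then hard-reset those neurons.
    v = beta * v + x
    spikes = (v >= v_th).astype(float)
    return v * (1.0 - spikes), spikes

rng = np.random.default_rng(0)
v = np.zeros(4)                      # four toy neurons
for t in range(10):
    v, s = lif_step(v, rng.uniform(0.0, 0.5, size=4))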
43. Non-chiral non-Bloch invariants and topological phase diagram in non-unitary quantum dynamics without chiral symmetry
- Author
-
Zhang, Yue, Li, Shuai, Xu, Yingchao, Tian, Rui, Zhang, Miao, Li, Hongrong, Gao, Hong, Zubairy, M. Suhail, Li, Fuli, and Liu, Bo
- Subjects
Quantum Physics ,Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Quantum Gases ,Physics - Optics - Abstract
Non-Bloch topology leads to the emergence of various counter-intuitive phenomena in non-Hermitian systems under the open boundary condition (OBC), with no counterpart in Hermitian systems. However, for non-Hermitian systems without chiral symmetry, which are ubiquitous in nature, exploring their non-Bloch topology has so far eluded experimental effort. Here, by introducing the concept of non-chiral non-Bloch invariants, we theoretically predict and experimentally identify the non-Bloch topological phase diagram of a one-dimensional (1D) non-Hermitian system without chiral symmetry in discrete-time non-unitary quantum walks of single photons. Interestingly, we find that such topological invariants not only distinguish topologically distinct gapped phases but also faithfully capture the corresponding gap closing in the open-boundary spectrum at the phase boundary. Different topological regions are experimentally identified by measuring the characteristic discontinuities of the higher moments of the walker's displacement, which match excellently with our defined non-Bloch invariants. Our work provides a useful platform to study the interplay among topology, symmetry, and non-Hermiticity., Comment: 10 pages, 5 figures, including Supplementary Material
- Published
- 2024
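The experimental observable in the paper above is the set of higher moments of the walker's displacement, whose discontinuities mark the phase boundaries. Computing such moments from a measured position distribution is straightforward; the sketch below uses synthetic data.

import numpy as np

def displacement_moment(x, p, n):
    # n-th moment of the displacement, <x^n> = sum_x x^n P(x).
    return np.sum((x ** n) * p)

x = np.arange(-5, 6).astype(float)              # walker positions
p = np.exp(-0.5 * (x - 1.2) ** 2)               # toy measured distribution
p /= p.sum()
print([displacement_moment(x, p, n) for n in (1, 2, 3)])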
44. EUFormer: Learning Driven 3D Spine Deformity Assessment with Orthogonal Optical Images
- Author
-
Meng, Nan, Cheung, Jason P. Y., Huang, Tao, Zhao, Moxin, Zhang, Yue, Yu, Chenxi, Shi, Chang, and Zhang, Teng
- Subjects
Computer Science - Graphics - Abstract
In clinical settings, the screening, diagnosis, and monitoring of adolescent idiopathic scoliosis (AIS) typically involve physical or radiographic examinations. However, physical examinations are subjective, while radiographic examinations expose patients to harmful radiation. Consequently, we propose a pipeline that can accurately determine scoliosis severity. This pipeline takes posteroanterior (PA) and lateral (LAT) RGB images as input to generate spine curve maps, which are then used to reconstruct the three-dimensional (3D) spine curve for AIS severity grading. To generate the 2D spine curves accurately and efficiently, we further propose an Efficient U-shape transFormer (EUFormer) as the generator. It efficiently utilizes learned features across channels, producing consecutive spine curves from both PA and LAT views. Experimental results demonstrate the superior performance of EUFormer in spine curve generation compared with other classical U-shaped models. This finding shows that the proposed method for grading AIS severity, based on a 3D spine curve, is more accurate than grading based on a 2D spine curve.
- Published
- 2024
45. Disorder-dependent superconducting pairing symmetry in doped graphene
- Author
-
Guo, Kaiyi, Zhang, Yue, Liang, Ying, and Ma, Tianxing
- Subjects
Condensed Matter - Superconductivity - Abstract
Disorder and doping have profound effects on the intrinsic physical mechanisms of superconductivity. In this paper, we employ the determinant quantum Monte Carlo method to investigate the symmetry-allowed superconducting orders on the two-dimensional honeycomb lattice within the Hubbard model, taking doped graphene as the platform and focusing on their response to bond disorder. Specifically, we calculate the pairing susceptibility and effective pairing interactions for the $d+id$-wave and extended $s$-wave pairings at different electron densities and disorder strengths. Our calculations show that at high electron densities, increased disorder strength may lead to a transition from $d+id$-wave dominance to extended $s$-wave dominance. However, at lower electron densities, neither superconducting pairing survives at larger disorder strengths. Our calculations may contribute to a further understanding of superconducting behavior in doped materials affected by disorder., Comment: 9 pages and 11 figures. Published version
- Published
- 2024
46. Dynamic neural network with memristive CIM and CAM for 2D and 3D vision
- Author
-
Zhang, Yue, Zhang, Woyu, Wang, Shaocong, Lin, Ning, Yu, Yifei, He, Yangu, Wang, Bo, Jiang, Hao, Lin, Peng, Xu, Xiaoxin, Qi, Xiaojuan, Wang, Zhongrui, Zhang, Xumeng, Shang, Dashan, Liu, Qi, Cheng, Kwang-Ting, and Liu, Ming
- Subjects
Computer Science - Hardware Architecture ,Computer Science - Artificial Intelligence ,Computer Science - Emerging Technologies ,Computer Science - Neural and Evolutionary Computing - Abstract
The brain is dynamic, associative, and efficient. It reconfigures itself by associating inputs with past experiences, fusing memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design: a semantic memory-based dynamic neural network (DNN) using memristors. The network associates incoming data with past experience stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. We validate our co-design, using a 40 nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets; it not only achieves accuracy on par with software but also reduces the computational budget by 48.1% and 15.9%, and energy consumption by 77.6% and 93.3%, respectively., Comment: In press
- Published
- 2024
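In software terms, the semantic-memory association described above is a nearest-neighbor lookup of an incoming feature vector against the stored semantic vectors; the hardware realizes this with ternary memristor-based CAM circuits. A cosine-similarity stand-in, with illustrative sizes, conveys the idea:

import numpy as np

def cam_lookup(query, memory):
    # Software stand-in for a CAM search: index of the stored semantic
    # vector most similar (by cosine similarity) to the query.
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return int(np.argmax(m @ q))

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 64))               # 8 stored semantic vectors
query = memory[3] + 0.1 * rng.normal(size=64)   # noisy view of experience 3
assert cam_lookup(query, memory) == 3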
47. Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
- Author
-
Zhang, Yue, Ma, Ziqiao, Li, Jialu, Qiao, Yanyuan, Wang, Zun, Chai, Joyce, Wu, Qi, Bansal, Mohit, and Kordjamshidi, Parisa
- Subjects
Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Vision-and-Language Navigation (VLN) has gained increasing attention in recent years, and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods of VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes current methods and future opportunities for leveraging foundation models to address VLN challenges. We hope our in-depth discussion provides valuable resources and insights: on one hand, to chart the progress and explore opportunities and potential roles for foundation models in this field, and on the other, to organize the different challenges and solutions in VLN for foundation-model researchers., Comment: Authors contributed equally to this work, and supervisors contributed equal advising to this work
- Published
- 2024
48. Text2TimeSeries: Enhancing Financial Forecasting through Time Series Prediction Updates with Event-Driven Insights from Large Language Models
- Author
-
Kurisinkel, Litton Jose, Mishra, Pruthwik, and Zhang, Yue
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Time series models, typically trained on numerical data, are designed to forecast future values, often relying on weighted averaging techniques over time intervals. However, real-world time series data is seldom isolated and is frequently influenced by non-numeric factors. For instance, stock price fluctuations are impacted by daily random events in the broader world, with each event exerting a unique influence on price signals. Previously, forecasting in financial markets has been approached in two main ways: as a time-series problem over price sequences, or as a sentiment analysis task that determines whether news events will have a positive or negative impact on stock prices, often categorizing them into discrete labels. Recognizing the need for a more comprehensive approach to accurately model time series prediction, we propose a collaborative modeling framework that incorporates textual information about relevant events into the predictions. Specifically, we leverage the intuitions of large language models about future changes to update real-valued time series predictions. We evaluate the effectiveness of our approach on financial market data., Comment: 21 pages, 12 figures
- Published
- 2024
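The core move of the framework above, nudging a real-valued forecast using an LLM's directional judgment about an event, can be caricatured as follows. The interface and update rule here are entirely hypothetical illustrations, not the paper's actual formulation.

def update_forecast(prediction, direction, confidence, strength=0.05):
    # direction in {-1, 0, +1} is the LLM's judged impact of the event;
    # confidence in [0, 1] scales how far the numeric forecast is nudged.
    # Both the multiplicative form and the strength are assumptions.
    return prediction * (1.0 + strength * direction * confidence)

base = 102.4                                     # numeric model's price forecast
adjusted = update_forecast(base, direction=+1, confidence=0.8)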
49. GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels
- Author
-
Yan, Jianhao, Yan, Pingchuan, Chen, Yulong, Li, Judy, Zhu, Xianchao, and Zhang, Yue
- Subjects
Computer Science - Computation and Language - Abstract
This study comprehensively evaluates the translation quality of Large Language Models (LLMs), specifically GPT-4, against human translators of varying expertise levels across multiple language pairs and domains. Through carefully designed annotation rounds, we find that GPT-4 performs comparably to junior translators in terms of total errors made, but lags behind medium and senior translators. We also observe imbalanced performance across languages and domains, with GPT-4's translation capability gradually weakening from resource-rich to resource-poor directions. In addition, we qualitatively study the translations produced by GPT-4 and by human translators, finding that GPT-4 tends toward overly literal translations, whereas human translators sometimes overthink the background information. To our knowledge, this study is the first to evaluate LLMs against human translators and analyze the systematic differences between their outputs, providing valuable insights into the current state of LLM-based translation and its potential limitations.
- Published
- 2024
50. Improved Long-Term Prediction of Chaos Using Reservoir Computing Based on Stochastic Spin-Orbit Torque Devices
- Author
-
Wang, Cen, Lei, Xinyao, Cai, Kaiming, Yang, Xiaofei, and Zhang, Yue
- Subjects
Condensed Matter - Materials Science ,Nonlinear Sciences - Chaotic Dynamics - Abstract
Predicting chaotic systems is crucial for understanding complex behaviors, yet challenging due to their sensitivity to initial conditions and inherent unpredictability. Reservoir Computing (RC) is well suited to long-term prediction of chaotic systems because of its ability to handle complex dynamical systems. Spin-Orbit Torque (SOT) devices in spintronics, with their nonlinear and probabilistic operation, can enhance performance in these tasks. This study proposes an RC system utilizing SOT devices for predicting chaotic dynamics. By simulating the reservoir of an RC network with SOT devices that exhibit nonlinear, randomly distributed resistance changes, we enhance the robustness of the model's predictive capability. The RC network predicted the behaviors of the Mackey-Glass and Lorenz chaotic systems, demonstrating that stochastic SOT devices significantly improve long-term prediction accuracy., Comment: 14 pages, 3 figures
- Published
- 2024
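A common software analogue of such an RC system is an echo state network with a ridge-regression readout. The sketch below trains one on a toy signal; the paper's reservoir is instead realized with stochastic SOT devices, and its benchmarks are the Mackey-Glass and Lorenz systems, which are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
T, N = 2000, 200                                  # time steps, reservoir size
u = np.sin(0.3 * np.arange(T + 1)) * np.cos(0.05 * np.arange(T + 1))  # toy signal

W_in = rng.uniform(-0.5, 0.5, size=N)             # input weights
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius to 0.9

X, x = np.zeros((T, N)), np.zeros(N)
for t in range(T):                                # drive the reservoir
    x = np.tanh(W @ x + W_in * u[t])
    X[t] = x

# Ridge-regression readout trained to predict the next value of the signal.
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ u[1:T + 1])
print("train MSE:", float(np.mean((X @ W_out - u[1:T + 1]) ** 2)))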