7,713 results on '"Ye, Wei"'
Search Results
2. A Survey on Evaluating Large Language Models in Code Generation Tasks
- Author
-
Chen, Liguo, Guo, Qi, Jia, Hongrui, Zeng, Zhengran, Wang, Xin, Xu, Yijiang, Wu, Jian, Wang, Yidong, Gao, Qing, Wang, Jindong, Ye, Wei, and Zhang, Shikun
- Subjects
Computer Science - Software Engineering
- Abstract
This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation. The paper begins by reviewing the historical development of LLMs and their applications in code generation. Next, it details various methods and metrics for assessing the code generation capabilities of LLMs, including code correctness, efficiency, readability, and evaluation methods based on expert review and user experience. The paper also evaluates the widely used benchmark datasets, identifying their limitations and proposing directions for future improvements. Specifically, the paper analyzes the performance of code generation models across different tasks by combining multiple evaluation metrics, such as code compilation/interpretation success rates, unit test pass rates, and performance and efficiency metrics, to comprehensively assess the practical application of LLMs in code generation. Finally, the paper discusses the challenges faced in evaluating LLMs in code generation, particularly how to ensure the comprehensiveness and accuracy of evaluation methods and how to adapt to the evolving practices of software development. These analyses and discussions provide valuable insights for further optimizing and improving the application of LLMs in code generation tasks.
- Published
- 2024
3. MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
- Author
-
Jiang, Chaoya, Jia, Hongrui, Xu, Haiyang, Ye, Wei, Dong, Mengfan, Yan, Ming, Zhang, Ji, Huang, Fei, and Zhang, Shikun
- Subjects
Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
- Abstract
This paper presents MaVEn, an innovative Multi-granularity Visual Encoding framework designed to enhance the capabilities of Multimodal Large Language Models (MLLMs) in multi-image reasoning. Current MLLMs primarily focus on single-image visual understanding, limiting their ability to interpret and integrate information across multiple images. MaVEn addresses this limitation by combining discrete visual symbol sequences, which abstract coarse-grained semantic concepts, with traditional continuous representation sequences that model fine-grained features. This dual approach bridges the semantic gap between visual and textual data, thereby improving the model's ability to process and interpret information from multiple images effectively. Additionally, we design a dynamic reduction mechanism for long-sequence continuous features to enhance multi-image processing efficiency. Experimental results demonstrate that MaVEn significantly enhances MLLMs' understanding in complex multi-image scenarios, while also improving performance in single-image contexts.
- Published
- 2024
4. RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation
- Author
-
Zhang, Xuanwang, Song, Yunze, Wang, Yidong, Tang, Shuyun, Li, Xinfeng, Zeng, Zhengran, Wu, Zhen, Ye, Wei, Xu, Wenyuan, Zhang, Yue, Dai, Xinyu, Zhang, Shikun, and Wen, Qingsong
- Subjects
Computer Science - Computation and Language
- Abstract
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG). However, two key issues constrain the development of RAG. First, there is a growing lack of comprehensive and fair comparisons between novel RAG algorithms. Second, open-source tools such as LlamaIndex and LangChain employ high-level abstractions, which results in a lack of transparency and limits the ability to develop novel algorithms and evaluation metrics. To close this gap, we introduce RAGLAB, a modular and research-oriented open-source library. RAGLAB reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms. Leveraging RAGLAB, we conduct a fair comparison of 6 RAG algorithms across 10 benchmarks. With RAGLAB, researchers can efficiently compare the performance of various algorithms and develop novel algorithms. Comment: 6 pages, 3 figures
- Published
- 2024
5. OC3D: Weakly Supervised Outdoor 3D Object Detection with Only Coarse Click Annotation
- Author
-
Xia, Qiming, Lin, Hongwei, Ye, Wei, Wu, Hai, Luo, Yadan, Zhao, Shijia, Li, Xin, and Wen, Chenglu
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
LiDAR-based outdoor 3D object detection has received widespread attention. However, training 3D detectors from the LiDAR point cloud typically relies on expensive bounding box annotations. This paper presents OC3D, an innovative weakly supervised method requiring only coarse clicks on the bird's eye view of the 3D point cloud. A key challenge here is the absence of complete geometric descriptions of the target objects from such simple click annotations. To address this problem, our proposed OC3D adopts a two-stage strategy. In the first stage, we initially design a novel dynamic and static classification strategy and then propose the Click2Box and Click2Mask modules to generate box-level and mask-level pseudo-labels for static and dynamic instances, respectively. In the second stage, we design a Mask2Box module, leveraging the learning capabilities of neural networks to update mask-level pseudo-labels, which contain less information, to box-level pseudo-labels. Experimental results on the widely used KITTI and nuScenes datasets demonstrate that our OC3D with only coarse clicks achieves state-of-the-art performance compared to weakly-supervised 3D detection methods. Combining OC3D with a missing click mining strategy, we propose an OC3D++ pipeline, which requires only 0.2% annotation cost in the KITTI dataset to achieve performance comparable to fully supervised methods. The code will be made publicly available.
- Published
- 2024
6. Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction
- Author
-
Yu, Dingyao, An, Yang, Ye, Wei, Xiao, Xiongfeng, Mao, Shaoguang, Ge, Tao, and Zhang, Shikun
- Subjects
Computer Science - Computation and Language
- Abstract
Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora, due to the labor-intensive labeling of spelling errors in real-life human writing or typing scenarios. Two data augmentation methods are widely adopted: (1) Random Replacement with the guidance of confusion sets and (2) OCR/ASR-based Generation that simulates character misuse. However, both methods inevitably introduce noisy data (e.g., false spelling errors), potentially leading to over-correction. By carefully analyzing the two types of corpora, we find that though the latter achieves more robust generalization performance, the former yields better-calibrated CSC models. We then provide a theoretical analysis of this empirical observation, based on which a corpus refining strategy is proposed. Specifically, OCR/ASR-based data samples are fed into a well-calibrated CSC model trained on random replacement-based corpora and then filtered based on prediction confidence. By training a simple BERT-based model on the refined OCR/ASR-based corpus, we achieve state-of-the-art performance on three widely used benchmarks, while significantly alleviating over-correction (e.g., lowering false positive predictions).
- Published
- 2024
7. Enhancing In-Context Learning via Implicit Demonstration Augmentation
- Author
-
Zhou, Xiaoling, Ye, Wei, Wang, Yidong, Jiang, Chaoya, Lee, Zhemg, Xie, Rui, and Zhang, Shikun
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, I.2.7
- Abstract
The emergence of in-context learning (ICL) enables large pre-trained language models (PLMs) to make predictions for unseen inputs without updating parameters. Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations, commonly leading to suboptimal and unstable performance. In this paper, we tackle this challenge for the first time from the perspective of demonstration augmentation. Specifically, we start with enriching representations of demonstrations by leveraging their deep feature distribution. We then theoretically reveal that when the number of augmented copies approaches infinity, the augmentation is approximately equal to a novel logit calibration mechanism integrated with specific statistical properties. This insight results in a simple yet highly efficient method that significantly improves the average and worst-case accuracy across diverse PLMs and tasks. Moreover, our method effectively reduces performance variance among varying demonstrations, permutations, and templates, and displays the capability to address imbalanced class distributions. Comment: Accepted by ACL 2024 Main; 19 pages, 10 figures
- Published
- 2024
8. Decoupling Forgery Semantics for Generalizable Deepfake Detection
- Author
-
Ye, Wei, He, Xinan, and Ding, Feng
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this paper, we propose a novel method for detecting DeepFakes, enhancing the generalization of detection through semantic decoupling. There are now multiple DeepFake forgery technologies that not only possess unique forgery semantics but may also share common forgery semantics. The unique forgery semantics and irrelevant content semantics may promote over-fitting and hamper generalization for DeepFake detectors. For our proposed method, after decoupling, the common forgery semantics could be extracted from DeepFakes, and subsequently be employed for developing the generalizability of DeepFake detectors. Also, to pursue additional generalizability, we designed an adaptive high-pass module and a two-stage training strategy to improve the independence of decoupled semantics. Evaluation on FF++, Celeb-DF, DFD, and DFDC datasets showcases our method's excellent detection and generalization performance. Code is available at: https://github.com/leaffeall/DFS-GDD. Comment: Accepted by BMVC 2024
- Published
- 2024
9. Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
- Author
-
Qiao, Zile, Ye, Wei, Jiang, Yong, Mo, Tong, Xie, Pengjun, Li, Weiping, Huang, Fei, and Zhang, Shikun
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Retrieval-augmented language models (RALMs) have recently shown great potential in mitigating the limitations of implicit knowledge in LLMs, such as untimely updating of the latest expertise and unreliable retention of long-tail knowledge. However, since neither the external knowledge base nor the retriever can guarantee reliability, the retrieved knowledge may be unhelpful or even misleading for LLM generation. In this paper, we introduce Supportiveness-based Knowledge Rewriting (SKR), a robust and pluggable knowledge rewriter inherently optimized for LLM generation. Specifically, we introduce the novel concept of "supportiveness"--which represents how effectively a knowledge piece facilitates downstream tasks--by considering the perplexity impact of augmented knowledge on the response text of a white-box LLM. Based on knowledge supportiveness, we first design a training data curation strategy for our rewriter model, effectively identifying and filtering out poor or irrelevant rewrites (e.g., with low supportiveness scores) to improve data efficacy. We then introduce the direct preference optimization (DPO) algorithm to align the generated rewrites to optimal supportiveness, guiding the rewriter model to summarize augmented content that better improves the final response. Comprehensive evaluations across six popular knowledge-intensive tasks and four LLMs have demonstrated the effectiveness and superiority of SKR. With only 7B parameters, SKR has shown better knowledge rewriting capability than GPT-4, the current state-of-the-art general-purpose LLM.
- Published
- 2024
10. AutoSurvey: Large Language Models Can Automatically Write Surveys
- Author
-
Wang, Yidong, Guo, Qi, Yao, Wenjin, Zhang, Hongbo, Zhang, Xin, Wu, Zhen, Zhang, Meishan, Dai, Xinyu, Zhang, Min, Wen, Qingsong, Ye, Wei, Zhang, Shikun, and Zhang, Yue
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in automating this process, challenges such as context window limitations, parametric knowledge constraints, and the lack of evaluation benchmarks remain. AutoSurvey addresses these challenges through a systematic approach that involves initial retrieval and outline generation, subsection drafting by specialized LLMs, integration and refinement, and rigorous evaluation and iteration. Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness. We open our resources at https://github.com/AutoSurveys/AutoSurvey.
- Published
- 2024
11. A3: Ambiguous Aberrations Captured via Astray-Learning for Facial Forgery Semantic Sublimation
- Author
-
He, Xinan, Zhou, Yue, Ye, Wei, and Ding, Feng
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Prior DeepFake detection methods have faced a core challenge in preserving generalizability and fairness effectively. In this paper, we propose an approach akin to decoupling and sublimating forgery semantics, named astray-learning. The primary objective of the proposed method is to blend hybrid forgery semantics derived from high-frequency components into authentic imagery, named aberrations. The ambiguity of aberrations is beneficial to reducing the model's bias towards specific semantics. Consequently, it can enhance the model's generalization ability and maintain the detection fairness. All code for astray-learning is publicly available at https://anonymous.4open.science/r/astray-learning-C49B. Comment: 19 pages, 9 figures
- Published
- 2024
12. Deep Hierarchical Graph Alignment Kernels
- Author
-
Tang, Shuhao, Tian, Hao, Cao, Xiaofeng, and Ye, Wei
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Typical R-convolution graph kernels invoke the kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, overlooking implicit similarities and topological position information between those substructures limits their performance. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relational substructures are hierarchically aligned to cluster distributions in their deep embedding space. The substructures belonging to the same cluster are assigned the same feature map in the Reproducing Kernel Hilbert Space (RKHS), where graph feature maps are derived by kernel mean embedding. Theoretical analysis guarantees that DHGAK is positive semi-definite and has linear separability in the RKHS. Comparison with state-of-the-art graph kernels on various benchmark datasets demonstrates the effectiveness and efficiency of DHGAK. The code is available on GitHub (https://github.com/EWesternRa/DHGAK).
- Published
- 2024
13. Generative manufacturing systems using diffusion models and ChatGPT
- Author
-
Li, Xingyu, Tao, Fei, Ye, Wei, Nassehi, Aydin, and Sutherland, John W.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Electrical Engineering and Systems Science - Systems and Control
- Abstract
In this study, we introduce Generative Manufacturing Systems (GMS) as a novel approach to effectively manage and coordinate autonomous manufacturing assets, thereby enhancing their responsiveness and flexibility to address a wide array of production objectives and human preferences. Deviating from traditional explicit modeling, GMS employs generative AI, including diffusion models and ChatGPT, for implicit learning from envisioned futures, marking a shift from model-optimum to training-sampling decision-making. Through the integration of generative AI, GMS enables complex decision-making through interactive dialogue with humans, allowing manufacturing assets to generate multiple high-quality global decisions that can be iteratively refined based on human feedback. Empirical findings showcase GMS's substantial improvement in system resilience and responsiveness to uncertainties, with decision times reduced from seconds to milliseconds. The study underscores the inherent creativity and diversity in the generated solutions, facilitating human-centric decision-making through seamless and continuous human-machine interactions.
- Published
- 2024
14. Boosting Model Resilience via Implicit Adversarial Data Augmentation
- Author
-
Zhou, Xiaoling, Ye, Wei, Lee, Zhemg, Xie, Rui, and Zhang, Shikun
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, I.2.6, I.4.3
- Abstract
Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To address this, we propose to augment the deep features of samples by incorporating their adversarial and anti-adversarial perturbation distributions, enabling adaptive adjustment in the learning difficulty tailored to each sample's specific characteristics. We then theoretically reveal that our augmentation process approximates the optimization of a surrogate loss function as the number of augmented copies increases indefinitely. This insight leads us to develop a meta-learning-based framework for optimizing classifiers with this novel loss, introducing the effects of augmentation while bypassing the explicit augmentation process. We conduct extensive experiments across four common biased learning scenarios: long-tail learning, generalized long-tail learning, noisy label learning, and subpopulation shift learning. The empirical results demonstrate that our method consistently achieves state-of-the-art performance, highlighting its broad adaptability. Comment: 9 pages, 6 figures, accepted by IJCAI 2024
- Published
- 2024
15. Noiseless linear amplification-based quantum Ziv-Zakai bound for phase estimation and its Heisenberg error limits in noisy scenarios
- Author
-
Ye, Wei, Xiao, Peng, Xu, Xiaofan, Zhu, Xiang, Yan, Yunbin, Wang, Lu, Ren, Jie, Zhu, Yuxuan, Xia, Ying, Rao, Xuan, and Chang, Shoukang
- Subjects
Quantum Physics
- Abstract
In this work, we address the central problem of how to effectively find the available precision limit of unknown parameters. In the framework of the quantum Ziv-Zakai bound (QZZB), we apply noiseless linear amplification (NLA) techniques to an initial coherent state (CS) as the probe state, and focus on whether the phase estimation performance is improved significantly in noisy scenarios, involving the photon-loss and phase-diffusion cases. More importantly, we also obtain two kinds of Heisenberg error limits of the QZZB with the NLA-based CS in these noisy scenarios, making comparisons with both the Margolus-Levitin (ML) type bound and the Mandelstam-Tamm (MT) type bound. Our analytical results show that in cases of photon loss and phase diffusion, the phase estimation performance of the QZZB can be improved remarkably by increasing the NLA gain factor. In particular, the improvement is more pronounced with severe photon losses. Furthermore, with minimal photon losses, our Heisenberg error limit shows better compactness than the ML-type and MT-type bounds. Our findings will provide useful guidance for accomplishing more complex quantum information processing tasks. Comment: 10 pages, 9 figures
- Published
- 2024
16. FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
- Author
-
Yu, Zhuohao, Gao, Chang, Yao, Wenjin, Wang, Yidong, Zeng, Zhengran, Ye, Wei, Wang, Jindong, Zhang, Yue, and Zhang, Shikun
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
The rapid development of large language model (LLM) evaluation methodologies and datasets has led to a profound challenge: integrating state-of-the-art evaluation techniques cost-effectively while ensuring reliability, reproducibility, and efficiency. Currently, there is a notable absence of a unified and adaptable framework that seamlessly integrates various evaluation approaches. Moreover, the reliability of evaluation findings is often questionable due to potential data contamination, with the evaluation efficiency commonly overlooked when facing the substantial costs associated with LLM inference. In response to these challenges, we introduce FreeEval, a modular and scalable framework crafted to enable trustworthy and efficient automatic evaluations of LLMs. Firstly, FreeEval's unified abstractions simplify the integration and improve the transparency of diverse evaluation methodologies, encompassing dynamic evaluations that demand sophisticated LLM interactions. Secondly, the framework integrates meta-evaluation techniques like human evaluation and data contamination detection, which, along with dynamic evaluation modules in the platform, enhance the fairness of the evaluation outcomes. Lastly, FreeEval is designed with a high-performance infrastructure, including distributed computation and caching strategies, enabling extensive evaluations across multi-node, multi-GPU clusters for open-source and proprietary LLMs. Comment: We open-source all our code at: https://github.com/WisdomShell/FreeEval
- Published
- 2024
17. CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios
- Author
-
Zeng, Zhengran, Wang, Yidong, Xie, Rui, Ye, Wei, and Zhang, Shikun
- Subjects
Computer Science - Software Engineering, 68N30 (Primary) 68T20 (Secondary), D.2.0
- Abstract
In the evolving landscape of large language models (LLMs) tailored for software engineering, the need for benchmarks that accurately reflect real-world development scenarios is paramount. Current benchmarks are either too simplistic or fail to capture the multi-tasking nature of software development. To address this, we introduce CoderUJB, a new benchmark designed to evaluate LLMs across diverse Java programming tasks that are executable and reflective of actual development scenarios, acknowledging Java's prevalence in real-world software production. CoderUJB comprises 2,239 programming questions derived from 17 real open-source Java projects and spans five practical programming tasks. Our empirical study on this benchmark investigates the coding abilities of various open-source and closed-source LLMs, examining the effects of continued pre-training on code in specific programming languages and of instruction fine-tuning on their performance. The findings indicate that while LLMs exhibit strong potential, challenges remain, particularly in non-functional code generation (e.g., test generation and defect detection). Importantly, our results advise caution with language-specific continued pre-training and instruction fine-tuning, as these techniques could hinder model performance on certain tasks, suggesting the need for more nuanced strategies. CoderUJB thus marks a significant step towards more realistic evaluations of programming capabilities in LLMs, and our study provides valuable insights for the future development of these models in software engineering. Comment: 11 pages, 4 figures, accepted at ISSTA 2024
- Published
- 2024
18. CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
- Author
-
Paliwal, Avinash, Ye, Wei, Xiong, Jinhui, Kotovenko, Dmytro, Ranjan, Rakesh, Chandra, Vikas, and Kalantari, Nima Khademi
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
- Abstract
The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud-like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes. Comment: Project page: https://people.engr.tamu.edu/nimak/Papers/CoherentGS
- Published
- 2024
19. CodeShell Technical Report
- Author
-
Xie, Rui, Zeng, Zhengran, Yu, Zhuohao, Gao, Chang, Zhang, Shikun, and Ye, Wei
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence
- Abstract
Code large language models mark a pivotal breakthrough in artificial intelligence. They are specifically crafted to understand and generate programming languages, significantly boosting the efficiency of coding development workflows. In this technical report, we present CodeShell-Base, a seven billion-parameter foundation model with 8K context length, showcasing exceptional proficiency in code comprehension. By incorporating Grouped-Query Attention and Rotary Positional Embedding into GPT-2, CodeShell-Base integrates the structural merits of StarCoder and CodeLlama and forms its unique architectural design. We then carefully built a comprehensive data pre-processing pipeline, including similar data deduplication, perplexity-based data filtering, and model-based data filtering. Through this process, we have curated 100 billion tokens of high-quality pre-training data from GitHub. Benefiting from the high-quality data, CodeShell-Base outperforms CodeLlama on HumanEval after training on just 500 billion tokens (5 epochs). We have conducted extensive experiments across multiple language datasets, including Python, Java, and C++, and the results indicate that our model possesses robust foundational capabilities in code comprehension and generation.
- Published
- 2024
20. NightHaze: Nighttime Image Dehazing via Self-Prior Learning
- Author
-
Lin, Beibei, Jin, Yeying, Yan, Wending, Ye, Wei, Yuan, Yuan, and Tan, Robby T.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Masked autoencoder (MAE) shows that severe augmentation during training produces robust representations for high-level tasks. This paper brings the MAE-like framework to nighttime image enhancement, demonstrating that severe augmentation during training produces strong network priors that are resilient to real-world night haze degradations. We propose a novel nighttime image dehazing method with self-prior learning. Our main novelty lies in the design of severe augmentation, which allows our model to learn robust priors. Unlike MAE, which uses masking, we leverage two key challenging factors of nighttime images as augmentation: light effects and noise. During training, we intentionally degrade clear images by blending them with light effects as well as by adding noise, and subsequently restore the clear images. This enables our model to learn clear background priors. By increasing the noise values to approach as high as the pixel intensity values of the glow and light effect blended images, our augmentation becomes severe, resulting in stronger priors. While our self-prior learning is considerably effective in suppressing glow and revealing details of background scenes, in some cases undesired artifacts remain, particularly in the form of over-suppression. To address these artifacts, we propose a self-refinement module based on the semi-supervised teacher-student framework. Our NightHaze, especially our MAE-like self-prior learning, shows that models trained with severe augmentation effectively improve the visibility of input haze images, approaching the clarity of clear nighttime images. Extensive experiments demonstrate that our NightHaze achieves state-of-the-art performance, outperforming existing nighttime image dehazing methods by a substantial margin of 15.5% for MUSIQ and 23.5% for ClipIQA.
- Published
- 2024
21. NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
- Author
-
Ju, Zeqian, Wang, Yuancheng, Shen, Kai, Tan, Xu, Xin, Detai, Yang, Dongchao, Liu, Yanqing, Leng, Yichong, Song, Kaitao, Tang, Siliang, Wu, Zhizheng, Qin, Tao, Li, Xiang-Yang, Ye, Wei, Zhang, Shikun, Bian, Jiang, He, Lei, Li, Jinyu, and Zhao, Sheng
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound
- Abstract
While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by this, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility, and achieves on-par quality with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data. Comment: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way
- Published
- 2024
22. Can Large Language Models Recall Reference Location Like Humans?
- Author
-
Wang, Ye, Xu, Xinrun, Xie, Rui, Hu, Wenxin, and Ye, Wei
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
When completing knowledge-intensive tasks, humans sometimes need not just an answer but also a corresponding reference passage for auxiliary reading. Previous methods required obtaining pre-segmented article chunks through additional retrieval models. This paper explores leveraging the parameterized knowledge stored during the pre-training phase of large language models (LLMs) to independently recall reference passages from any starting position. We propose a two-stage framework that simulates the scenario of humans recalling easily forgotten references. Initially, the LLM is prompted to recall document title identifiers to obtain a coarse-grained document set. Then, based on the acquired coarse-grained document set, it recalls fine-grained passages. In the two-stage recall process, we use constrained decoding to ensure that content outside of the stored documents is not generated. To increase speed, we only recall a short prefix in the second stage, then locate its position to retrieve a complete passage. Experiments on KILT knowledge-sensitive tasks have verified that LLMs can independently recall reference passage locations in various task forms, and the obtained references significantly assist downstream tasks.
- Published
- 2024
23. Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models
- Author
-
Jiang, Chaoya, Ye, Wei, Dong, Mengfan, Jia, Hongrui, Xu, Haiyang, Yan, Ming, Zhang, Ji, and Zhang, Shikun
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Large Vision Language Models (LVLMs) exhibit remarkable capabilities but struggle with hallucinations, i.e., inconsistencies between images and their descriptions. Previous hallucination evaluation studies on LVLMs have identified hallucinations in terms of objects, attributes, and relations but overlooked complex hallucinations that create an entire narrative around a fictional entity. In this paper, we introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination. We then utilize advanced LLMs to generate and filter fine-grained hallucinatory data consisting of various types of hallucinations, with a particular focus on event hallucinations, laying the groundwork for integrating discriminative and generative evaluation methods within our universal evaluation framework. The proposed benchmark distinctively assesses LVLMs' ability to tackle a broad spectrum of hallucinations, making it a reliable and comprehensive tool for gauging LVLMs' efficacy in handling hallucinations. We will release our code and data.
- Published
- 2024
24. KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
- Author
-
Yu, Zhuohao, Gao, Chang, Yao, Wenjin, Wang, Yidong, Ye, Wei, Wang, Jindong, Xie, Xing, Zhang, Yue, and Zhang, Shikun
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Automatic evaluation methods for large language models (LLMs) are hindered by data contamination, leading to inflated assessments of their effectiveness. Existing strategies, which aim to detect contaminated texts, focus on quantifying contamination status instead of accurately gauging model performance. In this paper, we introduce KIEval, a Knowledge-grounded Interactive Evaluation framework, which incorporates an LLM-powered "interactor" role for the first time to accomplish a dynamic contamination-resilient evaluation. Starting with a question in a conventional LLM benchmark involving domain-specific knowledge, KIEval utilizes dynamically generated, multi-round, and knowledge-focused dialogues to determine whether a model's response is merely a recall of benchmark answers or demonstrates a deep comprehension to apply knowledge in more complex conversations. Extensive experiments on seven leading LLMs across five datasets validate KIEval's effectiveness and generalization. We also reveal that data contamination brings no contribution or even negative effect to models' real-world applicability and understanding, and existing contamination detection methods for LLMs can only identify contamination in pre-training but not during supervised fine-tuning., Comment: Accepted to ACL 2024 (main conference); 19 pages, 5 figures, 19 tables, code is available at: https://github.com/zhuohaoyu/KIEval
- Published
- 2024
25. Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
- Author
-
Ye, Wei, Jiang, Chaoya, Xu, Haiyang, Ye, Chenhao, Li, Chenliang, Yan, Ming, Zhang, Shikun, Huang, Songhang, and Huang, Fei
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Vision Transformers (ViTs) have become increasingly popular in large-scale Vision and Language Pre-training (VLP) models. Although previous VLP research has demonstrated the efficacy of ViTs, these efforts still struggle with computational inefficiencies caused by lengthy visual sequences. To address this challenge, we introduce an efficient VLP approach called TRIPS, which stands for Text-Relevant Image Patch Selection. TRIPS progressively reduces the visual sequence using a text-guided patch-selection layer in the visual backbone, thereby accelerating both training and inference processes. This patch-selection layer dynamically computes text-dependent visual attention, enabling it to identify attentive image tokens with text guidance and fuse inattentive ones in an end-to-end fashion. Importantly, TRIPS does not add any extra parameters and generalizes to most ViT-based VLP models. We incorporate TRIPS into three representative VLP models covering single-stream, dual-stream, and generative paradigms, and conduct extensive experiments on five widely-used multi-modal benchmark datasets. Our experimental results reveal that TRIPS delivers a 40% speedup, while maintaining competitive or superior performance on downstream tasks.
- Published
- 2024
26. NightRain: Nighttime Video Deraining via Adaptive-Rain-Removal and Adaptive-Correction
- Author
-
Lin, Beibei, Jin, Yeying, Yan, Wending, Ye, Wei, Yuan, Yuan, Zhang, Shunli, and Tan, Robby
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing deep-learning-based methods for nighttime video deraining rely on synthetic data due to the absence of real-world paired data. However, the intricacies of the real world, particularly with the presence of light effects and low-light regions affected by noise, create significant domain gaps, hampering synthetic-trained models in removing rain streaks properly and leading to over-saturation and color shifts. Motivated by this, we introduce NightRain, a novel nighttime video deraining method with adaptive-rain-removal and adaptive-correction. Our adaptive-rain-removal uses unlabeled rain videos to enable our model to derain real-world rain videos, particularly in regions affected by complex light effects. The idea is to allow our model to obtain rain-free regions based on the confidence scores. Once rain-free regions and the corresponding regions from our input are obtained, we can have region-based paired real data. These paired data are used to train our model using a teacher-student framework, allowing the model to iteratively learn from less challenging regions to more challenging regions. Our adaptive-correction aims to rectify errors in our model's predictions, such as over-saturation and color shifts. The idea is to learn from clear night input training videos based on the differences or distance between those input videos and their corresponding predictions. Our model learns from these differences, compelling our model to correct the errors. From extensive experiments, our method demonstrates state-of-the-art performance. It achieves a PSNR of 26.73dB, surpassing existing nighttime video deraining methods by a substantial margin of 13.7%., Comment: Accepted by AAAI24
- Published
- 2024
27. Cytosolic DNA initiates a vicious circle of aging-related endothelial inflammation and mitochondrial dysfunction via STING: the inhibitory effect of Cilostazol
- Author
-
Zheng, Zhi-hua, Wang, Jiao-jiao, Lin, Jiu-guo, Ye, Wei-le, Zou, Jia-mi, Liang, Li-yin, Yang, Ping-lian, Qiu, Wan-lu, Li, Yuan-yuan, Yang, Si-jia, Zhao, Man, Zhou, Qing, Li, Cheng-zhi, Li, Min, Li, Zhuo-ming, Zhang, Dong-mei, Liu, Pei-qing, and Liu, Zhi-ping
- Published
- 2024
28. Investigation on successive gas breakthroughs behavior of saturated GMZ bentonite under rigid boundary conditions
- Author
-
Cui, Lin-Yong, Ye, Wei-Min, Wang, Qiong, Chen, Yong-Gui, and Cui, Yu-Jun
- Published
- 2024
29. Intermolecular asymmetric functionalization of unstrained C(sp3)–C(sp3) bonds in allylic substitution reactions
- Author
-
Chen, Ye-Wei, Qiu, Yehao, Liu, Yang, Lin, Guo-Qiang, Hartwig, John F., and He, Zhi-Tao
- Published
- 2024
30. Supervised Knowledge Makes Large Language Models Better In-context Learners
- Author
-
Yang, Linyi, Zhang, Shuibai, Yu, Zhuohao, Bao, Guangsheng, Wang, Yidong, Wang, Jindong, Xu, Ruochen, Ye, Wei, Xie, Xing, Chen, Weizhu, and Zhang, Yue
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The recent progress in large-scale generative models has further expanded their use in real-world language applications. However, the critical challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. While previous in-context learning research has focused on enhancing models to adhere to users' specific instructions and quality expectations, and to avoid undesired outputs, little to no work has explored the use of task-Specific fine-tuned Language Models (SLMs) to improve LLMs' in-context learning during the inference stage. Our primary contribution is the establishment of a simple yet effective framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks. Using our proposed plug-in method, enhanced versions of Llama 2 and ChatGPT surpass their original versions regarding generalizability and factuality. We offer a comprehensive suite of resources, including 16 curated datasets, prompts, model checkpoints, and LLM outputs across 9 distinct tasks. The code and data are released at: https://github.com/YangLinyi/Supervised-Knowledge-Makes-Large-Language-Models-Better-In-context-Learners. Our empirical analysis sheds light on the advantages of incorporating discriminative models into LLMs and highlights the potential of our methodology in fostering more reliable LLMs., Comment: Accepted to ICLR 2024
- Published
- 2023
31. PICNN: A Pathway towards Interpretable Convolutional Neural Networks
- Author
-
Guo, Wengang, Yang, Jiayi, Yin, Huilin, Chen, Qijun, and Ye, Wei
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Convolutional Neural Networks (CNNs) have exhibited great performance in discriminative feature learning for complex visual tasks. Besides discrimination power, interpretability is another important yet under-explored property for CNNs. One difficulty in the CNN interpretability is that filters and image classes are entangled. In this paper, we introduce a novel pathway to alleviate the entanglement between filters and image classes. The proposed pathway groups the filters in a late conv-layer of CNN into class-specific clusters. Clusters and classes are in a one-to-one relationship. Specifically, we use the Bernoulli sampling to generate the filter-cluster assignment matrix from a learnable filter-class correspondence matrix. To enable end-to-end optimization, we develop a novel reparameterization trick for handling the non-differentiable Bernoulli sampling. We evaluate the effectiveness of our method on ten widely used network architectures (including nine CNNs and a ViT) and five benchmark datasets. Experimental results have demonstrated that our method PICNN (the combination of standard CNNs with our proposed pathway) exhibits greater interpretability than standard CNNs while achieving higher or comparable discrimination power.
- Published
- 2023
32. TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
- Author
-
Jiang, Chaoya, Ye, Wei, Xu, Haiyang, Ye, Qinghao, Yan, Ming, Zhang, Ji, and Zhang, Shikun
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic modalities. Due to noise in web-harvested text-image pairs, however, scaling up training data volume in SMCL presents considerable obstacles in terms of computational cost and data inefficiency. To improve data efficiency in VLP, we propose Text-aware Image Mixing (TiMix), which integrates mix-based data augmentation techniques into SMCL, yielding significant performance improvements without significantly increasing computational overhead. We provide a theoretical analysis of TiMix from a mutual information (MI) perspective, showing that mixed data samples for cross-modal contrastive learning implicitly serve as a regularizer for the contrastive loss. The experimental results demonstrate that TiMix exhibits a comparable performance on downstream tasks, even with a reduced amount of training data and shorter training time, when benchmarked against existing methods. This work empirically and theoretically demonstrates the potential of data mixing for data-efficient and computationally viable VLP, benefiting broader VLP model adoption in practical scenarios., Comment: Accepted on AAAI2024
- Published
- 2023
33. Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks
- Author
-
Li, Bo, Ye, Wei, Wang, Quansen, Zhao, Wen, and Zhang, Shikun
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Textual label names (descriptions) are typically semantically rich in many natural language understanding (NLU) tasks. In this paper, we incorporate the prompting methodology, which is widely used to enrich model input, into the label side for the first time. Specifically, we propose a Mask Matching method, which equips an input with a prompt and its label with another, and then makes predictions by matching their mask representations. We evaluate our method extensively on 8 NLU tasks with 14 datasets. The experimental results show that Mask Matching significantly outperforms its counterparts of fine-tuning and conventional prompt-tuning, setting up state-of-the-art performances in several datasets. Mask Matching is particularly good at handling NLU tasks with large label counts and informative label names. As pioneering efforts that investigate the label-side prompt, we also discuss open issues for future study., Comment: AAAI2024, Regular Paper
- Published
- 2023
34. COMBHelper: A Neural Approach to Reduce Search Space for Graph Combinatorial Problems
- Author
-
Tian, Hao, Medya, Sourav, and Ye, Wei
- Subjects
Computer Science - Machine Learning ,Computer Science - Neural and Evolutionary Computing - Abstract
Combinatorial Optimization (CO) problems over graphs appear routinely in many applications such as in optimizing traffic, viral marketing in social networks, and matching for job allocation. Due to their combinatorial nature, these problems are often NP-hard. Existing approximation algorithms and heuristics rely on the search space to find the solutions and become time-consuming when this space is large. In this paper, we design a neural method called COMBHelper to reduce this space and thus improve the efficiency of the traditional CO algorithms based on node selection. Specifically, it employs a Graph Neural Network (GNN) to identify promising nodes for the solution set. This pruned search space is then fed to the traditional CO algorithms. COMBHelper also uses a Knowledge Distillation (KD) module and a problem-specific boosting module to bring further efficiency and efficacy. Our extensive experiments show that the traditional CO algorithms with COMBHelper are at least 2 times faster than their original versions.
- Published
- 2023
35. Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
- Author
-
Jiang, Chaoya, Xu, Haiyang, Dong, Mengfan, Chen, Jiaxing, Ye, Wei, Yan, Ming, Ye, Qinghao, Zhang, Ji, Huang, Fei, and Zhang, Shikun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks. However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information. In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning. We first analyzed the representation distribution of textual and visual tokens in MLLM, revealing two important findings: 1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; 2) representations of texts that contain and do not contain hallucinations are entangled, making it challenging to distinguish them. These two observations inspire us with a simple yet effective method to mitigate hallucinations. Specifically, we introduce contrastive learning into MLLMs and use text with hallucination as hard negative examples, naturally bringing representations of non-hallucinative text and visual samples closer while pushing away representations of non-hallucinative and hallucinative text. We evaluate our method quantitatively and qualitatively, showing its effectiveness in reducing hallucination occurrences and improving performance across multiple benchmarks. On the MMhal-Bench benchmark, our method obtains a 34.66%/29.5% improvement over the baseline MiniGPT-4/LLaVA. Our code is available on https://github.com/X-PLUG/mPLUG-HalOwl/tree/main/hacl.
- Published
- 2023
36. A Comprehensive Real-World Evaluation of 5G Improvements over 4G in Low- and Mid-Bands
- Author
-
Rochman, Muhammad Iqbal, Ye, Wei, Zhang, Zhi-Li, and Ghosh, Monisha
- Subjects
Computer Science - Networking and Internet Architecture - Abstract
As discussions around 6G begin, it is important to carefully quantify the spectral efficiency gains actually realized by deployed 5G networks as compared to 4G through various enhancements such as higher modulation, beamforming, and MIMO. This will inform the design of future cellular systems, especially in the mid-bands, which provide a good balance between bandwidth and propagation. Similar to 4G, 5G also utilizes low-band (<1 GHz) and mid-band spectrum (1 to 6 GHz), and hence comparing the performance of 4G and 5G in these bands will provide insights into how further improvements can be attained. In this work, we address a crucial question: is the performance boost in 5G compared to 4G primarily a result of increased bandwidth, or do the other enhancements play significant roles, and if so, under what circumstances? Hence, we conduct city-wide measurements of 4G and 5G cellular networks deployed in low- and mid-bands in Chicago and Minneapolis, and carefully quantify the contributions of different aspects of 5G advancements to its improved throughput performance. Our analyses show that (i) compared to 4G, the throughput improvement in 5G today is mainly influenced by the wider channel bandwidth, both from single channels and channel aggregation, (ii) in addition to wider channels, improved 5G throughput requires better signal conditions, which can be delivered by denser deployment and/or use of beamforming in mid-bands, (iii) the channel rank in real-world environments rarely supports the full 4 layers of 4x4 MIMO and (iv) advanced features such as MU-MIMO and higher order modulation such as 1024-QAM have yet to be widely deployed. These observations and conclusions lead one to consider designing the next generation of cellular systems to have wider channels, perhaps with improved channel aggregation, dense deployment with more beams.
- Published
- 2023
37. MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
- Author
-
Yu, Dingyao, Song, Kaitao, Lu, Peiling, He, Tianyu, Tan, Xu, Ye, Wei, Zhang, Shikun, and Bian, Jiang
- Subjects
Computer Science - Computation and Language ,Computer Science - Multimedia ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these tasks to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks. Consequently, it is necessary to build a system to organize and integrate these tasks, and thus help practitioners to automatically analyze their demand and call suitable tools as solutions to fulfill their requirements. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. More specifically, we build 1) a toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web API, etc.; 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) to organize these tools and automatically decompose user requests into multiple sub-tasks and invoke corresponding music tools. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect. By granting users the freedom to effortlessly combine tools, the system offers a seamless and enriching music experience.
- Published
- 2023
38. Mid-Band 5G: A Measurement Study in Europe and US
- Author
-
Fezeu, Rostand A. K., Carpenter, Jason, Fiandrino, Claudio, Ramadan, Eman, Ye, Wei, Widmer, Joerg, Qian, Feng, and Zhang, Zhi-Li
- Subjects
Computer Science - Networking and Internet Architecture - Abstract
Fifth Generation (5G) mobile networks mark a significant shift from previous generations of networks. By introducing a flexible design, 5G networks support highly diverse application requirements. Currently, the landscape of previous measurement studies does not shed light on 5G network configuration and the inherent implications to application performance. In this paper, we precisely fill this gap and report our in-depth multi-country measurement study on 5G deployed at mid-bands. This is the common playground for U.S. and European carriers. Our findings reveal key aspects on how carriers configure their network, including spectrum utilization, frame configuration, resource allocation and their implication on the application performance., Comment: 18 pages, 36 figures
- Published
- 2023
39. Accuracy of nanopore sequencing as a diagnostic assay for pulmonary tuberculosis versus smear, culture and Xpert MTB/RIF: A head-to-head comparison
- Author
-
Yang, Juan, Ye, Wei, Zhang, Chao, Lin, Wenhong, Mei, Lin, Liu, Shengsheng, and Liu, Jie
- Published
- 2023
40. Numerical investigation of gas migration behaviour in saturated bentonite with consideration of temperature
- Author
-
Cui, Lin-Yong, Masum, Shakil A., Ye, Wei-Min, Thomas, Hywel R., Zhou, Chao, and Hu, Hong-Qiang
- Published
- 2024
41. Investigation on cavitating turbulent flow for the twisted NACA66 hydrofoil using a PANS model with helicity modification
- Author
-
Geng, Chen, Qian, Zhao-hui, Zheng, Ke-xin, Ye, Wei-xiang, and Luo, Xian-wu
- Published
- 2024
42. Integration of adsorption, reduction, and filtration in PANI/PVDF nanofiber composite membrane for removal of Cr(VI)
- Author
-
Liu, Hongyu, Ye, Wei, Zhang, Huan, Wang, Huicai, and Wei, Junfu
- Published
- 2024
43. Evaluating the quantum optimal biased bound in a unitary evolution process
- Author
-
Chang, Shoukang, Ye, Wei, Rao, Xuan, Zhang, Huan, Huang, Liqing, Luo, Mengmeng, Chen, Yuetao, Ma, Qiang, and Gao, Shaoyan
- Subjects
Quantum Physics - Abstract
Seeking the available precision limit of unknown parameters is a significant task in quantum parameter estimation. One often resorts to the widely utilized quantum Cramer-Rao bound (QCRB) based on unbiased estimators to finish this task. Nevertheless, most actual estimators are usually biased in the limited number of trials. For this reason, we introduce two effective error bounds for biased estimators based on a unitary evolution process in the framework of the quantum optimal biased bound. Furthermore, we show their estimation performance by two specific examples of the unitary evolution process, including the phase encoding and the SU(2) interferometer process. Our findings will provide useful guidance for finding the precision limit of unknown parameters., Comment: 11 pages, 3 figures, welcome comments
- Published
- 2023
44. COPA: Efficient Vision-Language Pre-training Through Collaborative Object- and Patch-Text Alignment
- Author
-
Jiang, Chaoya, Xu, Haiyang, Ye, Wei, Ye, Qinghao, Li, Chenliang, Yan, Ming, Bi, Bin, Zhang, Shikun, Zhang, Ji, and Huang, Fei
- Subjects
Computer Science - Multimedia - Abstract
Vision-Language Pre-training (VLP) methods based on object detection enjoy the rich knowledge of fine-grained object-text alignment but at the cost of computationally expensive inference. Recent Visual-Transformer (ViT)-based approaches circumvent this issue while struggling with long visual sequences without detailed cross-modal alignment information. This paper introduces a ViT-based VLP technique that efficiently incorporates object information through a novel patch-text alignment mechanism. Specifically, we convert object-level signals into patch-level ones and devise a Patch-Text Alignment pre-training task (PTA) to learn a text-aware patch detector. By using off-the-shelf delicate object annotations in 5% training images, we jointly train PTA with other conventional VLP objectives in an end-to-end manner, bypassing the high computational cost of object detection and yielding an effective patch detector that accurately detects text-relevant patches, thus considerably reducing patch sequences and accelerating computation within the ViT backbone. Our experiments on a variety of widely-used benchmarks reveal that our method achieves a speedup of nearly 88% compared to prior VLP models while maintaining competitive or superior performance on downstream tasks with similar model size and data scale., Comment: Accepted on ACM MM2023
- Published
- 2023
45. Enhancing Visibility in Nighttime Haze Images Using Guided APSF and Gradient Adaptive Convolution
- Author
-
Jin, Yeying, Lin, Beibei, Yan, Wending, Yuan, Yuan, Ye, Wei, and Tan, Robby T.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Visibility in hazy nighttime scenes is frequently reduced by multiple factors, including low light, intense glow, light scattering, and the presence of multicolored light sources. Existing nighttime dehazing methods often struggle with handling glow or low-light conditions, resulting in either excessively dark visuals or unsuppressed glow outputs. In this paper, we enhance the visibility from a single nighttime haze image by suppressing glow and enhancing low-light regions. To handle glow effects, our framework learns from the rendered glow pairs. Specifically, a light source aware network is proposed to detect light sources of night images, followed by the APSF (Atmospheric Point Spread Function)-guided glow rendering. Our framework is then trained on the rendered images, resulting in glow suppression. Moreover, we utilize gradient-adaptive convolution to capture edges and textures in hazy scenes. By leveraging extracted edges and textures, we enhance the contrast of the scene without losing important structural details. To boost low-light intensity, our network learns an attention map, which is then adjusted by gamma correction. This attention has high values on low-light regions and low values on haze and glow regions. Extensive evaluation on real nighttime haze images demonstrates the effectiveness of our method. Our experiments demonstrate that our method achieves a PSNR of 30.38dB, outperforming state-of-the-art methods by 13% on GTA5 nighttime haze dataset. Our data and code are available at https://github.com/jinyeying/nighttime_dehaze., Comment: Accepted to ACM'MM2023, https://github.com/jinyeying/nighttime_dehaze
- Published
- 2023
46. BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization
- Author
-
Jiang, Chaoya, Xu, Haiyang, Ye, Wei, Ye, Qinghao, Li, Chenliang, Yan, Ming, Bi, Bin, Zhang, Shikun, Huang, Fei, and Huang, Songfang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Vision Transformer (ViT) based Vision-Language Pre-training (VLP) models have demonstrated impressive performance in various tasks. However, the lengthy visual token sequences fed into ViT can lead to training inefficiency and ineffectiveness. Existing efforts address the challenge by either bottom-level patch extraction in the ViT backbone or top-level patch abstraction outside, not balancing training efficiency and effectiveness well. Inspired by text summarization in natural language processing, we propose a Bottom-Up Patch Summarization approach named BUS, coordinating bottom-level extraction and top-level abstraction to learn a concise summary of lengthy visual token sequences efficiently. Specifically, we incorporate a Text-Semantics-Aware Patch Selector (TSPS) into the ViT backbone to perform a coarse-grained visual token extraction and then attach a flexible Transformer-based Patch Abstraction Decoder (PAD) upon the backbone for top-level visual abstraction. This bottom-up collaboration enables our BUS to yield high training efficiency while maintaining or even improving effectiveness. We evaluate our approach on various visual-language understanding and generation tasks and show competitive downstream task performance while boosting the training efficiency by 50%. Additionally, our model achieves state-of-the-art performance on many downstream tasks by increasing input image resolution without increasing computational costs over baselines., Comment: Accepted on ICCV2023
- Published
- 2023
47. A Survey on Evaluation of Large Language Models
- Author
-
Chang, Yupeng, Wang, Xu, Wang, Jindong, Wu, Yuan, Yang, Linyi, Zhu, Kaijie, Chen, Hao, Yi, Xiaoyuan, Wang, Cunxiang, Wang, Yidong, Ye, Wei, Zhang, Yue, Chang, Yi, Yu, Philip S., Yang, Qiang, and Xie, Xing
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLM evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey. Comment: Accepted by ACM Transactions on Intelligent Systems and Technology (TIST); 45 pages; More recent works; https://llm-eval.github.io/
- Published
- 2023
48. EmoGen: Eliminating Subjective Bias in Emotional Music Generation
- Author
-
Kang, Chenfei, Lu, Peiling, Yu, Botao, Tan, Xu, Ye, Wei, Zhang, Shikun, and Bian, Jiang
- Subjects
Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Music is used to convey emotions, and thus generating emotional music is important in automatic music generation. Previous work on emotional music generation directly uses annotated emotion labels as control signals, which suffers from subjective bias: different people may annotate different emotions on the same music, and one person may feel different emotions under different situations. Therefore, directly mapping emotion labels to music sequences in an end-to-end way would confuse the learning process and hinder the model from generating music with general emotions. In this paper, we propose EmoGen, an emotional music generation system that leverages a set of emotion-related music attributes as the bridge between emotion and music, and divides the generation into two stages: emotion-to-attribute mapping with supervised clustering, and attribute-to-music generation with self-supervised learning. Both stages are beneficial: in the first stage, the attribute values around the clustering center represent the general emotions of these samples, which help eliminate the impacts of the subjective bias of emotion labels; in the second stage, the generation is completely disentangled from emotion labels and thus free from the subjective bias. Both subjective and objective evaluations show that EmoGen outperforms previous methods on emotion control accuracy and music quality respectively, which demonstrate our superiority in generating emotional music. Music samples generated by EmoGen are available via this link: https://ai-muzic.github.io/emogen/, and the code is available at this link: https://github.com/microsoft/muzic/. Comment: 12 pages, 7 pages
- Published
- 2023
49. Gas diffusion property of compacted GMZ bentonite tested under different boundary conditions considering saturation and gas pressure
- Author
-
Ji, Yu-Heng, Ye, Wei-Min, Lu, Pu-Huai, Wang, Qiong, and Chen, Yong-Gui
- Published
- 2024
50. Exploiting Pseudo Future Contexts for Emotion Recognition in Conversations
- Author
-
Wei, Yinyi, Liu, Shuaipeng, Yan, Hailei, Ye, Wei, Mo, Tong, and Wan, Guanglu
- Subjects
Computer Science - Computation and Language - Abstract
With the extensive accumulation of conversational data on the Internet, emotion recognition in conversations (ERC) has received increasing attention. Previous efforts on this task mainly focus on leveraging contextual and speaker-specific features, or integrating heterogeneous external commonsense knowledge. Among them, some heavily rely on future contexts, which, however, are not always available in real-life scenarios. This fact inspires us to generate pseudo future contexts to improve ERC. Specifically, for an utterance, we generate its future context with pre-trained language models, potentially containing extra beneficial knowledge in a conversational form homogeneous with the historical ones. These characteristics make pseudo future contexts easily fused with historical contexts and historical speaker-specific contexts, yielding a conceptually simple framework systematically integrating multi-contexts. Experimental results on four ERC datasets demonstrate our method's superiority. Further in-depth analyses reveal that pseudo future contexts can rival real ones to some extent, especially in relatively context-independent conversations. Comment: 15 pages, accepted by ADMA 2023
- Published
- 2023