173,107 results on '"LI, Wei"'
Search Results
2. Resurging Asia and Highly Skilled International Migration
- Author
Li, Wei and Yu, Wan
- Published
- 2022
3. Recent Development of Alternating Current Field Measurement Combine with New Technology
- Author
Yuan, Xin'an, Li, Wei, Zhao, Jianming, Yin, Xiaokang, Li, Xiao, and Zhao, Jianchao
- Subjects
Alternating current field measurement; Multi frequency detection; RACFM technology; Multifrequency ACFM; Visualization method in ACFM; thema EDItEUR::T Technology, Engineering, Agriculture, Industrial processes::TN Civil engineering, surveying and building::TNC Structural engineering; thema EDItEUR::R Earth Sciences, Geography, Environment, Planning::RB Earth sciences::RBG Geology, geomorphology and the lithosphere::RBGG Petrology, petrography and mineralogy
- Abstract
This open access book can be divided into three parts. In part 1, three articles are employed to introduce the RACFM technology. In part 2, two articles are introduced to explain the Multifrequency ACFM. In part 3, three articles are introduced to explain the visualization research in ACFM. With the development of ACFM detection technology, traditional single excitation frequency and single direction excitation structures cannot meet the requirements of multiple types of defect detection (such as cracks at different angles, and buried defects). New types of excitation structures and methods have been proposed, mainly including rotating electromagnetic field detection, multi-frequency detection, and defects visual algorithm. The changes in the excitation structure and signal mentioned above have expanded the scope of application of ACFM detection and provided opportunities for the cross-integration and innovation of ACFM detection technology with other advanced detection methods. This book mainly focuses on the study of the rotating alternating current field measurement (RACFM), the multifrequency ACFM, and the visualization method in ACFM.
- Published
- 2024
4. Speaking Anxiety: Facilitator or Hindrance to Postgraduates' Thesis Defense Performance
- Author
Li-Wei Wei
- Abstract
Foreign language anxiety (FLA) is a well-documented phenomenon that can significantly affect academic performance. This study examined the extent to which Taiwanese postgraduate students experienced speaking anxiety during thesis defense presentations, along with potential gender differences and variations across postgraduate programs. It specifically analyzed the correlation between anxiety and thesis presentation performance. A meticulously designed study involving 168 Taiwanese master's students in an oral thesis defense seminar employed a modified Academic Performance Scale and Personal Report of Public Speaking Anxiety instruments to quantitatively evaluate anxiety levels. Statistical analyses unveiled significant associations between anxiety levels and academic performance. The results indicate a high level of anxiety among participants, with a mean anxiety level of 4.05 (N=168, X=2.85) and a moderate level of thesis presentation performance, evidenced by a mean score of 2.85 (N=168, X=2.85). Notably, female postgraduates exhibited higher anxiety levels than their male counterparts. The study identifies a positive, albeit modest, correlation between anxiety and performance, suggesting that a certain level of anxiety may enhance performance. The findings underscore the pervasive influence of anxiety in academic contexts and highlight gender disparities and the impact of diverse postgraduate programs on anxiety and performance. The study challenges conventional assumptions about the negative effects of anxiety on performance, suggesting that moderate anxiety can be a motivating catalyst. This study contributes to a more nuanced understanding of the role of anxiety in learning and performance and prompts the development of targeted interventions to address anxiety and support postgraduate students' academic success.
- Published
- 2024
5. Correlational Analysis of the Interplay among Academic Anxiety, Emotional Intelligence Management, and Academic Resilience
- Author
Li-Wei Wei and Ying-Chao Song
- Abstract
This study examines the interplay between academic anxiety, emotional intelligence management, and academic resilience in Chinese international postgraduate students in Thailand. Using a correlational design and a sample of 353 valid participants, the study employed the Weighted Emotional Intelligence Scale (WEIS), Academic Anxiety Scale (AAS), and Academic Resilience Scale-30 (ARS). Contrary to expectations, the analysis revealed no significant differences in academic anxiety, emotional intelligence management, or academic resilience across demographic cohorts (gender, academic major, and occupation). Weak and non-significant correlations were also observed between academic anxiety, emotional intelligence management, and academic resilience. These findings challenge assumptions about demographic influences on these constructs and suggest a broader challenge for international students. Despite the prevalence of academic anxiety and deficiencies in emotional intelligence management and resilience, these constructs were not influenced by demographic factors. The study highlights the importance of holistic educational approaches that prioritize cultural and contextual factors and underscores the need for further research to unravel the complex dynamics of academic anxiety, emotional intelligence, and resilience.
- Published
- 2024
6. VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
- Author
Chen, Li-Wei, Lee, Hung-Shin, and Chang, Chen-Chi
- Subjects
Computer Science - Sound; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
This paper introduces VoxHakka, a text-to-speech (TTS) system designed for Taiwanese Hakka, a critically under-resourced language spoken in Taiwan. Leveraging the YourTTS framework, VoxHakka achieves high naturalness and accuracy and low real-time factor in speech synthesis while supporting six distinct Hakka dialects. This is achieved by training the model with dialect-specific data, allowing for the generation of speaker-aware Hakka speech. To address the scarcity of publicly available Hakka speech corpora, we employed a cost-effective approach utilizing a web scraping pipeline coupled with automatic speech recognition (ASR)-based data cleaning techniques. This process ensured the acquisition of a high-quality, multi-speaker, multi-dialect dataset suitable for TTS training. Subjective listening tests conducted using comparative mean opinion scores (CMOS) demonstrate that VoxHakka significantly outperforms existing publicly available Hakka TTS systems in terms of pronunciation accuracy, tone correctness, and overall naturalness. This work represents a significant advancement in Hakka language technology and provides a valuable resource for language preservation and revitalization efforts. Comment: Submitted to O-COCOSDA 2024
- Published
- 2024
7. Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation
- Author
Wang, Chien-Chun, Chen, Li-Wei, Lee, Hung-Shin, Chen, Berlin, and Wang, Hsin-Min
- Subjects
Computer Science - Sound; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs) with only limited target noisy speech data. Notably, our method employs a noise encoder to extract noise embeddings from target-domain data. These embeddings aptly guide the generator to synthesize utterances acoustically fitted to the target domain while authentically preserving the phonetic content of the input clean speech. Furthermore, we introduce the notion of dynamic stochastic perturbation, which can inject controlled perturbations into the noise embeddings during inference, thereby enabling the model to generalize well to unseen noise conditions. Experiments on the VoiceBank-DEMAND benchmark dataset demonstrate that our domain-adaptive SE method outperforms an existing strong baseline based on data simulation. Comment: Accepted to IEEE SLT 2024
- Published
- 2024
8. Nonparametric Estimation of Path-specific Effects in Presence of Nonignorable Missing Covariates
- Author
Shan, Jiawei, Wang, Ting, Li, Wei, and Ai, Chunrong
- Subjects
Statistics - Methodology
- Abstract
The path-specific effect (PSE) is of primary interest in mediation analysis when multiple intermediate variables between treatment and outcome are observed, as it can isolate the specific effect through each mediator, thus mitigating potential bias arising from other intermediate variables serving as mediator-outcome confounders. However, estimation and inference of PSE become challenging in the presence of nonignorable missing covariates, a situation particularly common in epidemiological research involving sensitive patient information. In this paper, we propose a fully nonparametric methodology to address this challenge. We establish identification for PSE by expressing it as a functional of observed data and demonstrate that the associated nuisance functions can be uniquely determined through sequential optimization problems by leveraging a shadow variable. Then we propose a sieve-based regression imputation approach for estimation. We establish the large-sample theory for the proposed estimator, and introduce a robust and efficient approach to make inference for PSE. The proposed method is applied to the NHANES dataset to investigate the mediation roles of dyslipidemia and obesity in the pathway from Type 2 diabetes mellitus to cardiovascular disease. Comment: 37 pages, 6 figures
- Published
- 2024
9. BackFlip: The Impact of Local and Global Data Augmentations on Artistic Image Aesthetic Assessment
- Author
Strafforello, Ombretta, Odriozola, Gonzalo Muradas, Behrad, Fatemeh, Chen, Li-Wei, Maerten, Anne-Sofie, Soydaner, Derya, and Wagemans, Johan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Assessing the aesthetic quality of artistic images presents unique challenges due to the subjective nature of aesthetics and the complex visual characteristics inherent to artworks. Basic data augmentation techniques commonly applied to natural images in computer vision may not be suitable for art images in aesthetic evaluation tasks, as they can change the composition of the art images. In this paper, we explore the impact of local and global data augmentation techniques on artistic image aesthetic assessment (IAA). We introduce BackFlip, a local data augmentation technique designed specifically for artistic IAA. We evaluate the performance of BackFlip across three artistic image datasets and four neural network architectures, comparing it with the commonly used data augmentation techniques. Then, we analyze the effects of components within the BackFlip pipeline through an ablation study. Our findings demonstrate that local augmentations, such as BackFlip, tend to outperform global augmentations on artistic IAA in most cases, probably because they do not perturb the composition of the art images. These results emphasize the importance of considering both local and global augmentations in future computational aesthetics research. Comment: Published at the VISART VII workshop at ECCV 2024. Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten and Derya Soydaner contributed equally to this work
- Published
- 2024
10. Design and Performance of the ALPS II Regeneration Cavity
- Author
Kozlowski, Todd, Wei, Li-Wei, Spector, Aaron D., Hallal, Ayman, Fraedrich, Henry, Brotherton, Daniel C., Oceano, Isabella, Ejlli, Aldo, Grote, Hartmut, Hollis, Harold, Karan, Kanioar, Mueller, Guido, Tanner, D. B., Willke, Benno, and Lindner, Axel
- Subjects
Physics - Optics
- Abstract
The Regeneration Cavity (RC) is a critical component of the Any Light Particle Search II (ALPS II) experiment. It increases the signal from possible axions and axion-like particles in the experiment by nearly four orders of magnitude. The total round-trip optical losses of the power circulating in the cavity must be minimized in order to maximize the resonant enhancement of the cavity, which is an important figure of merit for ALPS II. Lower optical losses also increase the cavity storage time, and with the 123-meter-long ALPS II RC we have demonstrated the longest storage time of a two-mirror optical cavity. We measured a storage time of $7.17 \pm 0.01$ ms, equivalent to a linewidth of 44.4 Hz and a finesse of 27,500 at a wavelength of 1064 nm. Comment: 16 pages, 8 figures, 1 table
- Published
- 2024
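The storage time, linewidth, and finesse quoted in the ALPS II abstract (record 10) can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the paper: it assumes the relation linewidth = 1/(π τ) between storage time τ and full-width linewidth, and it treats the cavity length as exactly 123 m. Both assumptions were chosen because they reproduce the quoted values, and the function names are hypothetical.

```python
import math

C = 299_792_458.0          # speed of light in vacuum, m/s (exact SI value)
CAVITY_LENGTH_M = 123.0    # from the abstract; treated as exact here (assumption)
STORAGE_TIME_S = 7.17e-3   # from the abstract

def linewidth_hz(storage_time_s: float) -> float:
    """Full-width linewidth under the assumed convention 1 / (pi * tau)."""
    return 1.0 / (math.pi * storage_time_s)

def free_spectral_range_hz(length_m: float) -> float:
    """Free spectral range of a two-mirror (linear) cavity: c / (2 L)."""
    return C / (2.0 * length_m)

def finesse(length_m: float, storage_time_s: float) -> float:
    """Finesse as the ratio of free spectral range to linewidth."""
    return free_spectral_range_hz(length_m) / linewidth_hz(storage_time_s)

if __name__ == "__main__":
    print(f"linewidth: {linewidth_hz(STORAGE_TIME_S):.1f} Hz")
    print(f"finesse:   {finesse(CAVITY_LENGTH_M, STORAGE_TIME_S):.0f}")
```

Under these assumptions the linewidth comes out near 44.4 Hz and the finesse lands within about 0.2% of the quoted 27,500; the small gap is consistent with the cavity length being rounded to 123 m.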
11. BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking
- Author
Wang, Hanzheng, Li, Wei, Xia, Xiang-Gen, and Du, Qian
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Hyperspectral object tracking (HOT) has exhibited potential in various applications, particularly in scenes where objects are camouflaged. Existing trackers can effectively retrieve objects via band regrouping because of the bias in existing HOT datasets, where most objects tend to have distinguishing visual appearances rather than spectral characteristics. This bias allows the tracker to directly use the visual features obtained from the false-color images generated by hyperspectral images without the need to extract spectral features. To tackle this bias, we find that the tracker should focus on the spectral information when object appearance is unreliable. Thus, we provide a new task called hyperspectral camouflaged object tracking (HCOT) and meticulously construct a large-scale HCOT dataset, termed BihoT, which consists of 41,912 hyperspectral images covering 49 video sequences. The dataset covers various artificial camouflage scenes where objects have similar appearances, diverse spectrums, and frequent occlusion, making it a very challenging dataset for HCOT. Besides, a simple but effective baseline model, named spectral prompt-based distractor-aware network (SPDAN), is proposed, comprising a spectral embedding network (SEN), a spectral prompt-based backbone network (SPBN), and a distractor-aware module (DAM). Specifically, the SEN extracts spectral-spatial features via 3-D and 2-D convolutions. Then, the SPBN fine-tunes powerful RGB trackers with spectral prompts and alleviates the insufficiency of training samples. Moreover, the DAM utilizes a novel statistic to capture the distractor caused by occlusion from objects and background. Extensive experiments demonstrate that our proposed SPDAN achieves state-of-the-art performance on the proposed BihoT and other HOT datasets.
- Published
- 2024
12. Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation
- Author
Han, Jiawei, Liu, Kaiqi, Li, Wei, and Chen, Guangzhi
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Point cloud semantic segmentation can significantly enhance the perception of an intelligent agent. Nevertheless, the discriminative capability of the segmentation network is influenced by the quantity of samples available for different categories. To mitigate the cognitive bias induced by class imbalance, this paper introduces a novel method, namely subspace prototype guidance (SPG), to guide the training of the segmentation network. Specifically, the point cloud is initially separated into independent point sets by category to provide initial conditions for the generation of feature subspaces. The auxiliary branch, which consists of an encoder and a projection head, maps these point sets into separate feature subspaces. Subsequently, the feature prototypes, which are extracted from the current separate subspaces and then combined with prototypes of historical subspaces, guide the feature space of the main branch to enhance the discriminability of features of minority categories. The prototypes derived from the feature space of the main branch are also employed to guide the training of the auxiliary branch, forming a supervisory loop to maintain consistent convergence of the entire network. The experiments conducted on large public benchmarks (i.e., S3DIS, ScanNet v2, ScanNet200, Toronto-3D) and collected real-world data illustrate that the proposed method significantly improves the segmentation performance and surpasses the state-of-the-art method. The code is available at https://github.com/Javion11/PointLiBR.git.
- Published
- 2024
13. Deep Code Search with Naming-Agnostic Contrastive Multi-View Learning
- Author
Feng, Jiadong, Li, Wei, Wei, Zhao, Xu, Yong, Wang, Juhong, and Li, Hui
- Subjects
Computer Science - Information Retrieval; Computer Science - Software Engineering
- Abstract
Software development is a repetitive task, as developers usually reuse or get inspiration from existing implementations. Code search, which refers to the retrieval of relevant code snippets from a codebase according to the developer's intent that has been expressed as a query, has become increasingly important in the software development process. Due to the success of deep learning in various applications, a great number of deep learning based code search approaches have sprung up and achieved promising results. However, developers may not follow the same naming conventions and the same variable may have different variable names in different implementations, bringing a challenge to deep learning based code search methods that rely on explicit variable correspondences to understand source code. To overcome this challenge, we propose a naming-agnostic code search method (NACS) based on contrastive multi-view code representation learning. NACS strips information bound to variable names from Abstract Syntax Tree (AST), the representation of the abstract syntactic structure of source code, and focuses on capturing intrinsic properties solely from AST structures. We use semantic-level and syntax-level augmentation techniques to prepare realistically rational data and adopt contrastive learning to design a graph-view modeling component in NACS to enhance the understanding of code snippets. We further model ASTs in a path view to strengthen the graph-view modeling component through multi-view learning. Extensive experiments show that NACS provides superior code search performance compared to baselines and NACS can be adapted to help existing code search methods overcome the impact of different naming conventions.
- Published
- 2024
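The idea in record 13 of stripping variable names from an Abstract Syntax Tree so that only structure remains can be illustrated on a toy scale. The sketch below is not the NACS method: it uses Python's standard `ast` module rather than the paper's pipeline, replaces every variable and argument name with a single placeholder, and compares the resulting tree dumps. `NameStripper` and `structure_key` are hypothetical names introduced for this illustration.

```python
import ast

class NameStripper(ast.NodeTransformer):
    """Replace every variable and argument name with one placeholder.

    Function names are left untouched. Names that refer to builtins
    (for example `len`) are also Name nodes, so they are replaced too;
    a real system would likely treat them separately.
    """
    PLACEHOLDER = "_"

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Keep the load/store context, drop the identifier itself.
        return ast.copy_location(ast.Name(id=self.PLACEHOLDER, ctx=node.ctx), node)

    def visit_arg(self, node: ast.arg) -> ast.arg:
        self.generic_visit(node)  # also rewrite names inside annotations
        node.arg = self.PLACEHOLDER
        return node

def structure_key(source: str) -> str:
    """Return a string that depends on AST structure but not on variable names."""
    tree = NameStripper().visit(ast.parse(source))
    return ast.dump(tree)

SNIPPET_A = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
SNIPPET_B = "def total(items):\n    s = 0\n    for it in items:\n        s += it\n    return s\n"
SNIPPET_C = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc * 2\n"

if __name__ == "__main__":
    print(structure_key(SNIPPET_A) == structure_key(SNIPPET_B))
    print(structure_key(SNIPPET_A) == structure_key(SNIPPET_C))
```

The first two snippets differ only in naming and map to the same key, while the third changes the returned expression and therefore maps to a different one. Exact string equality is a deliberately crude stand-in for the learned graph-view and path-view representations the abstract describes.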
14. Characterization of Intensity Correlation via Single-photon Detection in Quantum Key Distribution
- Author
Xing, Tianyi, Liu, Junxuan, Zhang, Likang, Wang, Min-Yan, Li, Yu-Huai, Liu, Ruiyin, Peng, Qingquan, Wang, Dongyang, Wang, Yaxuan, Liu, Hongwei, Li, Wei, Cao, Yuan, and Huang, Anqi
- Subjects
Quantum Physics
- Abstract
One of the most significant vulnerabilities in the source unit of quantum key distribution (QKD) is the correlation between quantum states after modulation, which must be characterized and evaluated for its effect on practical security performance. In this work, we propose a methodology to characterize the intensity correlation from the single-photon detection results in the measurement unit without modifying the configuration of the QKD system. In contrast to previous research that employs an extra classical optical detector to measure the correlation, our method directly analyses the detection data generated during the raw key exchange, enabling the correlation to be characterized during real-time system operation. The basic method is applied to a BB84 QKD system, and the security proof shows that the characterized correlation decreases the secure key rate. Furthermore, the method is extended and applied to characterize the correlation from the result of Bell-state measurement, which demonstrates its applicability to a running full-scheme MDI QKD system. This study provides an approach for standard certification of a QKD system.
- Published
- 2024
15. Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation
- Author
Liu, CanYi, Li, Wei, Youchen, Zhang, Li, Hui, and Ji, Rongrong
- Subjects
Computer Science - Information Retrieval
- Abstract
A sequential recommender system (SRS) predicts the next items that users may prefer based on their historical interaction sequences. Inspired by the rise of large language models (LLMs) in various AI applications, there is a surge of work on LLM-based SRS. Despite their attractive performance, existing LLM-based SRS still exhibit some limitations, including neglecting intra-item relations, ignoring long-term collaborative knowledge, and using inflexible architecture designs for adaption. To alleviate these issues, we propose an LLM-based SRS named MixRec. Built on top of coarse-grained adaption for capturing inter-item relations, MixRec is further enhanced with (1) context masking that models intra-item relations to help the LLM better understand token and item semantics in the context of SRS, (2) collaborative knowledge injection that helps the LLM incorporate long-term collaborative knowledge, and (3) a dynamic adaptive mixture-of-experts design that can flexibly choose expert architectures based on Bayesian optimization to better incorporate different sequential information. Extensive experiments demonstrate that MixRec can effectively handle sequential recommendation in a dynamic and adaptive manner. Comment: 11 pages, 14 figures
- Published
- 2024
16. Coupling Between Local and Global Oscillations in Palladium-Catalysed Methane Oxidation
- Author
Hu, Yuxiong, Hu, Jianyu, Sun, Mengzhao, Li, Aowen, Shi, Shucheng, Hu, P. J., Zhou, Wu, Willinger, Marc-Georg, Zhou, Dan, Liu, Zhi, Liu, Xi, Li, Wei-Xue, and Wang, Zhu-Jun
- Subjects
Physics - Chemical Physics; Nonlinear Sciences - Adaptation and Self-Organizing Systems; Nonlinear Sciences - Chaotic Dynamics; Physics - Applied Physics
- Abstract
The interplay between order and disorder is crucial across various fields, especially in understanding oscillatory phenomena. Periodic oscillations are frequently observed in heterogeneous catalysis, yet their underlying mechanisms need deeper exploration. Here, we investigate how periodic oscillations arise during methane oxidation catalysed by palladium nanoparticles (Pd NPs), utilizing a suite of complementary operando techniques across various spatial scales. We found that reaction intensity and collective dynamic modes can be tuned by the reactant gas-flow rate. At lower gas-flow rates, we observed periodic facet reconstruction of Pd NPs correlated with repeated bubbling behaviour at the Pd/PdO interface, without evident global oscillatory responses. Conversely, at higher gas-flow rates, Pd NPs undergo chaotic transformations between metallic and oxidized states, resulting in overall oscillation. Integrating our observations at different gas-flow rates, we attributed the emergence of global oscillation to thermal coupling regulated by gas flow and connected local and global dynamics through a weak synchronization mechanism. This work demonstrates the correlations between open surfaces and interfaces, chaos and regularity, and dissipative processes and coupling behaviour. Our findings offer critical insights into the complexity behind catalytic oscillations and provide guidance for modulating oscillatory behaviours in catalytic processes, with significant implications for both science and industry.
- Published
- 2024
17. ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
- Author
Li, Junxian, Zhang, Di, Wang, Xunzhi, Hao, Zeying, Lei, Jingdi, Tan, Qian, Zhou, Cai, Liu, Wei, Yang, Yaotian, Xiong, Xinrui, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Li, Wei, Zhang, Shufei, Su, Mao, Ouyang, Wanli, Li, Yuqiang, and Zhou, Dongzhan
- Subjects
Computer Science - Machine Learning; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper, we introduce ChemVLM, an open-source chemical multimodal large language model specifically designed for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. We develop three datasets for comprehensive evaluation, tailored to Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding tasks. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks. Our model can be found at https://huggingface.co/AI4Chem/ChemVLM-26B. Comment: 11 pages, updated version
- Published
- 2024
18. DeepInteraction++: Multi-Modality Interaction for Autonomous Driving
- Author
Yang, Zeyu, Song, Nan, Li, Wei, Zhu, Xiatian, Zhang, Li, and Torr, Philip H. S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Existing top-performing autonomous driving systems typically rely on a multi-modal fusion strategy for reliable scene understanding. This design is, however, fundamentally limited because it overlooks modality-specific strengths, which ultimately hampers model performance. To address this limitation, in this work, we introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout, enabling their unique characteristics to be exploited during the whole perception pipeline. To demonstrate the effectiveness of the proposed strategy, we design DeepInteraction++, a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Specifically, the encoder is implemented as a dual-stream Transformer with specialized attention operations for information exchange and integration between separate modality-specific representations. Our multi-modal representational learning incorporates both object-centric, precise sampling-based feature alignment and global dense information spreading, essential for the more challenging planning task. The decoder is designed to iteratively refine the predictions by alternately aggregating information from separate representations in a unified modality-agnostic manner, realizing multi-modal predictive interaction. Extensive experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks. Our code is available at https://github.com/fudan-zvg/DeepInteraction. Comment: Journal extension of NeurIPS 2022. arXiv admin note: text overlap with arXiv:2208.11112
- Published
- 2024
19. Respiratory Subtraction for Pulmonary Microwave Ablation Evaluation
- Author
Li, Wan, Zhong, Xinyun, Li, Wei, Zhang, Song, Rong, Moheng, Xi, Yan, Yuan, Peng, Wang, Zechen, Jiang, Xiaolei, Yi, Rongxi, Tang, Hui, Chen, Yang, Tong, Chaohui, Wu, Zhan, and Wang, Feng
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Currently, lung cancer is a leading cause of global cancer mortality, often necessitating minimally invasive interventions. Microwave ablation (MWA) is extensively utilized for both primary and secondary lung tumors. Although numerous clinical guidelines and standards for MWA have been established, the clinical evaluation of ablation surgery remains challenging and requires long-term patient follow-up for confirmation. In this paper, we propose a method termed respiratory subtraction to evaluate lung tumor ablation therapy performance based on pre- and post-operative image guidance. Initially, preoperative images undergo coarse rigid registration to their corresponding postoperative positions, followed by further non-rigid registration. Subsequently, subtraction images are generated by subtracting the registered preoperative images from the postoperative ones. Furthermore, to enhance the clinical assessment of MWA treatment performance, we devise a quantitative analysis metric to evaluate ablation efficacy by comparing differences between tumor areas and treatment areas. To the best of our knowledge, this is the pioneering work in the field to facilitate the assessment of MWA surgery performance on pulmonary tumors. Extensive experiments involving 35 clinical cases further validate the efficacy of the respiratory subtraction method. The experimental results confirm the effectiveness of the respiratory subtraction method and the proposed quantitative evaluation metric in assessing lung tumor treatment.
- Published
- 2024
20. GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
- Author
Chen, Pengcheng, Ye, Jin, Wang, Guoan, Li, Yanjun, Deng, Zhongying, Li, Wei, Li, Tianbin, Duan, Haodong, Huang, Ziyan, Su, Yanzhou, Wang, Benyou, Zhang, Shaoting, Fu, Bin, Cai, Jianfei, Zhuang, Bohan, Seibel, Eric J, He, Junjun, and Qiao, Yu
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Current benchmarks are often built upon specific academic literature, mainly focusing on a single domain, and lacking varying perceptual granularities. Thus, they face specific challenges, including limited clinical relevance, incomplete evaluations, and insufficient guidance for interactive LVLMs. To address these limitations, we developed the GMAI-MMBench, the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date. It is constructed from 285 datasets across 39 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format. Additionally, we implemented a lexical tree structure that allows users to customize evaluation tasks, accommodating various assessment needs and substantially supporting medical AI research and applications. We evaluated 50 LVLMs, and the results show that even the advanced GPT-4o only achieves an accuracy of 52%, indicating significant room for improvement. Moreover, we identified five key insufficiencies in current cutting-edge LVLMs that need to be addressed to advance the development of better medical applications. We believe that GMAI-MMBench will stimulate the community to build the next generation of LVLMs toward GMAI.
- Published
- 2024
21. MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
- Author
-
Mao, Xiaofeng, Jiang, Zhengkai, Wang, Qilin, Fu, Chencan, Zhang, Jiangning, Wu, Jiafu, Wang, Yabiao, Wang, Chengjie, Li, Wei, and Chi, Mingmin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advancements in the field of Diffusion Transformers have substantially improved the generation of high-quality 2D images, 3D videos, and 3D shapes. However, the effectiveness of the Transformer architecture in the domain of co-speech gesture generation remains relatively unexplored, as prior methodologies have predominantly employed Convolutional Neural Networks (CNNs) or only a few simple transformer layers. In an attempt to bridge this research gap, we introduce a novel Masked Diffusion Transformer for co-speech gesture generation, referred to as MDT-A2G, which directly implements the denoising process on gesture sequences. To enhance the contextual reasoning capability of temporally aligned speech-driven gestures, we incorporate a novel Masked Diffusion Transformer. This model employs a mask modeling scheme specifically designed to strengthen temporal relation learning among sequence gestures, thereby expediting the learning process and leading to coherent and realistic motions. Apart from audio, our MDT-A2G model also integrates multi-modal information, encompassing text, emotion, and identity. Furthermore, we propose an efficient inference strategy that diminishes the denoising computation by leveraging previously calculated results, thereby achieving a speedup with negligible performance degradation. Experimental results demonstrate that MDT-A2G excels in gesture generation, boasting a learning speed that is over 6$\times$ faster than traditional diffusion transformers and an inference speed that is 5.7$\times$ faster than the standard diffusion model.
- Published
- 2024
22. $\eta_{_{c2}}(^1D_{_2})$ and its electromagnetic decays
- Author
-
Du, Xin-Yao, Pe, Su-Yan, Li, Wei, Jia, Man, Li, Qiang, Wang, Tianhong, and Wang, Guo-Li
- Subjects
High Energy Physics - Phenomenology ,High Energy Physics - Experiment - Abstract
The spin-singlet state $\eta_{_{c2}}(^1D_{_2})$ has not been discovered experimentally and is the only missing low-excited $D$-wave charmonium, so in this paper we study its properties. Using the Bethe-Salpeter equation method, we obtain its mass as $3828.2$ MeV and its electromagnetic decay widths as $\Gamma[\eta_{_{c2}}(1D)\rightarrow h_{_{c}}(1P)\gamma]=284$ keV, $\Gamma[\eta_{_{c2}}(1D)\rightarrow J/\psi\gamma]=1.04$ keV, $\Gamma[\eta_{_{c2}}(1D)\rightarrow\psi(2S)\gamma]=3.08$ eV, and $\Gamma[\eta_{_{c2}}(1D)\rightarrow\psi(3770)\gamma]=0.143$ keV. We estimate its full width to be about $366$ keV, point out that the electromagnetic decay partial width is very sensitive to its mass, and show the variation of the width with the mass in the range of $3800\sim3872$ MeV. In our calculation, the emphasis is put on the relativistic corrections. Our results show that $\eta_{_{c2}}\rightarrow h_{_{c}}\gamma$ is an $E1+M2+E3$ decay dominated by the non-relativistic $E1$ transition, and that $\eta_{_{c2}}\rightarrow \psi\gamma$ is an $M1+E2+M3+E4$ decay in which the relativistic $E2$ transition contributes the most., Comment: 20 pages, 4 figures, 6 tables
- Published
- 2024
23. Magnetocaloric Effect of Topological Excitations in Kitaev Magnets
- Author
-
Li, Han, Lv, Enze, Xi, Ning, Gao, Yuan, Qi, Yang, Li, Wei, and Su, Gang
- Subjects
Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Statistical Mechanics - Abstract
Traditional magnetic sub-Kelvin cooling relies on the nearly free local moments in hydrate paramagnetic salts, whose utility is hampered by the dilute magnetic ions and low thermal conductivity. Here we propose to use instead fractional excitations inherent to quantum spin liquids (QSLs) as an alternative, which are sensitive to external fields and can induce a very distinctive magnetocaloric effect. With a state-of-the-art tensor-network approach, we compute low-temperature properties of the Kitaev honeycomb model. For the ferromagnetic case, a strong demagnetization cooling effect is observed due to the nearly free $Z_2$ vortices via spin fractionalization, described by a paramagnetic equation of state with a renormalized Curie constant. For the antiferromagnetic Kitaev case, we uncover an intermediate-field gapless QSL phase with very large spin entropy, possibly due to the emergence of a spinon Fermi surface. Potential realization of topological excitation cooling in Kitaev materials is also discussed, which may offer a promising pathway to circumvent existing limitations in the paramagnetic hydrates., Comment: 10 pages, 4 figures; supplementary materials; to appear in Nat. Commun. (2024)
- Published
- 2024
24. Emergent quantum disordered phase in Na$_2$Co$_2$TeO$_6$ under intermediate magnetic field along $c$ axis
- Author
-
Zhou, Xu-Guang, Li, Han, Kim, Chaebin, Matsuo, Akira, Mehlawat, Kavita, Matsui, Kazuki, Yang, Zhuo, Miyata, Atsuhiko, Su, Gang, Kindo, Koichi, Park, Je-Geun, Kohama, Yoshimitsu, Li, Wei, and Matsuda, Yasuhiro H.
- Subjects
Condensed Matter - Strongly Correlated Electrons - Abstract
Identifying the exotic quantum spin liquid phase in Kitaev magnets has garnered great research interest and remains a significant challenge. In experiments, most of the proposed candidate materials exhibit an antiferromagnetic (AFM) order at low temperatures, so the challenge becomes the search for a field-driven disordered phase that is distinct from the partially polarized paramagnetic phase after suppressing the AFM order. Recently, Na$_2$Co$_2$TeO$_6$ has been proposed as one of the prime candidates, where the Kitaev interaction is realized by the high-spin $t^{5}_{2g}e^2_g$ configuration, and spin-orbit entangled $J_{\rm eff} = 1/2$ state in a bond-edge shared honeycomb lattice. In this study, we identify an emergent intermediate disordered phase induced by an external field along the $c$-axis of the honeycomb plane. This phase is characterized through magnetization and magnetocaloric effect experiments in high magnetic fields. To explain the experimental results, we propose an effective spin model with large AFM Kitaev interaction, which yields results in good agreement with both our findings and previously reported data. We determine that the effective $K$-$J$-$\Gamma$-$\Gamma'$ model for Na$_2$Co$_2$TeO$_6$ is nearly dual to that of $\alpha$-RuCl$_3$ under a unitary transformation. Given the insignificant fragility of Na$_2$Co$_2$TeO$_6$ samples, further high-field experiments can be conducted to explore this intermediate-field quantum spin disordered phase., Comment: 12 pages, 8 figures
- Published
- 2024
25. MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code
- Author
-
Ning, Kaiwen, Chen, Jiachi, Zhong, Qingyuan, Zhang, Tao, Wang, Yanlin, Li, Wei, Zhang, Yu, Zhang, Weizhe, and Zheng, Zibin
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Software Engineering - Abstract
With the advent of large language models (LLMs), numerous software service providers (SSPs) are dedicated to developing LLMs customized for code generation tasks, such as CodeLlama and Copilot. However, these LLMs can be leveraged by attackers to create malicious software, which may pose potential threats to the software ecosystem. For example, they can automate the creation of advanced phishing malware. To address this issue, we first conduct an empirical study and design a prompt dataset, MCGTest, which involves approximately 400 person-hours of work and consists of 406 malicious code generation tasks. Utilizing this dataset, we propose MCGMark, the first robust, code structure-aware, and encodable watermarking approach to trace LLM-generated code. We embed encodable information by controlling the token selection and ensuring the output quality based on probabilistic outliers. Additionally, we enhance the robustness of the watermark by considering the structural features of malicious code, preventing the embedding of the watermark in easily modified positions, such as comments. We validate the effectiveness and robustness of MCGMark on the DeepSeek-Coder. MCGMark achieves an embedding success rate of 88.9% within a maximum output limit of 400 tokens. Furthermore, it also demonstrates strong robustness and has minimal impact on the quality of the output code. Our approach assists SSPs in tracing and holding responsible parties accountable for malicious code generated by LLMs.
- Published
- 2024
26. Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging
- Author
-
Wu, Wenhua, Hu, Kun, Yue, Wenxi, Li, Wei, Simic, Milena, Li, Changyang, Xiang, Wei, and Wang, Zhiyong
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Knee osteoarthritis (KOA), a common form of arthritis that causes physical disability, has become increasingly prevalent in society. Employing computer-aided techniques to automatically assess the severity and progression of KOA can greatly benefit KOA treatment and disease management. Particularly, the advancement of X-ray technology in KOA demonstrates its potential for this purpose. Yet, existing X-ray prognosis research generally yields a singular progression severity grade, overlooking the potential visual changes for understanding and explaining the progression outcome. Therefore, in this study, a novel generative model is proposed, namely Identity-Consistent Radiographic Diffusion Network (IC-RDN), for multifaceted KOA prognosis encompassing a predicted future knee X-ray scan conditioned on the baseline scan. Specifically, an identity prior module for the diffusion and a downstream generation-guided progression prediction module are introduced. Compared to conventional image-to-image generative models, identity priors regularize and guide the diffusion to focus more on the clinical nuances of the prognosis based on a contrastive learning strategy. The progression prediction module utilizes both forecasted and baseline knee scans, and a more comprehensive formulation of KOA severity progression grading is expected. Extensive experiments on a widely used public dataset, OAI, demonstrate the effectiveness of the proposed method., Comment: Accepted by ECCV 2024
- Published
- 2024
27. The Llama 3 Herd of Models
- Author
-
Dubey, Abhimanyu, Jauhri, Abhinav, Pandey, Abhinav, Kadian, Abhishek, Al-Dahle, Ahmad, Letman, Aiesha, Mathur, Akhil, Schelten, Alan, Yang, Amy, Fan, Angela, Goyal, Anirudh, Hartshorn, Anthony, Yang, Aobo, Mitra, Archi, Sravankumar, Archie, Korenev, Artem, Hinsvark, Arthur, Rao, Arun, Zhang, Aston, Rodriguez, Aurelien, Gregerson, Austen, Spataru, Ava, Roziere, Baptiste, Biron, Bethany, Tang, Binh, Chern, Bobbie, Caucheteux, Charlotte, Nayak, Chaya, Bi, Chloe, Marra, Chris, McConnell, Chris, Keller, Christian, Touret, Christophe, Wu, Chunyang, Wong, Corinne, Ferrer, Cristian Canton, Nikolaidis, Cyrus, Allonsius, Damien, Song, Daniel, Pintz, Danielle, Livshits, Danny, Esiobu, David, Choudhary, Dhruv, Mahajan, Dhruv, Garcia-Olano, Diego, Perino, Diego, Hupkes, Dieuwke, Lakomkin, Egor, AlBadawy, Ehab, Lobanova, Elina, Dinan, Emily, Smith, Eric Michael, Radenovic, Filip, Zhang, Frank, Synnaeve, Gabriel, Lee, Gabrielle, Anderson, Georgia Lewis, Nail, Graeme, Mialon, Gregoire, Pang, Guan, Cucurell, Guillem, Nguyen, Hailey, Korevaar, Hannah, Xu, Hu, Touvron, Hugo, Zarov, Iliyan, Ibarra, Imanol Arrieta, Kloumann, Isabel, Misra, Ishan, Evtimov, Ivan, Copet, Jade, Lee, Jaewon, Geffert, Jan, Vranes, Jana, Park, Jason, Mahadeokar, Jay, Shah, Jeet, van der Linde, Jelmer, Billock, Jennifer, Hong, Jenny, Lee, Jenya, Fu, Jeremy, Chi, Jianfeng, Huang, Jianyu, Liu, Jiawen, Wang, Jie, Yu, Jiecao, Bitton, Joanna, Spisak, Joe, Park, Jongsoo, Rocca, Joseph, Johnstun, Joshua, Saxe, Joshua, Jia, Junteng, Alwala, Kalyan Vasuden, Upasani, Kartikeya, Plawiak, Kate, Li, Ke, Heafield, Kenneth, Stone, Kevin, El-Arini, Khalid, Iyer, Krithika, Malik, Kshitiz, Chiu, Kuenley, Bhalla, Kunal, Rantala-Yeary, Lauren, van der Maaten, Laurens, Chen, Lawrence, Tan, Liang, Jenkins, Liz, Martin, Louis, Madaan, Lovish, Malo, Lubo, Blecher, Lukas, Landzaat, Lukas, de Oliveira, Luke, Muzzi, Madeline, Pasupuleti, Mahesh, Singh, Mannat, Paluri, Manohar, Kardas, Marcin, Oldham, Mathew, Rita, Mathieu, Pavlova, 
Maya, Kambadur, Melanie, Lewis, Mike, Si, Min, Singh, Mitesh Kumar, Hassan, Mona, Goyal, Naman, Torabi, Narjes, Bashlykov, Nikolay, Bogoychev, Nikolay, Chatterji, Niladri, Duchenne, Olivier, Çelebi, Onur, Alrassy, Patrick, Zhang, Pengchuan, Li, Pengwei, Vasic, Petar, Weng, Peter, Bhargava, Prajjwal, Dubal, Pratik, Krishnan, Praveen, Koura, Punit Singh, Xu, Puxin, He, Qing, Dong, Qingxiao, Srinivasan, Ragavan, Ganapathy, Raj, Calderer, Ramon, Cabral, Ricardo Silveira, Stojnic, Robert, Raileanu, Roberta, Girdhar, Rohit, Patel, Rohit, Sauvestre, Romain, Polidoro, Ronnie, Sumbaly, Roshan, Taylor, Ross, Silva, Ruan, Hou, Rui, Wang, Rui, Hosseini, Saghar, Chennabasappa, Sahana, Singh, Sanjay, Bell, Sean, Kim, Seohyun Sonia, Edunov, Sergey, Nie, Shaoliang, Narang, Sharan, Raparthy, Sharath, Shen, Sheng, Wan, Shengye, Bhosale, Shruti, Zhang, Shun, Vandenhende, Simon, Batra, Soumya, Whitman, Spencer, Sootla, Sten, Collot, Stephane, Gururangan, Suchin, Borodinsky, Sydney, Herman, Tamar, Fowler, Tara, Sheasha, Tarek, Georgiou, Thomas, Scialom, Thomas, Speckbacher, Tobias, Mihaylov, Todor, Xiao, Tong, Karn, Ujjwal, Goswami, Vedanuj, Gupta, Vibhor, Ramanathan, Vignesh, Kerkez, Viktor, Gonguet, Vincent, Do, Virginie, Vogeti, Vish, Petrovic, Vladan, Chu, Weiwei, Xiong, Wenhan, Fu, Wenyin, Meers, Whitney, Martinet, Xavier, Wang, Xiaodong, Tan, Xiaoqing Ellen, Xie, Xinfeng, Jia, Xuchao, Wang, Xuewei, Goldschlag, Yaelle, Gaur, Yashesh, Babaei, Yasmine, Wen, Yi, Song, Yiwen, Zhang, Yuchen, Li, Yue, Mao, Yuning, Coudert, Zacharie Delpierre, Yan, Zheng, Chen, Zhengxing, Papakipos, Zoe, Singh, Aaditya, Grattafiori, Aaron, Jain, Abha, Kelsey, Adam, Shajnfeld, Adam, Gangidi, Adithya, Victoria, Adolfo, Goldstand, Ahuva, Menon, Ajay, Sharma, Ajay, Boesenberg, Alex, Vaughan, Alex, Baevski, Alexei, Feinstein, Allie, Kallet, Amanda, Sangani, Amit, Yunus, Anam, Lupu, Andrei, Alvarado, Andres, Caples, Andrew, Gu, Andrew, Ho, Andrew, Poulton, Andrew, Ryan, Andrew, Ramchandani, Ankit, Franco, 
Annie, Saraf, Aparajita, Chowdhury, Arkabandhu, Gabriel, Ashley, Bharambe, Ashwin, Eisenman, Assaf, Yazdan, Azadeh, James, Beau, Maurer, Ben, Leonhardi, Benjamin, Huang, Bernie, Loyd, Beth, De Paola, Beto, Paranjape, Bhargavi, Liu, Bing, Wu, Bo, Ni, Boyu, Hancock, Braden, Wasti, Bram, Spence, Brandon, Stojkovic, Brani, Gamido, Brian, Montalvo, Britt, Parker, Carl, Burton, Carly, Mejia, Catalina, Wang, Changhan, Kim, Changkyu, Zhou, Chao, Hu, Chester, Chu, Ching-Hsiang, Cai, Chris, Tindal, Chris, Feichtenhofer, Christoph, Civin, Damon, Beaty, Dana, Kreymer, Daniel, Li, Daniel, Wyatt, Danny, Adkins, David, Xu, David, Testuggine, Davide, David, Delia, Parikh, Devi, Liskovich, Diana, Foss, Didem, Wang, Dingkang, Le, Duc, Holland, Dustin, Dowling, Edward, Jamil, Eissa, Montgomery, Elaine, Presani, Eleonora, Hahn, Emily, Wood, Emily, Brinkman, Erik, Arcaute, Esteban, Dunbar, Evan, Smothers, Evan, Sun, Fei, Kreuk, Felix, Tian, Feng, Ozgenel, Firat, Caggioni, Francesco, Guzmán, Francisco, Kanayet, Frank, Seide, Frank, Florez, Gabriela Medina, Schwarz, Gabriella, Badeer, Gada, Swee, Georgia, Halpern, Gil, Thattai, Govind, Herman, Grant, Sizov, Grigory, Guangyi, Zhang, Lakshminarayanan, Guna, Shojanazeri, Hamid, Zou, Han, Wang, Hannah, Zha, Hanwen, Habeeb, Haroun, Rudolph, Harrison, Suk, Helen, Aspegren, Henry, Goldman, Hunter, Damlaj, Ibrahim, Molybog, Igor, Tufanov, Igor, Veliche, Irina-Elena, Gat, Itai, Weissman, Jake, Geboski, James, Kohli, James, Asher, Japhet, Gaya, Jean-Baptiste, Marcus, Jeff, Tang, Jeff, Chan, Jennifer, Zhen, Jenny, Reizenstein, Jeremy, Teboul, Jeremy, Zhong, Jessica, Jin, Jian, Yang, Jingyi, Cummings, Joe, Carvill, Jon, Shepard, Jon, McPhie, Jonathan, Torres, Jonathan, Ginsburg, Josh, Wang, Junjie, Wu, Kai, U, Kam Hou, Saxena, Karan, Prasad, Karthik, Khandelwal, Kartikay, Zand, Katayoun, Matosich, Kathy, Veeraraghavan, Kaushik, Michelena, Kelly, Li, Keqian, Huang, Kun, Chawla, Kunal, Lakhotia, Kushal, Huang, Kyle, Chen, Lailin, Garg, Lakshya, A, 
Lavender, Silva, Leandro, Bell, Lee, Zhang, Lei, Guo, Liangpeng, Yu, Licheng, Moshkovich, Liron, Wehrstedt, Luca, Khabsa, Madian, Avalani, Manav, Bhatt, Manish, Tsimpoukelli, Maria, Mankus, Martynas, Hasson, Matan, Lennie, Matthew, Reso, Matthias, Groshev, Maxim, Naumov, Maxim, Lathi, Maya, Keneally, Meghan, Seltzer, Michael L., Valko, Michal, Restrepo, Michelle, Patel, Mihir, Vyatskov, Mik, Samvelyan, Mikayel, Clark, Mike, Macey, Mike, Wang, Mike, Hermoso, Miquel Jubert, Metanat, Mo, Rastegari, Mohammad, Bansal, Munish, Santhanam, Nandhini, Parks, Natascha, White, Natasha, Bawa, Navyata, Singhal, Nayan, Egebo, Nick, Usunier, Nicolas, Laptev, Nikolay Pavlovich, Dong, Ning, Zhang, Ning, Cheng, Norman, Chernoguz, Oleg, Hart, Olivia, Salpekar, Omkar, Kalinli, Ozlem, Kent, Parkin, Parekh, Parth, Saab, Paul, Balaji, Pavan, Rittner, Pedro, Bontrager, Philip, Roux, Pierre, Dollar, Piotr, Zvyagina, Polina, Ratanchandani, Prashant, Yuvraj, Pritish, Liang, Qian, Alao, Rachad, Rodriguez, Rachel, Ayub, Rafi, Murthy, Raghotham, Nayani, Raghu, Mitra, Rahul, Li, Raymond, Hogan, Rebekkah, Battey, Robin, Wang, Rocky, Maheswari, Rohan, Howes, Russ, Rinott, Ruty, Bondu, Sai Jayesh, Datta, Samyak, Chugh, Sara, Hunt, Sara, Dhillon, Sargun, Sidorov, Sasha, Pan, Satadru, Verma, Saurabh, Yamamoto, Seiji, Ramaswamy, Sharadh, Lindsay, Shaun, Feng, Sheng, Lin, Shenghao, Zha, Shengxin Cindy, Shankar, Shiva, Zhang, Shuqiang, Wang, Sinong, Agarwal, Sneha, Sajuyigbe, Soji, Chintala, Soumith, Max, Stephanie, Chen, Stephen, Kehoe, Steve, Satterfield, Steve, Govindaprasad, Sudarshan, Gupta, Sumit, Cho, Sungmin, Virk, Sunny, Subramanian, Suraj, Choudhury, Sy, Goldman, Sydney, Remez, Tal, Glaser, Tamar, Best, Tamara, Kohler, Thilo, Robinson, Thomas, Li, Tianhe, Zhang, Tianjun, Matthews, Tim, Chou, Timothy, Shaked, Tzook, Vontimitta, Varun, Ajayi, Victoria, Montanez, Victoria, Mohan, Vijai, Kumar, Vinay Satish, Mangla, Vishal, Albiero, Vítor, Ionescu, Vlad, Poenaru, Vlad, Mihailescu, Vlad Tiberiu, 
Ivanov, Vladimir, Li, Wei, Wang, Wenchen, Jiang, Wenwen, Bouaziz, Wes, Constable, Will, Tang, Xiaocheng, Wang, Xiaofang, Wu, Xiaojian, Wang, Xiaolan, Xia, Xide, Wu, Xilun, Gao, Xinbo, Chen, Yanjun, Hu, Ye, Jia, Ye, Qi, Ye, Li, Yenda, Zhang, Yilin, Zhang, Ying, Adi, Yossi, Nam, Youngjin, Yu, Wang, Hao, Yuchen, Qian, Yundi, He, Yuzi, Rait, Zach, DeVito, Zachary, Rosnbrick, Zef, Wen, Zhaoduo, Yang, Zhenyu, and Zhao, Zhiwei
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
- Published
- 2024
28. Dynamic Object Queries for Transformer-based Incremental Object Detection
- Author
-
Zhang, Jichuan, Li, Wei, Cheng, Shuang, Li, Ya-Li, and Wang, Shengjin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Incremental object detection (IOD) aims to sequentially learn new classes, while maintaining the capability to locate and identify old ones. As the training data arrives with annotations only for new classes, IOD suffers from catastrophic forgetting. Prior methodologies mainly tackle the forgetting issue through knowledge distillation and exemplar replay, ignoring the conflict between limited model capacity and increasing knowledge. In this paper, we explore dynamic object queries for incremental object detection built on the Transformer architecture. We propose the Dynamic object Query-based DEtection TRansformer (DyQ-DETR), which incrementally expands the model representation ability to achieve a stability-plasticity tradeoff. First, a new set of learnable object queries is fed into the decoder to represent new classes. These new object queries are aggregated with those from previous phases to adapt both old and new knowledge well. Second, we propose the isolated bipartite matching for object queries in different phases, based on disentangled self-attention. The interaction among the object queries at different phases is eliminated to reduce inter-class confusion. Thanks to the separate supervision and computation over object queries, we further present the risk-balanced partial calibration for effective exemplar replay. Extensive experiments demonstrate that DyQ-DETR significantly surpasses the state-of-the-art methods, with limited parameter overhead. Code will be made publicly available.
- Published
- 2024
29. Variational Monte Carlo Study of the 1/9 Magnetization Plateau in Kagome Antiferromagnets
- Author
-
He, Li-Wei, Yu, Shun-Li, and Li, Jian-Xin
- Subjects
Condensed Matter - Strongly Correlated Electrons - Abstract
Motivated by very recent experimental observations of the 1/9 magnetization plateaus in YCu$_3$(OH)$_{6+x}$Br$_{3-x}$ and YCu$_3$(OD)$_{6+x}$Br$_{3-x}$, our study delves into the magnetic field-induced phase transitions in the nearest-neighbor antiferromagnetic Heisenberg model on the kagome lattice using the variational Monte Carlo technique. We uncover a phase transition from a zero-field Dirac spin liquid to a field-induced magnetically disordered phase that exhibits the 1/9 magnetization plateau. Through a comprehensive analysis encompassing the magnetization distribution, spin correlations, chiral order parameter, topological entanglement entropy, ground-state degeneracy, Chern number and excitation spectrum, we pinpoint the phase associated with this magnetization plateau as a chiral $\mathbb{Z}_3$ topological quantum spin liquid and elucidate its diverse physical properties., Comment: Accepted by Phys. Rev. Lett
- Published
- 2024
30. Spinon quantum spin Hall state in the kagome antiferromagnet with a Dzyaloshinskii-Moriya interaction
- Author
-
He, Li-Wei and Li, Jian-Xin
- Subjects
Condensed Matter - Strongly Correlated Electrons - Abstract
We investigate the spin-$\frac{1}{2}$ antiferromagnetic Heisenberg model with a Dzyaloshinskii-Moriya interaction on the kagome lattice, making use of the variational Monte Carlo technique. An exotic quantum spin state is found to arise from a melting of the $\boldsymbol{Q} = 0$ long-range magnetic order by a topological transition, when a small anisotropic third nearest-neighbor antiferromagnetic Heisenberg interaction is turned on. This novel state is a gapped quantum spin liquid, characterized by a topological order with ground-state degeneracy $n_g = 4$ and topological entanglement entropy $\gamma = \ln 2$, suggesting it is an Abelian topological phase. Furthermore, the Chern numbers of the spin-up (-down) spinon occupied bands of this state are $C_{\uparrow \downarrow} = \pm 1$, respectively. From this perspective, this state is also a time-reversal symmetric (total Chern number $C_{total} = 0$) topological insulator with spinons as the chiral edge states, which carry opposite spins and move in opposite directions. It is analogous to the quantum spin Hall state, but the spin current is carried by deconfined spinons in a quantum spin liquid, so it is dubbed the spinon quantum spin Hall state.
- Published
- 2024
31. Identification and multiply robust estimation of causal effects via instrumental variables from an auxiliary heterogeneous population
- Author
-
Li, Wei, Liu, Jiapeng, Ding, Peng, and Geng, Zhi
- Subjects
Statistics - Methodology - Abstract
Evaluating causal effects in a primary population of interest with unmeasured confounders is challenging. Although instrumental variables (IVs) are widely used to address unmeasured confounding, they may not always be available in the primary population. Fortunately, IVs might have been used in previous observational studies on similar causal problems, and these auxiliary studies can be useful to infer causal effects in the primary population, even if they represent different populations. However, existing methods often assume homogeneity or equality of conditional average treatment effects between the primary and auxiliary populations, which may be restrictive in practice. This paper aims to remove the homogeneity requirement and establish a novel identifiability result allowing for different conditional average treatment effects across populations. We also construct a multiply robust estimator that remains consistent despite partial misspecifications of the observed data model and achieves local efficiency if all nuisance models are correct. The proposed approach is illustrated through simulation studies. Finally, we apply our approach by leveraging data from lower-income individuals, with cigarette price as a valid IV, to evaluate the causal effect of smoking on physical functional status in a higher-income group where strong IVs are not available.
- Published
- 2024
32. Low latency carbon budget analysis reveals a large decline of the land carbon sink in 2023
- Author
-
Ke, Piyu, Ciais, Philippe, Sitch, Stephen, Li, Wei, Bastos, Ana, Liu, Zhu, Xu, Yidi, Gui, Xiaofan, Bian, Jiang, Goll, Daniel S, Xi, Yi, Li, Wanjing, O'Sullivan, Michael, de Souza, Jeffeson Goncalves, Friedlingstein, Pierre, and Chevallier, Frederic
- Subjects
Physics - Atmospheric and Oceanic Physics - Abstract
In 2023, the CO2 growth rate was 3.37 +/- 0.11 ppm at Mauna Loa, 86% above the previous year, and hitting a record high since observations began in 1958, while global fossil fuel CO2 emissions only increased by 0.6 +/- 0.5%. This implies an unprecedented weakening of land and ocean sinks, and raises the question of where and why this reduction happened. Here we show a global net land CO2 sink of 0.44 +/- 0.21 GtC yr-1, the weakest since 2003. We used dynamic global vegetation models, satellite fire emissions, an atmospheric inversion based on OCO-2 measurements, and emulators of ocean biogeochemical and data-driven models to deliver a fast-track carbon budget in 2023. Those models ensured consistency with previous carbon budgets. Regional flux anomalies from 2015-2022 are consistent between top-down and bottom-up approaches, with the largest abnormal carbon loss in the Amazon during the drought in the second half of 2023 (0.31 +/- 0.19 GtC yr-1), extreme fire emissions of 0.58 +/- 0.10 GtC yr-1 in Canada and a loss in South-East Asia (0.13 +/- 0.12 GtC yr-1). Since 2015, land CO2 uptake north of 20 degrees N declined by half to 1.13 +/- 0.24 GtC yr-1 in 2023. Meanwhile, the tropics recovered from the 2015-16 El Nino carbon loss, gained carbon during the La Nina years (2020-2023), then switched to a carbon loss during the 2023 El Nino (0.56 +/- 0.23 GtC yr-1). The ocean sink was stronger than normal in the equatorial eastern Pacific due to reduced upwelling from La Nina's retreat in early 2023 and the development of El Nino later. Land regions exposed to extreme heat in 2023 contributed a gross carbon loss of 1.73 GtC yr-1, indicating that record warming in 2023 had a strong negative impact on the capacity of terrestrial ecosystems to mitigate climate change.
- Published
- 2024
33. Discovery and inference of possibly bi-directional causal relationships with invalid instrumental variables
- Author
-
Li, Wei, Duan, Rui, and Li, Sai
- Subjects
Statistics - Methodology - Abstract
Learning causal relationships between pairs of complex traits from observational studies is of great interest across various scientific domains. However, most existing methods assume the absence of unmeasured confounding and restrict causal relationships between two traits to be uni-directional, which may be violated in real-world systems. In this paper, we address the challenge of causal discovery and effect inference for two traits while accounting for unmeasured confounding and potential feedback loops. By leveraging possibly invalid instrumental variables, we provide identification conditions for causal parameters in a model that allows for bi-directional relationships, and we also establish identifiability of the causal direction under the introduced conditions. Then we propose a data-driven procedure to detect the causal direction and provide inference results about causal effects along the identified direction. We show that our method consistently recovers the true direction and produces valid confidence intervals for the causal effect. We conduct extensive simulation studies to show that our proposal outperforms existing methods. We finally apply our method to analyze real data sets from UK Biobank.
- Published
- 2024
34. Friedkin-Johnsen Model for Opinion Dynamics on Signed Graphs
- Author
-
Zhou, Xiaotian, Sun, Haoxin, Xu, Wanyue, Li, Wei, and Zhang, Zhongzhi
- Subjects
Computer Science - Social and Information Networks ,Computer Science - Networking and Internet Architecture - Abstract
A signed graph offers richer information than an unsigned graph, since it describes both collaborative and competitive relationships in social networks. In this paper, we study opinion dynamics on a signed graph, based on the Friedkin-Johnsen model. We first interpret the equilibrium opinion in terms of a defined random walk on an augmented signed graph, by representing the equilibrium opinion of every node as a combination of all nodes' internal opinions, with the coefficient of the internal opinion for each node being the difference of two absorbing probabilities. We then quantify some relevant social phenomena and express them in terms of the $\ell_2$ norms of vectors. We also design a nearly-linear time signed Laplacian solver for assessing these quantities, by establishing a connection between the absorbing probability of random walks on a signed graph and that on an associated unsigned graph. We further study the opinion optimization problem by changing the initial opinions of a fixed number of nodes, which can be optimally solved in cubic time. We provide a nearly-linear time algorithm with error guarantee to approximately solve the problem. Finally, we execute extensive experiments on sixteen real-life signed networks, which show that both of our algorithms are effective and efficient, and are scalable to massive graphs with over 20 million nodes.
- Published
- 2024
35. Multi-Granularity Semantic Revision for Large Language Model Distillation
- Author
-
Liu, Xiaoyu, Zhang, Yun, Li, Wei, Li, Simiao, Huang, Xudong, Chen, Hanting, Tang, Yehui, Hu, Jie, Xiong, Zhiwei, and Wang, Yunhe
- Subjects
Computer Science - Computation and Language
- Abstract
Knowledge distillation plays a key role in compressing the Large Language Models (LLMs), which boosts a small-size student model under large teacher models' guidance. However, existing LLM distillation methods overly rely on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in previous art struggle to align the most informative part due to the complex distribution of LLMs' outputs. To address these problems, we propose a multi-granularity semantic revision method for LLM distillation. At the sequence level, we propose a sequence correction and re-generation (SCRG) strategy. SCRG first calculates the semantic cognitive difference between the teacher and student to detect the error token, then corrects it with the teacher-generated one, and re-generates the sequence to reduce generation errors and enhance generation diversity. At the token level, we design a distribution adaptive clipping Kullback-Leibler (DAC-KL) loss as the distillation objective function. DAC-KL loss exploits a learnable sub-network to adaptively extract semantically dense areas from the teacher's output, avoiding the interference of redundant information in the distillation process. Finally, at the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent, further enhancing the transfer of semantic information. Extensive experiments across different model families with parameters ranging from 0.1B to 13B demonstrate the superiority of our method compared to existing methods.
- Published
- 2024
36. Generalizable Implicit Motion Modeling for Video Frame Interpolation
- Author
-
Guo, Zujin, Li, Wei, and Loy, Chen Change
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline to model spatiotemporal motion latent from bidirectional flows extracted from pre-trained flow estimators, effectively representing input-specific motion priors. Then, we implicitly predict arbitrary-timestep optical flows within two adjacent input frames via an adaptive coordinate-based neural network, with spatiotemporal coordinates and motion latent as inputs. Our GIMM can be smoothly integrated with existing flow-based VFI works without further modifications. We show that GIMM performs better than the current state of the art on the VFI benchmarks., Comment: Project Page: https://gseancdat.github.io/projects/GIMMVFI
- Published
- 2024
37. Investigating Public Fine-Tuning Datasets: A Complex Review of Current Practices from a Construction Perspective
- Author
-
Ma, Runyuan, Li, Wei, and Shang, Fukai
- Subjects
Computer Science - Computation and Language
- Abstract
With the rapid development of the large model domain, research related to fine-tuning has concurrently seen significant advancement, given that fine-tuning is a constituent part of the training process for large-scale models. Data engineering plays a fundamental role in the training process of models, which includes data infrastructure, data processing, etc. Data during fine-tuning likewise forms the base for large models. In order to embrace the power and explore new possibilities of fine-tuning datasets, this paper reviews current public fine-tuning datasets from the perspective of data construction. An overview of public fine-tuning datasets from two sides: evolution and taxonomy, is provided in this review, aiming to chart the development trajectory. Construction techniques and methods for public fine-tuning datasets of Large Language Models (LLMs), including data generation and data augmentation among others, are detailed. This elaboration follows the aforementioned taxonomy, specifically across demonstration, comparison, and generalist categories. Additionally, a category tree of data generation techniques has been abstracted in our review to assist researchers in gaining a deeper understanding of fine-tuning datasets from the construction dimension. Our review also summarizes the construction features in different data preparation phases of current practices in this field, aiming to provide a comprehensive overview and inform future research. Fine-tuning dataset practices, encompassing various data modalities, are also discussed from a construction perspective in our review. Towards the end of the article, we offer insights and considerations regarding the future construction and developments of fine-tuning datasets.
- Published
- 2024
38. LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
- Author
-
Li, Feng, Zhang, Renrui, Zhang, Hao, Zhang, Yuanhan, Li, Bo, Li, Wei, Ma, Zejun, and Li, Chunyuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Visual instruction tuning has made considerable strides in enhancing the capabilities of Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single-image tasks, and their application to multi-image scenarios remains less explored. Additionally, prior LMM research tackles different scenarios separately, making it impossible to generalize across scenarios with newly emerging capabilities. To this end, we introduce LLaVA-NeXT-Interleave, which simultaneously tackles Multi-image, Multi-frame (video), Multi-view (3D), and Multi-patch (single-image) scenarios in LMMs. To enable these capabilities, we regard the interleaved data format as a general template and compile the M4-Instruct dataset with 1,177.6k samples, spanning 4 primary domains with 14 tasks and 41 datasets. We also curate the LLaVA-Interleave Bench to comprehensively evaluate the multi-image performance of LMMs. Through extensive experiments, LLaVA-NeXT-Interleave achieves leading results in multi-image, video, and 3D benchmarks, while maintaining the performance of single-image tasks. Besides, our model also exhibits several emerging capabilities, e.g., transferring tasks across different settings and modalities. Code is available at https://github.com/LLaVA-VL/LLaVA-NeXT, Comment: Project Page: https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/
- Published
- 2024
39. Quantum Supercriticality in the Ising Model and Rydberg Atom Array
- Author
-
Wang, Junsen, Lv, Enze, Li, Xinyang, Jin, Yuliang, and Li, Wei
- Subjects
Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Quantum Gases, Condensed Matter - Statistical Mechanics, Quantum Physics
- Abstract
Supercriticality, featured with universal scaling behaviors, emerges as an intriguing phenomenon proximate to the classical liquid-gas critical point. In this study, we extend this significant concept to quantum many-body systems near the quantum critical point (QCP), employing tensor network calculations and scaling analyses of the Ising model and Rydberg atom array. The supercritical, fluid-like, quantum states are found to be strongly fluctuating and highly entangled, as characterized by the universal scalings in susceptibility $\chi_z \sim (h_x-h_x^c)^{-\gamma}$, correlation length $\xi \sim (h_x-h_x^c)^{-\nu}$, fidelity susceptibility $\chi_F \sim (h_x - h_x^c)^{d\nu - 2}$, and entanglement entropy $S_{\rm E} \sim \ln{(h_x - h_x^c)}$. Here, $\gamma$ and $\nu$ represent critical exponents, $d$ is the dimension of the system, and $h_x^c$ is the critical transverse field of the Ising QCP. The universal scaling behaviors are revealed in the regime enclosed by two quantum supercritical crossover lines in the longitudinal-transverse field ($h_z$-$h_x$) plane, $|h_z| \propto (h_x - h_x^c)^{\beta + \gamma}$ relating to critical exponents $\beta$ and $\gamma$, where the response functions, measures of entanglement, and fidelity susceptibility reach their maxima. We propose that Rydberg atom arrays and quantum Ising magnets provide available platforms for exploring emergent supercritical phenomena and identifying the universal scalings. The present work establishes a foundation for exploring quantum supercriticality in magnetic systems and through quantum simulations., Comment: 8 pages, 5 figures (SM 5 pages, 6 figures)
- Published
- 2024
40. Music Era Recognition Using Supervised Contrastive Learning and Artist Information
- Author
-
He, Qiqi, Song, Xuchen, Hao, Weituo, Wang, Ju-Chiang, Lu, Wei-Tsung, and Li, Wei
- Subjects
Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Does popular music from the 60s sound different from that of the 90s? Prior studies have shown variations in patterns and regularities related to instrumentation changes and growing loudness across multi-decadal trends. This indicates that perceiving the era of a song from musical features such as audio and artist information is possible. Music era information can be an important feature for playlist generation and recommendation. However, the release year of a song can be inaccessible in many circumstances. This paper addresses a novel task of music era recognition. We formulate the task as a music classification problem and propose solutions based on supervised contrastive learning. An audio-based model is developed to predict the era from audio. For the case where the artist information is available, we extend the audio-based model to take multimodal inputs and develop a framework, called MultiModal Contrastive (MMC) learning, to enhance the training. Experimental results on the Million Song Dataset demonstrate that the audio-based model achieves 54% accuracy with a tolerance of a 3-year range; incorporating the artist information with the MMC framework for training leads to a further 9% improvement.
- Published
- 2024
41. Hadronuclear interactions in AGN jets as the origin of the diffuse high-energy neutrino background
- Author
-
Xue, Rui, Wang, Ze-Rui, Joshi, Jagdish C., and Li, Wei-Jian
- Subjects
Astrophysics - High Energy Astrophysical Phenomena
- Abstract
The origin of diffuse high-energy neutrinos from TeV to PeV energies detected by the IceCube Observatory remains a mystery. In our previous work, we have shown that hadronuclear (p-p) interactions in AGN jets could be important and generate detectable very-high-energy emissions. Here, we further explore these interactions in the AGN jets based on their luminosity function. The diffuse neutrino flux and corresponding $\gamma$-ray flux have been calculated and compared with observational data. In our modeling, two beaming patterns are considered separately. To make sure that the corresponding $\gamma$-ray flux does not overshoot the diffuse $\gamma$-ray background, we find that if the neutrino production region in the jet is opaque to $\gamma$ rays, p-p interactions in AGN jets with a small viewing angle (the blazar case) are able to explain the PeV neutrino background. Similarly, AGN jets with a large viewing angle (the radio galaxy case) may explain the TeV neutrino background. If, however, the neutrino production region is transparent to $\gamma$ rays, only blazars have the potential to explain the diffuse neutrino background (DNB) around the PeV band. Some caveats are also discussed., Comment: 12 pages, 6 figures, accepted for publication in ApJ
- Published
- 2024
42. InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
- Author
-
Zhang, Pan, Dong, Xiaoyi, Zang, Yuhang, Cao, Yuhang, Qian, Rui, Chen, Lin, Guo, Qipeng, Duan, Haodong, Wang, Bin, Ouyang, Linke, Zhang, Songyang, Zhang, Wenwei, Li, Yining, Gao, Yang, Sun, Peng, Zhang, Xinyue, Li, Wei, Li, Jingwen, Wang, Wenhai, Yan, Hang, He, Conghui, Zhang, Xingcheng, Chen, Kai, Dai, Jifeng, Qiao, Yu, Lin, Dahua, and Wang, Jiaqi
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications using extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. The InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer., Comment: Technical Report. https://github.com/InternLM/InternLM-XComposer
- Published
- 2024
43. WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation
- Author
-
Huang, Zihao, Hu, Shoukang, Wang, Guangcong, Liu, Tianqi, Zang, Yuhang, Cao, Zhiguo, Li, Wei, and Liu, Ziwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Existing human datasets for avatar creation are typically limited to laboratory environments, wherein high-quality annotations (e.g., SMPL estimation from 3D scans or multi-view images) can be ideally provided. However, their annotating requirements are impractical for real-world images or videos, posing challenges toward real-world applications on current avatar creation methods. To this end, we propose the WildAvatar dataset, a web-scale in-the-wild human avatar creation dataset extracted from YouTube, with $10,000+$ different human subjects and scenes. WildAvatar is at least $10\times$ richer than previous datasets for 3D human avatar creation. We evaluate several state-of-the-art avatar creation methods on our dataset, highlighting the unexplored challenges in real-world applications on avatar creation. We also demonstrate the potential for generalizability of avatar creation methods, when provided with data at scale. We publicly release our data source links and annotations, to push forward 3D human avatar creation and other related fields for real-world applications., Comment: Project page: https://wildavatar.github.io/
- Published
- 2024
44. The Affordances of iPad for Constructing a Technology-Mediated Space in Hong Kong English Medium Instruction Secondary Classrooms: A Translanguaging View
- Author
-
Kevin W. H. Tai and Li Wei
- Abstract
Despite the widespread use of mobile digital devices such as iPads in teaching and learning, there is little research on the ways in which content teachers make use of the technological affordances of the iPad to achieve pedagogical goals in bilingual/multilingual classrooms. This article adopts translanguaging as an analytical perspective to explore how the use of the iPad extends the semiotic and spatial repertoires for enabling the English Medium Instruction (EMI) teacher to create a translanguaging space for supporting multilingual students' learning of new academic knowledge. The data for this article is based on a linguistic ethnographic project in an EMI mathematics classroom in a secondary school in Hong Kong. Multimodal Conversation Analysis is used to analyse the classroom interactional data, triangulated with the video-stimulated-recall-interviews that are analysed using Interpretative Phenomenological Analysis. The article argues that the iPad provides opportunities for the EMI teacher to fully exploit the semiotic and spatial resources for creating a technology-mediated space in the classroom. Such a space in turn allows the teacher to accomplish content teaching and build a more engaging environment for learning.
- Published
- 2024
45. Conforming mesh modeling of multi-physics effect on residual stress in multi-layer powder bed fusion process
- Author
-
Kishore, Mysore Nagaraja, Qian, Dong, Soshi, Masakazu, and Li, Wei
- Subjects
Manufacturing Engineering, Engineering, Powder bed fusion, Computational fluid dynamics, Finite element method, Discrete element method, Conforming mesh, Residual stress, Industrial Engineering & Automation, Manufacturing engineering, Mechanical engineering
- Abstract
The current research aims to predict the residual stress accumulation and evolution in the powder bed fusion processed multi-layer thin wall structures through a conforming mesh modeling approach. It involves the discrete element method (DEM) interfaced with the volume of fluid (VOF) method using computational fluid dynamics (CFD) coupled with the finite element method (FEM). The conforming mesh approach developed in the research predicts multi-physics, its induced porosity, and the cumulative effect on the residual stress in the powder bed fusion processed Ti-6Al-4V thin wall structures. The results of the residual stress in the multi-layered component from this method were further quantitatively compared with the non-conforming finite element method. The results show the conforming mesh approach was not only effective in capturing the layer geometry, and defects induced during the printing, but also predicted the residual stress in the region of the defect more accurately than the non-conforming mesh methods.
- Published
- 2024
46. High-density vertical sidewall MoS2 transistors through T-shape vertical lamination.
- Author
-
Tao, Quanyang, Wu, Ruixia, Zou, Xuming, Chen, Yang, Li, Wanying, Lu, Zheyi, Ma, Likuan, Kong, Lingan, Lu, Donglin, Yang, Xiaokun, Song, Wenjing, Li, Wei, Liu, Liting, Ding, Shuimei, Liu, Xiao, Duan, Xidong, Liao, Lei, and Liu, Yin Allison
- Abstract
Vertical transistors, in which the source and drain are aligned vertically and the current flow is normal to the wafer surface, have attracted considerable attention recently. However, the realization of high-density vertical transistors is challenging, and could be largely attributed to the incompatibility between vertical structures and conventional lateral fabrication processes. Here we report a T-shape lamination approach for realizing high-density vertical sidewall transistors, where lateral transistors could be pre-fabricated on planar substrates first and then laminated onto vertical substrates using T-shape stamps, hence overcoming the incompatibility between planar processes and vertical structures. Based on this technique, we vertically stacked 60 MoS2 transistors within a small vertical footprint, corresponding to a device density over 10⁸ cm⁻². Furthermore, we demonstrate two approaches for scalable fabrication of vertical sidewall transistor arrays, including simultaneous lamination onto multiple vertical substrates, as well as on the same vertical substrate using multi-cycle layer-by-layer laminations.
- Published
- 2024
47. Twist angle driven electronic structure evolution of twisted bilayer graphene
- Author
-
Yu, Jiawei, Jia, Guihao, Li, Qian, Wang, Yuyang, Xiao, Kebin, Ju, Yongkang, Zhang, Hongyun, Hu, Zhiqiang, Guo, Yunkai, Lian, Biao, Tang, Peizhe, Zhou, Shuyun, Xue, Qi-Kun, and Li, Wei
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
In twisted bilayer graphene (TBG) devices, local strains often coexist and entangle with the twist-angle dependent moiré superlattice, both of which can significantly affect the electronic properties of TBG. Here, using low-temperature scanning tunneling microscopy, we investigate the fine evolution of the electronic structures of a TBG device with continuous variation of twist angles from 0.32° to 1.29°, spanning the first (1.1°), second (0.5°) and third (0.3°) magic angles. We reveal the exotic behavior of the flat bands and remote bands in both the energy space and real space near the magic angles. Interestingly, we observe an anomalous spectral weight transfer between the two flat band peaks in the tunneling spectra when approaching the first magic angle, suggesting strong inter-flat-bands interactions. The position of the remote band peak can be an index for the twist angle in TBG, since it positively correlates with the twist angle but is insensitive to the strain. Moreover, influences of the twist angle gradient on symmetry breaking of the flat bands are also studied.
- Published
- 2024
48. Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting
- Author
-
Li, Wei, Zhang, Jingyang, Heng, Pheng-Ann, and Gu, Lixu
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources. Task-Incremental Learning (TIL) offers a privacy-preserving training paradigm using tasks arriving sequentially, instead of gathering them due to strict data sharing policies. However, the task evolution can span a wide scope that involves shifts in both image appearance and segmentation semantics with intricate correlation, causing concurrent appearance and semantic forgetting. To solve this issue, we propose a Comprehensive Generative Replay (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs to mimic past task data, which focuses on two aspects: modeling image-mask correspondence and promoting scalability for diverse tasks. Specifically, we introduce a novel Bayesian Joint Diffusion (BJD) model for high-quality synthesis of image-mask pairs with their correspondence explicitly preserved by conditional denoising. Furthermore, we develop a Task-Oriented Adapter (TOA) that recalibrates prompt embeddings to modulate the diffusion model, making the data synthesis compatible with different tasks. Experiments on incremental tasks (cardiac, fundus and prostate segmentation) show its clear advantage for alleviating concurrent appearance and semantic forgetting. Code is available at https://github.com/jingyzhang/CGR., Comment: Accepted by MICCAI24
- Published
- 2024
49. CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion
- Author
-
Hsu, Chih-Chung, Ni, Chih-Chien, Lee, Chia-Ming, and Kang, Li-Wei
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Hyperspectral imaging, capturing detailed spectral information for each pixel, is pivotal in diverse scientific and industrial applications. Yet, the acquisition of high-resolution (HR) hyperspectral images (HSIs) often remains a challenge due to the hardware limitations of existing imaging systems. A prevalent workaround involves capturing both a high-resolution multispectral image (HR-MSI) and a low-resolution (LR) HSI, subsequently fusing them to yield the desired HR-HSI. Although deep learning-based methods have shown promise in HR-MSI/LR-HSI fusion and LR-HSI super-resolution (SR), their substantial model complexities hinder deployment on resource-constrained imaging devices. This paper introduces a novel knowledge distillation (KD) framework for HR-MSI/LR-HSI fusion to achieve SR of LR-HSI. Our KD framework integrates the proposed Cross-Layer Residual Aggregation (CLRA) block to enhance efficiency for constructing a Dual Two-Streamed (DTS) network structure, designed to extract joint and distinct features from LR-HSI and HR-MSI simultaneously. To fully exploit the spatial and spectral feature representations of LR-HSI and HR-MSI, we propose a novel Cross Self-Attention (CSA) fusion module to adaptively fuse those features to improve the spatial and spectral quality of the reconstructed HR-HSI. Finally, the proposed KD-based joint loss function is employed to co-train the teacher and student networks. Our experimental results demonstrate that the student model not only achieves comparable or superior LR-HSI SR performance but also significantly reduces the model size and computational requirements. This marks a substantial advancement over existing state-of-the-art methods. The source code is available at https://github.com/ming053l/CSAKD., Comment: Submitted to TIP 2024
- Published
- 2024
50. A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR
- Author
-
Pham, Van Tung, Lin, Yist, Han, Tao, Li, Wei, Zhang, Jun, Lu, Lu, and Wang, Yuxuan
- Subjects
Computer Science - Machine Learning
- Abstract
Recent works have shown promising results in connecting speech encoders to large language models (LLMs) for speech recognition. However, several limitations persist, including limited fine-tuning options, a lack of mechanisms to enforce speech-text alignment, and high insertion errors especially in domain mismatch conditions. This paper presents a comprehensive solution to address these issues. We begin by investigating more thoughtful fine-tuning schemes. Next, we propose a matching loss to enhance alignment between modalities. Finally, we explore training and inference methods to mitigate high insertion errors. Experimental results on the Librispeech corpus demonstrate that partially fine-tuning the encoder and LLM using parameter-efficient methods, such as LoRA, is the most cost-effective approach. Additionally, the matching loss improves modality alignment, enhancing performance. The proposed training and inference methods significantly reduce insertion errors.
- Published
- 2024
Discovery Service for Jio Institute Digital Library