Author: "Li, Yue" - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Li, Yue"' returned 37,312 results

1. The Lullaby’s Utopian Function and the Green Utopian Imagination in Suzanne Collins’s The Hunger Games Trilogy

Author: Chen, Aihua and Li, Yue
Published: 2023

2. The relationship between fungal growth rate and temperature and humidity

Author: Zhan, Zhichao, Meiling, Xu, Li, Yue, and Dong, Meihua
Published: 2021

3. FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training

Author: Yin, Ruihong, Yugay, Vladimir, Li, Yue, Karaoglu, Sezer, and Gevers, Theo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The field of novel view synthesis from images has seen rapid advancements with the introduction of Neural Radiance Fields (NeRF) and more recently with 3D Gaussian Splatting. Gaussian Splatting became widely adopted due to its efficiency and ability to render novel views accurately. While Gaussian Splatting performs well when a sufficient number of training images is available, its unstructured explicit representation tends to overfit in scenarios with sparse input images, resulting in poor rendering performance. To address this, we present a 3D Gaussian-based novel view synthesis method using sparse input images that can accurately render the scene from viewpoints not covered by the training images. We propose a multi-stage training scheme with matching-based consistency constraints imposed on the novel views without relying on pre-trained depth estimation or diffusion models. This is achieved by using the matches of the available training images to supervise the generation of the novel views sampled between the training frames with color, geometry, and semantic losses. In addition, we introduce a locality-preserving regularization for 3D Gaussians that removes rendering artifacts by preserving the local color structure of the scene. Evaluation on synthetic and real-world datasets demonstrates competitive or superior performance of our method in few-shot novel view synthesis compared to existing state-of-the-art methods.
Comment: Accepted by NeurIPS 2024
Published: 2024

4. PairSmell: A Novel Perspective Inspecting Software Modular Structure

Author: Zhong, Chenxing, Feitosa, Daniel, Avgeriou, Paris, Huang, Huang, Li, Yue, and Zhang, He
Subjects: Computer Science - Software Engineering, D.2
Abstract: Enhancing the modular structure of existing systems has attracted substantial research interest, focusing on two main methods: (1) software modularization and (2) identifying design issues (e.g., smells) as refactoring opportunities. However, re-modularization solutions often require extensive modifications to the original modules, and the design issues identified are generally too coarse to guide refactoring strategies. Combining the above two methods, this paper introduces a novel concept, PairSmell, which exploits modularization to pinpoint design issues necessitating refactoring. We concentrate on a granular but fundamental aspect of modularity principles: modular relation (MR), i.e., whether a pair of entities are separated or collocated. The main assumption is that, if the actual MR of a pair violates its "apt MR", i.e., an MR agreed on by multiple modularization tools (as raters), it is likely a flawed architectural decision that necessitates further examination. To quantify and evaluate PairSmell, we conduct an empirical study on 20 C/C++ and Java projects, using 4 established modularization tools to identify two forms of PairSmell: inapt separated pairs (InSep) and inapt collocated pairs (InCol). Our study on 260,003 instances reveals that their architectural impacts are substantial: (1) on average, 14.60% and 20.44% of software entities are involved in InSep and InCol MRs respectively; (2) InSep pairs are associated with 190% more co-changes than properly separated pairs, while InCol pairs are associated with 35% fewer co-changes than properly collocated pairs, both indicating a successful identification of modular structures detrimental to software quality; and (3) both forms of PairSmell persist across software evolution.
Comment: Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE'25)
Published: 2024

5. An Efficient Watermarking Method for Latent Diffusion Models via Low-Rank Adaptation

Author: Lin, Dongdong, Li, Yue, Tondi, Benedetta, Li, Bin, and Barni, Mauro
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The rapid proliferation of deep neural networks (DNNs) is driving a surge in model watermarking technologies, as the trained deep models themselves serve as intellectual property. The core of existing model watermarking techniques involves modifying or tuning the models' weights. However, with the emergence of increasingly complex models, ensuring the efficiency of the watermarking process is essential to manage the growing computational demands. Prioritizing efficiency not only optimizes resource utilization, making the watermarking process more applicable, but also minimizes potential impacts on model performance. In this letter, we propose an efficient watermarking method for latent diffusion models (LDMs) which is based on Low-Rank Adaptation (LoRA). We specifically choose to add trainable low-rank matrices to the existing weight matrices of the models to embed the watermark, while keeping the original weights frozen. Moreover, we also propose a dynamic loss weight tuning algorithm to balance the generative task with the watermark embedding task, ensuring that the model can be watermarked with a limited impact on the quality of the generated images. Experimental results show that the proposed method ensures fast watermark embedding and maintains a very low bit error rate of the watermark, high quality of the generated images, and a zero false negative rate (FNR) for verification.
Published: 2024

6. Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models

Author: Li, Yue, Zhao, Zhixue, and Scarton, Carolina
Subjects: Computer Science - Computation and Language
Abstract: In-context learning (ICL) performance is known to be sensitive to the prompt design, yet the impact of class label options in zero-shot classification has been largely overlooked. This paper presents the first comprehensive empirical study investigating how label options (e.g., lexical choice, order, and elaboration) influence zero-shot ICL classification performance. Our findings reveal that lexical choices for label names (e.g., agree vs. support in stance classification) play an important role, with effects also linked to label orders. An analysis of the model internal states further shows that optimal label names tend to activate fewer outlier neurons in the feed-forward network. Based on this observation, we propose Label set Optimization via Activation Distribution kurtosiS (LOADS), a post-hoc approach requiring no gradient propagation. LOADS not only demonstrates effectiveness with only 100 unlabelled samples across different model types and sizes, but also shows cross-lingual transferability.
Published: 2024

7. Attention Is All You Need for LLM-based Code Vulnerability Localization

Author: Li, Yue, Li, Xiao, Wu, Hao, Zhang, Yue, Cheng, Xiuzhen, Zhong, Sheng, and Xu, Fengyuan
Subjects: Computer Science - Cryptography and Security
Abstract: The rapid expansion of software systems and the growing number of reported vulnerabilities have emphasized the importance of accurately identifying vulnerable code segments. Traditional methods for vulnerability localization, such as manual code audits or rule-based tools, are often time-consuming and limited in scope, typically focusing on specific programming languages or types of vulnerabilities. In recent years, the introduction of large language models (LLMs) such as GPT and LLaMA has opened new possibilities for automating vulnerability detection. However, while LLMs show promise in this area, they face challenges, particularly in maintaining accuracy over longer code contexts. This paper introduces LOVA, a novel framework leveraging the self-attention mechanisms inherent in LLMs to enhance vulnerability localization. Our key insight is that self-attention mechanisms assign varying importance to different parts of the input, making it possible to track how much attention the model focuses on specific lines of code. In the context of vulnerability localization, the hypothesis is that vulnerable lines of code will naturally attract higher attention weights because they have a greater influence on the model's output. By systematically tracking changes in attention weights and focusing on specific lines of code, LOVA improves the precision of identifying vulnerable lines across various programming languages. Through rigorous experimentation and evaluation, we demonstrate that LOVA significantly outperforms existing LLM-based approaches, achieving up to a 5.3x improvement in F1-scores. LOVA also demonstrated strong scalability, with up to a 14.6x improvement in smart contract vulnerability localization across languages like C, Python, Java, and Solidity. Its robustness was proven through consistent performance across different LLM architectures.
Published: 2024

8. Standardizing Generative Face Video Compression using Supplemental Enhancement Information

Author: Chen, Bolin, Ye, Yan, Chen, Jie, Liao, Ru-Ling, Yin, Shanzhi, Wang, Shiqi, Yang, Kaifa, Li, Yue, Xu, Yiling, Wang, Ye-Kui, Gehlot, Shiv, Su, Guan-Ming, Yin, Peng, McCarthy, Sean, and Sullivan, Gary J.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper proposes a Generative Face Video Compression (GFVC) approach using Supplemental Enhancement Information (SEI), where a series of compact spatial and temporal representations of a face video signal (i.e., 2D/3D key-points, facial semantics and compact features) can be coded using SEI messages and inserted into the coded video bitstream. At the time of writing, the proposed GFVC approach is an official "technology under consideration" (TuC) for standardization by the Joint Video Experts Team (JVET) of ISO/IEC JTC 1/SC 29 and ITU-T SG16. To the best of the authors' knowledge, the JVET work on the proposed SEI-based GFVC approach is the first standardization activity for generative video compression. The proposed SEI approach has not only advanced the reconstruction quality of early-day Model-Based Coding (MBC) via the state-of-the-art generative technique, but also established a new SEI definition for future GFVC applications and deployment. Experimental results illustrate that the proposed SEI-based GFVC approach can achieve remarkable rate-distortion performance compared with the latest Versatile Video Coding (VVC) standard, whilst also potentially enabling a wide variety of functionalities including user-specified animation/filtering and metaverse-related applications.
Published: 2024

9. MixEHR-Nest: Identifying Subphenotypes within Electronic Health Records through Hierarchical Guided-Topic Modeling

Author: Wang, Ruohan, Wang, Zilong, Song, Ziyang, Buckeridge, David, and Li, Yue
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Quantitative Biology - Quantitative Methods, J.3
Abstract: Automatic subphenotyping from electronic health records (EHRs) provides numerous opportunities to understand diseases with unique subgroups and enhance personalized medicine for patients. However, existing machine learning algorithms either focus on specific diseases for better interpretability or produce coarse-grained phenotype topics without considering nuanced disease patterns. In this study, we propose a guided topic model, MixEHR-Nest, to infer sub-phenotype topics from thousands of diseases using multi-modal EHR data. Specifically, MixEHR-Nest detects multiple subtopics from each phenotype topic, whose prior is guided by the expert-curated phenotype concepts such as Phenotype Codes (PheCodes) or Clinical Classification Software (CCS) codes. We evaluated MixEHR-Nest on two EHR datasets: (1) the MIMIC-III dataset consisting of over 38 thousand patients from the intensive care unit (ICU) of Beth Israel Deaconess Medical Center (BIDMC) in Boston, USA; (2) the healthcare administrative database PopHR, comprising 1.3 million patients from Montreal, Canada. Experimental results demonstrate that MixEHR-Nest can identify subphenotypes with distinct patterns within each phenotype, which are predictive for disease progression and severity. Consequently, MixEHR-Nest distinguishes between type 1 and type 2 diabetes by inferring subphenotypes using CCS codes, which do not differentiate these two subtype concepts. Additionally, MixEHR-Nest not only improved the prediction accuracy of short-term mortality of ICU patients and initial insulin treatment in diabetic patients but also revealed the contributions of subphenotypes. For longitudinal analysis, MixEHR-Nest identified subphenotypes of distinct age prevalence under the same phenotypes, such as asthma, leukemia, epilepsy, and depression. The MixEHR-Nest software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-Nest.
Published: 2024

10. Releasing the Parameter Latency of Neural Representation for High-Efficiency Video Compression

Author: Zhang, Gai, Zhang, Xinfeng, Tang, Lv, Li, Yue, Zhang, Kai, and Zhang, Li
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: For decades, video compression technology has been a prominent research area. Traditional hybrid video compression frameworks and end-to-end frameworks continue to explore various intra- and inter-frame reference and prediction strategies based on discrete transforms and deep learning techniques. However, the emerging implicit neural representation (INR) technique models entire videos as basic units, automatically capturing intra-frame and inter-frame correlations and obtaining promising performance. INR uses a compact neural network to store video information in network parameters, effectively eliminating spatial and temporal redundancy in the original video. In this paper, our exploration and verification reveal that current INR video compression methods do not fully exploit their potential to preserve information. We investigate the potential of enhancing network parameter storage through parameter reuse. By deepening the network, we designed a feasible INR parameter reuse scheme to further improve compression performance. Extensive experimental results show that our method significantly enhances the rate-distortion performance of INR video compression.
Published: 2024

11. TrajGPT: Irregular Time-Series Representation Learning for Health Trajectory Analysis

Author: Song, Ziyang, Lu, Qingcheng, Zhu, He, Buckeridge, David, and Li, Yue
Subjects: Computer Science - Machine Learning
Abstract: In many domains, such as healthcare, time-series data is often irregularly sampled with varying intervals between observations. This poses challenges for classical time-series models that require equally spaced data. To address this, we propose a novel time-series Transformer called Trajectory Generative Pre-trained Transformer (TrajGPT). TrajGPT employs a novel Selective Recurrent Attention (SRA) mechanism, which utilizes a data-dependent decay to adaptively filter out irrelevant past information based on contexts. By interpreting TrajGPT as discretized ordinary differential equations (ODEs), it effectively captures the underlying continuous dynamics and enables time-specific inference for forecasting arbitrary target timesteps. Experimental results demonstrate that TrajGPT excels in trajectory forecasting, drug usage prediction, and phenotype classification without requiring task-specific fine-tuning. By evolving the learned continuous dynamics, TrajGPT can interpolate and extrapolate disease risk trajectories from partially-observed time series. The visualization of predicted health trajectories shows that TrajGPT forecasts unseen diseases based on the history of clinically relevant phenotypes (i.e., contexts).
Comment: 9 pages
Published: 2024

12. The no boundary density matrix

Author: Ivo, Victor, Li, Yue-Zhou, and Maldacena, Juan
Subjects: High Energy Physics - Theory, General Relativity and Quantum Cosmology
Abstract: We discuss a no-boundary proposal for a subregion of the universe. In the classical approximation, this density matrix involves finding a specific classical solution of the equations of motion with no boundary. Beyond the usual no boundary condition at early times, we also have another no boundary condition in the region we trace out. We can find the prescription by starting from the usual Hartle-Hawking proposal for the wavefunction on a full slice and tracing out the unobserved region in the classical approximation. We discuss some specific subregions and compute the corresponding solutions. These geometries lead to phenomenologically unacceptable probabilities, as expected. We also discuss how the usual Coleman-De Luccia bubble solutions can be interpreted as a possible no boundary contribution to the density matrix of the universe. These geometries lead to local (but not global) maxima of the probability that are phenomenologically acceptable.
Comment: 35+16 pages, 22 figures; one paragraph added and refs added
Published: 2024

13. Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors

Author: Sun, Shida, Li, Yue, Zhang, Yueyi, and Xiong, Zhiwei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Non-line-of-sight (NLOS) imaging, recovering the hidden volume from indirect reflections, has attracted increasing attention due to its potential applications. Despite promising results, existing NLOS reconstruction approaches are constrained by the reliance on empirical physical priors, e.g., single fixed path compensation. Moreover, these approaches still possess limited generalization ability, particularly when dealing with scenes at a low signal-to-noise ratio (SNR). To overcome the above problems, we introduce a novel learning-based solution, comprising two key designs: Learnable Path Compensation (LPC) and Adaptive Phasor Field (APF). The LPC applies tailored path compensation coefficients to adapt to different objects in the scene, effectively reducing light wave attenuation, especially in distant regions. Meanwhile, the APF learns the precise Gaussian window of the illumination function for the phasor field, dynamically selecting the relevant spectrum band of the transient measurement. Experimental validations demonstrate that our proposed approach, only trained on synthetic data, exhibits the capability to seamlessly generalize across various real-world datasets captured by different imaging systems and characterized by low SNRs.
Published: 2024

14. GeSn 320 × 256 Focal Plane Array for Silicon-Based Short-wave Infrared Imaging

Author: Xu, Guoyin, Cong, Hui, Li, Yue, Wu, Zhengjie, Fu, Fenghe, Chen, Ping, Zhao, Chao, Xu, Chi, and Xue, Chunlai
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: Short-wave infrared (SWIR) imaging arrays have demonstrated great potential in applications ranging from military to civilian consumer electronics. However, current focal plane arrays (FPAs), which are based on compound semiconductors, have limited civilian applications due to high manufacturing costs and long fabrication cycle times. To address this, a high-performance $320 \times 256$ focal plane array based on group-IV semiconductors has been designed and manufactured on a Si substrate using a complementary metal-oxide-semiconductor (CMOS) compatible fabrication process. The optical absorption layer is composed of a GeSn alloy, whose bandgap can be tailored by choosing the appropriate Sn concentration. In this work, a 10% Sn concentration was employed, yielding a response cutoff wavelength of 2308 nm for the Si-based photodetector, measured at 298 K. Moreover, a specific detectivity of $9.7 \times 10^{11}\ \mathrm{cm \cdot Hz^{1/2} \cdot W^{-1}}$ has been achieved at 77 K, surpassing all previously reported GeSn devices and rivaling commercial extended InGaAs photodetectors. With the help of read-out integrated circuits (ROICs), SWIR images have been captured for the first time using a Si-based GeSn FPA. This work demonstrates the potential of group-IV imaging arrays for various applications in the commercial SWIR imaging field.
Published: 2024

15. Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time

Author: Li, Yue, Hindriks, Koen V., and Kunneman, Florian A.
Subjects: Computer Science - Robotics, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, 68T50
Abstract: Spectral subtraction, widely used for its simplicity, has been employed to address the Robot Ego Speech Filtering (RESF) problem: detecting the speech content of human interruptions in a robot's single-channel microphone recordings while the robot is speaking. However, this approach suffers from oversubtraction in the fundamental frequency range (FFR), leading to degraded speech content recognition. To address this, we propose a Two-Mask Conformer-based Metric Generative Adversarial Network (CMGAN) to enhance the detected speech and improve recognition results. Our model compensates for oversubtracted FFR values with high-frequency information and long-term features and then de-noises the new spectrogram. In addition, we introduce an incremental processing method that allows semi-real-time audio processing with streaming input on a network trained on long fixed-length input. Evaluations on two datasets, including one with unseen noise, demonstrate significant improvements in recognition accuracy and the effectiveness of the proposed two-mask approach and incremental processing, enhancing the robustness of the proposed RESF pipeline in real-world HRI scenarios.
Comment: 6 pages, 2 figures, submitted to 2025 IEEE ICASSP
Published: 2024

16. Cell-ontology guided transcriptome foundation model

Author: Yuan, Xinyu, Zhan, Zhihao, Zhang, Zuobai, Zhou, Manqi, Zhao, Jianan, Han, Boyu, Li, Yue, and Tang, Jian
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Transcriptome foundation models (TFMs) hold great promise for deciphering the transcriptomic language that dictates diverse cell functions through self-supervised learning on large-scale single-cell gene expression data, and ultimately for unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during TFM pre-training can improve the learning of biologically meaningful gene co-expression patterns while preserving the TFM as a general-purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present scCello, a single-cell, Cell-ontology guided TFM. We introduce a cell-type coherence loss and an ontology alignment loss, which are minimized along with the masked gene expression prediction loss during pre-training. These novel loss components guide scCello to learn cell-type-specific representations and the structural relations between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over existing TFMs on biologically important tasks, including identifying novel cell types of unseen cells, predicting cell-type-specific marker genes, and predicting cancer drug responses.
Comment: All anonymous reviewers' constructive suggestions are appreciated. The next version will be updated soon
Published: 2024

17. ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

Author: Ma, Qi, Li, Yue, Ren, Bin, Sebe, Nicu, Konukoglu, Ender, Gevers, Theo, Van Gool, Luc, and Paudel, Danda Pani
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for 3D understanding directly in this representation space. To facilitate research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. The creation of this dataset utilized the compute equivalent of 2 GPU years on a TITAN XP GPU. We utilize our dataset for unsupervised pretraining and supervised finetuning for classification and segmentation tasks. To this end, we introduce Gaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with a splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks.
Published: 2024

18. LoopSplat: Loop Closure by Registering 3D Gaussian Splats

Author: Zhu, Liyuan, Li, Yue, Sandström, Erik, Huang, Shengyu, Schindler, Konrad, and Armeni, Iro
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS submaps and frame-to-model tracking. LoopSplat triggers loop closure online and computes relative loop edge constraints between submaps directly via 3DGS registration, leading to improvements in efficiency and accuracy over traditional global-to-local point cloud registration. It uses a robust pose graph optimization formulation and rigidly aligns the submaps to achieve global consistency. Evaluation on the synthetic Replica and real-world TUM-RGBD, ScanNet, and ScanNet++ datasets demonstrates competitive or superior tracking, mapping, and rendering compared to existing methods for dense RGB-D SLAM. Code is available at loopsplat.github.io.
Comment: Project page: https://loopsplat.github.io/
Published: 2024

19. Gravity and a universal cutoff for field theory

Author: Caron-Huot, Simon and Li, Yue-Zhou
Subjects: High Energy Physics - Theory
Abstract: We analyze the one-loop effects of massive fields on 2-to-2 scattering processes involving gravitons. It has been suggested that in the presence of gravity, any local effective field theory description must break down at the "species scale". We first observe that unitarity and analyticity of the amplitude indeed imply a species-type bound $G\Lambda^{d-2}N\leq O(1)$, where $N$ counts parametrically light species and $\Lambda$ is an energy scale above which new unknown ingredients must modify the graviton amplitude. To clarify what happens at this scale, we contrast the partial wave decomposition of calculated amplitudes with that of some ultraviolet scenarios: string theory and strongly interacting Planck-scale physics. Observing that the latter exhibit a markedly stronger high-spin content, we define nonperturbatively the high-spin onset scale $\Lambda_{\rm o}$, which coincides with the string scale and higher-dimensional Planck scale in respective examples. We argue that, generally, no local field description can exist at distances shorter than $1/\Lambda_{\rm o}$.
Comment: 44+9 pages, 17 figures; refs added, figure 12 updated
Published: 2024

20. Global weak solutions to a fractional Cahn-Hilliard cross-diffusion system in lymphangiogenesis

Author: Jüngel, Ansgar and Li, Yue
Subjects: Mathematics - Analysis of PDEs, 35D30, 35K35, 35K65, 35K67, 92C37
Abstract: A spectral-fractional Cahn-Hilliard cross-diffusion system, which describes the pre-patterning of lymphatic vessel morphology in collagen gels, is studied. The model consists of two higher-order quasilinear parabolic equations and describes the evolution of the fiber phase volume fraction and the solute concentration. The free energy consists of the nonconvex Flory-Huggins energy and a fractional gradient energy, modeling nonlocal long-range correlations. The existence of global weak solutions to this system in a bounded domain with no-flux boundary conditions is shown. The proof is based on a three-level approximation scheme, spectral-fractional calculus, and a priori estimates coming from the energy inequality.
Published: 2024

21. Moment&Cross: Next-Generation Real-Time Cross-Domain CTR Prediction for Live-Streaming Recommendation at Kuaishou

Author: Cao, Jiangxia, Wang, Shen, Li, Yue, Wang, Shenghui, Tang, Jian, Wang, Shiyao, Yang, Shuang, Liu, Zhaojie, and Zhou, Guorui
Subjects: Computer Science - Information Retrieval
Abstract: Kuaishou is one of the largest short-video and live-streaming platforms. Compared with short-video recommendation, live-streaming recommendation is more complex because: (1) a live stream is only temporarily alive for distribution, (2) users may watch for a long time, which delays feedback, and (3) content is unpredictable and changes over time. In fact, even if a user is interested in a live-streaming author, a view may still be negative (e.g., a short view of < 3 s) because the real-time content is not attractive enough. Therefore, live-streaming recommendation poses a challenging task: how do we recommend a live stream to users at the right moment? Additionally, short videos dominate the content exposed on our platform, with 9x more short videos exposed than live streams. Users therefore leave more behaviors on short videos, which leads to a serious data imbalance problem: live-streaming data alone cannot fully reflect user interests. This raises another challenging task: how do we use users' short-video behaviors to improve live-streaming recommendation?
Comment: Work in progress
Published: 2024

22. CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models

Author: Wan, Shengye, Nikolaidis, Cyrus, Song, Daniel, Molnar, David, Crnkovich, James, Grace, Jayson, Bhatt, Manish, Chennabasappa, Sahana, Whitman, Spencer, Ding, Stephanie, Ionescu, Vlad, Li, Yue, and Saxe, Joshua
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: We are releasing a new suite of security benchmarks for LLMs, CYBERSECEVAL 3, to continue the conversation on empirically measuring LLM cybersecurity risks and capabilities. CYBERSECEVAL 3 assesses 8 different risks across two broad categories: risk to third parties, and risk to application developers and end users. Compared to previous work, we add new areas focused on offensive security capabilities: automated social engineering, scaling manual offensive cyber operations, and autonomous offensive cyber operations. In this paper we discuss applying these benchmarks to the Llama 3 models and a suite of contemporaneous state-of-the-art LLMs, enabling us to contextualize risks both with and without mitigations in place.
Published: 2024

23. Coexistence of large anomalous Hall effect and topological magnetic skyrmions in a Weyl nodal ring ferromagnet Mn5Ge3

Author: Li, Hang, Zhou, Feng, Ding, Bei, Chen, Jie, Song, Linxuan, Yang, Wenyun, Lau, Yong-Chang, Yang, Jinbo, Li, Yue, Jiang, Yong, and Wang, Wenhong
Subjects: Condensed Matter - Materials Science
Abstract: Topological magnetic materials are expected to show multiple transport responses because of their unusual bulk electronic topology in momentum space and topological spin texture in real space. However, materials hosting multiple topological properties are rare in nature. In this work, we reveal the coexistence of a large tunable anomalous Hall effect and topological magnetic skyrmions in the Weyl nodal ring ferromagnet Mn5Ge3, using electrical transport and Lorentz transmission electron microscopy (TEM) measurements. The intrinsic anomalous Hall conductivity (AHC) was found to reach 979.7 S/cm with current along [120] and magnetic field along [001] in Mn5Ge3 single crystals. Our theoretical calculations reveal that the large AHC is closely related to two Weyl nodal rings in the band structure near the Fermi level and is strongly modified by the Ge content. Moreover, our Lorentz-TEM images and micromagnetic simulation results, together with the sizable topological Hall effect, clearly point to the robust formation of magnetic skyrmions over a wide temperature-magnetic field region. These results establish Mn5Ge3 as a rare magnetic topological nodal-line semimetal of great significance for exploring novel multiple topological phenomena, which facilitates the development of spintronics., Comment: 38 pages, 22 figures
Published: 2024

24. The Llama 3 Herd of Models

Author: Dubey, Abhimanyu, Jauhri, Abhinav, Pandey, Abhinav, Kadian, Abhishek, Al-Dahle, Ahmad, Letman, Aiesha, Mathur, Akhil, Schelten, Alan, Yang, Amy, Fan, Angela, Goyal, Anirudh, Hartshorn, Anthony, Yang, Aobo, Mitra, Archi, Sravankumar, Archie, Korenev, Artem, Hinsvark, Arthur, Rao, Arun, Zhang, Aston, Rodriguez, Aurelien, Gregerson, Austen, Spataru, Ava, Roziere, Baptiste, Biron, Bethany, Tang, Binh, Chern, Bobbie, Caucheteux, Charlotte, Nayak, Chaya, Bi, Chloe, Marra, Chris, McConnell, Chris, Keller, Christian, Touret, Christophe, Wu, Chunyang, Wong, Corinne, Ferrer, Cristian Canton, Nikolaidis, Cyrus, Allonsius, Damien, Song, Daniel, Pintz, Danielle, Livshits, Danny, Esiobu, David, Choudhary, Dhruv, Mahajan, Dhruv, Garcia-Olano, Diego, Perino, Diego, Hupkes, Dieuwke, Lakomkin, Egor, AlBadawy, Ehab, Lobanova, Elina, Dinan, Emily, Smith, Eric Michael, Radenovic, Filip, Zhang, Frank, Synnaeve, Gabriel, Lee, Gabrielle, Anderson, Georgia Lewis, Nail, Graeme, Mialon, Gregoire, Pang, Guan, Cucurell, Guillem, Nguyen, Hailey, Korevaar, Hannah, Xu, Hu, Touvron, Hugo, Zarov, Iliyan, Ibarra, Imanol Arrieta, Kloumann, Isabel, Misra, Ishan, Evtimov, Ivan, Copet, Jade, Lee, Jaewon, Geffert, Jan, Vranes, Jana, Park, Jason, Mahadeokar, Jay, Shah, Jeet, van der Linde, Jelmer, Billock, Jennifer, Hong, Jenny, Lee, Jenya, Fu, Jeremy, Chi, Jianfeng, Huang, Jianyu, Liu, Jiawen, Wang, Jie, Yu, Jiecao, Bitton, Joanna, Spisak, Joe, Park, Jongsoo, Rocca, Joseph, Johnstun, Joshua, Saxe, Joshua, Jia, Junteng, Alwala, Kalyan Vasuden, Upasani, Kartikeya, Plawiak, Kate, Li, Ke, Heafield, Kenneth, Stone, Kevin, El-Arini, Khalid, Iyer, Krithika, Malik, Kshitiz, Chiu, Kuenley, Bhalla, Kunal, Rantala-Yeary, Lauren, van der Maaten, Laurens, Chen, Lawrence, Tan, Liang, Jenkins, Liz, Martin, Louis, Madaan, Lovish, Malo, Lubo, Blecher, Lukas, Landzaat, Lukas, de Oliveira, Luke, Muzzi, Madeline, Pasupuleti, Mahesh, Singh, Mannat, Paluri, Manohar, Kardas, Marcin, Oldham, Mathew, Rita, Mathieu, 
Pavlova, Maya, Kambadur, Melanie, Lewis, Mike, Si, Min, Singh, Mitesh Kumar, Hassan, Mona, Goyal, Naman, Torabi, Narjes, Bashlykov, Nikolay, Bogoychev, Nikolay, Chatterji, Niladri, Duchenne, Olivier, Çelebi, Onur, Alrassy, Patrick, Zhang, Pengchuan, Li, Pengwei, Vasic, Petar, Weng, Peter, Bhargava, Prajjwal, Dubal, Pratik, Krishnan, Praveen, Koura, Punit Singh, Xu, Puxin, He, Qing, Dong, Qingxiao, Srinivasan, Ragavan, Ganapathy, Raj, Calderer, Ramon, Cabral, Ricardo Silveira, Stojnic, Robert, Raileanu, Roberta, Girdhar, Rohit, Patel, Rohit, Sauvestre, Romain, Polidoro, Ronnie, Sumbaly, Roshan, Taylor, Ross, Silva, Ruan, Hou, Rui, Wang, Rui, Hosseini, Saghar, Chennabasappa, Sahana, Singh, Sanjay, Bell, Sean, Kim, Seohyun Sonia, Edunov, Sergey, Nie, Shaoliang, Narang, Sharan, Raparthy, Sharath, Shen, Sheng, Wan, Shengye, Bhosale, Shruti, Zhang, Shun, Vandenhende, Simon, Batra, Soumya, Whitman, Spencer, Sootla, Sten, Collot, Stephane, Gururangan, Suchin, Borodinsky, Sydney, Herman, Tamar, Fowler, Tara, Sheasha, Tarek, Georgiou, Thomas, Scialom, Thomas, Speckbacher, Tobias, Mihaylov, Todor, Xiao, Tong, Karn, Ujjwal, Goswami, Vedanuj, Gupta, Vibhor, Ramanathan, Vignesh, Kerkez, Viktor, Gonguet, Vincent, Do, Virginie, Vogeti, Vish, Petrovic, Vladan, Chu, Weiwei, Xiong, Wenhan, Fu, Wenyin, Meers, Whitney, Martinet, Xavier, Wang, Xiaodong, Tan, Xiaoqing Ellen, Xie, Xinfeng, Jia, Xuchao, Wang, Xuewei, Goldschlag, Yaelle, Gaur, Yashesh, Babaei, Yasmine, Wen, Yi, Song, Yiwen, Zhang, Yuchen, Li, Yue, Mao, Yuning, Coudert, Zacharie Delpierre, Yan, Zheng, Chen, Zhengxing, Papakipos, Zoe, Singh, Aaditya, Grattafiori, Aaron, Jain, Abha, Kelsey, Adam, Shajnfeld, Adam, Gangidi, Adithya, Victoria, Adolfo, Goldstand, Ahuva, Menon, Ajay, Sharma, Ajay, Boesenberg, Alex, Vaughan, Alex, Baevski, Alexei, Feinstein, Allie, Kallet, Amanda, Sangani, Amit, Yunus, Anam, Lupu, Andrei, Alvarado, Andres, Caples, Andrew, Gu, Andrew, Ho, Andrew, Poulton, Andrew, Ryan, Andrew, Ramchandani, Ankit, 
Franco, Annie, Saraf, Aparajita, Chowdhury, Arkabandhu, Gabriel, Ashley, Bharambe, Ashwin, Eisenman, Assaf, Yazdan, Azadeh, James, Beau, Maurer, Ben, Leonhardi, Benjamin, Huang, Bernie, Loyd, Beth, De Paola, Beto, Paranjape, Bhargavi, Liu, Bing, Wu, Bo, Ni, Boyu, Hancock, Braden, Wasti, Bram, Spence, Brandon, Stojkovic, Brani, Gamido, Brian, Montalvo, Britt, Parker, Carl, Burton, Carly, Mejia, Catalina, Wang, Changhan, Kim, Changkyu, Zhou, Chao, Hu, Chester, Chu, Ching-Hsiang, Cai, Chris, Tindal, Chris, Feichtenhofer, Christoph, Civin, Damon, Beaty, Dana, Kreymer, Daniel, Li, Daniel, Wyatt, Danny, Adkins, David, Xu, David, Testuggine, Davide, David, Delia, Parikh, Devi, Liskovich, Diana, Foss, Didem, Wang, Dingkang, Le, Duc, Holland, Dustin, Dowling, Edward, Jamil, Eissa, Montgomery, Elaine, Presani, Eleonora, Hahn, Emily, Wood, Emily, Brinkman, Erik, Arcaute, Esteban, Dunbar, Evan, Smothers, Evan, Sun, Fei, Kreuk, Felix, Tian, Feng, Ozgenel, Firat, Caggioni, Francesco, Guzmán, Francisco, Kanayet, Frank, Seide, Frank, Florez, Gabriela Medina, Schwarz, Gabriella, Badeer, Gada, Swee, Georgia, Halpern, Gil, Thattai, Govind, Herman, Grant, Sizov, Grigory, Guangyi, Zhang, Lakshminarayanan, Guna, Shojanazeri, Hamid, Zou, Han, Wang, Hannah, Zha, Hanwen, Habeeb, Haroun, Rudolph, Harrison, Suk, Helen, Aspegren, Henry, Goldman, Hunter, Damlaj, Ibrahim, Molybog, Igor, Tufanov, Igor, Veliche, Irina-Elena, Gat, Itai, Weissman, Jake, Geboski, James, Kohli, James, Asher, Japhet, Gaya, Jean-Baptiste, Marcus, Jeff, Tang, Jeff, Chan, Jennifer, Zhen, Jenny, Reizenstein, Jeremy, Teboul, Jeremy, Zhong, Jessica, Jin, Jian, Yang, Jingyi, Cummings, Joe, Carvill, Jon, Shepard, Jon, McPhie, Jonathan, Torres, Jonathan, Ginsburg, Josh, Wang, Junjie, Wu, Kai, U, Kam Hou, Saxena, Karan, Prasad, Karthik, Khandelwal, Kartikay, Zand, Katayoun, Matosich, Kathy, Veeraraghavan, Kaushik, Michelena, Kelly, Li, Keqian, Huang, Kun, Chawla, Kunal, Lakhotia, Kushal, Huang, Kyle, Chen, Lailin, Garg, 
Lakshya, A, Lavender, Silva, Leandro, Bell, Lee, Zhang, Lei, Guo, Liangpeng, Yu, Licheng, Moshkovich, Liron, Wehrstedt, Luca, Khabsa, Madian, Avalani, Manav, Bhatt, Manish, Tsimpoukelli, Maria, Mankus, Martynas, Hasson, Matan, Lennie, Matthew, Reso, Matthias, Groshev, Maxim, Naumov, Maxim, Lathi, Maya, Keneally, Meghan, Seltzer, Michael L., Valko, Michal, Restrepo, Michelle, Patel, Mihir, Vyatskov, Mik, Samvelyan, Mikayel, Clark, Mike, Macey, Mike, Wang, Mike, Hermoso, Miquel Jubert, Metanat, Mo, Rastegari, Mohammad, Bansal, Munish, Santhanam, Nandhini, Parks, Natascha, White, Natasha, Bawa, Navyata, Singhal, Nayan, Egebo, Nick, Usunier, Nicolas, Laptev, Nikolay Pavlovich, Dong, Ning, Zhang, Ning, Cheng, Norman, Chernoguz, Oleg, Hart, Olivia, Salpekar, Omkar, Kalinli, Ozlem, Kent, Parkin, Parekh, Parth, Saab, Paul, Balaji, Pavan, Rittner, Pedro, Bontrager, Philip, Roux, Pierre, Dollar, Piotr, Zvyagina, Polina, Ratanchandani, Prashant, Yuvraj, Pritish, Liang, Qian, Alao, Rachad, Rodriguez, Rachel, Ayub, Rafi, Murthy, Raghotham, Nayani, Raghu, Mitra, Rahul, Li, Raymond, Hogan, Rebekkah, Battey, Robin, Wang, Rocky, Maheswari, Rohan, Howes, Russ, Rinott, Ruty, Bondu, Sai Jayesh, Datta, Samyak, Chugh, Sara, Hunt, Sara, Dhillon, Sargun, Sidorov, Sasha, Pan, Satadru, Verma, Saurabh, Yamamoto, Seiji, Ramaswamy, Sharadh, Lindsay, Shaun, Feng, Sheng, Lin, Shenghao, Zha, Shengxin Cindy, Shankar, Shiva, Zhang, Shuqiang, Wang, Sinong, Agarwal, Sneha, Sajuyigbe, Soji, Chintala, Soumith, Max, Stephanie, Chen, Stephen, Kehoe, Steve, Satterfield, Steve, Govindaprasad, Sudarshan, Gupta, Sumit, Cho, Sungmin, Virk, Sunny, Subramanian, Suraj, Choudhury, Sy, Goldman, Sydney, Remez, Tal, Glaser, Tamar, Best, Tamara, Kohler, Thilo, Robinson, Thomas, Li, Tianhe, Zhang, Tianjun, Matthews, Tim, Chou, Timothy, Shaked, Tzook, Vontimitta, Varun, Ajayi, Victoria, Montanez, Victoria, Mohan, Vijai, Kumar, Vinay Satish, Mangla, Vishal, Albiero, Vítor, Ionescu, Vlad, Poenaru, Vlad, Mihailescu, Vlad 
Tiberiu, Ivanov, Vladimir, Li, Wei, Wang, Wenchen, Jiang, Wenwen, Bouaziz, Wes, Constable, Will, Tang, Xiaocheng, Wang, Xiaofang, Wu, Xiaojian, Wang, Xiaolan, Xia, Xide, Wu, Xilun, Gao, Xinbo, Chen, Yanjun, Hu, Ye, Jia, Ye, Qi, Ye, Li, Yenda, Zhang, Yilin, Zhang, Ying, Adi, Yossi, Nam, Youngjin, Yu, Wang, Hao, Yuchen, Qian, Yundi, He, Yuzi, Rait, Zach, DeVito, Zachary, Rosnbrick, Zef, Wen, Zhaoduo, Yang, Zhenyu, and Zhao, Zhiwei
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
Published: 2024

25. The Second Joint Workshop on Cross Reality

Author: Wang, Nanjia, Li, Yue, Chiossi, Francesco, Pointecker, Fabian, Zhao, Lixiang, and Zielasko, Daniel
Subjects: Computer Science - Human-Computer Interaction
Abstract: The 2nd Joint Workshop on Cross Reality (JWCR'24), organized as part of ISMAR 2024, seeks to explore the burgeoning field of Cross Reality (CR), which encompasses the seamless integration and transition between various points on the reality-virtuality continuum (RVC) such as Virtual Reality (VR), Augmented Virtuality (AV), and Augmented Reality (AR). This hybrid workshop aims to build upon the foundation laid by the inaugural JWCR at ISMAR 2023, which successfully unified diverse CR research communities. The workshop will address key themes including CR visualization, interaction, user behavior, design, development, engineering, and collaboration. CR Visualization focuses on creating and displaying spatial data across the RVC, enabling users to navigate and interpret information fluidly. CR Interaction delves into natural user engagements using gestures, voice commands, and other advanced techniques to enhance immersion. The study of CR User Behavior and Experience investigates how users perceive and interact within these hybrid environments. Furthermore, CR Design and Development emphasizes creating effective CR applications using innovative processes and tools, while CR Collaboration examines methods for fostering teamwork in mixed reality settings., Comment: 5 pages
Published: 2024

26. GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis

Author: Liu, Weizhi, Li, Yue, Lin, Dongdong, Tian, Hui, and Li, Haizhou
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Amid the burgeoning development of generative models such as diffusion models, differentiating synthesized audio from its natural counterpart grows more daunting. Deepfake detection offers a viable solution to this challenge. Yet this defensive measure unintentionally fuels the continued refinement of generative models. Watermarking emerges as a proactive and sustainable tactic, preemptively regulating the creation and dissemination of synthesized content. This paper therefore pioneers the generative robust audio watermarking method (Groot), presenting a paradigm for proactively supervising synthesized audio and its source diffusion models. In this paradigm, watermark generation and audio synthesis occur simultaneously, facilitated by parameter-fixed diffusion models equipped with a dedicated encoder. The watermark embedded in the audio can subsequently be retrieved by a lightweight decoder. The experimental results highlight Groot's outstanding performance, particularly its robustness, which surpasses that of leading state-of-the-art methods. Beyond its resilience against individual post-processing attacks, Groot exhibits exceptional robustness against compound attacks, maintaining an average watermark extraction accuracy of around 95%., Comment: Accepted by ACM MM 2024
Published: 2024

27. Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification

Author: Zhang, Li, Jiang, Ning, Wang, Qing, Li, Yue, Lu, Quan, and Xie, Lei
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Trained on 680,000 hours of speech data, Whisper is a multitask, multilingual speech foundation model with superior performance in automatic speech recognition, translation, and language identification. However, its applicability to speaker verification (SV) remains unexplored, particularly in low-data-resource scenarios where labeled speaker data in specific domains are limited. To fill this gap, we propose Whisper-SV, a lightweight adaptor framework that boosts SV with Whisper. Since Whisper is not specifically optimized for SV, we introduce a representation selection module that quantifies the speaker-specific characteristics contained in each layer of Whisper and selects the top-k layers with prominent discriminative speaker features. To aggregate pivotal speaker-related features while suppressing non-speaker redundancies across the selected top-k layers, we design a multi-layer aggregation module that integrates the multi-layer representations into a single, compact representation for SV. In this module, we employ convolutional layers with shortcut connections among different layers to refine the speaker characteristics derived from Whisper's multi-layer representations. In addition, an attention aggregation layer reduces non-speaker interference and amplifies speaker-specific cues. Finally, a simple classification module performs speaker classification. Experiments on the VoxCeleb1, FFSVC, and IMSV datasets show that Whisper-SV achieves EER/minDCF of 2.22%/0.307, 6.14%/0.488, and 7.50%/0.582, respectively, demonstrating superior performance in low-data-resource SV scenarios.
Published: 2024
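Whisper-SV's results are reported as EER/minDCF. As a generic illustration of the first metric (a minimal sketch, not the authors' evaluation code), the equal error rate is the operating point where the false-acceptance rate on impostor trials equals the false-rejection rate on genuine trials. The sketch assumes higher scores mean "same speaker" and sweeps every observed score as a threshold; minDCF is not covered.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumes higher scores indicate the same speaker. Returns the mean of
    FAR and FRR at the threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


# Overlapping score sets: the rates cross at FAR = FRR = 0.25.
print(equal_error_rate([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7]))
```

Production toolkits usually interpolate along the ROC curve instead of picking the nearest observed threshold, so their EER values can differ slightly on small trial lists.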

28. ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Author: Shen, Haiyang, Li, Yue, Meng, Desong, Cai, Dongqi, Qi, Sheng, Zhang, Li, Xu, Mengwei, and Ma, Yun
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: Recent advancements in integrating large language models (LLMs) with application programming interfaces (APIs) have attracted significant interest in both academia and industry. These API-based agents, leveraging the strong autonomy and planning capabilities of LLMs, can efficiently solve problems requiring multi-step actions. However, their ability to handle multi-dimensional difficulty levels, diverse task types, and real-world demands through APIs remains unknown. In this paper, we introduce ShortcutsBench, a large-scale benchmark for the comprehensive evaluation of API-based agents in solving tasks with varying levels of difficulty, diverse task types, and real-world demands. ShortcutsBench includes a wealth of real APIs from Apple Inc.'s operating systems, refined user queries from shortcuts, human-annotated, high-quality action sequences from shortcut developers, and accurate parameter-filling values for primitive parameter types, enum parameter types, outputs from previous actions, and parameters that require requesting necessary information from the system or user. Our extensive evaluation of agents built with 5 leading open-source LLMs (size >= 57B) and 4 closed-source LLMs (e.g., Gemini-1.5-Pro and GPT-3.5) reveals significant limitations in handling complex queries involving API selection, parameter filling, and requesting necessary information from systems and users. These findings highlight the challenges that API-based agents face in fulfilling real and complex user queries. All datasets, code, and experimental results will be available at https://github.com/eachsheep/shortcutsbench.
Published: 2024

29. SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Author: Li, Yue, Wang, Xinsheng, Zhang, Li, and Xie, Lei
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Speaker Change Detection (SCD) aims to identify boundaries between speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, this work further investigates self-supervised learning (SSL) features for SCD. Specifically, an SCD model named SCDNet is proposed. With this model, various state-of-the-art SSL models, including HuBERT, wav2vec 2.0, and WavLM, are investigated. To identify the most effective layer of SSL models for SCD, a learnable weighting method is employed to analyze the effectiveness of intermediate representations. Additionally, a fine-tuning-based approach is implemented to further compare the characteristics of SSL models in the SCD task. Furthermore, a contrastive learning method is proposed to mitigate overfitting when training both the fine-tuning-based method and SCDNet. Experiments show the superiority of WavLM in the SCD task and validate the design of SCDNet.
Published: 2024
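SCDNet's abstract mentions a learnable weighting method for judging which SSL layer is most useful. A common way to do this (assumed here, not taken from the paper) keeps one scalar logit per layer, normalises the logits with a softmax, and takes the weighted sum of the layer outputs; after training, the weights suggest each layer's contribution. The sketch shows only the forward computation for a single frame in plain Python, and `layer_logits` is a hypothetical stand-in for the trainable parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_outputs, layer_logits):
    """Combine per-layer feature vectors with softmax-normalised weights.

    layer_outputs: L layers, each a list of D floats (one frame).
    layer_logits: L scalars (hypothetical learnable parameters).
    Returns (combined D-dim vector, per-layer weights).
    """
    weights = softmax(layer_logits)
    dim = len(layer_outputs[0])
    combined = [
        sum(w * layer[d] for w, layer in zip(weights, layer_outputs))
        for d in range(dim)
    ]
    return combined, weights


# Equal logits give equal weights, so the result is the layer average.
print(weighted_layer_sum([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```

Normalising with a softmax keeps the weights positive and summing to one, which is what makes them readable as relative layer importance.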

30. Calibrated absolute optical contrast for high-throughput characterization of horizontally aligned carbon nanotube arrays

Author: Li, Yue, Xie, Ying, Wang, Jianping, Xu, Yang, Wang, Shurui, Zhao, Yunbiao, Qian, Liu, Zhao, Ziqiang, and Zhang, Jin
Subjects: Physics - Applied Physics, Physics - Optics
Abstract: Horizontally aligned carbon nanotube (HACNT) arrays hold significant potential for various applications in nanoelectronics and material science. However, their high-throughput characterization remains challenging due to the lack of methods with both high efficiency and high accuracy. Here, we present a novel technique, Calibrated Absolute Optical Contrast (CAOC), achieved through the implementation of differential principles to filter out stray signals and high-resolution calibration to endow optical contrast with physical significance. CAOC offers major advantages over previous characterization techniques, providing consistent and reliable measurements of HACNT array density with high throughput and non-destructive assessment. To validate its utility, we demonstrate wafer-scale uniformity assessment by rapid density mapping. This technique not only facilitates the practical evaluation of HACNT arrays but also provides insights into balancing high throughput and high resolution in nanomaterial characterization.
Published: 2024

31. A Quantum Neural Network-Based Approach to Power Quality Disturbances Detection and Recognition

Author: Li, Guo-Dong, He, Hai-Yan, Li, Yue, Li, Xin-Hao, Liu, Hao, Wang, Qing-Le, and Cheng, Long
Subjects: Quantum Physics
Abstract: Power quality disturbances (PQDs) significantly impact the stability and reliability of power systems, necessitating accurate and efficient detection and recognition methods. While numerous classical algorithms for PQD detection and recognition have been extensively studied and applied, related work in the quantum domain is still in its infancy. In this paper, an improved quantum neural network (QNN) model for PQD detection and recognition is proposed. Specifically, the model constructs a quantum circuit comprising data qubits and ancilla qubits. Classical data is transformed into quantum data by embedding it into the data qubits via the encoding layer. Subsequently, parametric quantum gates form the variational layer, which transforms qubit information and thereby extracts the feature information essential for detection and recognition. The expected value is obtained by measuring the ancilla qubits, and disturbance classification is completed based on this expected value. An analysis shows that the runtime and space complexities of the QNN are $O(\mathrm{poly}(N))$ and $O(N)$, respectively. Extensive experiments validate the feasibility and superiority of the proposed model in PQD detection and recognition. The model achieves accuracies of 99.75%, 97.85%, and 95.5% in experiments involving the detection of disturbances, recognition of seven single disturbances, and recognition of ten mixed disturbances, respectively. Additionally, noise simulation and comparative experiments demonstrate that the proposed model exhibits robust anti-noise capabilities, requires few training parameters, and maintains high accuracy.
Published: 2024

32. Use of the shearlet energy entropy and of the support vector machine classifier to process weak microseismic and desert seismic signals

Author: Li, Yue, Fan, Shiyu, Zhang, Chao, and Yang, Baojun
Subjects: Shearlet energy entropy, SVM, Microseismic signal, Desert seismic signal, Signal detection, Geophysics. Cosmic physics, QC801-809, Chemistry, QD1-999, Geology, QE1-996.5
Abstract: Low-amplitude signal detection is a key procedure in borehole microseismic and desert seismic exploration. Such signals are usually difficult to detect because of their low amplitude and noise contamination. To solve this problem, we propose a method combining shearlet energy entropy with a support vector machine (SVM) to detect low-amplitude signals. In the proposed method, the signal feature is extracted using shearlet energy entropy. Because the shearlet transform is multi-scale and multi-directional, the signal is represented more sparsely in the shearlet domain, which favours signal feature extraction. Furthermore, in calculating shearlet energy entropy, we use the correlation of shearlet coefficients to enhance the difference between signal and noise in the shearlet domain. With shearlet energy entropy as its feature, the SVM achieves more accurate classification than with traditional features such as amplitude and energy. The results on synthetic and field data show that our method is more effective than STA/LTA and a convolutional neural network for detecting low-amplitude microseismic and desert seismic signals.
Published: 2020
Full Text: View/download PDF
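The shearlet energy entropy feature in entry 32 builds on the general notion of energy entropy. As a sketch under stated assumptions, one common definition normalises the energy of each sub-band (scale/direction) into a probability distribution and takes its Shannon entropy. The shearlet transform itself and the paper's coefficient-correlation enhancement are not reproduced, so the sub-band coefficients are taken as given. Intuitively, a sparsely represented signal concentrates energy in few sub-bands (lower entropy), while noise spreads it out (higher entropy).

```python
import math


def energy_entropy(subbands):
    """Shannon entropy of the normalised energy distribution over sub-bands.

    subbands: list of coefficient lists, e.g. one list per scale/direction
    of a multi-scale transform (the transform is assumed to be done already).
    """
    energies = [sum(c * c for c in band) for band in subbands]
    total = sum(energies)
    if total == 0:
        return 0.0  # no energy at all: treat as zero entropy
    entropy = 0.0
    for e in energies:
        if e > 0:  # empty sub-bands contribute nothing (0 * log 0 -> 0)
            p = e / total
            entropy -= p * math.log(p)
    return entropy


print(energy_entropy([[3.0, 4.0], [0.0, 0.0]]))       # energy in one band
print(energy_entropy([[1.0], [1.0], [-1.0], [1.0]]))  # evenly spread energy
```

The first call prints 0.0 and the second prints ln 4; in a detector, such values would be fed to a classifier like the SVM described above rather than thresholded directly.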

33. Direct observation of twisted stacking domains in the van der Waals magnet CrI3.

Author: Jang, Myeongjin, Lee, Sol, Cantos-Prieto, Fernando, Košić, Ivona, Li, Yue, McCray, Arthur, Jung, Min-Hyoung, Yoon, Jun-Yeong, Boddapati, Loukya, Deepak, Francis, Jeong, Hu, Phatak, Charudatta, Santos, Elton, Navarro-Moratalla, Efrén, and Kim, Kwanpyo
Abstract: Van der Waals (vdW) stacking is a powerful technique to achieve desired properties in condensed matter systems through layer-by-layer crystal engineering. A remarkable example is the control over the twist angle between artificially-stacked vdW crystals, enabling the realization of unconventional phenomena in moiré structures ranging from superconductivity to strongly correlated magnetism. Here, we report the appearance of unusual 120° twisted faults in vdW magnet CrI3 crystals. In exfoliated samples, we observe vertical twisted domains with a thickness below 10 nm. The size and distribution of twisted domains strongly depend on the sample preparation methods, with as-synthesized unexfoliated samples showing tenfold thicker domains than exfoliated samples. Cooling induces changes in the relative populations among different twisting domains, rather than the previously assumed structural phase transition to the rhombohedral stacking. The stacking disorder induced by sample fabrication processes may explain the unresolved thickness-dependent magnetic coupling observed in CrI3.
Published: 2024

34. A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction

Author: Li, Yue, Kunneman, Florian A., and Hindriks, Koen V.
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: With current state-of-the-art automatic speech recognition (ASR) systems, it is not possible to transcribe overlapping speech audio streams separately. Consequently, when these ASR systems are used as part of a social robot like Pepper for interaction with a human, it is common practice to close the robot's microphone while it is talking. This prevents human users from interrupting the robot, which limits speech-based human-robot interaction. To enable a more natural interaction that allows such interruptions, we propose an audio processing pipeline for filtering out the robot's ego speech using only a single-channel microphone. This pipeline takes advantage of the possibility of feeding the robot's ego speech signal, generated by a text-to-speech API, into a machine learning model as training data. The proposed pipeline combines a convolutional neural network and spectral subtraction to extract overlapping human speech from the audio recorded by the robot-embedded microphone. When evaluated on a held-out test set, this pipeline outperforms our previous approach to this task, as well as state-of-the-art target speech extraction systems retrained on the same dataset. We have also integrated the proposed pipeline into a lightweight robot software development framework to make it available for broader use. As a step towards demonstrating the feasibility of deploying our pipeline, we use this framework to evaluate its effectiveness in a small lab-based feasibility pilot with the social robot Pepper. Our results show that when participants interrupt the robot, the pipeline can extract the participant's speech from one-second streaming audio buffers received by the robot-embedded single-channel microphone, hence in near-real time., Comment: 8 pages, 16 figures, Under review by RoMan 2024 conference
Published: 2024
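The ego-speech pipeline in entry 34 combines a convolutional neural network with spectral subtraction. The sketch below illustrates only a textbook magnitude spectral-subtraction step for one frame, assuming the magnitude spectrum of the robot's own speech is already available (for example, estimated from the known text-to-speech signal). The `alpha` over-subtraction factor and `floor` are illustrative parameters, not values from the paper.

```python
def spectral_subtract(mixture_mag, interference_mag, alpha=1.0, floor=0.01):
    """Subtract an interference magnitude estimate from a mixture, bin by bin.

    mixture_mag, interference_mag: magnitude spectra of one frame (same length).
    alpha: over-subtraction factor (illustrative default).
    floor: minimum output as a fraction of the mixture magnitude, so that
           over-subtracted bins never become negative.
    """
    if len(mixture_mag) != len(interference_mag):
        raise ValueError("spectra must have the same number of bins")
    return [
        max(m - alpha * n, floor * m)
        for m, n in zip(mixture_mag, interference_mag)
    ]


# The middle bin is dominated by the robot's speech and is clamped to the floor.
print(spectral_subtract([1.0, 0.5, 2.0], [0.2, 1.0, 0.0]))
```

In a full system the cleaned magnitudes would be recombined with the mixture phase and inverted back to a waveform; that step is omitted here.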

35. Two-dimensional signal-dependent parabolic-elliptic Keller-Segel system and its mean field derivation

Author: Bol, Lukas, Chen, Li, and Li, Yue
Subjects: Mathematics - Analysis of PDEs
Abstract: In this paper, the well-posedness of the two-dimensional signal-dependent Keller-Segel system and its mean field derivation from an interacting particle system on the whole space are investigated. The signal dependence is reflected in the fact that the diffusion coefficient in the particle system depends nonlinearly on the interactions between individuals. The mathematical challenge in studying the well-posedness of this system therefore lies in the possible degeneracy and the aggregation effect when the signal concentration becomes unbounded. The well-established method on bounded domains for obtaining appropriate estimates of the signal concentration is invalid in the whole-space case. Motivated by the entropy minimization method and Onofri's inequality, which have been successfully applied to the parabolic-parabolic Keller-Segel system, we establish a complete entropy estimate that benefits from the linear diffusion term and plays an important role in obtaining the $L^p$ estimates for the solution. Furthermore, an upper bound for the signal concentration is obtained. Based on the estimates obtained for the cell density, the rigorous mean-field derivation is proved by introducing an intermediate particle system with a mollified interaction potential with logarithmic scaling. Using this mollification, we obtain convergence of the particle trajectories in expectation, which implies weak propagation of chaos. Additionally, under a regularity assumption on the cell density, we derive strong $L^1$ convergence for the propagation of chaos using the relative entropy method.
Published: 2024

36. A thermodynamic and analytical description on the quantitative phase-field model with enhanced interface diffusivity

Author: Li, Yue, Wang, Lei, Li, Junjie, Wang, Jincheng, and Wang, Zhijun
Subjects: Condensed Matter - Materials Science
Abstract: Based on the idea of maintaining physical diffuse-interface kinetics, enhancing interfacial diffusivity has recently opened a new direction for quantitative phase-field simulation at microstructural length and time scales. Establishing a general relationship between interface diffusivity and interface width is vital for practical application. However, this is still limited by time-consuming numerical corrections, and its relationship with non-dilute thermodynamic properties remains to be revealed. In this study, we present a new thermodynamic and analytical method for determining interfacial diffusivity enhancement. Unlike previous numerical corrections of partition coefficients and interface temperature, this method aims to keep several thermodynamic quantities unchanged after the interface width is enlarged. These essential quantities are theoretically proven to be the diffusion potential jump across the diffuse interface and the free energy dissipation by trans-interface diffusion. Since no dilute approximation is employed in the model derivation, the present method is applicable to binary alloys with arbitrary thermodynamic properties and can be easily extended to multicomponent systems. The present method is therefore expected to advance the recent quantitative phase-field framework and facilitate its practical applications., Comment: 22 pages, 10+1 figures
Published: 2024

37. Revealing Hierarchical Structure of Leaf Venations in Plant Science via Label-Efficient Segmentation: Dataset and Method

Author: Liu, Weizhen, Li, Ao, Wu, Ze, Li, Yue, Ge, Baobin, Lan, Guangyu, Chen, Shilin, Li, Minghe, Liu, Yunfei, Yuan, Xiaohui, and Dong, Nanqing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Hierarchical leaf vein segmentation is a crucial but under-explored task in agricultural sciences, where analysis of the hierarchical structure of plant leaf venation can contribute to plant breeding. While current segmentation techniques rely on data-driven models, there is no publicly available dataset specifically designed for hierarchical leaf vein segmentation. To address this gap, we introduce the HierArchical Leaf Vein Segmentation (HALVS) dataset, the first public hierarchical leaf vein segmentation dataset. HALVS comprises 5,057 real-scanned high-resolution leaf images collected from three plant species: soybean, sweet cherry, and London planetree. It also includes human-annotated ground truth for three orders of leaf veins, with a total labeling effort of 83.8 person-days. Based on HALVS, we further develop a label-efficient learning paradigm that leverages partial label information, i.e., with missing annotations for tertiary veins. Empirical studies performed on HALVS reveal new observations, challenges, and research directions for leaf vein segmentation., Comment: Accepted by IJCAI2024, Code: https://github.com/WeizhenLiuBioinform/HALVS-Hierarchical-Vein-Segment.git
Published: 2024
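The abstract does not spell out how partial labels enter training. One common way to use partially annotated masks, offered here only as a hypothetical illustration and not as the HALVS method, is to exclude pixels whose label is unknown from the per-pixel loss. The marker value IGNORE and the function name below are assumptions.

```python
import math

IGNORE = -1  # hypothetical marker for pixels whose vein order is unannotated


def masked_cross_entropy(probs, labels, ignore_index=IGNORE):
    """Mean negative log-likelihood over annotated pixels only.

    probs:  per-pixel lists of class probabilities (each list sums to 1)
    labels: per-pixel integer class labels, or ignore_index when unknown
    """
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y == ignore_index:
            continue  # unannotated pixel contributes nothing to the loss
        total += -math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        count += 1
    return total / count if count else 0.0
```

Averaging only over annotated pixels keeps the loss scale comparable between fully and partially labeled images.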

38. Preliminary Design of a General Electronics Platform for Accelerator Facilities

Author: Zhu, Jinfu, Ding, Hongli, Li, Haokui, Ran, Qiaoye, Dai, Xiwen, Li, Wei, Han, Jiawei, Li, Yue, Zhang, Zhiyuan, Qiu, Weixin, and Zhang, Weiqing
Subjects: Physics - Instrumentation and Detectors
Abstract: Many accelerators require extensive electronic systems for testing, verification, and operation. For the Shenzhen Superconducting Soft X-ray Free Electron Laser (S3FEL), to meet the needs of early testing and verification of various systems, reduce development costs, and improve the reusability of hardware, firmware, and software, we have considered the requirements of each system and preliminarily designed a general electronics platform based on MicroTCA.4. The Advanced Mezzanine Card (AMC) will host an FPGA Mezzanine Card (FMC) that supports 500 MSPS to 2 GSPS ADCs/DACs. We will design two FMC cards on the Rear Transition Module (RTM), which can be used for analog signal conditioning and waveform digitization with 10 MSPS to 250 MSPS ADCs/DACs, or for motor control. A commercial MCH, CPU, power module, and MTCA crate are deployed. This platform can also be applied to other accelerator facilities., Comment: 3 pages, 4 figures, 2024 IEEE Real-Time Conference
Published: 2024

39. GAD-Generative Learning for HD Map-Free Autonomous Driving

Author: Sun, Weijian, Jia, Yanbo, Zeng, Qi, Liu, Zihao, Liao, Jiang, Li, Yue, and Li, Xianfeng
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep-learning-based techniques have been widely adopted in autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this approach to prediction modules. However, the downstream planning and control modules are still designed with extensive handcrafted rules, dominated by optimization-based methods such as quadratic programming or model predictive control. This creates a performance bottleneck for autonomous driving systems, because corner cases cannot be solved simply by enumerating handcrafted rules. We present a deep-learning-based approach that brings the prediction, decision, and planning modules together in an attempt to overcome the deficiencies of rule-based methods in real-world autonomous driving applications, especially in urban scenes. The proposed DNN model is trained solely on 10 hours of human driver data, and it supports all mass-production ADAS features available on the market to date. The method is deployed on a Jiyue test car with no modification to its factory-ready sensor set and compute platform. The feasibility, usability, and commercial potential of the approach are demonstrated in this article.
Published: 2024

40. SemiPL: A Semi-supervised Method for Event Sound Source Localization

Author: Li, Yue, Yin, Baiqiao, Liu, Jinfu, Wen, Jiajun, Lin, Jiaying, and Liu, Mengyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works, typically relying on the contrastive learning framework, show impressive performance. However, all of them are based on large, relatively simple datasets. In many applications, e.g., crowd management and emergency response services, it is also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events. In this paper, we apply the existing model to a more complex dataset, explore the influence of parameters on the model, and propose a semi-supervised improvement method, SemiPL. With the increase in data quantity and the influence of label quality, self-supervised learning will be an unstoppable trend. The experiments show that parameter adjustment positively affects the existing model. In particular, SSPL achieved improvements of 12.2% in cIoU and 0.56% in AUC on Chaotic World compared with the reported results. The code is available at: https://github.com/ly245422/SSPL
Published: 2024

41. TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

Author: Cheng, Junhao, Yin, Baiqiao, Cai, Kaixin, Huang, Minbin, Li, Hanhui, He, Yuxin, Lu, Xi, Li, Yue, Li, Yifei, Cheng, Yuhao, Yan, Yiqiang, and Liang, Xiaodan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advances in diffusion models enable the generation of high-quality, stunning images from text. However, multi-turn image generation, which is in high demand in real-world scenarios, still struggles to maintain semantic consistency between images and texts, as well as contextual consistency of the same subject across multiple interactive turns. To address this issue, we introduce TheaterGen, a training-free framework that integrates large language models (LLMs) and text-to-image (T2I) models to provide multi-turn image generation. Within this framework, LLMs, acting as a "Screenwriter", engage in multi-turn interaction, generating and managing a standardized prompt book that encompasses prompts and layout designs for each character in the target image. Based on these, TheaterGen generates a list of character images and extracts guidance information, akin to the "Rehearsal". Subsequently, by incorporating the prompt book and guidance information into the reverse denoising process of T2I diffusion models, TheaterGen generates the final image, akin to the "Final Performance". With the effective management of prompt books and character images, TheaterGen significantly improves semantic and contextual consistency in synthesized images. Furthermore, we introduce a dedicated benchmark, CMIGBench (Consistent Multi-turn Image Generation Benchmark), with 8000 multi-turn instructions. Unlike previous multi-turn benchmarks, CMIGBench does not define characters in advance. Both story generation and multi-turn editing tasks are included in CMIGBench for comprehensive evaluation. Extensive experimental results show that TheaterGen significantly outperforms state-of-the-art methods. It raises the performance bar of the cutting-edge Mini DALLE 3 model by 21% in average character-character similarity and 19% in average text-image similarity.
Published: 2024

42. Differentiable Voronoi Diagrams for Simulation of Cell-Based Mechanical Systems

Author: Numerow, Logan, Li, Yue, Coros, Stelian, and Thomaszewski, Bernhard
Subjects: Computer Science - Graphics, Computer Science - Computational Geometry
Abstract: Navigating topological transitions in cellular mechanical systems is a significant challenge for existing simulation methods. While abstract models lack predictive capabilities at the cellular level, explicit network representations struggle with topology changes, and per-cell representations are computationally too demanding for large-scale simulations. To address these challenges, we propose a novel cell-centered approach based on differentiable Voronoi diagrams. Representing each cell with a Voronoi site, our method defines shape and topology of the interface network implicitly. In this way, we substantially reduce the number of problem variables, eliminate the need for explicit contact handling, and ensure continuous geometry changes during topological transitions. Closed-form derivatives of network positions facilitate simulation with Newton-type methods for a wide range of per-cell energies. Finally, we extend our differentiable Voronoi diagrams to enable coupling with arbitrary rigid and deformable boundaries. We apply our approach to a diverse set of examples, highlighting splitting and merging of cells as well as neighborhood changes. We illustrate applications to inverse problems by matching soap foam simulations to real-world images. Comparative analysis with explicit cell models reveals that our method achieves qualitatively comparable results at significantly faster computation times., Comment: 11 pages, 10 figures
Published: 2024
Full Text: View/download PDF
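To see why a cell-centered Voronoi representation gives closed-form network positions, consider a standard (unweighted) 2D Voronoi diagram: each vertex of the interface network is equidistant from three neighboring sites, i.e. it is the circumcenter of their triangle. The sketch below computes that vertex; it assumes plain Euclidean Voronoi cells and does not cover the paper's per-cell energies or boundary coupling.

```python
def voronoi_vertex(a, b, c):
    """Circumcenter of sites a, b, c: the Voronoi vertex shared by their three cells."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-15:
        raise ValueError("collinear sites have no finite Voronoi vertex")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy
```

Because the vertex is a rational function of the site coordinates, it moves continuously as sites move (away from degenerate configurations), and its derivatives with respect to the sites can be obtained by differentiating this formula, which is what makes Newton-type solvers applicable.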

43. Differentiable Geodesic Distance for Intrinsic Minimization on Triangle Meshes

Author: Li, Yue, Numerow, Logan, Thomaszewski, Bernhard, and Coros, Stelian
Subjects: Computer Science - Graphics, Computer Science - Computational Geometry
Abstract: Computing intrinsic distances on discrete surfaces is at the heart of many minimization problems in geometry processing and beyond. Solving these problems is extremely challenging as it demands the computation of on-surface distances along with their derivatives. We present a novel approach for intrinsic minimization of distance-based objectives defined on triangle meshes. Using a variational formulation of shortest-path geodesics, we compute first and second-order distance derivatives based on the implicit function theorem, thus opening the door to efficient Newton-type minimization solvers. We demonstrate our differentiable geodesic distance framework on a wide range of examples, including geodesic networks and membranes on surfaces of arbitrary genus, two-way coupling between hosting surface and embedded system, differentiable geodesic Voronoi diagrams, and efficient computation of Karcher means on complex shapes. Our analysis shows that second-order descent methods based on our differentiable geodesics outperform existing first-order and quasi-Newton methods by large margins.
Published: 2024
Full Text: View/download PDF

44. 3D deep learning for enhanced atom probe tomography analysis of nanoscale microstructures

Author: Yu, Jiwei, Wang, Zhangwei, Saksena, Aparna, Wei, Shaolou, Wei, Ye, Colnaghi, Timoteo, Marek, Andreas, Rampp, Markus, Song, Min, Gault, Baptiste, and Li, Yue
Subjects: Condensed Matter - Materials Science, Physics - Data Analysis, Statistics and Probability
Abstract: Quantitative analysis of microstructural features on the nanoscale, including precipitates, local chemical orderings (LCOs) or structural defects (e.g. stacking faults) plays a pivotal role in understanding the mechanical and physical responses of engineering materials. Atom probe tomography (APT), known for its exceptional combination of chemical sensitivity and sub-nanometer resolution, primarily identifies microstructures through compositional segregations. However, this fails when there is no significant segregation, as can be the case for LCOs and stacking faults. Here, we introduce a 3D deep learning approach, AtomNet, designed to process APT point cloud data at the single-atom level for nanoscale microstructure extraction, simultaneously considering compositional and structural information. AtomNet is showcased in segmenting L12-type nanoprecipitates from the matrix in an AlLiMg alloy, irrespective of crystallographic orientation, outperforming previous methods. AtomNet also allows for 3D imaging of L10-type LCOs in an AuCu alloy, a challenging task for conventional analysis due to their small size and subtle compositional differences. Finally, we demonstrate the use of AtomNet for revealing 2D stacking faults in a Co-based superalloy without any training data containing defects, expanding the capabilities of APT for automated exploration of hidden microstructures. AtomNet pushes the boundaries of APT analysis, and holds promise in establishing precise quantitative microstructure-property relationships across a diverse range of metallic materials.
Published: 2024

45. SFMViT: SlowFast Meet ViT in Chaotic World

Author: Lin, Jiaying, Wen, Jiajun, Liu, Mengyuan, Liu, Jinfu, Yin, Baiqiao, and Li, Yue
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Spatiotemporal action localization in chaotic scenes is a challenging task on the path toward advanced video understanding. High-quality video feature extraction and more precise detector-predicted anchors can effectively improve model performance. To this end, we propose a high-performance dual-stream spatiotemporal feature extraction network, SFMViT, with an anchor pruning strategy. The backbone of SFMViT is composed of ViT and SlowFast with prior knowledge of spatiotemporal action localization, fully utilizing ViT's excellent global feature extraction capabilities and SlowFast's spatiotemporal sequence modeling capabilities. In addition, we introduce a confidence maximum heap to prune the anchors detected in each frame and retain the effective ones. These designs enable SFMViT to achieve a mAP of 26.62% on the Chaotic World dataset, far exceeding existing models. Code is available at https://github.com/jfightyr/SlowFast-Meet-ViT.
Published: 2024
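The per-frame anchor pruning with a confidence maximum heap can be sketched as follows. This is an illustration under assumptions, not the SFMViT code: the anchor tuple layout, the cap k, and the threshold min_conf are hypothetical parameters.

```python
import heapq
from typing import List, Tuple

# (confidence, (x1, y1, x2, y2)) for one detected anchor in a frame
Anchor = Tuple[float, Tuple[float, float, float, float]]


def prune_anchors(anchors: List[Anchor], k: int, min_conf: float) -> List[Anchor]:
    """Keep at most k anchors of one frame, highest confidence first, above min_conf."""
    # heapq is a min-heap, so confidences are negated to pop the maximum first;
    # the index breaks ties so boxes are never compared.
    heap = [(-conf, idx) for idx, (conf, _box) in enumerate(anchors)]
    heapq.heapify(heap)
    kept = []
    while heap and len(kept) < k:
        neg_conf, idx = heapq.heappop(heap)
        if -neg_conf < min_conf:
            break  # every remaining anchor is even less confident
        kept.append(anchors[idx])
    return kept
```

Building the heap is linear in the number of anchors, and each pop costs logarithmic time, so extracting a few top anchors per frame is cheaper than fully sorting long anchor lists.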

46. HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition

Author: Liu, Jinfu, Yin, Baiqiao, Lin, Jiaying, Wen, Jiajun, Li, Yue, and Liu, Mengyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Skeleton-based action recognition has gained considerable traction thanks to its utilization of succinct and robust skeletal representations. Nonetheless, current methodologies often lean towards utilizing a solitary backbone to model skeleton modality, which can be limited by inherent flaws in the network backbone. To address this and fully leverage the complementary characteristics of various network architectures, we propose a novel Hybrid Dual-Branch Network (HDBN) for robust skeleton-based action recognition, which benefits from the graph convolutional network's proficiency in handling graph-structured data and the powerful modeling capabilities of Transformers for global information. In detail, our proposed HDBN is divided into two trunk branches: MixGCN and MixFormer. The two branches utilize GCNs and Transformers to model both 2D and 3D skeletal modalities respectively. Our proposed HDBN emerged as one of the top solutions in the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) of 2024 ICME Grand Challenge, achieving accuracies of 47.95% and 75.36% on two benchmarks of the UAV-Human dataset by outperforming most existing methods. Our code will be publicly available at: https://github.com/liujf69/ICMEW2024-Track10.
Published: 2024

47. An energy efficient quantum-enhanced machine

Author: Hou, Waner, Zhao, Xingyu, Rehan, Kamran, Li, Yi, Li, Yue, Lutz, Eric, Lin, Yiheng, and Du, Jiangfeng
Subjects: Quantum Physics
Abstract: Quantum friction, a quantum analog of classical friction, reduces the performance of quantum machines, such as heat engines, and makes them less energy efficient. We here report the experimental realization of an energy efficient quantum engine coupled to a quantum battery that stores the produced work, using a single ion in a linear Paul trap. We first establish the quantum nature of the device by observing nonclassical work oscillations with the number of cycles as verified by energy measurements of the battery. We moreover successfully apply shortcut-to-adiabaticity techniques to suppress quantum friction and improve work production. While the average energy cost of the shortcut protocol is only about 3%, the work output is enhanced by up to approximately 33%, making the machine significantly more energy efficient. In addition, we show that the quantum engine consistently outperforms its classical counterpart in this regime. Our results pave the way for energy efficient machines with quantum-enhanced performance.
Published: 2024

48. RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model

Author: Li, Peiwen, Wang, Xin, Zhang, Zeyang, Meng, Yuan, Shen, Fang, Li, Yue, Wang, Jialong, Li, Yang, and Zhu, Wenweu
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Methodology
Abstract: In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for constructing operation and maintenance graphs, facilitating downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional data. However, existing methods mainly focus on synthetic datasets with heavy reliance on intervention targets and ignore the textual information hidden in real-world systems, failing to conduct causal discovery for real industrial scenarios. To tackle this problem, in this paper we propose to investigate temporal causal discovery in industrial scenarios, which faces two critical challenges: 1) how to discover causal relationships without the interventional targets that are costly to obtain in practice, and 2) how to discover causal relations by leveraging the textual information in systems, which can be complex yet abundant in industrial contexts. To address these challenges, we propose the RealTCD framework, which is able to leverage domain knowledge to discover temporal causal relationships without interventional targets. Specifically, we first develop a score-based temporal causal discovery method capable of discovering causal relations for root cause analysis without relying on interventional targets through strategic masking and regularization. Furthermore, by employing Large Language Models (LLMs) to handle texts and integrate domain knowledge, we introduce LLM-guided meta-initialization to extract the meta-knowledge from textual information hidden in systems to boost the quality of discovery. We conduct extensive experiments on simulation and real-world datasets to show the superiority of our proposed RealTCD framework over existing baselines in discovering temporal causal structures.
Published: 2024

49. Quasi-Frobenius Novikov algebras and pre-Novikov bialgebras

Author: Li, Yue and Hong, Yanyong
Subjects: Mathematics - Rings and Algebras, Mathematics - Representation Theory
Abstract: Pre-Novikov algebras and quasi-Frobenius Novikov algebras naturally appear in the theory of Novikov bialgebras. In this paper, we show that there is a natural pre-Novikov algebra structure associated to a quasi-Frobenius Novikov algebra. Then we introduce the definition of double constructions of quasi-Frobenius Novikov algebras associated to two pre-Novikov algebras and show that such a double construction is characterized by a pre-Novikov bialgebra. We also introduce the notion of the pre-Novikov Yang-Baxter equation, whose symmetric solutions can produce pre-Novikov bialgebras. Moreover, the operator forms of the pre-Novikov Yang-Baxter equation are also investigated.
Published: 2024
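As background for the abstract above, a Novikov algebra (A, ∘) is commonly defined, in the convention assumed here (the paper may use the mirrored one), by left-symmetry (a∘b)∘c - a∘(b∘c) = (b∘a)∘c - b∘(a∘c) and right commutativity (a∘b)∘c = (a∘c)∘b. A classical example is a commutative associative algebra with a derivation D and product a∘b = a·D(b). The sketch below checks both identities for polynomials with D = d/dx; it illustrates the definition only and does not model the pre-Novikov or quasi-Frobenius structures studied in the paper.

```python
import random
from fractions import Fraction

# Polynomials in one variable x, stored as coefficient lists [c0, c1, c2, ...].


def mul(p, q):
    """Ordinary (commutative, associative) product of two polynomials."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def deriv(p):
    """Formal derivative D = d/dx, a derivation of the polynomial algebra."""
    return [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]


def sub(p, q):
    """Difference p - q, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]


def is_zero(p):
    return all(c == 0 for c in p)


def gelfand_dorfman(a, b):
    """Product a∘b = a·D(b) built from a commutative algebra with a derivation."""
    return mul(a, deriv(b))


def satisfies_novikov(a, b, c, op=gelfand_dorfman):
    """Check both Novikov identities for the triple (a, b, c) under the product op."""
    # Left-symmetry: (a∘b)∘c - a∘(b∘c) = (b∘a)∘c - b∘(a∘c)
    assoc_ab = sub(op(op(a, b), c), op(a, op(b, c)))
    assoc_ba = sub(op(op(b, a), c), op(b, op(a, c)))
    # Right commutativity: (a∘b)∘c = (a∘c)∘b
    right_comm = sub(op(op(a, b), c), op(op(a, c), b))
    return is_zero(sub(assoc_ab, assoc_ba)) and is_zero(right_comm)
```

Swapping the roles of the factors, a∘b = D(a)·b, keeps right commutativity but breaks left-symmetry (for example with a = x^2, b = x, c = 1), which shows that the two identities are independent checks.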

50. CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

Author: Bhatt, Manish, Chennabasappa, Sahana, Li, Yue, Nikolaidis, Cyrus, Song, Daniel, Wan, Shengye, Ahmad, Faizan, Aschermann, Cornelius, Chen, Yaohui, Kapil, Dhaval, Molnar, David, Whitman, Spencer, and Saxe, Joshua
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present CyberSecEval 2, a novel benchmark to quantify LLM security risks and capabilities. We introduce two new areas for testing: prompt injection and code interpreter abuse. We evaluated multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Our results show that conditioning away risk of attack remains an unsolved problem; for example, all tested models showed prompt injection success rates between 26% and 41%. We further introduce the safety-utility tradeoff: conditioning an LLM to reject unsafe prompts can cause the LLM to falsely reject answering benign prompts, which lowers utility. We propose quantifying this tradeoff using False Refusal Rate (FRR). As an illustration, we introduce a novel test set to quantify FRR for cyberattack helpfulness risk. We find that many LLMs can successfully comply with "borderline" benign requests while still rejecting most unsafe requests. Finally, we quantify the utility of LLMs for automating a core cybersecurity task, that of exploiting software vulnerabilities. This is important because the offensive capabilities of LLMs are of intense interest; we quantify this by creating novel test sets for four representative problems. We find that models with coding capabilities perform better than those without, but that further work is needed for LLMs to become proficient at exploit generation. Our code is open source and can be used to evaluate other LLMs.
Published: 2024
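The False Refusal Rate described above is the fraction of benign prompts that a model refuses; pairing it with the refusal rate on unsafe prompts exposes the safety-utility tradeoff. The sketch assumes each prompt has already been labeled benign or unsafe and that some refusal judge (not specified here, and not the suite's actual tooling) has decided whether the model refused.

```python
from typing import Iterable, Tuple


def refusal_rates(results: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_refusal_rate, unsafe_refusal_rate).

    results: (prompt_is_benign, model_refused) pairs; 'model_refused' is assumed
    to come from an external refusal judge.
    """
    benign = refused_benign = unsafe = refused_unsafe = 0
    for is_benign, refused in results:
        if is_benign:
            benign += 1
            refused_benign += int(refused)
        else:
            unsafe += 1
            refused_unsafe += int(refused)
    frr = refused_benign / benign if benign else 0.0  # lower is better (utility)
    urr = refused_unsafe / unsafe if unsafe else 0.0  # higher is better (safety)
    return frr, urr
```

For example, a model that refuses 1 of 4 benign prompts and 2 of 3 unsafe prompts has an FRR of 0.25 while refusing about two thirds of unsafe requests.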
