Author: "An, Jie" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"An, Jie"' showing total 1,720,927 results


51. Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models

Author: Yeo, Wei Jie, Satapathy, Ranjan, and Cambria, Erik
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) are capable of generating persuasive Natural Language Explanations (NLEs) to justify their answers. However, the faithfulness of these explanations should not be readily trusted at face value. Recent studies have proposed various methods to measure the faithfulness of NLEs, typically by inserting perturbations at the explanation or feature level. We argue that these approaches are neither comprehensive nor correctly designed according to the established definition of faithfulness. Moreover, we highlight the risks of grounding faithfulness findings on out-of-distribution samples. In this work, we leverage a causal mediation technique called activation patching to measure the faithfulness of an explanation towards supporting the explained answer. Our proposed metric, Causal Faithfulness, quantifies the consistency of causal attributions between explanations and the corresponding model outputs as the indicator of faithfulness. We experimented across models varying from 2B to 27B parameters and found that models that underwent alignment tuning tend to produce more faithful and plausible explanations. We find that Causal Faithfulness is a promising improvement over existing faithfulness tests by taking into account the model's internal computations and avoiding out-of-distribution concerns that could otherwise undermine the validity of faithfulness assessments. We release the code at https://github.com/wj210/Causal-Faithfulness.
Comment: Under review
Published: 2024

52. Graph Neural Flows for Unveiling Systemic Interactions Among Irregularly Sampled Time Series

Author: Mercatali, Giangiacomo, Freitas, Andre, and Chen, Jie
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Interacting systems are prevalent in nature. It is challenging to accurately predict the dynamics of the system if its constituent components are analyzed independently. We develop a graph-based model that unveils the systemic interactions of time series observed at irregular time points, by using a directed acyclic graph to model the conditional dependencies (a form of causal notation) of the system components and learning this graph in tandem with a continuous-time model that parameterizes the solution curves of ordinary differential equations (ODEs). Our technique, a graph neural flow, leads to substantial enhancements over non-graph-based methods, as well as graph-based methods without the modeling of conditional dependencies. We validate our approach on several tasks, including time series classification and forecasting, to demonstrate its efficacy.
Comment: NeurIPS 2024. Code is available at https://github.com/gmerca/GNeuralFlow
Published: 2024

53. Detecting AI-Generated Texts in Cross-Domains

Author: Zhou, You and Wang, Jie
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, I.2.7
Abstract: Existing tools to detect text generated by a large language model (LLM) have met with certain success, but their performance can drop when dealing with texts in new domains. To tackle this issue, we train a ranking classifier called RoBERTa-Ranker, a modified version of RoBERTa, as a baseline model using a dataset we constructed that includes a wider variety of texts written by humans and generated by various LLMs. We then present a method to fine-tune RoBERTa-Ranker that requires only a small amount of labeled data in a new domain. Experiments show that this fine-tuned domain-aware model outperforms the popular DetectGPT and GPTZero on both in-domain and cross-domain texts, where AI-generated texts may either be in a different domain or generated by a different LLM not used to generate the training datasets. This approach makes it feasible and economical to build a single system to detect AI-generated texts across various domains.
Published: 2024

54. Predicting synthesis window of beta-TaON with thermodynamic modeling

Author: LaBelle, Dmitri and Hu, Yong-Jie
Subjects: Condensed Matter - Materials Science
Abstract: Phase-pure synthesis has been a major challenge for metal oxynitrides due to their sensitivity to synthesis conditions and the limited understanding of the underlying thermodynamics. The beta-phase tantalum oxynitride (beta-TaON), a promising material for applications in photocatalysis and energy storage, is particularly difficult to synthesize in a reproducible, phase-pure form. In this work, a computational thermodynamic model with experimental validation is presented to evaluate the phase-pure synthesis conditions for beta-TaON via ammonolysis reactions. The finite-temperature thermochemical properties of the reactant, product, and byproduct phases are predicted via first-principles calculations with the quasi-harmonic approach (QHA) as well as implemented from available thermodynamic databases. With the thermochemical properties, a thermodynamic model based on the CALculation of PHAse Diagrams (CALPHAD) approach is developed to assess the phase equilibria associated with the synthesis reactions and correspondingly predict the synthesis window for beta-TaON. A three-dimensional phase diagram is predicted as a function of gas composition and temperature, providing insights into optimal synthesis conditions. The computational predictions are further compared with available experimental data, offering a systematic framework for phase-pure beta-TaON synthesis.
Comment: 12 pages, 7 figures
Published: 2024

55. PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment

Author: Wang, Zekun Moore, Wang, Shawn, Zhu, Kang, Liu, Jiaheng, Xu, Ke, Fu, Jie, Zhou, Wangchunshu, and Huang, Wenhao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Alignment of large language models (LLMs) involves training models on preference-contrastive output pairs to adjust their responses according to human preferences. To obtain such contrastive pairs, traditional methods like RLHF and RLAIF rely on limited contrasting patterns, such as varying model variants or decoding temperatures. This singularity leads to two issues: (1) alignment is not comprehensive; and thereby (2) models are susceptible to jailbreaking attacks. To address these issues, we investigate how to construct more comprehensive and diversified contrasting patterns to enhance preference data (RQ1) and verify the impact of the diversification of contrasting patterns on model alignment (RQ2). For RQ1, we propose PopAlign, a framework that integrates diversified contrasting patterns across the prompt, model, and pipeline levels, introducing six contrasting strategies that do not require additional feedback labeling procedures. Regarding RQ2, we conduct thorough experiments demonstrating that PopAlign significantly outperforms existing methods, leading to more comprehensive alignment.
Comment: 28 pages
Published: 2024

56. A Local Method for Compact and Non-compact Yamabe Problems

Author: Xu, Jie
Subjects: Mathematics - Differential Geometry, 58J05, 35J60, 53C18
Abstract: Let $ (M, g) $ be a compact manifold or a complete non-compact manifold without boundary, $ \dim M \geqslant 4 $, and not locally conformally flat. In this article, we introduce a new local method to resolve the Yamabe problem on compact manifolds for dimensions at least $ 4 $, and the Yamabe problem on non-compact complete manifolds without boundary, which are pointwise conformal to subsets of some compact manifolds. In particular, the new local method applies to the hard cases, in which the Yamabe constants are positive. Our local method also generalizes Brezis and Nirenberg's nonlinear eigenvalue problem to subsets of manifolds.
Comment: 40 pages, all comments are welcome
Published: 2024

57. GlossyGS: Inverse Rendering of Glossy Objects with 3D Gaussian Splatting

Author: Lai, Shuichang, Huang, Letian, Guo, Jie, Cheng, Kai, Pan, Bowen, Long, Xiaoxiao, Lyu, Jiangjing, Lv, Chengfei, and Guo, Yanwen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Reconstructing objects from posed images is a crucial and complex task in computer graphics and computer vision. While NeRF-based neural reconstruction methods have exhibited impressive reconstruction ability, they tend to be time-consuming. Recent strategies have adopted 3D Gaussian Splatting (3D-GS) for inverse rendering, which has led to quick and effective outcomes. However, these techniques generally have difficulty in producing believable geometries and materials for glossy objects, a challenge that stems from the inherent ambiguities of inverse rendering. To address this, we introduce GlossyGS, an innovative 3D-GS-based inverse rendering framework that aims to precisely reconstruct the geometry and materials of glossy objects by integrating material priors. The key idea is the use of a micro-facet geometry segmentation prior, which helps to reduce the intrinsic ambiguities and improve the decomposition of geometries and materials. Additionally, we introduce a normal map prefiltering strategy to more accurately simulate the normal distribution of reflective surfaces. These strategies are integrated into a hybrid geometry and material representation that employs both explicit and implicit methods to depict glossy objects. We demonstrate through quantitative analysis and qualitative visualization that the proposed method is effective at reconstructing high-fidelity geometries and materials of glossy objects, and performs favorably against state-of-the-art methods.
Published: 2024

58. On the Boltzmann equation with soft potentials: Existence, uniqueness and smoothing effect of mild solutions

Author: He, Ling-Bing, Ji, Jie, and Li, Wei-Xi
Subjects: Mathematics - Analysis of PDEs, 35B65, 35Q20
Abstract: We consider the spatially inhomogeneous Boltzmann equation without angular cutoff for soft potentials. For any given initial datum such that the mass, energy and entropy densities are bounded and the mass is away from vacuum, we establish the local-in-time existence and uniqueness of mild solutions, and further provide the first result on sharp smoothing effect in analytic space or Gevrey space for soft potentials.
Comment: 60 pages, 0 figures
Published: 2024

59. Attention-Guided Perturbation for Consistency Regularization in Semi-Supervised Medical Image Segmentation

Author: Cheng, Yuxuan, Shao, Chenxi, Ma, Jie, and Li, Guoliang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical image segmentation is a pivotal step in diagnostic and therapeutic processes. However, the acquisition of high-quality annotated data is often constrained by scarcity and cost. Semi-supervised learning offers a promising approach to enhance model performance by using unlabeled data. While consistency regularization is a prevalent method in semi-supervised image segmentation, there is a dearth of research on perturbation strategies tailored for semi-supervised medical image segmentation tasks. This paper introduces an attention-guided perturbation strategy for semi-supervised consistency regularization in the context of medical image segmentation. We add the perturbation based on the attention from the model at the image and feature levels to achieve consistency regularization. The method is adept at accommodating the intricate structures and high-dimensional semantics inherent in medical images, thereby enhancing the performance of semi-supervised segmentation tasks. Our method achieved state-of-the-art results on benchmark datasets, including a 90.4% Dice score on the ACDC dataset in the 7-case scenario.
Published: 2024
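
Note: The Dice score quoted in the abstract above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a reference mask B. A minimal Python sketch follows, assuming binary masks given as flat lists; the function name `dice_score`, the empty-mask convention, and the sample masks are illustrative assumptions and are not taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A n B| / (|A| + |B|) for two binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count positions that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground size of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    # One shared foreground pixel, mask sizes 2 and 1: 2*1 / (2+1).
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```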

60. Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models

Author: Ren, Jie, Chen, Kangrui, Chen, Chen, Sehwag, Vikash, Xing, Yue, Tang, Jiliang, and Lyu, Lingjuan
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Multimedia
Abstract: Large Language Models (LLMs) and Vision-Language Models (VLMs) have made significant advancements in a wide range of natural language processing and vision-language tasks. Access to large web-scale datasets has been a key factor in their success. However, concerns have been raised about the unauthorized use of copyrighted materials and potential copyright infringement. Existing methods, such as sample-level Membership Inference Attacks (MIA) and distribution-based dataset inference, distinguish member data (data used for training) and non-member data by leveraging the common observation that models tend to memorize and show greater confidence in member data. Nevertheless, these methods face challenges when applied to LLMs and VLMs, such as the requirement for ground-truth member data or non-member data that shares the same distribution as the test data. In this paper, we propose a novel dataset-level membership inference method based on Self-Comparison. We find that a member prefix followed by a non-member suffix (paraphrased from a member suffix) can further trigger the model's memorization on training data. Instead of directly comparing member and non-member data, we introduce paraphrasing to the second half of the sequence and evaluate how the likelihood changes before and after paraphrasing. Unlike prior approaches, our method does not require access to ground-truth member data or non-member data in identical distribution, making it more practical. Extensive experiments demonstrate that our proposed method outperforms traditional MIA and dataset inference techniques across various datasets and models, including public models, fine-tuned models, and API-based commercial models.
Published: 2024

61. Composable free-space continuous-variable quantum key distribution using discrete modulation

Author: Jaksch, Kevin, Dirmeier, Thomas, Weiser, Yannick, Richter, Stefan, Bayraktar, Ömer, Hacker, Bastian, Rösler, Conrad, Khan, Imran, Petscharning, Stefan, Grafenauer, Thomas, Hentschel, Michael, Ömer, Bernhard, Pacher, Christoph, Kanitschar, Florian, Upadhyaya, Twesh, Lin, Jie, Lütkenhaus, Norbert, Leuchs, Gerd, and Marquardt, Christoph
Subjects: Quantum Physics
Abstract: Continuous-variable (CV) quantum key distribution (QKD) allows for quantum secure communication with the benefit of being close to existing classical coherent communication. In recent years, CV QKD protocols using a discrete number of displaced coherent states have been studied intensively, as the modulation can be directly implemented with real devices with a finite digital resolution. However, the experimental demonstrations until now only calculated key rates in the asymptotic regime. To be used in cryptographic applications, a QKD system has to generate keys with composable security in the finite-size regime. In this paper, we present a CV QKD system using discrete modulation that is especially designed for urban atmospheric channels. For this, we use polarization encoding to cope with the turbulent but non-birefringent atmosphere. This will allow CV QKD networks to be expanded beyond the existing fiber backbone. In a first laboratory demonstration, we implemented a novel type of security proof that allows composable finite-size key rates to be calculated against i.i.d. collective attacks without any Gaussian assumptions. We applied the full QKD protocol including a QRNG, error correction and privacy amplification to extract secret keys. In particular, we studied the impact of frame errors on the actual key generation.
Published: 2024

62. Using Protected Attributes to Consider Fairness in Multi-Agent Systems

Author: La Malfa, Gabriele, Zhang, Jie M., Luck, Michael, and Black, Elizabeth
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence
Abstract: Fairness in Multi-Agent Systems (MAS) has been extensively studied, particularly in reward distribution among agents in scenarios such as goods allocation, resource division, lotteries, and bargaining systems. Fairness in MAS depends on various factors, including the system's governing rules, the behaviour of the agents, and their characteristics. Yet, fairness in human society often involves evaluating disparities between disadvantaged and privileged groups, guided by principles of Equality, Diversity, and Inclusion (EDI). Taking inspiration from the work on algorithmic fairness, which addresses bias in machine learning-based decision-making, we define protected attributes for MAS as characteristics that should not disadvantage an agent in terms of its expected rewards. We adapt fairness metrics from the algorithmic fairness literature -- namely, demographic parity, counterfactual fairness, and conditional statistical parity -- to the multi-agent setting, where self-interested agents interact within an environment. These metrics allow us to evaluate the fairness of MAS, with the ultimate aim of designing MAS that do not disadvantage agents based on protected attributes.
Published: 2024
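
Note: In the algorithmic fairness literature, demographic parity asks that outcomes be distributed equally across groups defined by a protected attribute. One plausible reading for the multi-agent setting described above is to compare mean rewards between groups of agents. The Python sketch below illustrates that reading only; the function names `mean_reward_by_group` and `parity_gap`, the reward values, and the use of a max-minus-min gap are assumptions for illustration, not the paper's definitions.

```python
from collections import defaultdict


def mean_reward_by_group(rewards, groups):
    """Average reward of the agents in each protected-attribute group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for reward, group in zip(rewards, groups):
        totals[group] += reward
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def parity_gap(rewards, groups):
    """Largest difference in mean reward between any two groups.

    A gap of 0 means the groups receive equal mean rewards (assumed reading
    of demographic parity for agents; hypothetical, not from the paper).
    """
    means = mean_reward_by_group(rewards, groups)
    return max(means.values()) - min(means.values())


if __name__ == "__main__":
    # Hypothetical rewards for four agents split into two groups.
    print(parity_gap([4.0, 2.0, 1.0, 1.0], ["a", "a", "b", "b"]))
```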

63. Féeton ($B-L$ Gauge Boson) Dark Matter Testable in Future Direct Detection Experiments

Author: Cheng, Yu, Sheng, Jie, and Yanagida, Tsutomu T.
Subjects: High Energy Physics - Phenomenology
Abstract: In this paper, we revisit the féeton (gauge boson of $U(1)_{B-L}$ symmetry) dark matter scenario, and first point out that the $U(1)$ gauge symmetry can be a linear combination of the $B-L$ and the SM hypercharge gauge symmetries. With the redefinition of $B-L$ charge of fermions, the coupling between electron and féeton can be enhanced. After showing the parameter space required from the DM stability and cosmic production, we discuss the potential for verifying them in dark matter direct detection experiments. The results show that future experiments, such as SuperCDMS, have a sensitivity to reach the féeton DM region consistent with its cosmic production.
Comment: 13 pages, 2 figures
Published: 2024

64. SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model

Author: Cui, Jianwei, Gu, Yu, Weng, Chao, Zhang, Jie, Chen, Liping, and Dai, Lirong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound
Abstract: This paper presents an advanced end-to-end singing voice synthesis (SVS) system based on the source-filter mechanism that directly translates lyrical and melodic cues into expressive and high-fidelity human-like singing. Similarly to VISinger 2, the proposed system also utilizes training paradigms evolved from VITS and incorporates elements like the fundamental pitch (F0) predictor and waveform generation decoder. To address the issue that the coupling of mel-spectrogram features with F0 information may introduce errors during F0 prediction, we consider two strategies. Firstly, we leverage mel-cepstrum (mcep) features to decouple the intertwined mel-spectrogram and F0 characteristics. Secondly, inspired by the neural source-filter models, we introduce source excitation signals as the representation of F0 in the SVS system, aiming to capture pitch nuances more accurately. Meanwhile, differentiable mcep and F0 losses are employed as the waveform decoder supervision to fortify the prediction accuracy of speech envelope and pitch in the generated speech. Experiments on the Opencpop dataset demonstrate the efficacy of the proposed model in synthesis quality and intonation accuracy.
Comment: Accepted by ICASSP 2024. Synthesized audio samples are available at: https://sounddemos.github.io/sifisinger
Published: 2024

65. Proposal of quantum repeater architecture based on Rydberg atom quantum processors

Author: Zhang, Yan-Lei, Jie, Qing-Xuan, Li, Ming, Wu, Shu-Hao, Wang, Zhu-Bo, Zou, Xu-Bo, Zhang, Peng-Fei, Li, Gang, Zhang, Tiancai, Guo, Guang-Can, and Zou, Chang-Ling
Subjects: Quantum Physics
Abstract: Realizing large-scale quantum networks requires the generation of high-fidelity quantum entanglement states between remote quantum nodes, a key resource for quantum communication, distributed computation and sensing applications. However, entanglement distribution between quantum network nodes is hindered by optical transmission loss and local operation errors. Here, we propose a novel quantum repeater architecture that synergistically integrates Rydberg atom quantum processors with optical cavities to overcome these challenges. Our scheme leverages cavity-mediated interactions for efficient remote entanglement generation, followed by Rydberg interaction-based entanglement purification and swapping. Numerical simulations, incorporating realistic experimental parameters, demonstrate the generation of Bell states with 99% fidelity at rates of 1.1 kHz between two nodes in a local-area network (distance $0.1\,\mathrm{km}$), and the scheme can be extended to metropolitan-area ($25\,\mathrm{km}$) or intercity ($250\,\mathrm{km}$, with the assistance of frequency converters) networks with a rate of 0.1 kHz. This scalable approach opens up near-term opportunities for exploring quantum network applications and investigating the advantages of distributed quantum information processing.
Comment: 3 figures
Published: 2024

66. VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models

Author: Lan, Xiaohan, Yuan, Yitian, Jie, Zequn, and Ma, Lin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Video-based multimodal large language models (Video-LLMs) possess significant potential for video understanding tasks. However, most Video-LLMs treat videos as a sequential set of individual frames, which results in insufficient temporal-spatial interaction that hinders fine-grained comprehension and difficulty in processing longer videos due to limited visual token capacity. To address these challenges, we propose VidCompress, a novel Video-LLM featuring memory-enhanced temporal compression. VidCompress employs a dual-compressor approach: a memory-enhanced compressor captures both short-term and long-term temporal relationships in videos and compresses the visual tokens using a multiscale transformer with a memory-cache mechanism, while a text-perceived compressor generates condensed visual tokens by utilizing Q-Former and integrating temporal contexts into query embeddings with cross attention. Experiments on several VideoQA datasets and comprehensive benchmarks demonstrate that VidCompress efficiently models complex temporal-spatial relations and significantly outperforms existing Video-LLMs.
Comment: 9 pages, 4 figures
Published: 2024

67. SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection

Author: Dong, Shuhan, Li, Yunsong, Xie, Weiying, Zhang, Jiaqing, Tian, Jiayuan, Yang, Danian, and Lei, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors. By learning long-term dependencies, Transformer can effectively integrate multimodal features in the feature extraction stage, which greatly improves the performance of multimodal object detection. However, current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of network, thus limiting the improvements in detection performance. In this paper, we introduce an accurate and efficient object detection method named SeaDATE. Initially, we propose a novel dual attention Feature Fusion (DTF) module that, under Transformer's guidance, integrates local and global information through a dual attention mechanism, strengthening the fusion of modal features from orthogonal perspectives using spatial and channel tokens. Meanwhile, our theoretical analysis and empirical validation demonstrate that the Transformer-guided fusion method, treating images as sequences of pixels for fusion, performs better on shallow features' detail information compared to deep semantic information. To address this, we designed a contrastive learning (CL) module aimed at learning features of multimodal samples, remedying the shortcomings of Transformer-guided fusion in extracting deep semantic features, and effectively utilizing cross-modal information. Extensive experiments and ablation studies on the FLIR, LLVIP, and M3FD datasets have proven our method to be effective, achieving state-of-the-art detection performance.
Published: 2024

68. Model reduction, machine learning based global optimisation for large-scale steady state nonlinear systems

Author: Tao, Min, Petsagkourakis, Panagiotis, Li, Jie, and Theodoropoulos, Constantinos
Subjects: Mathematics - Optimization and Control
Abstract: Many engineering processes can be accurately modelled using partial differential equations (PDEs), but high dimensionality and non-convexity of the resulting systems pose limitations on their efficient optimisation. In this work, a model reduction, machine-learning methodology combining principal component analysis (PCA) and artificial neural networks (ANNs) is employed to construct a reduced surrogate model, which can then be utilised by advanced deterministic global optimisation algorithms to compute global optimal solutions with theoretical guarantees. However, such optimisation would still be time-consuming due to the high non-convexity of the activation functions inside the reduced ANN structures. To develop a computationally-efficient optimisation framework, we propose two alternative strategies: The first one is a piecewise-affine reformulation of the nonlinear ANN activation functions, while the second one is based on deep rectifier neural networks with ReLU activation function. The performance of the proposed framework is demonstrated through two illustrative case studies.
Published: 2024

69. A Study of Decay Rate of Bound Negative Muons

Author: Deng, Jian-Bo, Deng, Miao-Yi, Ma, Shi-Jie, Wang, Rui-Bo, Fan, Qi-Qi, He, Peng-Zhang, He, Yi-Peng, Li, Shuo-Wen, and Hu, Xian-Ru
Subjects: High Energy Physics - Phenomenology
Abstract: A number of experiments show that the decay lifetimes of muons bound to atomic nuclei are longer than the decay lifetimes of free muons. In this paper, a scheme of extending quantum mechanics (EQM) is proposed to resolve this problem. The Schrödinger equation is obtained to establish the validity of this attempt. The decay ratio of bound muons is also calculated in EQM, and the result is in good agreement with the experimental data.
Comment: 5 pages, 1 figure, 2 tables
Published: 2024

70. Light-Weight Fault Tolerant Attention for Large Language Model Training

Author: Liang, Yuhang, Li, Xinyi, Ren, Jie, Li, Ang, Fang, Bo, and Chen, Jieyang
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning, C.1.4, B.2.3, I.2.7
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance in various natural language processing tasks. However, the training of these models is computationally intensive and susceptible to faults, particularly in the attention mechanism, which is a critical component of transformer-based LLMs. In this paper, we investigate the impact of faults on LLM training, focusing on INF, NaN, and near-INF values in the computation results with systematic fault injection experiments. We observe the propagation patterns of these errors, which can trigger non-trainable states in the model and disrupt training, forcing the procedure to load from checkpoints. To mitigate the impact of these faults, we propose ATTNChecker, the first Algorithm-Based Fault Tolerance (ABFT) technique tailored for the attention mechanism in LLMs. ATTNChecker is designed based on fault propagation patterns of LLM and incorporates performance optimization to adapt to both system reliability and model vulnerability while providing lightweight protection for fast LLM training. Evaluations on four LLMs show that ATTNChecker incurs on average 7% overhead on training while detecting and correcting all extreme errors. Compared with the state-of-the-art checkpoint/restore approach, ATTNChecker reduces recovery overhead by up to 49x.
Published: 2024
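
Note: Algorithm-Based Fault Tolerance, mentioned in the abstract above, is classically illustrated with checksum-protected matrix multiplication: for C = A·B, the column sums of C must equal (column sums of A)·B. The Python sketch below shows only that textbook check, extended with a test for non-finite values; it is not ATTNChecker, and the helper names `matmul`, `column_sums`, and `checksum_ok` and the tolerance are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def column_sums(m):
    """Sum of each column of a list-of-rows matrix."""
    return [sum(row[j] for row in m) for j in range(len(m[0]))]


def checksum_ok(a, b, c, tol=1e-9):
    """Check a claimed product c = a*b against its column checksums.

    The expected column sums of c are (column sums of a) * b, which cost one
    vector-matrix product instead of a full recomputation. INF or NaN in a
    column of c makes its sum non-finite, so the check fails.
    """
    expected = matmul([column_sums(a)], b)[0]
    actual = column_sums(c)
    return all(
        math.isfinite(x) and math.isclose(x, e, rel_tol=tol, abs_tol=tol)
        for x, e in zip(actual, expected)
    )


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = matmul(a, b)
    print(checksum_ok(a, b, c))   # clean result
    c[0][1] = float("inf")        # inject an extreme error
    print(checksum_ok(a, b, c))
```

The failing column index also localizes the fault, which is what makes correction (rather than only detection) possible in checksum schemes of this kind.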

71. Adaptive Coordinators and Prompts on Heterogeneous Graphs for Cross-Domain Recommendations

Author: Zhang, Hengyu, Shen, Chunxu, Sun, Xiangguo, Tan, Jie, Rong, Yu, Piao, Chengzhi, Cheng, Hong, and Yi, Lingling
Subjects: Computer Science - Information Retrieval
Abstract: In the online digital world, users frequently engage with diverse items across multiple domains (e.g., e-commerce platforms, streaming services, and social media networks), forming complex heterogeneous interaction graphs. Leveraging this multi-domain information can undoubtedly enhance the performance of recommendation systems by providing more comprehensive user insights and alleviating data sparsity in individual domains. However, integrating multi-domain knowledge for the cross-domain recommendation is very hard due to inherent disparities in user behavior and item characteristics and the risk of negative transfer, where irrelevant or conflicting information from the source domains adversely impacts the target domain's performance. To address these challenges, we offer HAGO, a novel framework with Heterogeneous Adaptive Graph coOrdinators, which dynamically integrate multi-domain graphs into a cohesive structure by adaptively adjusting the connections between coordinators and multi-domain graph nodes, thereby enhancing beneficial inter-domain interactions while mitigating negative transfer effects. Additionally, we develop a universal multi-domain graph pre-training strategy alongside HAGO to collaboratively learn high-quality node representations across domains. To effectively transfer the learned multi-domain knowledge to the target domain, we design an effective graph prompting method, which incorporates pre-trained embeddings with learnable prompts for the recommendation task. Our framework is compatible with various graph-based models and pre-training techniques, demonstrating broad applicability and effectiveness. Further experimental results show that our solutions outperform state-of-the-art methods in multi-domain recommendation scenarios and highlight their potential for real-world applications.
Comment: Under review
Published: 2024

72. Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

Author: Yue, Tongtian, Guo, Longteng, Cheng, Jie, Gao, Xuange, and Liu, Jing
Subjects: Computer Science - Computation and Language
Abstract: In the era of Large Language Models (LLMs), Mixture-of-Experts (MoE) architectures offer a promising approach to managing computational costs while scaling up model parameters. Conventional MoE-based LLMs typically employ static Top-K routing, which activates a fixed and equal number of experts for each token regardless of their significance within the context. In this paper, we propose a novel Ada-K routing strategy that dynamically adjusts the number of activated experts for each token, thereby improving the balance between computational efficiency and model performance. Specifically, our strategy incorporates learnable and lightweight allocator modules that decide customized expert resource allocation tailored to the contextual needs for each token. These allocators are designed to be fully pluggable, making them broadly applicable across all mainstream MoE-based LLMs. We leverage the Proximal Policy Optimization (PPO) algorithm to facilitate an end-to-end learning process for this non-differentiable decision-making framework. Extensive evaluations on four popular baseline models demonstrate that our Ada-K routing method significantly outperforms conventional Top-K routing. Compared to Top-K, our method achieves over 25% reduction in FLOPs and more than 20% inference speedup while still improving performance across various benchmarks. Moreover, the training of Ada-K is highly efficient. Even for Mixtral-8x22B, a MoE-based LLM with more than 140B parameters, the training time is limited to 8 hours. Detailed analysis shows that harder tasks, middle layers, and content words tend to activate more experts, providing valuable insights for future adaptive MoE system designs. Both the training code and model checkpoints will be publicly available.
Comment: Coauthors do not reach a consensus on submitting the current version
Published: 2024
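
The contrast between static Top-K routing and a per-token expert count can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract describes a learned allocator trained with PPO, whereas `adaptive_route` below swaps in a hypothetical cumulative-probability rule (the names `mass` and `k_max` are assumptions) purely to show how the number of active experts can vary per token.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(logits, k):
    """Static Top-K routing: always activate the k highest-scoring experts.

    Returns {expert_index: renormalized gate weight}.
    """
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

def adaptive_route(logits, mass=0.8, k_max=4):
    """Hypothetical stand-in for a learned allocator (NOT the Ada-K policy).

    Activates the smallest number of experts whose cumulative router
    probability reaches `mass`, capped at `k_max`.
    """
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    k, cumulative = 0, 0.0
    for i in order:
        k += 1
        cumulative += probs[i]
        if cumulative >= mass or k == k_max:
            break
    return top_k_route(logits, k)

if __name__ == "__main__":
    confident_token = [5.0, 0.0, 0.0, 0.0]   # router strongly prefers expert 0
    ambiguous_token = [1.0, 1.0, 1.0, 1.0]   # router is undecided
    print(len(adaptive_route(confident_token)), "expert(s) for the confident token")
    print(len(adaptive_route(ambiguous_token)), "expert(s) for the ambiguous token")
```

With a fixed k, both tokens above would cost the same number of expert forward passes; a per-token count spends fewer on the token the router is already sure about.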

73. Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models

Author: Li, Boheng, Wei, Yanhao, Fu, Yankai, Wang, Zhenting, Li, Yiming, Zhang, Jie, Wang, Run, and Zhang, Tianwei
Subjects: Computer Science - Computers and Society, Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-to-image diffusion models are pushing the boundaries of what generative AI can achieve in our lives. Beyond their ability to generate general images, new personalization techniques have been proposed to customize the pre-trained base models for crafting images with specific themes or styles. Such a lightweight solution, enabling AI practitioners and developers to easily build their own personalized models, also poses a new concern regarding whether the personalized models are trained from unauthorized data. A promising solution is to proactively enable data traceability in generative models, where data owners embed external coatings (e.g., image watermarks or backdoor triggers) onto the datasets before releasing. Later the models trained over such datasets will also learn the coatings and unconsciously reproduce them in the generated mimicries, which can be extracted and used as the data usage evidence. However, we find that existing coatings cannot be learned effectively in personalization tasks, making the corresponding verification less reliable. In this paper, we introduce SIREN, a novel methodology to proactively trace unauthorized data usage in black-box personalized text-to-image diffusion models. Our approach optimizes the coating in a delicate way to be recognized by the model as a feature relevant to the personalization task, thus significantly improving its learnability. We also utilize a human perceptual-aware constraint, a hypersphere classification technique, and a hypothesis-testing-guided verification method to enhance the stealthiness and detection accuracy of the coating. The effectiveness of SIREN is verified through extensive experiments on a diverse set of benchmark datasets, models, and learning algorithms. SIREN is also effective in various real-world scenarios and evaluated against potential countermeasures. Our code is publicly available.
Comment: To appear in the IEEE Symposium on Security & Privacy, May 2025
Published: 2024

74. CRUcialG: Reconstruct Integrated Attack Scenario Graphs by Cyber Threat Intelligence Reports

Author: Cheng, Wenrui, Zhu, Tiantian, Chen, Tieming, Yuan, Qixuan, Ying, Jie, Li, Hongmei, Xiong, Chunlin, Li, Mingda, Lv, Mingqi, and Chen, Yan
Subjects: Computer Science - Cryptography and Security
Abstract: Cyber Threat Intelligence (CTI) reports are factual records compiled by security analysts through their observations of threat events or their own practical experience with attacks. In order to utilize CTI reports for attack detection, existing methods have attempted to map the content of reports onto system-level attack provenance graphs to clearly depict attack procedures. However, existing studies on constructing graphs from CTI reports suffer from problems such as weak natural language processing (NLP) capabilities, discrete and fragmented graphs, and insufficient attack semantic representation. Therefore, we propose a system called CRUcialG for the automated reconstruction of attack scenario graphs (ASGs) by CTI reports. First, we use NLP models to extract systematic attack knowledge from CTI reports to form preliminary ASGs. Then, we propose a four-phase attack rationality verification framework from the tactical phase with attack procedure to evaluate the reasonability of ASGs. Finally, we implement the relation repair and phase supplement of ASGs by adopting a serialized graph generation model. We collect a total of 10,607 CTI reports and generate 5,761 complete ASGs. Experimental results on CTI reports from 30 security vendors and DARPA show that the similarity of ASG reconstruction by CRUcialG can reach 84.54%. Compared with SOTA (EXTRACTOR and AttackG), the recall of CRUcialG (extraction of real attack events) can reach 88.13% and 94.46% respectively, which is 40% higher than SOTA on average. The F1-score of attack phase verification is able to reach 90.04%.
Published: 2024

75. Parameterize Structure with Differentiable Template for 3D Shape Generation

Author: Ma, Changfeng, Guo, Pengxiao, Yang, Shuangyu, Chen, Yinuo, Guo, Jie, Wang, Chongjun, Guo, Yanwen, and Wang, Wenping
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Structural representation is crucial for reconstructing and generating editable 3D shapes with part semantics. Recent 3D shape generation works employ complicated networks and structure definitions relying on hierarchical annotations and pay less attention to the details inside parts. In this paper, we propose a method that parameterizes the shared structure in the same category using a differentiable template and corresponding fixed-length parameters. Specific parameters are fed into the template to calculate cuboids that indicate a concrete shape. We utilize the boundaries of three-view drawings of each cuboid to further describe the inside details. Shapes are represented with the parameters and three-view details inside cuboids, from which the SDF can be calculated to recover the object. Benefiting from our fixed-length parameters and three-view details, our networks for reconstruction and generation are simple and effective to learn the latent space. Our method can reconstruct or generate diverse shapes with complicated details, and interpolate them smoothly. Extensive evaluations demonstrate the superiority of our method on reconstruction from point cloud, generation, and interpolation.
Published: 2024

76. V2M: Visual 2-Dimensional Mamba for Image Representation Learning

Author: Wang, Chengkun, Zheng, Wenzhao, Huang, Yuanhui, Zhou, Jie, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Mamba has garnered widespread attention due to its flexible design and efficient hardware performance to process 1D sequences based on the state space model (SSM). Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images into patches and then regarding them as a 1D sequence. To compensate for the 2D structure information loss (e.g., local similarity) of the original image, most existing methods focus on designing different orders to sequentially process the tokens, which could only alleviate this issue to some extent. In this paper, we propose a Visual 2-Dimensional Mamba (V2M) model as a complete solution, which directly processes image tokens in the 2D space. We first generalize SSM to the 2-dimensional space which generates the next state considering two adjacent states on both dimensions (e.g., columns and rows). We then construct our V2M based on the 2-dimensional SSM formulation and incorporate Mamba to achieve hardware-efficient parallel processing. The proposed V2M effectively incorporates the 2D locality prior yet inherits the efficiency and input-dependent scalability of Mamba. Extensive experimental results on ImageNet classification and downstream visual tasks including object detection and instance segmentation on COCO and semantic segmentation on ADE20K demonstrate the effectiveness of our V2M compared with other visual backbones.
Published: 2024

77. Unboxing Virgil ADTs for Fun and Profit

Author: Teo, Bradley Wei Jie and Titzer, Ben L.
Subjects: Computer Science - Programming Languages
Abstract: Algebraic Data Types (ADTs) are an increasingly common feature in modern programming languages. In many implementations, values of non-nullary, multi-case ADTs are allocated on the heap, which may reduce performance and increase memory usage. This work explores annotation-guided optimizations to ADT representation in Virgil, a systems-level programming language that compiles to x86, x86-64, Wasm and the Java Virtual Machine. We extend Virgil with annotations: #unboxed to eliminate the overhead of heap allocation via automatic compiler transformation to a scalar representation, and #packed, to enable programmer-expressed bit-layouts. These annotations allow programmers to both save memory and manipulate data in formats dictated by hardware. We dedicate this work as an homage and echo of work done in collaboration with Jens in the work entitled "A Declarative Approach to Generating Machine Code Tools", an unpublished manuscript from 2005. In fact, this work inherits some syntactic conventions from that prior work. The performance impact of these representation changes was evaluated on a variety of workloads in terms of execution time and memory usage, but we don't include it because Jens likes semantics and type systems better!
Published: 2024

78. GlobalMamba: Global Image Serialization for Vision Mamba

Author: Wang, Chengkun, Zheng, Wenzhao, Zhou, Jie, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision mambas have demonstrated strong performance with linear complexity to the number of vision tokens. Their efficiency results from processing image tokens sequentially. However, most existing methods employ patch-based image tokenization and then flatten them into 1D sequences for causal processing, which ignores the intrinsic 2D structural correlations of images. It is also difficult to extract global information by sequential processing of local patches. In this paper, we propose a global image serialization method to transform the image into a sequence of causal tokens, which contain global information of the 2D image. We first convert the image from the spatial domain to the frequency domain using Discrete Cosine Transform (DCT) and then arrange the pixels with corresponding frequency ranges. We further transform each set within the same frequency band back to the spatial domain to obtain a series of images before tokenization. We construct a vision mamba model, GlobalMamba, with a causal input format based on the proposed global image serialization, which can better exploit the causal relations among image sequences. Extensive experiments demonstrate the effectiveness of our GlobalMamba, including image classification on ImageNet-1K, object detection on COCO, and semantic segmentation on ADE20K.
Published: 2024
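
The frequency-band serialization described in this abstract (DCT, split coefficients into bands, transform each band back to the spatial domain) can be illustrated on a 1D signal. This is a simplified sketch under stated assumptions: the paper works on 2D images, and the band edges, function names, and the use of an orthonormal DCT-II here are illustrative choices, not taken from the paper.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1D signal (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def band_signals(x, edges):
    """Split x into one spatial-domain signal per frequency band.

    `edges` lists band boundaries over coefficient indices, e.g. [0, 2, 5, 8]
    gives bands [0, 2), [2, 5), [5, 8). Ordering the results from low to high
    frequency yields a coarse-to-fine sequence.
    """
    coeffs = dct(x)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = [coeffs[k] if lo <= k < hi else 0.0 for k in range(len(coeffs))]
        bands.append(idct(masked))
    return bands

if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.5, 0.5]
    for band in band_signals(signal, [0, 2, 5, 8]):
        print([round(v, 3) for v in band])
```

Because the transform is linear and the bands partition the coefficients, the per-band signals sum back to the original input, so no information is lost by the split.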

79. LG-CAV: Train Any Concept Activation Vector with Language Guidance

Author: Huang, Qihan, Song, Jie, Xue, Mengqi, Zhang, Haofei, Hu, Bingde, Wang, Huiqiong, Jiang, Hao, Wang, Xingen, and Song, Mingli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Concept activation vector (CAV) has attracted broad research interest in explainable AI, by elegantly attributing model predictions to specific concepts. However, the training of CAV often necessitates a large number of high-quality images, which are expensive to curate and thus limited to a predefined set of concepts. To address this issue, we propose Language-Guided CAV (LG-CAV) to harness the abundant concept knowledge within certain pre-trained vision-language models (e.g., CLIP). This method allows training any CAV without labeled data, by utilizing the corresponding concept descriptions as guidance. To bridge the gap between the vision-language model and the target model, we calculate the activation values of concept descriptions on a common pool of images (probe images) with the vision-language model and utilize them as language guidance to train the LG-CAV. Furthermore, after training high-quality LG-CAVs related to all the predicted classes in the target model, we propose the activation sample reweighting (ASR), serving as a model correction technique, to improve the performance of the target model in return. Experiments on four datasets across nine architectures demonstrate that LG-CAV achieves significantly superior quality to previous CAV methods given any concept, and our model correction method achieves state-of-the-art performance compared to existing concept-based methods. Our code is available at https://github.com/hqhQAQ/LG-CAV.
Published: 2024

80. Effi-Code: Unleashing Code Efficiency in Language Models

Author: Huang, Dong, Zeng, Guangtao, Dai, Jianbo, Luo, Meng, Weng, Han, Qing, Yuhao, Cui, Heming, Guo, Zhijiang, and Zhang, Jie M.
Subjects: Computer Science - Computation and Language, Computer Science - Software Engineering
Abstract: As the use of large language models (LLMs) for code generation becomes more prevalent in software development, it is critical to enhance both the efficiency and correctness of the generated code. Existing methods and models primarily focus on the correctness of LLM-generated code, ignoring efficiency. In this work, we present Effi-Code, an approach to enhancing code generation in LLMs that can improve both efficiency and correctness. We introduce a Self-Optimization process based on Overhead Profiling that leverages open-source LLMs to generate a high-quality dataset of correct and efficient code samples. This dataset is then used to fine-tune various LLMs. Our method involves the iterative refinement of generated code, guided by runtime performance metrics and correctness checks. Extensive experiments demonstrate that models fine-tuned on the Effi-Code show significant improvements in both code correctness and efficiency across task types. For example, the pass@1 of DeepSeek-Coder-6.7B-Instruct generated code increases from 43.3% to 76.8%, and the average execution time for the same correct tasks decreases by 30.5%. Effi-Code offers a scalable and generalizable approach to improving code generation in AI systems, with potential applications in software development, algorithm design, and computational problem-solving. The source code of Effi-Code is available at https://github.com/huangd1999/Effi-Code.
Comment: Under Review
Published: 2024
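
For readers unfamiliar with the pass@1 metric quoted in this abstract, the sketch below shows the widely used unbiased pass@k estimator, where `n` samples are generated per task and `c` of them pass the tests. The abstract does not state how the paper computed its numbers, so treating this estimator as the one used is an assumption; the helper `mean_pass_at_k` is a hypothetical name for illustration.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one task.

    n: samples generated, c: samples that pass all tests, k: budget.
    Equals 1 - C(n - c, k) / C(n, k); for k = 1 this reduces to c / n.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results, k):
    """Average pass@k over tasks; `results` is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

if __name__ == "__main__":
    # Three illustrative tasks with 10 samples each (made-up counts).
    results = [(10, 3), (10, 0), (10, 10)]
    print(round(mean_pass_at_k(results, 1), 4))
```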

81. Hi-Mamba: Hierarchical Mamba for Efficient Image Super-Resolution

Author: Qiao, Junbo, Liao, Jincheng, Li, Wei, Zhang, Yulun, Guo, Yong, Wen, Yi, Qiu, Zhangxizi, Xie, Jiao, Hu, Jie, and Lin, Shaohui
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: State Space Models (SSM), such as Mamba, have shown strong representation ability in modeling long-range dependency with linear complexity, achieving successful applications from high-level to low-level vision tasks. However, SSM's sequential nature necessitates multiple scans in different directions to compensate for the loss of spatial dependency when unfolding the image into a 1D sequence. This multi-direction scanning strategy significantly increases the computation overhead and is unbearable for high-resolution image processing. To address this problem, we propose a novel Hierarchical Mamba network, namely, Hi-Mamba, for image super-resolution (SR). Hi-Mamba consists of two key designs: (1) The Hierarchical Mamba Block (HMB) assembled by a Local SSM (L-SSM) and a Region SSM (R-SSM) both with the single-direction scanning, aggregates multi-scale representations to enhance the context modeling ability. (2) The Direction Alternation Hierarchical Mamba Group (DA-HMG) allocates the isomeric single-direction scanning into cascading HMBs to enrich the spatial relationship modeling. Extensive experiments demonstrate the superiority of Hi-Mamba across five benchmark datasets for efficient SR. For example, Hi-Mamba achieves a significant PSNR improvement of 0.29 dB on Manga109 for $\times3$ SR, compared to the strong lightweight MambaIR.
Published: 2024
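
To put the reported 0.29 dB PSNR gain in perspective, the sketch below computes PSNR from mean squared error and converts a dB gain into the equivalent MSE ratio. The function names are illustrative, and the paper's exact evaluation protocol (color channel, border cropping) is not given in the abstract, so none is assumed here.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    Images are lists of rows of pixel intensities in [0, max_value].
    """
    ref = [p for row in reference for p in row]
    res = [p for row in restored for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, res)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """MSE_new / MSE_old implied by a PSNR improvement of `gain_db` dB."""
    return 10.0 ** (-gain_db / 10.0)

if __name__ == "__main__":
    reference = [[52, 55], [61, 59]]
    restored = [[50, 56], [60, 62]]
    print(round(psnr(reference, restored), 2), "dB")
    print(round(mse_ratio_for_gain(0.29), 4))
```

Since PSNR is logarithmic in the error, a 0.29 dB gain corresponds to roughly 6.5% lower mean squared error at the same peak value.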

82. Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings

Author: Wu, Di, Li, Siyuan, Feng, Chen, Cao, Lu, Zhang, Yue, Yang, Jie, and Sawan, Mohamad
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Quantitative Biology - Neurons and Cognition
Abstract: Recent advancements in brain-computer interfaces (BCIs) have enabled the decoding of lexical tones from intracranial recordings, offering the potential to restore the communication abilities of speech-impaired tonal language speakers. However, data heterogeneity induced by both physiological and instrumental factors poses a significant challenge for unified invasive brain tone decoding. Traditional subject-specific models, which operate under a heterogeneous decoding paradigm, fail to capture generalized neural representations and cannot effectively leverage data across subjects. To address these limitations, we introduce Homogeneity-Heterogeneity Disentangled Learning for neural Representations (H2DiLR), a novel framework that disentangles and learns both the homogeneity and heterogeneity from intracranial recordings across multiple subjects. To evaluate H2DiLR, we collected stereoelectroencephalography (sEEG) data from multiple participants reading Mandarin materials comprising 407 syllables, representing nearly all Mandarin characters. Extensive experiments demonstrate that H2DiLR, as a unified decoding paradigm, significantly outperforms the conventional heterogeneous decoding approach. Furthermore, we empirically confirm that H2DiLR effectively captures both homogeneity and heterogeneity during neural representation learning.
Comment: Preprint V1 with 10 pages main text
Published: 2024

83. A Hybrid Sampling and Multi-Objective Optimization Approach for Enhanced Software Defect Prediction

Author: Zhang, Jie, Li, Dongcheng, Wong, W. Eric, and Wang, Shengrong
Subjects: Computer Science - Software Engineering
Abstract: Accurate early prediction of software defects is essential to maintain software quality and reduce maintenance costs. However, the field of software defect prediction (SDP) faces challenges such as class imbalances, high-dimensional feature spaces, and suboptimal prediction accuracy. To mitigate these challenges, this paper introduces a novel SDP framework that integrates hybrid sampling techniques, specifically Borderline SMOTE and Tomek Links, with a suite of multi-objective optimization algorithms, including NSGA-II, MOPSO, and MODE. The proposed model applies feature fusion through multi-objective optimization, enhancing both the generalization capability and stability of the predictions. Furthermore, the integration of parallel processing for these optimization algorithms significantly boosts the computational efficiency of the model. Comprehensive experiments conducted on datasets from NASA and PROMISE repositories demonstrate that the proposed hybrid sampling and multi-objective optimization approach improves data balance, eliminates redundant features, and enhances prediction accuracy. The experimental results also highlight the robustness of the feature fusion approach, confirming its superiority over existing state-of-the-art techniques in terms of predictive performance and applicability across diverse datasets.
Published: 2024
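
Of the two sampling techniques named in this abstract, Tomek Links is the simpler to illustrate: a Tomek link is a pair of samples from different classes that are each other's nearest neighbor, and removing such pairs (or their majority-class member) cleans the class boundary. The sketch below is a brute-force illustration with made-up data, not the paper's pipeline; Borderline SMOTE and the multi-objective optimizers are not shown.

```python
def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two points have different labels and
    each is the other's nearest neighbor (squared Euclidean distance).
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def nearest(i):
        candidates = (j for j in range(len(points)) if j != i)
        return min(candidates, key=lambda j: dist2(points[i], points[j]))

    links = []
    for i in range(len(points)):
        j = nearest(i)
        if i < j and labels[i] != labels[j] and nearest(j) == i:
            links.append((i, j))
    return links

if __name__ == "__main__":
    # Illustrative 2D samples: label 1 = defective, label 0 = clean.
    points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
    labels = [0, 1, 0, 0, 1]
    print(tomek_links(points, labels))
```

In the example, only the first two points qualify: they are mutual nearest neighbors with opposite labels, while the isolated defective sample has no mutual neighbor.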

84. HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution

Author: Cao, Weifeng, Lei, Xiaoyan, Shi, Jun, Liang, Wanyong, Liu, Jie, and Bai, Zongfei
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, lightweight methods for single image super-resolution (SISR) have gained significant popularity and achieved impressive performance due to limited hardware resources. These methods demonstrate that adopting residual feature distillation is an effective way to enhance performance. However, we find that using residual connections after each block increases the model's storage and computational cost. Therefore, to simplify the network structure and learn higher-level features and relationships between features, we use depthwise separable convolutions, fully connected layers, and activation functions as the basic feature extraction modules. This significantly reduces computational load and the number of parameters while maintaining strong feature extraction capabilities. To further enhance model performance, we propose the Hybrid Attention Separable Block (HASB), which combines channel attention and spatial attention, thus making use of their complementary advantages. Additionally, we use depthwise separable convolutions instead of standard convolutions, significantly reducing the computational load and the number of parameters while maintaining strong feature extraction capabilities. During the training phase, we also adopt a warm-start retraining strategy to exploit the potential of the model further. Extensive experiments demonstrate the effectiveness of our approach. Our method achieves a smaller model size and reduced computational complexity without compromising performance. Code is available at https://github.com/nathan66666/HASN.git
Comment: Accepted by Visual Computer
Published: 2024

85. Accelerating Mixed-Precision Out-of-Core Cholesky Factorization with Static Task Scheduling

Author: Ren, Jie, Ltaief, Hatem, Abdulah, Sameh, and Keyes, David E.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: This paper explores the performance optimization of out-of-core (OOC) Cholesky factorization on shared-memory systems equipped with multiple GPUs. We employ fine-grained computational tasks to expose concurrency while creating opportunities to overlap data movement asynchronously with computations, especially when dealing with matrices that cannot fit on the GPU memory. We leverage the directed acyclic graph of the task-based Cholesky factorization and map it onto a static scheduler that promotes data reuse while supporting strategies for reducing data movement with the CPU host when the GPU memory is exhausted. The CPU-GPU interconnect may become the main performance bottleneck as the gap between the GPU execution rate and the traditional PCIe bandwidth continues to widen. While the surface-to-volume effect of compute-bound kernels partially mitigates the overhead of data motion, deploying mixed-precision (MxP) computations exacerbates the throughput discrepancy. Using static task scheduling, we evaluate the performance capabilities of the new ultra-fast NVIDIA chip interconnect technology, codenamed NVLink-C2C, that constitutes the backbone of the NVIDIA Grace Hopper Superchip (GH200), against a new four-precision (FP64/FP32/FP16/FP8) left-looking Cholesky factorization. We report the performance results of a benchmarking campaign on various NVIDIA GPU generations and interconnects. We highlight 20% performance superiority against cuSOLVER on a single GH200 with FP64 while hiding the cost of OOC task-based Cholesky factorization, and we scale almost linearly on four GH200 Superchips. With MxP enabled, our statically scheduled four-precision tile-based Cholesky factorization scores a 3X performance speedup against its FP64-only counterpart, delivering application-worthy FP64 accuracy when modeling a large-scale geospatial statistical application.
Published: 2024

86. Diabetic retinopathy image classification method based on GreenBen data augmentation

Author: Liu, Yutong, Gao, Jie, and Zhu, Haijiang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: For the diagnosis of diabetic retinopathy (DR) images, this paper proposes a classification method based on artificial intelligence. The core lies in a new data augmentation method, GreenBen, which first extracts the green channel grayscale image from the retinal image and then performs Ben enhancement. Considering that diabetic macular edema (DME) is a complication closely related to DR, this paper constructs a joint classification framework of DR and DME based on multi-task learning and an attention module, and uses GreenBen to enhance its data to reduce the difference of DR images and improve the accuracy of model classification. We conducted extensive experiments on three publicly available datasets, and our method achieved the best results. For GreenBen, whether based on the ResNet50 network or the Swin Transformer network, whether for individual classification or joint DME classification, compared with other data augmentation methods, GreenBen achieved stable and significant improvements in DR classification results, with an accuracy increase of 10%.
Published: 2024

87. Semileptonic and nonleptonic decays of $B_{u,d,s,c}^{*}$ in the covariant light-front approach

Author: Wang, Si-Yang, Yang, You-Ya, Sun, Zhi-Jie, Yang, Hao, Li, Peng, and Zhang, Zhi-Qing
Subjects: High Energy Physics - Phenomenology
Abstract: The semileptonic and nonleptonic decays of the b-flavor vector mesons $B^{*}_{u,d,s}$ and $B_{c}^{*}$ are investigated within the covariant light-front quark model (CLFQM). By calculating the form factors of the transitions $B_{u, d, s, c}^{*}\to P$ under the CLFQM, with $P$ denoting a pseudoscalar meson, i.e., $\pi, K, \eta_c(1S,2S), D_{(s)}, B_{(s)}$, we predict and discuss several physical observables, including the branching ratios, polarization fractions $f_{L}, f_{\|}$, and forward-backward asymmetries $A_{FB}$. The total widths of the single-photon radiative decay channels for these b-flavor vector mesons are estimated using their partial widths. In these considered decays, one can find that the semileptonic decays $B_{s}^{*0}\to D_{s}^{-}\ell^{\prime+}{\nu}_{\ell^\prime}$ and $B_{c}^{*+}\to B_{s}^{0}\ell^{\prime+}{\nu}_{\ell^\prime}, \eta_{c}\ell^{\prime+}{\nu}_{\ell^\prime}$, with $\ell^\prime$ being $e$ or $\tau$, and the nonleptonic channels $B_{c}^{*+}\to B^0_{s} \pi^{+}, B^0_{s} \rho^{+}$ have the largest branching ratios, which can reach up to the $10^{-7}$ order, and are most likely to be accessible at the future high-luminosity LHCb and Belle-II experiments.
Comment: 30 pages, 4 figures, accepted for publication in Chin. Phys. C
Published: 2024

88. Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks

Author: Lin, Ruhai, Zhu, Rui-Jie, and Eshraghian, Jason K.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Neural and Evolutionary Computing
Abstract: The rapid advancement of embedded multicore and many-core systems has revolutionized computing, enabling the development of high-performance, energy-efficient solutions for a wide range of applications. As models scale up in size, data movement is increasingly the bottleneck to performance. This movement of data can exist between processor and memory, or between cores and chips. This paper investigates the impact of bottleneck size, in terms of inter-chip data traffic, on the performance of deep learning models in embedded multicore and many-core systems. We conduct a systematic analysis of the relationship between bottleneck size, computational resource utilization, and model accuracy. We apply a hardware-software co-design methodology where data bottlenecks are replaced with extremely narrow layers to reduce the amount of data traffic. In effect, time-multiplexing of signals is replaced by learnable embeddings that reduce the demands on chip IOs. Our experiments on the CIFAR100 dataset demonstrate that the classification accuracy generally decreases as the bottleneck ratio increases, with shallower models experiencing a more significant drop compared to deeper models. Hardware-side evaluation reveals that higher bottleneck ratios lead to substantial reductions in data transfer volume across the layers of the neural network. Through this research, we can determine the trade-off between data transfer volume and model performance, enabling the identification of a balanced point that achieves good performance while minimizing data transfer volume. This characteristic allows for the development of efficient models that are well-suited for resource-constrained environments.
Published: 2024

89. Conjugation on reddening sequences and reddening potentials

Author: Liu, Siyang and Pan, Jie
Subjects: Mathematics - Combinatorics, 13F60
Abstract: We describe the conjugation of the reddening sequence according to the formula of $c$-vectors with respect to changing of the initial seed. As applications, we extend the Rotation Lemma, the Target before Source Theorem, and the mutation invariant property of the existence of reddening sequences to totally sign-skew-symmetric cluster algebras. Furthermore, this also leads to the construction of reddening potential which characterizes the number of red mutations a maximal green sequence should admit in any matrix pattern with the initial seed changed via mutations.
Published: 2024

90. Towards Scalable Semantic Representation for Recommendation

Author: Zhang, Taolin, Pan, Junwei, Wang, Jinpeng, Zha, Yaohua, Dai, Tao, Chen, Bin, Luo, Ruisheng, Deng, Xiaoxiang, Wang, Yuan, Yue, Ming, Jiang, Jie, and Xia, Shu-Tao
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: With recent advances in large language models (LLMs), there has been a growing body of research on developing Semantic IDs based on LLMs to enhance the performance of recommendation systems. However, the dimension of these embeddings needs to match that of the ID embedding in recommendation, which is usually much smaller than the original length. Such dimension compression results in inevitable losses in discriminability and dimension robustness of the LLM embeddings, which motivates us to scale up the semantic representation. In this paper, we propose Mixture-of-Codes, which first constructs multiple independent codebooks for LLM representation in the indexing stage, and then utilizes the Semantic Representation along with a fusion module for the downstream recommendation stage. Extensive analysis and experiments demonstrate that our method achieves superior discriminability and dimension robustness scalability, leading to the best scale-up performance in recommendations.
Published: 2024

91. Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling

Author: Liu, Rui, Jia, Zhenqi, Yang, Jie, Hu, Yifan, and Li, Haizhou
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Conversational Text-to-Speech (CTTS) aims to accurately express an utterance with the appropriate style within a conversational setting, which attracts more attention nowadays. While recognizing the significance of the CTTS task, prior studies have not thoroughly investigated speech emphasis expression, which is essential for conveying the underlying intention and attitude in human-machine interaction scenarios, due to the scarcity of conversational emphasis datasets and the difficulty in context understanding. In this paper, we propose a novel Emphasis Rendering scheme for the CTTS model, termed ER-CTTS, that includes two main components: 1) we simultaneously take into account textual and acoustic contexts, with both global and local semantic modeling to understand the conversation context comprehensively; 2) we deeply integrate multi-modal and multi-scale context to learn the influence of context on the emphasis expression of the current utterance. Finally, the inferred emphasis feature is fed into the neural speech synthesizer to generate conversational speech. To address data scarcity, we create emphasis intensity annotations on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in emphasis rendering within a conversational setting. The code and audio samples are available at https://github.com/CodeStoreTTS/ER-CTTS.
Comment: submitted to IEEE Transaction
Published: 2024

92. Bridging Gaps: Federated Multi-View Clustering in Heterogeneous Hybrid Views

Author: Chen, Xinyue, Ren, Yazhou, Xu, Jie, Lin, Fangfei, Pu, Xiaorong, and Yang, Yang
Subjects: Computer Science - Machine Learning
Abstract: Recently, federated multi-view clustering (FedMVC) has emerged to explore cluster structures in multi-view data distributed on multiple clients. Existing approaches often assume that clients are isomorphic and all of them belong to either single-view clients or multi-view clients. Despite their success, these methods also present limitations when dealing with practical FedMVC scenarios involving heterogeneous hybrid views, where a mixture of both single-view and multi-view clients exhibit varying degrees of heterogeneity. In this paper, we propose a novel FedMVC framework, which concurrently addresses two challenges associated with heterogeneous hybrid views, i.e., client gap and view gap. To address the client gap, we design a local-synergistic contrastive learning approach that helps single-view clients and multi-view clients achieve consistency for mitigating heterogeneity among all clients. To address the view gap, we develop a global-specific weighting aggregation method, which encourages global models to learn complementary features from hybrid views. The interplay between local-synergistic contrastive learning and global-specific weighting aggregation mutually enhances the exploration of the data cluster structures distributed on multiple clients. Theoretical analysis and extensive experiments demonstrate that our method can handle the heterogeneous hybrid views in FedMVC and outperforms state-of-the-art methods. The code is available at https://github.com/5Martina5/FMCSC.
Published: 2024

93. Exploiting Moving Arrays for Near-Field Sensing

Author: Chen, Yilong, Ren, Zixiang, Yu, Xianghao, Liu, Lei, and Xu, Jie
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: This letter exploits moving arrays to enable near-field multiple-input multiple-output (MIMO) sensing via a limited number of antenna elements. We consider a scenario where a base station (BS) is equipped with a uniform linear array (ULA) on a moving platform. The objective is to locate a point target in the two-dimensional (2D) space by leveraging the near-field channel characteristics created by the movement of antenna arrays. Under this setup, we analyze the Cramér-Rao bound (CRB) for estimating the target's 2D coordinate, which provides the fundamental sensing performance limits for localization. It is revealed that our proposed design with a moving array achieves a CRB that is proportional to the CRB obtained by an equivalent extremely large ULA matching the platform's size. This shows that the movement of the antenna array significantly enlarges its effective aperture to enable near-field sensing. Numerical results show that the proposed moving array design substantially enhances the target estimation performance compared to the conventional fixed array benchmark., Comment: 5 pages, 7 figures
Published: 2024

94. Enterprise Benchmarks for Large Language Model Evaluation

Author: Zhang, Bing, Takeuchi, Mikio, Kawahara, Ryo, Asthana, Shubhi, Hossain, Md. Maruf, Ren, Guang-Jie, Soule, Kate, and Zhu, Yada
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science
Abstract: The advancement of large language models (LLMs) has led to a greater challenge of having a rigorous and systematic evaluation of complex tasks performed, especially in enterprise applications. Therefore, LLMs need to be able to benchmark enterprise datasets for various tasks. This work presents a systematic exploration of benchmarking strategies tailored to LLM evaluation, focusing on the utilization of domain-specific datasets and consisting of a variety of NLP tasks. The proposed evaluation framework encompasses 25 publicly available datasets from diverse enterprise domains like financial services, legal, cyber security, and climate and sustainability. The diverse performance of 13 models across different enterprise tasks highlights the importance of selecting the right model based on the specific requirements of each task. Code and prompts are available on GitHub.
Published: 2024

95. Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models

Author: Liu, Qin, Shang, Chao, Liu, Ling, Pappas, Nikolaos, Ma, Jie, John, Neha Anna, Doss, Srikanth, Marquez, Lluis, Ballesteros, Miguel, and Benajiba, Yassine
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The safety alignment ability of Vision-Language Models (VLMs) is prone to be degraded by the integration of the vision module compared to its LLM backbone. We investigate this phenomenon, dubbed "safety alignment degradation" in this paper, and show that the challenge arises from the representation gap that emerges when introducing vision modality to VLMs. In particular, we show that the representations of multi-modal inputs shift away from those of text-only inputs, which represent the distribution that the LLM backbone is optimized for. At the same time, the safety alignment capabilities, initially developed within the textual embedding space, do not successfully transfer to this new multi-modal representation space. To reduce safety alignment degradation, we introduce Cross-Modality Representation Manipulation (CMRM), an inference time representation intervention method for recovering the safety alignment ability that is inherent in the LLM backbone of VLMs, while simultaneously preserving the functional capabilities of VLMs. The empirical results show that our framework significantly recovers the alignment ability that is inherited from the LLM backbone with minimal impact on the fluency and linguistic capabilities of pre-trained VLMs even without additional training. Specifically, the unsafe rate of LLaVA-7B on multi-modal input can be reduced from 61.53% to as low as 3.15% with only inference-time intervention. WARNING: This paper contains examples of toxic or harmful language., Comment: Preprint
Published: 2024

96. Double opponency serves as a basis for color constancy

Author: Yang, Kai-Fu and Li, Yong-Jie
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: Color constancy (CC) is one of the important perceptual abilities of the human visual system, which states that despite changes in illumination, the perceived colors of surfaces generally tend to remain constant. Nevertheless, the mechanisms underlying CC have been debated for several decades. A specific type of cell, known as the double opponent cell in the primary visual cortex (V1), is strongly implicated in achieving CC. However, the exact functioning manner of this cell type remains uncertain. In this work, our quantitative analysis of concentric double-opponent cells in V1 revealed their ability to identify gray surfaces within color-biased scenes. These gray surfaces can then be used to estimate the illumination easily. For the first time, this finding offers a clear functional explanation of concentric double-opponent receptive fields of this cell type in the visual system. Building on this insight, we introduced a novel computational theory--gray-anchoring (GA) theory--to explain how CC is achieved in the visual system. Specifically, GA-based CC involves detecting and anchoring gray surfaces within complex scenes. Our new theory serves as a bridge among the retinex theory, anchoring theory, and the neural mechanisms underlying visual CC in color vision., Comment: 12 pages, 3 figures
Published: 2024

97. On the token distance modeling ability of higher RoPE attention dimension

Author: Hong, Xiangyu, Jiang, Che, Qi, Biqing, Meng, Fandong, Yu, Mo, Zhou, Bowen, and Zhou, Jie
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Length extrapolation algorithms based on Rotary position embedding (RoPE) have shown promising results in extending the context length of language models. However, understanding how position embedding can capture longer-range contextual information remains elusive. Based on the intuition that different dimensions correspond to different frequencies of change in RoPE encoding, we conducted a dimension-level analysis to investigate the correlation between a hidden dimension of an attention head and its contribution to capturing long-distance dependencies. Using our correlation metric, we identified a particular type of attention head, which we named Positional Heads, from various length-extrapolated models. These heads exhibit a strong focus on long-range information interaction and play a pivotal role in long input processing, as evidenced by our ablation. We further demonstrate the correlation between the efficiency of length extrapolation and the extension of the high-dimensional attention allocation of these heads. The identification of Positional Heads provides insights for future research in long-text comprehension., Comment: Accepted to EMNLP 2024 Findings
Published: 2024

98. Is the Gum Nebula an Important Interstellar Scattering Disk of Background Pulsars?

Author: Wang, Rui, Yan, Zhen, Shen, Zhiqiang, Lee, KeJia, Wu, Yajun, Zhao, Rongbing, Huang, Zhipeng, Wang, Xiaowei, and Liu, Jie
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: The Gum Nebula is a faint supernova remnant extending about 40 degrees across the southern sky, potentially affecting tens of background pulsars. Though the view that the Gum Nebula acts as a potential scattering screen for background pulsars has been recurrently mentioned over the past five decades, it has not been directly confirmed. We chose the strong background pulsar PSR B0740-28 as a probe and monitored its diffractive interstellar scintillation (DISS) at 2.25 and 8.60 GHz simultaneously for about two years using the Shanghai Tian Ma Radio Telescope (TMRT). DISS was detected at both frequencies and quantified by two-dimensional autocorrelation analysis. We calculated their scattering spectral index $\alpha$ and found that 9/21 of the observations followed the theoretical predictions, while 4/21 of them clearly showed $\alpha < 4$. This finding provides strong support for anomalous scattering along the pulsar line of sight, due to the large frequency lever arm and the simultaneous features of our dual-frequency observations. In comparison to the 2.25 GHz observations, scintillation arcs were observed in 10/21 of the secondary spectrum plots for 8.60 GHz observations. Consequently, the highest frequency record for pulsar scintillation arc detection was updated to 8.60 GHz. Our fitting results were the most direct evidence for the view that the Gum Nebula acts as the scattering screen for background pulsars, because both the distance ($245^{+69}_{-72}$ pc) and transverse speed ($22.4^{+4.1}_{-4.2}$ km/s) of the scintillation screen are comparable with related parameters of the Gum Nebula. Our findings indicated that anisotropic scattering provides a better explanation for the annual modulation of scintillation arcs than isotropic scattering. Additionally, the orientation of its long axis was also fitted., Comment: Accepted by SCIENCE CHINA Physics, Mechanics & Astronomy
Published: 2024

99. The forward-backward asymmetry induced $CP$ asymmetry in ${\overline{B}}^{0}\rightarrow K^{-}\pi^{+}\pi^{0}$ in phase space around the resonances ${\overline{K}}^{*}(892)^{0}$ and ${\overline{K}}^{*}_{0}(700)$

Author: Qi, Jing-Juan, Zhao, Yu-Jie, and Zhang, Zhen-Hua
Subjects: High Energy Physics - Phenomenology, High Energy Physics - Experiment
Abstract: The interference between amplitudes corresponding to different intermediate resonances plays an important role in generating large CP asymmetries in phase space in multi-body decays of bottom and charmed mesons. In this paper, we study the CP violation in the decay channel ${\overline{B}}^{0}\rightarrow K^{-}\pi^{+}\pi^{0}$ in the phase-space region where the intermediate resonances $\overline{K}^{*}(892)^{0}$ and ${\overline{K}^{*}_{0}(700)}$ dominate. The Forward-Backward Asymmetry (FBA) and the CP asymmetry induced by FBA (FB-CPA), which are closely related to the interference effects between the two aforementioned resonances, are especially investigated. The correlation of the behaviour of FBA and FB-CPA with the relative strong phase between the amplitudes is analyzed., Comment: 16 pages, 5 figures
Published: 2024

100. Modeling and Simulation of 2D Transducers Based on Suspended Graphene-Based Heterostructures in Nanoelectromechanical Pressure Sensors

Author: Liu, Quan, He, Chang, Ding, Jie, Zhang, Wendong, and Fan, Xuge
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science, Condensed Matter - Other Condensed Matter, Physics - Applied Physics
Abstract: Graphene-based 2D heterostructures exhibit excellent mechanical and electrical properties, and are expected to perform better than graphene in nanoelectromechanical pressure sensors. Here, we built the pressure sensor models based on suspended heterostructures of graphene/h-BN, graphene/MoS2, and graphene/MoSe2 by using COMSOL Multiphysics finite element software. We found that suspended circular 2D membranes show the best sensitivity to pressures compared to rectangular and square ones. We simulated the deflections, strains, resonant frequencies, and Young's moduli of suspended graphene-based heterostructures under the conditions of different applied pressures and geometrical sizes, built-in tensions, and the number of atomic layers of 2D membranes. The Young's moduli of 2D heterostructures of graphene, graphene/h-BN, graphene/MoS2, and graphene/MoSe2 were estimated to be 1.001 TPa, 921.08 GPa, 551.11 GPa, and 475.68 GPa, respectively. We also discuss the effect of highly asymmetric cavities on device performance. These results would contribute to the understanding of the mechanical properties of graphene-based heterostructures and would be helpful for the design and manufacture of high-performance NEMS pressure sensors.
Published: 2024
