222,209 results on "Wang, Jun"
Search Results
2. $I=2$ $\pi\pi$ $s$-wave scattering length from lattice QCD
- Author
Fu, Ziwen and Wang, Jun
- Subjects
High Energy Physics - Lattice
- Abstract
The $I=2$ $\pi\pi$ elastic $s$-wave scattering phase shift is measured by lattice QCD with $N_f=3$ flavors of Asqtad-improved staggered fermions. The lattice-calculated energy eigenvalues of $\pi\pi$ systems in one center-of-mass frame and several moving frames, obtained with the moving wall source technique, are used to extract phase shifts via Lüscher's formula. Our computations are precise enough to obtain the threshold parameters: scattering length $a$, effective range $r$, and shape parameter $P$, which can be extrapolated to the physical point by NLO chiral perturbation theory; our relevant NNLO predictions, obtained by expanding NPLQCD's work, are newly considered as systematic uncertainties. Our results are consistent with Roy equation determinations, newer experimental data, and lattice estimations. Numerical computations are performed with a coarse ($a\approx0.12$~fm, $L^3 T = 32^3 64$), two fine ($a\approx0.09$~fm, $L^3 T = 40^3 96$) and a superfine ($a\approx0.06$~fm, $L^3 T = 48^3 144$) lattice ensembles at four pion masses of $m_\pi\sim247~{\rm MeV}$, $249~{\rm MeV}$, $275~{\rm MeV}$, and $384~{\rm MeV}$, respectively. Comment: 8 figures
- Published
- 2024
3. COCO-Occ: A Benchmark for Occluded Panoptic Segmentation and Image Understanding
- Author
Wei, Wenbo, Wang, Jun, and Bhalerao, Abhir
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
To help address the occlusion problem in panoptic segmentation and image understanding, this paper proposes a new large-scale dataset, COCO-Occ, which is derived from the COCO dataset by manually labelling the COCO images into three perceived occlusion levels. Using COCO-Occ, we systematically assess and quantify the impact of occlusion on panoptic segmentation on samples having different levels of occlusion. Comparative experiments with SOTA panoptic models demonstrate that the presence of occlusion significantly affects performance with higher occlusion levels resulting in notably poorer performance. Additionally, we propose a straightforward yet effective method as an initial attempt to leverage the occlusion annotation using contrastive learning to render a model that learns a more robust representation capturing different severities of occlusion. Experimental results demonstrate that the proposed approach boosts the performance of the baseline model and achieves SOTA performance on the proposed COCO-Occ dataset.
- Published
- 2024
4. Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
- Author
Huang, Zhiqi, Luo, Dan, Wang, Jun, Liao, Huan, Li, Zhiheng, and Wu, Zhiyong
- Subjects
Computer Science - Sound, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Our research introduces an innovative framework for video-to-audio synthesis, which solves the problems of audio-video desynchronization and semantic loss in the audio. By incorporating a semantic alignment adapter and a temporal synchronization adapter, our method significantly improves semantic integrity and the precision of beat point synchronization, particularly in fast-paced action sequences. Utilizing a contrastive audio-visual pre-trained encoder, our model is trained with video and high-quality audio data, improving the quality of the generated audio. This dual-adapter approach empowers users with enhanced control over audio semantics and beat effects, allowing the adjustment of the controller to achieve better results. Extensive experiments substantiate the effectiveness of our framework in achieving seamless audio-visual alignment.
- Published
- 2024
5. E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
- Author
Liao, Zihan, Wang, Jun, Yu, Hang, Wei, Lingxiao, Li, Jianguo, and Zhang, Wei
- Subjects
Computer Science - Computation and Language
- Abstract
In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing long-context performance, reducing computational complexity, and leveraging pretrained models, collectively termed the "impossible triangle." We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling. Comment: 12 pages, 4 figures
- Published
- 2024
6. AS-Speech: Adaptive Style For Speech Synthesis
- Author
Li, Zhipeng, Xing, Xiaofen, Wang, Jun, Chen, Shuaiqi, Yu, Guoqiao, Wan, Guanglu, and Xu, Xiangmin
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
In recent years, there has been significant progress in Text-to-Speech (TTS) synthesis technology, enabling the high-quality synthesis of voices in common scenarios. In unseen situations, adaptive TTS requires a strong generalization capability to speaker style characteristics. However, the existing adaptive methods can only extract and integrate coarse-grained timbre or mixed rhythm attributes separately. In this paper, we propose AS-Speech, an adaptive style methodology that integrates the speaker timbre characteristics and rhythmic attributes into a unified framework for text-to-speech synthesis. Specifically, AS-Speech can accurately simulate style characteristics through fine-grained text-based timbre features and global rhythm information, and achieve high-fidelity speech synthesis through the diffusion model. Experiments show that the proposed model produces voices with higher naturalness and similarity in terms of timbre and rhythm compared to a series of adaptive TTS models. Comment: Accepted by SLT 2024
- Published
- 2024
7. A new unified dark sector model and its implications on the $\sigma_8$ and $S_8$ tensions
- Author
Yao, Yan-Hong, Liu, Jian-Qi, Huang, Zhi-Qi, Wang, Jun-Chao, and Su, Yan
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics
- Abstract
In this paper, we introduce the Unified Three-Form Dark Sector (UTFDS) model, a unified dark sector model that combines dark energy and dark matter through a three-form field. In this framework, the potential of the three-form field acts as dark matter, while the kinetic term represents dark energy. The interaction between dark matter and dark energy is driven by the energy exchange between these two terms. Given the dynamical equations of UTFDS, we provide an autonomous system of evolution equations for UTFDS and perform a stability analysis of its fixed points. The result aligns with our expectations for a unified dark sector. Furthermore, we discover that the dual Lagrangian of the UTFDS Lagrangian is equivalent to a Dirac-Born-Infeld (DBI) Lagrangian. By fixing the parameter $\kappa X_0$ to 250, 500, 750, we refer to the resulting models as the $\overline{\rm UTFDS}$ model with $\kappa X_0$=250, 500, 750, respectively. We then place constraints on these three $\overline{\rm UTFDS}$ models and the $\Lambda$CDM model in light of the Planck 2018 Cosmic Microwave Background (CMB) anisotropies, Redshift Space Distortions (RSD) observations, Baryon Acoustic Oscillation (BAO) measurements, and the $S_8$ prior chosen according to the KiDS1000 Weak gravitational Lensing (WL) measurement. We find that the $\overline{\rm UTFDS}$ model with $\kappa X_0$=500 is the only one among the four models where both $\sigma_8$ and $S_8$ tensions, between CMB and RSD+BAO+WL datasets, are below 2.0$\sigma$. Furthermore, the tensions are relieved without exacerbating the $H_0$ tension. Although both the CMB and RSD+BAO+WL datasets provide definite/positive evidence favoring $\Lambda$CDM over the $\overline{\rm UTFDS}$ model with $\kappa X_0$=500, the evidence is not strong enough to rule out further study of this model. Comment: 16 pages, 6 tables, 5 figures
- Published
- 2024
8. On pseudo-nullity of fine Mordell-Weil group
- Author
Lim, Meng Fai, Qin, Chao, and Wang, Jun
- Subjects
Mathematics - Number Theory
- Abstract
Let $E$ be an elliptic curve defined over $\mathbb{Q}$ with good ordinary reduction at a prime $p\geq 5$, and let $F$ be an imaginary quadratic field. Under appropriate assumptions, we show that the Pontryagin dual of the fine Mordell-Weil group of $E$ over the $\mathbb{Z}_p^2$-extension of $F$ is pseudo-null as a module over the Iwasawa algebra of the group $\mathbb{Z}_p^2$.
- Published
- 2024
9. Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images
- Author
Li, Wenlin, Xu, Yucheng, Zheng, Xiaoqing, Han, Suoya, Wang, Jun, and Sun, Xiaobo
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Sparse and noisy images (SNIs), like those in spatial gene expression data, pose significant challenges for effective representation learning and clustering, which are essential for thorough data analysis and interpretation. In response to these challenges, we propose Dual Advancement of Representation Learning and Clustering (DARLC), an innovative framework that leverages contrastive learning to enhance the representations derived from masked image modeling. Simultaneously, DARLC integrates cluster assignments in a cohesive, end-to-end approach. This integrated clustering strategy addresses the "class collision problem" inherent in contrastive learning, thus improving the quality of the resulting representations. To generate more plausible positive views for contrastive learning, we employ a graph attention network-based technique that produces denoised images as augmented data. As such, our framework offers a comprehensive approach that improves the learning of representations by enhancing their local perceptibility, distinctiveness, and the understanding of relational semantics. Furthermore, we utilize a Student's t mixture model to achieve more robust and adaptable clustering of SNIs. Extensive experiments, conducted across 12 different types of datasets consisting of SNIs, demonstrate that DARLC surpasses the state-of-the-art methods in both image clustering and generating image representations that accurately capture gene interactions. Code is available at https://github.com/zipging/DARLC.
- Published
- 2024
10. Self-evolving Agents with reflective and memory-augmented abilities
- Author
Liang, Xuechen, Tao, Meiling, Xia, Yinghui, Shi, Tianyu, Wang, Jun, and Yang, JingSong
- Subjects
Computer Science - Computation and Language
- Abstract
Large language models (LLMs) have made significant advances in the field of natural language processing, but they still face challenges such as continuous decision-making. In this research, we propose a novel framework that integrates iterative feedback, reflective mechanisms, and a memory optimization mechanism based on the Ebbinghaus forgetting curve. It significantly enhances the agents' capabilities in handling multi-tasking and long-span information.
- Published
- 2024
11. LanguaShrink: Reducing Token Overhead with Psycholinguistics
- Author
Liang, Xuechen, Tao, Meiling, Xia, Yinghui, Shi, Tianyu, Wang, Jun, and Yang, JingSong
- Subjects
Computer Science - Computation and Language, Statistics - Machine Learning
- Abstract
As large language models (LLMs) improve their capabilities in handling complex tasks, the issues of computational cost and efficiency due to long prompts are becoming increasingly prominent. To accelerate model inference and reduce costs, we propose an innovative prompt compression framework called LanguaShrink. Inspired by the observation that LLM performance depends on the density and position of key information in the input prompts, LanguaShrink leverages psycholinguistic principles and the Ebbinghaus memory curve to achieve task-agnostic prompt compression. This effectively reduces prompt length while preserving essential information. We refer to the training method of OpenChat. The framework introduces part-of-speech priority compression and data distillation techniques, using smaller models to learn compression targets and employing a KL-regularized reinforcement learning strategy for training \cite{wang2023openchat}. Additionally, we adopt a chunk-based compression algorithm to achieve adjustable compression rates. We evaluate our method on multiple datasets, including LongBench, ZeroScrolls, Arxiv Articles, and a newly constructed novel test set. Experimental results show that LanguaShrink maintains semantic similarity while achieving up to 26 times compression. Compared to existing prompt compression methods, LanguaShrink improves end-to-end latency by 1.43 times.
- Published
- 2024
12. Reflective Human-Machine Co-adaptation for Enhanced Text-to-Image Generation Dialogue System
- Author
Feng, Yuheng, He, Yangfan, Xia, Yinghui, Shi, Tianyu, Wang, Jun, and Yang, Jinsong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Today's image generation systems are capable of producing realistic and high-quality images. However, user prompts often contain ambiguities, making it difficult for these systems to interpret users' potential intentions. Consequently, machines need to interact with users over multiple rounds to better understand users' intents. The unpredictable costs of using or learning image generation models through multiple feedback interactions hinder their widespread adoption and full performance potential, especially for non-expert users. In this research, we aim to enhance the user-friendliness of our image generation system. To achieve this, we propose a reflective human-machine co-adaptation strategy, named RHM-CAS. Externally, the Agent engages in meaningful language interactions with users to reflect on and refine the generated images. Internally, the Agent tries to optimize the policy based on user preferences, ensuring that the final outcomes closely align with user preferences. Various experiments on different tasks demonstrate the effectiveness of the proposed method.
- Published
- 2024
13. Toda lattice and Riemann type minimal surfaces
- Author
Gui, Changfeng, Liu, Yong, Wang, Jun, and Yang, Wen
- Subjects
Nonlinear Sciences - Exactly Solvable and Integrable Systems, Mathematics - Analysis of PDEs, Mathematics - Differential Geometry
- Abstract
The Toda lattice and minimal surfaces are related to each other through the Allen-Cahn equation. In view of the structure of the solutions of the Toda lattice, we find new balancing configurations using techniques of integrable systems. This allows us to construct new singly periodic minimal surfaces. The genus of these minimal surfaces equals $j(j+1)/2-1$. They are natural generalizations of the Riemann minimal surfaces, which have genus zero. Comment: 16 pages
- Published
- 2024
14. Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders
- Author
Xu, Cong, Zhu, Zhangchi, Yu, Mo, Wang, Jun, Wang, Jianyong, and Zhang, Wei
- Subjects
Computer Science - Information Retrieval
- Abstract
Large language models (LLMs) have been garnering increasing attention in the recommendation community. Some studies have observed that LLMs, when fine-tuned by the cross-entropy (CE) loss with a full softmax, could achieve 'state-of-the-art' performance in sequential recommendation. However, most of the baselines used for comparison are trained using a pointwise/pairwise loss function. This inconsistent experimental setting leads to the underestimation of traditional methods and further fosters over-confidence in the ranking capability of LLMs. In this study, we provide theoretical justification for the superiority of the cross-entropy loss by demonstrating its two desirable properties: tightness and coverage. Furthermore, this study sheds light on additional novel insights: 1) Taking into account only the recommendation performance, CE is not yet optimal, as it is not a particularly tight bound in terms of some ranking metrics. 2) In scenarios where full softmax cannot be performed, an effective alternative is to scale up the sampled normalizing term. These findings then help unleash the potential of traditional recommendation models, allowing them to surpass LLM-based counterparts. Given the substantial computational burden, existing LLM-based methods are not as effective as claimed for sequential recommendation. We hope that these theoretical understandings in conjunction with the empirical results will facilitate an objective evaluation of LLM-based recommendation in the future. Comment: 18 pages. arXiv admin note: substantial text overlap with arXiv:2402.06216
- Published
- 2024
15. PAGE: Parametric Generative Explainer for Graph Neural Network
- Author
Qiu, Yang, Liu, Wei, Wang, Jun, and Li, Ruixuan
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
This article introduces PAGE, a parameterized generative interpretive framework. PAGE is capable of providing faithful explanations for any graph neural network without necessitating prior knowledge or internal details. Specifically, we train the auto-encoder to generate explanatory substructures by designing an appropriate training strategy. Due to the dimensionality reduction of features in the latent space of the auto-encoder, it becomes easier to extract causal features leading to the model's output, which can be easily employed to generate explanations. To accomplish this, we introduce an additional discriminator to capture the causality between latent causal features and the model's output. By designing appropriate optimization objectives, the well-trained discriminator can be employed to constrain the encoder in generating enhanced causal features. Finally, these features are mapped to substructures of the input graph through the decoder to serve as explanations. Compared to existing methods, PAGE operates at the sample scale rather than on nodes or edges, eliminating the need for perturbation or encoding processes as seen in previous methods. Experimental results on both artificially synthesized and real-world datasets demonstrate that our approach not only exhibits the highest faithfulness and accuracy but also significantly outperforms baseline models in terms of efficiency.
- Published
- 2024
16. Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images
- Author
Huang, Tianxiang, Shi, Jing, Jin, Ge, Li, Juncheng, Wang, Jun, Du, Jun, and Shi, Jun
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
B-mode ultrasound based computer-aided diagnosis (CAD) has demonstrated its effectiveness for the diagnosis of Developmental Dysplasia of the Hip (DDH) in infants. However, due to the effect of speckle noise in ultrasound images, it is still a challenging task to accurately detect hip landmarks. In this work, we propose a novel hip landmark detection model by integrating the Topological GCN (TGCN) with an Improved Conformer (TGCN-ICF) into a unified framework to improve detection performance. The TGCN-ICF includes two subnetworks: an Improved Conformer (ICF) subnetwork to generate heatmaps and a TGCN subnetwork to additionally refine landmark detection. This TGCN can effectively improve detection accuracy with the guidance of class labels. Moreover, a Mutual Modulation Fusion (MMF) module is developed for deeply exchanging and fusing the features extracted from the U-Net and Transformer branches in ICF. The experimental results on the real DDH dataset demonstrate that the proposed TGCN-ICF outperforms all the compared algorithms.
- Published
- 2024
17. cc-DRL: a Convex Combined Deep Reinforcement Learning Flight Control Design for a Morphing Quadrotor
- Author
Yang, Tao, Wu, Huai-Ning, and Wang, Jun-Wei
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
- Abstract
Compared with common quadrotors, the shape change of morphing quadrotors endows them with better flight performance but also results in more complex flight dynamics. Generally, it is extremely difficult or even impossible to establish an accurate mathematical model describing the complex flight dynamics of morphing quadrotors. To address the issue of flight control design for morphing quadrotors, this paper resorts to a combination of model-free control techniques (e.g., deep reinforcement learning, DRL) and the convex combination (CC) technique, and proposes a convex-combined-DRL (cc-DRL) flight control algorithm for the position and attitude of a class of morphing quadrotors, where the shape change is realized by the length variation of four arm rods. In the proposed cc-DRL flight control algorithm, the proximal policy optimization algorithm, a model-free DRL algorithm, is used to train offline the corresponding optimal flight control laws for some selected representative arm length modes, and a cc-DRL flight control scheme is thereby constructed by the convex combination technique. Finally, simulation results are presented to show the effectiveness and merit of the proposed flight control algorithm.
- Published
- 2024
18. Constantly curved holomorphic two-spheres in the complex Grassmannian G(2,6) with constant square norm of the second fundamental form
- Author
Fei, Jie, He, Ling, and Wang, Jun
- Subjects
Mathematics - Differential Geometry
- Abstract
We completely classify all noncongruent linearly full totally unramified constantly curved holomorphic two-spheres in G(2,6) with constant square norm of the second fundamental form. They turn out to be homogeneous. Comment: 21 pages
- Published
- 2024
19. xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
- Author
Qin, Can, Xia, Congying, Ramakrishnan, Krithika, Ryoo, Michael, Tu, Lifu, Feng, Yihao, Shu, Manli, Zhou, Honglu, Awadalla, Anas, Wang, Jun, Purushwalkam, Senthil, Xue, Le, Zhou, Yingbo, Wang, Huan, Savarese, Silvio, Niebles, Juan Carlos, Chen, Zeyuan, Xu, Ran, and Xiong, Caiming
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of visual tokens and the computational demands associated with generating long-sequence videos. To further address the computational costs, we propose a divide-and-merge strategy that maintains temporal consistency across video segments. Our Diffusion Transformer (DiT) model incorporates spatial and temporal self-attention layers, enabling robust generalization across different timeframes and aspect ratios. We have devised a data processing pipeline from the very beginning and collected over 13M high-quality video-text pairs. The pipeline includes multiple steps such as clipping, text detection, motion estimation, aesthetics scoring, and dense captioning based on our in-house video-LLM model. Training the VidVAE and DiT models required approximately 40 and 642 H100 days, respectively. Our model supports over 14-second 720p video generation in an end-to-end way and demonstrates competitive performance against state-of-the-art T2V models. Comment: Accepted by ECCV24 AI4VA
- Published
- 2024
20. 4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment
- Author
Cheng, Kaihui, Liu, Ce, Su, Qingkun, Wang, Jun, Zhang, Liwei, Tang, Yining, Yao, Yao, Zhu, Siyu, and Qi, Yuan
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Protein structure prediction is pivotal for understanding the structure-function relationship of proteins, advancing biological research, and facilitating pharmaceutical development and experimental design. While deep learning methods and the expanded availability of experimental 3D protein structures have accelerated structure prediction, the dynamic nature of protein structures has received limited attention. This study introduces an innovative 4D diffusion model incorporating molecular dynamics (MD) simulation data to learn dynamic protein structures. Our approach is distinguished by the following components: (1) a unified diffusion model capable of generating dynamic protein structures, including both the backbone and side chains, utilizing atomic grouping and side-chain dihedral angle predictions; (2) a reference network that enhances structural consistency by integrating the latent embeddings of the initial 3D protein structures; and (3) a motion alignment module aimed at improving temporal structural coherence across multiple time steps. To our knowledge, this is the first diffusion-based model aimed at predicting protein trajectories across multiple time steps simultaneously. Validation on benchmark datasets demonstrates that our model exhibits high accuracy in predicting dynamic 3D structures of proteins containing up to 256 amino acids over 32 time steps, effectively capturing both local flexibility in stable states and significant conformational changes.
- Published
- 2024
21. Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures
- Author
Liu, Ce, Wang, Jun, Cai, Zhiqiang, Wang, Yingxu, Kuang, Huizhen, Cheng, Kaihui, Zhang, Liwei, Su, Qingkun, Tang, Yining, Cao, Fenglei, Han, Limei, Zhu, Siyu, and Qi, Yuan
- Subjects
Quantitative Biology - Biomolecules, Computer Science - Artificial Intelligence
- Abstract
Despite significant progress in static protein structure collection and prediction, the dynamic behavior of proteins, one of their most vital characteristics, has been largely overlooked in prior research. This oversight can be attributed to the limited availability, diversity, and heterogeneity of dynamic protein datasets. To address this gap, we propose to enhance existing prestigious static 3D protein structural databases, such as the Protein Data Bank (PDB), by integrating dynamic data and additional physical properties. Specifically, we introduce a large-scale dataset, Dynamic PDB, encompassing approximately 12.6K proteins, each subjected to all-atom molecular dynamics (MD) simulations lasting 1 microsecond to capture conformational changes. Furthermore, we provide a comprehensive suite of physical properties, including atomic velocities and forces, potential and kinetic energies of proteins, and the temperature of the simulation environment, recorded at 1 picosecond intervals throughout the simulations. For benchmarking purposes, we evaluate state-of-the-art methods on the proposed dataset for the task of trajectory prediction. To demonstrate the value of integrating richer physical properties in the study of protein dynamics and related model design, we base our approach on the SE(3) diffusion model and incorporate these physical properties into the trajectory prediction process. Preliminary results indicate that this straightforward extension of the SE(3) model yields improved accuracy, as measured by MAE and RMSD, when the proposed physical properties are taken into consideration. https://fudan-generative-vision.github.io/dynamicPDB/ .
- Published
- 2024
22. HiMA: Hierarchical Quantum Microarchitecture for Qubit-Scaling and Quantum Process-Level Parallelism
- Author
Zhou, Qi, Mei, Zi-Hao, Shi, Han-Qing, Guo, Liang-Liang, Yang, Xiao-Yan, Wang, Yun-Jie, Xu, Xiao-Fan, Xue, Cheng, Kong, Wei-Cheng, Wang, Jun-Chao, Wu, Yu-Chun, Chen, Zhao-Yun, and Guo, Guo-Ping
- Subjects
Computer Science - Hardware Architecture, Quantum Physics
- Abstract
Quantum computing holds immense potential for addressing a myriad of intricate challenges, which is significantly amplified when scaled to thousands of qubits. However, a major challenge lies in developing an efficient and scalable quantum control system. To address this, we propose a novel Hierarchical MicroArchitecture (HiMA) designed to facilitate qubit scaling and exploit quantum process-level parallelism. This microarchitecture is based on three core elements: (i) discrete qubit-level drive and readout, (ii) a process-based hierarchical trigger mechanism, and (iii) multiprocessing with a staggered triggering technique to enable efficient quantum process-level parallelism. We implement HiMA as a control system for a 72-qubit tunable superconducting quantum processing unit, serving a public quantum cloud computing platform, which is capable of expanding to 6144 qubits through three-layer cascading. In our benchmarking tests, HiMA achieves up to a 4.89x speedup under a 5-process parallel configuration. Consequently, to the best of our knowledge, we have achieved the highest CLOPS (Circuit Layer Operations Per Second), reaching up to 43,680, across all publicly available platforms.
- Published
- 2024
23. Unprecedented Central Engine 'Breathing' Phenomenon in an Active Supermassive Black Hole
- Author
Zhou, Shuying, Sun, Mouyuan, Feng, Hai-Cheng, Li, Sha-Sha, Xue, Yongquan, Wang, Jun-Xian, Cai, Zhen-Yi, Bai, Jin-Ming, Li, Danyang, Guo, Hengxiao, Liu, H. T., Lu, Kai-Xing, Mao, Jirong, Marculewicz, Marcin, and Wang, Jian-Guo
- Subjects
Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Astrophysics of Galaxies
- Abstract
Resolving the inner structures of active galactic nuclei (AGNs) provides the "standard ruler" to measure the parallax distances of the Universe and a powerful way to weigh supermassive black holes (SMBHs). Thanks to time-domain observations, it is possible to use the reverberation mapping (RM) technique to measure time delays between different light curves that probe the structures of the SMBH accretion disks and broad line regions (BLRs), which are otherwise often too compact to be spatially resolved for most AGNs. Despite decades of RM studies, the critical physical process that controls the structures of the SMBH accretion disk and BLR and their temporal evolution remains unclear. Here we report the variation of the SMBH accretion disk structure of NGC 4151 in response to changes in luminosity within 6 years. In the high-flux state, the time delays measured from our continuum RM with high-cadence (2 days) spectroscopy are 3.8 times larger than those in the low-flux state and 15 times longer than the classical standard thin disk (SSD) prediction. This result provides the first piece of direct evidence that the SMBH disk structure "breathes" in highly-variable AGN manifestations. The time-delay change severely challenges the popular X-ray reprocessing of the SSD model, with or without BLR contributions. More importantly, the continuum time delays can be comparable with the time delay between the broad H$\beta$ line and the nearby optical continuum, and the latter is commonly used to calculate the BLR sizes. Hence, the BLR sizes are significantly underestimated if the continuum time delays are not properly considered. This underestimation introduces up to 0.3 dex systematic uncertainties on RM SMBH masses and BLR parallax distances. Our findings underscore that simultaneous continuum and BLR RM studies are vital for better deciphering the SMBH mass growth and the cosmological expansion history. Comment: 18 pages, 8 figures, comments welcome
- Published
- 2024
24. Adversarial Attack for Explanation Robustness of Rationalization Models
- Author
-
Zhang, Yuankai, Kong, Lingxiao, Wang, Haozhao, Li, Ruixuan, Wang, Jun, Li, Yuhua, and Liu, Wei
- Subjects
Computer Science - Computation and Language - Abstract
Rationalization models, which select a subset of input text as the rationale (crucial for humans to understand and trust predictions), have recently emerged as a prominent research area in eXplainable Artificial Intelligence. However, most previous studies focus on improving the quality of the rationale, ignoring its robustness to malicious attacks. Specifically, whether rationalization models can still generate high-quality rationales under adversarial attack remains unknown. To explore this, this paper proposes UAT2E, which aims to undermine the explainability of rationalization models without altering their predictions, thereby eliciting distrust in these models from human users. UAT2E employs a gradient-based search on triggers and then inserts them into the original input to conduct both non-target and target attacks. Experimental results on five datasets reveal the vulnerability of rationalization models in terms of explanation, where they tend to select more meaningless tokens under attack. Based on this, we make a series of recommendations for improving rationalization models in terms of explanation.
- Published
- 2024
25. MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification
- Author
-
Qin, Huafeng, Fu, Yuming, Zhang, Huiyan, El-Yacoubi, Mounim A., Gao, Xinbo, Song, Qun, and Wang, Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Deep neural networks have recently achieved promising performance in the vein recognition task and have shown an increasing application trend. However, they are prone to adversarial perturbation attacks, in which imperceptible perturbations added to the input lead to incorrect recognition. To address this issue, we propose a novel defense model named MsMemoryGAN, which aims to filter the perturbations from adversarial samples before recognition. First, we design a multi-scale autoencoder to achieve high-quality reconstruction and two memory modules to learn the detailed patterns of normal samples at different scales. Second, we investigate a learnable metric in the memory module to retrieve the most relevant memory items to reconstruct the input image. Finally, the perceptual loss is combined with the pixel loss to further enhance the quality of the reconstructed image. During the training phase, the MsMemoryGAN learns to reconstruct the input using only a few prototypical elements of the normal patterns recorded in the memory. At the testing stage, given an adversarial sample, the MsMemoryGAN retrieves its most relevant normal patterns in memory for the reconstruction. Perturbations in the adversarial sample are usually not reconstructed well, so the input is purified of adversarial perturbations. We have conducted extensive experiments on two public vein datasets under different adversarial attack methods to evaluate the performance of the proposed approach. The experimental results show that our approach removes a wide variety of adversarial perturbations, allowing vein classifiers to achieve the highest recognition accuracy.
- Published
- 2024
26. Advances in Multiple Instance Learning for Whole Slide Image Analysis: Techniques, Challenges, and Future Directions
- Author
-
Wang, Jun, Mao, Yu, Guan, Nan, and Xue, Chun Jason
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Whole slide images (WSIs) are gigapixel-scale digital images of H\&E-stained tissue samples widely used in pathology. The substantial size and complexity of WSIs pose unique analytical challenges. Multiple Instance Learning (MIL) has emerged as a powerful approach for addressing these challenges, particularly in cancer classification and detection. This survey provides a comprehensive overview of the challenges and methodologies associated with applying MIL to WSI analysis, including attention mechanisms, pseudo-labeling, transformers, pooling functions, and graph neural networks. Additionally, it explores the potential of MIL in discovering cancer cell morphology, constructing interpretable machine learning models, and quantifying cancer grading. By summarizing the current challenges, methodologies, and potential applications of MIL in WSI analysis, this survey aims to inform researchers about the state of the field and inspire future research directions.
- Published
- 2024
27. Double pole structures of $X_1(2900)$ as the $P$-wave $\bar{D}^*K^*$ resonances
- Author
-
Wang, Jun-Zhang, Lin, Zi-Yang, Wang, Bo, Meng, Lu, and Zhu, Shi-Lin
- Subjects
High Energy Physics - Phenomenology ,High Energy Physics - Experiment ,High Energy Physics - Lattice - Abstract
We reveal the double pole structures of the manifestly exotic tetraquark state $X_1(2900)$ in the scenario of $P$-wave $\bar{D}^*K^*$ dimeson resonance. We find that the observed enhancement signal associated with $X_1(2900)$ in $B^+ \to D^+D^-K^+$ by LHCb contains two $P$-wave poles denoted as $T_{cs1-}(2900)$ and $T^{\prime}_{cs1-}(2900)$, respectively. After considering the channel couplings among the $\bar{D}K$, $\bar{D}^*K$, $\bar{D}K^*$ and $\bar{D}^*K^*$ and the width of the $K^*$ meson, the masses and widths of the $S$-wave pole $T_{cs0+}(2900)$ and two $P$-wave poles $T_{cs1-}(2900)$ and $T^{\prime}_{cs1-}(2900)$ coincide with those of the $X_0(2900)$ and $X_1(2900)$ remarkably, which provides strong support for identifying $X_0(2900)$ and $X_1(2900)$ as $\bar{D}^{(*)}K^{(*)}$ dimeson states. Furthermore, we extensively calculate all $S$-wave and $P$-wave $\bar{D}^{(*)}K^{(*)}$ systems up to $J=3$ and predict four new isoscalar charmed-strange dimeson-type tetraquark states: an $S$-wave state $T_{cs1+}(2900)$ with quantum number $J^P=1^+$, three $P$-wave states $T_{cs1-}(2760)$ with $J^P=1^-$, $T_{cs0-}(2760)$ with $J^P=0^-$ and $T_{cs2-}(2900)$ with $J^P=2^-$. These near-threshold poles can be searched for at LHCb, Belle II and BESIII., Comment: 12 pages, 5 figures and 4 tables
- Published
- 2024
28. xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
- Author
-
Xue, Le, Shu, Manli, Awadalla, Anas, Wang, Jun, Yan, An, Purushwalkam, Senthil, Zhou, Honglu, Prabhu, Viraj, Dai, Yutong, Ryoo, Michael S, Kendre, Shrikant, Zhang, Jieyu, Qin, Can, Zhang, Shu, Chen, Chia-Chih, Yu, Ning, Tan, Juntao, Awalgaonkar, Tulika Manoj, Heinecke, Shelby, Wang, Huan, Choi, Yejin, Schmidt, Ludwig, Chen, Zeyuan, Savarese, Silvio, Niebles, Juan Carlos, Xiong, Caiming, and Xu, Ran
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, expands the Salesforce xGen initiative on foundation AI models. Our models undergo rigorous evaluation across a range of tasks, including both single and multi-image benchmarks. Our pre-trained base model exhibits strong in-context learning capabilities and the instruction-tuned model demonstrates competitive performance among open-source LMMs with similar model sizes. In addition, we introduce a safety-tuned model with DPO, aiming to mitigate harmful behaviors such as hallucinations and improve safety. We open-source our models, curated large-scale datasets, and our fine-tuning codebase to facilitate further advancements in LMM research. Associated resources will be available on our project page above.
- Published
- 2024
29. LLM4DSR: Leveraing Large Language Model for Denoising Sequential Recommendation
- Author
-
Wang, Bohao, Liu, Feng, Chen, Jiawei, Wu, Yudi, Lou, Xingyu, Wang, Jun, Feng, Yan, Chen, Chun, and Wang, Can
- Subjects
Computer Science - Information Retrieval ,Computer Science - Artificial Intelligence - Abstract
Sequential recommendation systems fundamentally rely on users' historical interaction sequences, which are often contaminated by noisy interactions. Identifying these noisy interactions accurately without additional information is particularly difficult due to the lack of explicit supervisory signals to denote noise. Large Language Models (LLMs), equipped with extensive open knowledge and semantic reasoning abilities, present a promising avenue to bridge this information gap. However, employing LLMs for denoising in sequential recommendation introduces notable challenges: 1) Direct application of pretrained LLMs may not be competent for the denoising task, frequently generating nonsensical responses; 2) Even after fine-tuning, the reliability of LLM outputs remains questionable, especially given the complexity of the task and the inherent hallucinatory issue of LLMs. To tackle these challenges, we propose LLM4DSR, a tailored approach for denoising sequential recommendation using LLMs. We constructed a self-supervised fine-tuning task to activate LLMs' capabilities to identify noisy items and suggest replacements. Furthermore, we developed an uncertainty estimation module that ensures only high-confidence responses are utilized for sequence corrections. Remarkably, LLM4DSR is model-agnostic, allowing the corrected sequences to be flexibly applied across various recommendation models. Extensive experiments validate the superiority of LLM4DSR over existing methods across three datasets and three recommendation backbones.
- Published
- 2024
30. An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation
- Author
-
Wang, Jun, Wu, Likang, Liu, Qi, and Yang, Yu
- Subjects
Computer Science - Machine Learning ,Computer Science - Information Retrieval - Abstract
Sequential recommendation, where user preference is dynamically inferred from sequential historical behaviors, is a critical task in recommender systems (RSs). To further optimize long-term user engagement, offline reinforcement-learning-based RSs have become a mainstream technique as they provide an additional advantage in avoiding global explorations that may harm online users' experiences. However, previous studies mainly focus on discrete action and policy spaces, which might have difficulties in handling dramatically growing items efficiently. To mitigate this issue, in this paper, we aim to design an algorithmic framework applicable to continuous policies. To facilitate the control in the low-dimensional but dense user preference space, we propose an Efficient Continuous Control framework (ECoC). Based on a statistically tested assumption, we first propose a novel unified action representation abstracted from normalized user and item spaces. Then, we develop the corresponding policy evaluation and policy improvement procedures. During this process, strategic exploration and directional control in terms of unified actions are carefully designed and crucial to final recommendation decisions. Moreover, benefiting from unified actions, the conservatism regularizations for policies and value functions are combined and perfectly compatible with the continuous framework. The resulting dual regularization ensures the successful offline training of RL-based recommendation policies. Finally, we conduct extensive experiments to validate the effectiveness of our framework. The results show that compared to the discrete baselines, our ECoC is trained far more efficiently. Meanwhile, the final policies outperform baselines in both capturing the offline data and gaining long-term rewards.
- Published
- 2024
31. Experimental evaluation of offline reinforcement learning for HVAC control in buildings
- Author
-
Wang, Jun, Li, Linyan, Liu, Qi, and Yang, Yu
- Subjects
Computer Science - Machine Learning - Abstract
Reinforcement learning (RL) techniques have been increasingly investigated for dynamic HVAC control in buildings. However, most studies focus on exploring solutions in online or off-policy scenarios without discussing in detail the implementation feasibility or effectiveness of dealing with purely offline datasets or trajectories. The lack of such work limits the real-world deployment of RL-based HVAC controllers, especially considering the abundance of historical data. To this end, this paper comprehensively evaluates the strengths and limitations of state-of-the-art offline RL algorithms by conducting analytical and numerical studies. The analysis is conducted from two perspectives: algorithms and dataset characteristics. As a prerequisite, the necessity of applying offline RL algorithms is first confirmed in two building environments. The ability of observation history modeling to reduce violations and enhance performance is subsequently studied. Next, the performance of RL-based controllers under datasets with different qualitative and quantitative conditions is investigated, including constraint satisfaction and power consumption. Finally, the sensitivity of certain hyperparameters is also evaluated. The results indicate that datasets of a certain suboptimality level and relatively small scale can be utilized to effectively train a well-performing RL-based HVAC controller. Specifically, such controllers can reduce the violation ratios of indoor temperatures by up to 28.5% and achieve up to 12.1% power savings compared to the baseline controller. In summary, this paper presents our well-structured investigations and new findings when applying offline reinforcement learning to building HVAC systems.
- Published
- 2024
32. Voltran: Unlocking Trust and Confidentiality in Decentralized Federated Learning Aggregation
- Author
-
Wang, Hao, Cai, Yichen, Wang, Jun, Ma, Chuan, Ge, Chunpeng, Qu, Xiangmou, and Zhou, Lu
- Subjects
Computer Science - Cryptography and Security - Abstract
The decentralized Federated Learning (FL) paradigm built upon blockchain architectures leverages distributed node clusters to replace the single server for executing FL model aggregation. This paradigm tackles the vulnerability of the centralized malicious server in vanilla FL and inherits the trustworthiness and robustness offered by blockchain. However, existing blockchain-enabled schemes face challenges related to inadequate confidentiality on models and limited computational resources of blockchains to perform large-scale FL computations. In this paper, we present Voltran, an innovative hybrid platform designed to achieve trust, confidentiality, and robustness for FL based on the combination of the Trusted Execution Environment (TEE) and blockchain technology. We offload the FL aggregation computation into TEE to provide an isolated, trusted and customizable off-chain execution, and then guarantee the authenticity and verifiability of aggregation results on the blockchain. Moreover, we provide strong scalability on multiple FL scenarios by introducing a multi-SGX parallel execution strategy to amortize the large-scale FL workload. We implement a prototype of Voltran and conduct a comprehensive performance evaluation. Extensive experimental results demonstrate that Voltran incurs minimal additional overhead while guaranteeing trust, confidentiality, and authenticity, and that it brings a significant speed-up compared to state-of-the-art ciphertext aggregation schemes.
- Published
- 2024
33. GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
- Author
-
Zhang, Zhibo, Bai, Wuxia, Li, Yuxi, Meng, Mark Huasong, Wang, Kailong, Shi, Ling, Li, Li, Wang, Jun, and Wang, Haoyu
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs. In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on the insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs.
- Published
- 2024
34. HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction
- Author
-
Lou, Xingyu, Yang, Yu, Dong, Kuiyao, Huang, Heyuan, Yu, Wenyi, Wang, Ping, Li, Xiu, and Wang, Jun
- Subjects
Computer Science - Machine Learning - Abstract
As the recommendation service needs to address increasingly diverse distributions, such as multi-population, multi-scenario, multi-target, and multi-interest, more and more recent works have focused on multi-distribution modeling and achieved great progress. However, most of them only consider modeling in a single multi-distribution manner, ignoring that mixed multi-distributions often coexist and form hierarchical relationships. To address these challenges, we propose a flexible modeling paradigm, named Hierarchical Multi-Distribution Network (HMDN), which efficiently models these hierarchical relationships and can seamlessly integrate with existing multi-distribution methods, such as Mixture-of-Experts (MoE) and Dynamic-Weight (DW) models. Specifically, we first design a hierarchical multi-distribution representation refinement module, employing a multi-level residual quantization to obtain fine-grained hierarchical representation. Then, the refined hierarchical representation is integrated into the existing single multi-distribution models, seamlessly expanding them into mixed multi-distribution models. Experimental results on both public and industrial datasets validate the effectiveness and flexibility of HMDN.
- Published
- 2024
35. Is quasar variability regulated by the close environment of accretion?
- Author
-
Wu, Liang, Wang, Jun-Xian, Ren, Wen-Ke, and Kang, Wen-Yong
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
UV/optical variability in quasars is a well-observed phenomenon, yet its primeval origins remain unclear. This study investigates whether the accretion disk turbulence, which is responsible for UV/optical variability, is influenced by the close environment of the accretion by analyzing the correlation between variability and infrared emission for two luminous SDSS quasar samples. The first sample includes light curves from SDSS, Pan-STARRS, and ZTF $g$ band photometry, while the second sample utilizes SDSS Stripe 82 $g$ band light curves. We explore the correlation between the $g$ band excess variance ($\sigma_{rms}$) and the wavelength-dependent infrared covering factor ($L_{\rm IR}(\lambda)/L_{\rm bol}$), controlling for the effects of redshift, luminosity, and black hole mass. An anti-correlation between the two variables is observed in both samples, which is strongest at wavelengths of 2-3$\mu$m but gradually weakens towards longer wavelengths. This suggests the equatorial dusty torus (which dominates near-infrared emission) plays a significant role in influencing the UV/optical variability, while the cooler polar dust (which contributes significantly to mid-infrared emission) does not. The findings indicate that quasar variability may be connected to the physical conditions within the dusty torus which feeds the accretion, and support the notion that the close environment of the accretion plays an important role in regulating the accretion disk turbulence., Comment: 10 pages, 13 figures, accepted for publication in MNRAS
- Published
- 2024
36. Lessons from Learning to Spin 'Pens'
- Author
-
Wang, Jun, Yuan, Ying, Che, Haichuan, Qi, Haozhi, Ma, Yi, Malik, Jitendra, and Wang, Xiaolong
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy using these real-world trajectories to adapt it to the real-world dynamics. With fewer than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development., Comment: Website: https://penspin.github.io/
- Published
- 2024
37. SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy
- Author
-
Zhang, Tingkai, Chen, Chaoyu, Liao, Cong, Wang, Jun, Zhao, Xudong, Yu, Hang, Wang, Jianchao, Li, Jianguo, and Shi, Wenhui
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Databases - Abstract
Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced SQL statements. However, the potential of open-source LLMs in Text-to-SQL applications remains underexplored, with many frameworks failing to leverage their full capabilities, particularly in handling complex database queries and incorporating feedback for iterative refinement. Addressing these limitations, this paper introduces SQLfuse, a robust system integrating open-source LLMs with a suite of tools to enhance Text-to-SQL translation's accuracy and usability. SQLfuse features four modules: schema mining, schema linking, SQL generation, and a SQL critic module, to not only generate but also continuously enhance SQL query quality. Demonstrated by its leading performance on the Spider Leaderboard and deployment by Ant Group, SQLfuse showcases the practical merits of open-source LLMs in diverse business contexts.
- Published
- 2024
38. EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting
- Author
-
Weng, Yuchen, Shen, Zhengwen, Chen, Ruofan, Wang, Qi, and Wang, Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
3D deblurring reconstruction techniques have recently seen significant advancements with the development of Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although these techniques can recover relatively clear 3D reconstructions from blurry image inputs, they still face limitations in handling severe blurring and complex camera motion. To address these issues, we propose Event-assisted 3D Deblur Reconstruction with Gaussian Splatting (EaDeblur-GS), which integrates event camera data to enhance the robustness of 3DGS against motion blur. By employing an Adaptive Deviation Estimator (ADE) network to estimate Gaussian center deviations and using novel loss functions, EaDeblur-GS achieves sharp 3D reconstructions in real-time, demonstrating performance comparable to state-of-the-art methods.
- Published
- 2024
39. Don't Throw Away Data: Better Sequence Knowledge Distillation
- Author
-
Wang, Jun, Briakou, Eleftheria, Dadkhahi, Hamid, Agarwal, Rishabh, Cherry, Colin, and Cohn, Trevor
- Subjects
Computer Science - Computation and Language - Abstract
A critical component in knowledge distillation is the means of coupling the teacher and student. The predominant sequence knowledge distillation method involves supervised learning of the student against teacher-decoded outputs, and is exemplified by the current state of the art, which incorporates minimum Bayes risk (MBR) decoding. In this paper we seek to integrate MBR more tightly in distillation training, specifically by using several high scoring MBR translations, rather than a single selected sequence, thus capturing a rich diversity of teacher outputs. Our experiments on English to German and English to Japanese translation show consistent improvements over strong baseline methods for both tasks and with varying model sizes. Additionally, we conduct a detailed analysis focusing on data efficiency and capacity curse aspects to elucidate MBR-n and explore its further potential.
- Published
- 2024
40. Human-like Episodic Memory for Infinite Context LLMs
- Author
-
Fountas, Zafeirios, Benfeghoul, Martin A, Oomerjee, Adnan, Christopoulou, Fenia, Lampouras, Gerasimos, Bou-Ammar, Haitham, and Wang, Jun
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning ,Quantitative Biology - Neurons and Cognition - Abstract
Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs, enabling them to effectively handle practically infinite context lengths while maintaining computational efficiency. EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement in an on-line fashion. When needed, these events are retrieved through a two-stage memory process, combining similarity-based and temporally contiguous retrieval for efficient and human-like access to relevant information. Experiments on the LongBench dataset demonstrate EM-LLM's superior performance, outperforming the state-of-the-art InfLLM model with an overall relative improvement of 4.3% across various tasks, including a 33% improvement on the PassageRetrieval task. Furthermore, our analysis reveals strong correlations between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart. This work not only advances LLM capabilities in processing extended contexts but also provides a computational framework for exploring human memory mechanisms, opening new avenues for interdisciplinary research in AI and cognitive science.
- Published
- 2024
41. Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT
- Author
-
Zheng, Jie, Wen, Ru, Hu, Haiqin, Wei, Lina, Su, Kui, Chen, Wei, Liu, Chen, and Wang, Jun
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream models. To address these issues, we propose a new MIM method named Tissue-Contrastive Semi-Masked Autoencoder (TCS-MAE) for modeling chest CT images. Our method has two novel designs: 1) a tissue-based masking-reconstruction strategy to capture more fine-grained anatomical features, and 2) a dual-AE architecture with contrastive learning between the masked and original image views to bridge the gap between the upstream and downstream models. To validate our method, we systematically investigate representative contrastive, generative, and hybrid self-supervised learning methods on top of tasks involving segmenting pneumonia, mediastinal tumors, and various organs. The results demonstrate that, compared to existing methods, our TCS-MAE more effectively learns tissue-aware representations, thereby significantly enhancing segmentation performance across all tasks.
- Published
- 2024
42. How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation
- Author
-
Qian, Linglong, Wang, Tao, Wang, Jun, Ellis, Hugh Logan, Mitra, Robin, Dobson, Richard, and Ibrahim, Zina
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
We introduce a novel classification framework for time-series imputation using deep learning, with a particular focus on clinical data. By identifying conceptual gaps in the literature and existing reviews, we devise a taxonomy grounded on the inductive bias of neural imputation frameworks, resulting in a classification of existing deep imputation strategies based on their suitability for specific imputation scenarios and data-specific properties. Our review further examines the existing methodologies employed to benchmark deep imputation models, evaluating their effectiveness in capturing the missingness scenarios found in clinical data and emphasising the importance of reconciling mathematical abstraction with clinical insights. Our classification aims to serve as a guide for researchers to facilitate the selection of appropriate deep learning imputation techniques tailored to their specific clinical data. Our novel perspective also highlights the significance of bridging the gap between computational methodologies and medical insights to achieve clinically sound imputation models.
- Published
- 2024
43. Non-uniqueness of Leray weak solutions of the forced MHD equations
- Author
-
Wang, Jun, Xu, Fei, and Zhang, Yong
- Subjects
Mathematics - Analysis of PDEs - Abstract
In this paper, we exhibit non-uniqueness of Leray weak solutions of the forced magnetohydrodynamic (MHD for short) equations. Similar to the solutions constructed in \cite{ABC2}, we first find a special steady solution of ideal MHD equations whose linear instability was proved in \cite{Lin}. It is possible to perturb the unstable scenario of ideal MHD to 3D viscous and resistive MHD equations, which can be regarded as the first unstable "background" solution. Our perturbation argument is based on the spectral theoretic approach \cite{Kato}. The second solution we construct is a trajectory on the unstable manifold associated to the unstable steady solution. It is worth noting that these solutions live precisely on the borderline of the known well-posedness theory., Comment: 23pp. arXiv admin note: text overlap with arXiv:2310.10075, arXiv:2112.03116 by other authors
- Published
- 2024
44. Extraction of fissile isotope antineutrino spectra using feedforward neural network
- Author
-
Chen, Jian, Wang, Jun, Wang, Wei, and Wei, Yuehuan
- Subjects
High Energy Physics - Phenomenology - Abstract
Precise measurement of antineutrino spectra produced by isotope fission in reactors is of great significance for studying neutrino oscillations, refining nuclear databases, and addressing the reactor antineutrino anomaly. This work reports a method utilizing a feedforward neural network (FNN) model to decompose the reconstructed measured prompt energy spectrum observed by a short-baseline reactor neutrino experiment and extract the antineutrino spectra produced by the fission of major isotopes such as $^{235}$U, $^{238}$U, $^{239}$Pu, and $^{241}$Pu in a nuclear reactor. We present two training strategies for this model and compare them with the traditional $\chi^2$ minimization method, analyzing the same set of pseudo-data for a total exposure of $(2.9\times 5\times 1800)~\rm{GW_{th}\cdot tons\cdot days}$. The results show that the FNN model not only converges faster and better during the fitting process but also achieves relative errors in the extracted spectra within 1\% in the $2-8$ MeV range, outperforming the $\chi^2$ minimization method. The feasibility and superiority of this method have been validated in this study.
- Published
- 2024
45. Spatial Non-Stationary Dual-Wideband Channel Estimation for XL-MIMO Systems
- Author
-
Tang, Anzheng, Wang, Jun-Bo, Pan, Yijin, Wu, Tuo, Chang, Chuanwen, Chen, Yijian, Yu, Hongkang, and Elkashlan, Maged
- Subjects
Computer Science - Information Theory, Electrical Engineering and Systems Science - Signal Processing - Abstract
In this paper, we investigate the channel estimation problem for extremely large-scale multi-input and multi-output (XL-MIMO) systems, considering the spherical wavefront effect, spatially non-stationary (SnS) property, and dual-wideband effects. To accurately characterize the XL-MIMO channel, we first derive a novel spatial-and-frequency-domain channel model for XL-MIMO systems and carefully examine the channel characteristics in the angular-and-delay domain. Based on the obtained channel representation, we formulate XL-MIMO channel estimation as a Bayesian inference problem. To fully exploit the clustered sparsity of angular-and-delay channels and capture the inter-antenna and inter-subcarrier correlations, a Markov random field (MRF)-based hierarchical prior model is adopted. Meanwhile, to facilitate efficient channel reconstruction, we propose a sparse Bayesian learning (SBL) algorithm based on approximate message passing (AMP) with a unitary transformation. Tailored to the MRF-based hierarchical prior model, the message passing equations are reformulated using structured variational inference, belief propagation, and mean-field rules. Finally, simulation results validate the convergence and superiority of the proposed algorithm over existing methods., Comment: This paper has been submitted to IEEE journal for possible publication
- Published
- 2024
46. EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding
- Author
-
Han, Can, Liu, Chen, Cai, Crystal, Wang, Jun, and Qian, Dahong
- Subjects
Computer Science - Human-Computer Interaction, Electrical Engineering and Systems Science - Signal Processing - Abstract
Motor imagery electroencephalography (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet employs a lightweight adaptive spatial-spectral fusion module, which promotes more efficient information fusion between multiple EEG electrodes. Subsequently, a parameter-free multi-scale variance pooling module extracts more comprehensive temporal features. Furthermore, we introduce dual prototypical learning to optimize the feature space distribution and training process, thereby improving the model's generalization ability on small-sample MI datasets. Our experimental results show that EDPNet outperforms state-of-the-art models with superior classification accuracy and kappa values (84.11% and 0.7881 for dataset BCI competition IV 2a, 86.65% and 0.7330 for dataset BCI competition IV 2b). Additionally, we use the BCI competition III IVa dataset, which has less training data, to further validate the generalization ability of the proposed EDPNet, and again achieve superior performance with 82.03% classification accuracy. Benefiting from its small parameter count and superior decoding accuracy, EDPNet shows great potential for MI-BCI applications. The code is publicly available at https://github.com/hancan16/EDPNet.
- Published
- 2024
47. Adaptive Perturbation Enhanced SCL Decoder for Polar Codes
- Author
-
Wang, Xianbin, Zhang, Huazi, Tong, Jiajie, Wang, Jun, and Tong, Wen
- Subjects
Computer Science - Information Theory - Abstract
For polar codes, the successive cancellation list (SCL) decoding algorithm significantly improves finite-length performance compared to SC decoding. SCL-flip decoding can further enhance the performance, but the gain diminishes as code length increases, due to the difficulty in locating the first error bit position. In this work, we introduce an SCL-perturbation decoding algorithm to address this issue. A basic version of the algorithm introduces small random perturbations to the received symbols before each SCL decoding attempt, and exhibits non-diminishing gain at large block lengths. Its enhanced version adaptively performs random or directional perturbations on each received symbol according to previous decoding results, and manages to correct more errors with fewer decoding attempts. Extensive simulation results demonstrate stable gains across various code rates, lengths and list sizes. To the best of our knowledge, this is the first SCL enhancement with non-diminishing gains as code length increases, and it achieves unprecedented efficiency. With only one additional SCL-$L$ decoding attempt (two in total), the proposed algorithm achieves SCL-$2L$-equivalent performance. Since the gain is obtained without increasing list size, the algorithm is best suited for hardware implementation.
- Published
- 2024
48. PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs
- Author
-
Peng, Dan, Fu, Zhihui, and Wang, Jun
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language - Abstract
Recent advancements in large language models (LLMs) have showcased their impressive capabilities. On mobile devices, the wealth of valuable, non-public data generated daily holds great promise for locally fine-tuning personalized LLMs, while maintaining privacy through on-device processing. However, the constraints of mobile device resources pose challenges to direct on-device LLM fine-tuning, mainly due to the memory-intensive nature of derivative-based optimization, which requires saving gradients and optimizer states. To tackle this, we propose employing derivative-free optimization techniques to enable on-device fine-tuning of LLMs, even on memory-limited mobile devices. Empirical results demonstrate that the RoBERTa-large model and OPT-1.3B can be fine-tuned locally on the OPPO Reno 6 smartphone using around 4GB and 6.5GB of memory respectively, using derivative-free optimization techniques. This highlights the feasibility of on-device LLM fine-tuning on mobile devices, paving the way for personalized LLMs on resource-constrained devices while safeguarding data privacy., Comment: Accepted to the ACL 2024 Workshop on Privacy in Natural Language Processing (PrivateNLP)
- Published
- 2024
49. Comparison of vaginal microbial community structure of beef cattle between luteal phase and follicular phase
- Author
-
Wang, Jun, Liu, Chang, Nesengani, Lucky T., Gong, Yongsheng, Yang, Yujiang, Yang, Lianyu, and Lu, Wenfa
- Published
- 2019
50. Effect of Micellar Morphology on the Temperature-Induced Structural Evolution of ABC Polypeptoid Triblock Terpolymers into Two-Compartment Hydrogel Network.
- Author
-
Jiang, Naisheng, Yu, Tianyi, Zhang, Meng, Barrett, Bailee, Sun, Haofeng, Wang, Jun, Luo, Ying, Sternhagen, Garrett, Xuan, Sunting, Yuan, Guangcui, Kelley, Elizabeth, Qian, Shuo, Bonnesen, Peter, Hong, Kunlun, Li, Dongcui, and Zhang, Donghui
- Abstract
We investigated the temperature-dependent structural evolution of thermoreversible triblock terpolypeptoid hydrogels, namely poly(N-allyl glycine)-b-poly(N-methyl glycine)-b-poly(N-decyl glycine) (AMD), using small-angle neutron scattering (SANS) with contrast matching in conjunction with X-ray scattering and cryogenic transmission electron microscopy (cryo-TEM) techniques. At room temperature, A100M101D10 triblock terpolypeptoids self-assemble into core-corona-type spherical micelles in aqueous solution. Upon heating above the critical gelation temperature (T_gel), SANS analysis revealed the formation of a two-compartment hydrogel network comprising distinct micellar cores composed of dehydrated A blocks and hydrophobic D blocks. At T ≳ T_gel, the temperature-dependent dehydration of the A block further leads to the gradual rearrangement of both A and D domains, forming a well-ordered micellar network at higher temperatures. For AMD polymers with either a longer D block or a shorter A block, such as A101M111D21 and A43M92D9, elongated nonspherical micelles with a crystalline D core were observed at T < T_gel. Although these enlarged crystalline micelles still undergo a sharp sol-to-gel transition upon heating, the higher aggregation number of chains results in the immediate association of the micelles into ordered aggregates at the initial stage, followed by a disruption of the spatial ordering as the temperature further increases. On the other hand, fiber-like structures were also observed for AMD with a longer A block, such as A153M127D10, due to the crystallization of A domains. This also influences the assembly pathway of the two-compartment network. Our findings emphasize the critical impact of initial micellar morphology on the structural evolution of AMD hydrogels during the sol-to-gel transition, providing valuable insights for the rational design of thermoresponsive hydrogels with tunable network structures at the nanometer scale.
- Published
- 2024