49,447 results on '"Liu, Yang"'
Search Results
2. GCA-3D: Towards Generalized and Consistent Domain Adaptation of 3D Generators
- Author
-
Li, Hengjia, Liu, Yang, Zhao, Yibo, Cheng, Haoran, Yang, Yang, Xia, Linxuan, Luo, Zekai, Qiu, Qibo, Wu, Boxi, Zheng, Tu, Yang, Zheng, and Cai, Deng
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recently, 3D generative domain adaptation has emerged to adapt the pre-trained generator to other domains without collecting massive datasets and camera pose distributions. Typically, they leverage large-scale pre-trained text-to-image diffusion models to synthesize images for the target domain and then fine-tune the 3D model. However, they suffer from the tedious pipeline of data generation, which inevitably introduces pose bias between the source domain and synthetic dataset. Furthermore, they are not generalized to support one-shot image-guided domain adaptation, which is more challenging due to the more severe pose bias and additional identity bias introduced by the single image reference. To address these issues, we propose GCA-3D, a generalized and consistent 3D domain adaptation method without the intricate pipeline of data generation. Different from previous pipeline methods, we introduce multi-modal depth-aware score distillation sampling loss to efficiently adapt 3D generative models in a non-adversarial manner. This multi-modal loss enables GCA-3D in both text prompt and one-shot image prompt adaptation. Besides, it leverages per-instance depth maps from the volume rendering module to mitigate the overfitting problem and retain the diversity of results. To enhance the pose and identity consistency, we further propose a hierarchical spatial consistency loss to align the spatial structure between the generated images in the source and target domain. Experiments demonstrate that GCA-3D outperforms previous methods in terms of efficiency, generalization, pose accuracy, and identity consistency.
- Published
- 2024
3. Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings
- Author
-
Zhang, Yuanhe, Zhou, Zhenhong, Zhang, Wei, Wang, Xinyue, Jia, Xiaojun, Liu, Yang, and Su, Sen
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
- Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks. LLMs continue to be vulnerable to external threats, particularly Denial-of-Service (DoS) attacks. Specifically, LLM-DoS attacks aim to exhaust computational resources and block services. However, prior works tend to focus on performing white-box attacks, overlooking black-box settings. In this work, we propose an automated algorithm designed for black-box LLMs, called Auto-Generation for LLM-DoS Attack (AutoDoS). AutoDoS introduces DoS Attack Tree and optimizes the prompt node coverage to enhance effectiveness under black-box conditions. Our method can bypass existing defense with enhanced stealthiness via semantic improvement of prompt nodes. Furthermore, we reveal that implanting Length Trojan in Basic DoS Prompt aids in achieving higher attack efficacy. Experimental results show that AutoDoS amplifies service response latency by over $250\times$, leading to severe resource consumption in terms of GPU utilization and memory usage. Our code is available at https://github.com/shuita2333/AutoDoS. Comment: 20 pages, 7 figures, 11 tables
- Published
- 2024
4. What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context
- Author
-
Chang, Zhiyuan, Li, Mingyang, Jia, Xiaojun, Wang, Junjie, Huang, Yuekai, Wang, Qing, Huang, Yihao, and Liu, Yang
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Incorporating external knowledge into large language models (LLMs) has emerged as a promising approach to mitigate outdated knowledge and hallucination in LLMs. However, external knowledge is often imperfect. In addition to useful knowledge, external knowledge is rich in irrelevant information or misinformation in the context that can impair the reliability of LLM responses. This paper focuses on LLMs' preferred external knowledge in imperfect contexts when handling multi-hop QA. Inspired by criminal procedural law's Chain of Evidence (CoE), we characterize that knowledge preferred by LLMs should maintain both relevance to the question and mutual support among knowledge pieces. Accordingly, we propose an automated CoE discrimination approach and explore LLMs' preferences from their effectiveness, faithfulness and robustness, as well as CoE's usability in a naive Retrieval-Augmented Generation (RAG) case. The evaluation on five LLMs reveals that CoE enhances LLMs through more accurate generation, stronger answer faithfulness, better robustness against knowledge conflict, and improved performance in a popular RAG case. Comment: 12 pages, 4 figures
- Published
- 2024
5. Neural-Network-Driven Reward Prediction as a Heuristic: Advancing Q-Learning for Mobile Robot Path Planning
- Author
-
Ji, Yiming, Yun, Kaijie, Liu, Yang, Xie, Zongwu, and Liu, Hong
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Q-learning is a widely used reinforcement learning technique for solving path planning problems. It primarily involves the interaction between an agent and its environment, enabling the agent to learn an optimal strategy that maximizes cumulative rewards. Although many studies have reported the effectiveness of Q-learning, it still faces slow convergence issues in practical applications. To address this issue, we propose the NDR-QL method, which utilizes neural network outputs as heuristic information to accelerate the convergence process of Q-learning. Specifically, we improved the dual-output neural network model by introducing a start-end channel separation mechanism and enhancing the feature fusion process. After training, the proposed NDR model can output a narrowly focused optimal probability distribution, referred to as the guideline, and a broadly distributed suboptimal distribution, referred to as the region. Subsequently, based on the guideline prediction, we calculate the continuous reward function for the Q-learning method, and based on the region prediction, we initialize the Q-table with a bias. We conducted training, validation, and path planning simulation experiments on public datasets. The results indicate that the NDR model outperforms previous methods by up to 5% in prediction accuracy. Furthermore, the proposed NDR-QL method improves the convergence speed of the baseline Q-learning method by 90% and also surpasses the previously improved Q-learning methods in path quality metrics.
- Published
- 2024
6. Asymmetric protocols for mode pairing quantum key distribution with finite-key analysis
- Author
-
Li, Zhenhua, Dou, Tianqi, Xie, Yuheng, Kong, Weiwen, Liu, Yang, Ma, Haiqiang, and Tang, Jianjun
- Subjects
Quantum Physics
- Abstract
The mode pairing quantum key distribution (MP-QKD) protocol has attracted considerable attention for its capability to ensure high secure key rates over long distances without requiring global phase locking. However, ensuring symmetric channels for the MP-QKD protocol is challenging in practical quantum communication networks. Previous studies on the asymmetric MP-QKD protocol have relied on ideal decoy state assumptions and infinite-key analysis, which are unattainable for real-world deployment. In this paper, we conduct a security analysis of the asymmetric MP-QKD protocol with finite-key analysis, where we discard the previously impractical assumptions made in the decoy-state method. Combined with statistical fluctuation analysis, we globally optimized the 12 independent parameters in the asymmetric MP-QKD protocol by employing our modified particle swarm optimization. The simulation results demonstrate that our work can achieve significantly enhanced secure key rates and transmission distances compared to the original strategy of adding extra attenuation. We further investigate the relationship between the intensities and probabilities of signal, decoy, and vacuum states with transmission distance, facilitating its more efficient deployment in future quantum networks. Comment: 9 pages, 6 figures
- Published
- 2024
7. Defending LVLMs Against Vision Attacks through Partial-Perception Supervision
- Author
-
Zhou, Qi, Li, Tianlin, Guo, Qing, Wang, Dongxia, Lin, Yun, Liu, Yang, and Dong, Jin Song
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
- Abstract
Recent studies have raised significant concerns regarding the vulnerability of Large Vision Language Models (LVLMs) to maliciously injected or perturbed input images, which can mislead their responses. Existing defense methods show that such vision attacks are sensitive to image modifications especially cropping, using majority voting across responses of modified images as corrected responses. However, these modifications often result in partial images and distort the semantics, which reduces response quality on clean images after voting. Instead of directly using responses from partial images for voting, we investigate using them to supervise the LVLM's responses to the original images. We propose a black-box, training-free method called DPS (Defense through Partial-Perception Supervision). In this approach, the model is prompted using the responses generated by a model that perceives only a partial image. With DPS, the model can adjust its response based on partial image understanding when under attack, while confidently maintaining its original response for clean input. Our findings show that the weak model can supervise the strong model: when faced with an attacked input, the strong model becomes less confident and adjusts its response based on the weak model's partial understanding, effectively defending against the attack. With clean input, it confidently maintains its original response. Empirical experiments show our method outperforms the baseline, cutting the average attack success rate by 76.3% across six datasets on three popular models.
- Published
- 2024
8. SpatialMe: Stereo Video Conversion Using Depth-Warping and Blend-Inpainting
- Author
-
Zhang, Jiale, Jia, Qianxi, Liu, Yang, Zhang, Wei, Wei, Wei, and Tian, Xin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Stereo video conversion aims to transform monocular videos into immersive stereo format. Despite the advancements in novel view synthesis, two major challenges remain: i) difficulty of achieving high-fidelity and stable results, and ii) insufficiency of high-quality stereo video data. In this paper, we introduce SpatialMe, a novel stereo video conversion framework based on depth-warping and blend-inpainting. Specifically, we propose a mask-based hierarchy feature update (MHFU) refiner, which integrates and refines the outputs from the designed multi-branch inpainting module, using a feature update unit (FUU) and a mask mechanism. We also propose a disparity expansion strategy to address the problem of foreground bleeding. Furthermore, we construct a high-quality real-world stereo video dataset, StereoV1K, to alleviate the data shortage. It contains 1000 stereo videos captured in the real world at a resolution of 1180 x 1180, covering various indoor and outdoor scenes. Extensive experiments demonstrate the superiority of our approach in generating stereo videos over state-of-the-art methods.
- Published
- 2024
9. Quantization of Climate Change Impacts on Renewable Energy Generation Capacity: A Super-Resolution Recurrent Diffusion Model
- Author
-
Dong, Xiaochong, Dan, Jun, Sun, Yingyun, Liu, Yang, Zhang, Xuemin, and Mei, Shengwei
- Subjects
Computer Science - Machine Learning, Electrical Engineering and Systems Science - Signal Processing
- Abstract
Driven by global climate change and the ongoing energy transition, the coupling between power supply capabilities and meteorological factors has become increasingly significant. Over the long term, accurately quantifying the power generation capacity of renewable energy under the influence of climate change is essential for the development of sustainable power systems. However, due to interdisciplinary differences in data requirements, climate data often lacks the necessary hourly resolution to capture the short-term variability and uncertainties of renewable energy resources. To address this limitation, a super-resolution recurrent diffusion model (SRDM) has been developed to enhance the temporal resolution of climate data and model the short-term uncertainty. The SRDM incorporates a pre-trained decoder and a denoising network, that generates long-term, high-resolution climate data through a recurrent coupling mechanism. The high-resolution climate data is then converted into power value using the mechanism model, enabling the simulation of wind and photovoltaic (PV) power generation capacity on future long-term scales. Case studies were conducted in the Ejina region of Inner Mongolia, China, using fifth-generation reanalysis (ERA5) and coupled model intercomparison project (CMIP6) data under two climate pathways: SSP126 and SSP585. The results demonstrate that the SRDM outperforms existing generative models in generating super-resolution climate data. For the Ejina region, under a high-emission pathway, the annual utilization hours of wind power are projected to decrease by 2.82 hours/year, while those for PV power are projected to decrease by 0.26 hours/year. Furthermore, the research highlights the estimation biases introduced when low-resolution climate data is used for power conversion.
- Published
- 2024
10. Exploring Enhanced Contextual Information for Video-Level Object Tracking
- Author
-
Kang, Ben, Chen, Xin, Lai, Simiao, Liu, Yang, Liu, Yi, and Wang, Dong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of the mamba layer and the cross-attention layer. The mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. This module enhances the model's ability to capture and utilize contextual information at multiple levels through deep integration with the backbone. Experiments demonstrate that MCITrack achieves competitive performance across numerous benchmarks. For instance, it gets 76.6% AUC on LaSOT and 80.0% AO on GOT-10k, establishing a new state-of-the-art performance. Code and models are available at https://github.com/kangben258/MCITrack. Comment: This paper was accepted by AAAI 2025
- Published
- 2024
11. MASV: Speaker Verification with Global and Local Context Mamba
- Author
-
Liu, Yang, Wan, Li, Huang, Yiteng, Sun, Ming, Shi, Yangyang, and Metze, Florian
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
Deep learning models like Convolutional Neural Networks and transformers have shown impressive capabilities in speech verification, gaining considerable attention in the research community. However, CNN-based approaches struggle with modeling long-sequence audio effectively, resulting in suboptimal verification performance. On the other hand, transformer-based methods are often hindered by high computational demands, limiting their practicality. This paper presents the MASV model, a novel architecture that integrates the Mamba module into the ECAPA-TDNN framework. By introducing the Local Context Bidirectional Mamba and Tri-Mamba block, the model effectively captures both global and local context within audio sequences. Experimental results demonstrate that the MASV model substantially enhances verification performance, surpassing existing models in both accuracy and efficiency.
- Published
- 2024
12. A Novel Low-Background Photomultiplier Tube Developed for Xenon Based Detectors
- Author
-
Yun, Youhui, Zhou, Zhizhen, An, Baoguo, Gao, Zhixing, Han, Ke, Liu, Jianglai, Liang, Yuanzi, Liu, Yang, Meng, Yue, Qian, Zhicheng, Shang, Xiaofeng, Si, Lin, Song, Ziyan, Wang, Hao, Wang, Mingxin, Wang, Shaobo, Wu, Liangyu, Wu, Weihao, Wu, Yuan, Yan, Binbin, Yan, Xiyu, Yuan, Zhe, Zhang, Tao, Zhao, Qiang, and Zeng, Xinning
- Subjects
Physics - Instrumentation and Detectors, High Energy Physics - Experiment
- Abstract
Photomultiplier tubes (PMTs) are essential in xenon detectors like PandaX, LZ, and XENON experiments for dark matter searches and neutrino properties measurement. To minimize PMT-induced backgrounds, stringent requirements on PMT radioactivity are crucial. A novel 2-inch low-background R12699 PMT has been developed through a collaboration between the PandaX team and Hamamatsu Photonics K.K. corporation. Radioactivity measurements conducted with a high-purity germanium detector show levels of approximately 0.08 mBq/PMT for $^{60}$Co and 0.06 mBq/PMT for the $^{238}$U late chain, achieving a 15-fold reduction compared to R11410 PMT used in PandaX-4T. The radon emanation rate is below 3.2 $\mu$Bq/PMT (at 90% confidence level), while the surface $^{210}$Po activity is less than 18.4 $\mu$Bq/cm$^2$. The electrical performance of these PMTs at cryogenic temperature was evaluated. With an optimized voltage distribution, the gain was enhanced by 30%, achieving an average gain of $4.23 \times 10^6$ at -1000 V and -100 $^{\circ}$C. The dark count rate averaged 2.5 Hz per channel. Compactness, low radioactivity, and robust electrical performance in the cryogenic temperature make the R12699 PMT ideal for next-generation liquid xenon detectors and other rare event searches.
- Published
- 2024
13. MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt
- Author
-
Wang, Yuhao, Liu, Xuehu, Yan, Tianyu, Liu, Yang, Zheng, Aihua, Zhang, Pingping, and Lu, Huchuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
- Abstract
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal object ReID tasks. However, they remain unexplored for multi-modal object ReID. Furthermore, current multi-modal aggregation methods have obvious limitations in dealing with long sequences from different modalities. To address the above issues, we introduce a novel framework called MambaPro for multi-modal object ReID. To be specific, we first employ a Parallel Feed-Forward Adapter (PFA) for adapting CLIP to multi-modal object ReID. Then, we propose the Synergistic Residual Prompt (SRP) to guide the joint learning of multi-modal features. Finally, leveraging Mamba's superior scalability for long sequences, we introduce Mamba Aggregation (MA) to efficiently model interactions between different modalities. As a result, MambaPro could extract more robust features with lower complexity. Extensive experiments on three multi-modal object ReID benchmarks (i.e., RGBNT201, RGBNT100 and MSVR310) validate the effectiveness of our proposed methods. The source code is available at https://github.com/924973292/MambaPro. Comment: This work is accepted by AAAI 2025. More modifications may be performed
- Published
- 2024
14. DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
- Author
-
Wang, Yuhao, Liu, Yang, Zheng, Aihua, and Zhang, Pingping
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by combining complementary information from multiple modalities. Existing multi-modal object ReID methods primarily focus on the fusion of heterogeneous features. However, they often overlook the dynamic quality changes in multi-modal imaging. In addition, the shared information between different modalities can weaken modality-specific information. To address these issues, we propose a novel feature learning framework called DeMo for multi-modal object ReID, which adaptively balances decoupled features using a mixture of experts. To be specific, we first deploy a Patch-Integrated Feature Extractor (PIFE) to extract multi-granularity and multi-modal features. Then, we introduce a Hierarchical Decoupling Module (HDM) to decouple multi-modal features into non-overlapping forms, preserving the modality uniqueness and increasing the feature diversity. Finally, we propose an Attention-Triggered Mixture of Experts (ATMoE), which replaces traditional gating with dynamic attention weights derived from decoupled features. With these modules, our DeMo can generate more robust multi-modal features. Extensive experiments on three multi-modal object ReID benchmarks fully verify the effectiveness of our methods. The source code is available at https://github.com/924973292/DeMo. Comment: This work is accepted by AAAI 2025. More modifications may be performed
- Published
- 2024
15. Synthetic multi-dimensional Aharonov-Bohm cages in Fock state lattices
- Author
-
Zhang, Jiajian, Huang, Wenhui, Chu, Ji, Qiu, Jiawei, Sun, Xuandong, Tao, Ziyu, Zhang, Jiawei, Zhang, Libo, Zhou, Yuxuan, Chen, Yuanzhen, Liu, Yang, Liu, Song, Zhong, Youpeng, Miao, Jian-Jian, Niu, Jingjing, and Yu, Dapeng
- Subjects
Quantum Physics
- Abstract
Fock-state lattices (FSLs), composed of photon number states with infinite Hilbert space, have emerged as a promising platform for simulating high-dimensional physics due to their potential to extend into arbitrarily high dimensions. Here, we demonstrate the construction of multi-dimensional FSLs using superconducting quantum circuits. By controlling artificial gauge fields within their internal structures, we investigate flux-induced extreme localization dynamics, such as Aharonov-Bohm caging, extending from 2D to 3D. We also explore the coherent interference of quantum superposition states, achieving extreme localization within specific subspaces assisted by quantum entanglement. Our findings pave the way for manipulating the behavior of a broad class of quantum states in higher-dimensional systems. Comment: 6+23 pages; 4+18 figures
- Published
- 2024
16. A Comprehensive Study on Dark Patterns
- Author
-
Li, Meng, Wang, Xiang, Nie, Liming, Li, Chenglin, Liu, Yang, Zhao, Yangyang, Xue, Lei, and Said, Kabir Sulaiman
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
As digital interfaces become increasingly prevalent, certain manipulative design elements have emerged that may harm user interests, raising associated ethical concerns and bringing dark patterns into focus as a significant research topic. Manipulative design strategies are widely used in user interfaces (UI) primarily to guide user behavior in ways that favor service providers, often at the cost of the users themselves. This paper addresses three main challenges in dark pattern research: inconsistencies and incompleteness in classification, limitations of detection tools, and insufficient comprehensiveness in existing datasets. In this study, we propose a comprehensive analytical framework, the Dark Pattern Analysis Framework (DPAF). Using this framework, we developed a taxonomy comprising 68 types of dark patterns, each annotated in detail to illustrate its impact on users, potential scenarios, and real-world examples, validated through industry surveys. Furthermore, we evaluated the effectiveness of current detection tools and assessed the completeness of available datasets. Our findings indicate that, among the 8 detection tools studied, only 31 types of dark patterns are identifiable, resulting in a coverage rate of just 45.5%. Similarly, our analysis of four datasets, encompassing 5,561 instances, reveals coverage of only 30 types of dark patterns, with an overall coverage rate of 44%. Based on the available datasets, we standardized classifications and merged datasets to form a unified image dataset and a unified text dataset. These results highlight significant room for improvement in the field of dark pattern detection. This research not only deepens our understanding of dark pattern classification and detection tools but also offers valuable insights for future research and practice in this domain. Comment: 29 pages
- Published
- 2024
17. Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
- Author
-
Song, Xinshuai, Chen, Weixing, Liu, Yang, Chen, Weikai, Li, Guanbin, and Lin, Liang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Existing Vision-Language Navigation (VLN) methods primarily focus on single-stage navigation, limiting their effectiveness in multi-stage and long-horizon tasks within complex and dynamic environments. To address these limitations, we propose a novel VLN task, named Long-Horizon Vision-Language Navigation (LH-VLN), which emphasizes long-term planning and decision consistency across consecutive subtasks. Furthermore, to support LH-VLN, we develop an automated data generation platform NavGen, which constructs datasets with complex task structures and improves data utility through a bidirectional, multi-granularity generation approach. To accurately evaluate complex tasks, we construct the Long-Horizon Planning and Reasoning in VLN (LHPR-VLN) benchmark consisting of 3,260 tasks with an average of 150 task steps, serving as the first dataset specifically designed for the long-horizon vision-language navigation task. Furthermore, we propose Independent Success Rate (ISR), Conditional Success Rate (CSR), and CSR weight by Ground Truth (CGT) metrics, to provide fine-grained assessments of task completion. To improve model adaptability in complex tasks, we propose a novel Multi-Granularity Dynamic Memory (MGDM) module that integrates short-term memory blurring with long-term memory retrieval to enable flexible navigation in dynamic environments. Our platform, benchmark and method supply LH-VLN with a robust data generation pipeline, comprehensive model evaluation dataset, reasonable metrics, and a novel VLN model, establishing a foundational framework for advancing LH-VLN. Comment: A novel Vision-Language Navigation task: Long-Horizon Vision-Language Navigation
- Published
- 2024
18. Bayesian optimized deep ensemble for uncertainty quantification of deep neural networks: a system safety case study on sodium fast reactor thermal stratification modeling
- Author
-
Abulawi, Zaid, Hu, Rui, Balaprakash, Prasanna, and Liu, Yang
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Accurate predictions and uncertainty quantification (UQ) are essential for decision-making in risk-sensitive fields such as system safety modeling. Deep ensembles (DEs) are efficient and scalable methods for UQ in Deep Neural Networks (DNNs); however, their performance is limited when constructed by simply retraining the same DNN multiple times with randomly sampled initializations. To overcome this limitation, we propose a novel method that combines Bayesian optimization (BO) with DE, referred to as BODE, to enhance both predictive accuracy and UQ. We apply BODE to a case study involving a Densely connected Convolutional Neural Network (DCNN) trained on computational fluid dynamics (CFD) data to predict eddy viscosity in sodium fast reactor thermal stratification modeling. Compared to a manually tuned baseline ensemble, BODE estimates total uncertainty approximately four times lower in a noise-free environment, primarily due to the baseline's overestimation of aleatoric uncertainty. Specifically, BODE estimates aleatoric uncertainty close to zero, while aleatoric uncertainty dominates the total uncertainty in the baseline ensemble. We also observe a reduction of more than 30% in epistemic uncertainty. When Gaussian noise with standard deviations of 5% and 10% is introduced into the data, BODE accurately fits the data and estimates uncertainty that aligns with the data noise. These results demonstrate that BODE effectively reduces uncertainty and enhances predictions in data-driven models, making it a flexible approach for various applications requiring accurate predictions and robust UQ.
- Published
- 2024
19. Diversity Drives Fairness: Ensemble of Higher Order Mutants for Intersectional Fairness of Machine Learning Software
- Author
-
Chen, Zhenpeng, Li, Xinyue, Zhang, Jie M., Sarro, Federica, and Liu, Yang
- Subjects
Computer Science - Machine Learning, Computer Science - Software Engineering
- Abstract
Intersectional fairness is a critical requirement for Machine Learning (ML) software, demanding fairness across subgroups defined by multiple protected attributes. This paper introduces FairHOME, a novel ensemble approach using higher order mutation of inputs to enhance intersectional fairness of ML software during the inference phase. Inspired by social science theories highlighting the benefits of diversity, FairHOME generates mutants representing diverse subgroups for each input instance, thus broadening the array of perspectives to foster a fairer decision-making process. Unlike conventional ensemble methods that combine predictions made by different models, FairHOME combines predictions for the original input and its mutants, all generated by the same ML model, to reach a final decision. Notably, FairHOME is even applicable to deployed ML software as it bypasses the need for training new models. We extensively evaluate FairHOME against seven state-of-the-art fairness improvement methods across 24 decision-making tasks using widely adopted metrics. FairHOME consistently outperforms existing methods across all metrics considered. On average, it enhances intersectional fairness by 47.5%, surpassing the currently best-performing method by 9.6 percentage points. Comment: Accepted by the 47th International Conference on Software Engineering (ICSE 2025). Please include ICSE in any citations
- Published
- 2024
20. Terabit-class coherent communications enabled by an integrated photonics erbium doped amplifier
- Author
-
Che, Di, Grillanda, Stefano, Liu, Yang, Qiu, Zheru, Ji, Xinru, Raybon, Gregory, Chen, Xi, Kim, Kwangwoong, Kippenberg, Tobias J., and Blanco-Redondo, Andrea
- Subjects
Physics - Optics
- Abstract
Coherent technologies have revolutionized optical communications, driving the capacity per fiber to multi-terabit per second (Tb/s) in combination with wavelength division multiplexing (WDM). With an ever-increasing deployment density of coherent systems, the demand for highly integrated WDM coherent transceivers has been rising. While tremendous progress has been made on silicon photonics compatible high-speed modulation and photodetection on chip, a solution for monolithically integrable amplifier with high gain and output power remains a challenge. Recently, an erbium doped waveguide amplifier based on ultra-low loss silicon nitride waveguides has demonstrated gain and output power levels potentially suitable for Terabit class coherent communications. Here, we demonstrate a WDM coherent system enabled by this integrated photonic amplification solution. The system uses the waveguide amplifier as a booster amplifier of 16 WDM signals each carrying a net data rate of 1.6 Tb/s, achieving 25.6-Tb/s net capacity over 81-km fiber transmission. Our results highlight a fully integrated solution for highly parallel coherent transceivers including amplification, that has the potential to transform future optical communications.
- Published
- 2024
21. MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents
- Author
-
Xing, Yun, Chung, Nhat, Zhang, Jie, Cao, Yue, Tsang, Ivor, Liu, Yang, Ma, Lei, and Guo, Qing
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Physical adversarial attacks in driving scenarios can expose critical vulnerabilities in visual perception models. However, developing such attacks remains challenging due to diverse real-world backgrounds and the requirement for maintaining visual naturalness. To address this challenge, we reformulate physical adversarial attacks as a one-shot patch-generation problem. Our approach generates adversarial patches through a deep generative model that considers the specific scene context, enabling direct physical deployment in matching environments. The primary challenge lies in simultaneously achieving two objectives: generating adversarial patches that effectively mislead object detection systems while determining contextually appropriate placement within the scene. We propose MAGIC (Mastering Physical Adversarial Generation In Context), a novel framework powered by multi-modal LLM agents to address these challenges. MAGIC automatically understands scene context and orchestrates adversarial patch generation through the synergistic interaction of language and vision capabilities. MAGIC orchestrates three specialized LLM agents: The adv-patch generation agent (GAgent) masters the creation of deceptive patches through strategic prompt engineering for text-to-image models. The adv-patch deployment agent (DAgent) ensures contextual coherence by determining optimal placement strategies based on scene understanding. The self-examination agent (EAgent) completes this trilogy by providing critical oversight and iterative refinement of both processes. We validate our method at both the digital and physical levels, i.e., on nuImage and manually captured real scenes, where both statistical and visual results show that MAGIC is powerful and effective for attacking widely used object detection systems.
- Published
- 2024
22. Timely reliable Bayesian decision-making enabled using memristors
- Author
-
Song, Lekai, Liu, Pengyu, Liu, Yang, Pei, Jingfang, Cui, Wenyu, Liu, Songwei, Wen, Yingyi, Ma, Teng, Pun, Kong-Pang, and Hu, Guohua
- Subjects
Computer Science - Machine Learning ,Computer Science - Hardware Architecture - Abstract
Brains perform timely, reliable decision-making by Bayes' theorem. Bayes' theorem quantifies events as probabilities and, through probability rules, renders the decisions. Learning from this, applying Bayes' theorem in practical problems can visualize the potential risks and decision confidence, thereby enabling efficient user-scene interactions. However, given its probabilistic nature, implementing Bayes' theorem with conventional deterministic computing can induce excessive computational cost and decision latency. Herein, we propose a probabilistic computing approach using memristors to implement Bayes' theorem. We integrate volatile memristors with Boolean logic and, by exploiting the volatile stochastic switching of the memristors, realize Boolean operations with statistical probabilities and correlations, key for enabling Bayes' theorem. To practically demonstrate the effectiveness of our memristor-enabled Bayes' theorem approach in user-scene interactions, we design lightweight Bayesian inference and fusion operators using our probabilistic logic and apply the operators in road scene parsing for self-driving, including route planning and obstacle detection. The results show that our operators can achieve reliable decisions at a rate over 2,500 frames per second, outperforming human decision-making and the existing driving assistance systems.
- Published
- 2024
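The memristor abstract above relies on Boolean operations over stochastic bits to evaluate Bayes' theorem, $P(A \mid B) = P(A \wedge B) / P(B)$. The Python sketch below imitates that idea in software. A seeded pseudo-random generator stands in for the stochastic switching of the devices, and all function names and probability values are illustrative assumptions rather than the authors' circuit.

```python
import random


def sample_streams(p_a, p_b_given_a, p_b_given_not_a, n, seed=0):
    """Draw n correlated (A, B) bit pairs; software stand-in for stochastic devices."""
    rng = random.Random(seed)
    a_bits, b_bits = [], []
    for _ in range(n):
        a = rng.random() < p_a
        b = rng.random() < (p_b_given_a if a else p_b_given_not_a)
        a_bits.append(a)
        b_bits.append(b)
    return a_bits, b_bits


def posterior_from_streams(a_bits, b_bits):
    """Estimate P(A|B): count the AND of the two streams, divide by the count of B."""
    joint = sum(1 for a, b in zip(a_bits, b_bits) if a and b)  # bitwise AND gate
    evidence = sum(b_bits)
    if evidence == 0:
        raise ValueError("event B never occurred; posterior is undefined")
    return joint / evidence


def posterior_exact(p_a, p_b_given_a, p_b_given_not_a):
    """Closed-form Bayes' theorem, used as a reference for the estimate."""
    joint = p_a * p_b_given_a
    return joint / (joint + (1.0 - p_a) * p_b_given_not_a)


a_stream, b_stream = sample_streams(0.3, 0.8, 0.1, n=200_000)
estimate = posterior_from_streams(a_stream, b_stream)
reference = posterior_exact(0.3, 0.8, 0.1)
```

The estimate approaches the closed-form value as the streams get longer, which is the trade-off between accuracy and latency that any bitstream-based scheme has to manage.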
23. InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction
- Author
-
Ren, Pengzhen, Li, Min, Luo, Zhen, Song, Xinshuai, Chen, Ziwei, Liufu, Weijia, Yang, Yixuan, Zheng, Hao, Xu, Rongtao, Huang, Zitong, Ding, Tongsheng, Xie, Luyang, Zhang, Kaidong, Fu, Changfei, Liu, Yang, Lin, Liang, Zheng, Feng, and Liang, Xiaodan
- Subjects
Computer Science - Robotics - Abstract
Realizing scaling laws in embodied AI has become a focus. However, previous work has been scattered across diverse simulation platforms, with assets and models lacking unified interfaces, which has led to inefficiencies in research. To address this, we introduce InfiniteWorld, a unified and scalable simulator for general vision-language robot interaction built on Nvidia Isaac Sim. InfiniteWorld encompasses a comprehensive set of physics asset construction methods and generalized free robot interaction benchmarks. Specifically, we first built a unified and scalable simulation framework for embodied learning that integrates a series of improvements in generation-driven 3D asset construction, Real2Sim, automated annotation framework, and unified 3D asset processing. This framework provides a unified and scalable platform for robot interaction and learning. In addition, to simulate realistic robot interaction, we build four new general benchmarks, including scene graph collaborative exploration and open-world social mobile manipulation. The former is often overlooked as an important task for robots to explore the environment and build scene knowledge, while the latter simulates robot interaction tasks with different levels of knowledge agents based on the former. They can more comprehensively evaluate the embodied agent's capabilities in environmental understanding, task planning and execution, and intelligent interaction. We hope that this work can provide the community with a systematic asset interface, alleviate the dilemma of the lack of high-quality assets, and provide a more comprehensive evaluation of robot interactions. Comment: 8 pages, 5 figures
- Published
- 2024
24. Electrically functionalized body surface for deep-tissue bioelectrical recording
- Author
-
Zhang, Dehui, Zhang, Yucheng, Xu, Dong, Wang, Shaolei, Wang, Kaidong, Zhou, Boxuan, Ling, Yansong, Liu, Yang, Cui, Qingyu, Yin, Junyi, Zhu, Enbo, Zhao, Xun, Wan, Chengzhang, Chen, Jun, Hsiai, Tzung K., Huang, Yu, and Duan, Xiangfeng
- Subjects
Physics - Medical Physics ,Electrical Engineering and Systems Science - Signal Processing ,Physics - Biological Physics - Abstract
Directly probing deep tissue activities from body surfaces offers a noninvasive approach to monitoring essential physiological processes. However, this method is technically challenged by rapid signal attenuation toward the body surface and confounding motion artifacts, primarily due to excessive contact impedance and mechanical mismatch with conventional electrodes. Herein, by formulating and directly spray coating biocompatible two-dimensional nanosheet ink onto the human body under ambient conditions, we create microscopically conformal and adaptive van der Waals thin films (VDWTFs) that seamlessly merge with non-Euclidean, hairy, and dynamically evolving body surfaces. Unlike traditional deposition methods, which often struggle with conformality and adaptability while retaining high electronic performance, this gentle process enables the formation of high-performance VDWTFs directly on the body surface under bio-friendly conditions, making it ideal for biological applications. This results in low-impedance electrically functionalized body surfaces (EFBS), enabling highly robust monitoring of biopotential and bioimpedance modulations associated with deep-tissue activities, such as blood circulation, muscle movements, and brain activities. Compared to commercial solutions, our VDWTF-EFBS exhibits nearly two orders of magnitude lower contact impedance and substantially reduces the extrinsic motion artifacts, enabling reliable extraction of bioelectrical signals from irregular surfaces, such as unshaved human scalps. This advancement defines a technology for continuous, noninvasive monitoring of deep-tissue activities during routine body movements.
- Published
- 2024
25. Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
- Author
-
Peng, Wujian, Meng, Lingchen, Chen, Yitong, Xie, Yiweng, Liu, Yang, Gui, Tao, Xu, Hang, Qiu, Xipeng, Wu, Zuxuan, and Jiang, Yu-Gang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Large Multimodal Models (LMMs) have made significant breakthroughs with the advancement of instruction tuning. However, while existing models can understand images and videos at a holistic level, they still struggle with instance-level understanding that requires a more nuanced comprehension and alignment. Instance-level understanding is crucial, as it focuses on the specific elements that we are most interested in. Excitingly, existing works find that the state-of-the-art LMMs exhibit strong instance understanding capabilities when provided with explicit visual cues. Motivated by this, we introduce an automated annotation pipeline assisted by GPT-4o to extract instance-level information from images and videos through explicit visual prompting for instance guidance. Building upon this pipeline, we propose Inst-IT, a solution to enhance LMMs in Instance understanding via explicit visual prompt Instruction Tuning. Inst-IT consists of a benchmark to diagnose multimodal instance-level understanding, a large-scale instruction-tuning dataset, and a continuous instruction-tuning training paradigm to effectively enhance spatial-temporal instance understanding capabilities of existing LMMs. Experimental results show that, with the boost of Inst-IT, our models not only achieve outstanding performance on Inst-IT Bench but also demonstrate significant improvements across various generic image and video understanding benchmarks. This highlights that our dataset not only boosts instance-level understanding but also strengthens the overall capabilities of generic image and video comprehension. Comment: Project page at https://inst-it.github.io
- Published
- 2024
26. Simulation of dark scalar particle sensitivity in $\eta$ rare decay channels at HIAF
- Author
-
Liu, Yang, Wang, Rong, Mushtaq, Zaiba, Tian, Ye, He, Xionghong, Qiu, Hao, and Chen, Xurong
- Subjects
High Energy Physics - Phenomenology - Abstract
Searching for dark portal particles is a hot topic at the frontier of particle physics. We present a simulation study of an experiment designed to search for the scalar portal particle at the Huizhou $\eta$ factory. The HIAF high-intensity proton beam and a high event-rate spectrometer are suggested for the experiment, which is aimed at the discovery of new physics. Under a conservative estimate, $5.9\times 10^{11}$ $\eta$ events could be produced in one month of running the experiment. The hadronic production of the $\eta$ meson ($p + ^7\text{Li} \rightarrow \eta X$) is simulated at a beam energy of 1.8 GeV using the GiBUU event generator. We aim to search for the light dark scalar particle in the rare decay channels $\eta \rightarrow S \pi^0 \rightarrow \pi^+ \pi^- \pi^0$ and $\eta \rightarrow S \pi^0 \rightarrow e^+ e^- \pi^0$. The detection efficiencies of the channels and the spectrometer resolutions are studied in the simulation. We also present the projected upper limits of the decay branching ratios of the dark scalar particle and the projected sensitivities to the model parameters. Comment: 11 pages, 19 figures
- Published
- 2024
27. Robust Multi-bit Text Watermark with LLM-based Paraphrasers
- Author
-
Xu, Xiaojun, Jia, Jinghan, Yao, Yuanshun, Liu, Yang, and Li, Hang
- Subjects
Computer Science - Artificial Intelligence - Abstract
We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently so that their paraphrasing difference reflected in the text semantics can be identified by a trained decoder. To embed our multi-bit watermark, we use the two paraphrasers alternately to encode the pre-defined binary code at the sentence level. Then we use a text classifier as the decoder to decode each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while keeping the semantic information of the original sentence. More importantly, our pipeline is robust under word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distribution data. We also show the stealthiness of our watermark with LLM-based evaluation. We open-source the code: https://github.com/xiaojunxu/multi-bit-text-watermark.
- Published
- 2024
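The watermarking abstract above encodes one bit per sentence by choosing which of two paraphrasers rewrites it, then recovers each bit with a classifier. The Python sketch below shows only that control flow. The two toy paraphrasers and the prefix-based classifier are hypothetical stand-ins for the fine-tuned LLMs and the trained decoder, and none of the names come from the released code.

```python
from typing import Callable, List, Sequence

Paraphraser = Callable[[str], str]


def embed_bits(sentences: Sequence[str], bits: Sequence[int],
               paraphrasers: Sequence[Paraphraser]) -> List[str]:
    """Rewrite sentence i with paraphraser bits[i]; extra sentences stay unchanged."""
    if len(bits) > len(sentences):
        raise ValueError("need at least one sentence per watermark bit")
    out = []
    for i, sentence in enumerate(sentences):
        if i < len(bits):
            out.append(paraphrasers[bits[i]](sentence))
        else:
            out.append(sentence)
    return out


def decode_bits(sentences: Sequence[str], classify: Callable[[str], int],
                n_bits: int) -> List[int]:
    """Classify each of the first n_bits sentences to recover the code."""
    return [classify(s) for s in sentences[:n_bits]]


# Toy stand-ins (assumptions): real paraphrasers would be fine-tuned LLMs.
def toy_paraphraser_0(sentence: str) -> str:
    return "Indeed, " + sentence


def toy_paraphraser_1(sentence: str) -> str:
    return "In fact, " + sentence


def toy_classifier(sentence: str) -> int:
    return 1 if sentence.startswith("In fact, ") else 0


text = ["the model converged.", "the loss was stable.", "we stopped early."]
code = [1, 0, 1]
marked = embed_bits(text, code, [toy_paraphraser_0, toy_paraphraser_1])
recovered = decode_bits(marked, toy_classifier, len(code))
```

With real models the classifier would be imperfect, so a practical scheme would likely add redundancy or error correction over the bit sequence. That extension is not shown here.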
28. BDefects4NN: A Backdoor Defect Database for Controlled Localization Studies in Neural Networks
- Author
-
Xiao, Yisong, Liu, Aishan, Zhang, Xinwei, Zhang, Tianyuan, Li, Tianlin, Liang, Siyuan, Liu, Xianglong, Liu, Yang, and Tao, Dacheng
- Subjects
Computer Science - Software Engineering - Abstract
Pre-trained large deep learning models are now serving as the dominant component for downstream middleware users and have revolutionized the learning paradigm, replacing the traditional approach of training from scratch locally. To reduce development costs, developers often integrate third-party pre-trained deep neural networks (DNNs) into their intelligent software systems. However, utilizing untrusted DNNs presents significant security risks, as these models may contain intentional backdoor defects resulting from the black-box training process. These backdoor defects can be activated by hidden triggers, allowing attackers to maliciously control the model and compromise the overall reliability of the intelligent software. To ensure the safe adoption of DNNs in critical software systems, it is crucial to establish a backdoor defect database for localization studies. This paper addresses this research gap by introducing BDefects4NN, the first backdoor defect database, which provides labeled backdoor-defected DNNs at the neuron granularity and enables controlled localization studies of defect root causes. In BDefects4NN, we define three defect injection rules and employ four representative backdoor attacks across four popular network architectures and three widely adopted datasets, yielding a comprehensive database of 1,654 backdoor-defected DNNs with four defect quantities and varying infected neurons. Based on BDefects4NN, we conduct extensive experiments on evaluating six fault localization criteria and two defect repair techniques, which show limited effectiveness for backdoor defects. Additionally, we investigate backdoor-defected models in practical scenarios, specifically in lane detection for autonomous driving and large language models (LLMs), revealing potential threats and highlighting current limitations in precise defect localization. Comment: 11 pages, accepted by ICSE 2025
- Published
- 2024
29. Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning
- Author
-
Liu, Yang, Wang, Xinshuo, Du, Jiale, Gao, Xinbo, and Han, Jungong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Compositional Zero-Shot Learning (CZSL) recognizes new combinations by learning from known attribute-object pairs. However, the main challenge of this task lies in the complex interactions between attributes and object visual representations, which lead to significant differences in images. In addition, the long-tail label distribution in the real world makes the recognition task more complicated. To address these problems, we propose a novel method, named Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network. To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module. ADDS generates new samples with diverse attribute labels by combining multiple attributes of the same object. By expanding the attribute space in the dataset, the model is encouraged to learn and distinguish subtle differences between attributes. To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module, which enhances the subclass discriminative ability of the encoding by embedding subclass information in a fine-grained manner, helping to capture the complex dependencies between attributes and object visual features. The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
- Published
- 2024
30. Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval
- Author
-
Liu, Yang, Du, Jiale, Gao, Xinbo, and Han, Jungong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Sketch-based image retrieval (SBIR) relies on free-hand sketches to retrieve natural photos within the same class. However, its practical application is limited by its inability to retrieve classes absent from the training set. To address this limitation, the task has evolved into Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR), where model performance is evaluated on unseen categories. Traditional SBIR primarily focuses on narrowing the domain gap between photo and sketch modalities. However, in the zero-shot setting, the model not only needs to address this cross-modal discrepancy but also requires a strong generalization capability to transfer knowledge to unseen categories. To this end, we propose a novel framework for ZS-SBIR that employs a pair-based relation-aware quadruplet loss to bridge feature gaps. By incorporating two negative samples from different modalities, the approach prevents positive features from becoming disproportionately distant from one modality while remaining close to another, thus enhancing inter-class separability. We also propose a Relation-Aware Meta-Learning Network (RAMLN) to obtain the margin, a hyper-parameter of cross-modal quadruplet loss, to improve the generalization ability of the model. RAMLN leverages external memory to store feature information, which it utilizes to assign optimal margin values. Experimental results obtained on the extended Sketchy and TU-Berlin datasets show a sharp improvement over existing state-of-the-art methods in ZS-SBIR.
- Published
- 2024
31. SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
- Author
-
Cao, Yue, Xing, Yun, Zhang, Jie, Lin, Di, Zhang, Tianwei, Tsang, Ivor, Liu, Yang, and Guo, Qing
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Large vision-language models (LVLMs) have shown remarkable capabilities in interpreting visual content. While existing works demonstrate these models' vulnerability to deliberately placed adversarial texts, such texts are often easily identifiable as anomalous. In this paper, we present the first approach to generate scene-coherent typographic adversarial attacks that mislead advanced LVLMs while maintaining visual naturalness through the capability of the LLM-based agent. Our approach addresses three critical questions: what adversarial text to generate, where to place it within the scene, and how to integrate it seamlessly. We propose a training-free, multi-modal LLM-driven scene-coherent typographic adversarial planning (SceneTAP) that employs a three-stage process: scene understanding, adversarial planning, and seamless integration. The SceneTAP utilizes chain-of-thought reasoning to comprehend the scene, formulate effective adversarial text, strategically plan its placement, and provide detailed instructions for natural integration within the image. This is followed by a scene-coherent TextDiffuser that executes the attack using a local diffusion mechanism. We extend our method to real-world scenarios by printing and placing generated patches in physical environments, demonstrating its practical implications. Extensive experiments show that our scene-coherent adversarial text successfully misleads state-of-the-art LVLMs, including ChatGPT-4o, even after capturing new images of physical setups. Our evaluations demonstrate a significant increase in attack success rates while maintaining visual naturalness and contextual appropriateness. This work highlights vulnerabilities in current vision-language models to sophisticated, scene-coherent adversarial attacks and provides insights into potential defense mechanisms.
- Published
- 2024
32. Know Your Account: Double Graph Inference-based Account De-anonymization on Ethereum
- Author
-
Miao, Shuyi, Qiu, Wangjie, Zheng, Hongwei, Zhang, Qinnan, Tu, Xiaofan, Liu, Xunan, Liu, Yang, Dong, Jin, and Zheng, Zhiming
- Subjects
Computer Science - Social and Information Networks - Abstract
The scaled Web 3.0 digital economy, represented by decentralized finance (DeFi), has sparked increasing interest in the past few years, which usually relies on blockchain for token transfer and diverse transaction logic. However, illegal behaviors, such as financial fraud, hacker attacks, and money laundering, are rampant in the blockchain ecosystem and seriously threaten its integrity and security. In this paper, we propose a novel double graph-based Ethereum account de-anonymization inference method, dubbed DBG4ETH, which aims to capture the behavioral patterns of accounts comprehensively and has more robust analytical and judgment capabilities for current complex and continuously generated transaction behaviors. Specifically, we first construct a global static graph to build complex interactions between the various account nodes for all transaction data. Then, we also construct a local dynamic graph to learn about the gradual evolution of transactions over different periods. Different graphs focus on information from different perspectives, and features of global and local, static and dynamic transaction graphs are available through DBG4ETH. In addition, we propose an adaptive confidence calibration method to predict the results by feeding the calibrated weighted prediction values into the classifier. Experimental results show that DBG4ETH achieves state-of-the-art results in the account identification task, improving the F1-score by at least 3.75% and up to 40.52% compared to processing each graph type individually and outperforming similar account identity inference methods by 5.23% to 12.91%.
- Published
- 2024
33. Weakly Supervised Framework Considering Multi-temporal Information for Large-scale Cropland Mapping with Satellite Imagery
- Author
-
Wang, Yuze, Hu, Aoran, Qi, Ji, Liu, Yang, and Tao, Chao
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Accurately mapping large-scale cropland is crucial for agricultural production management and planning. Currently, the combination of remote sensing data and deep learning techniques has shown outstanding performance in cropland mapping. However, those approaches require massive precise labels, which are labor-intensive. To reduce the label cost, this study presented a weakly supervised framework considering multi-temporal information for large-scale cropland mapping. Specifically, we extract high-quality labels according to their consistency among global land cover (GLC) products to construct the supervised learning signal. On the one hand, to alleviate the overfitting problem caused by the model's over-trust of remaining errors in high-quality labels, we encode the similarity/aggregation of cropland in the visual/spatial domain to construct the unsupervised learning signal, and take it as the regularization term to constrain the supervised part. On the other hand, to sufficiently leverage the plentiful information in the samples without high-quality labels, we also incorporate the unsupervised learning signal in these samples, enriching the diversity of the feature space. After that, to capture the phenological features of croplands, we introduce dense satellite image time series (SITS) to extend the proposed framework in the temporal dimension. We also visualized the high dimensional phenological features to uncover how multi-temporal information benefits cropland extraction, and assessed the method's robustness under conditions of data scarcity. The proposed framework has been experimentally validated for strong adaptability across three study areas (Hunan Province, Southeast France, and Kansas) in large-scale cropland mapping, and the internal mechanism and temporal generalizability are also investigated.
- Published
- 2024
34. Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT
- Author
-
Wang, Siqi, Liang, Chao, Gao, Yunfan, Liu, Yang, Li, Jing, and Wang, Haofen
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computers and Society ,Computer Science - Social and Information Networks ,I.2.0 ,I.2.7 ,H.3.3 ,H.4.0 - Abstract
Industrial parks are critical to urban economic growth. Yet, their development often encounters challenges stemming from imbalances between industrial requirements and urban services, underscoring the need for strategic planning and operations. This paper introduces IndustryScopeKG, a pioneering large-scale multi-modal, multi-level industrial park knowledge graph, which integrates diverse urban data including street views, corporate, socio-economic, and geospatial information, capturing the complex relationships and semantics within industrial parks. Alongside this, we present the IndustryScopeGPT framework, which leverages Large Language Models (LLMs) with Monte Carlo Tree Search to enhance tool-augmented reasoning and decision-making in Industrial Park Planning and Operation (IPPO). Our work significantly improves site recommendation and functional planning, demonstrating the potential of combining LLMs with structured datasets to advance industrial park management. This approach sets a new benchmark for intelligent IPPO research and lays a robust foundation for advancing urban industrial development. The dataset and related code are available at https://github.com/Tongji-KGLLM/IndustryScope. Comment: 9 pages, 6 figures, the 32nd ACM International Conference on Multimedia
- Published
- 2024
35. Interactive Visual Assessment for Text-to-Image Generation Models
- Author
-
Mi, Xiaoyue, Tang, Fan, Cao, Juan, Sheng, Qiang, Huang, Ziyao, Li, Peng, Liu, Yang, and Lee, Tong-Yee
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Visual generation models have achieved remarkable progress in computer graphics applications but still face significant challenges in real-world deployment. Current assessment approaches for visual generation tasks typically follow an isolated three-phase framework: test input collection, model output generation, and user assessment. These approaches suffer from fixed coverage, evolving difficulty, and data leakage risks, limiting their effectiveness in comprehensively evaluating increasingly complex generation models. To address these limitations, we propose DyEval, an LLM-powered dynamic interactive visual assessment framework that facilitates collaborative evaluation between humans and generative models for text-to-image systems. DyEval features an intuitive visual interface that enables users to interactively explore and analyze model behaviors, while adaptively generating hierarchical, fine-grained, and diverse textual inputs to continuously probe the capability boundaries of the models based on their feedback. Additionally, to provide interpretable analysis for users to further improve tested models, we develop a contextual reflection module that mines failure triggers of test inputs and reflects the model's potential failure patterns, supporting in-depth analysis using the logical reasoning ability of LLMs. Qualitative and quantitative experiments demonstrate that DyEval can effectively help users identify up to 2.56 times as many generation failures as conventional methods, and uncover complex and rare failure patterns, such as issues with pronoun generation and specific cultural context generation. Our framework provides valuable insights for improving generative models and has broad implications for advancing the reliability and capabilities of visual generation systems across various domains. Comment: Under Review
- Published
- 2024
36. On Approximability of Satisfiable $k$-CSPs: VI
- Author
-
Bhangale, Amey, Khot, Subhash, Liu, Yang P., and Minzer, Dor
- Subjects
Computer Science - Computational Complexity ,Mathematics - Combinatorics - Abstract
We prove local and global inverse theorems for general $3$-wise correlations over pairwise-connected distributions. Let $\mu$ be a distribution over $\Sigma \times \Gamma \times \Phi$ such that the supports of $\mu_{xy}$, $\mu_{xz}$, and $\mu_{yz}$ are all connected, and let $f: \Sigma^n \to \mathbb{C}$, $g: \Gamma^n \to \mathbb{C}$, $h: \Phi^n \to \mathbb{C}$ be $1$-bounded functions satisfying \[ \left|\mathbb{E}_{(x,y,z) \sim \mu^{\otimes n}}[f(x)g(y)h(z)]\right| \geq \varepsilon. \] In this setting, our local inverse theorem asserts that there is $\delta :=\textsf{exp}(-\varepsilon^{-O_{\mu}(1)})$ such that with probability at least $\delta$, a random restriction of $f$ down to $\delta n$ coordinates $\delta$-correlates to a product function. To get a global inverse theorem, we prove a restriction inverse theorem for general product functions, stating that if a random restriction of $f$ down to $\delta n$ coordinates is $\delta$-correlated with a product function with probability at least $\delta$, then $f$ is $2^{-\textsf{poly}(\log(1/\delta))}$-correlated with a function of the form $L\cdot P$, where $L$ is a function of degree $\textsf{poly}(1/\delta)$, $\|L\|_2\leq 1$, and $P$ is a product function. We show applications to property testing and to additive combinatorics. In particular, we show the following result via a density increment argument. Let $\Sigma$ be a finite set and $S \subseteq \Sigma \times \Sigma \times \Sigma$ such that: (1) $(x, x, x) \in S$ for all $x \in \Sigma$, and (2) the supports of $S_{xy}$, $S_{xz}$, and $S_{yz}$ are all connected. Then, any set $A \subseteq \Sigma^n$ with $|\Sigma|^{-n}|A| \geq \Omega((\log \log \log n)^{-c})$ contains $x, y, z \in A$, not all equal, such that $(x_i,y_i,z_i) \in S$ for all $i$. This gives the first reasonable bounds for the restricted 3-AP problem over finite fields.
- Published
- 2024
37. Reasonable Bounds for Combinatorial Lines of Length Three
- Author
-
Bhangale, Amey, Khot, Subhash, Liu, Yang P., and Minzer, Dor
- Subjects
Mathematics - Combinatorics ,Computer Science - Computational Complexity - Abstract
We prove that any subset $A \subseteq [3]^n$ with $3^{-n}|A| \ge (\log\log\log\log n)^{-c}$ contains a combinatorial line of length $3$, i.e., $x, y, z \in A$, not all equal, with $x_i=y_i=z_i$ or $(x_i,y_i,z_i)=(0,1,2)$ for all $i = 1, 2, \dots, n$. This improves on the previous best bound of $3^{-n}|A| \ge \Omega((\log^* n)^{-1/2})$ of [D.H.J. Polymath, Ann. of Math. 2012].
- Published
- 2024
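The abstract above defines a combinatorial line of length $3$ coordinate by coordinate: each position is either constant across the three points or reads $(0,1,2)$. A small Python checker makes the definition concrete. The function name and the tuple encoding of points in $[3]^n$ with symbols 0, 1, 2 are illustrative choices, not part of the paper.

```python
def is_combinatorial_line(x, y, z):
    """Return True if (x, y, z) is a combinatorial line of length 3.

    Points are equal-length sequences over {0, 1, 2}. Every coordinate must
    be constant (x_i == y_i == z_i) or equal to (0, 1, 2), and the three
    points must not all coincide, so at least one coordinate is (0, 1, 2).
    """
    if not (len(x) == len(y) == len(z)):
        raise ValueError("points must have the same number of coordinates")
    if tuple(x) == tuple(y) == tuple(z):
        return False  # a degenerate triple is not a line
    return all(
        a == b == c or (a, b, c) == (0, 1, 2)
        for a, b, c in zip(x, y, z)
    )


# Coordinates 1 and 3 vary together as (0, 1, 2); coordinate 2 is fixed at 2.
line = ((0, 2, 0), (1, 2, 1), (2, 2, 2))
found = is_combinatorial_line(*line)
```

The theorem says that every sufficiently dense subset $A \subseteq [3]^n$ contains three points passing this check. The checker only verifies a given triple and does not search for one.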
38. Independent Optical Frequency Combs Powered 546 km Field Test of Twin-Field Quantum Key Distribution
- Author
-
Zhou, Lai, Lin, Jinping, Ge, Chengfang, Fan, Yuanbin, Yuan, Zhiliang, Dong, Hao, Liu, Yang, Ma, Di, Chen, Jiu-Peng, Jiang, Cong, Wang, Xiang-Bin, You, Li-Xing, Zhang, Qiang, and Pan, Jian-Wei
- Subjects
Quantum Physics ,Physics - Applied Physics ,Physics - Optics - Abstract
Owing to its repeater-like rate-loss scaling, twin-field quantum key distribution (TF-QKD) has repeatedly exhibited in the laboratory its superiority for secure communication over record fiber lengths. Field trials pose a new set of challenges, however, which must be addressed before the technology's roll-out into the real world. Here, we verify in the field the viability of using independent optical frequency combs -- installed at sites separated by a straight-line distance of 300 km -- to achieve a versatile TF-QKD setup that has no need for optical frequency dissemination and thus enables an open and network-friendly fiber configuration. Over 546 and 603 km symmetric links, we record a finite-size secure key rate (SKR) of 0.53 bit/s and an asymptotic SKR of 0.12 bit/s, respectively. Of practical importance, the setup is demonstrated to support 44 km fiber asymmetry in the 452 km link. Our work marks an important step towards incorporation of long-haul fiber links into large quantum networks. Comment: To appear in Physical Review Applied
- Published
- 2024
39. Privacy-Preserving Video Anomaly Detection: A Survey
- Author
-
Liu, Jing, Liu, Yang, and Zhu, Xiaoguang
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Cryptography and Security ,Computer Science - Machine Learning - Abstract
Video Anomaly Detection (VAD) aims to automatically analyze spatiotemporal patterns in surveillance videos collected from open spaces to detect anomalous events that may cause harm without physical contact. However, vision-based surveillance systems such as closed-circuit television often capture personally identifiable information. The lack of transparency and interpretability in video transmission and usage raises public concerns about privacy and ethics, limiting the real-world application of VAD. Recently, researchers have focused on privacy concerns in VAD by conducting systematic studies from various perspectives including data, features, and systems, making Privacy-Preserving Video Anomaly Detection (P2VAD) a hotspot in the AI community. However, current research in P2VAD is fragmented, and prior reviews have mostly focused on methods using RGB sequences, overlooking privacy leakage and appearance bias considerations. To address this gap, this article systematically reviews the progress of P2VAD for the first time, defining its scope and providing an intuitive taxonomy. We outline the basic assumptions, learning frameworks, and optimization objectives of various approaches, analyzing their strengths, weaknesses, and potential correlations. Additionally, we provide open access to research resources such as benchmark datasets and available code. Finally, we discuss key challenges and future opportunities from the perspectives of AI development and P2VAD deployment, aiming to guide future work in the field., Comment: 19 pages, 6 figures
- Published
- 2024
40. Global Challenge for Safe and Secure LLMs Track 1
- Author
-
Jia, Xiaojun, Huang, Yihao, Liu, Yang, Tan, Peng Yan, Yau, Weng Kuan, Mak, Mun-Thye, Sim, Xin Ming, Ng, Wee Siong, Ng, See Kiong, Liu, Hanqing, Zhou, Lifeng, Yan, Huanqian, Sun, Xiaobing, Liu, Wei, Wang, Long, Qian, Yiming, Liu, Yong, Yang, Junxiao, Zhang, Zhexin, Lei, Leqi, Chen, Renmiao, Lu, Yida, Cui, Shiyao, Wang, Zizhou, Li, Shaohua, Wang, Yan, Goh, Rick Siow Mong, Zhen, Liangli, Zhang, Yingjie, and Zhao, Zhe
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Computers and Society - Abstract
This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks. With the increasing integration of LLMs in critical sectors such as healthcare, finance, and public administration, ensuring these models are resilient to adversarial attacks is vital for preventing misuse and upholding ethical standards. This competition focused on two distinct tracks designed to evaluate and enhance the robustness of LLM security frameworks. Track 1 tasked participants with developing automated methods to probe LLM vulnerabilities by eliciting undesirable responses, effectively testing the limits of existing safety protocols within LLMs. Participants were challenged to devise techniques that could bypass content safeguards across a diverse array of scenarios, from offensive language to misinformation and illegal activities. Through this process, Track 1 aimed to deepen the understanding of LLM vulnerabilities and provide insights for creating more resilient models.
- Published
- 2024
41. Understanding Chain-of-Thought in LLMs through Information Theory
- Author
-
Ton, Jean-Francois, Taufiq, Muhammad Faaiz, and Liu, Yang
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Large Language Models (LLMs) have shown impressive performance in complex reasoning tasks through Chain-of-Thought (CoT) reasoning, allowing models to break down problems into manageable sub-tasks. However, existing CoT evaluation techniques either require annotated CoT data or fall short in accurately assessing intermediate reasoning steps, leading to high rates of false positives. In this paper, we formalize CoT reasoning in LLMs through an information-theoretic lens. Specifically, our framework quantifies the 'information gain' at each reasoning step, enabling the identification of failure modes in LLMs without the need for expensive annotated datasets. We demonstrate the efficacy of our approach through extensive experiments on toy and GSM-8K data, where it significantly outperforms existing outcome-based methods by providing more accurate insights into model performance on individual tasks.
- Published
- 2024
42. CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph
- Author
-
Xu, Hanxiang, Ma, Wei, Zhou, Ting, Zhao, Yanjie, Chen, Kai, Hu, Qiang, Liu, Yang, and Wang, Haoyu
- Subjects
Computer Science - Software Engineering ,Computer Science - Cryptography and Security - Abstract
In recent years, the programming capabilities of large language models (LLMs) have garnered significant attention. Fuzz testing, a highly effective technique, plays a key role in enhancing software reliability and detecting vulnerabilities. However, traditional fuzz testing tools rely on manually crafted fuzz drivers, which can limit both testing efficiency and effectiveness. To address this challenge, we propose an automated fuzz testing method driven by a code knowledge graph and powered by an LLM-based intelligent agent system, referred to as CKGFuzzer. We approach fuzz driver creation as a code generation task, leveraging the knowledge graph of the code repository to automate the generation process within the fuzzing loop, while continuously refining both the fuzz driver and input seeds. The code knowledge graph is constructed through interprocedural program analysis, where each node in the graph represents a code entity, such as a function or a file. The knowledge graph-enhanced CKGFuzzer not only effectively resolves compilation errors in fuzz drivers and generates input seeds tailored to specific API usage scenarios, but also analyzes fuzz driver crash reports, assisting developers in improving code quality. By querying the knowledge graph of the code repository and learning from API usage scenarios, we can better identify testing targets and understand the specific purpose of each fuzz driver. We evaluated our approach using eight open-source software projects. The experimental results indicate that CKGFuzzer achieved an average improvement of 8.73% in code coverage compared to state-of-the-art techniques. Additionally, CKGFuzzer reduced the manual review workload in crash case analysis by 84.4% and successfully detected 11 real bugs (including nine previously unreported bugs) across the tested libraries., Comment: 12 pages, 3 figures
- Published
- 2024
43. Tuneable large nonlinear charge transport driven by the quantum metric at room temperatures in TbMn6Sn6
- Author
-
Zhao, Weiyao, Xing, Kaijian, Zhao, Yufei, Chen, Lei, Hong, Min, Yin, Yuefeng, Liu, Yang, Le, Khoa Dang, Gayles, Jacob, Tang, Fang, Fang, Yong, Yan, Binghai, and Karel, Julie
- Subjects
Condensed Matter - Materials Science - Abstract
Nonlinear electrodynamics in materials manifests as an electronic response that depends on second- or higher-order powers of the applied electromagnetic field. This response is highly dependent on the underlying crystal symmetries in the material and is typically smaller than the linear responses. Nonlinear responses are therefore usually employed to expose the symmetry breaking, geometric properties of the electronic band structure in materials. Naturally, a material system with a strong nonlinear response is also the key component in nonlinear devices. Here we report the strong room-temperature second-harmonic transport response in a quantum magnet, TbMn6Sn6, which is governed by the quantum metric and can be tuned with applied magnetic fields and temperature. We show that around room temperature, which is close to the spontaneous spin-reorientation transition, the magnetic configurations, and therefore the related symmetry breaking phases, are easily controlled. Our results pave the way from quantum materials to high performance tuneable nonlinear device applications at room temperature., Comment: 12 pages, 3 figures
- Published
- 2024
44. AIGS: Generating Science from AI-Powered Automated Falsification
- Author
-
Liu, Zijun, Liu, Kaiming, Zhu, Yiqi, Lei, Xuanyu, Yang, Zonghan, Zhang, Zhenhe, Li, Peng, and Liu, Yang
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Rapid development of artificial intelligence has drastically accelerated the development of scientific discovery. Trained with large-scale observation data, deep neural networks extract the underlying patterns in an end-to-end manner and assist human researchers with highly precise predictions in unseen scenarios. The recent rise of Large Language Models (LLMs) and the autonomous agents they empower enables scientists to gain help through interaction in different stages of their research, including but not limited to literature review, research ideation, idea implementation, and academic writing. However, AI researchers instantiated by foundation-model-empowered agents with full-process autonomy are still in their infancy. In this paper, we study $\textbf{AI-Generated Science}$ (AIGS), where agents independently and autonomously complete the entire research process and discover scientific laws. By revisiting the definition of scientific research, we argue that $\textit{falsification}$ is the essence of both the human research process and the design of an AIGS system. Through the lens of falsification, prior systems aiming at AI-Generated Science either lack falsification in their design, or rely heavily on existing verification engines that narrow their use to specialized domains. In this work, we propose Baby-AIGS as a baby-step demonstration of a full-process AIGS system, which is a multi-agent system with agents in roles representing key research processes. By introducing FalsificationAgent, which identifies and then verifies possible scientific discoveries, we empower the system with explicit falsification. Experiments on three tasks preliminarily show that Baby-AIGS could produce meaningful scientific discoveries, though not on par with experienced human researchers. Finally, we discuss the limitations of current Baby-AIGS, actionable insights, and related ethical issues in detail., Comment: Pre-print. 35 pages. Official website: https://agent-force.github.io/AIGS/
- Published
- 2024
45. Movable Antenna Enhanced Networked Full-Duplex Integrated Sensing and Communication System
- Author
-
Guo, Yuan, Chen, Wen, Wu, Qingqing, Liu, Yang, Wu, Qiong, Wang, Kunlun, Li, Jun, and Xu, Lexi
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Integrated sensing and communication (ISAC) is envisioned as a key technology for future sixth-generation (6G) networks. Classical ISAC systems considering monostatic and/or bistatic settings inevitably degrade both communication and sensing performance due to the limited service coverage and easily blocked transmission paths. Besides, existing ISAC studies usually focus on downlink (DL) or uplink (UL) communication demands and are unable to achieve systematic DL and UL communication tasks. These challenges can be overcome by a networked full-duplex (FD) ISAC framework. Moreover, ISAC generally considers the trade-off between communication and sensing, unavoidably leading to a loss in communication performance. This shortcoming can be addressed by the emerging movable antenna (MA) technology. In this paper, we utilize the MA to improve communication capability with guaranteed sensing performance by jointly designing beamforming, power allocation, receiving filters and MA configuration to maximize the sum rate. The optimization problem is highly difficult due to the unique channel model arising from the MA. To resolve this challenge, leveraging the cutting-edge majorization-minimization (MM) method, we develop an efficient solution that optimizes all variables via convex optimization techniques. Extensive simulation results verify the effectiveness of our proposed algorithms and demonstrate the substantial performance improvement from deploying MA in the networked FD ISAC system.
- Published
- 2024
46. Wafer-scale Semiconductor Grafting: Enabling High-Performance, Lattice-Mismatched Heterojunctions
- Author
-
Zhou, Jie, Zhang, Qiming, Gong, Jiarui, Lu, Yi, Liu, Yang, Abbasi, Haris, Qiu, Haining, Kim, Jisoo, Lin, Wei, Kim, Donghyeok, Li, Yiran, Ng, Tien Khee, Jang, Hokyung, Liu, Dong, Wang, Haiyan, Ooi, Boon S., and Ma, Zhenqiang
- Subjects
Physics - Applied Physics ,Condensed Matter - Materials Science - Abstract
Semiconductor heterojunctions are foundational to many advanced electronic and optoelectronic devices. However, achieving high-quality, lattice-mismatched interfaces remains challenging, limiting both scalability and device performance. Semiconductor grafting offers a promising solution by directly forming electrically active, lattice-mismatched heterojunctions between dissimilar materials. However, its scalability and uniformity at the wafer level have yet to be demonstrated. This work demonstrates the achievement of highly uniform, reproducible results across silicon, sapphire, and gallium nitride (GaN) substrates using wafer-scale semiconductor grafting. To illustrate this scalability, we conducted an in-depth study of a grafted Si/GaN heterojunction, examining band alignment through X-ray photoelectron spectroscopy and confirming crystallinity and interfacial integrity with scanning transmission electron microscopy. The resulting p-n diodes exhibit significantly enhanced electrical performance and wafer-scale uniformity compared to conventional approaches. This work establishes wafer-scale semiconductor grafting as a versatile and scalable technology, bridging the gap between laboratory-scale research and industrial manufacturing for heterogeneous semiconductor integration, and paving the way for novel, high-performance electronic and optoelectronic devices., Comment: 23 pages, 6 figures
- Published
- 2024
47. Hybrid skin-topological effect in non-Hermitian checkerboard lattices with large Chern numbers
- Author
-
Zhang, Yi-Ling, Wang, Li-Wei, Liu, Yang, Chen, Zhao-Xian, and Jiang, Jian-Hua
- Subjects
Physics - Optics ,Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Non-Hermitian topology provides a research frontier for exploring topological phenomena, revealing novel topological effects and driving the development of emergent materials and platforms. Here, we explore the non-Hermitian Chern insulator phases and the hybrid skin-topological effects in checkerboard lattices with synthetic gauge fluxes. Such lattices can be realized in integrated silicon photonic nanocircuits and microresonators as well as in arrays of evanescently coupled helical optical waveguides. With a simple and tunable design, the system is found to support non-Hermitian hybrid skin-topological effects, exhibiting corner skin effects when the lattice symmetry is either $C_4$ or $C_2$. An unconventional physical mechanism is revealed as the origin of such a transition which is connected to the corner-induced scattering between the multiple chiral edge channels. These properties are enabled by the large Chern number and the rich non-Hermitian topological edge states in our system, revealing the diverse non-Hermitian topological bulk-boundary correspondence. Our design offers excellent controllability and experimental feasibility, making it appealing for studying non-Hermitian topological phenomena.
- Published
- 2024
48. Generically Automating Separation Logic by Functors, Homomorphisms and Modules
- Author
-
Xu, Qiyuan, Sanan, David, Hou, Zhe, Luan, Xiaokun, Watt, Conrad, and Liu, Yang
- Subjects
Computer Science - Programming Languages ,Computer Science - Logic in Computer Science ,F.3.1 ,F.4.1 ,D.3.1 - Abstract
Foundational verification considers the functional correctness of programming languages with formalized semantics and uses proof assistants (e.g., Coq, Isabelle) to certify proofs. The need for verifying complex programs compels it to involve expressive Separation Logics (SLs) that exceed the scopes of well-studied automated proof theories, e.g., symbolic heap. Consequently, automation of SL in foundational verification relies heavily on ad-hoc heuristics that lack a systematic meta-theory and face scalability issues. To mitigate the gap, we propose a theory to specify SL predicates using abstract algebras including functors, homomorphisms, and modules over rings. Based on this theory, we develop a generic SL automation algorithm to reason about any data structures that can be characterized by these algebras. In addition, we also present algorithms for automatically instantiating the algebraic models to real data structures. The instantiation reuses the algebraic models of component structures and preserves their data abstractions. Case studies on formalized imperative semantics show our algorithm can instantiate the algebraic models automatically for a variety of complex data structures. Experimental results indicate the automatically instantiated reasoners from our generic theory show similar results to the state-of-the-art systems made of specifically crafted reasoning rules. The presented theories, proofs, and the verification framework are formalized in Isabelle/HOL., Comment: Accepted by POPL'25
- Published
- 2024
49. Collective Pinning and Vortex Dynamics in type 2 superconducting thin films with Varying Magnetic Field
- Author
-
Wu, Yu, Guo, Liangliang, Wang, Renfei, Guo, Jiawei, Jia, Shuang, Tian, Mingliang, Lu, Xiaobo, Guo, Hangwen, Shen, Jian, and Liu, Yang
- Subjects
Condensed Matter - Superconductivity - Abstract
A perpendicular magnetic field penetrating a thin type-II superconductor slab produces vortices, with one vortex per flux quantum, h/2e. The vortices interact repulsively and form an ordered array (Abrikosov lattice) in clean systems, while strong disorder changes the lattice into a vortex glass. Here we investigate type-II superconducting films (PdBi2 and NbSe2) with surface acoustic waves (SAWs) at mK temperatures. When sweeping the magnetic field at an extremely slow rate, we observe a series of spikes in the attenuation and velocity of the SAW, on average separated in field by approximately Hc1. We suspect the following scenario: The vortex-free region at the edges of the film produces an edge barrier across which the vortices can enter or leave. When the applied field changes, the induced supercurrents flowing along this edge region lower this barrier until there is an instability. At that point, vortices avalanche into (or out of) the bulk and change the vortex crystal, as suggested by the sharp jump in each such spike. The vortices then gradually relax to a new stable pinned configuration, leading to a ~30 s relaxation after the jump. Our observation enriches the limited experimental evidence on the important topic of real-time vortex dynamics in superconductors.
- Published
- 2024
50. NeuroFly: A framework for whole-brain single neuron reconstruction
- Author
-
Zhao, Rubin, Liu, Yang, Zhang, Shiqi, Yi, Zijian, Xiao, Yanyang, Xu, Fang, Yang, Yi, and Zhou, Pencheng
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Quantitative Biology - Quantitative Methods - Abstract
Neurons, with their elongated, tree-like dendritic and axonal structures, enable efficient signal integration and long-range communication across brain regions. By reconstructing individual neurons' morphology, we can gain valuable insights into brain connectivity, revealing the structural basis of cognition, movement, and perception. Despite the accumulation of extensive 3D microscopic imaging data, progress has been considerably hindered by the absence of automated tools to streamline this process. Here we introduce NeuroFly, a validated framework for large-scale automatic single neuron reconstruction. This framework breaks down the process into three distinct stages: segmentation, connection, and proofreading. In the segmentation stage, we perform automatic segmentation followed by skeletonization to generate over-segmented neuronal fragments without branches. During the connection stage, we use a 3D image-based path following approach to extend each fragment and connect it with other fragments of the same neuron. Finally, human annotators are required only to proofread the few unresolved positions. The first two stages of our process are clearly defined computer vision problems, and we have trained robust baseline models to solve them. We validated NeuroFly's efficiency using in-house datasets that include a variety of challenging scenarios, such as dense arborizations, weak axons, and images with contamination. We will release the datasets along with a suite of visualization and annotation tools for better reproducibility. Our goal is to foster collaboration among researchers to address the neuron reconstruction challenge, ultimately accelerating advancements in neuroscience research. The dataset and code are available at https://github.com/beanli161514/neurofly
- Published
- 2024