Author: "Zhang,Fan" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhang,Fan"' showing total 43,240 results

1. HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos

Author: Banerjee, Prithviraj, Shkodrani, Sindi, Moulon, Pierre, Hampali, Shreyas, Han, Shangchen, Zhang, Fan, Zhang, Linguang, Fountain, Jade, Miller, Edward, Basol, Selen, Newcombe, Richard, Wang, Robert, Engel, Jakob Julian, and Hodan, Tomas
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground-truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios resembling typical actions in a kitchen, office, and living room environment. The dataset is recorded by two head-mounted devices from Meta: Project Aria, a research prototype of light-weight AR/AI glasses, and Quest 3, a production VR headset sold in millions of units. Ground-truth poses were obtained by a professional motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. In our experiments, we demonstrate the effectiveness of multi-view egocentric data for three popular tasks: 3D hand tracking, 6DoF object pose estimation, and 3D lifting of unknown in-hand objects. The evaluated multi-view methods, whose benchmarking is uniquely enabled by HOT3D, significantly outperform their single-view counterparts., Comment: arXiv admin note: substantial text overlap with arXiv:2406.09598
Published: 2024

2. DiM-Gestor: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2

Author: Zhang, Fan, Zhao, Siyuan, Ji, Naye, Wang, Zhaohan, Wu, Jingmei, Gao, Fuxing, Ye, Zhenqing, Yan, Leyao, Dai, Lanxin, Geng, Weidong, Lyu, Xin, Zhao, Bozuo, Yu, Dingguo, Du, Hui, and Hu, Bin
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Human-Computer Interaction, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech-driven gesture generation using transformer-based generative models represents a rapidly advancing area within virtual human creation. However, existing models face significant challenges due to their quadratic time and space complexities, limiting scalability and efficiency. To address these limitations, we introduce DiM-Gestor, an innovative end-to-end generative model leveraging the Mamba-2 architecture. DiM-Gestor features a dual-component framework: (1) a fuzzy feature extractor and (2) a speech-to-gesture mapping module, both built on the Mamba-2. The fuzzy feature extractor, integrated with a Chinese Pre-trained Model and Mamba-2, autonomously extracts implicit, continuous speech features. These features are synthesized into a unified latent representation and then processed by the speech-to-gesture mapping module. This module employs an Adaptive Layer Normalization (AdaLN)-enhanced Mamba-2 mechanism to uniformly apply transformations across all sequence tokens. This enables precise modeling of the nuanced interplay between speech features and gesture dynamics. We utilize a diffusion model to train and infer diverse gesture outputs. Extensive subjective and objective evaluations conducted on the newly released Chinese Co-Speech Gestures dataset corroborate the efficacy of our proposed model. Compared with Transformer-based architecture, the assessments reveal that our approach delivers competitive results and significantly reduces memory usage, approximately 2.4 times, and enhances inference speeds by 2 to 4 times. Additionally, we released the CCG dataset, a Chinese Co-Speech Gestures dataset, comprising 15.97 hours (six styles across five scenarios) of 3D full-body skeleton gesture motion performed by professional Chinese TV broadcasters., Comment: 13 pages, 11 figures
Published: 2024

3. VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Author: Huang, Ziqi, Zhang, Fan, Xu, Xiaojie, He, Yinan, Yu, Jiashuo, Dong, Ziyue, Ma, Qianli, Chanpaisit, Nattapol, Si, Chenyang, Jiang, Yuming, Wang, Yaohui, Chen, Xinyuan, Chen, Ying-Cong, Wang, Limin, Lin, Dahua, Qiao, Yu, and Liu, Ziwei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has several appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship, etc). The evaluation metrics with fine-grained levels reveal individual models' strengths and weaknesses. 2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmarks' alignment with human perception, for each evaluation dimension respectively. 3) Valuable Insights: We look into current models' ability across various evaluation dimensions, and various content types. We also investigate the gaps between video and image generation models. 4) Versatile Benchmarking: VBench++ supports evaluating text-to-video and image-to-video. We introduce a high-quality Image Suite with an adaptive aspect ratio to enable fair evaluations across different image-to-video generation settings. Beyond assessing technical quality, VBench++ evaluates the trustworthiness of video generative models, providing a more holistic view of model performance. 5) Full Open-Sourcing: We fully open-source VBench++ and continually add new video generation models to our leaderboard to drive forward the field of video generation., Comment: Leaderboard: https://huggingface.co/spaces/Vchitect/VBench_Leaderboard Code: https://github.com/Vchitect/VBench Project page: https://vchitect.github.io/VBench-project/ extension of arXiv:2311.17982. arXiv admin note: substantial text overlap with arXiv:2311.17982
Published: 2024

4. RTSR: A Real-Time Super-Resolution Model for AV1 Compressed Content

Author: Jiang, Yuxuan, Nawała, Jakub, Feng, Chen, Zhang, Fan, Zhu, Xiaoqing, Sole, Joel, and Bull, David
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Super-resolution (SR) is a key technique for improving the visual quality of video content by increasing its spatial resolution while reconstructing fine details. SR has been employed in many applications including video streaming, where compressed low-resolution content is typically transmitted to end users and then reconstructed with a higher resolution and enhanced quality. To support real-time playback, it is important to implement fast SR models while preserving reconstruction quality; however most existing solutions, in particular those based on complex deep neural networks, fail to do so. To address this issue, this paper proposes a low-complexity SR method, RTSR, designed to enhance the visual quality of compressed video content, focusing on resolution up-scaling from a) 360p to 1080p and from b) 540p to 4K. The proposed approach utilizes a CNN-based network architecture, which was optimized for AV1 (SVT)-encoded content at various quantization levels based on a dual-teacher knowledge distillation method. This method was submitted to the AIM 2024 Video Super-Resolution Challenge, specifically targeting the Efficient/Mobile Real-Time Video Super-Resolution competition. It achieved the best trade-off between complexity and coding performance (measured in PSNR, SSIM and VMAF) among all six submissions. The code will be available soon.
Published: 2024

5. Advancing Large Language Models for Spatiotemporal and Semantic Association Mining of Similar Environmental Events

Author: Tian, Yuanyuan, Li, Wenwen, Hu, Lei, Chen, Xiao, Brook, Michael, Brubaker, Michael, Zhang, Fan, and Liljedahl, Anna K.
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Retrieval and recommendation are two essential tasks in modern search tools. This paper introduces a novel retrieval-reranking framework leveraging Large Language Models (LLMs) to enhance the spatiotemporal and semantic associated mining and recommendation of relevant unusual climate and environmental events described in news articles and web posts. This framework uses advanced natural language processing techniques to address the limitations of traditional manual curation methods in terms of high labor cost and lack of scalability. Specifically, we explore an optimized solution to employ cutting-edge embedding models for semantically analyzing spatiotemporal events (news) and propose a Geo-Time Re-ranking (GT-R) strategy that integrates multi-faceted criteria including spatial proximity, temporal association, semantic similarity, and category-instructed similarity to rank and identify similar spatiotemporal events. We apply the proposed framework to a dataset of four thousand Local Environmental Observer (LEO) Network events, achieving top performance in recommending similar events among multiple cutting-edge dense retrieval models. The search and recommendation pipeline can be applied to a wide range of similar data search tasks dealing with geospatial and temporal data. We hope that by linking relevant events, we can better aid the general public to gain an enhanced understanding of climate change and its impact on different communities.
Published: 2024

6. AsynEIO: Asynchronous Monocular Event-Inertial Odometry Using Gaussian Process Regression

Author: Wang, Zhixiang, Li, Xudong, Zhang, Yizhai, Zhang, Fan, and Huang, Panfeng
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Event cameras, when combined with inertial sensors, show significant potential for motion estimation in challenging scenarios, such as high-speed maneuvers and low-light environments. There are many methods for producing such estimations, but most boil down to a synchronous discrete-time fusion problem. However, the asynchronous nature of event cameras and their unique fusion mechanism with inertial sensors remain underexplored. In this paper, we introduce a monocular event-inertial odometry method called AsynEIO, designed to fuse asynchronous event and inertial data within a unified Gaussian Process (GP) regression framework. Our approach incorporates an event-driven frontend that tracks feature trajectories directly from raw event streams at a high temporal resolution. These tracked feature trajectories, along with various inertial factors, are integrated into the same GP regression framework to enable asynchronous fusion. With deriving analytical residual Jacobians and noise models, our method constructs a factor graph that is iteratively optimized and pruned using a sliding-window optimizer. Comparative assessments highlight the performance of different inertial fusion strategies, suggesting optimal choices for varying conditions. Experimental results on both public datasets and our own event-inertial sequences indicate that AsynEIO outperforms existing methods, especially in high-speed and low-illumination scenarios., Comment: Submitted to IEEE (2024-11-4)
Published: 2024

7. BVI-CR: A Multi-View Human Dataset for Volumetric Video Compression

Author: Gao, Ge, Azzarelli, Adrian, Kwan, Ho Man, Anantrasirichai, Nantheera, Zhang, Fan, Moolan-Feroze, Oliver, and Bull, David
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: The advances in immersive technologies and 3D reconstruction have enabled the creation of digital replicas of real-world objects and environments with fine details. These processes generate vast amounts of 3D data, requiring more efficient compression methods to satisfy the memory and bandwidth constraints associated with data storage and transmission. However, the development and validation of efficient 3D data compression methods are constrained by the lack of comprehensive and high-quality volumetric video datasets, which typically require much more effort to acquire and consume increased resources compared to 2D image and video databases. To bridge this gap, we present an open multi-view volumetric human dataset, denoted BVI-CR, which contains 18 multi-view RGB-D captures and their corresponding textured polygonal meshes, depicting a range of diverse human actions. Each video sequence contains 10 views in 1080p resolution with durations between 10-15 seconds at 30FPS. Using BVI-CR, we benchmarked three conventional and neural coordinate-based multi-view video compression methods, following the MPEG MIV Common Test Conditions, and reported their rate quality performance based on various quality metrics. The results show the great potential of neural representation based methods in volumetric video compression compared to conventional video coding methods (with an up to 38% average coding gain in PSNR). This dataset provides a development and validation platform for a variety of tasks including volumetric reconstruction, compression, and quality assessment. The database will be shared publicly at https://github.com/fan-aaron-zhang/bvi-cr.
Published: 2024

8. MICCAI-CDMRI 2023 QuantConn Challenge Findings on Achieving Robust Quantitative Connectivity through Harmonized Preprocessing of Diffusion MRI

Author: Newlin, Nancy R., Schilling, Kurt, Koudoro, Serge, Chandio, Bramsh Qamar, Kanakaraj, Praitayini, Moyer, Daniel, Kelly, Claire E., Genc, Sila, Chen, Jian, Yang, Joseph Yuan-Mou, Wu, Ye, He, Yifei, Zhang, Jiawei, Zeng, Qingrun, Zhang, Fan, Adluru, Nagesh, Nath, Vishwesh, Pathak, Sudhir, Schneider, Walter, Gade, Anurag, Rathi, Yogesh, Hendriks, Tom, Vilanova, Anna, Chamberland, Maxime, Pieciak, Tomasz, Ciupek, Dominika, Vega, Antonio Tristán, Aja-Fernández, Santiago, Malawski, Maciej, Ouedraogo, Gani, Machnio, Julia, Ewert, Christian, Thompson, Paul M., Jahanshad, Neda, Garyfallidis, Eleftherios, and Landman, Bennett A.
Subjects: Physics - Medical Physics, Computer Science - Machine Learning
Abstract: White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. There is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions and tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Submissions are evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x additional subjects over prior harmonization challenge, MUSHAC and 100x over SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, microstructure measures AD, MD, RD, bundle length, connectome density, efficiency, and path length are least biased by these acquisition differences., Comment: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024/019
Published: 2024

9. Disposable Opto-Acoustic Window Enabled Cost-effective Photoacoustic-Ultrasound Dual-modal Imaging

Author: Jiang, Yunhui, Zhang, Fan, Zheng, Yuwei, Sun, Ruixi, and Gao, Fei
Subjects: Physics - Medical Physics
Abstract: Photoacoustic imaging (PAI) and ultrasound imaging (USI) are important biomedical imaging techniques, due to their unique and complementary advantages in tissue's structure and function visualization. In this Letter, we proposed a coaxial photoacoustic-ultrasound dual-modal imaging system (coPAUS) with disposable opto-acoustic window. This opto-acoustic window allows part of light to go through it, and another part of light to be converted to ultrasound transmission signal by photoacoustic effect. By single laser pulse illumination, both PA signals and reflected US signals can be generated. Then, a linear array probe receives both PA and US signals, enabling simultaneous dual-modal PA and US imaging. Ex vivo experiments were conducted involving pencil lead, hair, and plastic tube with black spot, as well as in vivo experiment on human finger. The system's resolutions for PA and US imaging are 215 um and 91.125 um, with signal-to-noise ratios for PA and US signals reached up to 37.48 dB and 29.75 dB, respectively, proving the feasibility of the coPAUS dual-modal imaging. The proposed coPAUS system with disposable opto-acoustic window provides an immediate and cost-effective approach to enable US imaging capability based on an existing PA imaging system., Comment: 9 pages, 6 figures, 1 table
Published: 2024

10. A Novel Deep Learning Tractography Fiber Clustering Framework for Functionally Consistent White Matter Parcellation Using Multimodal Diffusion MRI and Functional MRI

Author: Wang, Jin, Guo, Bocheng, Li, Yijie, Wang, Junyi, Chen, Yuqian, Rushmore, Jarrett, Makris, Nikos, Rathi, Yogesh, O'Donnell, Lauren J, and Zhang, Fan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Tractography fiber clustering using diffusion MRI (dMRI) is a crucial strategy for white matter (WM) parcellation. Current methods primarily use the geometric information of fibers (i.e., the spatial trajectories) to group similar fibers into clusters, overlooking the important functional signals present along the fiber tracts. There is increasing evidence that neural activity in the WM can be measured using functional MRI (fMRI), offering potentially valuable multimodal information for fiber clustering. In this paper, we develop a novel deep learning fiber clustering framework, namely Deep Multi-view Fiber Clustering (DMVFC), that uses joint dMRI and fMRI data to enable functionally consistent WM parcellation. DMVFC can effectively integrate the geometric characteristics of the WM fibers with the fMRI BOLD signals along the fiber tracts. It includes two major components: 1) a multi-view pretraining module to compute embedding features from fiber geometric information and functional signals separately, and 2) a collaborative fine-tuning module to simultaneously refine the two kinds of embeddings. In the experiments, we compare DMVFC with two state-of-the-art fiber clustering methods and demonstrate superior performance in achieving functionally meaningful and consistent WM parcellation results., Comment: 5 pages, 3 figures
Published: 2024

11. Human-inspired Perspectives: A Survey on AI Long-term Memory

Author: He, Zihong, Lin, Weizhe, Zheng, Hao, Zhang, Fan, Jones, Matt, Aitchison, Laurence, Xu, Xuhai, Liu, Miao, Kristensson, Per Ola, and Shen, Junxiao
Subjects: Computer Science - Artificial Intelligence
Abstract: With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize information over the long term - referred to as long-term memory - have become increasingly significant. These capabilities are crucial for enhancing the performance of AI systems across a wide range of tasks. However, there is currently no comprehensive survey that systematically investigates AI's long-term memory capabilities, formulates a theoretical framework, and inspires the development of next-generation AI long-term memory systems. This paper begins by systematically introducing the mechanisms of human long-term memory, then explores AI long-term memory mechanisms, establishing a mapping between the two. Based on the mapping relationships identified, we extend the current cognitive architectures and propose the Cognitive Architecture of Self-Adaptive Long-term Memory (SALM). SALM provides a theoretical framework for the practice of AI long-term memory and holds potential for guiding the creation of next-generation long-term memory driven AI systems. Finally, we delve into the future directions and application prospects of AI long-term memory.
Published: 2024

12. AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection

Author: Wang, Yujin, Xu, Tianyi, Zhang, Fan, Xue, Tianfan, and Gu, Jinwei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Image Signal Processors (ISPs) convert raw sensor signals into digital images, which significantly influence the image quality and the performance of downstream computer vision tasks. Designing ISP pipeline and tuning ISP parameters are two key steps for building an imaging and vision system. To find optimal ISP configurations, recent works use deep neural networks as a proxy to search for ISP parameters or ISP pipelines. However, these methods are primarily designed to maximize the image quality, which are sub-optimal in the performance of high-level computer vision tasks such as detection, recognition, and tracking. Moreover, after training, the learned ISP pipelines are mostly fixed at the inference time, whose performance degrades in dynamic scenes. To jointly optimize ISP structures and parameters, we propose AdaptiveISP, a task-driven and scene-adaptive ISP. One key observation is that for the majority of input images, only a few processing modules are needed to improve the performance of downstream recognition tasks, and only a few inputs require more processing. Based on this, AdaptiveISP utilizes deep reinforcement learning to automatically generate an optimal ISP pipeline and the associated ISP parameters to maximize the detection performance. Experimental results show that AdaptiveISP not only surpasses the prior state-of-the-art methods for object detection but also dynamically manages the trade-off between detection performance and computational cost, especially suitable for scenes with large dynamic range variations. Project website: https://openimaginglab.github.io/AdaptiveISP/., Comment: Accepted at NeurIPS2024
Published: 2024

13. TractShapeNet: Efficient Multi-Shape Learning with 3D Tractography Point Clouds

Author: Lo, Yui, Chen, Yuqian, Liu, Dongnan, Legarreta, Jon Haitz, Zekelman, Leo, Zhang, Fan, Rushmore, Jarrett, Rathi, Yogesh, Makris, Nikos, Golby, Alexandra J., Cai, Weidong, and O'Donnell, Lauren J.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Brain imaging studies have demonstrated that diffusion MRI tractography geometric shape descriptors can inform the study of the brain's white matter pathways and their relationship to brain function. In this work, we investigate the possibility of utilizing a deep learning model to compute shape measures of the brain's white matter connections. We introduce a novel framework, TractShapeNet, that leverages a point cloud representation of tractography to compute five shape measures: length, span, volume, total surface area, and irregularity. We assess the performance of the method on a large dataset including 1065 healthy young adults. Experiments for shape measure computation demonstrate that our proposed TractShapeNet outperforms other point cloud-based neural network models in both the Pearson correlation coefficient and normalized error metrics. We compare the inference runtime results with the conventional shape computation tool DSI-Studio. Our results demonstrate that a deep learning approach enables faster and more efficient shape measure computation. We also conduct experiments on two downstream language cognition prediction tasks, showing that shape measures from TractShapeNet perform similarly to those computed by DSI-Studio. Our code will be available at: https://github.com/SlicerDMRI/TractShapeNet., Comment: 10 pages, 2 figures, 4 tables. This work has been submitted to the IEEE for possible publication
Published: 2024

14. RediSwap: MEV Redistribution Mechanism for CFMMs

Author: Zhang, Mengqian, Yang, Sen, and Zhang, Fan
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Cryptography and Security
Abstract: Automated Market Makers (AMMs) are essential to decentralized finance, offering continuous liquidity and enabling intermediary-free trading on blockchains. However, participants in AMMs are vulnerable to Maximal Extractable Value (MEV) exploitation. Users face threats such as front-running, back-running, and sandwich attacks, while liquidity providers (LPs) incur the loss-versus-rebalancing (LVR). In this paper, we introduce RediSwap, a novel AMM designed to capture MEV at the application level and refund it fairly among users and liquidity providers. At its core, RediSwap features an MEV-redistribution mechanism that manages arbitrage opportunities within the AMM pool. We formalize the mechanism design problem and the desired game-theoretical properties. A central insight underpinning our mechanism is the interpretation of the maximal MEV value as the sum of LVR and individual user losses. We prove that our mechanism is incentive-compatible and Sybil-proof, and demonstrate that it is easy for arbitrageurs to participate. We empirically compared RediSwap with existing solutions by replaying historical AMM trades. Our results suggest that RediSwap can achieve better execution than UniswapX in 89% of trades and reduce LPs' loss to under 0.5% of the original LVR in most cases.
Published: 2024

15. Neural Predictor for Flight Control with Payload

Author: Jin, Ao, Li, Chenhao, Wang, Qinyi, Liu, Ya, Huang, Panfeng, and Zhang, Fan
Subjects: Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
Abstract: Aerial robotics for transporting suspended payloads in the form of a freely-floating manipulator has attracted great interest in recent years. However, prior information about the payload, such as its mass, is always hard to obtain accurately in practice. The force/torque caused by the payload and residual dynamics introduces unmodeled perturbations to the system, which negatively affect closed-loop performance. Different from estimation-like methods, this paper proposes Neural Predictor, a learning-based approach to model the force/torque caused by the payload and residual dynamics as a dynamical system. This results in a hybrid model including both the first-principles dynamics and the learned dynamics. This hybrid model is then integrated into an MPC framework to improve closed-loop performance. The effectiveness of the proposed framework is verified extensively in both numerical simulations and real-world flight experiments. The results indicate that our approach can capture the force/torque caused by the payload and residual dynamics accurately, respond quickly to changes in them, and improve closed-loop performance significantly. In particular, Neural Predictor outperforms a state-of-the-art learning-based estimator and reduces the force and torque estimation errors by up to 66.15% and 33.33%, respectively, while using fewer samples., Comment: 8 pages
Published: 2024

16. The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study

Author: Lo, Yui, Chen, Yuqian, Liu, Dongnan, Liu, Wan, Zekelman, Leo, Rushmore, Jarrett, Zhang, Fan, Rathi, Yogesh, Makris, Nikos, Golby, Alexandra J., Cai, Weidong, and O'Donnell, Lauren J.
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: The shape of the brain's white matter connections is relatively unexplored in diffusion MRI tractography analysis. While it is known that tract shape varies in populations and across the human lifespan, it is unknown if the variability in dMRI tractography-derived shape may relate to the brain's functional variability across individuals. This work explores the potential of leveraging tractography fiber cluster shape measures to predict subject-specific cognitive performance. We implement machine learning models to predict individual cognitive performance scores. We study a large-scale database from the HCP-YA study. We apply an atlas-based fiber cluster parcellation to the dMRI tractography of each individual. We compute 15 shape, microstructure, and connectivity features for each fiber cluster. Using these features as input, we train a total of 210 models to predict 7 different NIH Toolbox cognitive performance assessments. We apply an explainable AI technique, SHAP, to assess the importance of each fiber cluster for prediction. Our results demonstrate that shape measures are predictive of individual cognitive performance. The studied shape measures, such as irregularity, diameter, total surface area, volume, and branch volume, are as effective for prediction as microstructure and connectivity measures. The overall best-performing feature is a shape feature, irregularity, which describes how different a cluster's shape is from an idealized cylinder. Further interpretation using SHAP values suggest that fiber clusters with features highly predictive of cognitive ability are widespread throughout the brain, including fiber clusters from the superficial association, deep association, cerebellar, striatal, and projection pathways. This study demonstrates the strong potential of shape descriptors to enhance the study of the brain's white matter and its relationship to cognitive function.
Published: 2024

17. Resolution Enhancement of Under-sampled Photoacoustic Microscopy Images using Implicit Neural Representations

Author: Xiao, Youshen, Liao, Sheng, Tian, Xuanyang, Zhang, Fan, Dong, Xinlong, Jiang, Yunhui, Chen, Xiyu, Sun, Ruixi, Zhang, Yuyao, and Gao, Fei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Acoustic-Resolution Photoacoustic Microscopy (AR-PAM) is promising for subcutaneous vascular imaging, but its spatial resolution is constrained by the Point Spread Function (PSF). Traditional deconvolution methods like Richardson-Lucy and model-based deconvolution use the PSF to improve resolution. However, accurately measuring the PSF is difficult, leading to reliance on less accurate blind deconvolution techniques. Additionally, AR-PAM suffers from long scanning times, which can be reduced via down-sampling, but this necessitates effective image recovery from under-sampled data, a task where traditional interpolation methods fall short, particularly at high under-sampling rates. To address these challenges, we propose an approach based on Implicit Neural Representations (INR). This method learns a continuous mapping from spatial coordinates to initial acoustic pressure, overcoming the limitations of discrete imaging and enhancing AR-PAM's resolution. By treating the PSF as a learnable parameter within the INR framework, our technique mitigates inaccuracies associated with PSF estimation. We evaluated our method on simulated vascular data, showing significant improvements in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) over conventional methods. Qualitative enhancements were also observed in leaf vein and in vivo mouse brain microvasculature images. When applied to a custom AR-PAM system, experiments with pencil lead demonstrated that our method delivers sharper, higher-resolution results, indicating its potential to advance photoacoustic microscopy.
Published: 2024

18. Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation

Author: Xu, Chen, Huang, Qiming, Hou, Yuqi, Wu, Jiangxing, Zhang, Fan, Chang, Hyung Jin, and Jiao, Jianbo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical image segmentation poses challenges due to domain gaps, data modality variations, and dependency on domain knowledge or experts, especially for low- and middle-income countries (LMICs). Humans, by contrast, can segment different medical images given only a few exemplars (with corresponding labels), even without extensive domain-specific clinical training. In addition, current SAM-based medical segmentation models use fine-grained visual prompts, such as the bounding rectangle generated from a manually annotated target segmentation mask, as the bounding box (bbox) prompt during the testing phase. However, in actual clinical scenarios, no such precise prior knowledge is available. Our experimental results also reveal that previous models nearly fail to predict when given coarser bbox prompts. Considering these issues, in this paper, we introduce a domain-aware selective adaptation approach to adapt the general knowledge learned from a large model trained with natural images to the corresponding medical domains/modalities, with access to only a few (e.g. fewer than 5) exemplars. Our method mitigates the aforementioned limitations, providing an efficient and LMICs-friendly solution. Extensive experimental analysis showcases the effectiveness of our approach, offering potential advancements in healthcare diagnostics and clinical applications in LMICs., Comment: Accepted at ACCV 2024
Published: 2024

19. iFANnpp: Nuclear Power Plant Digital Twin for Robots and Autonomous Intelligence

Author: Do, Youndo, Zebrowitz, Marc, Stahl, Jackson, and Zhang, Fan
Subjects: Computer Science - Robotics
Abstract: Robotics has gained significant attention due to its autonomy and ability to automate in the nuclear industry. However, the increasing complexity of robots has led to a growing demand for advanced simulation and control methods to predict robot behavior and optimize plant performance. Most existing digital twins only address parts of systems and do not offer an overall design of nuclear power plants. Furthermore, they are often designed for specific algorithms or tasks, making them unsuitable for broader research applications or other potential projects. In response, we propose a comprehensive nuclear power plant designed to enhance real-time monitoring, operational efficiency, and predictive maintenance. We selected to model a full-scope nuclear power plant in Unreal Engine 5 to incorporate the complexities and various phenomena. The high-resolution simulation environment is integrated with a General Pressurized Water Reactor Simulator, a high-fidelity physics-driven software, to create a realistic flow of nuclear power plant and a real-time updating virtual environment. Furthermore, the virtual environment provides various features and a Python bridge for researchers to test custom algorithms and frameworks easily. The digital twin's performance is presented, and several research ideas - such as multi-robot task scheduling and robot navigation in the radiation area - using implemented features are presented., Comment: 12 pages, 9 figures
Published: 2024

20. UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction

Author: Wang, Haoran, Anantrasirichai, Nantheera, Zhang, Fan, and Bull, David
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D Gaussian splatting (3DGS) offers the capability to achieve real-time high quality 3D scene rendering. However, 3DGS assumes that the scene is in a clear medium environment and struggles to generate satisfactory representations in underwater scenes, where light absorption and scattering are prevalent and moving objects are involved. To overcome these limitations, we introduce a novel Gaussian Splatting-based method, UW-GS, designed specifically for underwater applications. It introduces a color appearance that models distance-dependent color variation, employs a new physics-based density control strategy to enhance clarity for distant objects, and uses a binary motion mask to handle dynamic content. Optimized with a well-designed loss function that supports scattering media and strengthened by pseudo-depth maps, UW-GS outperforms existing methods with PSNR gains up to 1.26dB. To fully verify the effectiveness of the model, we also developed a new underwater dataset, S-UW, with dynamic object masks.
Published: 2024

21. The effects of mosaicism on biological and clinical markers of Alzheimer's disease in adults with Down syndrome

Author: Xicota, Laura, Dang, Lam-Ha T, Lee, Alice, Krinsky-McHale, Sharon, Pang, Deborah, Melilli, Lisa, O'Bryant, Sid, Henson, Rachel L, Laymon, Charles, Lai, Florence, Rosas, H Diana, Ances, Beau, Lott, Ira, Hom, Christy, Christian, Bradley, Hartley, Sigan, Zaman, Shahid, Head, Elizabeth, Mapstone, Mark, Jin, Zhezhen, Silverman, Wayne, Schupf, Nicole, Handen, Benjamin, Lee, Joseph H, Alzheimer's Biomarker Consortium–Down Syndrome, Aizenstein, Howard J, Ances, Beau M, Andrews, Howard F, Bell, Karen, Birn, Rasmus, Brickman, Adam M, Bulova, Peter, Cheema, Amrita, Chen, Kewei, Christian, Bradley T, Clare, Isabel, Clark, Lorraine, Cohen, Ann D, Constantino, John N, Doran, Eric W, Fagan, Anne, Feingold, Eleanor, Foroud, Tatiana M, Handen, Benjamin L, Harp, Jordan, Hartley, Sigan L, Henson, Rachel, Honig, Lawrence, Ikonomovic, Milos D, Johnson, Sterling C, Jordan, Courtney, Kamboh, M Ilyas, Keator, David, Klunk, William E, Kofler, Julia K, Kreisl, William Charles, Krinsky-McHale, Sharon J, Lao, Patrick, Lott, Ira T, Lupson, Victoria, Mathis, Chester A, Minhas, Davneet Singh, Nadkarni, Neelesh, O'Bryant, Sid, Parisi, Melisa, Pettersen, Melissa, Price, Julie C, Pulsifer, Margaret, Rafii, Michael S, Reiman, Eric, Rizvi, Batool, Ryan, Laurie, Schmitt, Frederick, Silverman, Wayne P, Tudorascu, Dana L, Tumuluru, Rameshwari, Tycko, Benjamin, Varadarajan, Badri, White, Desiree A, Yassa, Michael A, and Zhang, Fan
Subjects: Epidemiology, Health Sciences, Alzheimer's Disease, Clinical Research, Down Syndrome, Brain Disorders, Intellectual and Developmental Disabilities (IDD), Acquired Cognitive Impairment, Aging, Neurosciences, Dementia, Prevention, Alzheimer's Disease including Alzheimer's Disease Related Dementias (AD/ADRD), Neurodegenerative, 2.1 Biological and endogenous factors, 4.2 Evaluation of markers and technologies, Neurological, Congenital, Down syndrome, Mosaicism, Alzheimer's disease, Plasma biomarkers, CSF, PET, Alzheimer's Biomarker Consortium – Down Syndrome, mosaicism, Alzheimer's disease, plasma biomarkers, Clinical Sciences, Public Health and Health Services, Clinical sciences
Abstract: Background: Individuals with Down syndrome (DS) are at high risk of early-onset Alzheimer's disease (AD); yet, some 20 percent do not develop any signs of dementia until after 65 years or in their lifetime. Mosaicism could contribute to this phenotypic variation, where some disomic cells could lead to lower levels of gene products from chromosome 21. Methods: We examined longitudinal neuropsychological and biomarker data from two large studies of DS: the Alzheimer Biomarker Consortium-Down syndrome study (ABC-DS) (n = 357); and a legacy study (n = 468). We assessed mosaicism using karyotyping or GWAS data. Participants had data on plasma AD biomarkers (Aβ40, Aβ42, tau, and NfL) and longitudinal cognitive measures. A subset had cerebrospinal fluid biomarkers (Aβ40, Aβ42, tau, ptau181, and NfL) and amyloid and tau PET data. Findings: For both cohorts, the prevalence of mosaicism was
Published: 2024

22. A longitudinal single-cell atlas of anti-tumour necrosis factor treatment in inflammatory bowel disease.

Author: Thomas, Tom, Friedrich, Matthias, Rich-Griffin, Charlotte, Pohin, Mathilde, Agarwal, Devika, Pakpoor, Julia, Lee, Carl, Tandon, Ruchi, Rendek, Aniko, Aschenbrenner, Dominik, Jainarayanan, Ashwin, Voda, Alexandru, Siu, Jacqueline, Sanches-Peres, Raphael, Nee, Eloise, Sathananthan, Dharshan, Kotliar, Dylan, Todd, Peter, Kiourlappou, Maria, Gartner, Lisa, Ilott, Nicholas, Issa, Fadi, Hester, Joanna, Turner, Jason, Nayar, Saba, Mackerodt, Jonas, Zhang, Fan, Jonsson, Anna, Brenner, Michael, Raychaudhuri, Soumya, Kulicke, Ruth, Ramsdell, Danielle, Stransky, Nicolas, Pagliarini, Ray, Bielecki, Piotr, Spies, Noah, Marsden, Brian, Taylor, Stephen, Wagner, Allon, Klenerman, Paul, Walsh, Alissa, Coles, Mark, Jostins-Dean, Luke, Powrie, Fiona, Filer, Andrew, Travis, Simon, Uhlig, Holm, Dendrou, Calliope, and Buckley, Christopher
Subjects: Humans, Single-Cell Analysis, Adalimumab, Inflammatory Bowel Diseases, Crohn Disease, Longitudinal Studies, Colitis, Ulcerative, Tumor Necrosis Factor-alpha, Transcriptome, Female, Adult, Male, Interferons, Signal Transduction, Arthritis, Rheumatoid
Abstract: Precision medicine in immune-mediated inflammatory diseases (IMIDs) requires a cellular understanding of treatment response. We describe a therapeutic atlas for Crohn's disease (CD) and ulcerative colitis (UC) following adalimumab, an anti-tumour necrosis factor (anti-TNF) treatment. We generated ~1 million single-cell transcriptomes, organised into 109 cell states, from 216 gut biopsies (41 subjects), revealing disease-specific differences. A systems biology-spatial analysis identified granuloma signatures in CD and interferon (IFN)-response signatures localising to T cell aggregates and epithelial damage in CD and UC. Pretreatment differences in epithelial and myeloid compartments were associated with remission outcomes in both diseases. Longitudinal comparisons demonstrated disease progression in nonremission: myeloid and T cell perturbations in CD and increased multi-cellular IFN signalling in UC. IFN signalling was also observed in rheumatoid arthritis (RA) synovium with a lymphoid pathotype. Our therapeutic atlas represents the largest cellular census of perturbation with the most common biologic treatment, anti-TNF, across multiple inflammatory diseases.
Published: 2024

23. Emu3: Next-Token Prediction is All You Need

Author: Wang, Xinlong, Zhang, Xiaosong, Luo, Zhengxiong, Sun, Quan, Cui, Yufeng, Wang, Jinsheng, Zhang, Fan, Wang, Yueze, Li, Zhen, Yu, Qiying, Zhao, Yingli, Ao, Yulong, Min, Xuebin, Li, Tao, Wu, Boya, Zhao, Bo, Zhang, Bowen, Wang, Liangdong, Liu, Guang, He, Zheqi, Yang, Xi, Liu, Jingjing, Lin, Yonghua, Huang, Tiejun, and Wang, Zhongyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs). In this paper, we introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, we train a single transformer from scratch on a mixture of multimodal sequences. Emu3 outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship models such as SDXL and LLaVA-1.6, while eliminating the need for diffusion or compositional architectures. Emu3 is also capable of generating high-fidelity video via predicting the next token in a video sequence. We simplify complex multimodal model designs by converging on a singular focus: tokens, unlocking great potential for scaling both during training and inference. Our results demonstrate that next-token prediction is a promising path towards building general multimodal intelligence beyond language. We open-source key techniques and models to support further research in this direction., Comment: Project Page: https://emu.baai.ac.cn
Published: 2024

24. DualDn: Dual-domain Denoising via Differentiable ISP

Author: Li, Ruikang, Wang, Yujin, Chen, Shiqi, Zhang, Fan, Gu, Jinwei, and Xue, Tianfan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Image denoising is a critical component in a camera's Image Signal Processing (ISP) pipeline. There are two typical ways to inject a denoiser into the ISP pipeline: applying a denoiser directly to captured raw frames (raw domain) or to the ISP's output sRGB images (sRGB domain). However, both approaches have their limitations. Residual noise from raw-domain denoising can be amplified by the subsequent ISP processing, and the sRGB domain struggles to handle spatially varying noise since it only sees noise distorted by the ISP. Consequently, most raw or sRGB domain denoising works only for specific noise distributions and ISP configurations. To address these challenges, we propose DualDn, a novel learning-based dual-domain denoising. Unlike previous single-domain denoising, DualDn consists of two denoising networks: one in the raw domain and one in the sRGB domain. The raw domain denoising adapts to sensor-specific noise as well as spatially varying noise levels, while the sRGB domain denoising adapts to ISP variations and removes residual noise amplified by the ISP. Both denoising networks are connected with a differentiable ISP, which is trained end-to-end and discarded during the inference stage. With this design, DualDn achieves greater generalizability compared to most learning-based denoising methods, as it can adapt to different unseen noises, ISP parameters, and even novel ISP pipelines. Experiments show that DualDn achieves state-of-the-art performance and can adapt to different denoising architectures. Moreover, DualDn can be used as a plug-and-play denoising module with real cameras without retraining, and still demonstrate better performance than commercial on-camera denoising. The project website is available at: https://openimaginglab.github.io/DualDn/, Comment: Accepted at ECCV 2024, Project page: https://openimaginglab.github.io/DualDn/
Published: 2024

25. Cloud Adversarial Example Generation for Remote Sensing Image Classification

Author: Ma, Fei, Feng, Yuqiang, Zhang, Fan, and Zhou, Yongsheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Most existing adversarial attack methods for remote sensing images merely add adversarial perturbations or patches, resulting in unnatural modifications. Clouds are common atmospheric effects in remote sensing images. Generating clouds on these images can produce adversarial examples better aligning with human perception. In this paper, we propose a Perlin noise based cloud generation attack method. Common Perlin noise based cloud generation is a random, non-optimizable process, which cannot be directly used to attack the target models. We design a Perlin Gradient Generator Network (PGGN), which takes a gradient parameter vector as input and outputs the grids of Perlin noise gradient vectors at different scales. After a series of computations based on the gradient vectors, cloud masks at corresponding scales can be produced. These cloud masks are then weighted and summed depending on a mixing coefficient vector and a scaling factor to produce the final cloud masks. The gradient vector, coefficient vector and scaling factor are collectively represented as a cloud parameter vector, transforming the cloud generation into a black-box optimization problem. The Differential Evolution (DE) algorithm is employed to solve for the optimal solution of the cloud parameter vector, achieving a query-based black-box attack. Detailed experiments confirm that this method has strong attack capabilities and achieves high query efficiency. Additionally, we analyze the transferability of the generated adversarial examples and their robustness in adversarial defense scenarios.
Published: 2024

26. NVRC: Neural Video Representation Compression

Author: Kwan, Ho Man, Gao, Ge, Zhang, Fan, Gower, Andrew, and Bull, David
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, with its parameters compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the best INR-based methods are still out-performed by the latest standard codecs, such as VVC VTM, partially due to the simple model compression techniques employed. In this paper, rather than focusing on representation architectures as in many existing works, we propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC), targeting compression of the representation. Based on the novel entropy coding and quantization models proposed, NVRC, for the first time, is able to optimize an INR-based video codec in a fully end-to-end manner. To further minimize the additional bitrate overhead introduced by the entropy models, we have also proposed a new model compression framework for coding all the network, quantization and entropy model parameters hierarchically. Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs, with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset, measured in PSNR. As far as we are aware, this is the first time an INR-based video codec has achieved such performance. The implementation of NVRC will be released at www.github.com.
Published: 2024

27. EVENet: Evidence-based Ensemble Learning for Uncertainty-aware Brain Parcellation Using Diffusion MRI

Author: Li, Chenjun, Yang, Dian, Yao, Shun, Wang, Shuyue, Wu, Ye, Zhang, Le, Li, Qiannuo, Cho, Kang Ik Kevin, Seitz-Holland, Johanna, Ning, Lipeng, Legarreta, Jon Haitz, Rathi, Yogesh, Westin, Carl-Fredrik, O'Donnell, Lauren J., Sochen, Nir A., Pasternak, Ofer, and Zhang, Fan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: In this study, we developed an Evidence-based Ensemble Neural Network, namely EVENet, for anatomical brain parcellation using diffusion MRI. The key innovation of EVENet is the design of an evidential deep learning framework to quantify predictive uncertainty at each voxel during a single inference. Using EVENet, we obtained accurate parcellation and uncertainty estimates across different datasets from healthy and clinical populations and with different imaging acquisitions. The overall network includes five parallel subnetworks, where each is dedicated to learning the FreeSurfer parcellation for a certain diffusion MRI parameter. An evidence-based ensemble methodology is then proposed to fuse the individual outputs. We perform experimental evaluations on large-scale datasets from multiple imaging sources, including high-quality diffusion MRI data from healthy adults and clinical diffusion MRI data from participants with various brain diseases (schizophrenia, bipolar disorder, attention-deficit/hyperactivity disorder, Parkinson's disease, cerebral small vessel disease, and neurosurgical patients with brain tumors). Compared to several state-of-the-art methods, our experimental results demonstrate highly improved parcellation accuracy across the multiple testing datasets despite the differences in dMRI acquisition protocols and health conditions. Furthermore, thanks to the uncertainty estimation, our EVENet approach demonstrates a good ability to detect abnormal brain regions in patients with lesions, enhancing the interpretability and reliability of the segmentation results., Comment: 15 pages, 5 figures
Published: 2024

28. Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery

Author: Zhang, Fan, Li, Lingling, Jiao, Licheng, Liu, Xu, Liu, Fang, Yang, Shuyuan, and Hou, Biao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Satellite imagery, due to its long-range imaging, brings with it a variety of scale-preferred tasks, such as the detection of tiny/small objects, making the precise localization and detection of small objects of interest a challenging task. In this article, we design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction. Renormalized connection (RC) on the KDN enables "synergistic focusing" of multi-scale features. Based on our observations of KDN, we abstract a class of RCs with different connection strengths, called n21C, and generalize it to FPN-based multi-branch detectors. In a series of FPN experiments on the scale-preferred tasks, we found that the "divide-and-conquer" idea of FPN severely hampers the detector's learning in the right direction due to the large number of large-scale negative samples and interference from background noise. Moreover, these negative samples cannot be eliminated by the focal loss function. The RCs extend the multi-level feature's "divide-and-conquer" mechanism of the FPN-based detectors to a wide range of scale-preferred tasks, and enable synergistic effects of multi-level features on the specific learning goal. In addition, interference activations in two aspects are greatly reduced and the detector learns in a more correct direction. Extensive experiments of 17 well-designed detection architectures embedded with n21s on five different levels of scale-preferred tasks validate the effectiveness and efficiency of the RCs. In particular, the simplest linear form of RC, E421C, performs well in all tasks and satisfies the scaling property of RGT. We hope that our approach will transfer a large number of well-designed detectors from the computer vision community to the remote sensing community., Comment: 24 pages, 14 figures Journal
Published: 2024

29. Transmit Beamforming Design for ISAC with Stacked Intelligent Metasurfaces

Author: Li, Shunyu, Zhang, Fan, Mao, Tianqi, Na, Rui, Wang, Zhaocheng, and Karagiannidis, George K.
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: This paper proposes a transmit beamforming strategy for the integrated sensing and communication (ISAC) systems enabled by the novel stacked intelligent metasurface (SIM) architecture, where the base station (BS) simultaneously performs downlink communication and radar target detection via different beams. To ensure superior dual-function performance simultaneously, we design the multi-layer cascading beamformer by maximizing the sum rate of the users while optimally shaping the normalized beam pattern for detection. A dual-normalized differential gradient descent (D3) algorithm is further proposed to solve the resulting non-convex multi-objective problem (MOP), where gradient differences and dual normalization are employed to ensure a fair trade-off between communication and sensing objectives. Numerical results demonstrate the superiority of the proposed beamforming design in terms of balancing communication and sensing performance.
Published: 2024

30. Affordance-based Robot Manipulation with Flow Matching

Author: Zhang, Fan and Gienger, Michael
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: We present a framework for assistive robot manipulation, which focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge by employing a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. Then we propose to learn robot trajectories guided by affordances in a supervised Flow Matching method. Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. Our extensive evaluation highlights that the proposed prompt tuning method for learning manipulation affordance with language prompter achieves competitive performance and even outperforms other finetuning protocols across data scales, while satisfying parameter efficiency. Learning multi-task robot trajectories with flow matching policy also leads to consistently better generalization performance and faster inference than alternative behavior cloning methods, especially given multimodal robot action distributions. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
Published: 2024

31. PNVC: Towards Practical INR-based Video Compression

Author: Gao, Ge, Kwan, Ho Man, Zhang, Fan, and Bull, David
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are however associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from being deployed in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving nearly 35%+ BD-rate savings against HEVC HM 18.0 (LD) - almost 10% more compared to one of the state-of-the-art INR-based codecs, HiNeRV and 5% more over VTM 20.0 (LD), while maintaining 20+ FPS decoding speeds for 1080p content. This represents an important step forward for INR-based video coding, moving it towards practical deployment. The source code will be available for public evaluation.
Published: 2024

32. Signatures of sliding Wigner crystals in bilayer graphene at zero and finite magnetic fields

Author: Seiler, Anna M., Statz, Martin, Eckel, Christian, Weimer, Isabell, Pöhls, Jonas, Watanabe, Kenji, Taniguchi, Takashi, Zhang, Fan, and Weitz, R. Thomas
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Strongly Correlated Electrons
Abstract: AB-stacked bilayer graphene has emerged as a fascinating yet simple platform for exploring macroscopic quantum phenomena of correlated electrons. Unexpectedly, a phase with negative dR/dT has recently been observed when a large electric displacement field is applied and the charge carrier density is tuned to the vicinity of an ultra-low-density van Hove singularity. This phase exhibits features consistent with Wigner crystallization, including a characteristic temperature dependence and non-linear current bias behavior. However, more direct evidence for the emergence of an electron crystal in AB-stacked bilayer graphene at zero magnetic field remains elusive. Here we explore the low-frequency noise consistent with depinning and sliding of a Wigner crystal lattice. The current bias and frequency dependence of these noise spectra align well with findings from previous experimental and theoretical studies on the quantum electron solids. Our results offer transport signatures consistent with Wigner crystallization in AB-stacked bilayer graphene at zero and finite magnetic fields, paving the way for further substantiating an anomalous Hall crystal in its original form.
Published: 2024

33. When Diffusion MRI Meets Diffusion Model: A Novel Deep Generative Model for Diffusion MRI Generation

Author: Zhu, Xi, Zhang, Wei, Li, Yijie, O'Donnell, Lauren J., and Zhang, Fan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion MRI (dMRI) is an advanced imaging technique characterizing tissue microstructure and white matter structural connectivity of the human brain. The demand for high-quality dMRI data is growing, driven by the need for better resolution and improved tissue contrast. However, acquiring high-quality dMRI data is expensive and time-consuming. In this context, deep generative modeling emerges as a promising solution to enhance image quality while minimizing acquisition costs and scanning time. In this study, we propose a novel generative approach to perform dMRI generation using deep diffusion models. It can generate high dimension (4D) and high resolution data preserving the gradients information and brain structure. We demonstrated our method through an image mapping task aimed at enhancing the quality of dMRI images from 3T to 7T. Our approach demonstrates highly enhanced performance in generating dMRI images when compared to the current state-of-the-art (SOTA) methods. This achievement underscores a substantial progression in enhancing dMRI quality, highlighting the potential of our novel generative approach to revolutionize dMRI imaging standards., Comment: 11 pages, 3 figures
Published: 2024

34. HDN: Hybrid Deep-learning and Non-line-of-sight Reconstruction Framework for Photoacoustic Brain Imaging

Author: Wan, Pengcheng, Zhang, Fan, Shen, Yuting, Shang, Xin, Zhao, Hulin, Liu, Shuangli, Feng, Xiaohua, and Gao, Fei
Subjects: Physics - Medical Physics, Electrical Engineering and Systems Science - Image and Video Processing, Physics - Optics
Abstract: Photoacoustic imaging (PAI) combines the high contrast of optical imaging with the deep penetration depth of ultrasonic imaging, showing great potential in cerebrovascular disease detection. However, the ultrasonic wave suffers strong attenuation and multi-scattering when it passes through the skull tissue, resulting in the distortion of the collected photoacoustic (PA) signal. In this paper, inspired by the principles of deep learning and non-line-of-sight (NLOS) imaging, we propose an image reconstruction framework named HDN (Hybrid Deep-learning and Non-line-of-sight), which consists of the signal extraction part and difference utilization part. The signal extraction part is used to correct the distorted signal and reconstruct an initial image. The difference utilization part is used to make further use of the signal difference between the distorted signal and corrected signal, reconstructing the residual image between the initial image and the target image. The test results on a PA digital brain simulation dataset show that compared with the traditional delay-and-sum (DAS) method and deep-learning-based method, HDN achieved superior performance in both signal correction and image reconstruction. Specifically for the SSIM index, the HDN reached 0.606 in imaging results, compared to 0.154 for the DAS method and 0.307 for the deep-learning-based method., Comment: 8 pages, 8 figures
Published: 2024

35. A dynamical systems perspective on the celestial mechanical contribution to the emergence of life

Author: Zhang, Fan
Subjects: Nonlinear Sciences - Chaotic Dynamics
Abstract: Biological activities are often seen entrained onto the day-night and other celestial mechanical cycles (e.g., seasonal and lunar), but studies on the origin of life have largely not accounted for such periodic external environmental variations. We argue that this may be an important omission, because the signature replication behaviour of life represents temporal memory in the dynamics of ecosystems, that signifies the absence of mixing properties (i.e., the dynamics are not fully chaotic), and entrainment onto regular, periodic external perturbative influences has been proven capable of suppressing chaos, and thus may bring otherwise unstable chemical reaction sets into viability, as precursors to abiogenesis. As well, external perturbations may be necessary to prevent an open dissipative (bio)chemical system from collapsing into the opposite extreme -- the point attractor of thermal equilibrium. In short, life may precariously rest on the edge of chaos, and open-loop periodic perturbation rooted in celestial mechanics (and should be simulated in laboratory experiments in origin-of-life studies) may help with the balancing. Such considerations, if pertinent, would also be consequential to exobiology, e.g., in regard to tidal-locking properties of potential host worlds., Comment: 6 pages
Published: 2024

36. Diverse Impacts of Spin-Orbit Coupling on Superconductivity in Rhombohedral Graphene

Author: Yang, Jixiang, Shi, Xiaoyan, Ye, Shenyong, Yoon, Chiho, Lu, Zhengguang, Kakani, Vivek, Han, Tonghang, Seo, Junseok, Shi, Lihan, Watanabe, Kenji, Taniguchi, Takashi, Zhang, Fan, and Ju, Long
Subjects: Condensed Matter - Superconductivity, Condensed Matter - Strongly Correlated Electrons
Abstract: Engineering non-Abelian quasiparticles by combining superconductivity and topological states has been proposed as a route to realize topological quantum computation. Rhombohedral multilayer graphene with layer number N>=3 has been shown as a promising platform, as it hosts integer and fractional quantum anomalous Hall effects when proximitized by transition metal dichalcogenide (TMD) and a moiré potential. However, superconductivity in similar devices has remained largely unexplored, although proximitized spin-orbit-coupling (SOC) effect has been shown to strengthen or induce superconductivity in both crystalline and twisted graphene. Here we report electron transport measurements of TMD-proximitized rhombohedral trilayer graphene (RTG) at temperatures down to 40 mK. We observed a new hole-doped superconducting state SC4 with a transition temperature Tc of 230 mK. On the electron-doped side, we identified a new isospin-symmetry breaking three-quarter-metal (TQM) phase. Near this three-quarter-metal state, the state SC3, very weak in bare RTG, is fully developed into a superconducting state at 110 mK. By performing fermiology analysis based on the quantum oscillation measurement, we showed that the SC3 and SC4 states reside at the phase boundaries between different isospin-symmetry-breaking states. These observations are aligned with the existing understanding that SOC enhances graphene superconductivity. Surprisingly, the original superconducting state SC1 in bare RTG is strongly suppressed in the presence of TMD, and we cannot find it down to the base temperature of our measurement. Our observations form the basis of exploring superconductivity and non-Abelian quasiparticles in rhombohedral graphene devices, and provide experimental evidence that challenges the understanding of the impacts of SOC on graphene superconductivity., Comment: 35 pages; 4 figures, 1 table, 13 extended data figures
Published: 2024

37. BVI-UGC: A Video Quality Database for User-Generated Content Transcoding

Author: Qi, Zihao, Feng, Chen, Zhang, Fan, Xu, Xiaozhong, Liu, Shan, and Bull, David
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent years, user-generated content (UGC) has become one of the major video types consumed via streaming networks. Numerous research contributions have focused on assessing its visual quality through subjective tests and objective modeling. In most cases, objective assessments are based on a no-reference scenario, where the corresponding reference content is assumed not to be available. However, full-reference video quality assessment is also important for UGC in the delivery pipeline, particularly associated with the video transcoding process. In this context, we present a new UGC video quality database, BVI-UGC, for user-generated content transcoding, which contains 60 (non-pristine) reference videos and 1,080 test sequences. In this work, we simulated the creation of non-pristine reference sequences (with a wide range of compression distortions), typical of content uploaded to UGC platforms for transcoding. A comprehensive crowdsourced subjective study was then conducted involving more than 3,500 human participants. Based on this collected subjective data, we benchmarked the performance of 10 full-reference and 11 no-reference quality metrics. Our results demonstrate the poor performance (SROCC values are lower than 0.6) of these metrics in predicting the perceptual quality of UGC in two different scenarios (with or without a reference)., Comment: 12 pages, 11 figures
Published: 2024

38. Non-Hermitian Singularities in Scattering Spectra of Mie Resonators

Author: Zhang, Fan, Solodovchenko, Nikolay S., Fan, Hangkai, Limonov, Mikhail F., Song, Mingzhao, Kivshar, Yuri S., and Bogdanov, Andrey A.
Subjects: Physics - Classical Physics, Physics - Optics
Abstract: Non-Hermitian systems are known to possess unique singularities in the scattering spectra such as exceptional points, bound states in the continuum, diabolic points, and anapole states, which are usually considered to be independent. Here, we demonstrate the fundamental relationships between non-Hermitian singularities and observe them experimentally in the scattering spectra. We reveal that exceptional points appear in the anapole regime, and diabolic points are associated with superscattering. We confirm our findings with microwave experiments by measuring the scattering spectra of subwavelength Mie-resonant ceramic rings. Our study underpins the generic behavior of non-Hermitian singularities in the scattering spectra of subwavelength resonators, uncovering their novel applications in non-Hermitian nonlinear optics and topological photonics., Comment: 19 pages, 5 figures
Published: 2024

39. Mesh deformation-based single-view 3D reconstruction of thin eyeglasses frames with differentiable rendering

Author: Zhang, Fan, Ji, Ziyue, Kang, Weiguang, Li, Weiqing, and Su, Zhiyong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: With the support of Virtual Reality (VR) and Augmented Reality (AR) technologies, the 3D virtual eyeglasses try-on application is well on its way to becoming a new trending solution that offers a "try on" option to select the perfect pair of eyeglasses at the comfort of your own home. Reconstructing eyeglasses frames from a single image with traditional depth and image-based methods is extremely difficult due to their unique characteristics such as lack of sufficient texture features, thin elements, and severe self-occlusions. In this paper, we propose the first mesh deformation-based reconstruction framework for recovering high-precision 3D full-frame eyeglasses models from a single RGB image, leveraging prior and domain-specific knowledge. Specifically, based on the construction of a synthetic eyeglasses frame dataset, we first define a class-specific eyeglasses frame template with pre-defined keypoints. Then, given an input eyeglasses frame image with thin structure and few texture features, we design a keypoint detector and refiner to detect predefined keypoints in a coarse-to-fine manner to estimate the camera pose accurately. After that, using differentiable rendering, we propose a novel optimization approach for producing correct geometry by progressively performing free-form deformation (FFD) on the template mesh. We define a series of loss functions to enforce consistency between the rendered result and the corresponding RGB input, utilizing constraints from inherent structure, silhouettes, keypoints, per-pixel shading information, and so on. Experimental results on both the synthetic dataset and real images demonstrate the effectiveness of the proposed algorithm.
Published: 2024

40. Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration

Author: Teng, Siyue, Jiang, Yuxuan, Gao, Ge, Zhang, Fan, Davis, Thomas, Liu, Zoe, and Bull, David
Subjects: Computer Science - Multimedia, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Recent advances in video compression have seen significant coding performance improvements with the development of new standards and learning-based video codecs. However, most of these works focus on application scenarios that allow a certain amount of system delay (e.g., Random Access mode in MPEG codecs), which is not always acceptable for live delivery. This paper conducts a comparative study of state-of-the-art conventional and learned video coding methods based on a low delay configuration. Specifically, this study includes two MPEG standard codecs (H.266/VVC VTM and JVET ECM), two AOM codecs (AV1 libaom and AVM), and two recent neural video coding models (DCVC-DC and DCVC-FM). To allow a fair and meaningful comparison, the evaluation was performed on test sequences defined in the AOM and MPEG common test conditions in the YCbCr 4:2:0 color space. The evaluation results show that the JVET ECM codecs offer the best overall coding performance among all codecs tested, with a 16.1% (based on PSNR) average BD-rate saving over AOM AVM, and 11.0% over DCVC-FM. We also observed inconsistent performance with the learned video codecs, DCVC-DC and DCVC-FM, for test content with large background motions.
Published: 2024

41. BVI-AOM: A New Training Dataset for Deep Video Compression Optimization

Author: Nawała, Jakub, Jiang, Yuxuan, Zhang, Fan, Zhu, Xiaoqing, Sole, Joel, and Bull, David
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Deep learning is now playing an important role in enhancing the performance of conventional hybrid video codecs. These learning-based methods typically require diverse and representative training material for optimization in order to achieve model generalization and optimal coding performance. However, existing datasets either offer limited content variability or come with restricted licensing terms constraining their use to research purposes only. To address these issues, we propose a new training dataset, named BVI-AOM, which contains 956 uncompressed sequences at various resolutions from 270p to 2160p, covering a wide range of content and texture types. The dataset comes with more flexible licensing terms and offers competitive performance when used as a training set for optimizing deep video coding tools. The experimental results demonstrate that when used as a training set to optimize two popular network architectures for two different coding tools, the proposed dataset leads to additional bitrate savings of up to 0.29 and 2.98 percentage points in terms of PSNR-Y and VMAF, respectively, compared to an existing training dataset, BVI-DVC, which has been widely used for deep video coding. The BVI-AOM dataset is available at https://github.com/fan-aaron-zhang/bvi-aom, Comment: 5 pages, 5 figures. Swapped the PSNR-HVS plot in Fig. 3 for a PSNR-YUV plot. Updated Fig. 3 (SI/TI/CF plots) and added the URL to the dataset
Published: 2024

42. Field-Tunable Valley Coupling and Localization in a Dodecagonal Semiconductor Quasicrystal

Author: Liu, Zhida, Gao, Qiang, Li, Yanxing, Liu, Xiaohui, Zhang, Fan, Kim, Dong Seob, Ni, Yue, Mackenzie, Miles, Abudayyeh, Hamza, Watanabe, Kenji, Taniguchi, Takashi, Shih, Chih-Kang, Khalaf, Eslam, and Li, Xiaoqin
Subjects: Condensed Matter - Materials Science, Physics - Optics
Abstract: Quasicrystals are characterized by atomic arrangements possessing long-range order without periodicity. Van der Waals (vdW) bilayers provide a unique opportunity to controllably vary atomic alignment between two layers from a periodic moiré crystal to an aperiodic quasicrystal. Here, we reveal a remarkable consequence of the unique atomic arrangement in a dodecagonal WSe2 quasicrystal: the K and Q valleys in separate layers are brought arbitrarily close in momentum space via higher-order Umklapp scatterings. A modest perpendicular electric field is sufficient to induce strong interlayer K-Q hybridization, manifested as a new hybrid excitonic doublet. Concurrently, we observe the disappearance of the trion resonance and attribute it to quasicrystal potential driven localization. Our findings highlight the remarkable attribute of incommensurate systems to bring any pair of momenta into close proximity, thereby introducing a novel aspect to valley engineering., Comment: 12 pages, 12 figures
Published: 2024

43. DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework

Author: Zhang, Fan, Ji, Naye, Gao, Fuxing, Zhao, Bozuo, Wu, Jingmei, Jiang, Yanbing, Du, Hui, Ye, Zhenqing, Zhu, Jiayang, Zhong, WeiFan, Yan, Leyao, and Ma, Xiaomeng
Subjects: Computer Science - Graphics, Computer Science - Artificial Intelligence, Computer Science - Robotics, Computer Science - Sound
Abstract: Speech-driven gesture generation is an emerging domain within virtual human creation, where current methods predominantly utilize Transformer-based architectures that necessitate extensive memory and are characterized by slow inference speeds. In response to these limitations, we propose DiM-Gestures, a novel end-to-end generative model crafted to create highly personalized 3D full-body gestures solely from raw speech audio, employing Mamba-based architectures. This model integrates a Mamba-based fuzzy feature extractor with a non-autoregressive Adaptive Layer Normalization (AdaLN) Mamba-2 diffusion architecture. The extractor, leveraging a Mamba framework and a WavLM pre-trained model, autonomously derives implicit, continuous fuzzy features, which are then unified into a singular latent feature. This feature is processed by the AdaLN Mamba-2, which implements a uniform conditional mechanism across all tokens to robustly model the interplay between the fuzzy features and the resultant gesture sequence. This innovative approach guarantees high fidelity in gesture-speech synchronization while maintaining the naturalness of the gestures. Employing a diffusion model for training and inference, our framework has undergone extensive subjective and objective evaluations on the ZEGGS and BEAT datasets. These assessments substantiate our model's enhanced performance relative to contemporary state-of-the-art methods, demonstrating competitive outcomes with the DiTs architecture (Persona-Gestors) while optimizing memory usage and accelerating inference speed., Comment: 10 pages, 10 figures. arXiv admin note: text overlap with arXiv:2403.10805
Published: 2024

44. Diffusion Feedback Helps CLIP See Better

Author: Wang, Wenxuan, Sun, Quan, Zhang, Fan, Tang, Yepeng, Liu, Jing, and Wang, Xinlong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings: it can hardly distinguish orientation, quantity, color, structure, etc. These visual shortcomings also limit the perception capabilities of multimodal large language models (MLLMs) built on CLIP. The main reason could be that the image-text pairs used to train CLIP are inherently biased, due to the lack of distinctiveness of the text and diversity of the images. In this work, we present a simple post-training approach for CLIP models, which largely overcomes its visual shortcomings via a self-supervised diffusion process. We introduce DIVA, which uses the DIffusion model as a Visual Assistant for CLIP. Specifically, DIVA leverages generative feedback from text-to-image diffusion models to optimize CLIP representations, with only images (without corresponding text). We demonstrate that DIVA improves CLIP's performance on the challenging MMVP-VLM benchmark which assesses fine-grained visual abilities to a large extent (e.g., 3-7%), and enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks. Extensive evaluation on 29 image classification and retrieval benchmarks confirms that our framework preserves CLIP's strong zero-shot capabilities. The code is available at https://github.com/baaivision/DIVA.
Published: 2024

45. White Matter Geometry-Guided Score-Based Diffusion Model for Tissue Microstructure Imputation in Tractography Imaging

Author: Lo, Yui, Chen, Yuqian, Zhang, Fan, Liu, Dongnan, Zekelman, Leo, Cetin-Karayumak, Suheyla, Rathi, Yogesh, Cai, Weidong, and O'Donnell, Lauren J.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Parcellation of white matter tractography provides anatomical features for disease prediction, anatomical tract segmentation, surgical brain mapping, and non-imaging phenotype classifications. However, parcellation does not always reach 100% accuracy due to various factors, including inter-individual anatomical variability and the quality of neuroimaging scan data. The failure to identify parcels causes a problem of missing microstructure data values, which is especially challenging for downstream tasks that analyze large brain datasets. In this work, we propose a novel deep-learning model to impute tissue microstructure: the White Matter Geometry-guided Diffusion (WMG-Diff) model. Specifically, we first propose a deep score-based guided diffusion model to impute tissue microstructure for diffusion magnetic resonance imaging (dMRI) tractography fiber clusters. Second, we propose a white matter atlas geometric relationship-guided denoising function to guide the reverse denoising process at the subject-specific level. Third, we train and evaluate our model on a large dataset with 9342 subjects. Comprehensive experiments for tissue microstructure imputation and a downstream non-imaging phenotype prediction task demonstrate that our proposed WMG-Diff outperforms the compared state-of-the-art methods in both error and accuracy metrics. Our code will be available at: https://github.com/SlicerDMRI/WMG-Diff., Comment: This paper has been accepted for presentation at The 31st International Conference on Neural Information Processing (ICONIP 2024). 12 pages, 3 figures, 2 tables
Published: 2024

46. CrudiTEE: A Stick-and-Carrot Approach to Building Trustworthy Cryptocurrency Wallets with TEEs

Author: Zhou, Lulu, Liu, Zeyu, Zhang, Fan, and Reiter, Michael K.
Subjects: Computer Science - Cryptography and Security
Abstract: Cryptocurrency introduces usability challenges by requiring users to manage signing keys. Popular signing key management services (e.g., custodial wallets), however, either introduce a trusted party or burden users with managing signing key shares, posing the same usability challenges. TEEs (Trusted Execution Environments) are a promising technology to avoid both, but practical implementations of TEEs suffer from various side-channel attacks that have proven hard to eliminate. This paper explores a new approach to side-channel mitigation through economic incentives for TEE-based cryptocurrency wallet solutions. By taking the cost and profit of side-channel attacks into consideration, we designed a Stick-and-Carrot-based cryptocurrency wallet, CrudiTEE, that leverages penalties (the stick) and rewards (the carrot) to disincentivize attackers from exfiltrating signing keys in the first place. We model the attacker's behavior using a Markov Decision Process (MDP) to evaluate the effectiveness of the bounty and enable the service provider to adjust the parameters of the bounty's reward function accordingly.
Published: 2024

47. Deep multimodal saliency parcellation of cerebellar pathways: linking microstructure and individual function through explainable multitask learning

Author: Tchetchenian, Ari, Zekelman, Leo, Chen, Yuqian, Rushmore, Jarrett, Zhang, Fan, Yeterian, Edward H., Makris, Nikos, Rathi, Yogesh, Meijering, Erik, Song, Yang, and O'Donnell, Lauren J.
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Machine Learning
Abstract: Parcellation of human cerebellar pathways is essential for advancing our understanding of the human brain. Existing diffusion MRI tractography parcellation methods have been successful in defining major cerebellar fibre tracts, while relying solely on fibre tract structure. However, each fibre tract may relay information related to multiple cognitive and motor functions of the cerebellum. Hence, it may be beneficial for parcellation to consider the potential importance of the fibre tracts for individual motor and cognitive functional performance measures. In this work, we propose a multimodal data-driven method for cerebellar pathway parcellation, which incorporates both measures of microstructure and connectivity, and measures of individual functional performance. Our method involves first training a multitask deep network to predict various cognitive and motor measures from a set of fibre tract structural features. The importance of each structural feature for predicting each functional measure is then computed, resulting in a set of structure-function saliency values that are clustered to parcellate cerebellar pathways. We refer to our method as Deep Multimodal Saliency Parcellation (DeepMSP), as it computes the saliency of structural measures for predicting cognitive and motor functional performance, with these saliencies being applied to the task of parcellation. Applying DeepMSP we found that it was feasible to identify multiple cerebellar pathway parcels with unique structure-function saliency patterns that were stable across training folds.
Published: 2024

48. AGORA: Open More and Trust Less in Binary Verification Service

Author: Chen, Hongbo, Zhou, Quan, Yang, Sen, Han, Xing, Zhang, Fan, Zhang, Danfeng, and Wang, Xiaofeng
Subjects: Computer Science - Cryptography and Security
Abstract: Binary verification plays a pivotal role in software security, yet building a verification service that is both open and trustworthy poses a formidable challenge. In this paper, we introduce a novel binary verification service, AGORA, scrupulously designed to overcome the challenge. At the heart of this approach lies a strategic insight: certain tasks can be delegated to untrusted entities, while the corresponding validators are securely housed within the trusted computing base (TCB). AGORA can validate untrusted assertions generated for versatile policies. Through a novel blockchain-based bounty task manager, it also utilizes crowdsourcing to remove trust in theorem provers. These synergistic techniques successfully ameliorate the TCB size burden associated with two procedures: binary analysis and theorem proving. The design of AGORA allows untrusted parties to participate in these complex processes. Moreover, based on running the optimized TCB within trusted execution environments and recording the verification process on a blockchain, the public can audit the correctness of verification results. By implementing verification workflows for software-based fault isolation policy and side-channel mitigation, our evaluation demonstrates the efficacy of AGORA.
Published: 2024

49. Cluster Sliding Ferroelectricity in Trilayer Quasi-Hexagonal C60

Author: Wang, Xuefei, Ren, Yanhan, Qiu, Shi, Zhang, Fan, Li, Xueao, Gao, Junfeng, Gao, Weiwei, and Zhao, Jijun
Subjects: Condensed Matter - Materials Science, Physics - Computational Physics
Abstract: Electric polarization typically originates from non-centrosymmetric charge distributions. Since chemical bonds between atoms of the same elements favor centrosymmetric crystal structures and symmetrically distributed electron charges, elemental ferroelectrics are extremely rare. In comparison to atoms, elemental clusters are less symmetric and typically have various preferred orientations in crystals. Consequently, the assembly of clusters with different orientations tends to break the inversion symmetry. Based on this concept, we show that sliding ferroelectricity naturally emerges in trilayer quasi-hexagonal phase (qHP) C60, a cluster-assembled carbon allotrope recently synthesized. Trilayer qHP C60's have several stable polar structures, which are distinguishable in second-harmonic generation (SHG) responses. Compared to previously found elemental ferroelectrics, trilayer qHP C60's have sizable band gaps and some of them have both switchable out-of-plane and in-plane polarizations. Remarkably, the out-of-plane and in-plane polarizations are decoupled, enabling an easy-to-implement construction of Van der Waals homostructures with ferroelectrically switchable chirality., Comment: 5 figures
Published: 2024

50. Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

Author: Chen, Zhuo, Liu, Jiawei, Liu, Haotan, Cheng, Qikai, Zhang, Fan, Lu, Wei, and Liu, Xiaozhong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) models when faced with black-box attacks for opinion manipulation. We explore the impact of such attacks on user cognition and decision-making, providing new insight to enhance the reliability and security of RAG models. We manipulate the ranking results of the retrieval model in RAG with instructions and use these results as data to train a surrogate model. By applying adversarial retrieval attack methods to the surrogate model, black-box transfer attacks on RAG are further realized. Experiments conducted on opinion datasets across multiple topics show that the proposed attack strategy can significantly alter the opinion polarity of the content generated by RAG. This demonstrates the model's vulnerability and, more importantly, reveals the potential negative impact on user cognition and decision-making, making it easier to mislead users into accepting incorrect or biased information., Comment: 10 pages, 3 figures, under review
Published: 2024
