17,668 results for "Wang, Fan"
Search Results
2. How Late Imperial Chinese Literati Read Their Books: Inscribing, Collating, Excerpting
- Author
- Wang, Fan
- Published
- 2021
3. TopoTxR: A topology-guided deep convolutional network for breast parenchyma learning on DCE-MRIs
- Author
- Wang, Fan; Zou, Zhilin; Sakla, Nicole; Partyka, Luke; Rawal, Nil; Singh, Gagandeep; Zhao, Wei; Ling, Haibin; Huang, Chuan; Prasanna, Prateek; Chen, Chao
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Characterization of breast parenchyma in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a challenging task owing to the complexity of underlying tissue structures. Existing quantitative approaches, like radiomics and deep learning models, lack explicit quantification of intricate and subtle parenchymal structures, including fibroglandular tissue. To address this, we propose a novel topological approach that explicitly extracts multi-scale topological structures to better approximate breast parenchymal structures, and then incorporates these structures into a deep-learning-based prediction model via an attention mechanism. Our topology-informed deep learning model, \emph{TopoTxR}, leverages topology to provide enhanced insights into tissues critical for disease pathophysiology and treatment response. We empirically validate \emph{TopoTxR} using the VICTRE phantom breast dataset, showing that the topological structures extracted by our model effectively approximate the breast parenchymal structures. We further demonstrate \emph{TopoTxR}'s efficacy in predicting response to neoadjuvant chemotherapy. Our qualitative and quantitative analyses suggest differential topological behavior of breast tissue in treatment-na\"ive imaging, in patients who respond favorably to therapy as achieving pathological complete response (pCR) versus those who do not. In a comparative analysis with several baselines on the publicly available I-SPY 1 dataset (N=161, including 47 patients with pCR and 114 without) and the Rutgers proprietary dataset (N=120, with 69 patients achieving pCR and 51 not), \emph{TopoTxR} demonstrates a notable improvement, achieving a 2.6\% increase in accuracy and a 4.6\% enhancement in AUC compared to the state-of-the-art method., Comment: 22 pages, 8 figures, 8 tables, accepted by Medical Image Analysis ( https://www.sciencedirect.com/science/article/abs/pii/S1361841524002986 )
- Published
- 2024
4. MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media
- Author
- Zhai, Wei; Bai, Nan; Zhao, Qing; Li, Jianqiang; Wang, Fan; Qi, Hongzhi; Jiang, Meng; Wang, Xiaoqin; Yang, Bing Xiang; Fu, Guanghui
- Subjects
- Computer Science - Computation and Language
- Abstract
With the increasing prevalence of mental health challenges, social media has emerged as a key platform for individuals to express their emotions. Deep learning offers a promising approach for analyzing mental health on social media. However, black box models are often inflexible when switching between tasks, and their results typically lack explanations. With the rise of large language models (LLMs), their flexibility has introduced new approaches to the field. Moreover, owing to their generative nature, they can be prompted to explain decision-making processes. However, their performance on complex psychological analysis still lags behind deep learning. In this paper, we introduce the first multi-task Chinese Social Media Interpretable Mental Health Instructions (C-IMHI) dataset, consisting of 9K samples, which has been quality-controlled and manually validated. We also propose the MentalGLM series, the first open-source LLMs designed for explainable mental health analysis targeting Chinese social media, trained on a corpus of 50K instructions. The proposed models were evaluated on three downstream tasks and achieved better or comparable performance compared to deep learning models, generalized LLMs, and task fine-tuned LLMs. We validated a portion of the generated decision explanations with experts, showing promising results. We also evaluated the proposed models on a clinical dataset, where they outperformed other LLMs, indicating their potential applicability in the clinical field. Our models show strong performance, validated across tasks and perspectives. The decision explanations enhance usability and facilitate better understanding and practical application of the models. Both the constructed dataset and the models are publicly available via: https://github.com/zwzzzQAQ/MentalGLM.
- Published
- 2024
5. Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning
- Author
- Yin, Qingyu; He, Xuzheng; Deng, Luoao; Leong, Chak Tou; Wang, Fan; Yan, Yanzhao; Shen, Xiaoyu; Zhang, Qiang
- Subjects
- Computer Science - Machine Learning; Computer Science - Computation and Language
- Abstract
Fine-tuning and in-context learning (ICL) are two prevalent methods for imbuing large language models with task-specific knowledge. It is commonly believed that fine-tuning can surpass ICL given sufficient training samples, as it allows the model to adjust its internal parameters based on the data. However, this paper presents a counterintuitive finding: for tasks with implicit patterns, ICL captures these patterns significantly better than fine-tuning. We developed several datasets featuring implicit patterns, such as sequences determining answers through parity or identifying reducible terms in calculations. We then evaluated the models' understanding of these patterns under both fine-tuning and ICL across models ranging from 0.5B to 7B parameters. The results indicate that models employing ICL can quickly grasp deep patterns and significantly improve accuracy. In contrast, fine-tuning, despite utilizing thousands of times more training samples than ICL, achieved only limited improvements. We also propose a circuit-shift theory from a mechanistic-interpretability view to explain why ICL wins., Comment: EMNLP'24 Findings
- Published
- 2024
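The implicit-pattern tasks this abstract describes (e.g., answers determined by the parity of a sequence) can be sketched as prompt-construction code. The function names and prompt format below are illustrative assumptions, not the authors' released implementation:

```python
import random

def make_parity_example(length=8, rng=None):
    """One implicit-pattern sample: the answer is the parity of the
    number of 1s in the bit sequence (the rule is never stated)."""
    rng = rng or random.Random(0)
    bits = [rng.randint(0, 1) for _ in range(length)]
    label = "yes" if sum(bits) % 2 == 1 else "no"
    return " ".join(map(str, bits)), label

def build_icl_prompt(n_shots=4, seed=0):
    """Standard ICL prompt: n_shots labelled demonstrations followed by
    an unlabelled query; the model must infer the parity rule itself."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_shots):
        seq, label = make_parity_example(rng=rng)
        lines.append(f"Input: {seq} -> Answer: {label}")
    query, gold = make_parity_example(rng=rng)
    lines.append(f"Input: {query} -> Answer:")
    return "\n".join(lines), gold
```

Under fine-tuning, the same examples would instead be used as training pairs; the paper's finding is that the prompted (ICL) presentation recovers the hidden rule far more reliably.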
6. Dynamic Diffusion Transformer
- Author
- Zhao, Wangbo; Han, Yizeng; Tang, Jiasheng; Wang, Kai; Song, Yibing; Huang, Gao; Wang, Fan; You, Yang
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Diffusion Transformer (DiT), an emerging diffusion model for image generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs stem from the static inference paradigm, which inevitably introduces redundant computation in certain diffusion timesteps and spatial regions. To address this inefficiency, we propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions during generation. Specifically, we introduce a Timestep-wise Dynamic Width (TDW) approach that adapts model width conditioned on the generation timesteps. In addition, we design a Spatial-wise Dynamic Token (SDT) strategy to avoid redundant computation at unnecessary spatial locations. Extensive experiments on various datasets and different-sized models verify the superiority of DyDiT. Notably, with <3% additional fine-tuning iterations, our method reduces the FLOPs of DiT-XL by 51%, accelerates generation by 1.73×, and achieves a competitive FID score of 2.07 on ImageNet. The code is publicly available at https://github.com/NUS-HPC-AI-Lab/Dynamic-Diffusion-Transformer.
- Published
- 2024
7. AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status
- Author
- Zhang, Jinghao; Qian, Wen; Luo, Hao; Wang, Fan; Zhao, Feng
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Diffusion models have made compelling progress in facilitating high-throughput daily production. Nevertheless, appealing customization requirements still suffer from instance-level fine-tuning to achieve authentic fidelity. Prior zero-shot customization works achieve semantic consistency through the condensed injection of identity features, while addressing detailed low-level signatures through complex model configurations and subject-specific fabrications, which significantly break the statistical coherence within the overall system and limit applicability across various scenarios. To facilitate generic signature concentration with rectified efficiency, we present \textbf{AnyLogo}, a zero-shot region customizer with remarkable detail consistency, built upon a symbiotic diffusion system with cumbersome designs eliminated. Streamlined as vanilla image generation, we discern that rigorous signature extraction and creative content generation are promisingly compatible and can be systematically recycled within a single denoising model. In place of external configurations, the gemini status of the denoising model promotes reinforced subject transmission efficiency and a disentangled semantic-signature space with continuous signature decoration. Moreover, a sparse recycling paradigm is adopted to prevent duplication risk with a compressed transmission quota for diversified signature stimulation. Extensive experiments on constructed logo-level benchmarks demonstrate the effectiveness and practicability of our method., Comment: 13 pages, 12 figures
- Published
- 2024
8. RealisDance: Equip controllable character animation with realistic hands
- Author
- Zhou, Jingkai; Wang, Benzhi; Chen, Weihua; Bai, Jingqi; Li, Dongyang; Zhang, Aixi; Xu, Hao; Yang, Mingyang; Wang, Fan
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Controllable character animation is an emerging task that generates character videos controlled by pose sequences from given character images. Although character consistency has made significant progress via reference UNet, another crucial factor, pose control, has not been well studied by existing methods yet, resulting in several issues: 1) The generation may fail when the input pose sequence is corrupted. 2) The hands generated using the DWPose sequence are blurry and unrealistic. 3) The generated video will be shaky if the pose sequence is not smooth enough. In this paper, we present RealisDance to handle all the above issues. RealisDance adaptively leverages three types of poses, avoiding failed generation caused by corrupted pose sequences. Among these pose types, HaMeR provides accurate 3D and depth information of hands, enabling RealisDance to generate realistic hands even for complex gestures. Besides using temporal attention in the main UNet, RealisDance also inserts temporal attention into the pose guidance network, smoothing the video from the pose condition aspect. Moreover, we introduce pose shuffle augmentation during training to further improve generation robustness and video smoothness. Qualitative experiments demonstrate the superiority of RealisDance over other existing methods, especially in hand quality., Comment: Technical Report
- Published
- 2024
9. RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images
- Author
- Wang, Benzhi; Zhou, Jingkai; Bai, Jingqi; Yang, Yang; Chen, Weihua; Wang, Fan; Lei, Zhen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
In recent years, diffusion models have revolutionized visual generation, outperforming traditional frameworks like Generative Adversarial Networks (GANs). However, generating images of humans with realistic semantic parts, such as hands and faces, remains a significant challenge due to their intricate structural complexity. To address this issue, we propose a novel post-processing solution named RealisHuman. The RealisHuman framework operates in two stages. First, it generates realistic human parts, such as hands or faces, using the original malformed parts as references, ensuring consistent details with the original image. Second, it seamlessly integrates the rectified human parts back into their corresponding positions by repainting the surrounding areas to ensure smooth and realistic blending. The RealisHuman framework significantly enhances the realism of human generation, as demonstrated by notable improvements in both qualitative and quantitative metrics. Code is available at https://github.com/Wangbenzhi/RealisHuman.
- Published
- 2024
10. LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs
- Author
- Park, Chansung; Jiang, Juyong; Wang, Fan; Paul, Sayak; Tang, Jing
- Subjects
- Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
The widespread adoption of cloud-based proprietary large language models (LLMs) has introduced significant challenges, including operational dependencies, privacy concerns, and the necessity of continuous internet connectivity. In this work, we introduce an LLMOps pipeline, "LlamaDuo", for the seamless migration of knowledge and abilities from service-oriented LLMs to smaller, locally manageable models. This pipeline is crucial for ensuring service continuity in the presence of operational failures, strict privacy policies, or offline requirements. LlamaDuo involves fine-tuning a small language model against the service LLM using a synthetic dataset generated by the latter. If the performance of the fine-tuned model falls short of expectations, it is enhanced by further fine-tuning with additional similar data created by the service LLM. This iterative process guarantees that the smaller model can eventually match or even surpass the service LLM's capabilities in specific downstream tasks, offering a practical and scalable solution for managing AI deployments in constrained environments. Extensive experiments with leading-edge LLMs are conducted to demonstrate the effectiveness, adaptability, and affordability of LlamaDuo across various downstream tasks. Our pipeline implementation is available at https://github.com/deep-diver/llamaduo., Comment: 28 pages, 18 figures, 6 tables
- Published
- 2024
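The iterative "fine-tune, evaluate, synthesize more data" process this abstract describes can be sketched as a simple control loop. Every function name, the score scale, and the batch size below are illustrative assumptions, not the LlamaDuo API:

```python
def distill_until_good_enough(synthesize, fine_tune, evaluate, model,
                              target=0.8, max_rounds=5):
    """Sketch of the LlamaDuo-style loop: fine-tune the small model on
    service-LLM synthetic data, check its score, and request more
    similar data until performance meets expectations."""
    dataset = synthesize(n=100)              # initial synthetic set from the service LLM
    score = 0.0
    for _ in range(max_rounds):
        model = fine_tune(model, dataset)    # align the local model with the service LLM
        score = evaluate(model)
        if score >= target:                  # meets expectations: stop distilling
            break
        dataset += synthesize(n=100)         # otherwise ask the service LLM for more data
    return model, score
```

The loop terminates either when the fine-tuned model reaches the target score on the downstream task or when the round budget is exhausted, matching the pipeline's stated goal of eventually matching the service LLM on a specific task.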
11. MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing
- Author
- Cao, Chenjie; Yu, Chaohui; Wang, Fan; Xue, Xiangyang; Fu, Yanwei
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Novel View Synthesis (NVS) and 3D generation have recently achieved prominent improvements. However, these works mainly focus on confined categories or synthetic 3D assets, which struggle to generalize to challenging in-the-wild scenes and cannot be employed directly with 2D synthesis. Moreover, these methods heavily depend on camera poses, limiting their real-world applications. To overcome these issues, we propose MVInpainter, which re-formulates 3D editing as a multi-view 2D inpainting task. Specifically, MVInpainter partially inpaints multi-view images with reference guidance rather than intractably generating an entirely novel view from scratch, which greatly reduces the difficulty of in-the-wild NVS and leverages unmasked clues instead of explicit pose conditions. To ensure cross-view consistency, MVInpainter is enhanced by video priors from motion components and appearance guidance from concatenated reference key&value attention. Furthermore, MVInpainter incorporates slot attention to aggregate high-level optical flow features from unmasked regions to control camera movement with pose-free training and inference. Extensive scene-level experiments on both object-centric and forward-facing datasets verify the effectiveness of MVInpainter across diverse tasks, such as multi-view object removal, synthesis, insertion, and replacement. The project page is https://ewrfcas.github.io/MVInpainter/., Comment: Project page: https://ewrfcas.github.io/MVInpainter/. Accepted at NeurIPS2024
- Published
- 2024
12. A Generic Review of Integrating Artificial Intelligence in Cognitive Behavioral Therapy
- Author
- Jiang, Meng; Zhao, Qing; Li, Jianqiang; Wang, Fan; He, Tianyu; Cheng, Xinyan; Yang, Bing Xiang; Ho, Grace W. K.; Fu, Guanghui
- Subjects
- Computer Science - Artificial Intelligence
- Abstract
Cognitive Behavioral Therapy (CBT) is a well-established intervention for mitigating psychological issues by modifying maladaptive cognitive and behavioral patterns. However, delivery of CBT is often constrained by resource limitations and barriers to access. Advancements in artificial intelligence (AI) have provided technical support for the digital transformation of CBT. Particularly, the emergence of pre-training models (PTMs) and large language models (LLMs) holds immense potential to support, augment, optimize and automate CBT delivery. This paper reviews the literature on integrating AI into CBT interventions. We begin with an overview of CBT. Then, we introduce the integration of AI into CBT across various stages: pre-treatment, therapeutic process, and post-treatment. Next, we summarize datasets relevant to CBT-related tasks. Finally, we discuss the benefits and current limitations of applying AI to CBT. We suggest key areas for future research, highlighting the need for further exploration and validation of the long-term efficacy and clinical utility of AI-enhanced CBT. The transformative potential of AI in reshaping the practice of CBT heralds a new era of more accessible, efficient, and personalized mental health interventions.
- Published
- 2024
13. MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
- Author
- Zhao, Canyu; Liu, Mingyu; Wang, Wen; Chen, Weihua; Wang, Fan; Chen, Hao; Zhang, Bo; Shen, Chunhua
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent advancements in video generation have primarily leveraged diffusion models for short-duration content. However, these approaches often fall short in modeling complex narratives and maintaining character consistency over extended periods, which is essential for long-form video production like movies. We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering to pioneer long-duration video generation with intricate plot progressions and high visual fidelity. Our approach utilizes autoregressive models for global narrative coherence, predicting sequences of visual tokens that are subsequently transformed into high-quality video frames through diffusion rendering. This method is akin to traditional movie production processes, where complex stories are factored into manageable units of scene capture. Further, we employ a multimodal script that enriches scene descriptions with detailed character information and visual style, enhancing continuity and character identity across scenes. We present extensive experiments across various movie genres, demonstrating that our approach not only achieves superior visual and narrative quality but also effectively extends the duration of generated content significantly beyond current capabilities. Homepage: https://aim-uofa.github.io/MovieDreamer/., Comment: 30 pages, 22 figures
- Published
- 2024
14. Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
- Author
- Shen, Chen; Lian, Chunfeng; Zhang, Wanqing; Wang, Fan; Zhang, Jianhua; Fan, Shuanliang; Wei, Xin; Wang, Gongji; Li, Kehan; Mu, Hongshu; Wu, Hao; Liang, Xinggong; Ma, Jianhua; Wang, Zhenyuan
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Forensic pathology is critical in determining the cause and manner of death through post-mortem examinations, both macroscopic and microscopic. The field, however, grapples with issues such as outcome variability, laborious processes, and a scarcity of trained professionals. This paper presents SongCi, an innovative visual-language model (VLM) designed specifically for forensic pathology. SongCi utilizes advanced prototypical cross-modal self-supervised contrastive learning to enhance the accuracy, efficiency, and generalizability of forensic analyses. It was pre-trained and evaluated on a comprehensive multi-center dataset, which includes over 16 million high-resolution image patches, 2,228 vision-language pairs of post-mortem whole slide images (WSIs), and corresponding gross key findings, along with 471 distinct diagnostic outcomes. Our findings indicate that SongCi surpasses existing multi-modal AI models in many forensic pathology tasks, performs comparably to experienced forensic pathologists and significantly better than less experienced ones, and provides detailed multi-modal explainability, offering critical assistance in forensic investigations. To the best of our knowledge, SongCi is the first VLM specifically developed for forensic pathological analysis and the first large-vocabulary computational pathology (CPath) model that directly processes gigapixel WSIs in forensic science., Comment: 28 pages, 6 figures, under review
- Published
- 2024
15. Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
- Author
- Jiang, Yanqin; Yu, Chaohui; Cao, Chenjie; Wang, Fan; Hu, Weiming; Gao, Jin
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent advances in 4D generation mainly focus on generating 4D content by distilling pre-trained text or single-view image-conditioned models. It is inconvenient for them to take advantage of various off-the-shelf 3D assets with multi-view attributes, and their results suffer from spatiotemporal inconsistency owing to the inherent ambiguity in the supervision signals. In this work, we present Animate3D, a novel framework for animating any static 3D model. The core idea is two-fold: 1) We propose a novel multi-view video diffusion model (MV-VDM) conditioned on multi-view renderings of the static 3D object, which is trained on our presented large-scale multi-view video dataset (MV-Video). 2) Based on MV-VDM, we introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects. Specifically, for MV-VDM, we design a new spatiotemporal attention module to enhance spatial and temporal consistency by integrating 3D and video diffusion models. Additionally, we leverage the static 3D model's multi-view renderings as conditions to preserve its identity. For animating 3D models, an effective two-stage pipeline is proposed: we first reconstruct motions directly from generated multi-view videos, followed by the introduced 4D-SDS to refine both appearance and motion. Benefiting from accurate motion learning, we could achieve straightforward mesh animation. Qualitative and quantitative experiments demonstrate that Animate3D significantly outperforms previous approaches. Data, code, and models will be open-released., Comment: Project Page: https://animate3d.github.io/
- Published
- 2024
16. Exploring the Causality of End-to-End Autonomous Driving
- Author
- Li, Jiankun; Li, Hao; Liu, Jiangjiang; Zou, Zhikang; Ye, Xiaoqing; Wang, Fan; Huang, Jizhou; Wu, Hua; Wang, Haifeng
- Subjects
- Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics
- Abstract
Deep learning-based models are widely deployed in autonomous driving, especially the increasingly prominent end-to-end solutions. However, the black-box property of these models raises concerns about their trustworthiness and safety for autonomous driving, and debugging their causality has become a pressing concern. Despite some existing research on the explainability of autonomous driving, there is currently no systematic solution to help researchers debug and identify the key factors that lead to the final predicted action of end-to-end autonomous driving. In this work, we propose a comprehensive approach to explore and analyze the causality of end-to-end autonomous driving. First, we validate the essential information that the final planning depends on by using controlled variables and counterfactual interventions for qualitative analysis. Then, we quantitatively assess the factors influencing model decisions by visualizing and statistically analyzing the response of key model inputs. Finally, based on this comprehensive study of the multi-factorial end-to-end autonomous driving system, we have developed a strong baseline and a tool for exploring causality in the closed-loop simulator CARLA. It leverages the essential input sources to obtain a well-designed model, resulting in highly competitive capabilities. As far as we know, our work is the first to unveil the mystery of end-to-end autonomous driving and turn the black box into a white one. Thorough closed-loop experiments demonstrate that our method can be applied to end-to-end autonomous driving solutions for causality debugging. Code will be available at https://github.com/bdvisl/DriveInsight.
- Published
- 2024
17. BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space
- Author
- Zhang, Yumeng; Gong, Shi; Xiong, Kaixin; Ye, Xiaoqing; Tan, Xiao; Wang, Fan; Huang, Jizhou; Wu, Hua; Wang, Haifeng
- Subjects
- Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence
- Abstract
World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence diffusion model. The multi-modal tokenizer first encodes multi-modality information and the decoder is able to reconstruct the latent BEV tokens into LiDAR and image observations by ray-casting rendering in a self-supervised manner. Then the latent BEV sequence diffusion model predicts future scenarios given action tokens as conditions. Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability in generating future scenes and benefiting downstream tasks such as perception and motion prediction. Code will be available at https://github.com/zympsyche/BevWorld., Comment: 10 pages
- Published
- 2024
18. VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing
- Author
- Liu, Shang; Yu, Chaohui; Cao, Chenjie; Qian, Wen; Wang, Fan
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent research on texture synthesis for 3D shapes benefits greatly from the rapid development of 2D text-to-image diffusion models, including inpainting-based and optimization-based approaches. However, these methods ignore the modal gap between 2D diffusion models and 3D objects, as they primarily render 3D objects into 2D images and texture each image separately. In this paper, we revisit texture synthesis and propose a Variance alignment based 3D-2D Collaborative Denoising framework, dubbed VCD-Texture, to address these issues. Formally, we first unify both 2D and 3D latent feature learning in diffusion self-attention modules with re-projected 3D attention receptive fields. Subsequently, the denoised multi-view 2D latent features are aggregated into 3D space and then rasterized back to formulate more consistent 2D predictions. However, the rasterization process suffers from an intractable variance bias, which is theoretically addressed by the proposed variance alignment, achieving high-fidelity texture synthesis. Moreover, we present an inpainting refinement to further improve details in conflicting regions. Notably, there is no publicly available benchmark to evaluate texture synthesis, which hinders its development. Thus we construct a new evaluation set built upon three open-source 3D datasets and propose to use four metrics to thoroughly validate texturing performance. Comprehensive experiments demonstrate that VCD-Texture achieves superior performance against other counterparts., Comment: ECCV 2024
- Published
- 2024
19. A Survey on Mixture of Experts
- Author
- Cai, Weilin; Jiang, Juyong; Wang, Fan; Tang, Jing; Kim, Sunghun; Huang, Jiayi
- Subjects
- Computer Science - Machine Learning; Computer Science - Computation and Language
- Abstract
Large language models (LLMs) have achieved unprecedented advancements across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to the emergent abilities of LLMs (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, the literature still lacks a systematic and comprehensive review of MoE. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE. We first briefly introduce the structure of the MoE layer, followed by proposing a new taxonomy of MoE. Next, we overview the core designs for various MoE models including both algorithmic and systemic aspects, alongside collections of available open-source implementations, hyperparameter configurations and empirical evaluations. Furthermore, we delineate the multifaceted applications of MoE in practice, and outline some potential directions for future research. To facilitate ongoing updates and the sharing of cutting-edge developments in MoE research, we have established a resource repository accessible at https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts.
- Published
- 2024
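The MoE layer structure this survey introduces (a gating network selects a few experts per input and combines their outputs) can be illustrated with a toy sketch. The dense gating-by-dot-product, the top-k selection, and all names below are illustrative, not any specific system from the survey:

```python
import math

def moe_layer(x, gate_vectors, experts, top_k=2):
    """Minimal sparse mixture-of-experts layer: score every expert with
    a gating step, run only the top-k experts, and combine their outputs
    with softmax-renormalized gate weights."""
    # gating scores: dot product of the input with each expert's gate vector
    scores = [sum(xi * gi for xi, gi in zip(x, g)) for g in gate_vectors]
    top = sorted(range(len(experts)), key=scores.__getitem__)[-top_k:]
    # softmax over the selected experts only (renormalized weights)
    m = max(scores[i] for i in top)
    w = [math.exp(scores[i] - m) for i in top]
    total = sum(w)
    w = [wi / total for wi in w]
    # only the selected experts actually compute; this sparsity is the
    # source of MoE's capacity-vs-compute advantage
    outs = [experts[i](x) for i in top]
    return [sum(wi * out[j] for wi, out in zip(w, outs)) for j in range(len(x))]
```

Because only `top_k` of the experts run per input, parameter count grows with the number of experts while per-token compute stays roughly constant.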
20. Multitype entanglement dynamics induced by exceptional points
- Author
- Li, Zigeng; Huang, Xinyao; Zhu, Hongyan; Zhang, Guofeng; Wang, Fan; Zhong, Xiaolan
- Subjects
- Quantum Physics
- Abstract
As one of the most important features of non-Hermitian systems, exceptional points (EPs) lead to a variety of unconventional phenomena and applications. Here we discover that multitype entanglement dynamics can be induced by engineering different orders of EP. By studying a generic model composed of two coupled non-Hermitian qubits, we find that diverse entanglement dynamics on the two sides of the fourth-order EP (EP4) and second-order EP (EP2) can be observed simultaneously in the weak coupling regime. With the increase of the coupling strength, the EP4 is replaced by an additional EP2, leading to the disappearance of the entanglement dynamics transition induced by EP4 in the strong coupling regime. Considering the case of Ising-type interaction, we also realize EP-induced entanglement dynamics transition without the driving field. Our study paves the way for the investigation of EP-induced quantum effects and applications of EP-related quantum technologies., Comment: 14 pages, 11 figures
- Published
- 2024
21. Pionic transitions of the spin-2 partner of $X(3872)$ to $\chi_{cJ}$
- Author
- Liu, Shi-Dong; Wang, Fan; Jia, Zhao-Sai; Li, Gang; Liu, Xiao-Hai; Xie, Ju-Jun
- Subjects
- High Energy Physics - Phenomenology
- Abstract
We investigated the pionic transitions between the $X_2$ [spin-2 partner of the $X(3872)$] and $\chi_{c1,2}$ using a nonrelativistic effective field theory. The $X_2$ is assumed to be a bound state of the $D^{*}$ and $\bar{D}^*$ mesons and to decay through several kinds of loops, including the bubble, triangle and box loops. Within the present model, the widths for the single-pion decays $X_2\to\pi^0\chi_{cJ}$ are predicted to be about $3$--$30$ keV. For the dipion decays, the widths are a few keVs. These widths yield a branching fraction of $10^{-3}$--$10^{-2}$. The ratio $R_{\mathrm{c}0}=\Gamma (X_2\to\pi^+\pi^-\chi_{cJ})/\Gamma (X_2\to\pi^0\pi^0\chi_{cJ}) \simeq 1.6$, which is a bit smaller than the expected value of $2$, and $R_{21}=\Gamma (X_2\to\pi\pi\chi_{c2})/\Gamma (X_2\to\pi\pi\chi_{c1}) \simeq 0.85$. These ratios are nearly independent of the $X_2$ mass and the coupling constants, which might make them good observables for experiments. Moreover, the invariant mass spectra of the $\pi^0\chi_{cJ}$ final state for the dipion processes are presented, showing a cusp structure at the $D {\bar D}^*$ threshold enhanced and narrowed by the nearby triangle singularity., Comment: 12 pages, 9 figures; comments welcome. Corrected the last paragraph of the Introduction; accepted by PRD
- Published
- 2024
22. A Survey on Large Language Models for Code Generation
- Author
-
Jiang, Juyong, Wang, Fan, Shen, Jiasi, Kim, Sungju, and Kim, Sunghun
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Software Engineering - Abstract
Large Language Models (LLMs) have achieved remarkable advances across diverse code-related tasks, where they are known as Code LLMs, particularly in code generation, i.e., generating source code from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and industry professionals due to its practical significance in software development, e.g., GitHub Copilot. Despite the active exploration of LLMs for a variety of code tasks, whether from the perspective of natural language processing (NLP), software engineering (SE), or both, there is a noticeable absence of a comprehensive and up-to-date literature review dedicated to LLMs for code generation. In this survey, we aim to bridge this gap by providing a systematic literature review that serves as a valuable reference for researchers investigating the cutting-edge progress in LLMs for code generation. We introduce a taxonomy to categorize and discuss the recent developments in LLMs for code generation, covering aspects such as data curation, latest advances, performance evaluation, ethical implications, environmental impact, and real-world applications. In addition, we present a historical overview of the evolution of LLMs for code generation and offer an empirical comparison using the HumanEval, MBPP, and BigCodeBench benchmarks across various levels of difficulty and types of programming tasks to highlight the progressive enhancements in LLM capabilities for code generation. We identify critical challenges and promising opportunities regarding the gap between academia and practical development. Furthermore, we have established a dedicated resource GitHub page (https://github.com/juyongjiang/CodeLLMSurvey) to continuously document and disseminate the most recent advances in the field.
- Published
- 2024
23. The Distant Sound of Book Boats: The Itinerant Book Trade in Jiangnan from the Sixteenth to the Nineteenth Centuries
- Author
-
Wang, Fan
- Published
- 2019
- Full Text
- View/download PDF
24. Inference-Time Alignment of Diffusion Models with Direct Noise Optimization
- Author
-
Tang, Zhiwei, Peng, Jiangweizhi, Tang, Jiasheng, Hong, Mingyi, Wang, Fan, and Chang, Tsung-Hui
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as increasing darkness or improving the aesthetics of images. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models. By design, DNO operates at inference-time, and thus is tuning-free and prompt-agnostic, with the alignment occurring in an online fashion during generation. We rigorously study the theoretical properties of DNO and also propose variants to deal with non-differentiable reward functions. Furthermore, we identify that a naive implementation of DNO occasionally suffers from the out-of-distribution reward hacking problem, where optimized samples have high rewards but are no longer in the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics theory to design an effective probability regularization technique. We conduct extensive experiments on several important reward functions and demonstrate that the proposed DNO approach can achieve state-of-the-art reward scores within a reasonable time budget for generation.
- Published
- 2024
25. Benchmarking General-Purpose In-Context Learning
- Author
-
Wang, Fan, Lin, Chuan, Cao, Yang, and Kang, Yu
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
In-context learning (ICL) empowers generative models to address new tasks effectively and efficiently on the fly, without relying on any artificially crafted optimization techniques. In this paper, we study extending ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential, namely General Purpose In-Context Learning (GPICL). To this end, we introduce two lightweight benchmarks specifically crafted to train and evaluate GPICL functionalities. Each benchmark encompasses a vast number of tasks characterized by significant task variance. These tasks are also crafted to promote long-horizon in-context learning through continuous generation and interaction, covering domains such as language modeling, decision-making, and world modeling. The benchmarks require the models to leverage contexts and interaction histories to enhance their capabilities, which we believe to be the key characteristics of GPICL. Our experiments indicate that the diversity of training tasks is positively correlated with the ability to generalize with ICL, but inversely correlated with zero-shot capabilities. Additionally, our findings indicate that the scale of parameters alone may not be crucial for ICL or GPICL, suggesting alternative approaches such as increasing the scale of contexts and memory states.
- Published
- 2024
26. Certified $\ell_2$ Attribution Robustness via Uniformly Smoothed Attributions
- Author
-
Wang, Fan and Kong, Adams Wai-Kin
- Subjects
Computer Science - Machine Learning - Abstract
Model attribution is a popular tool to explain the rationales behind model predictions. However, recent work suggests that attributions are vulnerable to minute perturbations, which can be added to input samples to fool the attributions while maintaining the prediction outputs. Although empirical studies have shown positive performance via adversarial training, an effective certified defense method is urgently needed to understand the robustness of attributions. In this work, we propose a uniform smoothing technique that augments the vanilla attributions with noise uniformly sampled from a certain space. It is proved that, for all perturbations within the attack region, the cosine similarity between the uniformly smoothed attribution of the perturbed sample and that of the unperturbed sample is guaranteed to be lower bounded. We also derive alternative formulations of the certification that are equivalent to the original one and provide the maximum size of perturbation or the minimum smoothing radius such that the attribution cannot be perturbed. We evaluate the proposed method on three datasets and show that it can effectively protect the attributions from attacks, regardless of the network architecture, training scheme, and dataset size.
- Published
- 2024
27. Network shell structure based on hub and non-hub nodes
- Author
-
Dong, Gaogao, Sun, Nannan, Wang, Fan, and Lambiotte, Renaud
- Subjects
Physics - Physics and Society ,Mathematics - General Topology - Abstract
The shell structure holds significant importance in various domains such as information dissemination, supply chain management, and transportation. This study focuses on investigating the shell structure of hub and non-hub nodes, which play important roles in these domains. Our framework explores the topology of Erd\H{o}s-R\'{e}nyi (ER) and Scale-Free (SF) networks, considering source node selection strategies dependent on the nodes' degrees. We define the shell $l$ in a network as the set of nodes at a distance $l$ from a given node and represent $r_l$ as the fraction of nodes outside shell $l$. Statistical properties of the shells are examined for a selected node, taking into account the node's degree. For a network with a given degree distribution, we analytically derive the degree distribution and average degree of nodes outside shell $l$ as functions of $r_l$. Moreover, we discover that $r_l$ follows an iterative functional form $r_l = \phi(r_{l-1})$, where $\phi$ is expressed in terms of the generating function of the original degree distribution of the network.
- Published
- 2024
28. Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey
- Author
-
Conde, Marcos V., Vasluianu, Florin-Alexandru, Timofte, Radu, Zhang, Jianxing, Li, Jia, Wang, Fan, Li, Xiaopeng, Liu, Zikun, Park, Hyunhee, Song, Sejun, Kim, Changho, Huang, Zhijuan, Yu, Hongyuan, Wan, Cheng, Xiang, Wending, Lin, Jiamin, Zhong, Hang, Zhang, Qiaosong, Sun, Yue, Yin, Xuanwu, Zuo, Kunlong, Xu, Senyan, Jiang, Siyuan, Sun, Zhijing, Zhu, Jiaying, Li, Liangyan, Chen, Ke, Li, Yunzhe, Ning, Yimo, Zhao, Guanhua, Chen, Jun, Yu, Jinyang, Xu, Kele, Xu, Qisheng, and Dou, Yong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as well explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution., Comment: CVPR 2024 - NTIRE Workshop
- Published
- 2024
29. Deep and Dynamic Metabolic and Structural Imaging in Living Tissues
- Author
-
Liu, Kunzan, Cao, Honghao, Shashaty, Kasey, Yu, Li-Yu, Spitz, Sarah, Pramotton, Francesca Michela, Wan, Zhengpeng, Kan, Ellen L., Tevonian, Erin N., Levy, Manuel, Lendaro, Eva, Kamm, Roger D., Griffith, Linda G., Wang, Fan, Qiu, Tong, and You, Sixian
- Subjects
Physics - Optics ,Physics - Biological Physics - Abstract
Label-free imaging through two-photon autofluorescence (2PAF) of NAD(P)H allows for non-destructive and high-resolution visualization of cellular activities in living systems. However, its application to thick tissues and organoids has been restricted by its limited penetration depth within 300 $\mu$m, largely due to tissue scattering at the typical excitation wavelength (~750 nm) required for NAD(P)H. Here, we demonstrate that the imaging depth for NAD(P)H can be extended to over 700 $\mu$m in living engineered human multicellular microtissues by adopting multimode fiber (MMF)-based low-repetition-rate high-peak-power three-photon (3P) excitation of NAD(P)H at 1100 nm. This is achieved by having over 0.5 MW peak power at the band of 1100$\pm$25 nm through adaptively modulating multimodal nonlinear pulse propagation with a compact fiber shaper. Moreover, the 8-fold increase in pulse energy at 1100 nm enables faster imaging of monocyte behaviors in the living multicellular models. These results represent a significant advance for deep and dynamic metabolic and structural imaging of intact living biosystems. The modular design (MMF with a slip-on fiber shaper) is anticipated to allow wide adoption of this methodology for demanding in vivo and in vitro imaging applications, including cancer research, autoimmune diseases, and tissue engineering., Comment: 20 pages, 5 figures, under review in Science Advances
- Published
- 2024
30. A Fast Analytical Model for Predicting Battery Performance Under Mixed Kinetic Control
- Author
-
Wang, Hongxuan, Wang, Fan, and Tang, Ming
- Subjects
Condensed Matter - Materials Science ,Physics - Chemical Physics - Abstract
The prediction of battery rate performance traditionally relies on computation-intensive numerical simulations. While simplified analytical models have been developed to accelerate the calculation, they usually assume battery performance to be controlled by a single rate-limiting process, such as solid diffusion or electrolyte transport. Here, we propose an improved analytical model that could be applied to battery discharging under mixed control of mass transport in both solid and electrolyte phases. Compared to previous single-particle models extended to incorporate the electrolyte kinetics, our model is able to predict the effect of salt depletion on diminishing the discharge capacity, a phenomenon that becomes important in thick electrodes and/or at high rates. The model demonstrates good agreement with the full-order simulation over a wide range of cell parameters and offers a speedup of over 600 times at the same time. Furthermore, it could be combined with gradient-based optimization algorithms to very efficiently search for the optimal battery cell configurations while numerical simulation fails at the task due to its inability to accurately evaluate the derivatives of the objective function. The high efficiency and the analytical nature of the model render it a powerful tool for battery cell design and optimization.
- Published
- 2024
31. SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
- Author
-
Wu, Zijie, Yu, Chaohui, Jiang, Yanqin, Cao, Chenjie, Wang, Fan, and Bai, Xiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as a dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, which decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate the shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions., Comment: Accepted by ECCV2024! Project Page: https://sc4d.github.io/ Code is available at: https://github.com/JarrentWu1031/SC4D
- Published
- 2024
32. Uncovering the Text Embedding in Text-to-Image Diffusion Models
- Author
-
Yu, Hu, Luo, Hao, Wang, Fan, and Zhao, Feng
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
The correspondence between input text and the generated image exhibits opacity, wherein minor textual modifications can induce substantial deviations in the generated image. Meanwhile, text embedding, the pivotal intermediary between text and images, remains relatively underexplored. In this paper, we address this research gap by delving into the text embedding space, unleashing its capacity for controllable image editing and explicable semantic direction attributes within a learning-free framework. Specifically, we identify two critical insights regarding the importance of per-word embeddings and their contextual correlations within the text embedding, providing instructive principles for learning-free image editing. Additionally, we find that text embedding inherently possesses diverse semantic potentials, and further reveal this property through the lens of singular value decomposition (SVD). These uncovered properties offer practical utility for image editing and semantic discovery. More importantly, we expect the in-depth analyses and findings of the text embedding can enhance the understanding of text-to-image diffusion models.
- Published
- 2024
33. Text Data-Centric Image Captioning with Interactive Prompts
- Author
-
Wang, Yiyu, Luo, Hao, Xu, Jungang, Sun, Yingfei, and Wang, Fan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Supervised image captioning approaches have made great progress, but it is challenging to collect high-quality human-annotated image-text data. Recently, large-scale vision and language models (e.g., CLIP) and large-scale generative language models (e.g., GPT-2) have shown strong performances in various tasks, which also provide some new solutions for image captioning with web paired data, unpaired data or even text-only data. Among them, the mainstream solution is to project image embeddings into the text embedding space with the assistance of consistent representations between image-text pairs from the CLIP model. However, the current methods still face several challenges in adapting to the diversity of data configurations in a unified solution, accurately estimating image-text embedding bias, and correcting unsatisfactory prediction results in the inference stage. This paper proposes a new Text data-centric approach with Interactive Prompts for image Captioning, named TIPCap. 1) We consider four different settings which gradually reduce the dependence on paired data. 2) We construct a mapping module driven by multivariate Gaussian distribution to mitigate the modality gap, which is applicable to the above four different settings. 3) We propose a prompt interaction module that can incorporate optional prompt information before generating captions. Extensive experiments show that our TIPCap outperforms other weakly or unsupervised image captioning methods and achieves a new state-of-the-art performance on two widely used datasets, i.e., MS-COCO and Flickr30K.
- Published
- 2024
34. XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
- Author
-
Wang, Guangyu, Zhang, Jinzhi, Wang, Fan, Huang, Ruqi, and Fang, Lu
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We propose XScale-NVS for high-fidelity cross-scale novel view synthesis of real-world large-scale scenes. Existing representations based on explicit surfaces suffer from discretization resolution or UV distortion, while implicit volumetric representations lack scalability for large scenes due to the dispersed weight distribution and surface ambiguity. In light of the above challenges, we introduce the hash featurized manifold, a novel hash-based featurization coupled with a deferred neural rendering framework. This approach fully unlocks the expressivity of the representation by explicitly concentrating the hash entries on the 2D manifold, thus effectively representing highly detailed contents independent of the discretization resolution. We also introduce a novel dataset, namely GigaNVS, to benchmark cross-scale, high-resolution novel view synthesis of real-world large-scale scenes. Our method significantly outperforms competing baselines on various real-world scenes, yielding an average LPIPS that is 40% lower than prior state-of-the-art on the challenging GigaNVS benchmark. Please see our project page at: xscalenvs.github.io., Comment: Accepted to CVPR 2024. Project page: xscalenvs.github.io/
- Published
- 2024
35. Learning-based Multi-continuum Model for Multiscale Flow Problems
- Author
-
Wang, Fan, Wang, Yating, Leung, Wing Tat, and Xu, Zongben
- Subjects
Mathematics - Numerical Analysis ,Computer Science - Machine Learning - Abstract
Multiscale problems can usually be approximated through numerical homogenization by an equation with some effective parameters that can capture the macroscopic behavior of the original system on the coarse grid to speed up the simulation. However, this approach usually assumes scale separation and that the heterogeneity of the solution can be approximated by the solution average in each coarse block. For complex multiscale problems, the computed single effective properties/continuum might be inadequate. In this paper, we propose a novel learning-based multi-continuum model to enrich the homogenized equation and improve the accuracy of the single-continuum model for multiscale problems with some given data. Without loss of generality, we consider a two-continuum case. The first flow equation keeps the information of the original homogenized equation with an additional interaction term. The second continuum is newly introduced, and the effective permeability in the second flow equation is determined by a neural network. The interaction term between the two continua aligns with that used in the dual-porosity model but with a learnable coefficient determined by another neural network. The new model with neural network terms is then optimized using trusted data. We discuss both direct back-propagation and the adjoint method for the PDE-constrained optimization problem. Our proposed learning-based multi-continuum model can resolve multiple interacting media within each coarse grid block and describe the mass transfer among them, and it has been demonstrated to significantly improve the simulation results through numerical experiments involving both linear and nonlinear flow equations., Comment: Corrected typos
- Published
- 2024
36. Multi-photon super-linear image scanning microscopy using upconversion nanoparticles
- Author
-
Wang, Yao, Liu, Baolei, Ding, Lei, Chen, Chaohao, Shan, Xuchen, Wang, Dajing, Tian, Menghan, Song, Jiaqi, Zheng, Ze, Xu, Xiaoxue, Zhong, Xiaolan, and Wang, Fan
- Subjects
Physics - Optics ,Physics - Applied Physics - Abstract
Super-resolution fluorescence microscopy is of great interest in life science studies for visualizing subcellular structures at the nanometer scale. Among various kinds of super-resolution approaches, image scanning microscopy (ISM) offers a doubled resolution enhancement in a simple and straightforward manner, based on the commonly used confocal microscopes. ISM is also suitable to be integrated with multi-photon microscopy techniques, such as two-photon excitation and second-harmonic generation imaging, for deep tissue imaging, but it remains limited to a twofold resolution enhancement and requires expensive femtosecond lasers. Here, we present and experimentally demonstrate super-linear ISM (SL-ISM) to push the resolution enhancement beyond the factor of two, with a single low-power, continuous-wave, near-infrared laser, by harnessing the emission nonlinearity within the multiphoton excitation process of lanthanide-doped upconversion nanoparticles (UCNPs). Based on a modified confocal microscope, we achieve a resolution of about 120 nm, 1/8th of the excitation wavelength. Furthermore, we demonstrate a parallel detection strategy of SL-ISM with a multifocal structured excitation pattern, to speed up the acquisition frame rate. This method suggests a new perspective for super-resolution imaging or sensing, multi-photon imaging, and deep-tissue imaging with simple, low-cost, and straightforward implementations., Comment: 9 pages, 4 figures
- Published
- 2024
- Full Text
- View/download PDF
37. Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
- Author
-
Zhao, Wangbo, Tang, Jiasheng, Han, Yizeng, Song, Yibing, Wang, Kai, Huang, Gao, Wang, Fan, and You, Yang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformer (ViT) adaptation by improving parameter efficiency. However, enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally expensive. In this paper, we propose Dynamic Tuning (DyT), a novel approach to improve both parameter and inference efficiency for ViT adaptation. Specifically, besides using lightweight adapter modules, we propose a token dispatcher to distinguish informative tokens from less important ones, allowing the latter to dynamically skip the original block, thereby reducing redundant computation during inference. Additionally, we explore multiple design variants to find the best practice of DyT. Finally, inspired by the mixture-of-experts (MoE) mechanism, we introduce an enhanced adapter to further boost the adaptation performance. We validate DyT across various tasks, including image/video recognition and semantic segmentation. For instance, DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark., Comment: Accepted to NeurIPS2024
- Published
- 2024
38. Production of $X_b$ via radiative transition of $\Upsilon(10753)$
- Author
-
Liu, Shi-Dong, Cai, Hao-Dong, Cai, Zu-Xin, Gao, Hong-Shuo, Li, Gang, Wang, Fan, and Xie, Ju-Jun
- Subjects
High Energy Physics - Phenomenology - Abstract
We studied the radiative transitions between the $\Upsilon(10753)$, the $S$-$D$ mixed state of the $\Upsilon(4S)$ and $\Upsilon_1(3\,{}^3D_1)$, and the $X_b$, the heavy quark flavor symmetry counterpart of the $X(3872)$ in the bottomonium sector. The radiative transition was assumed to occur through the intermediate bottom mesons, including the $P$-wave $B_1^{(\prime)}$ mesons as well as the $S$-wave $B^{(*)}$ ones. The consideration of the $B_1^{(\prime)}$ mesons leads the couplings to be in $S$-wave, and hence enhances the contributions of the intermediate meson loops. The radiative decay width for $\Upsilon(10753)\to\gamma X_b$ is predicted to be of order $10~\mathrm{keV}$, corresponding to a branching fraction of $10^{-4}$. Based on the theoretical results, we strongly suggest searching for the $X_b$ in $e^+e^-\to\gamma X_b$ with $X_b\to\pi\pi\chi_{b1}$ near $\sqrt{s}=10.754~\mathrm{GeV}$, and it is hoped that the calculations here can be tested by future Belle II experiments., Comment: 7 pages, 4 figures, accepted by PRD(20240510)
- Published
- 2024
39. Neural radiance fields-based holography [Invited]
- Author
-
Kang, Minsung, Wang, Fan, Kumano, Kai, Ito, Tomoyoshi, and Shimobaba, Tomoyoshi
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Graphics ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
This study presents a novel approach for generating holograms based on the neural radiance fields (NeRF) technique. Generating three-dimensional (3D) data is difficult in hologram computation. NeRF is a state-of-the-art technique for 3D light-field reconstruction from 2D images based on volume rendering. NeRF can rapidly predict novel-view images that are not included in the training dataset. In this study, we constructed a rendering pipeline directly from a 3D light field generated from 2D images by NeRF for hologram generation using deep neural networks within a reasonable time. The pipeline comprises three main components: the NeRF, a depth predictor, and a hologram generator, all constructed using deep neural networks. The pipeline does not include any physical calculations. The predicted holograms of a 3D scene viewed from any direction were computed using the proposed pipeline. The simulation and experimental results are presented.
- Published
- 2024
40. Characteristics of nitrogen dynamics and impact of nitrate enrichment in the agri-food system of the Taihu Basin
- Author
-
Liang, Yihang, Yan, Mengfan, Wu, Jing, Wang, Fan, Guo, Jiayu, Cai, Zucong, and Wang, Yanhua
- Published
- 2024
- Full Text
- View/download PDF
41. Efficacy and Safety of Yangxue Qingnao Pills Combined with Amlodipine in Treatment of Hypertensive Patients with Blood Deficiency and Gan-Yang Hyperactivity: A Multicenter, Randomized Controlled Trial
- Author
-
Wang, Fan, Gao, Hai-qing, Lyu, Zhe, Wang, Xiao-ming, Han, Hui, Wang, Yong-xia, Lu, Feng, Dong, Bo, Pu, Jun, Liu, Feng, Zu, Xiu-guang, Liu, Hong-bin, Yang, Li, Zhang, Shao-ying, Yan, Yong-mei, Wang, Xiao-li, Chen, Jin-han, Liu, Min, Yang, Yun-mei, and Li, Xiao-ying
- Published
- 2024
- Full Text
- View/download PDF
42. Evaluation of the production of lignin-containing cellulose nanofibrils with high whiteness for blocking UV light
- Author
-
Deng, Wenjuan, Hu, Xianghua, Wang, Keyan, Wang, Fan, Zhai, Yingying, Liu, Tong, Yuan, Zhaoyang, and Wen, Yangbing
- Published
- 2024
- Full Text
- View/download PDF
43. Patterns of defender behaviors and outsider behaviors in school bullying among Chinese adolescents: based on latent profile analysis
- Author
-
Bao, Zhenzhou, Liu, Hua, and Wang, Fan
- Published
- 2024
- Full Text
- View/download PDF
44. Ultra-thin dual color rendering mechanism structural coloration film with freeze-resistant and self-cleaning properties
- Author
-
Sun, Xi-Di, Li, Hao, Yu, Hui-Wen, Guo, Xin, Wang, Fan-Yu, Zhang, Jia-Han, Wu, Jing, Shi, Yi, and Pan, Li-Jia
- Published
- 2024
- Full Text
- View/download PDF
45. Preparation and Characterization of Cyano-Silicon-Containing Arylacetylene Resins and Their Composites: Dual Enhancement Strategy Involving Physical Interfacial Interactions and Chemical Crosslinking
- Author
-
Jin, Chao-En, Zhu, Hua-Mei, Wang, Lei, Wang, Fan, Zhu, Ya-Ping, Deng, Shi-Feng, Qi, Hui-Min, and Du, Lei
- Published
- 2024
- Full Text
- View/download PDF
46. A potent dual inhibitor targeting COX-2 and HDAC of acute myeloid leukemia cells
- Author
-
Qin, Xiang, Wang, Xueting, Yang, Chunmei, Wang, Fan, Fang, Tingting, Gu, Didi, Guo, Qulian, Meng, Qiuyu, Liu, Wenjun, and Yang, Lu
- Published
- 2024
- Full Text
- View/download PDF
47. Nuclear imaging of PD-L1 expression promotes the synergistic antitumor efficacy of targeted radionuclide therapy and immune checkpoint blockade
- Author
-
Shi, Jiyun, Gao, Hannan, Wu, Yue, Luo, Chuangwei, Yang, Guangjie, Luo, Qi, Jia, Bing, Han, Chuanhui, Liu, Zhaofei, and Wang, Fan
- Published
- 2024
- Full Text
- View/download PDF
48. Leaching Behaviors of Germanium, Zinc and Iron from Germanium Distillation Residues by Water Leaching Followed by Alkaline Leaching
- Author
-
Wang, Fan, Wu, Shiyan, Jin, Bingjie, Sun, Baohua, and Zhang, Yuhui
- Published
- 2024
- Full Text
- View/download PDF
49. A DEM-based framework to optimize the gradation of concrete aggregate using fractal approach
- Author
-
Ma, Gang and Wang, Fan
- Published
- 2024
- Full Text
- View/download PDF
50. Application of bridging mesh repair in giant ventral incisional hernia
- Author
-
Cai, Xuan, Wang, Fan, Zhu, Yilin, Shen, Yingmo, Peng, Peng, Cui, Yan, Di, Zhishan, and Chen, Jie
- Published
- 2024
- Full Text
- View/download PDF