120,945 results on '"Wang, Yan"'
Search Results
2. On the uniqueness of solutions to quadratic BSDEs with non-convex generators and unbounded terminal conditions: the certain exponential moment case
- Author
-
Wang, Yan, Zhang, Yaqi, and Fan, Shengjun
- Subjects
Mathematics - Probability - Abstract
With the terminal value $|\xi|$ admitting some given exponential moment, we put forward and prove several existence and uniqueness results for the unbounded solutions of quadratic backward stochastic differential equations whose generators may be represented as a uniformly continuous (not necessarily locally Lipschitz continuous) perturbation of some convex (concave) function with quadratic growth. These results generalize those posed in \cite{Delbaen 2011} and \cite{Fan-Hu-Tang 2020} to some extent. The critical case is also tackled, which strengthens the main result of \cite{Delbaen 2015}., Comment: 22 pages
- Published
- 2024
3. PlainUSR: Chasing Faster ConvNet for Efficient Super-Resolution
- Author
-
Wang, Yan, Li, Yusen, Wang, Gang, and Liu, Xiaoguang
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Reducing latency is a roaring trend in recent super-resolution (SR) research. While recent progress exploits various convolutional blocks, attention modules, and backbones to unlock the full potentials of the convolutional neural network (ConvNet), achieving real-time performance remains a challenge. To this end, we present PlainUSR, a novel framework incorporating three pertinent modifications to expedite ConvNet for efficient SR. For the convolutional block, we squeeze the lighter but slower MobileNetv3 block into a heavier but faster vanilla convolution by reparameterization tricks to balance memory access and calculations. For the attention module, by modulating input with a regional importance map and gate, we introduce local importance-based attention to realize high-order information interaction within a 1-order attention latency. As to the backbone, we propose a plain U-Net that executes channel-wise discriminate splitting and concatenation. In the experimental phase, PlainUSR exhibits impressively low latency, great scalability, and competitive performance compared to both state-of-the-art latency-oriented and quality-oriented methods. In particular, compared to recent NGswin, the PlainUSR-L is 16.4x faster with competitive performance., Comment: Accepted by ACCV 2024. Under camera-ready revision
- Published
- 2024
4. RenderWorld: World Model with Self-Supervised 3D Label
- Author
-
Yan, Ziyang, Dong, Wenzhen, Shao, Yihua, Lu, Yuhang, Haiyang, Liu, Liu, Jingwen, Wang, Haozhe, Wang, Zhe, Wang, Yan, Remondino, Fabio, and Ma, Yuexin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence - Abstract
Vision-only end-to-end autonomous driving is not only more cost-effective than LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we propose RenderWorld, a vision-only end-to-end autonomous driving framework, which generates 3D occupancy labels using a self-supervised Gaussian-based Img2Occ Module, then encodes the labels by AM-VAE, and uses a world model for forecasting and planning. RenderWorld employs Gaussian Splatting to represent 3D scenes and render 2D images, which greatly improves segmentation accuracy and reduces GPU memory consumption compared with NeRF-based methods. By applying AM-VAE to encode air and non-air separately, RenderWorld achieves more fine-grained scene element representation, leading to state-of-the-art performance in both 4D occupancy forecasting and motion planning from the autoregressive world model.
- Published
- 2024
5. KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models
- Author
-
Lv, Bo, Zhou, Quan, Ding, Xuanang, Wang, Yan, and Ma, Zeming
- Subjects
Computer Science - Computation and Language - Abstract
The bottleneck associated with the key-value (KV) cache presents a significant challenge during the inference processes of large language models. While depth pruning accelerates inference, it requires extensive recovery training, which can take up to two weeks. On the other hand, width pruning retains much of the performance but offers only slight speed gains. To tackle these challenges, we propose KVPruner to improve model efficiency while maintaining performance. Our method uses global perplexity-based analysis to determine the importance ratio for each block and provides multiple strategies to prune non-essential KV channels within blocks. Compared to the original model, KVPruner reduces runtime memory usage by 50% and boosts throughput by over 35%. Additionally, our method requires only two hours of LoRA fine-tuning on small datasets to recover most of the performance.
- Published
- 2024
6. Magnetic topological Weyl fermions in half-metallic In$_2$CoSe$_4$
- Author
-
Bai, Xiaosong, Wang, Yan, Yang, Wenwen, Xu, Qiunan, and Liu, Wenjian
- Subjects
Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Magnetic Weyl semimetals (WSM) have recently attracted much attention due to their potential in realizing strong anomalous Hall effects. Yet, how to design such systems remains unclear. Based on first-principles calculations, we show here that the ferromagnetic half-metallic compound In$_2$CoSe$_4$ has several pairs of Weyl points and is hence a good candidate for magnetic WSM. These Weyl points would approach the Fermi level gradually as the Hubbard $U$ increases, and finally disappear after a critical value $U_c$. The range of the Hubbard $U$ that can realize the magnetic WSM state can be expanded by pressure, manifesting the practical utility of the present prediction. Moreover, by generating two surface terminations at Co or In atom after cleaving the compound at the Co-Se bonds, the nontrivial Fermi arcs connecting one pair of Weyl points with opposite chirality are discovered in surface states. Furthermore, it is possible to observe the nontrivial surface state experimentally, e.g., angle-resolved photoemission spectroscopy (ARPES) measurements. As such, the present findings imply strongly a new magnetic WSM which may host a large anomalous Hall conductivity.
- Published
- 2024
7. ProteinBench: A Holistic Evaluation of Protein Foundation Models
- Author
-
Ye, Fei, Zheng, Zaixiang, Xue, Dongyu, Shen, Yuning, Wang, Lihao, Ma, Yiming, Wang, Yan, Wang, Xinyou, Zhou, Xiangxin, and Gu, Quanquan
- Subjects
Quantitative Biology - Quantitative Methods, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Quantitative Biology - Biomolecules - Abstract
Recent years have witnessed a surge in the development of protein foundation models, significantly improving performance in protein prediction and generative tasks ranging from 3D structure prediction and protein design to conformational dynamics. However, the capabilities and limitations associated with these models remain poorly understood due to the absence of a unified evaluation framework. To fill this gap, we introduce ProteinBench, a holistic evaluation framework designed to enhance the transparency of protein foundation models. Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance. Our comprehensive evaluation of protein foundation models reveals several key findings that shed light on their current capabilities and limitations. To promote transparency and facilitate further research, we release the evaluation dataset, code, and a public leaderboard publicly for further analysis and a general modular toolkit. We intend for ProteinBench to be a living benchmark for establishing a standardized, in-depth evaluation framework for protein foundation models, driving their development and application while fostering collaboration within the field., Comment: 29 pages, 1 figure and 11 tables
- Published
- 2024
8. Nonreciprocal tripartite entanglement and asymmetric Einstein-Podolsky-Rosen steering via directional quantum squeezing
- Author
-
Jiao, Ya-Feng, Wang, Jie, Wang, Dong-Yang, Tang, Lei, Wang, Yan, Zuo, Yun-Lan, Bao, Wan-Su, Kuang, Le-Man, and Jing, Hui
- Subjects
Quantum Physics - Abstract
The generation and manipulation of multipartite entanglement and EPR steering in macroscopic systems not only play a fundamental role in exploring the nature of quantum mechanics, but are also at the core of current developments of various nascent quantum technologies. Here we report a theoretical method using directional injection of quantum squeezing to produce nonreciprocal multipartite entanglement and EPR steering in a three-mode optomechanical system with closed-loop coupling. We show that by directionally applying a two-photon parametric driving field with a phase-matched squeezed vacuum reservoir to an optomechanical resonator, a squeezed optical mode can be introduced for one of its input directions, thereby yielding an asymmetric enhancement of optomechanical interaction and the time-reversal symmetry breaking of the system. Based on this feature, it is found that bipartite and tripartite entanglement and the associated EPR steering of the subsystems can only be generated when the coherent driving field is input from the squeezing injection direction, thereby achieving nonreciprocity in such quantum correlations. More excitingly, it is also found that by properly adjusting the squeezing parameter, the overall asymmetry of EPR steering can be stepwise driven from the no-way regime through the one-way regime to the two-way regime. These findings, holding promise for preparing rich types of entangled quantum resources with nonreciprocal correlations, may have potential applications in the area of quantum information processing such as quantum secure direct communication and one-way quantum computing., Comment: 15 pages, 3 figures
- Published
- 2024
9. Hidden charm ${\cal P}_{cs}(4338)^0$ production in baryonic $B^-\to J/\psi \Lambda\bar p$ decay
- Author
-
Hsiao, Yu-Kuo, Cai, Shu-Ting, and Wang, Yan-Li
- Subjects
High Energy Physics - Phenomenology, High Energy Physics - Experiment - Abstract
We investigate the resonant baryonic $B$ decay $B^-\to {\cal P}_{cs}^0\bar p,{\cal P}_{cs}^0\to J/\psi \Lambda$, where ${\cal P}_{cs}^0\equiv {\cal P}_{cs}(4338)^0$ is identified as a hidden charm pentaquark candidate with strangeness. By interpreting ${\cal P}_{cs}^0$ as the $\Xi_c\bar D$ molecule that strongly decays into $J/\psi\Lambda$ and $\eta_c\Lambda$, we discover a dominant triangle rescattering effect for $B^-\to {\cal P}_{cs}^0\bar p$, initiated by $B^-\to J/\psi K^-$. Through the exchange of a $\bar \Lambda$ anti-baryon, $J/\psi$ and $K^-$ undergo rescattering, transforming into ${\cal P}_{cs}^0$ and $\bar p$, respectively. Based on this rescattering mechanism, we calculate ${\cal B}(B^-\to {\cal P}_{cs}^0\bar p,{\cal P}_{cs}^0\to J/\psi \Lambda) =(1.7^{+1.2}_{-0.8})\times10^{-6}$, which is consistent with the measured data., Comment: 11 pages, 1 figure, 2 tables
- Published
- 2024
10. FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning
- Author
-
Bai, Yunhao, Yu, Qinji, Yun, Boxiang, Jin, Dakai, Xia, Yingda, and Wang, Yan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The Segment Anything Model 2 (SAM2) has recently demonstrated exceptional performance in zero-shot prompt segmentation for natural images and videos. However, it faces significant challenges when applied to medical images. Since its release, many attempts have been made to adapt SAM2's segmentation capabilities to the medical imaging domain. These efforts typically involve using a substantial amount of labeled data to fine-tune the model's weights. In this paper, we explore SAM2 from a different perspective via making the full use of its trained memory attention module and its ability of processing mask prompts. We introduce FS-MedSAM2, a simple yet effective framework that enables SAM2 to achieve superior medical image segmentation in a few-shot setting, without the need for fine-tuning. Our framework outperforms the current state-of-the-arts on two publicly available medical image datasets. The code is available at https://github.com/DeepMed-Lab-ECNU/FS_MedSAM2., Comment: 13 pages, 4 figures
- Published
- 2024
11. Dynamics of Small Solid Particles on Substrates of Arbitrary Topography
- Author
-
Zhao, Quan, Jiang, Wei, Wang, Yan, Srolovitz, David J., Qian, Tiezheng, and Bao, Weizhu
- Subjects
Condensed Matter - Materials Science - Abstract
We study the dynamics of a small solid particle arising from the dewetting of a thin film on a curved substrate driven by capillarity, where mass transport is controlled by surface diffusion. We consider the case when the size of the deposited particle is much smaller than the local radius of curvature of the substrate surface. The application of the Onsager variational principle leads to a reduced-order model for the dynamic behaviour of particles on arbitrarily curved substrates. We demonstrate that particles move toward regions of the substrate surface with lower mean curvature with a determined velocity. In particular, the velocity is proportional to the substrate curvature gradient and inversely proportional to the size of the particle, with a coefficient that depends on material properties that include the surface energy, surface diffusivity, density, and Young's (wetting) angle. The reduced model is validated by comparing with numerical results for the full, sharp-interface model in both two and three dimensions., Comment: 12 pages, 8 figures
- Published
- 2024
12. Electrical contacts for high performance optoelectronic devices of BaZrS3 single crystals
- Author
-
Chen, Huandong, Singh, Shantanu, Surendran, Mythili, Zhao, Boyang, Wang, Yan-Ting, and Ravichandran, Jayakanth
- Subjects
Condensed Matter - Materials Science - Abstract
Chalcogenide perovskites such as BaZrS3 are promising candidates for next generation optoelectronics such as photodetectors and solar cells. Compared to widely studied polycrystalline thin films, single crystals of BaZrS3 with minimal extended and point defects are an ideal platform to study the material's intrinsic transport properties and to make first-generation optoelectronic devices. However, the surface dielectrics formed on BaZrS3 single crystals due to sulfating or oxidation have led to significant challenges to achieving high quality electrical contacts, and hence, realizing high-performance optoelectronic devices. Here, we report the development of electrical contact fabrication processes on BaZrS3 single crystals, where various processes were employed to address the surface dielectric issue. Moreover, with optimized electrical contacts fabricated through dry etching, high-performance BaZrS3 photoconductive devices with a low dark current of 0.1 nA at 10 V bias and a fast transient photoresponse with rise and decay times of <0.2 s were demonstrated. Our study emphasizes the importance of damage-free fabrication processes in making high-quality optoelectronic devices of BaZrS3 single crystals and sheds light on exploring its intrinsic transport properties.
- Published
- 2024
13. LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
- Author
-
Hu, Zhiyuan, Liu, Yuliang, Zhao, Jinman, Wang, Suyuchen, Wang, Yan, Shen, Wei, Gu, Qing, Luu, Anh Tuan, Ng, See-Kiong, Jiang, Zhiwei, and Hooi, Bryan
- Subjects
Computer Science - Computation and Language - Abstract
Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training strategy for extending the context window of LLMs, including impactful token analysis, position index transformation, and training optimization strategies. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies. Experiments on three types of LLMs show that LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resource over 85% compared to full sequence training. Furthermore, LongRecipe also preserves the original LLM's capabilities in general tasks. Ultimately, we can extend the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training using a single GPU with 80G memory. Our code is released at https://github.com/zhiyuanhubj/LongRecipe., Comment: Work in Progress
- Published
- 2024
14. Spectral heat flux redistribution upon interfacial transmission
- Author
-
Cui, Haoran, Maranets, Theodore, Ma, Tengfei, and Wang, Yan
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
In nonmetallic crystals, heat is transported by phonons of different frequencies, each contributing differently to the overall heat flux spectrum. In this study, we demonstrate a significant redistribution of heat flux among phonon frequencies when phonons transmit across the interface between dissimilar solids. This redistribution arises from the natural tendency of phononic heat to re-establish the equilibrium distribution characteristic of the material through which it propagates. Remarkably, while the heat flux spectra of dissimilar solids are typically distinct in their bulk forms, they can become nearly identical in superlattices or sandwich structures where the layer thicknesses are smaller than the phonon mean free paths. This phenomenon reflects that the redistribution of heat among phonon frequencies to the equilibrium distribution does not occur instantaneously at the interface, rather it develops over some time and distance.
- Published
- 2024
15. BTMuda: A Bi-level Multi-source unsupervised domain adaptation framework for breast cancer diagnosis
- Author
-
Yang, Yuxiang, Zeng, Xinyi, Zeng, Pinxian, Yan, Binyu, Wu, Xi, Zhou, Jiliu, and Wang, Yan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Deep learning has revolutionized the early detection of breast cancer, resulting in a significant decrease in mortality rates. However, difficulties in obtaining annotations and huge variations in distribution between training sets and real scenes have limited their clinical applications. To address these limitations, unsupervised domain adaptation (UDA) methods have been used to transfer knowledge from one labeled source domain to the unlabeled target domain, yet these approaches suffer from severe domain shift issues and often ignore the potential benefits of leveraging multiple relevant sources in practical applications. To address these limitations, in this work, we construct a Three-Branch Mixed extractor and propose a Bi-level Multi-source unsupervised domain adaptation method called BTMuda for breast cancer diagnosis. Our method addresses the problems of domain shift by dividing domain shift issues into two levels: intra-domain and inter-domain. To reduce the intra-domain shift, we jointly train a CNN and a Transformer as two paths of a domain mixed feature extractor to obtain robust representations rich in both low-level local and high-level global information. As for the inter-domain shift, we redesign the Transformer delicately to a three-branch architecture with cross-attention and distillation, which learns domain-invariant representations from multiple domains. Besides, we introduce two alignment modules - one for feature alignment and one for classifier alignment - to improve the alignment process. Extensive experiments conducted on three public mammographic datasets demonstrate that our BTMuda outperforms state-of-the-art methods.
- Published
- 2024
16. Significantly Enhanced Interfacial Thermal Transport between Single-layer Graphene and Water Through Basal-plane Oxidation
- Author
-
Cui, Haoran, Panneerselvam, Iyyappa Rajan, Chakraborty, Pranay, Nian, Qiong, Liao, Yiliang, and Wang, Yan
- Subjects
Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Heat transfer between graphene and water is pivotal for various applications, including solar-thermal vapor generation and the advanced manufacturing of graphene-based hierarchical structures in solution. In this study, we employ a deep-neural network potential derived from ab initio molecular dynamics to conduct extensive simulations of single-layer graphene-water systems with different levels of oxidation (carbon/oxygen ratio) of the graphene layer. Remarkably, our findings reveal a one-order-of-magnitude enhancement in heat transfer upon oxidizing graphene with hydroxyl or epoxide groups at the graphene surface, underscoring the significant tunability of heat transfer within this system. Given the same oxidation ratio, more dispersed locations of functional groups on the graphene surface lead to faster heat dissipation to water., Comment: 21 pages, 8 figures
- Published
- 2024
17. CogVLM2: Visual Language Models for Image and Video Understanding
- Author
-
Hong, Wenyi, Wang, Weihan, Ding, Ming, Yu, Wenmeng, Lv, Qingsong, Wang, Yan, Cheng, Yean, Huang, Shiyu, Ji, Junhui, Xue, Zhao, Zhao, Lei, Yang, Zhuoyi, Gu, Xiaotao, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Wang, Zihan, Qi, Ji, Song, Xixuan, Zhang, Peng, Liu, Debing, Xu, Bin, Li, Juanzi, Dong, Yuxiao, and Tang, Jie
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2 inherits the visual expert architecture with improved training recipes in both pre-training and post-training stages, supporting input resolution up to $1344 \times 1344$ pixels. As a video understanding model, CogVLM2-Video integrates multi-frame input with timestamps and proposes automated temporal grounding data construction. Notably, CogVLM2 family has achieved state-of-the-art results on benchmarks like MMBench, MM-Vet, TextVQA, MVBench and VCGBench. All models are open-sourced in https://github.com/THUDM/CogVLM2 and https://github.com/THUDM/GLM-4, contributing to the advancement of the field.
- Published
- 2024
18. A Survey on Facial Expression Recognition of Static and Dynamic Emotions
- Author
-
Wang, Yan, Yan, Shaoqi, Liu, Yang, Song, Wei, Liu, Jing, Chang, Yang, Mai, Xinji, Hu, Xiping, Zhang, Wenqiang, and Gan, Zhongxue
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies. As the FER field evolves from controlled laboratory environments to more complex in-the-wild scenarios, advanced methods have been rapidly developed and new challenges and approaches are encountered, which are not well addressed in existing reviews of FER. This paper offers a comprehensive survey of both image-based static FER (SFER) and video-based dynamic FER (DFER) methods, analyzing from model-oriented development to challenge-focused categorization. We begin with a critical comparison of recent reviews, an introduction to common datasets and evaluation criteria, and an in-depth workflow on FER to establish a robust research foundation. We then systematically review representative approaches addressing eight main challenges in SFER (such as expression disturbance, uncertainties, compound emotions, and cross-domain inconsistency) as well as seven main challenges in DFER (such as key frame sampling, expression intensity variations, and cross-modal alignment). Additionally, we analyze recent advancements, benchmark performances, major applications, and ethical considerations. Finally, we propose five promising future directions and development trends to guide ongoing research. The project page for this paper can be found at https://github.com/wangyanckxx/SurveyFER.
- Published
- 2024
19. Alleviating Class Imbalance in Semi-supervised Multi-organ Segmentation via Balanced Subclass Regularization
- Author
-
Feng, Zhenghao, Wen, Lu, Yan, Binyu, Cui, Jiaqi, and Wang, Yan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Semi-supervised learning (SSL) has shown notable potential in relieving the heavy demand of dense prediction tasks on large-scale well-annotated datasets, especially for the challenging multi-organ segmentation (MoS). However, the prevailing class-imbalance problem in MoS, caused by the substantial variations in organ size, exacerbates the learning difficulty of the SSL network. To alleviate this issue, we present a two-phase semi-supervised network (BSR-Net) with balanced subclass regularization for MoS. Concretely, in Phase I, we introduce a class-balanced subclass generation strategy based on balanced clustering to effectively generate multiple balanced subclasses from original biased ones according to their pixel proportions. Then, in Phase II, we design an auxiliary subclass segmentation (SCS) task within the multi-task framework of the main MoS task. The SCS task contributes a balanced subclass regularization to the main MoS task and transfers unbiased knowledge to the MoS network, thus alleviating the influence of the class-imbalance problem. Extensive experiments conducted on two publicly available datasets, i.e., the MICCAI FLARE 2022 dataset and the WORD dataset, verify the superior performance of our method compared with other methods.
- Published
- 2024
20. ARANet: Attention-based Residual Adversarial Network with Deep Supervision for Radiotherapy Dose Prediction of Cervical Cancer
- Author
-
Wen, Lu, Yin, Wenxia, Feng, Zhenghao, Wu, Xi, Xiong, Deng, and Wang, Yan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Radiation therapy is the mainstay treatment for cervical cancer, and its ultimate goal is to ensure the planning target volume (PTV) reaches the prescribed dose while reducing dose deposition of organs-at-risk (OARs) as much as possible. To achieve these clinical requirements, the medical physicist needs to manually tweak the radiotherapy plan repeatedly in a trial-and-error manner until finding the optimal one in the clinic. However, such trial-and-error processes are quite time-consuming, and the quality of plans highly depends on the experience of the medical physicist. In this paper, we propose an end-to-end Attention-based Residual Adversarial Network with deep supervision, namely ARANet, to automatically predict the 3D dose distribution of cervical cancer. Specifically, given the computed tomography (CT) images and their corresponding segmentation masks of PTV and OARs, ARANet employs a prediction network to generate the dose maps. We also utilize a multi-scale residual attention module and deep supervision mechanism to enforce the prediction network to extract more valuable dose features while suppressing irrelevant information. Our proposed method is validated on an in-house dataset including 54 cervical cancer patients, and experimental results have demonstrated its obvious superiority compared to other state-of-the-art methods., Comment: Accepted by 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM)
- Published
- 2024
21. Partition subcubic planar graphs into independent sets
- Author
-
Liu, Xujun and Wang, Yan
- Subjects
Mathematics - Combinatorics - Abstract
A packing $(1^{\ell}, 2^k)$-coloring of a graph $G$ is a partition of $V(G)$ into $\ell$ independent sets and $k$ $2$-independent sets (whose pairwise vertex distance is at least $3$). The famous Four Color Theorem, proved by Appel and Haken as well as Robertson et al., shows that every planar graph is $4$-colorable, i.e., every planar graph is packing $(1^4)$-colorable. The square coloring of planar graphs was first studied by Wegner in 1977. Thomassen, and independently Hartke et al., proved one can always square color a cubic planar graph with $7$ colors, i.e., every subcubic planar graph is packing $(2^7)$-colorable. We focus on packing $(1^{\ell}, 2^k)$-colorings, which lie between proper coloring and square coloring. Gastineau and Togni proved every subcubic graph is packing $(1,2^6)$-colorable. Furthermore, they asked whether every subcubic graph except the Petersen graph is packing $(1,2^5)$-colorable. In this paper, we prove that every subcubic planar graph is packing $(1,2^5)$-colorable, extending the result of Thomassen and Hartke et al. This also answers the question of Gastineau and Togni affirmatively for subcubic planar graphs. Moreover, we prove that there exists an infinite family of subcubic planar graphs that are not packing $(1,2^4)$-colorable, which shows that our result is the best possible. Besides, our result is also sharp in the sense that the disjoint union of Petersen graphs is subcubic and non-planar, but not packing $(1,2^5)$-colorable., Comment: 34 pages, 2 figures
- Published
- 2024
22. Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype
- Author
-
Lu, Yadong, Zhao, Shitian, Yun, Boxiang, Jiang, Dongsheng, Li, Yin, Li, Qingli, and Wang, Yan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Despite recent progress in enhancing the efficacy of Open-Domain Continual Learning (ODCL) in Vision-Language Models (VLM), failing to (1) correctly identify the Task-ID of a test image and (2) use only the category set corresponding to the Task-ID, while preserving the knowledge related to each domain, cannot address the two primary challenges of ODCL: forgetting old knowledge and maintaining zero-shot capabilities, as well as the confusions caused by category-relatedness between domains. In this paper, we propose a simple yet effective solution: leveraging intra-domain category-aware prototypes for ODCL in CLIP (DPeCLIP), where the prototype is the key to bridging the above two processes. Concretely, we propose a training-free Task-ID discriminator method, by utilizing prototypes as classifiers for identifying Task-IDs. Furthermore, to maintain the knowledge corresponding to each domain, we incorporate intra-domain category-aware prototypes as domain prior prompts into the training process. Extensive experiments conducted on 11 different datasets demonstrate the effectiveness of our approach, achieving 2.37% and 1.14% average improvement in class-incremental and task-incremental settings, respectively.
- Published
- 2024
23. Could Micro-Expressions be Quantified? Electromyography Gives Affirmative Evidence
- Author
-
Li, Jingting, Lu, Shaoyuan, Wang, Yan, Dong, Zizhao, Wang, Su-Jing, and Fu, Xiaolan
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Micro-expressions (MEs) are brief, subtle facial expressions that reveal concealed emotions, offering key behavioral cues for social interaction. Characterized by short duration, low intensity, and spontaneity, MEs have been mostly studied through subjective coding, lacking objective, quantitative indicators. This paper explores ME characteristics using facial electromyography (EMG), analyzing data from 147 macro-expressions (MaEs) and 233 MEs collected from 35 participants. First, regarding external characteristics, we demonstrate that MEs are short in duration and low in intensity. Precisely, we proposed an EMG-based indicator, the percentage of maximum voluntary contraction (MVC\%), to measure ME intensity. Moreover, we provided precise interval estimations of ME intensity and duration, with MVC\% ranging from 7\% to 9.2\% and the duration ranging from 307 ms to 327 ms. This research facilitates fine-grained ME quantification. Second, regarding the internal characteristics, we confirm that MEs are less controllable and less consciously recognized than MaEs, as shown by participants' responses and self-reports. This study provides a theoretical basis for research on ME mechanisms and real-life applications. Third, building on our previous work, we present CASMEMG, the first public ME database including EMG signals, providing a robust foundation for studying micro-expression mechanisms and movement dynamics through physiological signals.
- Published
- 2024
24. Rainbow perfect matchings in 3-partite 3-uniform hypergraphs
- Author
-
Lu, Hongliang and Wang, Yan
- Subjects
Mathematics - Combinatorics - Abstract
Let $m,n,r,s$ be nonnegative integers such that $n\ge m=3r+s$ and $1\leq s\leq 3$. Let \[\delta(n,r,s)=\left\{\begin{array}{ll} n^2-(n-r)^2 &\text{if}\ s=1 , \\[5pt] n^2-(n-r+1)(n-r-1) &\text{if}\ s=2,\\[5pt] n^2 - (n-r)(n-r-1) &\text{if}\ s=3. \end{array}\right.\] We show that there exists a constant $n_0 > 0$ such that if $F_1,\ldots, F_n$ are 3-partite 3-graphs with $n\ge n_0$ vertices in each partition class and minimum vertex degree of $F_i$ is at least $\delta(n,r,s)+1$ for $i \in [n]$ then $\{F_1,\ldots,F_n\}$ admits a rainbow perfect matching. This generalizes a result of Lo and Markstr\"om on the vertex degree threshold for the existence of perfect matchings in 3-partite 3-graphs. In this proof, we use a fractional rainbow matching theory obtained by Aharoni et al. to find edge-disjoint fractional perfect matching.
- Published
- 2024
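The piecewise threshold $\delta(n,r,s)$ in the abstract above is easy to evaluate for concrete parameters. The snippet below is a small sketch that transcribes the three cases directly; the function name `delta` and the sample values $n=10$, $r=2$ are chosen here purely for illustration and do not come from the paper.

```python
def delta(n, r, s):
    """Evaluate the threshold delta(n, r, s) for s in {1, 2, 3}.

    The three branches mirror the piecewise definition in the abstract,
    where m = 3r + s and n >= m.
    """
    if s == 1:
        return n * n - (n - r) ** 2
    if s == 2:
        return n * n - (n - r + 1) * (n - r - 1)
    if s == 3:
        return n * n - (n - r) * (n - r - 1)
    raise ValueError("s must be 1, 2, or 3")


# Illustrative values: n = 10, r = 2, so m = 3r + s is 7, 8, or 9 (all <= n).
for s in (1, 2, 3):
    print(s, delta(10, 2, s))
```

For these values the three cases give 100 - 64 = 36, 100 - 63 = 37, and 100 - 56 = 44 respectively, so the threshold grows with $s$ for fixed $n$ and $r$.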
25. Evolving Virtual World with Delta-Engine
- Author
-
Wu, Hongqiu, Xu, Zekai, Xu, Tianyang, Wei, Shize, Wang, Yan, Hong, Jiale, Wu, Weiqi, Zhao, Hai, Zhang, Min, and He, Zhezhi
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Human-Computer Interaction - Abstract
In this paper, we focus on the \emph{virtual world}, a cyberspace where people can live. An ideal virtual world shares great similarity with our real world. One of the crucial aspects is its evolving nature, reflected by individuals' capability to grow and thereby influence the objective world. Such dynamics are unpredictable and beyond the reach of existing systems. For this, we propose a special engine called \textbf{\emph{Delta-Engine}} to drive this virtual world. $\Delta$ associates the world's evolution to the engine's scalability. It consists of a base engine and a neural proxy. The base engine programs the prototype of the virtual world; given a trigger, the neural proxy generates new snippets on the base engine through \emph{incremental prediction}. This paper presents a full-stack introduction to the delta-engine. The key feature of the delta-engine is its scalability to unknown elements within the world. Technically, it derives from the perfect co-work of the neural proxy and the base engine, and the alignment with high-quality data. We introduce an engine-oriented fine-tuning method that embeds the base engine into the proxy. We then discuss the human-LLM collaborative design to produce novel and interesting data efficiently. Eventually, we propose three evaluation principles to comprehensively assess the performance of a delta engine: naive evaluation, incremental evaluation, and adversarial evaluation.
- Published
- 2024
26. MR-ULINS: A Tightly-Coupled UWB-LiDAR-Inertial Estimator with Multi-Epoch Outlier Rejection
- Author
-
Zhang, Tisheng, Yuan, Man, Wei, Linfu, Wang, Yan, Tang, Hailiang, and Niu, Xiaoji
- Subjects
Computer Science - Robotics ,Electrical Engineering and Systems Science - Signal Processing - Abstract
The LiDAR-inertial odometry (LIO) and the ultra-wideband (UWB) have been integrated to achieve driftless positioning in global navigation satellite system (GNSS)-denied environments. However, the UWB may be affected by systematic range errors (such as the clock drift and the antenna phase center offset) and non-line-of-sight (NLOS) signals, resulting in reduced robustness. In this study, we propose a UWB-LiDAR-inertial estimator (MR-ULINS) that tightly integrates the UWB range, LiDAR frame-to-frame, and IMU measurements within the multi-state constraint Kalman filter (MSCKF) framework. The systematic range errors are precisely modeled to be estimated and compensated online. Besides, we propose a multi-epoch outlier rejection algorithm for UWB NLOS by utilizing the relative accuracy of the LIO. Specifically, the relative trajectory of the LIO is employed to verify the consistency of all range measurements within the sliding window. Extensive experiment results demonstrate that MR-ULINS achieves a positioning accuracy of around 0.1 m in complex indoor environments with severe NLOS interference. Ablation experiments show that the online estimation and multi-epoch outlier rejection can effectively improve the positioning accuracy. Besides, MR-ULINS maintains high accuracy and robustness in LiDAR-degenerated scenes and UWB-challenging conditions with sparse base stations., Comment: 8 pages, 9 figures
- Published
- 2024
27. Self-consistent theory of $2\times2$ pair density waves in kagome superconductors
- Author
-
Yao, Meng, Wang, Yan, Wang, Da, Yin, Jia-Xin, and Wang, Qiang-Hua
- Subjects
Condensed Matter - Superconductivity - Abstract
Pair density wave (PDW) is an intriguing quantum matter proposed in the frontier of condensed matter physics. However, the existence of PDW in microscopic models has been rare. In this work, we obtain, by Ginzburg-Landau arguments and self-consistent mean field theory, novel $2a_0\times2a_0$ PDW on the kagome lattice arising from attractive on-bond pairing interactions and the distinct Bloch wave functions near the p-type van Hove singularity. The PDW state carrying three independent wave-vectors, the so-called 3Q PDW, is nodeless and falls into two topological classes characterized by the Chern number $C = 0$ or $C = \pm2$. The chiral ($C=\pm2$) PDW state presents a rare case of interaction driven topological quantum state without the requirement of spin-orbit coupling. Finally, we analyze the stabilities and properties of these PDWs intertwining with charge orders, and discuss the relevance of our minimal model to recent experimental observations in kagome superconductors. Our theory not only elucidates the driving force of the chiral PDW, but also predicts strongly anisotropic superconducting gap structure in the momentum space and quantized transverse thermal conductivity that can be tested in future experiments., Comment: 7 pages, 4 figures
- Published
- 2024
28. PromptSAM+: Malware Detection based on Prompt Segment Anything Model
- Author
-
Wei, Xingyuan, Liu, Yichen, Li, Ce, Li, Ning, Sun, Degang, and Wang, Yan
- Subjects
Computer Science - Cryptography and Security ,F.2.2 ,I.2.7 - Abstract
Machine learning and deep learning (ML/DL) have been extensively applied in malware detection, and some existing methods demonstrate robust performance. However, several issues persist in the field of malware detection: (1) Existing work often overemphasizes accuracy at the expense of practicality, rarely considering false positive and false negative rates as important metrics. (2) Considering the evolution of malware, the performance of classifiers significantly declines over time, greatly reducing the practicality of malware detectors. (3) Prior ML/DL-based efforts heavily rely on ample labeled data for model training, largely dependent on feature engineering or domain knowledge to build feature databases, making them vulnerable if correct labels are scarce. With the development of computer vision, vision-based malware detection technology has also rapidly evolved. In this paper, we propose a visual malware general enhancement classification framework, 'PromptSAM+', based on a large visual network segmentation model, the Prompt Segment Anything Model (named PromptSAM+). Our experimental results indicate that 'PromptSAM+' is effective and efficient in malware detection and classification, achieving high accuracy and low rates of false positives and negatives. The proposed method outperforms the most advanced image-based malware detection technologies on several datasets. 'PromptSAM+' can mitigate aging in existing image-based malware classifiers, reducing the considerable manpower needed for labeling new malware samples through active learning. We conducted experiments on datasets for both Windows and Android platforms, achieving favorable outcomes. Additionally, our ablation experiments on several datasets demonstrate that our model identifies effective modules within the large visual network., Comment: 13 pages, 10 figures
- Published
- 2024
29. Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector
- Author
-
Wei, Xingyuan, Li, Ce, Lv, Qiujian, Li, Ning, Sun, Degang, and Wang, Yan
- Subjects
Computer Science - Cryptography and Security ,F.2.2 ,I.2.7 - Abstract
In dynamic Windows malware detection, deep learning models are extensively deployed to analyze API sequences. Methods based on API sequences play a crucial role in malware prevention. However, due to the continuous updates of APIs and the changes in API sequence calls leading to the constant evolution of malware variants, the detection capability of API sequence-based malware detection models significantly diminishes over time. We observe that the API sequences of malware samples before and after evolution usually have similar malicious semantics. Specifically, compared to the original samples, evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors. For instance, they access similar sensitive system resources and extend new malicious functions based on the original functionalities. In this paper, we propose MME, a framework that can enhance existing API sequence-based malware detectors and mitigate the adverse effects of malware evolution. To help detection models capture the similar semantics of these post-evolution API sequences, our framework represents API sequences using API knowledge graphs and system resource encodings and applies contrastive learning to enhance the model's encoder. Results indicate that, compared to Regular Text-CNN, our framework can significantly reduce the false positive rate by 13.10% and improve the F1-Score by 8.47% on five years of data, achieving the best experimental results. Additionally, evaluations show that our framework can save on the human costs required for model maintenance. We only need 1% of the budget per month to reduce the false positive rate by 11.16% and improve the F1-Score by 6.44%., Comment: 13 pages, 11 figures
- Published
- 2024
30. Thermal spin-crossover and temperature-dependent zero-field splitting in magnetic nanographene chains
- Author
-
Wang, Yan, Paz, Alejandro Pérez, Boström, Emil Viñas, Zhang, Xiaoxi, Li, Juan, Berger, Reinhard, Liu, Kun, Ma, Ji, Huang, Li, Du, Shixuan, Gao, Hong-jun, Müllen, Klaus, Narita, Akimitsu, Feng, Xinliang, Rubio, Angel, and Palma, CA
- Subjects
Physics - Chemical Physics - Abstract
Nanographene-based magnetism at interfaces offers an avenue to designer quantum materials towards novel phases of matter and atomic-scale applications. Key to spintronics applications at the nanoscale is bistable spin-crossover which however remains to be demonstrated in nanographenes. Here we show that antiaromatic 1,4-disubstituted pyrazine-embedded nanographene derivatives, which promote magnetism through oxidation to a non-aromatic radical are prototypical models for the study of carbon-based thermal spin-crossover. Scanning tunneling spectroscopy studies reveal symmetric spin excitation signals which evolve at Tc to a zero-energy peak, and are assigned to the transition of a S = 3/2 high-spin to a S = 1/2 low-spin state by density functional theory. At temperatures below and close to the spin-crossover Tc, the high-spin S= 3/2 excitations evidence pronouncedly different temperature-dependent excitation energies corresponding to a zero-field splitting in the Hubbard-Kanamori Hamiltonian. The discovery of thermal spin crossover and temperature-dependent zero-field splitting in carbon nanomaterials promises to accelerate quantum information, spintronics and thermometry at the atomic scale.
- Published
- 2024
31. S3PET: Semi-supervised Standard-dose PET Image Reconstruction via Dose-aware Token Swap
- Author
-
Cui, Jiaqi, Zeng, Pinxian, Xu, Yuanyuan, Wu, Xi, Zhou, Jiliu, and Wang, Yan
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
To acquire high-quality positron emission tomography (PET) images while reducing the radiation tracer dose, numerous efforts have been devoted to reconstructing standard-dose PET (SPET) images from low-dose PET (LPET). However, the success of current fully-supervised approaches relies on abundant paired LPET and SPET images, which are often unavailable in clinic. Moreover, these methods often mix the dose-invariant content with dose level-related dose-specific details during reconstruction, resulting in distorted images. To alleviate these problems, in this paper, we propose a two-stage Semi-Supervised SPET reconstruction framework, namely S3PET, to accommodate the training of abundant unpaired and limited paired SPET and LPET images. Our S3PET involves an un-supervised pre-training stage (Stage I) to extract representations from unpaired images, and a supervised dose-aware reconstruction stage (Stage II) to achieve LPET-to-SPET reconstruction by transferring the dose-specific knowledge between paired images. Specifically, in stage I, two independent dose-specific masked autoencoders (DsMAEs) are adopted to comprehensively understand the unpaired SPET and LPET images. Then, in Stage II, the pre-trained DsMAEs are further finetuned using paired images. To prevent distortions in both content and details, we introduce two elaborate modules, i.e., a dose knowledge decouple module to disentangle the respective dose-specific and dose-invariant knowledge of LPET and SPET, and a dose-specific knowledge learning module to transfer the dose-specific information from SPET to LPET, thereby achieving high-quality SPET reconstruction from LPET images. Experiments on two datasets demonstrate that our S3PET achieves state-of-the-art performance quantitatively and qualitatively.
- Published
- 2024
32. Building spin-1/2 antiferromagnetic Heisenberg chains with diaza-nanographenes
- Author
-
Fu, Xiaoshuai, Huang, Li, Liu, Kun, Henriques, João C. G., Gao, Yixuan, Han, Xianghe, Chen, Hui, Wang, Yan, Palma, Carlos-Andres, Cheng, Zhihai, Lin, Xiao, Du, Shixuan, Ma, Ji, Fernández-Rossier, Joaquín, Feng, Xinliang, and Gao, Hong-Jun
- Subjects
Condensed Matter - Materials Science ,Condensed Matter - Mesoscale and Nanoscale Physics ,Physics - Chemical Physics ,Quantum Physics - Abstract
Understanding and engineering the coupling of spins in nanomaterials is of central importance for designing novel devices. Graphene nanostructures with {\pi}-magnetism offer a chemically tunable platform to explore quantum magnetic interactions. However, realizing spin chains bearing controlled odd-even effects with suitable nanographene systems is challenging. Here, we demonstrate the successful on-surface synthesis of spin-1/2 antiferromagnetic Heisenberg chains with parity-dependent magnetization based on antiaromatic diaza-hexa-peri-hexabenzocoronene (diaza-HBC) units. Using distinct synthetic strategies, two types of spin chains with different terminals were synthesized, both exhibiting a robust odd-even effect on the spin coupling along the chain. Combined investigations using scanning tunneling microscopy, non-contact atomic force microscopy, density functional theory calculations, and quantum spin models confirmed the structures of the diaza-HBC chains and revealed their magnetic properties, which has an S = 1/2 spin per unit through electron donation from the diaza-HBC core to the Au(111) substrate. Gapped excitations were observed in even-numbered chains, while enhanced Kondo resonance emerged in odd-numbered units of odd-numbered chains due to the redistribution of the unpaired spin along the chain. Our findings provide an effective strategy to construct nanographene spin chains and unveil the odd-even effect in their magnetic properties, offering potential applications in nanoscale spintronics.
- Published
- 2024
33. Collective optical properties of moir\'e excitons
- Author
-
Huang, Tsung-Sheng, Wang, Yu-Xin, Wang, Yan-Qi, Chang, Darrick, Hafezi, Mohammad, and Grankin, Andrey
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Materials Science ,Condensed Matter - Strongly Correlated Electrons ,Quantum Physics - Abstract
We propose that excitons in moir\'e transition metal dichalcogenide bilayers offer a promising platform for investigating collective radiative properties. While some of these optical properties resemble those of cold atom arrays, moir\'e excitons extend to the deep subwavelength limit, beyond the reach of current optical lattice experiments. Remarkably, we show that the collective optical properties can be exploited to probe certain correlated electron states. Specifically, we illustrate that the Wigner crystal states of electrons doped into these bilayers act as an emergent periodic potential for excitons. Moreover, the collective dissipative excitonic bands and their associated Berry curvature can reveal various charge orders that emerge at the corresponding electronic doping. Our study provides a promising pathway for future research on the interplay between collective effects and strong correlations involving moir\'e excitons.
- Published
- 2024
34. $VILA^2$: VILA Augmented VILA
- Author
-
Fang, Yunhao, Zhu, Ligeng, Lu, Yao, Wang, Yan, Molchanov, Pavlo, Cho, Jang Hyun, Pavone, Marco, Han, Song, and Yin, Hongxu
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). While model architectures and training infrastructures advance rapidly, data curation remains under-explored. When data quantity and quality become a bottleneck, existing work either directly crawls more raw data from the Internet, which comes with no guarantee of data quality, or distills from black-box commercial models (e.g., GPT-4V / Gemini), which leaves performance upper bounded by that model. In this work, we introduce a novel approach that includes a self-augment step and a specialist-augment step to iteratively improve data quality and model performance. In the self-augment step, a VLM recaptions its own pretraining data to enhance data quality, and then retrains from scratch using this refined dataset to improve model performance. This process can iterate for several rounds. Once self-augmentation saturates, we employ several specialist VLMs finetuned from the self-augmented VLM with domain-specific expertise, to further infuse specialist knowledge into the generalist VLM through task-oriented recaptioning and retraining. With the combined self-augmented and specialist-augmented training, we introduce $VILA^2$ (VILA-augmented-VILA), a VLM family that consistently improves the accuracy on a wide range of tasks over prior art, and achieves new state-of-the-art results on MMMU leaderboard among open-sourced models.
- Published
- 2024
35. LoFormer: Local Frequency Transformer for Image Deblurring
- Author
-
Mao, Xintian, Wang, Jiansheng, Xie, Xingran, Li, Qingli, and Wang, Yan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Due to the computational complexity of self-attention (SA), prevalent techniques for image deblurring often resort to either adopting localized SA or employing coarse-grained global SA methods, both of which exhibit drawbacks such as compromising global modeling or lacking fine-grained correlation. In order to address this issue by effectively modeling long-range dependencies without sacrificing fine-grained details, we introduce a novel approach termed Local Frequency Transformer (LoFormer). Within each unit of LoFormer, we incorporate a Local Channel-wise SA in the frequency domain (Freq-LC) to simultaneously capture cross-covariance within low- and high-frequency local windows. These operations offer the advantage of (1) ensuring equitable learning opportunities for both coarse-grained structures and fine-grained details, and (2) exploring a broader range of representational properties compared to coarse-grained global SA methods. Additionally, we introduce an MLP Gating mechanism complementary to Freq-LC, which serves to filter out irrelevant features while enhancing global learning capabilities. Our experiments demonstrate that LoFormer significantly improves performance in the image deblurring task, achieving a PSNR of 34.09 dB on the GoPro dataset with 126G FLOPs. https://github.com/DeepMed-Lab-ECNU/Single-Image-Deblur
- Published
- 2024
36. Hi-EF: Benchmarking Emotion Forecasting in Human-interaction
- Author
-
Wang, Haoran, Mai, Xinji, Tao, Zeng, Wang, Yan, Yu, Jiawen, Zhou, Ziheng, Tong, Xuan, Yan, Shaoqi, Zhao, Qing, Gao, Shuyong, and Zhang, Wenqiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Affective Forecasting, a research direction in psychology that predicts individuals' future emotions, is often constrained by numerous external factors like social influence and temporal distance. To address this, we transform Affective Forecasting into a Deep Learning problem by designing an Emotion Forecasting paradigm based on two-party interactions. We propose a novel Emotion Forecasting (EF) task grounded in the theory that an individual's emotions are easily influenced by the emotions or other information conveyed during interactions with another person. To tackle this task, we have developed a specialized dataset, Human-interaction-based Emotion Forecasting (Hi-EF), which contains 3069 two-party Multilayered-Contextual Interaction Samples (MCIS) with abundant affective-relevant labels and three modalities. Hi-EF not only demonstrates the feasibility of the EF task but also highlights its potential. Additionally, we propose a methodology that establishes a foundational and referential baseline model for the EF task, and extensive experiments are provided. The dataset and code are available at https://github.com/Anonymize-Author/Hi-EF.
- Published
- 2024
37. All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism
- Author
-
Mai, Xinji, Lin, Junxiong, Wang, Haoran, Tao, Zeng, Wang, Yan, Yan, Shaoqi, Tong, Xuan, Yu, Jiawen, Wang, Boyang, Zhou, Ziheng, Zhao, Qing, Gao, Shuyong, and Zhang, Wenqiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UMBEnet includes a Dual-Stream (DS) structure that fuses inherent prompts with a Prompt Pool and a Sparse Feature Fusion (SFF) module. The design of the Prompt Pool is aimed at integrating information from different modalities, while inherent prompts are intended to enhance the system's predictive guidance capabilities and effectively manage knowledge related to emotion classification. Moreover, considering the sparsity of effective information across different modalities, the SFF module aims to make full use of all available sensory data through the sparse integration of modality fusion prompts and inherent prompts, maintaining high adaptability and sensitivity to complex emotional states. Extensive experiments on the largest benchmark datasets in the Dynamic Facial Expression Recognition (DFER) field, including DFEW, FERV39k, and MAFW, have proven that UMBEnet consistently outperforms the current state-of-the-art methods. Notably, in scenarios of Modality Missingness and multimodal contexts, UMBEnet significantly surpasses the leading current methods, demonstrating outstanding performance and adaptability in tasks that involve complex emotional understanding with rich multimodal information.
- Published
- 2024
38. The Oscars of AI Theater: A Survey on Role-Playing with Language Models
- Author
-
Chen, Nuo, Wang, Yan, Deng, Yang, and Li, Jia
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
This survey explores the burgeoning field of role-playing with language models, focusing on their development from early persona-based models to advanced character-driven simulations facilitated by Large Language Models (LLMs). Initially confined to simple persona consistency due to limited model capabilities, role-playing tasks have now expanded to embrace complex character portrayals involving character consistency, behavioral alignment, and overall attractiveness. We provide a comprehensive taxonomy of the critical components in designing these systems, including data, models and alignment, agent architecture and evaluation. This survey not only outlines the current methodologies and challenges, such as managing dynamic personal profiles and achieving high-level persona consistency but also suggests avenues for future research in improving the depth and realism of role-playing applications. The goal is to guide future research by offering a structured overview of current methodologies and identifying potential areas for improvement. Related resources and papers are available at https://github.com/nuochenpku/Awesome-Role-Play-Papers., Comment: 28 pages
- Published
- 2024
39. A foundation model approach to guide antimicrobial peptide design in the era of artificial intelligence driven scientific discovery
- Author
-
Wang, Jike, Feng, Jianwen, Kang, Yu, Pan, Peichen, Ge, Jingxuan, Wang, Yan, Wang, Mingyang, Wu, Zhenxing, Zhang, Xingcai, Yu, Jiameng, Zhang, Xujun, Wang, Tianyue, Wen, Lirong, Yan, Guangning, Deng, Yafeng, Shi, Hui, Hsieh, Chang-Yu, Jiang, Zhihui, and Hou, Tingjun
- Subjects
Quantitative Biology - Biomolecules - Abstract
We propose AMP-Designer, an LLM-based foundation model approach for the rapid design of novel antimicrobial peptides (AMPs) with multiple desired properties. Within 11 days, AMP-Designer enables de novo design of 18 novel candidates with broad-spectrum potency against Gram-negative bacteria. Subsequent in vitro validation experiments demonstrate that almost all in silico recommended candidates exhibit notable antibacterial activity, yielding a 94.4% positive rate. Two of these candidates exhibit exceptional activity, minimal hemotoxicity, substantial stability in human plasma, and a low propensity of inducing antibiotic resistance as observed in murine lung infection experiments, showcasing their significant efficacy in reducing bacterial load by approximately one hundredfold. The entire process, from in silico design to in vitro and in vivo validation, is completed within a timeframe of 48 days. Moreover, AMP-Designer demonstrates its remarkable capability in designing specific AMPs to target strains with extremely limited labeled datasets. The most outstanding candidate against Propionibacterium acnes suggested by AMP-Designer exhibits an in vitro minimum inhibitory concentration value of 2.0 $\mu$g/ml. Through the integration of advanced machine learning methodologies such as contrastive prompt tuning, knowledge distillation, and reinforcement learning within the AMP-Designer framework, the process of designing AMPs demonstrates exceptional efficiency. This efficiency remains conspicuous even in the face of challenges posed by constraints arising from a scarcity of labeled data. These findings highlight the tremendous potential of AMP-Designer as a promising approach in combating the global health threat of antibiotic resistance., Comment: 43 pages, 6 figures, 5 tables. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file
- Published
- 2024
40. LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
- Author
-
Du, Penghui, Wang, Yu, Sun, Yifan, Wang, Luting, Liao, Yue, Zhang, Gang, Ding, Errui, Wang, Yan, Wang, Jingdong, and Liu, Si
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the open vocabulary knowledge biased towards base categories during the transfer from VLMs to detectors. To address these challenges, we propose the Language Model Instruction (LaMI) strategy, which leverages the relationships between visual concepts and applies them within a simple yet effective DETR-like detector, termed LaMI-DETR. LaMI utilizes GPT to construct visual concepts and employs T5 to investigate visual similarities across categories. These inter-category relationships refine concept representation and avoid overfitting to base categories. Comprehensive experiments validate our approach's superior performance over existing methods in the same rigorous setting without reliance on external training resources. LaMI-DETR achieves a rare box AP of 43.4 on OV-LVIS, surpassing the previous best by 7.8 rare box AP., Comment: ECCV2024
- Published
- 2024
41. Hierarchical search method for gravitational waves from stellar-mass binary black holes in noisy space-based detector data
- Author
-
Fu, Yao, Wang, Yan, and Mohanty, Soumya D.
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena ,General Relativity and Quantum Cosmology - Abstract
Future space-based laser interferometric detectors, such as LISA, will be able to detect gravitational waves (GWs) generated during the inspiral phase of stellar-mass binary black holes (SmBBHs). The detection and characterization of GWs from SmBBHs poses a formidable data analysis challenge, arising from the large number of wave cycles that make the search extremely sensitive to mismatches in signal and template parameters in a likelihood-based approach. This makes the search for the maximum of the likelihood function over the signal parameter space an extremely difficult task. We present a data analysis method that addresses this problem using both algorithmic innovations and hardware acceleration driven by GPUs. The method follows a hierarchical approach in which a semi-coherent $\mathcal{F}$-statistic is computed with different numbers of frequency domain partitions at different stages, with multiple particle swarm optimization (PSO) runs used in each stage for global optimization. An important step in the method is the judicious partitioning of the parameter space at each stage to improve the convergence probability of PSO and avoid premature convergence to noise-induced secondary maxima. The hierarchy of stages confines the semi-coherent searches to progressively smaller parameter ranges, with the final stage performing a search for the global maximum of the fully-coherent $\mathcal{F}$-statistic. We test our method on 2.5 years of a single LISA TDI combination and find that for an injected SmBBH signal with a SNR between $\approx 11$ and $\approx 14$, the method can estimate (i) the chirp mass with a relative error of $\lesssim 0.01\%$, (ii) the time of coalescence within $\approx 100$ sec, (iii) the sky location within $\approx 0.2$ ${\rm deg}^2$, and (iv) orbital eccentricity at a fiducial signal frequency of 10 mHz with a relative error of $\lesssim 1\%$. (abr.), Comment: 15 pages, 5 figures, 6 tables
- Published
- 2024
42. Power Optimization and Deep Learning for Channel Estimation of Active IRS-Aided IoT
- Author
-
Wang, Yan, Shu, Feng, Dong, Rongen, Gao, Wei, Zhang, Qi, and Liu, Jiajia
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
In this paper, channel estimation of an active intelligent reflecting surface (IRS) aided uplink Internet of Things (IoT) network is investigated. First, the least-squares (LS) estimators for the direct channel and the cascaded channel are presented. The corresponding mean square errors (MSE) of the channel estimators are derived. Subsequently, in order to evaluate the influence of adjusting the transmit power at the IoT devices or the reflected power at the active IRS on Sum-MSE performance, two situations are considered. In the first case, under the total power sum constraint of the IoT devices and active IRS, the closed-form expression of the optimal power allocation factor is derived. In the second case, when the transmit power at the IoT devices is fixed, there exists an optimal reflected power at the active IRS. To further improve the estimation performance, the convolutional neural network (CNN)-based direct channel estimation (CDCE) algorithm and the CNN-based cascaded channel estimation (CCCE) algorithm are designed. Finally, simulation results demonstrate the existence of an optimal power allocation strategy that minimizes the Sum-MSE, and further validate the superiority of the proposed CDCE / CCCE algorithms over their respective traditional LS and minimum mean square error (MMSE) baselines.
- Published
- 2024
43. Multistate ferroelectric diodes with high electroresistance based on van der Waals heterostructures
- Author
-
Sarkar, Soumya, Han, Zirun, Ghani, Maheera Abdul, Strkalj, Nives, Kim, Jung Ho, Wang, Yan, Jariwala, Deep, and Chhowalla, Manish
- Subjects
Physics - Applied Physics ,Condensed Matter - Materials Science - Abstract
Some van der Waals (vdW) materials exhibit ferroelectricity, making them promising for novel non-volatile memories (NVMs) such as ferroelectric diodes (FeDs). CuInP2S6 (CIPS) is a well-known vdW ferroelectric that has been integrated with graphene for memory devices. Here we demonstrate FeDs with self-rectifying, hysteretic current-voltage characteristics based on vertical heterostructures of 10-nm-thick CIPS and graphene. By using vdW indium-cobalt top electrodes and graphene bottom electrodes, we achieve high electroresistance (on- and off-state resistance ratios) of ~10^6, on-state rectification ratios of ~2500 for read/write voltages of 2 V/0.5 V and maximum output current densities of 100 A/cm^2. These metrics compare favourably with state-of-the-art FeDs. Piezoresponse force microscopy measurements show that stabilization of intermediate net polarization states in CIPS leads to stable multi-bit data retention at room temperature. The combination of two-terminal design, multi-bit memory, and low-power operation in CIPS-based FeDs is potentially interesting for compute-in-memory and neuromorphic computing applications., Comment: 17 Pages
- Published
- 2024
44. Spatially-Variant Degradation Model for Dataset-free Super-resolution
- Author
-
Guo, Shaojie, Song, Haofei, Li, Qingli, and Wang, Yan
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper focuses on the dataset-free Blind Image Super-Resolution (BISR). Unlike existing dataset-free BISR methods that focus on obtaining a degradation kernel for the entire image, we are the first to explicitly design a spatially-variant degradation model for each pixel. Our method also benefits from having a significantly smaller number of learnable parameters compared to data-driven spatially-variant BISR methods. Concretely, each pixel's degradation kernel is expressed as a linear combination of a learnable dictionary composed of a small number of spatially-variant atom kernels. The coefficient matrices of the atom degradation kernels are derived using membership functions of fuzzy set theory. We construct a novel probabilistic BISR model with a tailored likelihood function and prior terms. Subsequently, we employ the Monte Carlo EM algorithm to infer the degradation kernels for each pixel. Our method achieves a significant improvement over other state-of-the-art BISR methods, with an average improvement of 1 dB (2x). Code will be released at https://github.com/shaojieguoECNU/SVDSR.
- Published
- 2024
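Note on record 44: the abstract describes each pixel's degradation kernel as a linear combination of a small dictionary of atom kernels. The sketch below only illustrates that combination step in plain Python. The function name `combine_atoms`, the two 3x3 atoms, and the hand-picked coefficients are assumptions made for illustration; in the paper the coefficients come from fuzzy membership functions and the atoms are learned, neither of which is reproduced here.

```python
def combine_atoms(coeffs, atoms):
    """Return sum_j coeffs[j] * atoms[j] as a nested list (one pixel's kernel)."""
    size = len(atoms[0])
    kernel = [[0.0] * size for _ in range(size)]
    for c, atom in zip(coeffs, atoms):
        for r in range(size):
            for s in range(size):
                kernel[r][s] += c * atom[r][s]
    return kernel


# Two hypothetical atom kernels, each summing to 1:
# a delta kernel (no blur) and a 3x3 box blur.
delta = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
box = [[1.0 / 9.0] * 3 for _ in range(3)]

# Hypothetical per-pixel coefficients; they sum to 1 so the result stays normalized.
pixel_kernel = combine_atoms([0.5, 0.5], [delta, box])
```

Because both atoms and the coefficients are normalized, the combined kernel also sums to 1, which is the usual requirement for a blur kernel.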
45. Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition
- Author
-
Yang, Yuxiang, Wen, Lu, Zeng, Xinyi, Xu, Yuanyuan, Wu, Xi, Zhou, Jiliu, and Wang, Yan
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Facial Expression Recognition (FER) holds significant importance in human-computer interactions. Existing cross-domain FER methods often transfer knowledge solely from a single labeled source domain to an unlabeled target domain, neglecting the comprehensive information across multiple sources. Nevertheless, cross-multidomain FER (CMFER) is very challenging for (i) the inherent inter-domain shifts across multiple domains and (ii) the intra-domain shifts stemming from the ambiguous expressions and low inter-class distinctions. In this paper, we propose a novel Learning with Alignments CMFER framework, named LA-CMFER, to handle both inter- and intra-domain shifts. Specifically, LA-CMFER is constructed with a global branch and a local branch to extract features from the full images and local subtle expressions, respectively. Based on this, LA-CMFER presents a dual-level inter-domain alignment method to force the model to prioritize hard-to-align samples in knowledge transfer at a sample level while gradually generating a well-clustered feature space with the guidance of class attributes at a cluster level, thus narrowing the inter-domain shifts. To address the intra-domain shifts, LA-CMFER introduces a multi-view intra-domain alignment method with a multi-view clustering consistency constraint where a prediction similarity matrix is built to pursue consistency between the global and local views, thus refining pseudo labels and eliminating latent noise. Extensive experiments on six benchmark datasets have validated the superiority of our LA-CMFER., Comment: Accepted by ACM MM 2024
- Published
- 2024
46. VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool
- Author
-
Wang, Yan, Zeng, Yawen, Zheng, Jingsheng, Xing, Xiaofen, Xu, Jin, and Xu, Xiangmin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
Multimodal large language models (MLLMs) are flourishing, but they mainly focus on images and pay less attention to videos, especially in sub-fields such as prompt engineering, video chain-of-thought (CoT), and instruction tuning on videos. Therefore, we explore the collection of CoT datasets for videos to support video OpenQA and improve the reasoning ability of MLLMs. Unfortunately, making such video CoT datasets is not an easy task. Given that human annotation is too cumbersome and expensive, while machine-generated annotation is not reliable due to the hallucination issue, we develop an automatic annotation tool that combines machine and human experts under the active learning paradigm. Active learning is an interactive strategy between the model and human experts; in this way, the workload of human labeling can be reduced and the quality of the dataset can be guaranteed. With the help of the automatic annotation tool, we strive to contribute three datasets, namely VideoCoT, TopicQA, and TopicCoT. Furthermore, we propose a simple but effective benchmark based on the collected datasets, which exploits CoT to maximize the complex reasoning capabilities of MLLMs. Extensive experiments demonstrate the effectiveness of our solution., Comment: ACL 2024 Workshop
- Published
- 2024
47. xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart
- Author
-
Chen, Tianrun, Ding, Chaotao, Zhu, Lanyun, Xu, Tao, Ji, Deyi, Wang, Yan, Zang, Ying, and Li, Zejian
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet-structured deep learning neural network that leverages Vision-LSTM (xLSTM) as its backbone for medical image segmentation. xLSTM was recently proposed as the successor of Long Short-Term Memory (LSTM) networks and has demonstrated superior performance compared to Transformers and State Space Models (SSMs) like Mamba in Natural Language Processing (NLP) and image classification (as demonstrated in the Vision-LSTM, or ViL, implementation). Here, the xLSTM-UNet we designed extends this success to the biomedical image segmentation domain. By integrating the local feature extraction strengths of convolutional layers with the long-range dependency capturing abilities of xLSTM, xLSTM-UNet offers a robust solution for comprehensive image analysis. We validate the efficacy of xLSTM-UNet through experiments. Our findings demonstrate that xLSTM-UNet consistently surpasses the performance of leading CNN-based, Transformer-based, and Mamba-based segmentation networks on multiple biomedical segmentation datasets, including organs in abdominal MRI, instruments in endoscopic images, and cells in microscopic images. With comprehensive experiments performed, this technical report highlights the potential of xLSTM-based architectures in advancing biomedical image analysis in both 2D and 3D. The code, models, and datasets are publicly available at http://tianrun-chen.github.io/xLSTM-UNet/
- Published
- 2024
48. Targeting Lactobacillus johnsonii to reverse chronic kidney disease.
- Author
-
Miao, Hua, Liu, Fei, Wang, Yan-Ni, Yu, Xiao-Yong, Zhuang, Shougang, Guo, Yan, Vaziri, Nosratola, Ma, Shi-Xing, Su, Wei, Shang, You-Quan, Gao, Ming, Zhang, Jin-Hua, Zhang, Li, Zhao, Ying-Yong, and Cao, Gang
- Subjects
Renal Insufficiency, Chronic ,Animals ,Rats ,Humans ,Mice ,Male ,Lactobacillus johnsonii ,Indoles ,Receptors, Aryl Hydrocarbon ,Gastrointestinal Microbiome ,Female - Abstract
Accumulated evidence suggested that gut microbial dysbiosis interplayed with progressive chronic kidney disease (CKD). However, no available therapy is effective in suppressing progressive CKD. Here, using microbiomics in 480 participants including healthy controls and patients with stage 1-5 CKD, we identified an elongation taxonomic chain Bacilli-Lactobacillales-Lactobacillaceae-Lactobacillus-Lactobacillus johnsonii correlated with CKD progression in patients, whose abundance strongly correlated with clinical kidney markers. L. johnsonii abundance reduced with progressive CKD in rats with adenine-induced CKD. L. johnsonii supplementation ameliorated kidney lesion. Serum indole-3-aldehyde (IAld), whose level strongly negatively correlated with creatinine level in CKD rats, decreased in serum of rats induced using unilateral ureteral obstruction (UUO) and 5/6 nephrectomy (NX) as well as late CKD patients. Treatment with IAld dampened kidney lesion through suppressing aryl hydrocarbon receptor (AHR) signal in rats with CKD or UUO, and in cultured 1-hydroxypyrene-induced HK-2 cells. Renoprotective effect of IAld was partially diminished in AHR-deficient mice and HK-2 cells. Our further data showed that treatment with L. johnsonii attenuated kidney lesion by suppressing AHR signal via increasing serum IAld level. Taken together, targeting L. johnsonii might reverse CKD in patients. This study provides a deeper understanding of how microbial-produced tryptophan metabolism affects host disease and discovers potential pathways for prophylactic and therapeutic treatments for CKD patients.
- Published
- 2024
49. Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning
- Author
-
Vasan, Gautham, Wang, Yan, Shahriar, Fahim, Bergstra, James, Jagersand, Martin, and Mahmood, A. Rupam
- Subjects
Computer Science - Robotics ,Computer Science - Machine Learning - Abstract
Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible. These problems, when formulated as episodic reinforcement learning tasks, can easily be specified to align well with our intended goal: -1 reward every time step with termination upon reaching the goal state, called minimum-time tasks. Despite this simplicity, such formulations are often overlooked in favor of dense rewards due to their perceived difficulty and lack of informativeness. Our studies contrast the two reward paradigms, revealing that the minimum-time task specification not only facilitates learning higher-quality policies but can also surpass dense-reward-based policies on their own performance metrics. Crucially, we also identify the goal-hit rate of the initial policy as a robust early indicator for learning success in such sparse feedback settings. Finally, using four distinct real-robotic platforms, we show that it is possible to learn pixel-based policies from scratch within two to three hours using constant negative rewards., Comment: In Proceedings of Reinforcement Learning Conference 2024. For a video demo, see https://youtu.be/a6zlVUuKzBc
- Published
- 2024
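Note on record 49: the abstract defines a minimum-time task as a reward of -1 at every time step, with termination on reaching the goal state. The sketch below shows that reward specification on a toy problem. The `LineWorld` class, its `reset`/`step` methods, and `episode_return` are hypothetical names invented for this illustration and are not taken from the paper or from any particular RL library.

```python
from dataclasses import dataclass


@dataclass
class LineWorld:
    """Hypothetical 1-D world: the agent starts at `start` and must reach `goal`."""
    start: int = 0
    goal: int = 5

    def reset(self) -> int:
        self.pos = self.start
        return self.pos

    def step(self, action: int):
        """Move by +1 (action > 0) or -1; every step costs -1, and the
        episode terminates when the goal state is reached."""
        self.pos += 1 if action > 0 else -1
        terminated = self.pos == self.goal
        return self.pos, -1.0, terminated


def episode_return(env: LineWorld, policy, max_steps: int = 100) -> float:
    """Undiscounted return; under this reward it equals minus the steps taken."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated = env.step(policy(obs))
        total += reward
        if terminated:
            break
    return total


# A policy that always moves right reaches the goal in 5 steps.
direct_return = episode_return(LineWorld(), lambda obs: 1)
```

Since the return is just the negated episode length, a policy that reaches the goal sooner always scores higher, which is why the specification lines up with the intended objective without any reward shaping.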
50. From Efficient Multimodal Models to World Models: A Survey
- Author
-
Mai, Xinji, Tao, Zeng, Lin, Junxiong, Wang, Haoran, Chang, Yang, Kang, Yanlan, Wang, Yan, and Zhang, Wenqiang
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Multimodal Large Models (MLMs) are becoming a significant research focus, combining powerful large language models with multimodal learning to perform complex tasks across different data modalities. This review explores the latest developments and challenges in MLMs, emphasizing their potential in achieving artificial general intelligence and as a pathway to world models. We provide an overview of key techniques such as Multimodal Chain of Thought (M-COT), Multimodal Instruction Tuning (M-IT), and Multimodal In-Context Learning (M-ICL). Additionally, we discuss both the fundamental and specific technologies of multimodal models, highlighting their applications, input/output modalities, and design characteristics. Despite significant advancements, the development of a unified multimodal model remains elusive. We discuss the integration of 3D generation and embodied intelligence to enhance world simulation capabilities and propose incorporating external rule systems for improved reasoning and decision-making. Finally, we outline future research directions to address these challenges and advance the field.
- Published
- 2024