116,181 results for "LI, Xin"
Search Results
2. On K-theoretic invariants of semigroup C*-algebras from actions of congruence monoids
- Author
Bruce, Chris and Li, Xin
- Published
- 2023
3. Dynamic strain sensing using Doppler-shift-immune phase-sensitive OFDR with ultra-weak reflection array and frequency-tracking
- Author
Yang, Qiang, Xie, Weilin, Wang, Congfan, Li, Bowen, Li, Xin, Zheng, Xiang, Wei, Wei, and Dong, Yi
- Subjects
Physics - Optics, Physics - Applied Physics
- Abstract
In distributed fiber-optic sensing based on optical frequency domain reflectometry (OFDR), Doppler frequency shifts due to the changes of disturbances during one sweep period introduce demodulation errors that accumulate along both the distance and time, impairing the sensing performance. Here, we report distributed dynamic strain sensing using Doppler-shift-immune phase-sensitive OFDR based on frequency-tracking and spectrum-zooming with ultra-weak reflection array. Theoretical study has been carried out with the introduction of mismatch coefficient, unveiling quantitatively the impact of Doppler shift. Following a numerical analysis of the proposed method, a retained precision has been experimentally verified regardless of the position mismatch due to the Doppler effect. Doppler-shift-immune sensing for dynamic strains covering continuous spatial resolution over a distance of 1000 m with a 2.5 cm sensing spatial resolution has been demonstrated, verifying the high fidelity promised by the proposed method.
- Published
- 2024
4. GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration
- Author
Li, Xin, Chu, Qizhi, Chen, Yubin, Liu, Yang, Liu, Yaoqi, Yu, Zekai, Chen, Weize, Qian, Chen, Shi, Chuan, and Yang, Cheng
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Multiagent Systems
- Abstract
Graphs are widely used for modeling relational data in real-world scenarios, such as social networks and urban computing. Existing LLM-based graph analysis approaches either integrate graph neural networks (GNNs) for specific machine learning tasks, limiting their transferability, or rely solely on LLMs' internal reasoning ability, resulting in suboptimal performance. To address these limitations, we take advantage of recent advances in LLM-based agents, which have shown capabilities of utilizing external knowledge or tools for problem solving. By simulating human problem-solving strategies such as analogy and collaboration, we propose a multi-agent system based on LLMs named GraphTeam, for graph analysis. GraphTeam consists of five LLM-based agents from three modules, and the agents with different specialities can collaborate with each other to address complex problems. Specifically, (1) input-output normalization module: the question agent extracts and refines four key arguments from the original question, facilitating the problem understanding, and the answer agent organizes the results to meet the output requirement; (2) external knowledge retrieval module: we first build a knowledge base consisting of relevant documentation and experience information, and then the search agent retrieves the most relevant entries for each question. (3) problem-solving module: given the retrieved information from search agent, the coding agent uses established algorithms via programming to generate solutions, and in case the coding agent does not work, the reasoning agent will directly compute the results without programming. Extensive experiments on six graph analysis benchmarks demonstrate that GraphTeam achieves state-of-the-art performance with an average 25.85% improvement over the best baseline in terms of accuracy. The code and data are available at https://github.com/BUPT-GAMMA/GraphTeam.
- Published
- 2024
5. Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
- Author
Cheng, Zesen, Zhang, Hang, Li, Kehan, Leng, Sicong, Hu, Zhiqiang, Wu, Fei, Zhao, Deli, Li, Xin, and Bing, Lidong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Contrastive loss is a powerful approach for representation learning, where larger batch sizes enhance performance by providing more negative samples to better distinguish between similar and dissimilar data. However, scaling batch sizes is constrained by the quadratic growth in GPU memory consumption, primarily due to the full instantiation of the similarity matrix. To address this, we propose a tile-based computation strategy that partitions the contrastive loss calculation into arbitrary small blocks, avoiding full materialization of the similarity matrix. Furthermore, we introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems, employing ring-based communication at the GPU level to optimize synchronization and fused kernels at the CUDA core level to reduce I/O overhead. Experimental results show that the proposed method scales batch sizes to unprecedented levels. For instance, it enables contrastive training of a CLIP-ViT-L/14 model with a batch size of 4M or 12M using 8 or 32 A800 80GB without sacrificing any accuracy. Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed. The code will be made publicly available.
- Published
- 2024
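The tile-based idea in the abstract of entry 5 can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python lists, a one-directional InfoNCE loss, and hypothetical function names (`info_nce_full`, `info_nce_tiled`), and it omits the ring communication and fused kernels the abstract mentions. It only shows how a running maximum and running sum per row reproduce the log-sum-exp without ever holding a full row of the similarity matrix.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_full(queries, keys, temperature=1.0):
    """Reference loss: compute every logit of a row, then take its log-sum-exp."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse - logits[i]  # the matching key is the positive
    return total / len(queries)


def info_nce_tiled(queries, keys, temperature=1.0, tile=2):
    """Same loss, visiting keys in tiles and keeping only a running max and sum."""
    total = 0.0
    for i, q in enumerate(queries):
        run_max = -math.inf
        run_sum = 0.0
        for start in range(0, len(keys), tile):
            block = [dot(q, k) / temperature for k in keys[start:start + tile]]
            new_max = max(run_max, max(block))
            # Rescale the earlier partial sum to the new maximum, then add this tile.
            run_sum = run_sum * math.exp(run_max - new_max)
            run_sum += sum(math.exp(x - new_max) for x in block)
            run_max = new_max
        positive = dot(q, keys[i]) / temperature
        total += run_max + math.log(run_sum) - positive
    return total / len(queries)
```

Both functions should agree up to floating-point rounding for any tile size; the tiled version needs memory proportional to the tile width rather than to the number of keys.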
6. CLAP: Concave Linear APproximation for Quadratic Graph Matching
- Author
Liang, Yongqing, Han, Huijun, and Li, Xin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Solving point-wise feature correspondence in visual data is a fundamental problem in computer vision. A powerful model that addresses this challenge is to formulate it as graph matching, which entails solving a Quadratic Assignment Problem (QAP) with node-wise and edge-wise constraints. However, solving such a QAP can be both expensive and difficult due to numerous local extreme points. In this work, we introduce a novel linear model and solver designed to accelerate the computation of graph matching. Specifically, we employ a positive semi-definite matrix approximation to establish the structural attribute constraint. We then transform the original QAP into a linear model that is concave for maximization. This model can subsequently be solved using the Sinkhorn optimal transport algorithm, known for its enhanced efficiency and numerical stability compared to existing approaches. Experimental results on the widely used benchmark PascalVOC showcase that our algorithm achieves state-of-the-art performance with significantly improved efficiency. Source code: https://github.com/xmlyqing00/clap, Comment: Accepted as an oral paper in International Symposium on Visual Computing (ISCV2024)
- Published
- 2024
7. Improved Explicit Near-Optimal Codes in the High-Noise Regimes
- Author
Li, Xin and Mao, Songtao
- Subjects
Computer Science - Information Theory, Computer Science - Data Structures and Algorithms, Mathematics - Combinatorics
- Abstract
We study uniquely decodable codes and list decodable codes in the high-noise regime, specifically codes that are uniquely decodable from $\frac{1-\varepsilon}{2}$ fraction of errors and list decodable from $1-\varepsilon$ fraction of errors. We present several improved explicit constructions that achieve near-optimal rates, as well as efficient or even linear-time decoding algorithms. Our contributions are as follows. 1. Explicit Near-Optimal Linear Time Uniquely Decodable Codes: We construct a family of explicit $\mathbb{F}_2$-linear codes with rate $\Omega(\varepsilon)$ and alphabet size $2^{\mathrm{poly} \log(1/\varepsilon)}$, that are capable of correcting $e$ errors and $s$ erasures whenever $2e + s < (1 - \varepsilon)n$ in linear-time. 2. Explicit Near-Optimal List Decodable Codes: We construct a family of explicit list decodable codes with rate $\Omega(\varepsilon)$ and alphabet size $2^{\mathrm{poly} \log(1/\varepsilon)}$, that are capable of list decoding from $1-\varepsilon$ fraction of errors with a list size $L = \exp\exp\exp(\log^{\ast}n)$ in polynomial time. 3. List Decodable Code with Near-Optimal List Size: We construct a family of explicit list decodable codes with an optimal list size of $O(1/\varepsilon)$, albeit with a suboptimal rate of $O(\varepsilon^2)$, capable of list decoding from $1-\varepsilon$ fraction of errors in polynomial time. Furthermore, we introduce a new combinatorial object called multi-set disperser, and use it to give a family of list decodable codes with near-optimal rate $\frac{\varepsilon}{\log^2(1/\varepsilon)}$ and list size $\frac{\log^2(1/\varepsilon)}{\varepsilon}$, that can be constructed in probabilistic polynomial time and decoded in deterministic polynomial time. We also introduce new decoding algorithms that may prove valuable for other graph-based codes., Comment: 28 pages
- Published
- 2024
8. EViT-Unet: U-Net Like Efficient Vision Transformer for Medical Image Segmentation on Mobile and Edge Devices
- Author
Li, Xin, Zhu, Wenhui, Dong, Xuanzhao, Dumitrascu, Oana M., and Wang, Yalin
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
With the rapid development of deep learning, CNN-based U-shaped networks have succeeded in medical image segmentation and are widely applied for various tasks. However, their limitations in capturing global features hinder their performance in complex segmentation tasks. The rise of Vision Transformer (ViT) has effectively compensated for this deficiency of CNNs and promoted the application of ViT-based U-networks in medical image segmentation. However, the high computational demands of ViT make it unsuitable for many medical devices and mobile platforms with limited resources, restricting its deployment on resource-constrained and edge devices. To address this, we propose EViT-UNet, an efficient ViT-based segmentation network that reduces computational complexity while maintaining accuracy, making it ideal for resource-constrained medical devices. EViT-UNet is built on a U-shaped architecture, comprising an encoder, decoder, bottleneck layer, and skip connections, combining convolutional operations with self-attention mechanisms to optimize efficiency. Experimental results demonstrate that EViT-UNet achieves high accuracy in medical image segmentation while significantly reducing computational complexity., Comment: 5 pages, 3 figures
- Published
- 2024
9. Cooperative non-reciprocal emission and quantum sensing of symmetry breaking
- Author
Li, Xin and Flebus, Benedetta
- Subjects
Quantum Physics, Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Other Condensed Matter
- Abstract
Non-reciprocal propagation of energy and information is fundamental to a wide range of quantum technology applications. In this work, we explore the quantum many-body dynamics of a qubit ensemble coupled to a shared bath that mediates coherent and dissipative inter-qubit interactions with both symmetric and anti-symmetric components. We find that the interplay between anti-symmetric (symmetric) coherent and symmetric (anti-symmetric) dissipative interactions results in non-reciprocal couplings, which, in turn, generate a spatially asymmetric emission pattern. We demonstrate that this pattern arises from non-reciprocal interactions coupling different quantum many-body states within a specific excitation manifold. Focusing on solid-state baths, we show that their lack of time-reversal and inversion symmetry is a key ingredient for generating non-reciprocal dynamics in the qubit ensemble. With the plethora of quantum materials that exhibit this symmetry breaking at equilibrium, our approach paves the way for realizing cooperative non-reciprocal transport in qubit ensembles without requiring time-modulated external drives or complex engineering. Using an ensemble of nitrogen-vacancy (NV) centers coupled to a generic non-centrosymmetric ferromagnetic bath as a concrete example, we demonstrate that our predictions can be tested in near-future experiments. As the spatial asymmetry in the relaxation dynamics of the qubit ensemble is a direct probe of symmetry breaking in the solid-state bath, our work also opens the door to developing model-agnostic quantum sensing schemes capable of detecting bath properties invisible to current state-of-the-art protocols, which operate solid-state defects as single-qubit sensors.
- Published
- 2024
10. A Survey on Computational Solutions for Reconstructing Complete Objects by Reassembling Their Fractured Parts
- Author
Lu, Jiaxin, Liang, Yongqing, Han, Huijun, Hua, Jiacheng, Jiang, Junfeng, Li, Xin, and Huang, Qixing
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
- Abstract
Reconstructing a complete object from its parts is a fundamental problem in many scientific domains. The purpose of this article is to provide a systematic survey on this topic. The reassembly problem requires understanding the attributes of individual pieces and establishing matches between different pieces. Many approaches also model priors of the underlying complete object. Existing approaches are tightly connected problems of shape segmentation, shape matching, and learning shape priors. We provide existing algorithms in this context and emphasize their similarities and differences to general-purpose approaches. We also survey the trends from early non-deep learning approaches to more recent deep learning approaches. In addition to algorithms, this survey will also describe existing datasets, open-source software packages, and applications. To the best of our knowledge, this is the first comprehensive survey on this topic in computer graphics., Comment: 36 pages, 22 figures
- Published
- 2024
11. PGC 44685: A Dwarf Star-forming Lenticular Galaxy with Wolf-Rayet Population
- Author
Lu, Shiying, Gu, Qiusheng, Gao, Yulong, Shi, Yong, Zhou, Luwenjia, García-Benito, Rubén, Li, Xiangdong, Cui, Jiantong, Li, Xin, Long, Liuze, and Chen, Zhengyi
- Subjects
Astrophysics - Astrophysics of Galaxies
- Abstract
Lenticular galaxies (S0s) are formed mainly from the gas stripping of spirals in the cluster. But how S0s form and evolve in the field is still untangled. Based on spatially resolved observations from the optical Hispanic Astronomical Center in Andalusia 3.5-m telescope with the PPAK Integral Field Spectroscopy instrument and NOrthern Extended Millimeter Array, we study a dwarf (M*<10^9 Msun) S0, PGC 44685, with triple star-forming regions in the central region, namely A, B, and C, respectively. In northwest region C, we clearly detect the spectral features of Wolf-Rayet (WR) stars and quantify the WR population by stacking spectra with high WR significance. Most of the molecular gas is concentrated in the region C(WR), and there is diffuse gas around regions A and B. The WR region possesses the strongest intensities of Ha, CO(1-0), and 3mm continuum, indicating its ongoing violent star formation (gas depletion timescale $\lesssim$25 Myr) with tentative hundreds (<500) km/s stellar winds accompanied by the WR phase. Most (~96%) of three star-forming regions show relatively low metallicity distributions, suggesting possible (minor) accretions of metal-poor gas that trigger the subsequent complex star formation in a field S0 galaxy. We speculate that PGC 44685 will become quiescent in less than 30 Myr if there is no new molecular gas to provide raw materials for star formation. The existence of this dwarf star-forming S0 presents an example of star formation in the low-mass/metallicity S0 galaxy., Comment: 19 pages, 12 figures, 3 tables, ApJ accepted
- Published
- 2024
12. The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
- Author
Leng, Sicong, Xing, Yun, Cheng, Zesen, Zhou, Yang, Zhang, Hang, Li, Xin, Zhao, Deli, Lu, Shijian, Miao, Chunyan, and Bing, Lidong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent advancements in large multimodal models (LMMs) have significantly enhanced performance across diverse tasks, with ongoing efforts to further integrate additional modalities such as video and audio. However, most existing LMMs remain vulnerable to hallucinations, the discrepancy between the factual multimodal input and the generated textual output, which has limited their applicability in various real-world scenarios. This paper presents the first systematic investigation of hallucinations in LMMs involving the three most common modalities: language, visual, and audio. Our study reveals two key contributors to hallucinations: overreliance on unimodal priors and spurious inter-modality correlations. To address these challenges, we introduce the benchmark The Curse of Multi-Modalities (CMM), which comprehensively evaluates hallucinations in LMMs, providing a detailed analysis of their underlying issues. Our findings highlight key vulnerabilities, including imbalances in modality integration and biases from training data, underscoring the need for balanced cross-modal learning and enhanced hallucination mitigation strategies. Based on our observations and findings, we suggest potential research directions that could enhance the reliability of LMMs., Comment: Project Page: cmm-damovl.site
- Published
- 2024
13. Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
- Author
Zhu, Yongxin, Li, Bocheng, Zhang, Hang, Li, Xin, Xu, Linli, and Bing, Lidong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Latent-based image generative models, such as Latent Diffusion Models (LDMs) and Mask Image Models (MIMs), have achieved notable success in image generation tasks. These models typically leverage reconstructive autoencoders like VQGAN or VAE to encode pixels into a more compact latent space and learn the data distribution in the latent space instead of directly from pixels. However, this practice raises a pertinent question: Is it truly the optimal choice? In response, we begin with an intriguing observation: despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation. This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence. To address this discrepancy, we introduce a unified perspective on the relationship between latent space and generative models, emphasizing the stability of latent space in image generative modeling. Furthermore, we propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling. Experimental results show that image autoregressive modeling with our tokenizer (DiGIT) benefits both image understanding and image generation with the next token prediction principle, which is inherently straightforward for GPT models but challenging for other generative models. Remarkably, for the first time, a GPT-style autoregressive model for images outperforms LDMs, which also exhibits substantial improvement akin to GPT when scaling up model size. Our findings underscore the potential of an optimized latent space and the integration of discrete tokenization in advancing the capabilities of image generative models. The code is available at https://github.com/DAMO-NLP-SG/DiGIT., Comment: Accepted at NeurIPS 2024
- Published
- 2024
14. AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data
- Author
Zhao, Xinjie, Blum, Moritz, Yang, Rui, Yang, Boming, Carpintero, Luis Márquez, Pina-Navarro, Mónica, Wang, Tony, Li, Xin, Li, Huitao, Fu, Yanran, Wang, Rongrong, Zhang, Juntao, and Li, Irene
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Large Language Models (LLMs) have demonstrated capabilities across various applications but face challenges such as hallucination, limited reasoning abilities, and factual inconsistencies, especially when tackling complex, domain-specific tasks like question answering (QA). While Knowledge Graphs (KGs) have been shown to help mitigate these issues, research on the integration of LLMs with background KGs remains limited. In particular, user accessibility and the flexibility of the underlying KG have not been thoroughly explored. We introduce AGENTiGraph (Adaptive Generative ENgine for Task-based Interaction and Graphical Representation), a platform for knowledge management through natural language interaction. It integrates knowledge extraction, integration, and real-time visualization. AGENTiGraph employs a multi-agent architecture to dynamically interpret user intents, manage tasks, and integrate new knowledge, ensuring adaptability to evolving user requirements and data contexts. Our approach demonstrates superior performance in knowledge graph interactions, particularly for complex domain-specific tasks. Experimental results on a dataset of 3,500 test cases show AGENTiGraph significantly outperforms state-of-the-art zero-shot baselines, achieving 95.12% accuracy in task classification and 90.45% success rate in task execution. User studies corroborate its effectiveness in real-world scenarios. To showcase versatility, we extended AGENTiGraph to legislation and healthcare domains, constructing specialized KGs capable of answering complex queries in legal and medical contexts., Comment: 30 pages, 7 figures; Submitted to COLING 2025 System Demonstrations Track
- Published
- 2024
15. Towards Defining an Efficient and Expandable File Format for AI-Generated Contents
- Author
Gao, Yixin, Feng, Runsen, Li, Xin, Li, Weiping, and Chen, Zhibo
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Recently, AI-generated content (AIGC) has gained significant traction due to its powerful creation capability. However, the storage and transmission of large amounts of high-quality AIGC images inevitably pose new challenges for recent file formats. To overcome this, we define a new file format for AIGC images, named AIGIF, enabling ultra-low bitrate coding of AIGC images. Unlike compressing AIGC images intuitively with pixel-wise space as existing file formats, AIGIF instead compresses the generation syntax. This raises a crucial question: Which generation syntax elements, e.g., text prompt, device configuration, etc, are necessary for compression/transmission? To answer this question, we systematically investigate the effects of three essential factors: platform, generative model, and data configuration. We experimentally find that a well-designed composable bitstream structure incorporating the above three factors can achieve an impressive compression ratio of even up to 1/10,000 while still ensuring high fidelity. We also introduce an expandable syntax in AIGIF to support the extension of the most advanced generation models to be developed in the future.
- Published
- 2024
16. Flexible Operation of Electricity-HCNG Networks with Variable Hydrogen Fraction: A Distributionally Robust Joint Chance-Constrained Approach
- Author
Liu, Sicheng, Yang, Bo, Yang, Xu, Li, Xin, Wang, Zhaojian, and Guan, Xinping
- Subjects
Electrical Engineering and Systems Science - Systems and Control
- Abstract
Hydrogen-enriched compressed natural gas (HCNG) is a promising way to utilize surplus renewable energy through hydrogen electrolysis and blending it into natural gas. However, the optimal hydrogen volume fraction (HVF) of HCNG varies following the daily fluctuations of renewable energy. Besides, facing the rapid volatility of renewable energy, ensuring rapid and reliable real-time adjustments is challenging for electricity-HCNG (E-HCNG) coupling networks. To this end, this paper proposes a flexible operation framework for E-HCNG networks against the fluctuations and volatility of renewable energy. Based on operations with variable HVF, the framework develops an E-HCNG system-level affine policy, which allows real-time re-dispatch of operations according to the volatility. Meanwhile, to guarantee the operational reliability of the affine policy, a distributionally robust joint chance constraint (DRJCC) is introduced, which limits the violation probability of operational constraints under the uncertainties of renewable energy volatility. Furthermore, in the solving process, to mitigate the over-conservatism in DRJCC decomposition, an improved risk allocation method is proposed, utilizing the correlations among violations under the affine policy. Moreover, to tackle the non-convexities arising from the variable HVF, customized approximations for HCNG flow formulations are developed. The problem is finally reformulated into a mixed-integer second-order cone programming problem. The effectiveness of the proposed method is validated in both small-scale and large-scale experiments.
- Published
- 2024
17. Improved Condensers for Chor-Goldreich Sources
- Author
Goodman, Jesse, Li, Xin, and Zuckerman, David
- Subjects
Computer Science - Computational Complexity
- Abstract
One of the earliest models of weak randomness is the Chor-Goldreich (CG) source. A $(t,n,k)$-CG source is a sequence of random variables $X=(X_1,\dots,X_t)\sim(\{0,1\}^n)^t$, where each $X_i$ has min-entropy $k$ conditioned on any fixing of $X_1,\dots,X_{i-1}$. Chor and Goldreich proved that there is no deterministic way to extract randomness from such a source. Nevertheless, Doron, Moshkovitz, Oh, and Zuckerman showed that there is a deterministic way to condense a CG source into a string with small entropy gap. They gave applications of such a condenser to simulating randomized algorithms with small error and to certain cryptographic tasks. They studied the case where the block length $n$ and entropy rate $k/n$ are both constant. We study the much more general setting where the block length can be arbitrarily large, and the entropy rate can be arbitrarily small. We construct the first explicit condenser for CG sources in this setting, and it can be instantiated in a number of different ways. When the entropy rate of the CG source is constant, our condenser requires just a constant number of blocks $t$ to produce an output with entropy rate $0.9$, say. In the low entropy regime, using $t=$ poly$(n)$ blocks, our condenser can achieve output entropy rate $0.9$ even if each block has just $1$ bit of min-entropy. Moreover, these condensers have exponentially small error. Finally, we provide strong existential and impossibility results. For our existential result, we show that a random function is a seedless condenser (with surprisingly strong parameters) for any small family of sources. As a corollary, we get new existential results for seeded condensers and condensers for CG sources. For our impossibility result, we show the latter result is nearly tight, by giving a simple proof that the output of any condenser for CG sources must inherit the entropy gap of (one block of) its input., Comment: 66 pages
- Published
- 2024
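The definition of a $(t,n,k)$-CG source quoted in the abstract of entry 17 can be checked mechanically on toy distributions. The sketch below is only an illustration of that definition, using the standard min-entropy $H_\infty(X) = -\log_2 \max_x \Pr[X = x]$; the function names `min_entropy` and `is_cg_source`, and the dictionary representation of the joint distribution, are assumptions made here and do not come from the paper.

```python
import math
from collections import defaultdict


def min_entropy(dist):
    """Min-entropy in bits of a distribution given as {outcome: probability}."""
    return -math.log2(max(dist.values()))


def is_cg_source(joint, k, tol=1e-12):
    """Check the Chor-Goldreich condition on a joint distribution.

    `joint` maps tuples (x_1, ..., x_t) to probabilities. The source qualifies
    if every block has min-entropy at least k conditioned on every prefix that
    occurs with positive probability.
    """
    t = len(next(iter(joint)))
    for i in range(t):
        # For each prefix (x_1, ..., x_i), accumulate the mass of each next block.
        by_prefix = defaultdict(lambda: defaultdict(float))
        for outcome, p in joint.items():
            by_prefix[outcome[:i]][outcome[i]] += p
        for next_block in by_prefix.values():
            mass = sum(next_block.values())
            if mass == 0:
                continue  # this prefix never occurs, so it imposes no constraint
            conditional = {x: p / mass for x, p in next_block.items()}
            if min_entropy(conditional) < k - tol:
                return False
    return True
```

For two one-bit blocks, the uniform distribution passes with k = 1, while a source whose second block copies the first fails for any k > 0, because the second block is fully determined by the prefix.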
18. Recent Progress on Multiferroic Hexagonal Rare-Earth Ferrites (h-RFeO3, R = Y,Dy-Lu)
- Author
Li, Xin, Yun, Yu, and Xu, Xiaoshan
- Subjects
Condensed Matter - Materials Science
- Abstract
Multiferroic hexagonal rare-earth ferrites (h-RFeO3, R=Sc, Y, and rare earth), in which improper ferroelectricity and canted antiferromagnetism coexist, have been advocated as promising candidates in the pursuit of room-temperature multiferroics because of their strong spin-spin interaction. The strong interactions between the ferroic orders and the structural distortions are appealing for high-density, energy-efficient electronic devices. Over the past decade, remarkable advances in atomic-scale synthesis, characterization, and material modeling have enabled significant progress in the understanding and manipulation of ferroic orders and their couplings in h-RFeO3 thin films. These results reveal a physical picture of rich ferroelectric and magnetic phenomena interconnected by a set of structural distortions and spin-lattice couplings, which provides guidance for the control of ferroic orders down to the nanoscale and the discovery of novel physical phenomena. This review focuses on state-of-the-art studies of complex phenomena related to ferroelectricity and magnetism, as well as the magnetoelectric couplings in multiferroic h-RFeO3, based mostly on recent experimental efforts, aiming to stimulate fresh ideas in this field.
- Published
- 2024
19. Observation of polaronic state assisted sub-bandgap saturable absorption
- Author
Zhou, Li, Wang, Yiduo, Kang, Jianlong, Li, Xin, Long, Quan, Zhong, Xianming, Chen, Zhihui, Tong, Chuanjia, Chen, Keqiang, Deng, Zi-Lan, Zhang, Zhengwei, Shu, Chuan-Cun, Yuan, Yongbo, Ni, Xiang, Xiao, Si, Li, Xiangping, Wang, Yingwei, and He, Jun
- Subjects
Physics - Optics, Physics - Applied Physics
- Abstract
Polaronic effects involving stabilization of localized charge character by structural deformations and polarizations have attracted considerable investigations in soft lattice lead halide perovskites. However, the concept of polaron assisted nonlinear photonics remains largely unexplored, which has a wide range of applications from optoelectronics to telecommunications and quantum technologies. Here, we report the first observation of the polaronic state assisted saturable absorption through subbandgap excitation with a redshift exceeding 60 meV. By combining photoluminescence, transient absorption measurements and density functional theory calculations, we explicate that the anomalous nonlinear saturable absorption is caused by the transient picosecond timescale polaronic state formed by strong carrier exciton phonon coupling effect. The bandgap fluctuation can be further tuned through exciton phonon coupling of perovskites with different Young's modulus. This suggests that we can design targeted soft lattice lead halide perovskite with a specific structure to effectively manipulate exciton phonon coupling and exciton polaron formation. These findings profoundly expand our understanding of exciton polaronic nonlinear optics physics and provide an ideal platform for developing actively tunable nonlinear photonics applications.
- Published
- 2024
20. FairFML: Fair Federated Machine Learning with a Case Study on Reducing Gender Disparities in Cardiac Arrest Outcome Prediction
- Author
Li, Siqi, Wu, Qiming, Li, Xin, Miao, Di, Hong, Chuan, Gu, Wenjun, Shang, Yuqing, Okada, Yohei, Chen, Michael Hao, Yan, Mengying, Ning, Yilin, Ong, Marcus Eng Hock, and Liu, Nan
- Subjects
Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Objective: Mitigating algorithmic disparities is a critical challenge in healthcare research, where ensuring equity and fairness is paramount. While large-scale healthcare data exist across multiple institutions, cross-institutional collaborations often face privacy constraints, highlighting the need for privacy-preserving solutions that also promote fairness. Materials and Methods: In this study, we present Fair Federated Machine Learning (FairFML), a model-agnostic solution designed to reduce algorithmic bias in cross-institutional healthcare collaborations while preserving patient privacy. As a proof of concept, we validated FairFML using a real-world clinical case study focused on reducing gender disparities in cardiac arrest outcome prediction. Results: We demonstrate that the proposed FairFML framework enhances fairness in federated learning (FL) models without compromising predictive performance. Our findings show that FairFML improves model fairness by up to 65% compared to the centralized model, while maintaining performance comparable to both local and centralized models, as measured by receiver operating characteristic analysis. Discussion and Conclusion: FairFML offers a promising and flexible solution for FL collaborations, with its adaptability allowing seamless integration with various FL frameworks and models, from traditional statistical methods to deep learning techniques. This makes FairFML a robust approach for developing fairer FL models across diverse clinical and biomedical applications.
- Published
- 2024
21. A Comprehensive Study on GDPR-Oriented Analysis of Privacy Policies: Taxonomy, Corpus and GDPR Concept Classifiers
- Author
-
Tang, Peng, Li, Xin, Chen, Yuxin, Qiu, Weidong, Mei, Haochen, Holmes, Allison, Li, Fenghua, and Li, Shujun
- Subjects
Computer Science - Cryptography and Security - Abstract
Machine learning based classifiers that take a privacy policy as the input and predict relevant concepts are useful in different applications such as (semi-)automated compliance analysis against requirements of the EU GDPR. In all past studies, such classifiers produce a concept label per segment (e.g., sentence or paragraph) and their performances were evaluated by using a dataset of labeled segments without considering the privacy policy they belong to. However, such an approach could overestimate the performance in real-world settings, where all segments in a new privacy policy are supposed to be unseen. We also observed other research gaps, including the lack of a more complete GDPR taxonomy and limited consideration of hierarchical information in privacy policies. To fill such research gaps, we developed a more complete GDPR taxonomy, created the first corpus of labeled privacy policies with hierarchical information, and conducted the most comprehensive performance evaluation of GDPR concept classifiers for privacy policies. Our work leads to multiple novel findings, including the confirmed inappropriateness of splitting training and test sets at the segment level, the benefits of considering hierarchical information, the limitations of the "one size fits all" approach, and the significance of testing cross-corpus generalizability.
- Published
- 2024
22. LHAASO detection of very-high-energy gamma-ray emission surrounding PSR J0248+6021
- Author
-
Cao, Zhen, Aharonian, F., An, Q., Axikegu, Bai, Y. X., Bao, Y. W., Bastieri, D., Bi, X. J., Bi, Y. J., Cai, J. T., Cao, Q., Cao, W. Y., Cao, Zhe, Chang, J., Chang, J. F., Chen, A. M., Chen, E. S., Chen, Liang, Chen, Lin, Chen, Long, Chen, M. J., Chen, M. L., Chen, Q. H., Chen, S. H., Chen, S. Z., Chen, T. L., Chen, Y., Cheng, N., Cheng, Y. D., Cui, M. Y., Cui, S. W., Cui, X. H., Cui, Y. D., Dai, B. Z., Dai, H. L., Dai, Z. G., Danzengluobu, Dong, X. Q., Duan, K. K., Fan, J. H., Fan, Y. Z., Fang, J., Fang, K., Feng, C. F., Feng, L., Feng, S. H., Feng, X. T., Feng, Y. L., Gabici, S., Gao, B., Gao, C. D., Gao, L. Q., Gao, Q., Gao, W., Gao, W. K., Ge, M. M., Geng, L. S., Giacinti, G., Gong, G. H., Gou, Q. B., Gu, M. H., Guo, F. L., Guo, X. L., Guo, Y. Q., Guo, Y. Y., Han, Y. A., He, H. H., He, H. N., He, J. Y., He, X. B., He, Y., Hor, Y. K., Hou, B. W., Hou, C., Hou, X., Hu, H. B., Hu, Q., Hu, S. C., Huang, D. H., Huang, T. Q., Huang, W. J., Huang, X. T., Huang, X. Y., Huang, Y., Huang, Z. C., Ji, X. L., Jia, H. Y., Jia, K., Jiang, K., Jiang, X. W., Jiang, Z. J., Jin, M., Kang, M. M., Ke, T., Kuleshov, D., Kurinov, K., Li, B. B., Li, Cheng, Li, Cong, Li, D., Li, F., Li, H. B., Li, H. C., Li, H. Y., Li, J., Li, Jian, Li, Jie, Li, K., Li, W. L., Li, X. R., Li, Xin, Li, Y. Z., Li, Zhe, Li, Zhuo, Liang, E. W., Liang, Y. F., Lin, J., Liu, B., Liu, C., Liu, D., Liu, H., Liu, H. D., Liu, J., Liu, J. L., Liu, J. Y., Liu, M. Y., Liu, R. Y., Liu, S. M., Liu, W., Liu, Y., Liu, Y. N., Lu, R., Luo, Q., Lv, H. K., Ma, B. Q., Ma, L. L., Ma, X. H., Mao, J. R., Min, Z., Mitthumsiri, W., Mu, H. J., Nan, Y. C., Neronov, A., Ou, Z. W., Pang, B. Y., Pattarakijwanich, P., Pei, Z. Y., Qi, M. Y., Qi, Y. Q., Qiao, B. Q., Qin, J. J., Ruffolo, D., Sáiz, A., Semikoz, D., Shao, C. Y., Shao, L., Shchegolev, O., Sheng, X. D., Shu, F. W., Song, H. C., Stenkin, Yu. V., Stepanov, V., Su, Y., Sun, Q. N., Sun, X. N., Sun, Z. B., Tam, P. H. T., Tang, Q. W., Tang, Z. B., Tian, W. W., Wang, C., Wang, C. B., Wang, G. W., Wang, H. G., Wang, H. H., Wang, J. C., Wang, K., Wang, L. P., Wang, L. Y., Wang, P. H., Wang, R., Wang, W., Wang, X. G., Wang, X. Y., Wang, Y., Wang, Y. D., Wang, Y. J., Wang, Z. H., Wang, Z. X., Wang, Zhen, Wang, Zheng, Wei, D. M., Wei, J. J., Wei, Y. J., Wen, T., Wu, C. Y., Wu, H. R., Wu, S., Wu, X. F., Wu, Y. S., Xi, S. Q., Xia, J., Xia, J. J., Xiang, G. M., Xiao, D. X., Xiao, G., Xin, G. G., Xin, Y. L., Xing, Y., Xiong, Z., Xu, D. L., Xu, R. F., Xu, R. X., Xu, W. L., Xue, L., Yan, D. H., Yan, J. Z., Yan, T., Yang, C. W., Yang, F., Yang, F. F., Yang, H. W., Yang, J. Y., Yang, L. L., Yang, M. J., Yang, R. Z., Yang, S. B., Yao, Y. H., Yao, Z. G., Ye, Y. M., Yin, L. Q., Yin, N., You, X. H., You, Z. Y., Yu, Y. H., Yuan, Q., Yue, H., Zeng, H. D., Zeng, T. X., Zeng, W., Zha, M., Zhang, B. B., Zhang, F., Zhang, H. M., Zhang, H. Y., Zhang, J. L., Zhang, L. X., Zhang, Li, Zhang, P. F., Zhang, P. P., Zhang, R., Zhang, S. B., Zhang, S. R., Zhang, S. S., Zhang, X., Zhang, X. P., Zhang, Y. F., Zhang, Yi, Zhang, Yong, Zhao, B., Zhao, J., Zhao, L., Zhao, L. Z., Zhao, S. P., Zheng, F., Zheng, J. H., Zhou, B., Zhou, H., Zhou, J. N., Zhou, M., Zhou, P., Zhou, R., Zhou, X. X., Zhu, C. G., Zhu, F. R., Zhu, H., Zhu, K. J., Zou, Y. C., and Zuo, X.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
We report the detection of an extended very-high-energy (VHE) gamma-ray source coincident with the location of the middle-aged (62.4 kyr) pulsar PSR J0248+6021, using 796 live days of LHAASO-WCDA data and 1216 live days of LHAASO-KM2A data. A significant excess of gamma-ray-induced showers is observed both by WCDA in the energy band of 1-25 TeV and by KM2A in the energy band of $>$25 TeV, with significances of 7.3$\sigma$ and 13.5$\sigma$, respectively. The best-fit position derived from the WCDA data is R.A. = 42.06$^\circ \pm$ 0.12$^\circ$ and Dec. = 60.24$^\circ \pm$ 0.13$^\circ$ with an extension of 0.69$^\circ \pm$ 0.15$^\circ$, and that from the KM2A data is R.A. = 42.29$^\circ \pm$ 0.13$^\circ$ and Dec. = 60.38$^\circ \pm$ 0.07$^\circ$ with an extension of 0.37$^\circ \pm$ 0.07$^\circ$. No clear extended multiwavelength counterpart of this LHAASO source has been found from the radio band to the GeV band. The most plausible explanation of the VHE gamma-ray emission is the inverse Compton process of highly relativistic electrons and positrons injected by the pulsar. These electrons/positrons are hypothesized to be either confined within the pulsar wind nebula or to have already escaped into the interstellar medium, forming a pulsar halo., Comment: 12 pages, 10 figures, Accepted by Sci. China-Phys. Mech. Astron
- Published
- 2024
23. Model Comparisons: XNet Outperforms KAN
- Author
-
Li, Xin, Xia, Zhihong Jeff, and Zheng, Xiaotao
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
In the fields of computational mathematics and artificial intelligence, the need for precise data modeling is crucial, especially for predictive machine learning tasks. This paper further explores XNet, a novel algorithm that employs the complex-valued Cauchy integral formula, offering a superior network architecture that surpasses traditional Multi-Layer Perceptrons (MLPs) and Kolmogorov-Arnold Networks (KANs). XNet significantly improves speed and accuracy across various tasks in both low and high-dimensional spaces, redefining the scope of data-driven model development and providing substantial improvements over established time series models like LSTMs.
- Published
- 2024
24. AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation
- Author
-
Luo, Ziyang, Li, Xin, Lin, Hongzhan, Ma, Jing, and Bing, Lidong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Software Engineering - Abstract
The impressive performance of proprietary LLMs like GPT4 in code generation has led to a trend to replicate these capabilities in open-source models through knowledge distillation (e.g. Code Evol-Instruct). However, these efforts often neglect the crucial aspect of response quality, relying heavily on teacher models for direct response distillation. This paradigm, especially for complex instructions, can degrade the quality of synthesized data, compromising the knowledge distillation process. To this end, our study introduces the Adaptive Modular Response Evolution (AMR-Evol) framework, which employs a two-stage process to refine response distillation. The first stage, modular decomposition, breaks down the direct response into more manageable sub-modules. The second stage, adaptive response evolution, automatically evolves the response with the related function modules. Our experiments with three popular code benchmarks (HumanEval, MBPP, and EvalPlus) attest to the superiority of the AMR-Evol framework over baseline response distillation methods. By comparing with the open-source Code LLMs trained on a similar scale of data, we observed performance enhancements: more than +3.0 points on HumanEval-Plus and +1.0 points on MBPP-Plus, which underscores the effectiveness of our framework. Our codes are available at https://github.com/ChiYeungLaw/AMR-Evol., Comment: EMNLP 2024
- Published
- 2024
25. Robust Gaussian Splatting SLAM by Leveraging Loop Closure
- Author
-
Zhu, Zunjie, Fang, Youxu, Li, Xin, Yan, Chengang, Xu, Feng, Yuen, Chau, and Li, Yanyan
- Subjects
Computer Science - Robotics - Abstract
3D Gaussian Splatting algorithms excel in novel view rendering applications and have been adapted to extend the capabilities of traditional SLAM systems. However, current Gaussian Splatting SLAM methods, designed mainly for hand-held RGB or RGB-D sensors, struggle with tracking drifts when used with rotating RGB-D camera setups. In this paper, we propose a robust Gaussian Splatting SLAM architecture that utilizes inputs from rotating multiple RGB-D cameras to achieve accurate localization and photorealistic rendering performance. The carefully designed Gaussian Splatting Loop Closure module effectively addresses the issue of accumulated tracking and mapping errors found in conventional Gaussian Splatting SLAM systems. First, each Gaussian is associated with an anchor frame and categorized as historical or novel based on its timestamp. By rendering different types of Gaussians at the same viewpoint, the proposed loop detection strategy considers both co-visibility relationships and distinct rendering outcomes. Furthermore, a loop closure optimization approach is proposed to remove camera pose drift and maintain the high quality of 3D Gaussian models. The approach uses a lightweight pose graph optimization algorithm to correct pose drift and updates Gaussians based on the optimized poses. Additionally, a bundle adjustment scheme further refines camera poses using photometric and geometric constraints, ultimately enhancing the global consistency of scenarios. Quantitative and qualitative evaluations on both synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art methods in camera pose estimation and novel view rendering tasks. The code will be open-sourced for the community.
- Published
- 2024
26. On a sum of the error term of the Dirichlet divisor function over primes
- Author
-
Guo, Zhen and Li, Xin
- Subjects
Mathematics - Number Theory - Abstract
Let $d(n)$ be the Dirichlet divisor function and $\Delta(x)$ denote the error term of the sum $\sum_{n\leqslant x}d(n)$ for a large real variable $x$. In this paper we focus on the sum $\sum_{p\leqslant x}\Delta^2(p)$, where $p$ runs over primes. We prove that there exists an asymptotic formula., Comment: 13 pages
- Published
- 2024
27. Resonant amplitude distribution of the Hilda asteroids and the free-floating planet flyby scenario
- Author
-
Li, Jian, Xia, Zhihong Jeff, Lei, Hanlun, Georgakarakos, Nikolaos, Yoshida, Fumi, and Li, Xin
- Subjects
Astrophysics - Earth and Planetary Astrophysics - Abstract
In some recent work, we provided a quantitative explanation for the number asymmetry of Jupiter Trojans by hypothesizing a free-floating planet (FFP) flyby into the Solar System. In support of that explanation, this paper examines the influence of the same FFP flyby on the Hilda asteroids, which orbit stably in the 3:2 mean motion resonance with Jupiter. The observed Hilda population exhibits two distinct resonant patterns: (1) a lack of Hildas with resonant amplitudes < 40 deg. at eccentricities < 0.1; (2) a nearly complete absence of Hildas with amplitudes < 20 deg., regardless of eccentricity. Previous models of Jupiter migration and resonance capture could account for the eccentricity distribution of Hildas but have failed to replicate the unusual absence of those with the smallest resonant amplitudes, which theoretically should be the most stable. Here we report that the FFP flyby can trigger an extremely rapid outward migration of Jupiter, causing a sudden shift in the 3:2 Jovian resonance. Consequently, Hildas with varying eccentricities would have their resonant amplitudes changed by different degrees, leading to the observed resonant patterns. We additionally show that, in our FFP flyby scenario, these patterns are consistently present across different resonant amplitude distributions of primordial Hildas arising from various formation models. We also place constraints on the potential parameters of the FFP, suggesting it should have an eccentricity of 1-1.3 or larger, an inclination up to 30 deg. or higher, and a minimum mass of about 50 Earth masses., Comment: 22 pages, 10 figures, accepted for publication in Icarus
- Published
- 2024
28. On high power moments of the error term of the Dirichlet divisor function over primes
- Author
-
Guo, Zhen and Li, Xin
- Subjects
Mathematics - Number Theory - Abstract
Let $3\leqslant k\leqslant9$ be a fixed integer, $p$ be a prime and $d(n)$ denote the Dirichlet divisor function. We use $\Delta(x)$ to denote the error term in the asymptotic formula of the summatory function of $d(n)$. The aim of this paper is to study the $k$-th power moments of $\Delta(p)$, namely $\sum_{p\leqslant x}\Delta^k(p)$, and we give an asymptotic formula., Comment: 15 pages. arXiv admin note: substantial text overlap with arXiv:2410.00329
- Published
- 2024
29. UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
- Author
-
Yu, Qiaojun, Huang, Siyuan, Yuan, Xibin, Jiang, Zhengkai, Hao, Ce, Li, Xin, Chang, Haonan, Wang, Junbo, Liu, Liu, Li, Hongsheng, Gao, Peng, and Lu, Cewu
- Subjects
Computer Science - Robotics - Abstract
Previous studies on robotic manipulation are based on a limited understanding of the underlying 3D motion constraints and affordances. To address these challenges, we propose a comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation. Specifically, we constructed a dataset labeled with manipulation-related key attributes, comprising 900 articulated objects from 19 categories and 600 tools from 12 categories. Furthermore, we leverage MLLMs to infer object-centric representations for manipulation tasks, including affordance recognition and reasoning about 3D motion constraints. Comprehensive experiments in both simulation and real-world settings indicate that UniAff significantly improves the generalization of robotic manipulation for tools and articulated objects. We hope that UniAff will serve as a general baseline for unified robotic manipulation tasks in the future. Images, videos, dataset, and code are published on the project website at: https://sites.google.com/view/uni-aff/home
- Published
- 2024
30. Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models
- Author
-
Li, Xin, Chen, Weize, Chu, Qizhi, Li, Haopeng, Sun, Zhaojun, Li, Ran, Qian, Chen, Wei, Yiwei, Liu, Zhiyuan, Shi, Chuan, Sun, Maosong, and Yang, Cheng
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling the ability of large language models (LLMs) to process graphs is an important step toward more advanced general intelligence. However, current LLM benchmarks on graph analysis require models to directly reason over the prompts describing graph topology, and are thus limited to small graphs with only a few dozen nodes. In contrast, human experts typically write programs based on popular libraries for task solving, and can thus handle graphs with different scales. To this end, a question naturally arises: can LLMs analyze graphs like professionals? In this paper, we introduce ProGraph, a manually crafted benchmark containing 3 categories of graph tasks. The benchmark expects solutions based on programming instead of directly reasoning over raw inputs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. To bridge this gap, we propose LLM4Graph datasets, which include crawled documents and auto-generated codes based on 6 widely used graph libraries. By augmenting closed-source LLMs with document retrieval and fine-tuning open-source ones on the codes, we show 11-32% absolute improvements in their accuracies. Our results underscore that the capabilities of LLMs in handling structured data are still under-explored, and show the effectiveness of LLM4Graph in enhancing LLMs' proficiency in graph analysis. The benchmark, datasets and enhanced open-source models are available at https://github.com/BUPT-GAMMA/ProGraph., Comment: NeurIPS 2024
- Published
- 2024
31. Cauchy activation function and XNet
- Author
-
Li, Xin, Xia, Zhihong, and Zhang, Hongkun
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Neural and Evolutionary Computing - Abstract
We have developed a novel activation function, named the Cauchy Activation Function. This function is derived from the Cauchy Integral Theorem in complex analysis and is specifically tailored for problems requiring high precision. This innovation has led to the creation of a new class of neural networks, which we call (Comple)XNet, or simply XNet. We will demonstrate that XNet is particularly effective for high-dimensional challenges such as image classification and solving Partial Differential Equations (PDEs). Our evaluations show that XNet significantly outperforms established baselines on benchmarks like MNIST and CIFAR-10 in computer vision, and offers substantial advantages over Physics-Informed Neural Networks (PINNs) in both low-dimensional and high-dimensional PDE scenarios.
- Published
- 2024
32. A History-Guided Regional Partitioning Evolutionary Optimization for Solving the Flexible Job Shop Problem with Limited Multi-load Automated Guided Vehicles
- Author
-
Liu, Feige, Lu, Chao, and Li, Xin
- Subjects
Electrical Engineering and Systems Science - Systems and Control ,Computer Science - Neural and Evolutionary Computing - Abstract
In a flexible job shop environment, using Automated Guided Vehicles (AGVs) to transport jobs and process materials is an important way to promote the intelligence of the workshop. Compared with single-load AGVs, multi-load AGVs can improve AGV utilization, reduce path conflicts, etc. Therefore, this study proposes a history-guided regional partitioning algorithm (HRPEO) for the flexible job shop scheduling problem with limited multi-load AGVs (FJSPMA). First, the encoding and decoding rules are designed according to the characteristics of multi-load AGVs, and then the initialization rule based on the branch and bound method is used to generate the initial population. Second, to prevent the algorithm from falling into a local optimum, the algorithm adopts a regional partitioning strategy. This strategy divides the solution space into multiple regions and measures the potential of the regions. After that, it groups the regions into multiple clusters in each iteration and selects individuals for evolutionary search based on the set of clusters. Third, a local search strategy is designed to improve the exploitation ability of the algorithm, which uses a greedy approach to optimize machine selection and transportation sequence according to the characteristics of FJSPMA. Finally, a large number of experiments are carried out on the benchmarks to test the performance of the algorithm. Compared with multiple advanced algorithms, the results show that HRPEO has an advantage in solving FJSPMA., Comment: 14 pages
- Published
- 2024
33. Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm for Hybrid Flow Shop Scheduling Problems with Multiple Parallel Batch Processing Stages
- Author
-
Liu, Feige, Li, Xin, Lu, Chao, and Gong, Wenying
- Subjects
Computer Science - Neural and Evolutionary Computing ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Parallel batch processing machines have extensive applications in the semiconductor manufacturing process. However, the problem models in previous studies regard parallel batch processing as a fixed processing stage in the machining process. This study generalizes the problem model, in which users can arbitrarily set certain stages as parallel batch processing stages according to their needs. A Hybrid Flow Shop Scheduling Problem with Parallel Batch Processing Machines (PBHFSP) is solved in this paper. Furthermore, an Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm (AMOEA/D) is designed to simultaneously optimize both makespan and Total Energy Consumption (TEC). Firstly, a hybrid initialization strategy with heuristic rules based on knowledge of PBHFSP is proposed to generate promising solutions. Secondly, the disjunctive graph model has been established based on the knowledge to find the critical-path of PBHFS. Then, a critical-path based neighborhood search is proposed to enhance the exploitation ability of AMOEA/D. Moreover, the search time is adaptively adjusted based on learning experience from Q-learning and Decay Law. Afterward, to enhance the exploration capability of the algorithm, AMOEA/D designs an improved population updating strategy with a weight vector updating strategy. These strategies rematch individuals with weight vectors, thereby maintaining the diversity of the population. Finally, the proposed algorithm is compared with state-of-the-art algorithms. The experimental results show that the AMOEA/D is superior to the comparison algorithms in solving the PBHFSP., Comment: 12 pages
- Published
- 2024
34. SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation
- Author
-
Li, Xin, Huang, Siyuan, Yu, Qiaojun, Jiang, Zhengkai, Hao, Ce, Zhu, Yimeng, Li, Hongsheng, Gao, Peng, and Lu, Cewu
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Automating garment manipulation poses a significant challenge for assistive robotics due to the diverse and deformable nature of garments. Traditional approaches typically require separate models for each garment type, which limits scalability and adaptability. In contrast, this paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories. By interpreting both visual and semantic information, our model enables robots to manage different garment states with a single model. We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data. Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates, providing a more flexible and general solution for robotic garment manipulation. This research also underscores the potential of VLMs to unify various garment manipulation tasks within a single framework, paving the way for broader applications in home automation and assistive robotics in the future.
- Published
- 2024
35. Improper flexoelectricity in hexagonal rare-earth ferrites
- Author
-
Li, Xin, Ren, Guodong, Yun, Yu, Thind, Arashdeep Singh, Shah, Amit Kumar, Bowers, Abbey, Mishra, Rohan, and Xu, Xiaoshan
- Subjects
Condensed Matter - Materials Science - Abstract
Flexoelectricity is a universal effect that generates electric polarization due to broken inversion symmetry caused by local strain gradient. The large strain gradient at nanoscale makes flexoelectric effects, especially in nanoscopic ferroelectric materials, promising in sensor, actuator, energy harvesting, and memory applications. In this work, we studied flexoelectricity in the hexagonal ferrite h-YbFeO3, an improper ferroelectric expected to have weak piezoelectricity and low sensitivity to depolarization field, which are advantageous for studying flexoelectric effects. We show that in h-YbFeO3 epitaxial thin films, strain gradients on the order of 10^6 m^-1 occur near grain boundaries, which has a significant impact on the non-polar K3 structural distortion that induces spontaneous polarization. The phenomenological model based on the Landau theory of improper ferroelectricity suggests an indirect flexoelectric effect on the order of 10 nC/m in h-YbFeO3, which is substantially larger than the expectation from the Kogan mechanism. These results reveal a novel microscopic mechanism of coupling between strain gradient and polarization mediated by structural distortion, which we call improper flexoelectricity.
- Published
- 2024
36. Prior Knowledge Distillation Network for Face Super-Resolution
- Author
-
Yang, Qiu, Sun, Xiao, Li, Xin-yu, Cui, Feng-Qi, Guo, Yu-Tong, Hu, Shuang-Zhen, Luo, Ping, and Li, Si-Ying
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The purpose of face super-resolution (FSR) is to reconstruct high-resolution (HR) face images from low-resolution (LR) inputs. With the continuous advancement of deep learning technologies, contemporary prior-guided FSR methods initially estimate facial priors and then use this information to assist in the super-resolution reconstruction process. However, ensuring the accuracy of prior estimation remains challenging, and straightforward cascading and convolutional operations often fail to fully leverage prior knowledge. Inaccurate or insufficiently utilized prior information inevitably degrades FSR performance. To address this issue, we propose a prior knowledge distillation network (PKDN) for FSR, which involves transferring prior information from the teacher network to the student network. This approach enables the network to learn priors during the training stage while relying solely on low-resolution facial images during the testing stage, thus mitigating the adverse effects of prior estimation inaccuracies. Additionally, we incorporate robust attention mechanisms to design a parsing map fusion block that effectively utilizes prior information. To prevent feature loss, we retain multi-scale features during the feature extraction stage and employ them in the subsequent super-resolution reconstruction process. Experimental results on benchmark datasets demonstrate that our PKDN approach surpasses existing FSR methods in generating high-quality face images.
- Published
- 2024
37. URSimulator: Human-Perception-Driven Prompt Tuning for Enhanced Virtual Urban Renewal via Diffusion Models
- Author
-
Hu, Chuanbo, Jia, Shan, and Li, Xin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Tackling Urban Physical Disorder (e.g., abandoned buildings, litter, messy vegetation, graffiti) is essential, as it negatively impacts the safety, well-being, and psychological state of communities. Urban Renewal is the process of revitalizing these neglected and decayed areas within a city to improve the physical environment and quality of life for residents. Effective urban renewal efforts can transform these environments, enhancing their appeal and livability. However, current research lacks simulation tools that can quantitatively assess and visualize the impacts of renewal efforts, often relying on subjective judgments. Such tools are crucial for planning and implementing effective strategies by providing a clear visualization of potential changes and their impacts. This paper presents a novel framework addressing this gap by using human perception feedback to simulate street environment enhancement. We develop a prompt tuning approach that integrates text-driven Stable Diffusion with human perception feedback, iteratively editing local areas of street view images to better align with perceptions of beauty, liveliness, and safety. Our experiments show that this framework significantly improves perceptions of urban environments, with increases of 17.60% in safety, 31.15% in beauty, and 28.82% in liveliness. In contrast, advanced methods like DiffEdit achieve only 2.31%, 11.87%, and 15.84% improvements, respectively. We applied this framework across various virtual scenarios, including neighborhood improvement, building redevelopment, green space expansion, and community garden creation. The results demonstrate its effectiveness in simulating urban renewal, offering valuable insights for urban planning and policy-making.
- Published
- 2024
38. UU-Mamba: Uncertainty-aware U-Mamba for Cardiovascular Segmentation
- Author
-
Tsai, Ting Yu, Lin, Li, Hu, Shu, Tsao, Connie W., Li, Xin, Chang, Ming-Ching, Zhu, Hongtu, and Wang, Xin
- Subjects
Computer Science - Artificial Intelligence - Abstract
Building on the success of deep learning models in cardiovascular structure segmentation, increasing attention has been focused on improving generalization and robustness, particularly in small, annotated datasets. Despite recent advancements, current approaches often face challenges such as overfitting and accuracy limitations, largely due to their reliance on large datasets and narrow optimization techniques. This paper introduces the UU-Mamba model, an extension of the U-Mamba architecture, designed to address these challenges in both cardiac and vascular segmentation. By incorporating Sharpness-Aware Minimization (SAM), the model enhances generalization by targeting flatter minima in the loss landscape. Additionally, we propose an uncertainty-aware loss function that combines region-based, distribution-based, and pixel-based components to improve segmentation accuracy by capturing both local and global features. While the UU-Mamba model has already demonstrated great performance, further testing is required to fully assess its generalization and robustness. We expand our evaluation by conducting new trials on the ImageCAS (coronary artery) and Aorta (aortic branches and zones) datasets, which present more complex segmentation challenges than the ACDC dataset (left and right ventricles) used in our previous work, showcasing the model's adaptability and resilience. We confirm UU-Mamba's superior performance over leading models such as TransUNet, Swin-Unet, nnUNet, and nnFormer. Moreover, we provide a more comprehensive evaluation of the model's robustness and segmentation accuracy, as demonstrated by extensive experiments.
- Published
- 2024
39. ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models
- Author
-
Huang, Yuqing, Zhang, Rongyang, He, Xuesong, Zhi, Xuyang, Wang, Hao, Li, Xin, Xu, Feiyang, Liu, Deguang, Liang, Huadong, Li, Yi, Cui, Jian, Liu, Zimu, Wang, Shijin, Hu, Guoping, Liu, Guiquan, Liu, Qi, Lian, Defu, and Chen, Enhong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,Physics - Chemical Physics ,Quantitative Biology - Biomolecules - Abstract
There is a growing interest in the role that LLMs play in chemistry, which has led to an increased focus on the development of LLM benchmarks tailored to chemical domains to assess the performance of LLMs across a spectrum of chemical tasks varying in type and complexity. However, existing benchmarks in this domain fail to adequately meet the specific requirements of chemical research professionals. To this end, we propose ChemEval, which provides a comprehensive assessment of the capabilities of LLMs across a wide range of chemical domain tasks. Specifically, ChemEval identified 4 crucial progressive levels in chemistry, assessing 12 dimensions of LLMs across 42 distinct chemical tasks which are informed by open-source data and the data meticulously crafted by chemical experts, ensuring that the tasks have practical value and can effectively evaluate the capabilities of LLMs. In the experiment, we evaluate 12 mainstream LLMs on ChemEval under zero-shot and few-shot learning contexts, which included carefully selected demonstration examples and carefully designed prompts. The results show that while general LLMs like GPT-4 and Claude-3.5 excel in literature understanding and instruction following, they fall short in tasks demanding advanced chemical knowledge. Conversely, specialized LLMs exhibit enhanced chemical competencies, albeit with reduced literary comprehension. This suggests that LLMs have significant potential for enhancement when tackling sophisticated tasks in the field of chemistry. We believe our work will facilitate the exploration of their potential to drive progress in chemistry. Our benchmark and analysis will be available at https://github.com/USTC-StarTeam/ChemEval.
- Published
- 2024
40. LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling
- Author
-
Li, Xin and Sarwate, Anand
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Learning compact and meaningful latent space representations has been shown to be very useful in generative modeling tasks for visual data. One particular example is applying Vector Quantization (VQ) in variational autoencoders (VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance in many modern generative modeling applications. Quantizing the latent space has been justified by the assumption that the data themselves are inherently discrete in the latent space (like pixel values). In this paper, we propose an alternative representation of the latent space by relaxing the structural assumption of the VQ formulation. Specifically, we assume that the latent space can be approximated by a union of subspaces model corresponding to a dictionary-based representation under a sparsity constraint. The dictionary is learned/updated during the training process. We apply this approach to look at two models: Dictionary Learning Variational Autoencoders (DL-VAEs) and DL-VAEs with Generative Adversarial Networks (DL-GANs). We show empirically that our latent space is more expressive and leads to better representations than the VQ approach in terms of reconstruction quality at the expense of a small computational overhead for the latent space computation. Our results thus suggest that the true benefit of the VQ approach might not be from discretization of the latent space, but rather the lossy compression of the latent space. We confirm this hypothesis by showing that our sparse representations also address the codebook collapse issue commonly found in VQ-family models. Comment: Preprint, under review. Submitted to 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- Published
- 2024
41. Room-temperature valley-selective emission in Si-MoSe2 heterostructures enabled by high-quality-factor chiroptical cavities
- Author
-
Pan, Feng, Li, Xin, Johnson, Amalya C., Dhuey, Scott, Saunders, Ashley, Hu, Meng-Xia, Dixon, Jefferson P., Dagli, Sahil, Lau, Sze-Cheung, Weng, Tingting, Chen, Chih-Yi, Zeng, Jun-Hao, Apte, Rajas, Heinz, Tony F., Liu, Fang, Deng, Zi-Lan, and Dionne, Jennifer A.
- Subjects
Physics - Optics, Condensed Matter - Materials Science
- Abstract
Transition metal dichalcogenides (TMDCs) possess valley pseudospin, allowing photon spin to be coupled to electron spin and enabling initialization and readout of both classical and quantum information. Rapid valley-dephasing processes have impeded the development of scalable, high-performance valleytronic devices operating at room temperature. Here we demonstrate that a chiral resonant metasurface can enable room-temperature valley-selective emission, even with linearly polarized excitation. This platform provides circular eigen-polarization states with a high quality factor (Q-factor) and strong chiral near-field enhancement, resulting in unitary emission circular dichroism (i.e. single-handed circularly polarized emission). Our fabricated Si chiral metasurfaces exhibit chiral electromagnetic modes with Q-factors up to 450 at visible wavelengths, spectrally tuned to the exciton energy of MoSe2 monolayers. Using spatially- and spectrally-resolved mapping from temperatures of 100 K to 294 K, we demonstrate degrees of circular polarization (DOP) as high as 0.5 at room temperature. Reciprocal space mapping of the exciton emission reveals the chiral q-BIC localizes valley-selective emission in the vicinity of the photonic gamma-point. Photon-spin and time-resolved photoluminescence measurements show that the high DOP can be attributed to the significantly increased chiroptical local density of states provided by the metasurface, which enhances valley-specific radiative transition rates by a factor of approximately 13, with lifetimes as short as 189 ps. Our work could facilitate the development of compact chiral classical and quantum light sources and the creation of molecular chiral polaritons for quantum enantioselective synthesis.
- Published
- 2024
42. Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition
- Author
-
Zou, Bochao, Guo, Zizheng, Qin, Wenfeng, Li, Xin, Wang, Kangsheng, and Ma, Huimin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Micro-expressions are involuntary facial movements that cannot be consciously controlled, conveying subtle cues with substantial real-world applications. The analysis of micro-expressions generally involves two main tasks: spotting micro-expression intervals in long videos and recognizing the emotions associated with these intervals. Previous deep learning methods have primarily relied on classification networks utilizing sliding windows. However, fixed window sizes and window-level hard classification introduce numerous constraints. Additionally, these methods have not fully exploited the potential of complementary pathways for spotting and recognition. In this paper, we present a novel temporal state transition architecture grounded in the state space model, which replaces conventional window-level classification with video-level regression. Furthermore, by leveraging the inherent connections between spotting and recognition tasks, we propose a synergistic strategy that enhances overall analysis performance. Extensive experiments demonstrate that our method achieves state-of-the-art performance. The codes and pre-trained models are available at https://github.com/zizheng-guo/ME-TST.
- Published
- 2024
43. MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals
- Author
-
Yu, Lei, Fei, Jintao, Liu, Xinyi, Yao, Yang, Zhao, Jun, Wang, Guoxin, and Li, Xin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
- Abstract
Video-based physiology, exemplified by remote photoplethysmography (rPPG), extracts physiological signals such as pulse and respiration by analyzing subtle changes in video recordings. This non-contact, real-time monitoring method holds great potential for home settings. Despite the valuable contributions of public benchmark datasets to this technology, there is currently no dataset specifically designed for passive home monitoring. Existing datasets are often limited to close-up, static, frontal recordings and typically include only 1-2 physiological signals. To advance video-based physiology in real home settings, we introduce the MHAD dataset. It comprises 1,440 videos from 40 subjects, capturing 6 typical activities from 3 angles in a real home environment. Additionally, 5 physiological signals were recorded, making it a comprehensive video-based physiology dataset. MHAD is compatible with the rPPG-toolbox and has been validated using several unsupervised and supervised methods. Our dataset is publicly available at https://github.com/jdh-algo/MHAD-Dataset.
- Published
- 2024
44. Polarization Pinning at Antiphase Boundaries in Multiferroic YbFeO$_3$
- Author
-
Ren, Guodong, Omprakash, Pravan, Li, Xin, Yun, Yu, Thind, Arashdeep S., Xu, Xiaoshan, and Mishra, Rohan
- Subjects
Condensed Matter - Materials Science
- Abstract
The switching characteristics of ferroelectrics and multiferroics are influenced by the interaction of topological defects with domain walls. We report on the pinning of polarization due to antiphase boundaries in thin films of the multiferroic hexagonal YbFeO$_3$. We have directly resolved the atomic structure of a sharp antiphase boundary (APB) in YbFeO$_3$ thin films using a combination of aberration-corrected scanning transmission electron microscopy (STEM) and total energy calculations based on density-functional theory (DFT). We find the presence of a layer of FeO$_6$ octahedra at the APB that bridge the adjacent domains. STEM imaging shows a reversal in the direction of polarization on moving across the APB, which DFT calculations confirm is structural in nature as the polarization reversal reduces the distortion of the FeO$_6$ octahedral layer at the APB. Such APBs in hexagonal perovskites are expected to serve as domain-wall pinning sites and hinder ferroelectric switching of the domains. Comment: 16 pages, 4 figures
- Published
- 2024
45. LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation
- Author
-
Ding, Henghui, Hong, Lingyi, Liu, Chang, Xu, Ning, Yang, Linjie, Fan, Yuchen, Miao, Deshui, Gu, Yameng, Li, Xin, He, Zhenyu, Wang, Yaowei, Yang, Ming-Hsuan, Chai, Jinming, Ma, Qin, Zhang, Junpei, Jiao, Licheng, Liu, Fang, Liu, Xinyu, Zhang, Jing, Zhang, Kexin, Liu, Xu, Li, LingLing, Fang, Hao, Pan, Feiyu, Lu, Xiankai, Zhang, Wei, Cong, Runmin, Tran, Tuyen, Cao, Bin, Zhang, Yisi, Wang, Hanyi, He, Xingjian, and Liu, Jing
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes. In this paper, we introduce the 6th Large-scale Video Object Segmentation (LSVOS) challenge, held in conjunction with an ECCV 2024 workshop. This year's challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). This year, we replace the classic YouTube-VOS and YouTube-RVOS benchmarks with the latest datasets MOSE, LVOS, and MeViS to assess VOS under more challenging complex environments. This year's challenge attracted 129 registered teams from more than 20 institutes across over 8 countries. This report includes the challenge and dataset introduction, and the methods used by the top 7 teams in the two tracks. More details can be found on our homepage https://lsvos.github.io/. Comment: ECCV 2024 LSVOS Challenge Report: https://lsvos.github.io/
- Published
- 2024
46. CP asymmetries of $t \to c \gamma$ and $t \to cg$ decays in the aligned two-Higgs-doublet model
- Author
-
Cai, Fang-Min, Fan, Rui-Lin, Li, Xin-Qiang, and Yang, Ya-Dong
- Subjects
High Energy Physics - Phenomenology
- Abstract
We study the CP asymmetries of the rare top-quark decays $t \to c \gamma$ and $t \to cg$ in the aligned two-Higgs-doublet model (A2HDM), which is generically characterized by new sources of CP violation beyond the Standard Model (SM). Specifically, the branching ratios and CP asymmetries of these rare top-quark decays are explicitly formulated, with an emphasis on the origins of weak and strong phases in the A2HDM. Taking into account the most relevant constraints on this model, we evaluate the variations of these observables with respect to the model parameters. It is found that the branching ratios of $t \to c \gamma$ and $t \to cg$ decays can reach up to $1.47\times10^{-10}$ and $4.86\times10^{-9}$ respectively, which are about four and three orders of magnitude higher than the corresponding SM predictions. While the branching ratios are almost independent of the relative phase $\varphi$ between the two alignment parameters $\varsigma_u$ and $\varsigma_d$ within the allowed parameter space, the CP asymmetries are found to be very sensitive to $\varphi$. When the two alignment parameters are complex with a non-zero $\varphi$ varied within the range $[50^\circ,150^\circ]$, the magnitudes of the CP asymmetries can be significantly enhanced relative to both the SM and the real case. In particular, the maximum absolute values of the CP asymmetries can even reach up to $\mathcal{O}(1)$ for these two decay modes, in the range $\varphi \in [70^\circ,100^\circ]$. These interesting observations could be utilized to discriminate between the SM and the different scenarios of the A2HDM. Comment: 46 pages, 12 figures, and 3 tables
- Published
- 2024
47. Video-based Analysis Reveals Atypical Social Gaze in People with Autism Spectrum Disorder
- Author
-
Yu, Xiangxu, Ruan, Mindi, Hu, Chuanbo, Li, Wenqi, Paul, Lynn K., Li, Xin, and Wang, Shuo
- Subjects
Quantitative Biology - Neurons and Cognition, Computer Science - Machine Learning
- Abstract
In this study, we present a quantitative and comprehensive analysis of social gaze in people with autism spectrum disorder (ASD). Diverging from traditional first-person camera perspectives based on eye-tracking technologies, this study utilizes a third-person perspective database from the Autism Diagnostic Observation Schedule, 2nd Edition (ADOS-2) interview videos, encompassing ASD participants and neurotypical individuals as a reference group. Employing computational models, we extracted and processed gaze-related features from the videos of both participants and examiners. The experimental samples were divided into three groups based on the presence of social gaze abnormalities and ASD diagnosis. This study quantitatively analyzed four gaze features: gaze engagement, gaze variance, gaze density map, and gaze diversion frequency. Furthermore, we developed a classifier trained on these features to identify gaze abnormalities in ASD participants. Together, we demonstrated the effectiveness of analyzing social gaze in people with ASD in naturalistic settings, showcasing the potential of third-person video perspectives in enhancing ASD diagnosis through gaze analysis.
- Published
- 2024
48. Transcriptome alterations in high glucose-induced renal tubular cells of bactrian camel
- Author
-
Yang, Bin, Chen, Lihui, Li, Xin, Guo, Zhuwei, Liu, Shi, and Er, Demtu
- Published
- 2020
49. Analysis on correlation between polymorphism of MyoG gene exon I and body size traits of sheep
- Author
-
Bai, Jun. Yan., Cao, Heng., Yang, You. Bing., Zhang, Yi., Li, Xin. Yue., Li, Zi. Heng., Hao, Wei. Guang., and Zheng, Fei. Yang.
- Published
- 2020
50. Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS
- Author
-
Miao, Deshui, Gu, Yameng, Li, Xin, He, Zhenyu, Wang, Yaowei, and Yang, Ming-Hsuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Video object segmentation (VOS) is a crucial task in computer vision, but current VOS methods struggle with complex scenes and prolonged object motions. To address these challenges, the MOSE dataset aims to enhance object recognition and differentiation in complex environments, while the LVOS dataset focuses on segmenting objects exhibiting long-term, intricate movements. This report introduces a discriminative spatial-temporal VOS model that utilizes discriminative object features as query representations. The semantic understanding of spatial-semantic modules enables it to recognize object parts, while salient features highlight more distinctive object characteristics. Our model, trained on extensive VOS datasets, achieved first place (80.90% $\mathcal{J \& F}$) on the test set of the 6th LSVOS challenge in the VOS Track, demonstrating its effectiveness in tackling the aforementioned challenges. The code will be available at https://github.com/yahooo-m/VOS-Solution. Comment: 1st Place Solution for 6th LSVOS VOS Track. arXiv admin note: substantial text overlap with arXiv:2406.04600
- Published
- 2024