74,812 results for "Li, Xiang-An"
Search Results
2. Chinese Ambassador to Eritrea Li Xiang inspects the construction of the Bereza Power Station, a Shanghai Construction Group project
- Subjects
Power plants -- Investigations ,Electric power-plants -- Investigations ,Ambassadors -- Investigations ,Company legal issue ,Business ,Business, international - Abstract
Key Highlights: * Chinese Ambassador to Eritrea Li Xiang visited the construction site of the Bereza Power Station * The project by the Shanghai Construction and Exporting Group will start construction in June [...]
- Published
- 2024
3. China : Ambassador to Eritrea Li Xiang meets with Eritrean President's Political Advisor Yemeni
- Subjects
Ambassadors ,Business, international - Abstract
On July 30, 2024, Ambassador to Eritrea Li Xiang met with Yemeni, Minister of the Central Political Department of the Eritrean People's Front for Democracy and Justice and Political Advisor [...]
- Published
- 2024
4. China : Ambassador to Eritrea Li Xiang pays a courtesy visit to Eritrean Minister of Tourism Askar
- Subjects
Cabinet officers -- Planning ,Ambassadors -- Planning ,Company business planning ,Business, international - Abstract
On August 1, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrean Minister of Tourism Askaru, exchanged views on the development of Eritrea's tourism resources and introduced the [...]
- Published
- 2024
5. China : Ambassador Li Xiang pays a courtesy visit to Eritrean Minister of Agriculture Arefien
- Subjects
Agricultural industry ,Cabinet officers ,Ambassadors ,Business, international - Abstract
On July 31, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrean Minister of Agriculture Arefein and focused on introducing the spirit of the Third Plenary Session of [...]
- Published
- 2024
6. RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
- Author
-
Zhang, Xin, Yang, Xue, Li, Yuxuan, Yang, Jian, Cheng, Ming-Ming, and Li, Xiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Rotated object detection has made significant progress in optical remote sensing. However, advancements in the Synthetic Aperture Radar (SAR) field lag behind, primarily due to the absence of a large-scale dataset. Annotating such a dataset is inefficient and costly. A promising solution is to employ a weakly supervised model (e.g., trained with available horizontal boxes only) to generate pseudo-rotated boxes for reference before manual calibration. Unfortunately, the existing weakly supervised models exhibit limited accuracy in predicting the object's angle. Previous works attempt to enhance angle prediction by using angle resolvers that decouple angles into cosine and sine encodings. In this work, we first reevaluate these resolvers from a unified perspective of dimension mapping and expose that they share the same shortcomings: these methods overlook the unit cycle constraint inherent in these encodings, easily leading to prediction biases. To address this issue, we propose the Unit Cycle Resolver (UCR), which incorporates a unit circle constraint loss to improve angle prediction accuracy. Our approach can effectively improve the performance of existing state-of-the-art weakly supervised methods and even surpasses fully supervised models on existing optical benchmarks (i.e., the DOTA-v1.0 dataset). With the aid of UCR, we further annotate and introduce RSAR, the largest multi-class rotated SAR object detection dataset to date. Extensive experiments on both RSAR and optical datasets demonstrate that our UCR enhances angle prediction accuracy. Our dataset and code can be found at: https://github.com/zhasion/RSAR.
- Published
- 2025
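The unit circle constraint described in entry 6 can be made concrete with a minimal sketch. This is not the authors' released code (see their repository linked above); the smooth-L1 regression term and the weight `lam` are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def angle_resolver_loss(pred_cos, pred_sin, gt_angle, lam=0.1):
    """Illustrative loss for a (cos, sin) angle resolver with a unit-circle penalty."""
    # Regression toward the ground-truth (cos, sin) encoding of the angle.
    reg = F.smooth_l1_loss(pred_cos, torch.cos(gt_angle)) + \
          F.smooth_l1_loss(pred_sin, torch.sin(gt_angle))
    # Penalize encodings that drift off the unit circle cos^2 + sin^2 = 1,
    # the constraint the abstract says earlier resolvers overlook.
    unit = ((pred_cos ** 2 + pred_sin ** 2 - 1.0) ** 2).mean()
    return reg + lam * unit
```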
7. Spatiotemporal Gaussian Optimization for 4D Cone Beam CT Reconstruction from Sparse Projections
- Author
-
Fu, Yabo, Zhang, Hao, Cai, Weixing, Xie, Huiqiao, Kuo, Licheng, Cervino, Laura, Moran, Jean, Li, Xiang, and Li, Tianfang
- Subjects
Physics - Medical Physics ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
In image-guided radiotherapy (IGRT), four-dimensional cone-beam computed tomography (4D-CBCT) is critical for assessing tumor motion during a patient's breathing cycle prior to beam delivery. However, generating 4D-CBCT images with sufficient quality requires significantly more projection images than a standard 3D-CBCT scan, leading to extended scanning times and increased imaging dose to the patient. To address these limitations, there is a strong demand for methods capable of reconstructing high-quality 4D-CBCT images from a 1-minute 3D-CBCT acquisition. The challenge lies in the sparse sampling of projections, which introduces severe streaking artifacts and compromises image quality. This paper introduces a novel framework leveraging spatiotemporal Gaussian representation for 4D-CBCT reconstruction from sparse projections, achieving a balance between streak artifact reduction, dynamic motion preservation, and fine detail restoration. Each Gaussian is characterized by its 3D position, covariance, rotation, and density. Two-dimensional X-ray projection images can be rendered from the Gaussian point cloud representation via X-ray rasterization. The properties of each Gaussian are optimized by minimizing the discrepancy between the measured projections and the rendered X-ray projections. A Gaussian deformation network is jointly optimized to deform these Gaussian properties to obtain a 4D Gaussian representation for dynamic CBCT scene modeling. The final 4D-CBCT images are reconstructed by voxelizing the 4D Gaussians, achieving a high-quality representation that preserves both motion dynamics and spatial detail. The code and reconstruction results can be found at https://github.com/fuyabo/4DGS_for_4DCBCT/tree/main, Comment: 11 pages, 10 figures
- Published
- 2025
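The per-Gaussian fitting step in entry 7 can be read as a standard differentiable-rendering optimization loop. The sketch below is only a schematic reading of the abstract, not the released pipeline; `rasterize_xray` and the `gaussians` container are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def fit_gaussians(gaussians, projections, rasterize_xray, n_iters=2000, lr=1e-2):
    """Optimize Gaussian properties so rendered X-ray projections match measurements.
    `gaussians` is assumed to expose position/covariance/density tensors;
    `rasterize_xray(gaussians, angle)` stands in for a differentiable X-ray renderer."""
    params = [gaussians.positions, gaussians.covariances, gaussians.densities]
    for p in params:
        p.requires_grad_(True)
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        # Sum the rendering discrepancy over all measured projection angles.
        loss = sum(F.l1_loss(rasterize_xray(gaussians, angle), measured)
                   for angle, measured in projections)
        loss.backward()
        opt.step()
    return gaussians
```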
8. Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection
- Author
-
Yuan, Xinbin, Zheng, Zhaohui, Li, Yuxuan, Liu, Xialei, Liu, Li, Li, Xiang, Hou, Qibin, and Cheng, Ming-Ming
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Despite rapid development, remote sensing object detection remains challenging for high-aspect-ratio objects. This paper shows that large strip convolutions are good feature representation learners for remote sensing object detection and can detect objects of various aspect ratios well. Based on large strip convolutions, we build a new network architecture called Strip R-CNN, which is simple, efficient, and powerful. Unlike recent remote sensing object detectors that leverage large-kernel convolutions with square shapes, our Strip R-CNN takes advantage of sequential orthogonal large strip convolutions to capture spatial information. In addition, we enhance the localization capability of remote-sensing object detectors by decoupling the detection heads and equipping the localization head with strip convolutions to better localize the target objects. Extensive experiments on several benchmarks, e.g., DOTA, FAIR1M, HRSC2016, and DIOR, show that our Strip R-CNN can largely improve previous works. Notably, our 30M model achieves 82.75% mAP on DOTA-v1.0, setting a new state-of-the-art record. Code is available at https://github.com/YXB-NKU/Strip-R-CNN.
- Published
- 2025
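A minimal sketch of the sequential orthogonal strip convolutions described in entry 8. The kernel length k=19 and the depthwise grouping are assumptions for illustration, not the paper's actual configuration.

```python
import torch.nn as nn

class StripConvBlock(nn.Module):
    """Two orthogonal large strip convolutions applied in sequence (1xk then kx1)."""
    def __init__(self, channels, k=19):
        super().__init__()
        self.horizontal = nn.Conv2d(channels, channels, kernel_size=(1, k),
                                    padding=(0, k // 2), groups=channels)
        self.vertical = nn.Conv2d(channels, channels, kernel_size=(k, 1),
                                  padding=(k // 2, 0), groups=channels)

    def forward(self, x):
        # The horizontal strip covers wide, flat objects; the vertical strip covers tall ones.
        return self.vertical(self.horizontal(x))
```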
9. China : Ambassador Li Xiang pays a courtesy visit to Eritrea's Minister of Trade and Industry Nesreddin
- Subjects
International trade ,Cabinet officers ,Ambassadors ,Business, international - Abstract
On July 22, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrean Minister of Trade and Industry Nesreddin. Ambassador Li said that China and Eritrea have a profound [...]
- Published
- 2024
10. China : Ambassador Li Xiang pays a courtesy visit to the Minister of Labor and Social Welfare of Eritrea
- Subjects
Social service ,Cabinet officers ,Ambassadors ,Business, international - Abstract
On July 24, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrean Minister of Labor and Social Welfare, Rule. Ambassador Li said that in recent years, China-Eritrea bilateral [...]
- Published
- 2024
11. China : Ambassador Li Xiang attended the donation ceremony of the 17th batch of medical aid team to Eritrea
- Subjects
International cooperation -- Rites, ceremonies and celebrations ,Ambassadors -- Rites, ceremonies and celebrations ,Business, international - Abstract
On July 9, 2024, Ambassador to Eritrea Li Xiang attended the material donation ceremony for the 17th medical team to Eritrea. Representatives of the Eritrean Minister of Health, Director Berhani [...]
- Published
- 2024
12. China : Ambassador Li Xiang pays a courtesy visit to Eritrea's Minister of Water, Land and Environment Tesfai
- Subjects
Cabinet officers -- Environmental aspects ,Ambassadors -- Environmental aspects ,Environmental protection -- Environmental aspects ,Green building (Construction) -- Environmental aspects ,Environmental issue ,Business, international - Abstract
On July 1, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrean Minister of Water, Land and Environment Tesfai. Ambassador Li said that China and Eritrea have a [...]
- Published
- 2024
13. China : Ambassador Li Xiang meets with Mohammed, Director of the Protocol Department of the Eritrean Ministry of Foreign Affairs
- Subjects
International cooperation ,Ambassadors ,Business, international - Abstract
On June 27, 2024, Ambassador to Eritrea Li Xiang met with Mohammed, Director of the Protocol Department of the Eritrean Ministry of Foreign Affairs, to exchange views on further strengthening [...]
- Published
- 2024
14. China : Ambassador Li Xiang pays a courtesy visit to Eritrea's Minister of Justice Fawzia upon his arrival
- Subjects
Ambassadors ,Imperialism ,Business, international - Abstract
On June 28, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrean Justice Minister Fawzia upon his arrival. Ambassador Li said that China and Eritrea have a profound [...]
- Published
- 2024
15. China : Ambassador Li Xiang pays a courtesy visit to Elsa, Director of the International Affairs Department of the Eritrean Ministry of Foreign Affairs
- Subjects
Ambassadors ,Business, international - Abstract
On June 27, 2024, Ambassador to Eritrea Li Xiang paid a visit to Elsa, Director General of the International Department of the Eritrean Ministry of Foreign Affairs, and the two [...]
- Published
- 2024
16. China : Ambassador to Eritrea Li Xiang pays a courtesy visit to Eritrean Minister of Information Yemani
- Subjects
Cabinet officers ,Ambassadors ,Business, international - Abstract
On June 24, 2024, Chinese Ambassador to Eritrea Li Xiang paid a courtesy visit to Eritrean Minister of Information Yemani. The two sides exchanged views on strengthening news media cooperation. [...]
- Published
- 2024
17. Eritrea : Ambassador Li Xiang pays a courtesy visit to Eritrea's Minister of Transport and Communications Tesfa Selasi
- Subjects
International cooperation ,Transportation policy ,Cabinet officers ,Ambassadors ,Business, international - Abstract
On June 5, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrean Minister of Transport and Communications Tesfa Selasi. Ambassador Li said that China and Eritrea are strategic [...]
- Published
- 2024
18. Eritrea : Ambassador to Eritrea Li Xiang pays a courtesy visit to Acting Minister of Energy and Mines Alemu
- Subjects
Cabinet officers ,Ambassadors ,Business, international - Abstract
On May 29, 2024, Ambassador to Eritrea Li Xiang paid a visit to the Acting Minister of Energy and Mines of Eritrea, Alem. Ambassador Li said that China and Eritrea [...]
- Published
- 2024
19. Eritrea : Ambassador to Eritrea Li Xiang paid a courtesy call on Eritrea's Minister of Agriculture Arefeen
- Subjects
Cabinet officers ,Ambassadors ,Beer ,Business, international - Abstract
On May 21, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrean Agriculture Minister Arefeen upon his arrival. Ambassador Li said that China and Eritrea have a profound [...]
- Published
- 2024
20. Eritrea : Ambassador to Eritrea Li Xiang paid a courtesy visit to Eritrea's Presidential Political Advisor Yemeni
- Subjects
International cooperation ,Ambassadors ,Business, international - Abstract
On May 15, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrea's Presidential Political Advisor Yemeni. Ambassador Li said that China and Eritrea have a profound traditional friendship [...]
- Published
- 2024
21. Eritrea : Ambassador to Eritrea Li Xiang paid a courtesy visit to Eritrea's Minister of Information Yemani
- Subjects
Cabinet officers ,Ambassadors ,Business, international - Abstract
On May 13, 2024, Ambassador to Eritrea Li Xiang paid a visit to Eritrea's Minister of Information Yemani. Ambassador Li said that China and Eritrea have a profound traditional friendship. [...]
- Published
- 2024
22. China : Li Xiang, the newly appointed Ambassador to Eritrea, submitted a copy of his credentials to the Minister of Foreign Affairs of Eritrea
- Subjects
Foreign ministers ,Ambassadors ,Business, international - Abstract
On May 7, 2024, the newly appointed Ambassador to Eritrea Li Xiang submitted a copy of his credentials to Eritrea's Minister of Foreign Affairs Osman. Ambassador Li first conveyed cordial [...]
- Published
- 2024
23. SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection
- Author
-
Li, Yuxuan, Li, Xiang, Li, Yunheng, Zhang, Yicheng, Dai, Yimian, Hou, Qibin, Cheng, Ming-Ming, and Yang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Multimedia - Abstract
With the rapid advancement of remote sensing technology, high-resolution multi-modal imagery is now more widely accessible. Conventional object detection models are trained on a single dataset, often restricted to a specific imaging modality and annotation format. However, such an approach overlooks the valuable shared knowledge across modalities and limits the model's applicability in more versatile scenarios. This paper introduces a new task called Multi-Modal Datasets and Multi-Task Object Detection (M2Det) for remote sensing, designed to accurately detect horizontal or oriented objects from any sensor modality. This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization. To address these, we establish a benchmark dataset and propose a unified model, SM3Det (Single Model for Multi-Modal datasets and Multi-Task object Detection). SM3Det leverages a grid-level sparse MoE backbone to enable joint knowledge learning while preserving distinct feature representations for different modalities. Furthermore, it integrates a consistency and synchronization optimization strategy using dynamic learning rate adjustment, allowing it to effectively handle varying levels of learning difficulty across modalities and tasks. Extensive experiments demonstrate SM3Det's effectiveness and generalizability, consistently outperforming specialized models on individual datasets. The code is available at https://github.com/zcablii/SM3Det.
- Published
- 2024
24. UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control
- Author
-
Sun, Wenzhang, Li, Xiang, Di, Donglin, Liang, Zhuding, Zhang, Qiyuan, Li, Hao, Chen, Wei, and Cui, Jianxun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recently, animating portrait images using audio input has become a popular task. Creating lifelike talking head videos requires flexible and natural movements, including facial and head dynamics, camera motion, and realistic light and shadow effects. Existing methods struggle to offer comprehensive, multifaceted control over these aspects. In this work, we introduce UniAvatar, a method designed to provide extensive control over a wide range of motion and illumination conditions. Specifically, we use the FLAME model to render all motion information onto a single image, maintaining the integrity of 3D motion details while enabling fine-grained, pixel-level control. Beyond motion, this approach also allows for comprehensive global illumination control. We design independent modules to manage both 3D motion and illumination, permitting separate and combined control. Extensive experiments demonstrate that our method outperforms others in both broad-range motion control and lighting control. Additionally, to enhance the diversity of motion and environmental contexts in current datasets, we collect and plan to publicly release two datasets, DH-FaceDrasMvVid-100 and DH-FaceReliVid-200, which capture significant head movements during speech and various lighting scenarios.
- Published
- 2024
25. FFCG: Effective and Fast Family Column Generation for Solving Large-Scale Linear Program
- Author
-
Hu, Yi-Xiang, Wu, Feng, Li, Shaoang, Zhao, Yifang, and Li, Xiang-Yang
- Subjects
Computer Science - Machine Learning ,Mathematics - Optimization and Control ,I.2.6 - Abstract
Column Generation (CG) is an effective and iterative algorithm to solve large-scale linear programs (LP). During each CG iteration, new columns are added to improve the solution of the LP. Typically, CG greedily selects one column with the most negative reduced cost, which can be improved by adding more columns at once. However, selecting all columns with negative reduced costs would lead to the addition of redundant columns that do not improve the objective value. Therefore, selecting the appropriate columns to add is still an open problem and previous machine-learning-based approaches for CG only add a constant quantity of columns per iteration due to the state-space explosion problem. To address this, we propose Fast Family Column Generation (FFCG) -- a novel reinforcement-learning-based CG that selects a variable number of columns as needed in an iteration. Specifically, we formulate the column selection problem in CG as an MDP and design a reward metric that balances both the convergence speed and the number of redundant columns. In our experiments, FFCG converges faster on the common benchmarks and reduces the number of CG iterations by 77.1% for the Cutting Stock Problem (CSP) and 84.8% for the Vehicle Routing Problem with Time Windows (VRPTW), and achieves a 71.4% reduction in computing time for CSP and 84.0% for VRPTW on average, compared to several state-of-the-art baselines.
- Published
- 2024
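Entry 25 says FFCG's reward balances convergence speed against redundant columns but does not give the formula. The function below is a hypothetical stand-in to make that trade-off concrete; the names and the weights alpha and beta are assumptions, not the paper's actual reward.

```python
def column_selection_reward(obj_before, obj_after, n_added, n_improving,
                            alpha=1.0, beta=0.1):
    """Hypothetical per-iteration reward for a CG column-selection agent."""
    improvement = obj_before - obj_after      # how much the LP objective dropped
    redundant = n_added - n_improving         # added columns that did not help
    return alpha * improvement - beta * redundant
```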
26. SCBench: A Sports Commentary Benchmark for Video LLMs
- Author
-
Ge, Kuangzhi, Chen, Lingjun, Zhang, Kevin, Luo, Yulin, Shi, Tianyu, Fan, Liaoyuan, Li, Xiang, Wang, Guanqun, and Zhang, Shanghang
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Recently, significant advances have been made in Video Large Language Models (Video LLMs) in both academia and industry. However, methods to evaluate and benchmark the performance of different Video LLMs, especially their fine-grained, temporal visual capabilities, remain very limited. On one hand, current benchmarks use relatively simple videos (e.g., subtitled movie clips) where the model can understand the entire video by processing just a few frames. On the other hand, their datasets lack diversity in task format, comprising only QA or multi-choice QA, which overlooks the models' capacity for generating in-depth and precise texts. Sports videos, which feature intricate visual information, sequential events, and emotionally charged commentary, present a critical challenge for Video LLMs, making sports commentary an ideal benchmarking task. Inspired by these challenges, we propose a novel task, sports video commentary generation, and develop $\textbf{SCBench}$ for Video LLMs. To construct such a benchmark, we introduce (1) $\textbf{SCORES}$, a six-dimensional metric specifically designed for our task, upon which we propose a GPT-based evaluation method, and (2) $\textbf{CommentarySet}$, a dataset consisting of 5,775 annotated video clips and ground-truth labels tailored to our metric. Based on SCBench, we conduct comprehensive evaluations on multiple Video LLMs (e.g. VILA, Video-LLaVA, etc.) and chain-of-thought baseline methods. Our results show that InternVL-Chat-2 achieves the best performance with 5.44, surpassing the second-best by 1.04. Our work provides a fresh perspective for future research, aiming to enhance models' overall capabilities in complex visual understanding tasks. Our dataset will be released soon.
- Published
- 2024
27. Decoupled Functional Central Limit Theorems for Two-Time-Scale Stochastic Approximation
- Author
-
Han, Yuze, Li, Xiang, Liang, Jiadong, and Zhang, Zhihua
- Subjects
Mathematics - Probability ,Mathematics - Optimization and Control ,Statistics - Machine Learning - Abstract
In two-time-scale stochastic approximation (SA), two iterates are updated at different rates, governed by distinct step sizes, with each update influencing the other. Previous studies have demonstrated that the convergence rates of the error terms for these updates depend solely on their respective step sizes, a property known as decoupled convergence. However, a functional version of this decoupled convergence has not been explored. Our work fills this gap by establishing decoupled functional central limit theorems for two-time-scale SA, offering a more precise characterization of its asymptotic behavior. To achieve these results, we leverage the martingale problem approach and establish tightness as a crucial intermediate step. Furthermore, to address the interdependence between different time scales, we introduce an innovative auxiliary sequence to eliminate the primary influence of the fast-time-scale update on the slow-time-scale update.
- Published
- 2024
28. Breaking the Context Bottleneck on Long Time Series Forecasting
- Author
-
Ma, Chao, Hou, Yikai, Li, Xiang, Sun, Yinggang, Yu, Haining, Fang, Zhou, and Qu, Jiaxing
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Long-term time-series forecasting is essential for planning and decision-making in economics, energy, and transportation, where long foresight is required. To obtain such long foresight, models must be both efficient and effective in processing long sequences. Recent advancements have enhanced the efficiency of these models; however, the challenge of effectively leveraging longer sequences persists. This is primarily due to the tendency of these models to overfit when presented with extended inputs, necessitating the use of shorter input lengths to maintain tolerable error margins. In this work, we investigate the multiscale modeling method and propose the Logsparse Decomposable Multiscaling (LDM) framework for the efficient and effective processing of long sequences. We demonstrate that by decoupling patterns at different scales in time series, we can enhance predictability by reducing non-stationarity, improve efficiency through a compact long input representation, and simplify the architecture by providing clear task assignments. Experimental results demonstrate that LDM not only outperforms all baselines in long-term forecasting benchmarks, but also reduces both training time and memory costs., Comment: Time series forecasting algorithm based on multi-scale analysis
- Published
- 2024
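A rough sketch of the multiscale idea in entry 28: the series is pooled at geometrically growing scales so a long input becomes a set of compact views. The pooling operator and scale set here are assumptions; the paper's Logsparse Decomposable Multiscaling may decompose the series differently.

```python
import torch
import torch.nn.functional as F

def multiscale_views(x, scales=(1, 2, 4, 8)):
    """x: (batch, length, channels) time series -> list of progressively coarser views."""
    views = []
    for s in scales:
        # Average-pool along the time axis; larger s gives a shorter, smoother view.
        pooled = F.avg_pool1d(x.transpose(1, 2), kernel_size=s, stride=s)
        views.append(pooled.transpose(1, 2))
    return views
```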
29. Type II Singularities of Lagrangian Mean Curvature Flow with Zero Maslov Class
- Author
-
Li, Xiang, Luo, Yong, and Sun, Jun
- Subjects
Mathematics - Differential Geometry - Abstract
In this paper, we will prove some rigidity theorems for blow up limits to Type II singularities of Lagrangian mean curvature flow with zero Maslov class or almost calibrated Lagrangian mean curvature flows, especially for Lagrangian translating solitons in any dimension. These theorems generalize previous corresponding results from the two-dimensional case to the case of arbitrary dimension., Comment: All comments are welcome! 16 pages
- Published
- 2024
30. Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
- Author
-
Li, Wenqiao, Zheng, Bozhong, Xu, Xiaohao, Gan, Jinye, Lu, Fading, Li, Xiang, Ni, Na, Tian, Zheng, Huang, Xiaonan, Gao, Shenghua, and Wu, Yingna
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Object anomaly detection is essential for industrial quality inspection, yet traditional single-sensor methods face critical limitations. They fail to capture the wide range of anomaly types, as single sensors are often constrained to either external appearance, geometric structure, or internal properties. To overcome these challenges, we introduce MulSen-AD, the first high-resolution, multi-sensor anomaly detection dataset tailored for industrial applications. MulSen-AD unifies data from RGB cameras, laser scanners, and lock-in infrared thermography, effectively capturing external appearance, geometric deformations, and internal defects. The dataset spans 15 industrial products with diverse, real-world anomalies. We also present MulSen-AD Bench, a benchmark designed to evaluate multi-sensor methods, and propose MulSen-TripleAD, a decision-level fusion algorithm that integrates these three modalities for robust, unsupervised object anomaly detection. Our experiments demonstrate that multi-sensor fusion substantially outperforms single-sensor approaches, achieving 96.1% AUROC in object-level detection accuracy. These results highlight the importance of integrating multi-sensor data for comprehensive industrial anomaly detection.
- Published
- 2024
31. Taming Landau level mixing in fractional quantum Hall states with deep learning
- Author
-
Qian, Yubing, Zhao, Tongzhou, Zhang, Jianxiao, Xiang, Tao, Li, Xiang, and Chen, Ji
- Subjects
Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Mesoscale and Nanoscale Physics ,Physics - Computational Physics - Abstract
Strong correlation brings a rich array of emergent phenomena, as well as a daunting challenge to theoretical physics study. In condensed matter physics, the fractional quantum Hall effect is a prominent example of strong correlation, with Landau level mixing being one of the most challenging aspects to address using traditional computational methods. Deep learning real-space neural network wavefunction methods have emerged as promising architectures to describe electron correlations in molecules and materials, but their power has not been fully tested for exotic quantum states. In this work, we employ real-space neural network wavefunction techniques to investigate fractional quantum Hall systems. On both $1/3$ and $2/5$ filling systems, we achieve energies consistently lower than exact diagonalization results which only consider the lowest Landau level. We also demonstrate that the real-space neural network wavefunction can naturally capture the extent of Landau level mixing up to a very high level, overcoming the limitations of traditional methods. Our work underscores the potential of neural networks for future studies of strongly correlated systems and opens new avenues for exploring the rich physics of the fractional quantum Hall effect.
- Published
- 2024
32. PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization
- Author
-
Wu, Jiayi, Cai, Hengyi, Yan, Lingyong, Sun, Hao, Li, Xiang, Wang, Shuaiqiang, Yin, Dawei, and Gao, Ming
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
The emergence of Retrieval-augmented generation (RAG) has alleviated the issues of outdated and hallucinatory content in the generation of large language models (LLMs), yet it still reveals numerous limitations. When a general-purpose LLM serves as the RAG generator, it often suffers from inadequate response informativeness, response robustness, and citation quality. Past approaches to tackle these limitations, either by incorporating additional steps beyond generating responses or optimizing the generator through supervised fine-tuning (SFT), still failed to thoroughly align with the RAG requirements. Consequently, optimizing the RAG generator from multiple preference perspectives while maintaining its end-to-end LLM form remains a challenge. To bridge this gap, we propose Multiple Perspective Preference Alignment for Retrieval-Augmented Generation (PA-RAG), a method for optimizing the generator of RAG systems to align with RAG requirements comprehensively. Specifically, we construct high-quality instruction fine-tuning data and multi-perspective preference data by sampling responses of varying quality from the generator across scenarios with prompt documents of different quality. Subsequently, we optimize the generator using SFT and Direct Preference Optimization (DPO). Extensive experiments conducted on four question-answer datasets across three LLMs demonstrate that PA-RAG can significantly enhance the performance of RAG generators. Our code and datasets are available at https://github.com/wujwyi/PA-RAG.
- Published
- 2024
33. Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph
- Author
-
Zhao, Yibo, Zhu, Jiapeng, Xu, Can, and Li, Xiang
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
The rapid growth of social media platforms has raised significant concerns regarding online content toxicity. When Large Language Models (LLMs) are used for toxicity detection, two key challenges emerge: 1) the absence of domain-specific toxic knowledge leads to false negatives; 2) the excessive sensitivity of LLMs to toxic speech results in false positives, limiting freedom of speech. To address these issues, we propose a novel method called MetaTox, leveraging graph search on a meta-toxic knowledge graph to enhance hatred and toxicity detection. First, we construct a comprehensive meta-toxic knowledge graph by utilizing LLMs to extract toxic information through a three-step pipeline, with toxic benchmark datasets serving as corpora. Second, we query the graph via retrieval and ranking processes to supplement accurate, relevant toxic knowledge. Extensive experiments and in-depth case studies across multiple datasets demonstrate that our MetaTox significantly decreases the false positive rate while boosting overall toxicity detection performance. Our code will be available soon., Comment: 8 pages of content
- Published
- 2024
34. Large Language Model Enhanced Recommender Systems: Taxonomy, Trend, Application and Future
- Author
-
Liu, Qidong, Zhao, Xiangyu, Wang, Yuhao, Wang, Yejing, Zhang, Zijian, Sun, Yuqi, Li, Xiang, Wang, Maolin, Jia, Pengyue, Chen, Chong, Huang, Wei, and Tian, Feng
- Subjects
Computer Science - Information Retrieval ,Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) have transformative potential in various domains, including recommender systems (RS). There has been a handful of research focusing on empowering RS with LLMs. However, previous efforts mainly focus on using LLMs as RS, which may face the challenge of intolerable LLM inference costs. Recently, the integration of LLM into RS, known as LLM-Enhanced Recommender Systems (LLMERS), has garnered significant interest due to its potential to address latency and memory constraints in real-world applications. This paper presents a comprehensive survey of the latest research efforts aimed at leveraging LLM to enhance RS capabilities. We identify a critical shift in the field with the move towards incorporating LLM into the online system, notably by avoiding their use during inference. Our survey categorizes the existing LLMERS approaches into three primary types based on the component of the RS model being augmented: Knowledge Enhancement, Interaction Enhancement, and Model Enhancement. We provide an in-depth analysis of each category, discussing the methodologies, challenges, and contributions of recent studies. Furthermore, we highlight several promising research directions that could further advance the field of LLMERS.
- Published
- 2024
35. Evidence for the Sombrero Galaxy as an Accelerator of the Highest-Energy Cosmic Rays
- Author
-
He, Hao-Ning, Kido, Eiji, Duan, Kai-Kai, Yang, Yang, Higuchi, Ryo, Fan, Yi-Zhong, Wang, Tao, Jiang, Lu-Yao, Li, Rong-Lan, Zhu, Ben-Yang, Li, Xiang, Xia, Zi-Qing, Nagataki, Shigehiro, Wei, Da-Ming, and Kusenko, Alexander
- Subjects
Astrophysics - High Energy Astrophysical Phenomena ,Astrophysics - Astrophysics of Galaxies ,High Energy Physics - Phenomenology ,High Energy Physics - Theory - Abstract
Ultrahigh-energy cosmic rays (UHECRs) are the highest-energy messengers from space, with energies exceeding 1 EeV. Although UHECRs were discovered over 60 years ago, their origin still remains a mystery. Pinpointing sources of UHECRs is crucial for understanding the extreme astrophysical processes that accelerate particles to such extraordinary energies. We searched for UHECR multiplets by analyzing 17 years of data with energies greater than 40 EeV from the Pierre Auger Observatory. A spatial association is found between a multiplet of $25.7^{+6.2}_{-7.0}$ cosmic rays and the Sombrero galaxy with a local (global) significance of $4.5~\sigma~(3.3~\sigma)$. The Sombrero galaxy hosts a supermassive central black hole with a mass of $\sim1\times 10^9 M_{\odot}$ and exhibits large-scale radio lobes and jets. Our finding provides critical evidence for active supermassive black holes as the source of the highest-energy cosmic rays.
- Published
- 2024
36. SEAGraph: Unveiling the Whole Story of Paper Review Comments
- Author
-
Yu, Jianxiang, Tan, Jiaqi, Ding, Zichen, Zhu, Jiapeng, Li, Jiahao, Cheng, Yao, Cui, Qier, Lan, Yunshi, and Li, Xiang
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Peer review, as a cornerstone of scientific research, ensures the integrity and quality of scholarly work by providing authors with objective feedback for refinement. However, in the traditional peer review process, authors often receive vague or insufficiently detailed feedback, which provides limited assistance and leads to a more time-consuming review cycle. If authors can identify some specific weaknesses in their paper, they can not only address the reviewer's concerns but also improve their work. This raises the critical question of how to enhance authors' comprehension of review comments. In this paper, we present SEAGraph, a novel framework developed to clarify review comments by uncovering the underlying intentions behind them. We construct two types of graphs for each paper: the semantic mind graph, which captures the author's thought process, and the hierarchical background graph, which delineates the research domains related to the paper. A retrieval method is then designed to extract relevant content from both graphs, facilitating coherent explanations for the review comments. Extensive experiments show that SEAGraph excels in review comment understanding tasks, offering significant benefits to authors.
- Published
- 2024
37. Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent
- Author
-
Li, Xiang and Xie, Qiaomin
- Subjects
Computer Science - Machine Learning ,Mathematics - Optimization and Control ,Statistics - Machine Learning - Abstract
The convergence behavior of Stochastic Gradient Descent (SGD) crucially depends on the stepsize configuration. When using a constant stepsize, the SGD iterates form a Markov chain, enjoying fast convergence during the initial transient phase. However, when reaching stationarity, the iterates oscillate around the optimum without making further progress. In this paper, we study the convergence diagnostics for SGD with constant stepsize, aiming to develop an effective dynamic stepsize scheme. We propose a novel coupling-based convergence diagnostic procedure, which monitors the distance of two coupled SGD iterates for stationarity detection. Our diagnostic statistic is simple and is shown to track the transition from transience to stationarity theoretically. We conduct extensive numerical experiments and compare our method against various existing approaches. Our proposed coupling-based stepsize scheme is observed to achieve superior performance across a diverse set of convex and non-convex problems. Moreover, our results demonstrate the robustness of our approach to a wide range of hyperparameters., Comment: 13 pages, 30 figures, to be published in AAAI 2025
- Published
- 2024
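A toy version of the coupling idea in entry 37: two constant-stepsize SGD chains start from different points but are driven by the same sampled noise, and stationarity is flagged once they (nearly) merge. The distance threshold and the shared-noise gradient interface are assumptions, not the paper's exact diagnostic statistic.

```python
import numpy as np

def coupled_sgd_diagnostic(stoch_grad, x0, y0, step=0.05, tol=1e-3,
                           max_iter=10_000, seed=0):
    """Return the iteration at which two coupled constant-stepsize SGD chains meet."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    for t in range(max_iter):
        noise = rng.standard_normal(x.shape)   # shared randomness couples the two chains
        x = x - step * stoch_grad(x, noise)
        y = y - step * stoch_grad(y, noise)
        if np.linalg.norm(x - y) < tol:        # chains have coupled: transient phase is over
            return t
    return max_iter
```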
38. SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
- Author
-
Chen, Hao, Wang, Ze, Li, Xiang, Sun, Ximeng, Chen, Fangyi, Liu, Jiang, Wang, Jindong, Raj, Bhiksha, Liu, Zicheng, and Barsoum, Emad
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Efficient image tokenization with high compression ratios remains a critical challenge for training generative models. We present SoftVQ-VAE, a continuous image tokenizer that leverages soft categorical posteriors to aggregate multiple codewords into each latent token, substantially increasing the representation capacity of the latent space. When applied to Transformer-based architectures, our approach compresses 256x256 and 512x512 images using as few as 32 or 64 1-dimensional tokens. Not only does SoftVQ-VAE show consistent and high-quality reconstruction, more importantly, it also achieves state-of-the-art and significantly faster image generation results across different denoising-based generative models. Remarkably, SoftVQ-VAE improves inference throughput by up to 18x for generating 256x256 images and 55x for 512x512 images while achieving competitive FID scores of 1.78 and 2.21 for SiT-XL. It also improves the training efficiency of the generative models by reducing the number of training iterations by 2.3x while maintaining comparable performance. With its fully-differentiable design and semantic-rich latent space, our experiment demonstrates that SoftVQ-VAE achieves efficient tokenization without compromising generation quality, paving the way for more efficient generative models. Code and model are released., Comment: Code and model: https://github.com/Hhhhhhao/continuous_tokenizer
- Published
- 2024
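The soft categorical posterior described in entry 38 can be sketched as a distance-based softmax over the codebook, so each latent token becomes a differentiable mixture of codewords. The temperature and the squared-distance logits are assumptions for illustration, not the released SoftVQ-VAE code.

```python
import torch

def soft_vector_quantize(z, codebook, temperature=1.0):
    """z: (B, N, D) latent tokens; codebook: (K, D) codewords -> (B, N, D) soft tokens."""
    # Soft posterior over codewords from negative squared distances.
    d = ((z.unsqueeze(-2) - codebook) ** 2).sum(-1)   # (B, N, K)
    probs = torch.softmax(-d / temperature, dim=-1)
    # Probability-weighted mixture of codewords keeps the mapping fully differentiable.
    return probs @ codebook
```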
39. Agent-based Video Trimming
- Author
-
Yang, Lingfeng, Chen, Zhenyuan, Li, Xiang, Jia, Peiyang, Long, Liangqu, and Yang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights. This trend underscores the need for an algorithm to extract key video information efficiently. Despite significant advancements in highlight detection, moment retrieval, and video summarization, current approaches primarily focus on selecting specific time intervals, often overlooking the relevance between segments and the potential for segment arranging. In this paper, we introduce a novel task called Video Trimming (VT), which focuses on detecting wasted footage, selecting valuable segments, and composing them into a final video with a coherent story. To address this task, we propose Agent-based Video Trimming (AVT), structured into three phases: Video Structuring, Clip Filtering, and Story Composition. Specifically, we employ a Video Captioning Agent to convert video slices into structured textual descriptions, a Filtering Module to dynamically discard low-quality footage based on the structured information of each clip, and a Video Arrangement Agent to select and compile valid clips into a coherent final narrative. For evaluation, we develop a Video Evaluation Agent to assess trimmed videos, conducting assessments in parallel with human evaluations. Additionally, we curate a new benchmark dataset for video trimming using raw user videos from the internet. As a result, AVT received more favorable evaluations in user studies and demonstrated superior mAP and precision on the YouTube Highlights, TVSum, and our own dataset for the highlight detection task. The code and models are available at https://ylingfeng.github.io/AVT.
- Published
- 2024
40. ATPrompt: Textual Prompt Learning with Embedded Attributes
- Author
-
Li, Zheng, Song, Yibing, Zhao, Penghai, Cheng, Ming-Ming, Li, Xiang, and Yang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Textual-based prompt learning methods primarily employ multiple learnable soft prompts and hard class tokens in a cascading manner as text prompt inputs, aiming to align image and text (category) spaces for downstream tasks. However, current training is restricted to aligning images with predefined known categories and cannot be associated with unknown categories. In this work, we propose utilizing universal attributes as a bridge to enhance the alignment between images and unknown categories. Specifically, we introduce an Attribute-embedded Textual Prompt learning method for vision-language models, named ATPrompt. This approach expands the learning space of soft prompts from the original one-dimensional category level into the multi-dimensional attribute level by incorporating multiple universal attribute tokens into the learnable soft prompts. Through this modification, we transform the text prompt from a category-centric form to an attribute-category hybrid form. To finalize the attributes for downstream tasks, we propose a differentiable attribute search method that learns to identify representative and suitable attributes from a candidate pool summarized by a large language model. As an easy-to-use plug-in technique, ATPrompt can seamlessly replace the existing prompt format of textual-based methods, offering general improvements at a negligible computational cost. Extensive experiments on 11 datasets demonstrate the effectiveness of our method., Comment: Technical Report. Project Page: https://zhengli97.github.io/ATPrompt/
- Published
- 2024
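A bare-bones sketch of the attribute-embedded prompt idea in entry 40: learnable soft tokens are concatenated with fixed universal attribute token embeddings ahead of the class token. The shapes, the frozen attribute buffer, and the concatenation order are assumptions; the actual ATPrompt design may differ.

```python
import torch
import torch.nn as nn

class AttributeEmbeddedPrompt(nn.Module):
    """Builds [soft prompts][attribute tokens][class token] as text-encoder input."""
    def __init__(self, attribute_embeddings, n_soft=4, dim=512):
        super().__init__()
        self.soft = nn.Parameter(torch.randn(n_soft, dim) * 0.02)   # learnable soft prompts
        self.register_buffer("attrs", attribute_embeddings)          # (n_attr, dim), kept fixed

    def forward(self, class_token_embedding):
        # class_token_embedding: (n_class_tokens, dim) embedding of the category name
        return torch.cat([self.soft, self.attrs, class_token_embedding], dim=0)
```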
41. InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
- Author
-
Fan, Tiehan, Nan, Kepan, Xie, Rui, Zhou, Penghao, Yang, Zhenheng, Fu, Chaoyou, Li, Xiang, Yang, Jian, and Tai, Ying
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Text-to-video generation has evolved rapidly in recent years, delivering remarkable results. Training typically relies on video-caption paired data, which plays a crucial role in enhancing generation performance. However, current video captions often suffer from insufficient details, hallucinations and imprecise motion depiction, affecting the fidelity and consistency of generated videos. In this work, we propose a novel instance-aware structured caption framework, termed InstanceCap, to achieve instance-level and fine-grained video captioning for the first time. Based on this scheme, we design a cluster of auxiliary models to convert the original video into instances to enhance instance fidelity. Video instances are further used to refine dense prompts into structured phrases, achieving concise yet precise descriptions. Furthermore, a 22K InstanceVid dataset is curated for training, and an enhancement pipeline tailored to the InstanceCap structure is proposed for inference. Experimental results demonstrate that our proposed InstanceCap significantly outperforms previous models, ensuring high fidelity between captions and videos while reducing hallucinations.
- Published
- 2024
42. PAFFA: Premeditated Actions For Fast Agents
- Author
-
Krishna, Shambhavi, Chen, Zheng, Kumar, Vaibhav, Huang, Xiaojiang, Li, Yingjie, Yang, Fan, and Li, Xiang
- Subjects
Computer Science - Artificial Intelligence - Abstract
Modern AI assistants have made significant progress in natural language understanding and API/tool integration, with emerging efforts to incorporate diverse interfaces (such as Web interfaces) for enhanced scalability and functionality. However, current approaches that heavily rely on repeated LLM-driven HTML parsing are computationally expensive and error-prone, particularly when handling dynamic web interfaces and multi-step tasks. To overcome these challenges, we introduce PAFFA (Premeditated Actions For Fast Agents), a framework designed to enhance web interaction capabilities through an Action API Library of reusable, verified browser interaction functions. By pre-computing interaction patterns and employing two core methodologies - "Dist-Map" for task-agnostic element distillation and "Unravel" for incremental page-wise exploration - PAFFA reduces inference calls by 87% while maintaining robust performance even as website structures evolve. This framework accelerates multi-page task execution and offers a scalable solution to advance autonomous web agent research., Comment: 9 pages
- Published
- 2024
43. Personalized and Sequential Text-to-Image Generation
- Author
-
Nabati, Ofir, Tennenholtz, Guy, Hsu, ChihWei, Ryu, Moonkyung, Ramachandran, Deepak, Chow, Yinlam, Li, Xiang, and Boutilier, Craig
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Systems and Control - Abstract
We address the problem of personalized, interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest a personalized and diverse slate of prompt expansions to the user. Our Personalized And Sequential Text-to-image Agent (PASTA) extends T2I models with personalized multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also release our sequential rater dataset and simulated user-rater interactions to support future research in personalized, multi-turn T2I generation., Comment: Link to PASTA dataset: https://www.kaggle.com/datasets/googleai/pasta-data
- Published
- 2024
44. The Field-based Model: A New Perspective on RF-based Material Sensing
- Author
-
Shang, Fei, Jiang, Haocheng, Yang, Panlong, Yan, Dawei, Du, Haohua, and Li, Xiang-Yang
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
This paper introduces the design and implementation of WiField, a WiFi sensing system deployed on COTS devices that can simultaneously identify multiple wavelength-level targets placed flexibly. Unlike traditional RF sensing schemes that focus on specific targets and RF links, WiField focuses on all media in the sensing area for the entire electric field. From this perspective, WiField provides a unified framework to finely characterize the diffraction, scattering, and other effects of targets at different positions, materials, and numbers on signals. The combination of targets in different positions, numbers, and sizes is just a special case. WiField proposes a scheme that utilizes phaseless data to complete the inverse mapping from the electric field to the material distribution, thereby achieving the simultaneous identification of multiple wavelength-level targets at any position and having the potential for deployment on a wide range of low-cost COTS devices. Our evaluation results show that it has an average identification accuracy of over 97% for 1-3 targets (5 cm * 10 cm in size) with different materials randomly placed within a 1.05 m * 1.05 m area.
- Published
- 2024
45. Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System
- Author
-
Zeng, Fang, Lyu, Zhiliang, Li, Quanzheng, and Li, Xiang
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
This study introduces "RadCouncil," a multi-agent Large Language Model (LLM) framework designed to enhance the generation of impressions in radiology reports from the finding section. RadCouncil comprises three specialized agents: 1) a "Retrieval" Agent that identifies and retrieves similar reports from a vector database, 2) a "Radiologist" Agent that generates impressions based on the finding section of the given report plus the exemplar reports retrieved by the Retrieval Agent, and 3) a "Reviewer" Agent that evaluates the generated impressions and provides feedback. The performance of RadCouncil was evaluated using both quantitative metrics (BLEU, ROUGE, BERTScore) and qualitative criteria assessed by GPT-4, using chest X-ray as a case study. Experiment results show improvements in RadCouncil over the single-agent approach across multiple dimensions, including diagnostic accuracy, stylistic concordance, and clarity. This study highlights the potential of utilizing multiple interacting LLM agents, each with a dedicated task, to enhance performance in specialized medical tasks and the development of more robust and adaptable healthcare AI solutions.
- Published
- 2024
46. Community Detection with Heterogeneous Block Covariance Model
- Author
-
Li, Xiang, Zhao, Yunpeng, Pan, Qing, and Hao, Ning
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning ,Statistics - Computation - Abstract
Community detection is the task of clustering objects based on their pairwise relationships. Most of the model-based community detection methods, such as the stochastic block model and its variants, are designed for networks with binary (yes/no) edges. In many practical scenarios, edges often possess continuous weights, spanning positive and negative values, which reflect varying levels of connectivity. To address this challenge, we introduce the heterogeneous block covariance model (HBCM) that defines a community structure within the covariance matrix, where edges have signed and continuous weights. Furthermore, it takes into account the heterogeneity of objects when forming connections with other objects within a community. A novel variational expectation-maximization algorithm is proposed to estimate the group membership. The HBCM provides provable consistent estimates of memberships, and its promising performance is observed in numerical simulations with different setups. The model is applied to a single-cell RNA-seq dataset of a mouse embryo and a stock price dataset. Supplementary materials for this article are available online.
- Published
- 2024
47. Grand Challenges in Immersive Technologies for Cultural Heritage
- Author
-
Wang, Hanbing, Du, Junyan, Li, Yue, Zhang, Lie, and Li, Xiang
- Subjects
Computer Science - Computers and Society ,Computer Science - Human-Computer Interaction - Abstract
Cultural heritage, a testament to human history and civilization, has gained increasing recognition for its significance in preservation and dissemination. The integration of immersive technologies has transformed how cultural heritage is presented, enabling audiences to engage with it in more vivid, intuitive, and interactive ways. However, the adoption of these technologies also brings a range of challenges and potential risks. This paper presents a systematic review, with an in-depth analysis of 177 selected papers. We comprehensively examine and categorize current applications, technological approaches, and user devices in immersive cultural heritage presentations, while also highlighting the associated risks and challenges. Furthermore, we identify areas for future research in the immersive presentation of cultural heritage. Our goal is to provide a comprehensive reference for researchers and practitioners, enhancing understanding of the technological applications, risks, and challenges in this field, and encouraging further innovation and development., Comment: 46 pages. Preprint version. Accepted for publication in the International Journal of Human-Computer Interaction (IJHCI)
- Published
- 2024
48. Modeling High Mass X-ray Binaries to Double Neutron Stars through Common Envelope Evolution
- Author
-
Nie, Yu-Dong, Shao, Yong, He, Jian-Guo, Wei, Ze-Lin, Xu, Xiao-Jie, and Li, Xiang-Dong
- Subjects
Astrophysics - Solar and Stellar Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena - Abstract
We present detailed evolutionary simulations of wide binary systems with high-mass ($8-20\,M_{\odot}$) donor stars and a $1.4\,M_{\odot}$ neutron star. Mass transfer in such binaries is dynamically unstable and common envelope (CE) evolution is followed. We use a recently developed prescription to deal with CE evolution and consider various CE ejection efficiencies varying in the range of $0.1-3.0$. We focus on the evolutionary consequences of the binaries that survived CE evolution. We demonstrate that it is possible for the binaries to enter a CE decoupling phase (CEDP) when the donor stars are partially stripped, leaving a hydrogen envelope of $\lesssim1.0-4.0\,M_\odot$ after CE evolution. This phase is expected to last $\sim 10^4-10^5\,\rm yr$, during which mass transfer occurs stably via Roche lobe overflow with super-Eddington rates. Identification of some X-ray binaries in a CEDP is important for the understanding of the physics of CE evolution itself, the origin of ultraluminous X-ray sources, and the recycling process of accreting pulsars. Also, we discuss the formation of double neutron stars and the occurrence of ultra-stripped supernovae according to the results from our simulations. On the whole, the properties of post-CE binaries are sensitive to the choice of CE ejection efficiencies., Comment: 22 pages, 12+2 figures, 1 table, accepted by ApJ
- Published
- 2024
49. XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
- Author
-
Li, Xiang, Qiu, Kai, Chen, Hao, Kuen, Jason, Gu, Jiuxiang, Wang, Jindong, Lin, Zhe, and Raj, Bhiksha
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Image tokenizers play a critical role in shaping the performance of subsequent generative models. Since the introduction of VQ-GAN, discrete image tokenization has undergone remarkable advancements. Improvements in architecture, quantization techniques, and training recipes have significantly enhanced both image reconstruction and the downstream generation quality. In this paper, we present XQ-GAN, an image tokenization framework designed for both image reconstruction and generation tasks. Our framework integrates state-of-the-art quantization techniques, including vector quantization (VQ), residual quantization (RQ), multi-scale residual quantization (MSVQ), product quantization (PQ), lookup-free quantization (LFQ), and binary spherical quantization (BSQ), within a highly flexible and customizable training environment. On the standard ImageNet 256x256 benchmark, our released model achieves an rFID of 0.64, significantly surpassing MAGVIT-v2 (0.9 rFID) and VAR (0.9 rFID). Furthermore, we demonstrate that using XQ-GAN as a tokenizer improves gFID metrics alongside rFID. For instance, with the same VAR architecture, XQ-GAN+VAR achieves a gFID of 2.6, outperforming VAR's 3.3 gFID by a notable margin. To support further research, we provide pre-trained weights of different image tokenizers for the community to directly train the subsequent generative models on it or fine-tune for specialized tasks., Comment: Code: https://github.com/lxa9867/ImageFolder
- Published
- 2024
50. Impromptu Cybercrime Euphemism Detection
- Author
-
Li, Xiang, Zhou, Yucheng, Zhao, Laiping, Li, Jing, and Liu, Fangming
- Subjects
Computer Science - Computation and Language - Abstract
Detecting euphemisms is essential for content security on various social media platforms, but existing methods designed for detecting euphemisms are ineffective on impromptu euphemisms. In this work, we make a first attempt at exploring impromptu euphemism detection and introduce the Impromptu Cybercrime Euphemisms Detection (ICED) dataset. Moreover, we propose a detection framework tailored to this problem, which employs context augmentation modeling and multi-round iterative training. Our detection framework mainly consists of a coarse-grained and a fine-grained classification model. The coarse-grained classification model removes most of the harmless content in the corpus to be detected. The fine-grained model, the impromptu euphemism detector, integrates context augmentation and multi-round iterative training to better predict the actual meaning of a masked token. In addition, we leverage ChatGPT to evaluate the model's capability. Experimental results demonstrate that our approach achieves a remarkable 76-fold improvement compared to the previous state-of-the-art euphemism detector.
- Published
- 2024