41,464 results for "Kexin An"
Search Results
2. Identification of neutral genome integration sites with high expression and high integration efficiency in Fusarium venenatum TB01
- Author
-
Sheng Tong, Kexin An, Wuxi Chen, Mengdan Chai, Yuanxia Sun, Qinhong Wang, and Demao Li
- Subjects
GFP expression library, Neutral integration site, Y-shaped adaptor-dependent extension, CRISPR/Cas9, Fusarium venenatum, Biotechnology, TP248.13-248.65, Biology (General), QH301-705.5 - Abstract
CRISPR/Cas9-mediated homology-directed recombination is an efficient method to express target genes. For this method, ideal neutral integration sites are needed to ensure reliable, stable, and high expression of target genes. In this study, we obtained a fluorescent transformant with neutral integration and high expression of the GFP expression cassette from the constructed GFP expression library and named it strain FS. Using a Y-shaped adaptor-dependent extension, the integration site in strain FS was mapped to 4886 bp upstream of the gene FVRRES_00686, and the sequence spanning 600 bp upstream and downstream of this site was selected as the candidate region for designing sgRNAs (Sites) for CRISPR/Cas9-mediated homology-directed recombination. PCR analysis showed that the efficiency of CRISPR/Cas9-mediated integration of target genes at the designed sites reached 100%. Further analysis of expression stability and applicability revealed that a target gene integrated at the designed sites is stably inherited and expressed and has no negative effect on the growth of F. venenatum TB01. These results indicate that the designed neutral sites have the potential to accelerate the development of F. venenatum TB01 through overexpression of target genes in metabolic engineering.
- Published
- 2023
3. Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
- Author
-
Zheng, Yinan, Liang, Ruiming, Zheng, Kexin, Zheng, Jinliang, Mao, Liyuan, Li, Jianxiong, Gu, Weihao, Ai, Rui, Li, Shengbo Eben, Zhan, Xianyuan, and Liu, Jingjing
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning - Abstract
Achieving human-like driving behaviors in complex open-world environments is a critical challenge in autonomous driving. Contemporary learning-based planning approaches, such as imitation learning methods, often struggle to balance competing objectives and lack safety assurance, due to limited adaptability and inadequacy in learning the complex multi-modal behaviors commonly exhibited in human planning, not to mention their strong reliance on fallback strategies with predefined rules. We propose a novel transformer-based Diffusion Planner for closed-loop planning, which can effectively model multi-modal driving behavior and ensure trajectory quality without any rule-based refinement. Our model supports joint modeling of both prediction and planning tasks under the same architecture, enabling cooperative behaviors between vehicles. Moreover, by learning the gradient of the trajectory score function and employing a flexible classifier guidance mechanism, Diffusion Planner effectively achieves safe and adaptable planning behaviors. Evaluations on the large-scale real-world autonomous planning benchmark nuPlan and our newly collected 200-hour delivery-vehicle driving dataset demonstrate that Diffusion Planner achieves state-of-the-art closed-loop performance with robust transferability across diverse driving styles.
- Published
- 2025
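The classifier-guidance idea in the abstract above — steering a learned trajectory score with the gradient of an external objective during denoising — can be sketched in toy form. Everything here (the quadratic stand-in score, the hand-made safety cost, the update schedule) is an illustrative assumption, not the Diffusion Planner's actual implementation:

```python
import numpy as np

def score_fn(traj, t):
    """Stand-in for a learned trajectory score: pulls the trajectory toward zero."""
    return -traj

def guidance_grad(traj):
    """Gradient of a hand-made safety cost that penalizes coordinates above 1."""
    return -2.0 * np.maximum(traj - 1.0, 0.0)

def guided_denoise(traj, steps=50, step_size=0.1, guide_scale=1.0, seed=0):
    """Langevin-style denoising: learned score plus a flexible guidance term."""
    rng = np.random.default_rng(seed)
    for t in range(steps):
        noise = rng.normal(scale=np.sqrt(step_size), size=traj.shape)
        traj = traj + step_size * (score_fn(traj, t)
                                   + guide_scale * guidance_grad(traj)) + 0.1 * noise
    return traj

traj0 = np.full(8, 3.0)   # start far outside the "safe" region
out = guided_denoise(traj0)
print(out)
```

Scaling `guide_scale` trades off fidelity to the learned distribution against satisfaction of the guidance objective, which is the "flexible" part of the mechanism.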
4. Qwen2.5-1M Technical Report
- Author
-
Yang, An, Yu, Bowen, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Huang, Haoyan, Jiang, Jiandong, Tu, Jianhong, Zhang, Jianwei, Zhou, Jingren, Lin, Junyang, Dang, Kai, Yang, Kexin, Yu, Le, Li, Mei, Sun, Minmin, Zhu, Qin, Men, Rui, He, Tao, Xu, Weijia, Yin, Wenbiao, Yu, Wenyuan, Qiu, Xiafei, Ren, Xingzhang, Yang, Xinlong, Li, Yong, Xu, Zhiying, and Zhang, Zipeng
- Subjects
Computer Science - Computation and Language - Abstract
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively enhance long-context performance while reducing training costs. To promote the use of long-context models among a broader user base, we present and open-source our inference framework. This framework includes a length extrapolation method that can expand the model context lengths by at least four times, or even more, without additional training. To reduce inference costs, we implement a sparse attention method along with chunked prefill optimization for deployment scenarios and a sparsity refinement method to improve precision. Additionally, we detail our optimizations in the inference engine, including kernel optimization, pipeline parallelism, and scheduling optimization, which significantly enhance overall inference performance. By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup in scenarios with 1 million tokens of context. This framework provides an efficient and powerful solution for developing applications that require long-context processing using open-source models. The Qwen2.5-1M series currently includes the open-source models Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, as well as the API-accessed model Qwen2.5-Turbo. Evaluations show that Qwen2.5-1M models have been greatly improved in long-context tasks without compromising performance in short-context scenarios. Specifically, the Qwen2.5-14B-Instruct-1M model significantly outperforms GPT-4o-mini in long-context tasks and supports contexts eight times longer.
- Published
- 2025
5. Joint System Latency and Data Freshness Optimization for Cache-enabled Mobile Crowdsensing Networks
- Author
-
Shi, Kexin, Fu, Yaru, Guo, Yongna, Wang, Fu Lee, and Zhang, Yan
- Subjects
Computer Science - Networking and Internet Architecture, Electrical Engineering and Systems Science - Signal Processing - Abstract
Mobile crowdsensing (MCS) networks enable large-scale data collection by leveraging the ubiquity of mobile devices. However, frequent sensing and data transmission can lead to significant resource consumption. To mitigate this issue, edge caching has been proposed as a solution for storing recently collected data. Nonetheless, this approach may compromise data freshness. In this paper, we investigate the trade-off between re-using cached task results and re-sensing tasks in cache-enabled MCS networks, aiming to minimize system latency while maintaining information freshness. To this end, we formulate a weighted delay and age of information (AoI) minimization problem, jointly optimizing sensing decisions, user selection, channel selection, task allocation, and caching strategies. The problem is a mixed-integer non-convex program, which is intractable. Therefore, we decompose the long-term problem into sequential one-shot sub-problems and design a framework that optimizes the system latency, task sensing decision, and caching strategy subproblems. When a task is re-sensed, the one-shot problem simplifies to the system latency minimization problem, which can be solved optimally. The task sensing decision is then made by comparing the system latency and AoI. Additionally, a Bayesian update strategy is developed to manage the cached task results. Building upon this framework, we propose a lightweight and time-efficient algorithm that makes real-time decisions for the long-term optimization problem. Extensive simulation results validate the effectiveness of our approach.
- Published
- 2025
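The re-sense-vs-reuse trade-off described above can be illustrated with a minimal weighted delay-plus-AoI cost comparison. The cost model, weights, and numbers below are hypothetical stand-ins, not the paper's formulation:

```python
def task_decision(sense_latency, cache_latency, cached_age, w_delay=1.0, w_aoi=0.5):
    """Choose between re-sensing (fresh but slow) and reusing a cached result
    (fast but stale) by comparing weighted delay + AoI costs."""
    cost_sense = w_delay * sense_latency + w_aoi * 0.0  # fresh data: age resets to 0
    cost_cache = w_delay * cache_latency + w_aoi * cached_age
    return ("cache", cost_cache) if cost_cache <= cost_sense else ("sense", cost_sense)

# Slightly stale cached result: reuse wins.
print(task_decision(sense_latency=2.0, cache_latency=0.2, cached_age=1.0))
# Very stale cached result: re-sensing wins despite higher latency.
print(task_decision(sense_latency=2.0, cache_latency=0.2, cached_age=10.0))
```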
6. Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet Extraction
- Author
-
Sheng, Dongming, Han, Kexin, Li, Hao, Zhang, Yan, Huang, Yucheng, Lang, Jun, and Liu, Wenqiang
- Subjects
Computer Science - Computation and Language - Abstract
Aspect Sentiment Triplet Extraction (ASTE) is a thriving research area with impressive outcomes being achieved on high-resource languages. However, the application of cross-lingual transfer to the ASTE task has been relatively unexplored, and current code-switching methods still suffer from term boundary detection issues and out-of-dictionary problems. In this study, we introduce a novel Test-Time Code-SWitching (TT-CSW) framework, which bridges the gap between the bilingual training phase and the monolingual test-time prediction. During training, a generative model is developed based on bilingual code-switched training data and can produce bilingual ASTE triplets for bilingual inputs. In the testing stage, we employ an alignment-based code-switching technique for test-time augmentation. Extensive experiments on cross-lingual ASTE datasets validate the effectiveness of our proposed method. We achieve an average improvement of 3.7% in terms of weighted-averaged F1 in four datasets with different languages. Additionally, we set a benchmark using ChatGPT and GPT-4, and demonstrate that even smaller generative models fine-tuned with our proposed TT-CSW framework surpass ChatGPT and GPT-4 by 14.2% and 5.0% respectively.
- Published
- 2025
7. 'It's Mentally Painful to Stop': Design Opportunities in In-Situ Self-Management Technology for People with Obsessive-Compulsive Disorder
- Author
-
Wang, Ru, Zhang, Kexin, Wang, Yuqing, Brown, Keri, and Zhao, Yuhang
- Subjects
Computer Science - Human-Computer Interaction, H.5.0 - Abstract
Obsessive-compulsive disorder (OCD) is a mental health condition that significantly affects people's quality of life. Although OCD can be effectively treated with evidence-based therapy (e.g., exposure and response prevention), managing OCD symptoms independently, an indispensable part of successful treatment, remains challenging due to fear confrontation and a lack of appropriate support. We aim to comprehensively understand the challenges and needs in OCD self-management from the perspectives of both people with OCD and OCD therapists. Through interviews with 10 participants with diverse OCD conditions and seven therapists, we characterized different OCD symptoms, typical triggering factors, strategies, technology use, and barriers both inside and outside of therapy. Our findings highlight gaps between OCD self-management needs and currently available support. Building on these insights, we suggest in-situ self-management technologies for OCD, including personalized symptom tracking and in-situ interventions, and discuss how OCD-specific privacy and social needs can be fulfilled with technology and beyond.
- Published
- 2025
8. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- Author
-
DeepSeek-AI, Guo, Daya, Yang, Dejian, Zhang, Haowei, Song, Junxiao, Zhang, Ruoyu, Xu, Runxin, Zhu, Qihao, Ma, Shirong, Wang, Peiyi, Bi, Xiao, Zhang, Xiaokang, Yu, Xingkai, Wu, Yu, Wu, Z. F., Gou, Zhibin, Shao, Zhihong, Li, Zhuoshu, Gao, Ziyi, Liu, Aixin, Xue, Bing, Wang, Bingxuan, Wu, Bochao, Feng, Bei, Lu, Chengda, Zhao, Chenggang, Deng, Chengqi, Zhang, Chenyu, Ruan, Chong, Dai, Damai, Chen, Deli, Ji, Dongjie, Li, Erhang, Lin, Fangyun, Dai, Fucong, Luo, Fuli, Hao, Guangbo, Chen, Guanting, Li, Guowei, Zhang, H., Bao, Han, Xu, Hanwei, Wang, Haocheng, Ding, Honghui, Xin, Huajian, Gao, Huazuo, Qu, Hui, Li, Hui, Guo, Jianzhong, Li, Jiashi, Wang, Jiawei, Chen, Jingchang, Yuan, Jingyang, Qiu, Junjie, Li, Junlong, Cai, J. L., Ni, Jiaqi, Liang, Jian, Chen, Jin, Dong, Kai, Hu, Kai, Gao, Kaige, Guan, Kang, Huang, Kexin, Yu, Kuai, Wang, Lean, Zhang, Lecong, Zhao, Liang, Wang, Litong, Zhang, Liyue, Xu, Lei, Xia, Leyi, Zhang, Mingchuan, Zhang, Minghua, Tang, Minghui, Li, Meng, Wang, Miaojun, Li, Mingming, Tian, Ning, Huang, Panpan, Zhang, Peng, Wang, Qiancheng, Chen, Qinyu, Du, Qiushi, Ge, Ruiqi, Zhang, Ruisong, Pan, Ruizhe, Wang, Runji, Chen, R. J., Jin, R. L., Chen, Ruyi, Lu, Shanghao, Zhou, Shangyan, Chen, Shanhuang, Ye, Shengfeng, Wang, Shiyu, Yu, Shuiping, Zhou, Shunfeng, Pan, Shuting, Li, S. S., Zhou, Shuang, Wu, Shaoqing, Yun, Tao, Pei, Tian, Sun, Tianyu, Wang, T., Zeng, Wangding, Zhao, Wanjia, Liu, Wen, Liang, Wenfeng, Gao, Wenjun, Yu, Wenqin, Zhang, Wentao, Xiao, W. L., An, Wei, Liu, Xiaodong, Wang, Xiaohan, Chen, Xiaokang, Nie, Xiaotao, Cheng, Xin, Liu, Xin, Xie, Xin, Liu, Xingchao, Yang, Xinyu, Li, Xinyuan, Su, Xuecheng, Lin, Xuheng, Li, X. Q., Jin, Xiangyue, Shen, Xiaojin, Chen, Xiaosha, Sun, Xiaowen, Wang, Xiaoxiang, Song, Xinnan, Zhou, Xinyi, Wang, Xianzu, Shan, Xinxia, Li, Y. K., Wang, Y. Q., Wei, Y. X.,
Zhang, Yang, Xu, Yanhong, Li, Yao, Zhao, Yao, Sun, Yaofeng, Wang, Yaohui, Yu, Yi, Zhang, Yichao, Shi, Yifan, Xiong, Yiliang, He, Ying, Piao, Yishi, Wang, Yisong, Tan, Yixuan, Ma, Yiyang, Liu, Yiyuan, Guo, Yongqiang, Ou, Yuan, Wang, Yuduan, Gong, Yue, Zou, Yuheng, He, Yujia, Xiong, Yunfan, Luo, Yuxiang, You, Yuxiang, Liu, Yuxuan, Zhou, Yuyang, Zhu, Y. X., Huang, Yanping, Li, Yaohui, Zheng, Yi, Zhu, Yuchen, Ma, Yunxian, Tang, Ying, Zha, Yukun, Yan, Yuting, Ren, Z. Z., Ren, Zehui, Sha, Zhangli, Fu, Zhe, Xu, Zhean, Xie, Zhenda, Zhang, Zhengyan, Hao, Zhewen, Ma, Zhicheng, Yan, Zhigang, Wu, Zhiyu, Gu, Zihui, Zhu, Zijia, Liu, Zijun, Li, Zilin, Xie, Ziwei, Song, Ziyang, Pan, Zizheng, Huang, Zhen, Xu, Zhipeng, Zhang, Zhongyu, and Zhang, Zhen
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning - Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
- Published
- 2025
9. The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility?
- Author
-
Zhang, Yiyi, Chen, Xingyu, Chen, Kexin, Du, Yuyang, Dang, Xilin, and Heng, Pheng-Ann
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
Recent years have witnessed extensive efforts to enhance Large Language Models (LLMs) across various domains, alongside growing attention to their ethical implications. However, a critical challenge remains largely overlooked: LLMs must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance by addressing this ethical-utility trade-off, using chemical domain applications as a proof-of-concept. Our alignment pipeline starts with a GPT-assisted three-phase data generation scheme, in which we create LibraChemQA, a chemical question-answering dataset comprising 31.6k triplet instances. By incorporating an innovative balanced seed in the data generation process, our framework systematically considers both legitimate and illegitimate requests. The framework also introduces a rephrasing mechanism for efficient data augmentation that enhances the model's chemical comprehension. We further develop a novel hybrid evaluation scheme with LLM judges for precise assessment of both safety and utility. Experimental results demonstrate our model's substantial improvements in overall performance where both safety and utility are considered - our resulting model, LibraChem, outperforms leading LLMs including Claude-3, GPT-4o, and LLaMA-3 by margins of 13.44%, 7.16%, and 7.10% respectively on our released benchmark.
- Published
- 2025
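As a reference point for the DPO-based alignment mentioned above, here is the standard DPO loss on a single preference pair. This is the generic DPO objective, not LibraChem's full pipeline, and the log-probabilities below are made up:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs:
    -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy prefers the chosen answer more strongly than the reference
# model does, the margin is positive and the loss drops below log(2).
loss = dpo_loss(logp_w=-5.0, logp_l=-9.0, ref_logp_w=-6.0, ref_logp_l=-8.0, beta=0.1)
print(loss)
```

Training on (legitimate, illegitimate) request pairs with such a loss is one way to push the safety-utility margin the abstract describes.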
10. Impact of the returning radiation on X-ray reflection spectroscopy measurements: the case of Galactic black holes
- Author
-
Huang, Kexin, Liu, Honghui, Bambi, Cosimo, Garcia, Javier A., and Zhang, Zuobin
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
The effect of the returning radiation has long been ignored in the analysis of the reflection spectra of Galactic black holes and active galactic nuclei and only recently has been implemented in the relxill package. Here we present a study on the impact of the returning radiation on the estimate of the parameters of Galactic black holes. We consider high-quality NuSTAR spectra of three Galactic black holes (GX 339-4, Swift J1658.2-4242, and MAXI J1535-571) and we fit the data with the lamppost model in the latest version of relxill, first without including the returning radiation and then including the returning radiation. We do not find any significant difference in the estimate of the parameters of these systems between the two cases, even though all three sources are fast-rotating black holes and for two sources the estimate of the height of the corona is very low, two ingredients that should maximize the effect of the returning radiation. We discuss our results and the approximations in relxill.
- Comment: 10 pages, 4 figures
- Published
- 2025
11. Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer
- Author
-
Gao, Chongming, Huang, Kexin, Fei, Ziang, Chen, Jiaju, Chen, Jiawei, Sun, Jianshan, Liu, Shuchang, Cai, Qingpeng, and Jiang, Peng
- Subjects
Computer Science - Information Retrieval - Abstract
Securing long-term success is the ultimate aim of recommender systems, demanding strategies capable of foreseeing and shaping the impact of decisions on future user satisfaction. Current recommendation strategies grapple with two significant hurdles. Firstly, the future impacts of recommendation decisions remain obscured, rendering it impractical to evaluate them through direct optimization of immediate metrics. Secondly, conflicts often emerge between multiple objectives, like enhancing accuracy versus exploring diverse recommendations. Existing strategies, trapped in a "training, evaluation, and retraining" loop, grow more labor-intensive as objectives evolve. To address these challenges, we introduce a future-conditioned strategy for multi-objective controllable recommendations, allowing for the direct specification of future objectives and empowering the model to generate item sequences that align with these goals autoregressively. We present the Multi-Objective Controllable Decision Transformer (MocDT), an offline Reinforcement Learning (RL) model capable of autonomously learning the mapping from multiple objectives to item sequences, leveraging extensive offline data. Consequently, it can produce recommendations tailored to any specified objectives during the inference stage. Our empirical findings emphasize the controllable recommendation strategy's ability to produce item sequences according to different objectives while maintaining performance that is competitive with current recommendation strategies across various objectives.
- Published
- 2025
12. ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting
- Author
-
Wang, Steven H., Zubkov, Maksim, Fan, Kexin, Harrell, Sarah, Sun, Yuyang, Chen, Wei, Plesner, Andreas, and Wattenhofer, Roger
- Subjects
Computer Science - Computation and Language - Abstract
Information retrieval, specifically contract clause retrieval, is foundational to contract drafting because lawyers rarely draft contracts from scratch; instead, they locate and revise the most relevant precedent. We introduce the Atticus Clause Retrieval Dataset (ACORD), the first retrieval benchmark for contract drafting fully annotated by experts. ACORD focuses on complex contract clauses such as Limitation of Liability, Indemnification, Change of Control, and Most Favored Nation. It includes 114 queries and over 126,000 query-clause pairs, each ranked on a scale from 1 to 5 stars. The task is to find the most relevant precedent clauses for a query. The bi-encoder retriever paired with pointwise LLM re-rankers shows promising results. However, substantial improvements are still needed to effectively manage the complex legal work typically undertaken by lawyers. As the first retrieval benchmark for contract drafting annotated by experts, ACORD can serve as a valuable IR benchmark for the NLP community.
- Published
- 2025
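The retrieve-then-rerank setup the abstract describes can be sketched with a toy bi-encoder stage. The vectors below are made up; a real system would embed queries and clauses with a trained encoder and pass the top candidates to an LLM re-ranker:

```python
import numpy as np

def retrieve(query_vec, clause_vecs, k=2):
    """First-stage bi-encoder retrieval: rank clauses by cosine similarity
    to the query embedding and return the top-k candidate indices."""
    sims = clause_vecs @ query_vec / (
        np.linalg.norm(clause_vecs, axis=1) * np.linalg.norm(query_vec))
    return np.argsort(-sims)[:k]

# Toy 2-d "embeddings" for three precedent clauses.
clauses = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
top = retrieve(np.array([1.0, 0.05]), clauses)
print(top.tolist())
```

The candidates returned here would then be scored individually (pointwise) by a re-ranker before being shown to the drafting lawyer.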
13. Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies
- Author
-
Baugh, Kexin Gu, Dickens, Luke, and Russo, Alessandra
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Logic in Computer Science - Abstract
Although deep reinforcement learning has been shown to be effective, the model's black-box nature presents barriers to direct policy interpretation. To address this problem, we propose a neuro-symbolic approach called neural DNF-MT for end-to-end policy learning. The differentiable nature of the neural DNF-MT model enables the use of deep actor-critic algorithms for training. At the same time, its architecture is designed so that trained models can be directly translated into interpretable policies expressed as standard (bivalent or probabilistic) logic programs. Moreover, additional layers can be included to extract abstract features from complex observations, acting as a form of predicate invention. The logic representations are highly interpretable, and we show how the bivalent representations of deterministic policies can be edited and incorporated back into a neural model, facilitating manual intervention and adaptation of learned policies. We evaluate our approach on a range of tasks requiring learning deterministic or stochastic behaviours from various forms of observations. Our empirical results show that our neural DNF-MT model performs at the level of competing black-box methods whilst providing interpretable policies.
- Comment: AAMAS 2025
- Published
- 2025
14. Generalization-Enhanced Few-Shot Object Detection in Remote Sensing
- Author
-
Lin, Hui, Li, Nan, Yao, Pengjuan, Dong, Kexin, Guo, Yuhan, Hong, Danfeng, Zhang, Ying, and Wen, Congcong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Remote sensing object detection is particularly challenging due to the high resolution, multi-scale features, and diverse ground object characteristics inherent in satellite and UAV imagery. These challenges necessitate more advanced approaches for effective object detection in such environments. While deep learning methods have achieved remarkable success in remote sensing object detection, they typically rely on large amounts of labeled data. Acquiring sufficient labeled data, particularly for novel or rare objects, is both challenging and time-consuming in remote sensing scenarios, limiting the generalization capabilities of existing models. To address these challenges, few-shot learning (FSL) has emerged as a promising approach, aiming to enable models to learn new classes from limited labeled examples. Building on this concept, few-shot object detection (FSOD) specifically targets object detection challenges in data-limited conditions. However, the generalization capability of FSOD models, particularly in remote sensing, is often constrained by the complex and diverse characteristics of the objects present in such environments. In this paper, we propose the Generalization-Enhanced Few-Shot Object Detection (GE-FSOD) model to improve the generalization capability in remote sensing FSOD tasks. Our model introduces three key innovations: the Cross-Level Fusion Pyramid Attention Network (CFPAN) for enhanced multi-scale feature representation, the Multi-Stage Refinement Region Proposal Network (MRRPN) for more accurate region proposals, and the Generalized Classification Loss (GCL) for improved classification performance in few-shot scenarios. Extensive experiments on the DIOR and NWPU VHR-10 datasets show that our model achieves state-of-the-art performance for few-shot object detection in remote sensing.
- Published
- 2025
15. FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models
- Author
-
Lin, Hui, Zhang, Chao, Hong, Danfeng, Dong, Kexin, and Wen, Congcong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence - Abstract
Remote sensing data is often distributed across multiple institutions, and due to privacy concerns and data-sharing restrictions, leveraging large-scale datasets in a centralized training framework is challenging. Federated learning offers a promising solution by enabling collaborative model training across distributed data sources without requiring data centralization. However, current Vision-Language Models (VLMs), which typically contain billions of parameters, pose significant communication challenges for traditional federated learning approaches based on model parameter updates, as they would incur substantial communication costs. In this paper, we propose FedRSCLIP, the first federated learning framework designed for remote sensing image classification based on a VLM, specifically CLIP. FedRSCLIP addresses the challenges of data heterogeneity and large-scale model transmission in federated environments by introducing Prompt Learning, which optimizes only a small set of tunable parameters. The framework introduces a dual-prompt mechanism, comprising Shared Prompts for global knowledge sharing and Private Prompts for client-specific adaptation. To maintain semantic coherence between shared and private prompts, we propose the Dual Prompt Alignment Constraint to balance global consistency and local adaptability across diverse client distributions. Additionally, to enhance cross-modal representation learning, we introduce the Cross-Modal Feature Alignment Constraint to align multimodal features between text and image prompts. To validate the effectiveness of our proposed model, we construct a Fed-RSIC dataset based on three existing remote sensing image classification datasets, specifically designed to simulate various federated learning configurations. Experimental results demonstrate the effectiveness and superiority of FedRSCLIP in remote sensing image classification.
- Published
- 2025
16. Multi-Satellite Beam Hopping and Power Allocation Using Deep Reinforcement Learning
- Author
-
Xie, Xia, Fan, Kexin, Deng, Wenfeng, Pappas, Nikolaos, and Zhang, Qinyu
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
In non-geostationary orbit (NGSO) satellite communication systems, effectively utilizing beam hopping (BH) technology is crucial for addressing uneven traffic demands. However, optimizing beam scheduling and resource allocation in multi-NGSO BH scenarios remains a significant challenge. This paper proposes a multi-NGSO BH algorithm based on deep reinforcement learning (DRL) to optimize beam illumination patterns and power allocation. By leveraging three degrees of freedom (i.e., time, space, and power), the algorithm aims to optimize the long-term throughput and the long-term cumulative average delay (LTCAD). The solution is based on proximal policy optimization (PPO) with a hybrid action space combining discrete and continuous actions. Using two policy networks with a shared base layer, the proposed algorithm jointly optimizes beam scheduling and power allocation. One network selects beam illumination patterns in the discrete action space, while the other manages power allocation in the continuous space. Simulation results show that the proposed algorithm significantly reduces LTCAD while maintaining high throughput in time-varying traffic scenarios. Compared to the four benchmark methods, it improves network throughput by up to $8.9\%$ and reduces LTCAD by up to $69.2\%$.
- Published
- 2025
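The hybrid discrete-continuous action space described above can be illustrated with a minimal sampling routine: one head draws a categorical beam-illumination pattern, the other a clipped Gaussian power vector. The names, shapes, and distributions are assumptions for illustration, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(pattern_logits, power_mean, power_log_std):
    """Sample one hybrid action: a discrete illumination pattern index plus
    a continuous, bounded power-allocation vector."""
    probs = np.exp(pattern_logits - pattern_logits.max())
    probs /= probs.sum()                       # softmax over candidate patterns
    pattern = rng.choice(len(probs), p=probs)  # discrete head
    power = power_mean + np.exp(power_log_std) * rng.normal(size=power_mean.shape)
    power = np.clip(power, 0.0, 1.0)           # continuous head, normalized power
    return pattern, power

pattern, power = sample_hybrid_action(np.array([2.0, 0.1, -1.0]),
                                      np.full(4, 0.5), np.full(4, -2.0))
print(pattern, power)
```

In a PPO setup, both heads would share a base network, and the joint log-probability of (pattern, power) would enter the clipped surrogate objective.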
17. Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
- Author
-
Li, Shuangtao, Dong, Shuaihao, Luan, Kexin, Di, Xinhan, and Ding, Chaofan
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning - Abstract
Large language models (LLMs) have demonstrated remarkable capacity across a variety of tasks. However, reasoning remains a challenge for LLMs. To improve LLMs' reasoning ability, process supervision has proven to be better than outcome supervision. In this work, we study using Monte Carlo Tree Search (MCTS) to generate process supervision data with LLMs themselves for training them. We sample reasoning steps with an LLM and assign each step a score that captures its "relative correctness," and the LLM is then trained by minimizing the weighted negative log-likelihood of generating the reasoning steps. This generate-then-train process is repeated iteratively until convergence. Our experimental results demonstrate that the proposed methods considerably improve the performance of LLMs on two mathematical reasoning datasets. Furthermore, models trained on one dataset also exhibit improved performance on the other, showing the transferability of the enhanced reasoning ability.
- Comment: 5 pages, 1 figure, 2 tables; accepted by the AAAI 2025 NeurMAD workshop
- Published
- 2025
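The training objective sketched in the abstract — weighting each reasoning step's log-likelihood by its estimated relative correctness — can be written out in a few lines. The scores below are hand-made stand-ins for MCTS-derived estimates:

```python
def weighted_nll(step_logps, step_scores):
    """Weighted negative log-likelihood over reasoning steps: steps judged
    more 'relatively correct' receive larger weight, so they dominate the
    training signal."""
    return -sum(w * lp for w, lp in zip(step_scores, step_logps))

# Two chains with identical per-step log-probs; in the second, the middle
# step is scored as likely wrong, so it is heavily down-weighted.
loss_good = weighted_nll([-1.0, -0.5, -0.8], [1.0, 0.9, 1.0])
loss_bad = weighted_nll([-1.0, -0.5, -0.8], [1.0, 0.1, 1.0])
print(loss_good, loss_bad)
```

Down-weighting suspect steps means the model is pushed hardest toward reproducing the steps MCTS judged reliable.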
18. Public Access Defibrillator Deployment for Cardiac Arrests: A Learn-Then-Optimize Approach with SHAP-based Interpretable Analytics
- Author
-
Yang, Chih-Yuan, Leong, Keng-Hou, Cao, Kexin, Yang, Mingchuan, and Chan, Wai Kin
- Subjects
Mathematics - Optimization and Control - Abstract
Out-of-hospital cardiac arrest (OHCA) survival rates remain extremely low due to challenges in the timely accessibility of medical devices. Therefore, effective deployment of automated external defibrillators (AED) can significantly increase survival rates. Precise and interpretable predictions of OHCA occurrences provide a solid foundation for efficient and robust AED deployment optimization. This study develops a novel learn-then-optimize approach, integrating three key components: a machine learning prediction model, SHAP-based interpretable analytics, and a SHAP-guided integer programming (SIP) model. The machine learning model is trained utilizing only geographic data as inputs to overcome data availability obstacles, and its strong predictive performance validates the feasibility of interpretation. Furthermore, the SHAP model elaborates on the contribution of each geographic feature to the OHCA occurrences. Finally, an integer programming model is formulated for optimizing AED deployment, incorporating SHAP-weighted OHCA densities. Various numerical experiments are conducted across different settings. Based on comparative and sensitivity analyses, the optimization effect of our approach is verified and valuable insights are derived to provide substantial support for theoretical extension and practical implementation.
- Published
- 2025
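The SHAP-weighted deployment step can be illustrated with a greedy coverage heuristic. The paper formulates an integer program; greedy is used here only as a dependency-free stand-in, and the sites, coverage sets, and weights are invented:

```python
def greedy_aed_deployment(candidate_cover, demand_weight, budget):
    """Greedy sketch of SHAP-weighted coverage: repeatedly place an AED at
    the candidate site covering the most still-uncovered weighted demand."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(candidate_cover,
                   key=lambda s: sum(demand_weight[z]
                                     for z in candidate_cover[s] - covered))
        chosen.append(best)
        covered |= candidate_cover[best]
    return chosen, sum(demand_weight[z] for z in covered)

sites = {"A": {1, 2}, "B": {2, 3}, "C": {4}}            # zones each site covers
weights = {1: 0.9, 2: 0.5, 3: 0.4, 4: 2.0}              # SHAP-weighted OHCA densities
chosen, total = greedy_aed_deployment(sites, weights, budget=2)
print(chosen, total)
```

An exact SIP solver would replace the greedy loop with a maximum-coverage integer program over the same SHAP-weighted demand.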
19. Hypersurface Arrangements with Generic Hypersurfaces Added
- Author
-
Reinke, Bernhard and Wang, Kexin
- Subjects
Mathematics - Algebraic Geometry, 05E14, 14C17, 52C35, 90C05, 90C51 - Abstract
The Euler characteristic of a very affine variety encodes the number of critical points of the likelihood equation on this variety. In this paper, we study the Euler characteristic of the complement of a hypersurface arrangement with generic hypersurfaces added. For hyperplane arrangements, it depends on the characteristic polynomial coefficients and generic hypersurface degrees. As a corollary, we show that adding a degree-two hypersurface to a real hyperplane arrangement enables efficient sampling of a single interior point from each region in the complement. We compare the method to existing alternatives and demonstrate its efficiency. For hypersurface arrangements, the Euler characteristic is expressed in terms of Milnor numbers and generic hypersurface degrees. This formulation further yields a novel upper bound on the number of regions in the complement of a hypersurface arrangement.
- Comment: 19 pages, 3 figures, 2 tables
- Published
- 2024
20. DeepSeek-V3 Technical Report
- Author
-
DeepSeek-AI, Liu, Aixin, Feng, Bei, Xue, Bing, Wang, Bingxuan, Wu, Bochao, Lu, Chengda, Zhao, Chenggang, Deng, Chengqi, Zhang, Chenyu, Ruan, Chong, Dai, Damai, Guo, Daya, Yang, Dejian, Chen, Deli, Ji, Dongjie, Li, Erhang, Lin, Fangyun, Dai, Fucong, Luo, Fuli, Hao, Guangbo, Chen, Guanting, Li, Guowei, Zhang, H., Bao, Han, Xu, Hanwei, Wang, Haocheng, Zhang, Haowei, Ding, Honghui, Xin, Huajian, Gao, Huazuo, Li, Hui, Qu, Hui, Cai, J. L., Liang, Jian, Guo, Jianzhong, Ni, Jiaqi, Li, Jiashi, Wang, Jiawei, Chen, Jin, Chen, Jingchang, Yuan, Jingyang, Qiu, Junjie, Li, Junlong, Song, Junxiao, Dong, Kai, Hu, Kai, Gao, Kaige, Guan, Kang, Huang, Kexin, Yu, Kuai, Wang, Lean, Zhang, Lecong, Xu, Lei, Xia, Leyi, Zhao, Liang, Wang, Litong, Zhang, Liyue, Li, Meng, Wang, Miaojun, Zhang, Mingchuan, Zhang, Minghua, Tang, Minghui, Li, Mingming, Tian, Ning, Huang, Panpan, Wang, Peiyi, Zhang, Peng, Wang, Qiancheng, Zhu, Qihao, Chen, Qinyu, Du, Qiushi, Chen, R. J., Jin, R. L., Ge, Ruiqi, Zhang, Ruisong, Pan, Ruizhe, Wang, Runji, Xu, Runxin, Zhang, Ruoyu, Chen, Ruyi, Li, S. S., Lu, Shanghao, Zhou, Shangyan, Chen, Shanhuang, Wu, Shaoqing, Ye, Shengfeng, Ma, Shirong, Wang, Shiyu, Zhou, Shuang, Yu, Shuiping, Zhou, Shunfeng, Pan, Shuting, Wang, T., Yun, Tao, Pei, Tian, Sun, Tianyu, Xiao, W. L., Zeng, Wangding, Zhao, Wanjia, An, Wei, Liu, Wen, Liang, Wenfeng, Gao, Wenjun, Yu, Wenqin, Zhang, Wentao, Li, X. Q., Jin, Xiangyue, Wang, Xianzu, Bi, Xiao, Liu, Xiaodong, Wang, Xiaohan, Shen, Xiaojin, Chen, Xiaokang, Zhang, Xiaokang, Chen, Xiaosha, Nie, Xiaotao, Sun, Xiaowen, Wang, Xiaoxiang, Cheng, Xin, Liu, Xin, Xie, Xin, Liu, Xingchao, Yu, Xingkai, Song, Xinnan, Shan, Xinxia, Zhou, Xinyi, Yang, Xinyu, Li, Xinyuan, Su, Xuecheng, Lin, Xuheng, Li, Y. K., Wang, Y. Q., Wei, Y. X., Zhu, Y. 
X., Zhang, Yang, Xu, Yanhong, Huang, Yanping, Li, Yao, Zhao, Yao, Sun, Yaofeng, Li, Yaohui, Wang, Yaohui, Yu, Yi, Zheng, Yi, Zhang, Yichao, Shi, Yifan, Xiong, Yiliang, He, Ying, Tang, Ying, Piao, Yishi, Wang, Yisong, Tan, Yixuan, Ma, Yiyang, Liu, Yiyuan, Guo, Yongqiang, Wu, Yu, Ou, Yuan, Zhu, Yuchen, Wang, Yuduan, Gong, Yue, Zou, Yuheng, He, Yujia, Zha, Yukun, Xiong, Yunfan, Ma, Yunxian, Yan, Yuting, Luo, Yuxiang, You, Yuxiang, Liu, Yuxuan, Zhou, Yuyang, Wu, Z. F., Ren, Z. Z., Ren, Zehui, Sha, Zhangli, Fu, Zhe, Xu, Zhean, Huang, Zhen, Zhang, Zhen, Xie, Zhenda, Zhang, Zhengyan, Hao, Zhewen, Gou, Zhibin, Ma, Zhicheng, Yan, Zhigang, Shao, Zhihong, Xu, Zhipeng, Wu, Zhiyu, Zhang, Zhongyu, Li, Zhuoshu, Gu, Zihui, Zhu, Zijia, Liu, Zijun, Li, Zilin, Xie, Ziwei, Song, Ziyang, Gao, Ziyi, and Pan, Zizheng
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable: throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
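The auxiliary-loss-free balancing idea above (a per-expert bias that steers routing without a balance loss) can be sketched in a few lines. This is an illustrative toy, not the report's exact procedure: the update rule, step size, and shapes are assumptions.

```python
import numpy as np

def topk_route(scores, bias, k):
    # the bias affects which experts are selected, not the mixing weights
    return np.argsort(scores + bias, axis=1)[:, -k:]

def update_bias(bias, idx, n_experts, gamma=0.1):
    # push load toward uniform: lower the bias of overloaded experts,
    # raise it for underloaded ones (aux-loss-free balancing idea)
    load = np.bincount(idx.ravel(), minlength=n_experts)
    target = idx.size / n_experts
    return bias - gamma * np.sign(load - target)

rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 4))   # 8 tokens, 4 experts
bias = np.zeros(4)
for _ in range(50):
    idx = topk_route(scores, bias, k=1)
    bias = update_bias(bias, idx, 4)
```

Because the bias enters only the top-k selection, the token-to-expert mixing weights stay untouched, which is the point of dropping the auxiliary loss.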
- Published
- 2024
21. Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning
- Author
-
Jiang, Huchen, Ma, Yangyang, Ding, Chaofan, Luan, Kexin, and Di, Xinhan
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Building on state-of-the-art approaches that enhance the reasoning capabilities of Large Language Models (LLMs) through AlphaZero-inspired iterative preference learning, we propose to further strengthen step-wise reasoning through a degree of intrinsic self-correction. Our work leverages step-wise preference learning to enhance self-verification via reinforcement learning, using a two-stage training procedure. In the first stage, the self-correction reasoning ability of an LLM is enhanced through its own predictions, relying entirely on self-generated data within intrinsic self-correction. In the second stage, baseline step-wise preference learning is applied using the enhanced self-correction policy obtained in the first stage. In evaluations on arithmetic reasoning tasks, our approach outperforms OpenMath2-Llama3.1-8B and dart-math-mistral-7b-uniform on MATH, improving accuracy to 71.34% (+4.18%) and 48.06% (+4.94%), and outperforms LLama-3.1-8B-Instruct and Mistral-7B-Instruct-v0.1 on GSM8K, improving accuracy to 86.76% (+2.00%) and 38.06% (+2.28%)., Comment: 6 pages, 3 figures, accepted by AAAI 2025 Workshop NeurMAD
- Published
- 2024
22. XR for All: Understanding Developer Perspectives on Accessibility Integration in Extended Reality
- Author
-
Killough, Daniel, Ji, Tiger F., Zhang, Kexin, Hu, Yaxin, Huang, Yu, Du, Ruofei, and Zhao, Yuhang
- Subjects
Computer Science - Human-Computer Interaction - Abstract
As immersive technologies enable unique, multimodal interaction methods, developers must also use tailored methods to support user accessibility, distinct from traditional software practices. We therefore interviewed 25 industry extended reality (XR) developers from freelance, startup, midsize, and big tech settings about their motivations, techniques, barriers, and attitudes toward incorporating accessibility features in their XR apps. Our study revealed a variety of challenges, including conflicting priorities between application and platform developers regarding accessibility infrastructure; startups' rapid development culture prohibiting accessible development; and a lack of accessible interaction design considerations at the ideation, design, and early prototyping stages. As a comprehensive set of XR accessibility guidelines has yet to be established, we also compiled and evaluated a set of accessibility guidelines for 3D virtual worlds and addressed their limitations when applied to XR. Finally, we inform the creation of effective support methods for industry developers., Comment: Preprint
- Published
- 2024
23. Robust PCA Based on Adaptive Weighted Least Squares and Low-Rank Matrix Factorization
- Author
-
Li, Kexin, Wen, You-wei, Xiao, Xu, and Zhao, Mingchao
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Robust Principal Component Analysis (RPCA) is a fundamental technique for decomposing data into low-rank and sparse components, and plays a critical role in applications such as image processing and anomaly detection. Traditional RPCA methods commonly use $\ell_1$ norm regularization to enforce sparsity, but this approach can introduce bias and yield suboptimal estimates, particularly in the presence of significant noise or outliers. Non-convex regularization methods have been proposed to mitigate these issues, but they tend to be complex to optimize and sensitive to initial conditions, leading to potential instability in solutions. To overcome these challenges, in this paper we propose a novel RPCA model that integrates adaptive weighted least squares (AWLS) and low-rank matrix factorization (LRMF). The model employs a self-attention-inspired mechanism in its weight update process, allowing the weight matrix to dynamically adjust and emphasize significant components during each iteration. By employing a weighted F-norm for the sparse component, our method effectively reduces bias while simplifying the computational process compared to traditional $\ell_1$-norm-based methods. We use an alternating minimization algorithm in which each subproblem has an explicit solution, thereby improving computational efficiency. Despite its simplicity, numerical experiments demonstrate that our method outperforms existing non-convex regularization approaches, offering superior performance and stability, as well as enhanced accuracy and robustness in practical applications.
- Published
- 2024
24. OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
- Author
-
Xing, Shuo, Qian, Chengyuan, Wang, Yuping, Hua, Hongyuan, Tian, Kexin, Zhou, Yang, and Tu, Zhengzhong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Computer Science - Robotics - Abstract
Since the advent of Multimodal Large Language Models (MLLMs), they have made a significant impact across a wide range of real-world applications, particularly in Autonomous Driving (AD). Their ability to process complex visual data and reason about intricate driving scenarios has paved the way for a new paradigm in end-to-end AD systems. However, the progress of developing end-to-end models for AD has been slow, as existing fine-tuning methods demand substantial resources, including extensive computational power, large-scale datasets, and significant funding. Drawing inspiration from recent advancements in inference computing, we propose OpenEMMA, an open-source end-to-end framework based on MLLMs. By incorporating the Chain-of-Thought reasoning process, OpenEMMA achieves significant improvements compared to the baseline when leveraging a diverse range of MLLMs. Furthermore, OpenEMMA demonstrates effectiveness, generalizability, and robustness across a variety of challenging driving scenarios, offering a more efficient and effective approach to autonomous driving. We release all code at https://github.com/taco-group/OpenEMMA.
- Published
- 2024
25. AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving
- Author
-
Xing, Shuo, Hua, Hongyuan, Gao, Xiangbo, Zhu, Shenzhe, Li, Renjie, Tian, Kexin, Li, Xiaopeng, Huang, Heng, Yang, Tianbao, Wang, Zhangyang, Zhou, Yang, Yao, Huaxiu, and Tu, Zhengzhong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Computer Science - Robotics - Abstract
Recent advancements in large vision language models (VLMs) tailored for autonomous driving (AD) have shown strong scene understanding and reasoning capabilities, making them undeniable candidates for end-to-end driving systems. However, limited work exists on studying the trustworthiness of DriveVLMs -- a critical factor that directly impacts public transportation safety. In this paper, we introduce AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs), considering diverse perspectives -- including trustfulness, safety, robustness, privacy, and fairness. We constructed the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios, comprising over 10k unique scenes and 18k queries. We evaluated six publicly available VLMs, spanning from generalist to specialist, from open-source to commercial models. Our exhaustive evaluations have unveiled previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats. Specifically, we found that the general VLMs like LLaVA-v1.6 and GPT-4o-mini surprisingly outperform specialized models fine-tuned for driving in terms of overall trustworthiness. DriveVLMs like DriveLM-Agent are particularly vulnerable to disclosing sensitive information. Additionally, both generalist and specialist VLMs remain susceptible to adversarial attacks and struggle to ensure unbiased decision-making across diverse environments and populations. Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs -- an issue of critical importance to public safety and the welfare of all citizens relying on autonomous transportation systems. Our benchmark is publicly available at https://github.com/taco-group/AutoTrust, and the leaderboard is released at https://taco-group.github.io/AutoTrust/., Comment: 55 pages, 14 figures
- Published
- 2024
26. Qwen2.5 Technical Report
- Author
-
Qwen, Yang, An, Yang, Baosong, Zhang, Beichen, Hui, Binyuan, Zheng, Bo, Yu, Bowen, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Wei, Haoran, Lin, Huan, Yang, Jian, Tu, Jianhong, Zhang, Jianwei, Yang, Jianxin, Yang, Jiaxi, Zhou, Jingren, Lin, Junyang, Dang, Kai, Lu, Keming, Bao, Keqin, Yang, Kexin, Yu, Le, Li, Mei, Xue, Mingfeng, Zhang, Pei, Zhu, Qin, Men, Rui, Lin, Runji, Li, Tianhao, Tang, Tianyi, Xia, Tingyu, Ren, Xingzhang, Ren, Xuancheng, Fan, Yang, Su, Yang, Zhang, Yichang, Wan, Yu, Liu, Yuqiong, Cui, Zeyu, Zhang, Zhenru, and Qiu, Zihan
- Subjects
Computer Science - Computation and Language - Abstract
In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques improve alignment with human preferences, and notably improve long text generation, structured data analysis, and instruction following. To handle diverse and varied use cases effectively, we present the Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.
- Published
- 2024
27. Global track finding based on the Hough transform in the STCF detector
- Author
-
Zhou, Hang, Sun, Kexin, Lu, Zhenna, Li, Hao, Ai, Xiaocong, Zhang, Jin, Huang, Xingtao, and Liu, Jianbei
- Subjects
High Energy Physics - Experiment ,Physics - Instrumentation and Detectors - Abstract
The proposed Super Tau-Charm Facility (STCF) is an electron-positron collider designed to operate in a center-of-mass energy range from 2 to 7 GeV. It provides a unique platform for physics research in the tau-charm energy region. To fulfill the physics goals of STCF, high tracking efficiency and good momentum resolution are required for charged particles with momenta from 50 MeV/c to 3.5 GeV/c. A global track finding algorithm based on the Hough transform has been developed and implemented in the STCF software framework to meet this requirement. The design of the algorithm and its performance in simulation are presented in this paper.
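The core of any Hough-based track finder is accumulator voting. A minimal 2D straight-line version is sketched below; it is purely illustrative (the STCF algorithm operates on detector hits and curved tracks in a magnetic field, and the binning here is arbitrary).

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=100, rho_max=10.0):
    """Minimal Hough transform: each point votes for every (theta, rho)
    line through it; the densest accumulator cell is the found track."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        j = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        ok = (j >= 0) & (j < n_rho)
        acc[np.arange(n_theta)[ok], j[ok]] += 1
    i, j = np.unravel_index(acc.argmax(), acc.shape)
    return thetas[i], j / (n_rho - 1) * 2 * rho_max - rho_max

# hits on the horizontal line y = 2 plus one noise hit
pts = [(x, 2.0) for x in np.linspace(-3, 3, 15)] + [(1.0, -4.0)]
theta, rho = hough_lines(pts)
```

The collinear hits pile their votes into a single (theta, rho) cell while the noise hit spreads its votes thinly, which is why the peak search is robust to isolated noise.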
- Published
- 2024
28. Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models
- Author
-
Li, Changqun, Ding, Chaofan, Luan, Kexin, and Di, Xinhan
- Subjects
Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Fine-tuning pre-trained large language models in a parameter-efficient manner is widely studied for its effectiveness and efficiency. LoRA is one of the most widely used methods, and assumes that the optimization process is essentially low-dimensional. Although LoRA has demonstrated commendable performance, there remains a significant performance gap between LoRA and full fine-tuning when learning new tasks. In this work, we propose Low-Rank Adaptation with Task-Relevant Feature Enhancement (LoRATRF), which enhances task-relevant features from the perspective of editing neural network representations. To prioritize task-relevant features, we design a task-aware filter that selectively extracts valuable knowledge from hidden representations for the target or current task. As experiments on a variety of datasets covering NLU, commonsense reasoning, and mathematical reasoning tasks demonstrate, our method reduces parameters by 33.71% and achieves better performance than SOTA low-rank methods., Comment: 6 pages, 3 figures, accepted by AAAI 2025 CoLoRAI - Connecting Low-Rank Representations in AI Workshop
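Plain LoRA is easy to sketch, and doing so shows where a task-aware filter could sit. In the sketch below, the gate `g` on the rank-r hidden representation is a hypothetical placeholder for such a filter; the shapes, scale, and gate are illustrative assumptions, not LoRATRF's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2                        # hidden size, low rank
W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection (zero init)
g = np.ones(r)                      # placeholder task-aware gate

def lora_forward(x, scale=2.0):
    # frozen path plus gated low-rank update; with B = 0 at init,
    # the adapted model matches the pretrained one exactly
    return x @ W.T + scale * ((x @ A.T) * g) @ B.T

x = rng.normal(size=(3, d))
y0 = lora_forward(x)
```

Only A, B (and here g) would be trained, which is where the parameter savings relative to full fine-tuning come from.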
- Published
- 2024
29. SPRec: Leveraging Self-Play to Debias Preference Alignment for Large Language Model-based Recommendations
- Author
-
Gao, Chongming, Chen, Ruijun, Yuan, Shuai, Huang, Kexin, Yu, Yuanqing, and He, Xiangnan
- Subjects
Computer Science - Information Retrieval - Abstract
Large language models (LLMs) have attracted significant attention in recommendation systems. Current LLM-based recommender systems primarily rely on supervised fine-tuning (SFT) to train the model for recommendation tasks. However, relying solely on positive samples limits the model's ability to align with user satisfaction and expectations. To address this, researchers have introduced Direct Preference Optimization (DPO), which explicitly aligns recommendations with user preferences using offline preference ranking data. Despite its advantages, our theoretical analysis reveals that DPO inherently biases the model towards a few items, exacerbating the filter bubble issue and ultimately degrading user experience. In this paper, we propose SPRec, a novel self-play recommendation framework designed to mitigate over-recommendation and improve fairness without requiring additional data or manual intervention. In each self-play iteration, the model undergoes an SFT step followed by a DPO step, treating offline interaction data as positive samples and the predicted outputs from the previous iteration as negative samples. This effectively re-weights the DPO loss function using the model's logits, adaptively suppressing biased items. Extensive experiments on multiple real-world datasets demonstrate SPRec's effectiveness in enhancing recommendation accuracy and addressing fairness concerns. The implementation is available via https://github.com/RegionCh/SPRec
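The SFT-then-DPO loop is easiest to see through the DPO objective itself. The function below is the standard DPO loss in NumPy; SPRec's contribution lies in where the rejected samples come from (the model's own previous outputs), not in the loss form, and the numbers are toy values.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective on (chosen, rejected) pairs: push the policy's
    log-probability margin over the reference model toward the chosen item."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin))).mean()

# toy numbers: when the policy already prefers the chosen item the loss is
# small; when it prefers the (self-generated) negative the loss is larger
low = dpo_loss(np.array([-1.0]), np.array([-3.0]),
               np.array([-2.0]), np.array([-2.0]))
high = dpo_loss(np.array([-3.0]), np.array([-1.0]),
                np.array([-2.0]), np.array([-2.0]))
```

Treating the previous iteration's predictions as `logp_l` terms is what adaptively suppresses items the model over-recommends.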
- Published
- 2024
30. Collaborative Hybrid Propagator for Temporal Misalignment in Audio-Visual Segmentation
- Author
-
Li, Kexin, Yang, Zongxin, Yang, Yi, and Xiao, Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Computer Science - Multimedia ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Audio-visual video segmentation (AVVS) aims to generate pixel-level maps of sound-producing objects that accurately align with the corresponding audio. However, existing methods often face temporal misalignment, where audio cues and segmentation results are not temporally coordinated. Audio provides two critical pieces of information: i) target object-level details and ii) the timing of when objects start and stop producing sounds. Current methods focus more on object-level information but neglect the boundaries of audio semantic changes, leading to temporal misalignment. To address this issue, we propose a Collaborative Hybrid Propagator Framework (Co-Prop). This framework includes two main steps: Preliminary Audio Boundary Anchoring and Frame-by-Frame Audio-Insert Propagation. To anchor the audio boundary, we employ retrieval-assist prompts with Qwen large language models to identify control points of audio semantic changes. These control points split the audio into semantically consistent audio portions. After obtaining the control point lists, we propose the Audio Insertion Propagator to process each audio portion using a frame-by-frame audio insertion propagation and matching approach. We curated a compact dataset comprising diverse source conversion cases and devised a metric to assess alignment rates. Compared to traditional simultaneous processing methods, our approach reduces memory requirements and facilitates frame alignment. Experimental results demonstrate the effectiveness of our approach across three datasets and two backbones. Furthermore, our method can be integrated with existing AVVS approaches, offering plug-and-play functionality to enhance their performance.
- Published
- 2024
31. Fast Mixing of Data Augmentation Algorithms: Bayesian Probit, Logit, and Lasso Regression
- Author
-
Lee, Holden and Zhang, Kexin
- Subjects
Mathematics - Statistics Theory ,Mathematics - Probability ,Statistics - Machine Learning - Abstract
Despite the widespread use of the data augmentation (DA) algorithm, the theoretical understanding of its convergence behavior remains incomplete. We prove the first non-asymptotic polynomial upper bounds on mixing times of three important DA algorithms: DA algorithm for Bayesian Probit regression (Albert and Chib, 1993, ProbitDA), Bayesian Logit regression (Polson, Scott, and Windle, 2013, LogitDA), and Bayesian Lasso regression (Park and Casella, 2008, Rajaratnam et al., 2015, LassoDA). Concretely, we demonstrate that with $\eta$-warm start, parameter dimension $d$, and sample size $n$, the ProbitDA and LogitDA require $\mathcal{O}\left(nd\log \left(\frac{\log \eta}{\epsilon}\right)\right)$ steps to obtain samples with at most $\epsilon$ TV error, whereas the LassoDA requires $\mathcal{O}\left(d^2(d\log d +n \log n)^2 \log \left(\frac{\eta}{\epsilon}\right)\right)$ steps. The results are generally applicable to settings with large $n$ and large $d$, including settings with highly imbalanced response data in the Probit and Logit regression. The proofs are based on the Markov chain conductance and isoperimetric inequalities. Assuming that data are independently generated from either a bounded, sub-Gaussian, or log-concave distribution, we improve the guarantees for ProbitDA and LogitDA to $\tilde{\mathcal{O}}(n+d)$ with high probability, and compare it with the best known guarantees of Langevin Monte Carlo and Metropolis Adjusted Langevin Algorithm. We also discuss the mixing times of the three algorithms under feasible initialization.
- Published
- 2024
32. Exploring Complex Mental Health Symptoms via Classifying Social Media Data with Explainable LLMs
- Author
-
Chen, Kexin, Lim, Noelle, Lee, Claire, and Guerzhoy, Michael
- Subjects
Computer Science - Computation and Language - Abstract
We propose a pipeline for gaining insights into complex diseases by training LLMs on challenging social media text data classification tasks, obtaining explanations for the classification outputs, and performing qualitative and quantitative analysis on the explanations. We report initial results on predicting, explaining, and systematizing the explanations of predicted reports on mental health concerns in people reporting Lyme disease concerns. We report initial results on predicting future ADHD concerns for people reporting anxiety disorder concerns, and demonstrate preliminary results on visualizing the explanations for predicting that a person with anxiety concerns will in the future have ADHD concerns., Comment: Accepted to Machine Learning for Health (ML4H) Findings 2024 (co-located with NeurIPS 2024)
- Published
- 2024
33. Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks
- Author
-
Cai, Jinjin, Meng, Kexin, Yang, Baijian, and Shao, Gang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Remote sensing scene classification (RSSC) is a critical task with diverse applications in land use and resource management. While unimodal image-based approaches show promise, they often struggle with limitations such as high intra-class variance and inter-class similarity. Incorporating textual information can enhance classification by providing additional context and semantic understanding, but manual text annotation is labor-intensive and costly. In this work, we propose a novel RSSC framework that integrates text descriptions generated by large vision-language models (VLMs) as an auxiliary modality without incurring expensive manual annotation costs. To fully leverage the latent complementarities between visual and textual data, we propose a dual cross-attention-based network to fuse these modalities into a unified representation. Extensive experiments with both quantitative and qualitative evaluation across five RSSC datasets demonstrate that our framework consistently outperforms baseline models. We also verify the effectiveness of VLM-generated text descriptions compared to human-annotated descriptions. Additionally, we design a zero-shot classification scenario to show that the learned multimodal representation can be effectively utilized for unseen class classification. This research opens new opportunities for leveraging textual information in RSSC tasks and provides a promising multimodal fusion structure, offering insights and inspiration for future studies. Code is available at: https://github.com/CJR7/MultiAtt-RSSC
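Dual cross-attention fusion can be sketched as two attention passes, one in each direction, over toy features. The sketch below is an assumption-laden simplification: real implementations add learned query/key/value projections, multiple heads, and a classification head, all omitted here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(q_feats, kv_feats, d=8):
    # queries from one modality attend over tokens of the other modality
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(d))
    return attn @ kv_feats

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))    # 4 visual tokens
txt = rng.normal(size=(6, 8))    # 6 text tokens from a VLM caption

# dual direction: image attends to text, text attends to image; the two
# attended streams are pooled and concatenated into a joint representation
img2txt = cross_attention(img, txt)
txt2img = cross_attention(txt, img)
fused = np.concatenate([img2txt.mean(0), txt2img.mean(0)])
```

Running attention in both directions lets each modality pull in the other's complementary context before the unified representation is formed.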
- Published
- 2024
34. Genome-Wide Identification and Expression Analysis of 1-Aminocyclopropane-1-Carboxylate Synthase (ACS) Gene Family in Chenopodium quinoa
- Author
-
Lu Yin, Xia Zhang, Aihong Gao, Meng Cao, Dongdong Yang, Kexin An, Shanli Guo, and Haibo Yin
- Subjects
C. quinoa ,ethylene ,ACS genes ,expression patterns ,abiotic stress ,Botany ,QK1-989 - Abstract
Ethylene plays an important role in plant development and stress resistance. The rate-limiting enzyme in ethylene biosynthesis is 1-aminocyclopropane-1-carboxylic acid synthase (ACS). Quinoa (Chenopodium quinoa) is an important food crop known for its strong tolerance to abiotic stresses. However, knowledge of the ACS gene family in C. quinoa remains limited. In this study, we identified 12 ACS genes (CqACSs) from the C. quinoa genome. Thorough analysis of their sequences and phylogenetic relationships verified that 8 of these 12 CqACS isozymes closely resembled ACS isozymes possessing ACS activity, and these eight isozymes could be categorized into three distinct groups. The four remaining CqACS genes, grouped under category IV, displayed notable similarity to AtACS10 and AtACS12, which are amido transferases lacking ACS activity. The CqACS proteins resembled the AtACS proteins and had the characteristic structural features typically observed in plant ACS enzymes. The 12 CqACS genes were distributed across 8 of the 18 chromosomes of C. quinoa and expanded through segmental duplication. Many cis-regulatory elements related to various abiotic stresses, phytohormones, and light were found. The expression patterns of ACS genes varied across different tissues of C. quinoa. Furthermore, analysis of gene expression under abiotic stress showed that CqACS genes respond to various stresses, implying potential functions in adaptation to abiotic stress. The findings of this research serve as a foundation for delving deeper into the functional roles of CqACS genes.
- Published
- 2023
- Full Text
- View/download PDF
35. BAFPN: Bidirectional Alignment of Features to Improve Localization Accuracy
- Author
-
Li, Jiakun, Wang, Qingqing, Dong, Hongbin, and Li, Kexin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Current state-of-the-art vision models often utilize feature pyramids to extract multi-scale information, with the Feature Pyramid Network (FPN) being one of the most widely used classic architectures. However, traditional FPNs and their variants (e.g., AUGFPN, PAFPN) fail to fully address spatial misalignment on a global scale, leading to suboptimal performance in high-precision localization of objects. In this paper, we propose a novel Bidirectional Alignment Feature Pyramid Network (BAFPN), which aligns misaligned features globally through a Spatial Feature Alignment Module (SPAM) during the bottom-up information propagation phase. Subsequently, it further mitigates aliasing effects caused by cross-scale feature fusion via a fine-grained Semantic Alignment Module (SEAM) in the top-down phase. On the DOTAv1.5 dataset, BAFPN improves the baseline model's AP75, AP50, and mAP by 1.68%, 1.45%, and 1.34%, respectively. Additionally, BAFPN demonstrates significant performance gains when applied to various other advanced detectors., Comment: 7 pages
- Published
- 2024
36. GNN 101: Visual Learning of Graph Neural Networks in Your Web Browser
- Author
-
Lu, Yilin, Chen, Chongwei, Huang, Kexin, Zitnik, Marinka, and Wang, Qianwen
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Graph Neural Networks (GNNs) have achieved significant success across various applications. However, their complex structures and inner workings can be challenging for non-AI experts to understand. To address this issue, we present GNN 101, an educational visualization tool for interactive learning of GNNs. GNN 101 seamlessly integrates mathematical formulas with visualizations via multiple levels of abstraction, including a model overview, layer operations, and detailed animations for matrix calculations. Users can easily switch between two complementary views: a node-link view that offers an intuitive understanding of the graph data, and a matrix view that provides a space-efficient and comprehensive overview of all features and their transformations across layers. GNN 101 not only demystifies GNN computations in an engaging and intuitive way but also effectively illustrates what a GNN learns about graph nodes at each layer. To ensure broad educational access, GNN 101 is open-source and available directly in web browsers without requiring any installations.
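The "layer operations and matrix calculations" such a tool animates reduce to a few matrix products. A GCN-style layer on a toy graph, for instance (the graph, features, and weights are invented for illustration):

```python
import numpy as np

# toy undirected graph on 4 nodes (a 4-cycle), as an adjacency matrix
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)              # add self-loops
deg = A_hat.sum(1)
A_norm = A_hat / deg[:, None]      # row-normalized neighborhood averaging

H = np.eye(4)                      # one-hot input features per node
W = np.full((4, 2), 0.5)           # toy weight matrix
H1 = np.maximum(A_norm @ H @ W, 0.0)  # one layer: aggregate, transform, ReLU
```

Each row of `H1` is a node's new embedding: an average over its neighborhood (including itself), linearly transformed and passed through a nonlinearity.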
- Published
- 2024
37. Fusion Matters: Learning Fusion in Deep Click-through Rate Prediction Models
- Author
-
Zhang, Kexin, Lyu, Fuyuan, Tang, Xing, Liu, Dugang, Ma, Chen, Ding, Kaize, He, Xiuqiang, and Liu, Xue
- Subjects
Computer Science - Information Retrieval ,Computer Science - Artificial Intelligence - Abstract
The evolution of previous Click-Through Rate (CTR) models has mainly been driven by proposing complex components, whether shallow or deep, that are adept at modeling feature interactions. However, there has been less focus on improving fusion design. Instead, two naive solutions, stacked and parallel fusion, are commonly used. Both solutions rely on pre-determined fusion connections and fixed fusion operations. It has been repetitively observed that changes in fusion design may result in different performances, highlighting the critical role that fusion plays in CTR models. While there have been attempts to refine these basic fusion strategies, these efforts have often been constrained to specific settings or dependent on specific components. Neural architecture search has also been introduced to partially deal with fusion design, but it comes with limitations: the complexity of the search space can lead to inefficient and ineffective results. To bridge this gap, we introduce OptFusion, a method that automates the learning of fusion, encompassing both connection learning and operation selection. We propose a one-shot learning algorithm tackling these tasks concurrently. Extensive experiments on three large-scale datasets demonstrate both the effectiveness and efficiency of OptFusion in improving CTR model performance. Our code implementation is available at https://github.com/kexin-kxzhang/OptFusion., Comment: Accepted by WSDM 2025
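The operation-selection half of such a search can be sketched as a differentiable softmax mixture over candidate fusion operations, in the spirit of one-shot architecture search. Everything here is an illustrative assumption, not OptFusion's actual design: the op set, the crude "concat projection", and the architecture parameters are invented.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# candidate fusion operations between two component outputs
ops = {
    "add":    lambda a, b: a + b,
    "mul":    lambda a, b: a * b,
    "concat": lambda a, b: np.concatenate([a, b])[: len(a)],  # crude projection
}

def fuse(a, b, alpha):
    """One-shot style fusion: a softmax over architecture parameters mixes
    all candidate ops; after the search, only the argmax op is kept."""
    w = softmax(alpha)
    return sum(wi * op(a, b) for wi, op in zip(w, ops.values()))

a, b = np.ones(4), 2 * np.ones(4)
alpha = np.array([5.0, 0.0, 0.0])   # the search would learn these jointly
out = fuse(a, b, alpha)
```

Because the mixture is differentiable in `alpha`, connection and operation choices can be trained concurrently with the model weights, which is the appeal of the one-shot formulation.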
- Published
- 2024
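The fusion-operation learning described above can be caricatured as a softmax-weighted blend over candidate fusion operations, with the logits learned jointly with the model. A toy sketch under assumed elementwise candidates (add, multiply, average), not OptFusion's actual search space or connection learning:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Candidate fusion operations between two component outputs a and b.
# These three are illustrative stand-ins; a real CTR model would include
# richer choices (e.g. concatenation followed by a projection).
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "avg": lambda a, b: (a + b) / 2.0,
}

def learned_fusion(a, b, alpha):
    # One-shot style: blend all candidate operations by learnable logits
    # alpha instead of committing to a fixed stacked/parallel design.
    w = softmax(alpha)
    return sum(wi * op(a, b) for wi, op in zip(w, OPS.values()))
```

During training, the logits `alpha` would receive gradients alongside the model weights; after convergence the dominant operation can be read off (or kept as a soft mixture).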
38. Estimating the tails of the spectrum of the Hessian of the log-likelihood for \textit{ab-initio} single-particle reconstruction in electron cryomicroscopy
- Author
-
Rangan, Aaditya V., Tang, Wai-Shing, Cossio, Pilar, Zhang, Kexin, and Grigorieff, Nikolaus
- Subjects
Quantitative Biology - Quantitative Methods ,92 ,G.1.6 ,J.2 - Abstract
Electron cryomicroscopy (cryo-EM) is a technique in structural biology used to reconstruct accurate volumetric maps of molecules. One step of the cryo-EM pipeline involves solving an inverse-problem. This inverse-problem, referred to as \textit{ab-initio} single-particle reconstruction, takes as input a collection of 2d-images -- each a projection of a molecule from an unknown viewing-angle -- and attempts to reconstruct the 3d-volume representing the underlying molecular density. Most methods for solving this inverse-problem search for a solution which optimizes a posterior likelihood of generating the observed image-data, given the reconstructed volume. Within this framework, it is natural to study the Hessian of the log-likelihood: the eigenvectors and eigenvalues of the Hessian determine how the likelihood changes with respect to perturbations in the solution, and can give insight into the sensitivity of the solution to aspects of the input. In this paper we describe a simple strategy for estimating the smallest eigenvalues and eigenvectors (i.e., the `softest modes') of the Hessian of the log-likelihood for the \textit{ab-initio} single-particle reconstruction problem. This strategy involves rewriting the log-likelihood as a 3d-integral. This interpretation holds in the low-noise limit, as well as in many practical scenarios which allow for noise-marginalization. Once we have estimated the softest modes, we can use them to perform many kinds of sensitivity analysis. For example, we can determine which parts of the reconstructed volume are trustworthy, and which are unreliable, and how this unreliability might depend on the data-set and the imaging parameters. We believe that this kind of analysis can be used alongside more traditional strategies for sensitivity analysis, as well as in other applications, such as free-energy estimation.
- Published
- 2024
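The "softest modes" computation above amounts to extracting the smallest eigenpairs of a large symmetric operator that is reached only through Hessian-vector products. A minimal sketch of that numerical core with SciPy's Lanczos-based `eigsh`, using a stand-in matrix with a known spectrum (in the reconstruction setting, the matvec would instead evaluate the paper's 3d-integral form of the log-likelihood Hessian):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

# Stand-in for the Hessian: a symmetric matrix with known eigenvalues 1..150,
# accessed only through matrix-vector products.
rng = np.random.default_rng(0)
Qmat, _ = np.linalg.qr(rng.standard_normal((150, 150)))
H = Qmat @ np.diag(np.arange(1.0, 151.0)) @ Qmat.T

def hvp(v):
    # Hessian-vector product: the only access to H the Lanczos solver needs.
    return H @ v

op = LinearOperator(H.shape, matvec=hvp, dtype=float)

# 'SA' = smallest algebraic eigenvalues, i.e. the softest modes of the landscape.
soft_vals, soft_modes = eigsh(op, k=4, which="SA")
```

The returned eigenvectors span the directions in which the likelihood is least constrained, which is exactly what the sensitivity analysis in the abstract inspects.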
39. Exploring Device-Oriented Video Encryption for Hierarchical Privacy Protection in AR Content Sharing
- Author
-
Hu, Yongquan, Zheng, Dongsheng, Nie, Kexin, Zhang, Junyan, Hu, Wen, and Quigley, Aaron
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Content sharing across multiple Augmented Reality (AR) displays is becoming commonplace, enhancing team communication and collaboration through devices like smartphones and AR glasses. However, this practice raises significant privacy concerns, especially concerning the physical environment visible in AR, which may include sensitive personal details like facial features and identifiable information. Our research focuses on protecting privacy within AR environments, particularly the physical backgrounds visible during content sharing across three common AR display methods: projection, smartphone, and AR glasses. We analyze the potential privacy risks associated with each method and employ a Region Of Interest (ROI) video encryption system to hierarchically encrypt the physical backdrop based on its safety rating. This study pioneers the integration of ROI video encryption at the bitstream level within AR contexts, providing a more efficient solution than traditional pixel-level encryption by enhancing encryption speed and reducing the required space. Our adaptive system dynamically adjusts the encryption intensity based on the AR display method, ensuring tailored privacy protection., Comment: IEEE ISMAR 2024 Poster
- Published
- 2024
- Full Text
- View/download PDF
40. MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
- Author
-
Chou, Yuhong, Yao, Man, Wang, Kexin, Pan, Yuqi, Zhu, Ruijie, Zhong, Yiran, Qiao, Yu, Wu, Jibin, Xu, Bo, and Li, Guoqi
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models as the linear attention form and then identify three conditions for the optimal linear attention design: 1) Dynamic memory ability; 2) Static approximation ability; 3) Least parameter approximation. We find that none of the current linear models meet all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than the existing linear models.
- Published
- 2024
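The linear attention form that unifies these models replaces softmax(QK^T)V with phi(Q)(phi(K)^T V), collapsing the O(N^2) attention map into O(N d^2) work. A numeric sketch of that form (the positive feature map `phi` and the shapes are illustrative assumptions, not MetaLA's design):

```python
import numpy as np

def phi(x):
    # A strictly positive feature map; linear models differ in this choice.
    return np.maximum(x, 0.0) + 1.0

def linear_attention(Q, K, V):
    # phi(Q) @ (phi(K).T @ V): cost O(N * d * d_v) instead of O(N^2 * d_v).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                  # (d, d_v) summary of all keys/values
    Z = Qp @ Kp.sum(axis=0)        # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(1)
N, d, dv = 8, 4, 3
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, dv))
out = linear_attention(Q, K, V)
```

Because `phi` is positive, each output row is a convex combination of the value rows, mirroring the row-stochastic attention map that softmax produces.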
41. SmartInv: Multimodal Learning for Smart Contract Invariant Inference
- Author
-
Wang, Sally Junsong, Pei, Kexin, and Yang, Junfeng
- Subjects
Computer Science - Software Engineering ,Computer Science - Cryptography and Security ,Computer Science - Programming Languages - Abstract
Smart contracts are software programs that enable diverse business activities on the blockchain. Recent research has identified new classes of "machine un-auditable" bugs that arise from both transactional contexts and source code. Existing detection methods require human understanding of underlying transaction logic and manual reasoning across different sources of context (i.e. modalities), such as code, dynamic transaction executions, and natural language specifying the expected transaction behavior. To automate the detection of "machine un-auditable" bugs, we present SmartInv, an accurate and fast smart contract invariant inference framework. Our key insight is that the expected behavior of smart contracts, as specified by invariants, relies on understanding and reasoning across multimodal information, such as source code and natural language. We propose a new prompting strategy to foundation models, Tier of Thought (ToT), to reason across multiple modalities of smart contracts and ultimately to generate invariants. By checking the violation of these generated invariants, SmartInv can identify potential vulnerabilities. We evaluate SmartInv on real-world contracts and re-discover bugs that resulted in multi-million dollar losses over the past 2.5 years (from January 1, 2021 to May 31, 2023). Our extensive evaluation shows that SmartInv generates 3.5$\times$ more bug-critical invariants and detects 4$\times$ more critical bugs compared to the state-of-the-art tools in significantly (150$\times$) less time. SmartInv uncovers 119 zero-day vulnerabilities from the 89,621 real-world contracts. Among them, five are critical zero-day bugs confirmed by developers as "high severity."
- Published
- 2024
42. Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements
- Author
-
Ze, Kunrui, Wang, Wei, Yue, Shuoyu, Sun, Guibin, Liu, Kexin, and Lü, Jinhu
- Subjects
Computer Science - Robotics - Abstract
This article studies the problem of distributed formation control for multiple robots by using onboard ultra wide band (UWB) ranging and inertial odometer (IO) measurements. Although this problem has been widely studied, a fundamental limitation of most works is that they require each robot's pose and sensor measurements to be expressed in a common reference frame. However, this requirement is impractical for nonholonomic robot formations because of the difficulty of aligning the IO measurements of individual robots in a common frame. To address this problem, first, a concurrent-learning based estimator is proposed to achieve relative localization between neighboring robots in a local frame. Different from most relative localization methods in a global frame, both relative position and orientation in a local frame are estimated with only UWB ranging and IO measurements. Second, to deal with information loss caused by directed communication topology, a cooperative localization algorithm is introduced to estimate the relative pose to the leader robot. Third, based on the theoretical results on relative pose estimation, a distributed formation tracking controller is proposed for nonholonomic robots. Both Gazebo physics simulation and real-world experiments conducted on networked TurtleBot3 nonholonomic robots are provided to demonstrate the effectiveness of the proposed method., Comment: 11 pages, 12 figures
- Published
- 2024
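As a point of reference for the range-plus-odometry setup above, relative position from UWB ranges and known IO displacements can be recovered by a classical linearization. This is an illustrative baseline, not the paper's concurrent-learning estimator, and it assumes the displacements are already expressed in a common frame, which is precisely the assumption the paper works to remove:

```python
import numpy as np

def relative_position_from_ranges(displacements, ranges):
    # Solve ||p + d_k|| = r_k for the initial relative position p.
    # Squaring each equation and subtracting the k=0 equation eliminates
    # the unknown ||p||^2, leaving a linear least-squares problem in p.
    d = np.asarray(displacements, dtype=float)
    r = np.asarray(ranges, dtype=float)
    b = r**2 - (d**2).sum(axis=1)     # equals 2 d_k . p + ||p||^2
    A = 2.0 * (d[1:] - d[0])
    rhs = b[1:] - b[0]
    p, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return p
```

At least three non-collinear relative displacements are needed for a unique 2-D solution; with noisy ranges, the least-squares solve averages the error across measurements.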
43. Imaging heat transport in suspended diamond nanostructures with integrated spin defect thermometers
- Author
-
Goblot, Valentin, Wu, Kexin, Di Lucente, Enrico, Zhu, Yuchun, Losero, Elena, Jobert, Quentin, Concha, Claudio Jaramillo, Marzari, Nicola, Simoncelli, Michele, and Galland, Christophe
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Among all materials, mono-crystalline diamond has one of the highest measured thermal conductivities, with values above 2000 W/m/K at room temperature. This stems from momentum-conserving `normal' phonon-phonon scattering processes dominating over momentum-dissipating `Umklapp' processes, a feature that also suggests diamond as an ideal platform to experimentally investigate phonon heat transport phenomena that violate Fourier's law. Here, we introduce dilute nitrogen-vacancy color centers as in-situ, highly precise spin defect thermometers to image temperature inhomogeneities in single-crystal diamond microstructures heated from ambient conditions. We analyze cantilevers with cross-sections in the range from about 0.2 to 2.6 $\mathrm{\mu m}^2$, observing a relation between cross-section and heat flux that departs from Fourier's law predictions. We rationalize such behavior relying on first-principles simulations based on the linearized phonon Boltzmann transport equation, also discussing how fabrication-induced impurities influence conduction. Our temperature-imaging method can be applied to diamond devices of arbitrary geometry, paving the way for the exploration of unconventional, non-diffusive heat transport phenomena., Comment: 8 pages, 4 figures + supplementary materials (8 pages, 7 figures)
- Published
- 2024
44. Attribute-Based Encryption With Payable Outsourced Decryption Using Blockchain and Responsive Zero Knowledge Proof
- Author
-
Cai, Dongliang, Chen, Borui, Zhang, Liang, Li, Kexin, and Kan, Haibin
- Subjects
Computer Science - Cryptography and Security - Abstract
Attribute-Based Encryption (ABE) is a promising solution for access control in cloud services. However, the heavy decryption overhead hinders its widespread adoption. A general approach to address this issue is to outsource decryption to a decryption cloud service (DCS). Existing schemes have utilized various methods to enable users to verify outsourced results; however, they lack an effective mechanism to achieve exemptibility, which enables the honest DCS to escape from wrong claims. It is also impractical to assume that the DCS will provide free services. In this paper, we propose a blockchain-based payable outsourced decryption ABE scheme that achieves both verifiability and exemptibility without adding redundant information to ABE ciphertext. We use zero-knowledge proof to verify outsourced results on blockchain and introduce an optional single-round challenge game under optimistic assumption to address the high cost of proof generation. Moreover, our system achieves fairness and decentralized outsourcing to protect the interests of all parties. Finally, we implement and evaluate our scheme on Ethereum to demonstrate its feasibility and efficiency: for attribute counts from 5 to 60, gas usage is 11$\times$ to 140$\times$ lower in the happy case and 4$\times$ to 55$\times$ lower in the challenge case than the scheme of Ge et al. (TDSC'23)., Comment: 12 pages, 5 figures
- Published
- 2024
45. Distributed Formation Shape Control of Identity-less Robot Swarms
- Author
-
Sun, Guibin, Xu, Yang, Liu, Kexin, and Lü, Jinhu
- Subjects
Computer Science - Robotics - Abstract
Different from most of the formation strategies where robots require unique labels to identify topological neighbors to satisfy the predefined shape constraints, we here study the problem of identity-less distributed shape formation in homogeneous swarms, which is rarely studied in the literature. The absence of identities creates a unique challenge: how to design appropriate target formations and local behaviors that are suitable for identity-less formation shape control. To address this challenge, we propose the following novel results. First, to avoid using unique identities, we propose a dynamic formation description method and solve the formation consensus of robots in a locally distributed manner. Second, to handle identity-less distributed formations, we propose a fully distributed control law for homogeneous swarms based on locally sensed information. While the existing methods are applicable to simple cases where the target formation is stationary, ours can tackle more general maneuvering formations such as translation, rotation, or even shape deformation. Both numerical simulation and flight experiment are presented to verify the effectiveness and robustness of our proposed formation strategy.
- Published
- 2024
46. Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
- Author
-
Zhang, Jing, Fang, Linjiajie, Shi, Kexin, Wang, Wenjia, and Jing, Bing-Yi
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
``Distribution shift'' is the main obstacle to the success of offline reinforcement learning. A learning policy may take actions beyond the behavior policy's knowledge, referred to as Out-of-Distribution (OOD) actions. The Q-values for these OOD actions can be easily overestimated. As a result, the learning policy is biased by using incorrect Q-value estimates. One common approach to avoid Q-value overestimation is to make a pessimistic adjustment. Our key idea is to penalize the Q-values of OOD actions associated with high uncertainty. In this work, we propose Q-Distribution Guided Q-Learning (QDQ), which applies a pessimistic adjustment to Q-values in OOD regions based on uncertainty estimation. This uncertainty measure relies on the conditional Q-value distribution, learned through a high-fidelity and efficient consistency model. Additionally, to prevent overly conservative estimates, we introduce an uncertainty-aware optimization objective for updating the Q-value function. The proposed QDQ demonstrates solid theoretical guarantees for the accuracy of Q-value distribution learning and uncertainty measurement, as well as the performance of the learning policy. QDQ consistently shows strong performance on the D4RL benchmark and achieves significant improvements across many tasks., Comment: NeurIPS 2024
- Published
- 2024
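The pessimistic adjustment at QDQ's core can be caricatured as mean-minus-uncertainty over draws from the learned Q-distribution. A toy sketch that takes the samples as given (QDQ itself learns this distribution with a consistency model; the penalty weight `beta` is an assumed hyperparameter):

```python
import numpy as np

def pessimistic_q(q_samples, beta=1.0):
    # q_samples: (n_samples, n_actions) draws from a learned Q-distribution.
    mean = q_samples.mean(axis=0)
    unc = q_samples.std(axis=0)   # sample spread as the uncertainty proxy
    return mean - beta * unc      # penalize high-uncertainty (likely OOD) actions

rng = np.random.default_rng(2)
# Action 0: in-distribution-like (tight spread); action 1: OOD-like (wide
# spread); both have the same mean Q-value of 5.
samples = np.stack(
    [rng.normal(5.0, 0.1, 100), rng.normal(5.0, 3.0, 100)], axis=1
)
q_pess = pessimistic_q(samples, beta=1.0)
```

Even though both actions look equally good on average, the pessimistic values rank the low-uncertainty action first, which is the behavior the abstract describes for OOD regions.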
47. A Survey of Deep Graph Learning under Distribution Shifts: from Graph Out-of-Distribution Generalization to Adaptation
- Author
-
Zhang, Kexin, Liu, Shuhan, Wang, Song, Shi, Weili, Chen, Chen, Li, Pan, Li, Sheng, Li, Jundong, and Ding, Kaize
- Subjects
Computer Science - Machine Learning - Abstract
Distribution shifts on graphs -- the discrepancies in data distribution between training and employing a graph machine learning model -- are ubiquitous and often unavoidable in real-world scenarios. These shifts may severely deteriorate model performance, posing significant challenges for reliable graph machine learning. Consequently, there has been a surge in research on graph machine learning under distribution shifts, aiming to train models to achieve satisfactory performance on out-of-distribution (OOD) test data. In our survey, we provide an up-to-date and forward-looking review of deep graph learning under distribution shifts. Specifically, we cover three primary scenarios: graph OOD generalization, training-time graph OOD adaptation, and test-time graph OOD adaptation. We begin by formally formulating the problems and discussing various types of distribution shifts that can affect graph learning, such as covariate shifts and concept shifts. To provide a better understanding of the literature, we systematically categorize the existing models based on our proposed taxonomy and investigate the adopted techniques behind. We also summarize commonly used datasets in this research area to facilitate further investigation. Finally, we point out promising research directions and the corresponding challenges to encourage further study in this vital domain. Additionally, we provide a continuously updated reading list at https://github.com/kaize0409/Awesome-Graph-OOD., Comment: 18 pages, 2 figures. arXiv admin note: text overlap with arXiv:2402.11153
- Published
- 2024
48. A Pilot Study on Clinician-AI Collaboration in Diagnosing Depression from Speech
- Author
-
Feng, Kexin and Chaspari, Theodora
- Subjects
Computer Science - Human-Computer Interaction - Abstract
This study investigates clinicians' perceptions and attitudes toward an assistive artificial intelligence (AI) system that employs a speech-based explainable ML algorithm for detecting depression. The AI system detects depression from vowel-based spectrotemporal variations of speech and generates explanations through explainable AI (XAI) methods. It further provides decisions and explanations at various temporal granularities, including utterance groups, individual utterances, and within each utterance. A small-scale user study was conducted to evaluate users' perceived usability of the system, trust in the system, and perceptions of design factors associated with several elements of the system. Quantitative and qualitative analysis of the collected data indicates both positive and negative aspects that influence clinicians' perception toward the AI. Results from quantitative analysis indicate that providing more AI explanations enhances user trust but also increases system complexity. Qualitative analysis indicates the potential of integrating such systems into the current diagnostic and screening workflow, but also highlights existing limitations including clinicians' reduced familiarity with AI/ML systems and the need for user-friendly and intuitive visualizations of speech information., Comment: accepted at the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2024)
- Published
- 2024
49. Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches
- Author
-
Feng, Kexin and Chaspari, Theodora
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
This study investigates explainable machine learning algorithms for identifying depression from speech. Grounded in evidence from speech production that depression affects motor control and vowel generation, pre-trained vowel-based embeddings, that integrate semantically meaningful linguistic units, are used. Following that, an ensemble learning approach decomposes the problem into constituent parts characterized by specific depression symptoms and severity levels. Two methods are explored: a "bottom-up" approach with 8 models predicting individual Patient Health Questionnaire-8 (PHQ-8) item scores, and a "top-down" approach using a Mixture of Experts (MoE) with a router module for assessing depression severity. Both methods depict performance comparable to state-of-the-art baselines, demonstrating robustness and reduced susceptibility to dataset mean/median values. System explainability benefits are discussed highlighting their potential to assist clinicians in depression diagnosis and screening., Comment: accepted at the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2024)
- Published
- 2024
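The "bottom-up" decomposition above sums eight per-item predictions into a PHQ-8 total (each item scored 0-3, so the total ranges 0-24). A toy sketch with stand-in linear heads in place of the paper's vowel-embedding models:

```python
import numpy as np

# Eight per-item regressors, one per PHQ-8 item. Random linear heads stand in
# for the paper's trained models; the 16-dim embedding is also an assumption.
rng = np.random.default_rng(3)
heads = [rng.standard_normal(16) for _ in range(8)]

def predict_phq8(embedding):
    # Each head predicts one item score clipped to [0, 3]; the total severity
    # score is their sum, as in the bottom-up ensemble.
    items = [float(np.clip(h @ embedding, 0.0, 3.0)) for h in heads]
    return items, sum(items)
```

The complementary "top-down" route in the abstract would instead route the embedding through a mixture-of-experts gate before scoring severity.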
50. Analyzing Nobel Prize Literature with Large Language Models
- Author
-
Yang, Zhenyuan, Liu, Zhengliang, Zhang, Jing, Lu, Cen, Tai, Jiaxin, Zhong, Tianyang, Li, Yiwei, Zhao, Siyan, Yao, Teng, Liu, Qing, Yang, Jinlin, Liu, Qixin, Li, Zhaowei, Wang, Kexin, Ma, Longjun, Zhu, Dajiang, Ren, Yudan, Ge, Bao, Zhang, Wei, Qiang, Ning, Zhang, Tuo, and Liu, Tianming
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
This study examines the capabilities of advanced Large Language Models (LLMs), particularly the o1 model, in the context of literary analysis. The outputs of these models are compared directly to those produced by graduate-level human participants. By focusing on two Nobel Prize-winning short stories, 'Nine Chapters' by Han Kang, the 2024 laureate, and 'Friendship' by Jon Fosse, the 2023 laureate, the research explores the extent to which AI can engage with complex literary elements such as thematic analysis, intertextuality, cultural and historical contexts, linguistic and structural innovations, and character development. Given the Nobel Prize's prestige and its emphasis on cultural, historical, and linguistic richness, applying LLMs to these works provides a deeper understanding of both human and AI approaches to interpretation. The study uses qualitative and quantitative evaluations of coherence, creativity, and fidelity to the text, revealing the strengths and limitations of AI in tasks typically reserved for human expertise. While LLMs demonstrate strong analytical capabilities, particularly in structured tasks, they often fall short in emotional nuance and coherence, areas where human interpretation excels. This research underscores the potential for human-AI collaboration in the humanities, opening new opportunities in literary studies and beyond.
- Published
- 2024