2,025,523 results on '"WANG, P."'
Search Results
2. Universal Prekindergarten Expansion in California: Progress and Opportunities
- Author
Learning Policy Institute, Victoria Wang, Melanie Leung-Gagné, Hanna Melnick, and Marjorie E. Wechsler
- Abstract
In 2021, California committed to providing universal prekindergarten (UPK) for all 4-year-olds and expanding access for income-eligible 3-year-olds by 2025-2026. California UPK includes several early learning programs, including transitional kindergarten (TK), the California State Preschool Program (CSPP), Head Start, and locally funded early learning programs. To support UPK expansion, California's legislature and administration established the Universal Prekindergarten Planning and Implementation Grant in 2021, which allocated $200 million to all local education agencies (LEAs) serving kindergarteners, which include school districts, charter schools, and county offices of education. The California Department of Education surveyed all grant recipients in August 2023 about their UPK programs. This report provides an update on UPK implementation across the state through an analysis of survey responses from 1,384 LEAs, which represent almost all (95%) public school districts and two thirds (65%) of charter schools that serve elementary grades. Findings provide insights into LEAs' progress in UPK implementation related to service delivery models, facilities and transportation, instruction and assessment, strategies to support student needs, workforce development, implementation challenges, and technical assistance needs. In addition to statewide insights, the survey revealed promising practices and wide access with UPK expansion in California's four largest districts during their first year of implementation. The findings in this report may help policymakers and practitioners identify areas for additional investments and supports.
- Published
- 2024
3. Report on Indicators of School Crime and Safety: 2023. NCES 2024-145/NCJ 309126
- Author
National Center for Education Statistics (NCES) (ED/IES), US Department of Justice, Bureau of Justice Statistics, American Institutes for Research (AIR), Véronique Irwin, Ke Wang, Jiashan Cui, and Alexandra Thompson
- Abstract
This report provides the most recent national indicators on school crime and safety. The information presented in this report serves as a reference for policymakers and practitioners so that they can develop effective programs and policies aimed at violence and school crime prevention. Accurate information about the nature, extent, and scope of the problem being addressed is essential for developing effective programs and policies. The report is organized into five sections: elementary and secondary student and teacher victimization; school environment; fights and weapons; safety, security, and mental health practices; and postsecondary campus safety and security. Each section begins with a set of key findings. In this report, where available, data on victimization that occurred away from school are offered as a point of comparison for data on victimization that occurred at school. Indicators of crime and safety are compared across different population subgroups and over time. All data reflect the most current data available at the time the report was produced. Data throughout this report represent the 50 states and the District of Columbia. Findings described with comparative language (e.g., higher, lower, increase, and decrease) are statistically significant at the 0.05 level.
- Published
- 2024
4. Evaluating the Impact of Cloud e-Learning in Higher Education: An Empirical Investigation
- Author
Lillian-Yee-Kiaw Wang
- Abstract
The motivation for conducting this study is to investigate the potential of Cloud e-learning to address the high-cost and high-complexity challenges of conventional learning methods for the upgraded learning processes in higher education. The overall direction of this research is driven towards how the actual usage of Cloud e-learning module affects students' perceptions and academic performance. A Cloud e-learning module is designed and developed to promote optimised resource utilisation in the e-learning processes in higher education. A pretest-posttest method was adopted to study the impact of Cloud e-learning usage among students and whether the diffusion of Cloud e-learning has caused a change in students' perceptions. The pretest-posttest results and students' academic performance were then analysed to examine the impact from the actual usage of Cloud e-learning module. The findings reveal that the change of students' perceptions is time variant, indicating students' mixed perceptions on the usage of Cloud e-learning module. Analysis evidently reveals that the use of Cloud e-learning improved students' learning performance in theoretical subjects. This research is useful to educators and ICT practitioners in making informed decisions in adopting the right ICT infrastructures to support e-learning in higher education.
- Published
- 2024
5. Rethinking How People Learn: A Holistic Framework for Effective Learning Design
- Author
Minhong Wang
- Abstract
Learning is an integral part of being human. How people learn has long been discussed, revealed in many learning theories, investigated in numerous studies, and demonstrated in extensive practices. The goal of this article is to rethink how people learn from four fundamental perspectives, that is, learning by interaction with content (C), learning by interaction with other people (O), learning by interaction with self (S), and learning by interaction with tasks or practices (T), so-called COST model. This framework offers a high-level view of human learning and the role of technology in human learning. Moreover, it serves as a guide for effective design of learning experiences, learning environments, and learning approaches, where technology has become a crucial component.
- Published
- 2024
6. Quest for Equitable Education in Phases: Insights from an NGO in China
- Author
Shirley Pan and Bo Wang
- Abstract
Among the East Asian nations, a recurring predicament faced by educational institutions is that of providing inclusive but high-quality education. Active involvement of non-governmental organizations (NGOs) in education is valuable in China. Adream was such an NGO on education in China, established in 2008 with a singular and noble objective: promotion of equitable access to quality education within the disadvantaged regions of China. The trajectory of Adream's endeavor to secure equitable access to quality education in rural China stands as a compelling exemplar of the transformative potential that NGOs wield within the realm of education.
- Published
- 2024
7. Loss of Schooling from Tropical Cyclones: Evidence from 13 Low- and Middle-Income Countries. EdWorkingPaper No. 24-980
- Author
Annenberg Institute for School Reform at Brown University, Renzhi Jing, Sam Heft-Neal, Zetianyu Wang, Jie Chen, Minghao Qiu, Isaac M. Opper, Zachary Wagner, and Eran Bendavid
- Abstract
Increasing educational attainment is one of the most important and effective tools for health and economic improvements. The extent to which extreme climate events disrupt education, resulting in fewer years of schooling and reduced educational attainment, remains under-studied. Children in low- and middle-income countries may be uniquely vulnerable to loss of schooling after such disasters due to the poor physical condition of schools and the lack of resources to rebuild and mitigate unexpected household shocks. Our analysis assesses this overlooked social cost of tropical cyclones on schooling attainment. We study the education records of nearly 5.1 million people living in 13 low- and middle-income countries that were exposed to tropical cyclones between 1954-2010. We find that exposure to tropical cyclones during preschool age is associated with a 2.7 percentage point decrease in primary school enrollment on average (14.2% decrease), with larger effects from more intense storms (up to 28% decrease for the most intense storms). These effects are more pronounced among school-age girls compared to boys and are greater in areas less accustomed to experiencing tropical cyclones. We estimate that, across all LMICs, tropical cyclone exposure has resulted in more than 410,000 children not attending primary school in the last 20 years, leading to a reduction of more than 4.1 million total years of schooling. These impacts, identified among some of the world's poorest populations, may grow in importance as exposure to severe tropical cyclones is projected to increase with climate change.
- Published
- 2024
8. LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation
- Author
Li, Bowen, Li, Zhaoyu, Du, Qiwei, Luo, Jinqi, Wang, Wenshan, Xie, Yaqi, Stepputtis, Simon, Wang, Chen, Sycara, Katia P., Ravikumar, Pradeep Kumar, Gray, Alexander G., Si, Xujie, and Scherer, Sebastian
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Recent years have witnessed the rapid development of Neuro-Symbolic (NeSy) AI systems, which integrate symbolic reasoning into deep neural networks. However, most of the existing benchmarks for NeSy AI fail to provide long-horizon reasoning tasks with complex multi-agent interactions. Furthermore, they are usually constrained by fixed and simplistic logical rules over limited entities, making them far from real-world complexities. To address these crucial gaps, we introduce LogiCity, the first simulator based on customizable first-order logic (FOL) for an urban-like environment with multiple dynamic agents. LogiCity models diverse urban elements using semantic and spatial concepts, such as IsAmbulance(X) and IsClose(X, Y). These concepts are used to define FOL rules that govern the behavior of various agents. Since the concepts and rules are abstractions, they can be universally applied to cities with any agent compositions, facilitating the instantiation of diverse scenarios. Besides, a key feature of LogiCity is its support for user-configurable abstractions, enabling customizable simulation complexities for logical reasoning. To explore various aspects of NeSy AI, LogiCity introduces two tasks: one features long-horizon sequential decision-making, and the other focuses on one-step visual reasoning, varying in difficulty and agent behaviors. Our extensive evaluation reveals the advantage of NeSy frameworks in abstract reasoning. Moreover, we highlight the significant challenges of handling more complex abstractions in long-horizon multi-agent scenarios or under high-dimensional, imbalanced data. With its flexible design, various features, and newly raised challenges, we believe LogiCity represents a pivotal step forward in advancing the next generation of NeSy AI. All the code and data are open-sourced at our website.
- Comment
25 pages, 8 figures
- Published
- 2024
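The LogiCity abstract above describes agent behavior governed by first-order logic rules built from concepts such as IsAmbulance(X) and IsClose(X, Y). A minimal sketch of that idea follows. The `Agent` class, the distance radius, and the `must_stop` rule are hypothetical illustrations and are not taken from the LogiCity code base.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Agent:
    name: str
    kind: str  # e.g. "ambulance" or "car" (illustrative categories)
    x: float
    y: float


def is_ambulance(a: Agent) -> bool:
    # Semantic concept: IsAmbulance(X)
    return a.kind == "ambulance"


def is_close(a: Agent, b: Agent, radius: float = 2.0) -> bool:
    # Spatial concept: IsClose(X, Y); the radius is an assumed parameter
    return a.name != b.name and hypot(a.x - b.x, a.y - b.y) <= radius


def must_stop(x: Agent, agents: list[Agent]) -> bool:
    # Hypothetical rule:
    # Stop(X) <- not IsAmbulance(X) and exists Y. IsAmbulance(Y) and IsClose(X, Y)
    return (not is_ambulance(x)) and any(
        is_ambulance(y) and is_close(x, y) for y in agents
    )


agents = [
    Agent("a1", "ambulance", 0.0, 0.0),
    Agent("c1", "car", 1.0, 1.0),
    Agent("c2", "car", 5.0, 5.0),
]
decisions = {a.name: must_stop(a, agents) for a in agents}
print(decisions)
```

Because the rule refers only to abstract concepts, the same function applies to any list of agents, which mirrors the abstract's point that rules transfer across agent compositions.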
9. Towards Multi-Source Retrieval-Augmented Generation via Synergizing Reasoning and Preference-Driven Retrieval
- Author
Zhao, Qingfei, Wang, Ruobing, Wang, Xin, Zha, Daren, and Mu, Nan
- Subjects
Computer Science - Computation and Language
- Abstract
Retrieval-Augmented Generation (RAG) has emerged as a reliable external knowledge augmentation technique to mitigate hallucination issues and parameterized knowledge limitations in Large Language Models (LLMs). Existing Adaptive RAG (ARAG) systems struggle to effectively explore multiple retrieval sources due to their inability to select the right source at the right time. To address this, we propose a multi-source ARAG framework, termed MSPR, which synergizes reasoning and preference-driven retrieval to adaptively decide "when and what to retrieve" and "which retrieval source to use". To better adapt to retrieval sources of differing characteristics, we also employ retrieval action adjustment and answer feedback strategy. They enable our framework to fully explore the high-quality primary source while supplementing it with secondary sources at the right time. Extensive and multi-dimensional experiments conducted on three datasets demonstrate the superiority and effectiveness of MSPR.
- Comment
5 pages, 1 figure
- Published
- 2024
10. Beyond Utility: Evaluating LLM as Recommender
- Author
Jiang, Chumeng, Wang, Jiayin, Ma, Weizhi, Clarke, Charles L. A., Wang, Shuai, Wu, Chuhan, and Zhang, Min
- Subjects
Computer Science - Information Retrieval
- Abstract
With the rapid development of Large Language Models (LLMs), recent studies employed LLMs as recommenders to provide personalized information services for distinct users. Despite efforts to improve the accuracy of LLM-based recommendation models, relatively little attention is paid to beyond-utility dimensions. Moreover, there are unique evaluation aspects of LLM-based recommendation models, which have been largely ignored. To bridge this gap, we explore four new evaluation dimensions and propose a multidimensional evaluation framework. The new evaluation dimensions include: 1) history length sensitivity, 2) candidate position bias, 3) generation-involved performance, and 4) hallucinations. All four dimensions have the potential to impact performance, but are largely unnecessary for consideration in traditional systems. Using this multidimensional evaluation framework, along with traditional aspects, we evaluate the performance of seven LLM-based recommenders, with three prompting strategies, comparing them with six traditional models on both ranking and re-ranking tasks on four datasets. We find that LLMs excel at handling tasks with prior knowledge and shorter input histories in the ranking setting, and perform better in the re-ranking setting, beating traditional models across multiple dimensions. However, LLMs exhibit substantial candidate position bias issues, and some models hallucinate non-existent items much more often than others. We intend our evaluation framework and observations to benefit future research on the use of LLMs as recommenders. The code and data are available at https://github.com/JiangDeccc/EvaLLMasRecommender.
- Published
- 2024
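The "Beyond Utility" abstract above names candidate position bias as an evaluation dimension. One simple way to probe it, sketched here under assumptions, is to reorder the same candidates and count how often the top pick changes. The function `position_bias_rate` and both toy recommenders are hypothetical and are not the metric or models defined in the paper.

```python
from itertools import permutations


def position_bias_rate(recommend, candidates):
    """Fraction of candidate orderings whose pick differs from the pick
    made under the original ordering (0.0 means order-invariant)."""
    baseline = recommend(list(candidates))
    orders = list(permutations(candidates))
    changed = sum(recommend(list(order)) != baseline for order in orders)
    return changed / len(orders)


def first_slot_recommender(cands):
    # Toy recommender that always picks whatever is listed first
    return cands[0]


def order_invariant_recommender(cands):
    # Toy recommender whose pick does not depend on the ordering
    return min(cands)


items = ["a", "b", "c"]
biased_rate = position_bias_rate(first_slot_recommender, items)
invariant_rate = position_bias_rate(order_invariant_recommender, items)
print(biased_rate, invariant_rate)
```

Enumerating all permutations only works for small candidate sets; a larger evaluation would presumably sample orderings instead.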
11. Efficient optimization of plasma surface high harmonic generation by an improved Bayesian strategy
- Author
Fan, Lili, Wang, Ziwei, Liao, Chenfei, and Wang, Jingwei
- Subjects
Physics - Plasma Physics, Mathematical Physics
- Abstract
Plasma surface high-order harmonics generation (SHHG) driven by intense laser pulses on plasma targets enables a high-quality extreme ultraviolet source with high pulse energy and outstanding spatiotemporal coherence. Optimizing the performance of SHHG is important for its applications in single-shot imaging and absorption spectroscopy. In this work, we demonstrate the optimization of laser-driven SHHG by an improved Bayesian strategy in conjunction with particle-in-cell simulations. A traditional Bayesian algorithm is first employed to optimize the SHHG intensity in a two-dimensional parameter space. Then an improved Bayesian strategy, using the Latin hypercube sampling technique and a dynamic acquisition strategy, is developed to overcome the curse of dimensionality and the risk of local optima in a high-dimensional space optimization. The improved Bayesian optimization approach is efficient and robust in three-dimensionally optimizing the harmonic ellipticity, paving the way for the upcoming SHHG experiments with a considerable repetition rate.
- Comment
7 pages, 6 figures
- Published
- 2024
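The SHHG abstract above mentions Latin hypercube sampling as part of its improved Bayesian strategy. The sketch below shows the standard technique on the unit cube only. How the authors map samples to laser or plasma parameters, and their dynamic acquisition strategy, are not covered here, and the function name `latin_hypercube` is an illustrative choice.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims such that, in every
    dimension, each of the n_samples equal-width strata holds one point."""
    columns = []
    for _ in range(n_dims):
        # Draw one value uniformly inside each stratum [i/n, (i+1)/n)
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


samples = latin_hypercube(5, 3, random.Random(0))
for point in samples:
    print(point)
```

Compared with plain uniform sampling, this guarantees that the initial design covers every stratum of every parameter, which is the usual motivation for using it to seed an optimizer in higher dimensions.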
12. BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments
- Author
Wang, Xinghao, Wang, Pengyu, Wang, Bo, Zhang, Dong, Zhou, Yunhua, and Qiu, Xipeng
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Large language models (LLMs) have revolutionized numerous applications, yet their deployment remains challenged by memory constraints on local devices. While scaling laws have enhanced LLM capabilities, the primary bottleneck has shifted from capability to availability, emphasizing the need for efficient memory management. Traditional compression methods, such as quantization, often require predefined compression ratios and separate compression processes for each setting, complicating deployment in variable memory environments. In this paper, we introduce BitStack, a novel, training-free weight compression approach that enables megabyte-level trade-offs between memory usage and model performance. By leveraging weight decomposition, BitStack can dynamically adjust the model size with minimal transmission between running memory and storage devices. Our approach iteratively decomposes weight matrices while considering the significance of each parameter, resulting in an approximately 1-bit per parameter residual block in each decomposition iteration. These blocks are sorted and stacked in storage as basic transmission units, with different quantities loaded based on current memory availability. Extensive experiments across a wide range of tasks demonstrate that, despite offering fine-grained size control, BitStack consistently matches or surpasses strong quantization baselines, particularly at extreme compression ratios. To the best of our knowledge, this is the first decomposition-based method that effectively bridges the gap to practical compression techniques like quantization. Code is available at https://github.com/xinghaow99/BitStack.
- Published
- 2024
13. First Proof of Principle Experiment for Muon Production with Ultrashort High Intensity Laser
- Author
Zhang, Feng, Deng, Li, Ge, Yanjie, Wen, Jiaxing, Cui, Bo, Feng, Ke, Wang, Hao, Wu, Chen, Pan, Ziwen, Liu, Hongjie, Deng, Zhigang, Zhang, Zongxin, Chen, Liangwen, Yan, Duo, Shan, Lianqiang, Yuan, Zongqiang, Tian, Chao, Qian, Jiayi, Zhu, Jiacheng, Xu, Yi, Yu, Yuhong, Zhang, Xueheng, Yang, Lei, Zhou, Weimin, Gu, Yuqiu, Wang, Wentao, Leng, Yuxin, Sun, Zhiyu, and Li, Ruxin
- Subjects
Physics - Accelerator Physics, High Energy Physics - Experiment
- Abstract
Muons, which play a crucial role in both fundamental and applied physics, have traditionally been generated through proton accelerators or from cosmic rays. With the advent of ultra-short high-intensity lasers capable of accelerating electrons to GeV levels, it has become possible to generate muons in laser laboratories. In this work, we show the first proof of principle experiment for novel muon production with an ultra-short, high-intensity laser device through GeV electron beam bombardment on a lead converter target. The muon physical signal is confirmed by measuring its lifetime, which is the first clear demonstration of laser-produced muons. Geant4 simulations were employed to investigate the photo-production, electro-production, and Bethe-Heitler processes responsible for muon generation and their subsequent detection. The results show that the dominant contributions of muons are attributed to the photo-production/electro-production and a significant yield of muons up to 0.01 $\mu$/$e^-$ out of the converter target could be achieved. This laser muon source is compact and features ultra-short pulses and high flux. Moreover, its implementation in a small laser laboratory is relatively straightforward, significantly reducing the barriers to entry for research in areas such as muonic X-ray elemental analysis, muon spin spectroscopy and so on.
- Published
- 2024
14. Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection
- Author
Li, Ke, Dong, Fuyu, Wang, Di, Li, Shaofeng, Wang, Quan, Gao, Xinbo, and Chua, Tat-Seng
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Remote sensing change detection aims to perceive changes occurring on the Earth's surface from remote sensing data in different periods, and feed these changes back to humans. However, most existing methods only focus on detecting change regions, lacking the ability to interact with users to identify changes that the users expect. In this paper, we introduce a new task named Change Detection Question Answering and Grounding (CDQAG), which extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence. To this end, we construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks. It encompasses 10 essential land-cover categories and 8 comprehensive question types, which provides a large-scale and diverse dataset for remote sensing applications. Based on this, we present VisTA, a simple yet effective baseline method that unifies the tasks of question answering and grounding by delivering both visual and textual answers. Our method achieves state-of-the-art results on both the classic CDVQA and the proposed CDQAG datasets. Extensive qualitative and quantitative experimental results provide useful insights for the development of better CDQAG models, and we hope that our work can inspire further research in this important yet underexplored direction. The proposed benchmark dataset and method are available at https://github.com/like413/VisTA.
- Published
- 2024
15. What is Wrong with Perplexity for Long-context Language Modeling?
- Author
Fang, Lizhe, Wang, Yifei, Liu, Zhaoyang, Zhang, Chenheng, Jegelka, Stefanie, Gao, Jinyang, Ding, Bolin, and Wang, Yisen
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Handling long-context inputs is crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning. While recent approaches have extended the context windows of LLMs and employed perplexity (PPL) as a standard evaluation metric, PPL has proven unreliable for assessing long-context capabilities. The underlying cause of this limitation has remained unclear. In this work, we provide a comprehensive explanation for this issue. We find that PPL overlooks key tokens, which are essential for long-context understanding, by averaging across all tokens and thereby obscuring the true performance of models in long-context scenarios. To address this, we propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them. Our experiments demonstrate that LongPPL strongly correlates with performance on various long-context benchmarks (e.g., Pearson correlation of -0.96), significantly outperforming traditional PPL in predictive accuracy. Additionally, we introduce LongCE (Long-context Cross-Entropy) loss, a re-weighting strategy for fine-tuning that prioritizes key tokens, leading to consistent improvements across diverse benchmarks. In summary, these contributions offer deeper insights into the limitations of PPL and present effective solutions for accurately evaluating and enhancing the long-context capabilities of LLMs. Code is available at https://github.com/PKU-ML/LongPPL.
- Published
- 2024
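The LongPPL abstract above argues that averaging over all tokens hides poor performance on the few key tokens that depend on long context. The sketch below illustrates that effect with made-up per-token log-probabilities. The selection rule (log-probability gain above a fixed threshold) is an assumed stand-in for the paper's long-short contrastive criterion, and the function names are hypothetical.

```python
import math


def perplexity(logprobs):
    # Standard perplexity: exp of the negative mean log-probability
    return math.exp(-sum(logprobs) / len(logprobs))


def key_token_perplexity(long_lp, short_lp, threshold=1.0):
    """Perplexity restricted to tokens whose log-probability improves by
    more than `threshold` when the long context is available.
    The threshold rule is an assumption made for illustration."""
    key = [l for l, s in zip(long_lp, short_lp) if l - s > threshold]
    if not key:
        raise ValueError("no key tokens found at this threshold")
    return perplexity(key)


# Made-up log-probabilities for four tokens under long and short context
long_context_lp = [-0.1, -0.2, -3.0, -0.1]
short_context_lp = [-0.1, -0.3, -6.0, -0.2]

overall_ppl = perplexity(long_context_lp)
key_ppl = key_token_perplexity(long_context_lp, short_context_lp)
print(overall_ppl, key_ppl)
```

In this toy input only the third token benefits strongly from long context, and it is also the token the model predicts worst, so the key-token value is much higher than the all-token average.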
16. Reverse Attitude Statistics Based Star Map Identification Method
- Author
Dong, Shunmei, Wang, Qinglong, Wang, Haiqing, and Wang, Qianqian
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The star tracker is generally affected by the atmospheric background light and the aerodynamic environment when working in near space, which results in missing stars or false stars. Moreover, high-speed maneuvering may cause star trailing, which reduces the accuracy of the star position. To address the challenges for starmap identification, a reverse attitude statistics based method is proposed to handle position noise, false stars, and missing stars. In contrast to existing methods, which match before solving for attitude, this method introduces attitude solving into the matching process, and obtains the final match and the correct attitude simultaneously by frequency statistics. Firstly, based on stable angular distance features, the initial matching is obtained by utilizing spatial hash indexing. Then, the dual-vector attitude determination is introduced to calculate potential attitude. Finally, the star pairs are accurately matched by applying a frequency statistics filtering method. In addition, Bayesian optimization is employed to find optimal parameters under the impact of noise, which further enhances the algorithm performance. In this work, the proposed method is validated in simulation, field test and on-orbit experiment. Compared with the state-of-the-art, the identification rate is improved by more than 14.3%, and the solving time is reduced by over 28.5%.
- Comment
10 pages, 17 figures, 4 tables, 4663 words, submitted to IEEE Sensors Journal
- Published
- 2024
17. OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
- Author
Wu, Junda, Li, Xintong, Wang, Ruoyu, Xia, Yu, Xiong, Yuxin, Wang, Jianing, Yu, Tong, Chen, Xiang, Kveton, Branislav, Yao, Lina, Shang, Jingbo, and McAuley, Julian
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language
- Abstract
Offline evaluation of LLMs is crucial in understanding their capacities, though current methods remain underexplored in existing research. In this work, we focus on the offline evaluation of the chain-of-thought capabilities and show how to optimize LLMs based on the proposed evaluation method. To enable offline feedback with rich knowledge and reasoning paths, we use knowledge graphs (e.g., Wikidata5m) to provide feedback on the generated chain of thoughts. Due to the heterogeneity between LLM reasoning and KG structures, direct interaction and feedback from KGs on LLM behavior are challenging, as they require accurate entity linking and grounding of LLM-generated chains of thought in the KG. To address the above challenge, we propose an offline chain-of-thought evaluation framework, OCEAN, which models chain-of-thought reasoning in LLMs as an MDP and evaluates the policy's alignment with KG preference modeling. To overcome the reasoning heterogeneity and grounding problems, we leverage on-policy KG exploration and RL to model a KG policy that generates token-level likelihood distributions for LLM-generated chain-of-thought reasoning paths, simulating KG reasoning preference. Then we incorporate the knowledge-graph feedback on the validity and alignment of the generated reasoning paths into inverse propensity scores and propose the KG-IPS estimator. Theoretically, we prove the unbiasedness of the proposed KG-IPS estimator and provide a lower bound on its variance. With the off-policy evaluated value function, we can directly enable off-policy optimization to further enhance chain-of-thought alignment. Our empirical study shows that OCEAN can be efficiently optimized for generating chain-of-thought reasoning paths with higher estimated values without affecting LLMs' general abilities in downstream tasks or their internal knowledge.
- Comment
10 pages
- Published
- 2024
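The OCEAN abstract above builds its KG-IPS estimator on inverse propensity scores. For orientation, the sketch below implements only the textbook inverse propensity scoring estimator for off-policy evaluation. It is not the KG-IPS estimator from the paper, and the data layout and names such as `ips_estimate` are assumptions for illustration.

```python
def ips_estimate(samples):
    """Textbook IPS estimate of a target policy's value.

    Each sample is (reward, target_prob, behavior_prob), where the action
    was drawn from the behavior policy and both policies' probabilities
    for that action are known.
    """
    total = 0.0
    for reward, target_prob, behavior_prob in samples:
        if behavior_prob <= 0.0:
            raise ValueError("behavior policy must give the action nonzero probability")
        # Reweight each logged reward by the importance ratio
        total += reward * target_prob / behavior_prob
    return total / len(samples)


logged = [
    (1.0, 0.5, 0.25),  # importance ratio 2.0
    (0.0, 0.2, 0.4),   # importance ratio 0.5, zero reward
    (1.0, 0.1, 0.2),   # importance ratio 0.5
]
value = ips_estimate(logged)
print(value)
```

The reweighting is what makes the estimate unbiased in expectation when the behavior policy covers every action the target policy can take; large ratios also inflate variance, which is why a variance bound of the kind the abstract mentions matters.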
18. XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM
- Author
Wang, Xiaomeng, Wang, Nan, and Zhang, Guofeng
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
- Abstract
In this paper, we propose a flexible SLAM framework, XRDSLAM. It adopts a modular code design and a multi-process running mechanism, providing highly reusable foundational modules such as unified dataset management, 3d visualization, algorithm configuration, and metrics evaluation. It can help developers quickly build a complete SLAM system, flexibly combine different algorithm modules, and conduct standardized benchmarking for accuracy and efficiency comparison. Within this framework, we integrate several state-of-the-art SLAM algorithms with different types, including NeRF and 3DGS based SLAM, and even odometry or reconstruction algorithms, which demonstrates the flexibility and extensibility. We also conduct a comprehensive comparison and evaluation of these integrated algorithms, analyzing the characteristics of each. Finally, we contribute all the code, configuration and data to the open-source community, which aims to promote the widespread research and development of SLAM technology within the open-source ecosystem.
- Published
- 2024
19. Unveiling User Satisfaction and Creator Productivity Trade-Offs in Recommendation Platforms
- Author
Yao, Fan, Liao, Yiming, Liu, Jingzhou, Nie, Shaoliang, Wang, Qifan, Xu, Haifeng, and Wang, Hongning
- Subjects
Computer Science - Computer Science and Game Theory, Computer Science - Information Retrieval
- Abstract
On User-Generated Content (UGC) platforms, recommendation algorithms significantly impact creators' motivation to produce content as they compete for algorithmically allocated user traffic. This phenomenon subtly shapes the volume and diversity of the content pool, which is crucial for the platform's sustainability. In this work, we demonstrate, both theoretically and empirically, that a purely relevance-driven policy with low exploration strength boosts short-term user satisfaction but undermines the long-term richness of the content pool. In contrast, a more aggressive exploration policy may slightly compromise user satisfaction but promote higher content creation volume. Our findings reveal a fundamental trade-off between immediate user satisfaction and overall content production on UGC platforms. Building on this finding, we propose an efficient optimization method to identify the optimal exploration strength, balancing user and creator engagement. Our model can serve as a pre-deployment audit tool for recommendation algorithms on UGC platforms, helping to align their immediate objectives with sustainable, long-term goals.
- Published
- 2024
20. Kernel Looping: Eliminating Synchronization Boundaries for Peak Inference Performance
- Author
Koeplinger, David, Gandhi, Darshan, Nandkar, Pushkar, Sheeley, Nathan, Musaddiq, Matheen, Zhang, Leon, Goodbar, Reid, Shaffer, Matthew, Wang, Han, Wang, Angela, Wang, Mingran, and Prabhakar, Raghu
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture, D.3.4, C.1.3
- Abstract
Token generation speed is critical to power the next wave of AI inference applications. GPUs significantly underperform during token generation due to synchronization overheads at kernel boundaries, utilizing only 21% of their peak memory bandwidth. While recent dataflow architectures mitigate these overheads by enabling aggressive fusion of decoder layers into a single kernel, they too leave performance on the table due to synchronization penalties at layer boundaries. This paper presents kernel looping, a specialized global optimization technique which exploits an optimization opportunity brought by combining the unique layer-level fusion possible in modern dataflow architectures with the repeated layer structure found in language models. Kernel looping eliminates synchronization costs between consecutive calls to the same kernel by transforming these calls into a single call to a modified kernel containing a pipelined outer loop. We evaluate kernel looping on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU), a commercial dataflow accelerator for AI. Experiments demonstrate that kernel looping speeds up the decode phase of a wide array of powerful open-source models by up to 2.2$\times$ on SN40L. Kernel looping allows scaling of decode performance over multiple SN40L sockets, achieving speedups of up to 2.5$\times$. Finally, kernel looping enables SN40L to achieve over 90% of peak performance on 8 and 16 sockets and achieve a speedup of up to 3.7$\times$ over DGX H100. Kernel looping, as well as the models evaluated in this paper, are deployed in production in a commercial AI inference cloud.
- Published
- 2024
21. Online Convex Optimization with Memory and Limited Predictions
- Author
-
Ye, Lintao, Wang, Zhengmiao, Liu, Zhi-Wei, Chi, Ming, Wang, Xiaoling, and Su, Housheng
- Subjects
Mathematics - Optimization and Control, Computer Science - Machine Learning
- Abstract
We study the problem of online convex optimization with memory and predictions over a horizon $T$. At each time step, a decision maker is given some limited predictions of the cost functions from a finite window of future time steps, i.e., values of the cost function at certain decision points in the future. The decision maker then chooses an action and incurs a cost given by a convex function that depends on the actions chosen in the past. We propose an algorithm to solve this problem and show that the dynamic regret of the algorithm decays exponentially with the prediction window length. Our algorithm contains two general subroutines that work for wider classes of problems. The first subroutine can solve general online convex optimization with memory and bandit feedback with $\sqrt{T}$-dynamic regret with respect to $T$. The second subroutine is a zeroth-order method that can be used to solve general convex optimization problems with a linear convergence rate that matches the best achievable rate of first-order methods for convex optimization. The key to our algorithm design and analysis is the use of truncated Gaussian smoothing when querying the decision points for obtaining the predictions. We complement our theoretical results using numerical experiments., Comment: 28 pages, 2 figures
- Published
- 2024
22. Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations?
- Author
-
Huang, Yue, Yuan, Zhengqing, Zhou, Yujun, Guo, Kehan, Wang, Xiangqi, Zhuang, Haomin, Sun, Weixiang, Sun, Lichao, Wang, Jindong, Ye, Yanfang, and Zhang, Xiangliang
- Subjects
Computer Science - Computation and Language
- Abstract
Large Language Models (LLMs) are increasingly employed for simulations, enabling applications in role-playing agents and Computational Social Science (CSS). However, the reliability of these simulations is under-explored, which raises concerns about the trustworthiness of LLMs in these applications. In this paper, we aim to answer "How reliable is LLM-based simulation?" To address this, we introduce TrustSim, an evaluation dataset covering 10 CSS-related topics, to systematically investigate the reliability of the LLM simulation. We conducted experiments on 14 LLMs and found that inconsistencies persist in the LLM-based simulated roles. In addition, the consistency level of LLMs does not strongly correlate with their general performance. To enhance the reliability of LLMs in simulation, we proposed Adaptive Learning Rate Based ORPO (AdaORPO), a reinforcement learning-based algorithm to improve the reliability in simulation across 7 LLMs. Our research provides a foundation for future studies to explore more robust and trustworthy LLM-based simulations.
- Published
- 2024
23. MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
- Author
-
Zhu, Jie, Chen, Yixiong, Ding, Mingyu, Luo, Ping, Wang, Leye, and Wang, Jingdong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Text-to-image diffusion has attracted vast attention due to its impressive image-generation capabilities. However, when it comes to human-centric text-to-image generation, particularly in the context of faces and hands, the results often fall short of naturalness due to insufficient training priors. We alleviate the issue in this work from two perspectives. 1) From the data aspect, we carefully collect a human-centric dataset comprising over one million high-quality human-in-the-scene images and two specific sets of close-up images of faces and hands. These datasets collectively provide a rich prior knowledge base to enhance the human-centric image generation capabilities of the diffusion model. 2) On the methodological front, we propose a simple yet effective method called Mixture of Low-rank Experts (MoLE) by considering low-rank modules trained on close-up hand and face images respectively as experts. This concept draws inspiration from our observation of low-rank refinement, where a low-rank module trained by a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale. To validate the superiority of MoLE in the context of human-centric image generation compared to state-of-the-art, we construct two benchmarks and perform evaluations with diverse metrics and human studies. Datasets, model, and code are released at https://sites.google.com/view/mole4diffuser/., Comment: Published at NeurIPS 2024
- Published
- 2024
24. MassSpecGym: A benchmark for the discovery and identification of molecules
- Author
-
Bushuiev, Roman, Bushuiev, Anton, de Jonge, Niek F., Young, Adamo, Kretschmer, Fleming, Samusevich, Raman, Heirman, Janne, Wang, Fei, Zhang, Luke, Dührkop, Kai, Ludwig, Marcus, Haupt, Nils A., Kalia, Apurva, Brungs, Corinna, Schmid, Robin, Greiner, Russell, Wang, Bo, Wishart, David S., Liu, Li-Ping, Rousu, Juho, Bittremieux, Wout, Rost, Hannes, Mak, Tytus D., Hassoun, Soha, Huber, Florian, van der Hooft, Justin J. J., Stravs, Michael A., Böcker, Sebastian, Sivic, Josef, and Pluskal, Tomáš
- Subjects
Quantitative Biology - Quantitative Methods, Computer Science - Machine Learning
- Abstract
The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym -- the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: de novo molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, therefore standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at https://github.com/pluskal-lab/MassSpecGym.
- Published
- 2024
25. SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
- Author
-
Hong, Yining, Liu, Beide, Wu, Maxine, Zhai, Yuanhao, Chang, Kai-Wei, Li, Linjie, Lin, Kevin, Lin, Chung-Ching, Wang, Jianfeng, Yang, Zhengyuan, Wu, Yingnian, and Wang, Lijuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Robotics
- Abstract
Human beings are endowed with a complementary learning system, which bridges the slow learning of general world dynamics with fast storage of episodic memory from a new experience. Previous video generation models, however, primarily focus on slow learning by pre-training on vast amounts of data, overlooking the fast learning phase crucial for episodic memory storage. This oversight leads to inconsistencies across temporally distant frames when generating longer videos, as these frames fall beyond the model's context window. To this end, we introduce SlowFast-VGen, a novel dual-speed learning system for action-driven long video generation. Our approach incorporates a masked conditional video diffusion model for the slow learning of world dynamics, alongside an inference-time fast learning strategy based on a temporal LoRA module. Specifically, the fast learning process updates its temporal LoRA parameters based on local inputs and outputs, thereby efficiently storing episodic memory in its parameters. We further propose a slow-fast learning loop algorithm that seamlessly integrates the inner fast learning loop into the outer slow learning loop, enabling the recall of prior multi-episode experiences for context-aware skill learning. To facilitate the slow learning of an approximate world model, we collect a large-scale dataset of 200k videos with language action annotations, covering a wide range of scenarios. Extensive experiments show that SlowFast-VGen outperforms baselines across various metrics for action-driven video generation, achieving an FVD score of 514 compared to 782, and maintaining consistency in longer videos, with an average of 0.37 scene cuts versus 0.89. The slow-fast learning loop algorithm significantly enhances performances on long-horizon planning tasks as well. Project Website: https://slowfast-vgen.github.io
- Published
- 2024
26. Fragile non-Bloch spectrum and unconventional Green's function
- Author
-
Song, Fei, Wang, Hong-Yi, and Wang, Zhong
- Subjects
Quantum Physics, Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Quantum Gases, Physics - Optics
- Abstract
In non-Hermitian systems, it is a counterintuitive feature of the non-Hermitian skin effect (NHSE) that the energy spectrum and eigenstates can be totally different under open or periodic boundary conditions, suggesting that non-Hermitian spectra can be extremely sensitive to non-local perturbations. Here, we show that a wide range of non-Hermitian models with NHSE can even be highly sensitive to local perturbation under open boundary conditions. The spectrum of these models is so fragile that it can be significantly modified by adding only exponentially small perturbations on boundaries. Intriguingly, we show that such fragile spectra are quantified by the Green's function exhibiting unconventional V-shape asymptotic behaviors. Accordingly, bi-directional exponential amplification can be observed. As an interesting consequence, we find a real-to-complex transition of the bulk spectrum induced by exponentially small boundary perturbations. Finally, we reveal a hierarchy of the asymptotic behaviors of non-Hermitian Green's functions, which restricts the frequency range for the presence of unconventional Green's functions., Comment: 7 pages, 3 figures. Supplemental Material will be added to the next version
- Published
- 2024
27. TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
- Author
-
Wang, Haiyang, Fan, Yue, Naeem, Muhammad Ferjad, Xian, Yongqin, Lenssen, Jan Eric, Wang, Liwei, Tombari, Federico, and Schiele, Bernt
- Subjects
Computer Science - Machine Learning
- Abstract
Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high computational costs and becomes unsustainable. To overcome this problem, we introduce TokenFormer, a natively scalable architecture that leverages the attention mechanism not only for computations among input tokens but also for interactions between tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This reformulation allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124M to 1.4B parameters by incrementally adding new key-value parameter pairs, achieving performance comparable to Transformers trained from scratch while greatly reducing training costs. Code and models are available at https://github.com/Haiyang-W/TokenFormer.
- Published
- 2024
28. NASM: Neural Anisotropic Surface Meshing
- Author
-
Li, Hongbo, Zhu, Haikuan, Zhong, Sikai, Wang, Ningna, Lin, Cheng, Guo, Xiaohu, Xin, Shiqing, Wang, Wenping, Hua, Jing, and Zhong, Zichun
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computational Geometry, Computer Science - Graphics
- Abstract
This paper introduces a new learning-based method, NASM, for anisotropic surface meshing. Our key idea is to propose a graph neural network to embed an input mesh into a high-dimensional (high-d) Euclidean embedding space to preserve the curvature-based anisotropic metric by using a dot product loss between high-d edge vectors. This can dramatically reduce the computational time and increase the scalability. Then, we propose a novel feature-sensitive remeshing on the generated high-d embedding to automatically capture sharp geometric features. We define a high-d normal metric, and then derive an automatic differentiation on a high-d centroidal Voronoi tessellation (CVT) optimization with the normal metric to simultaneously preserve the geometric features and curvature anisotropy exhibited in the original 3D shapes. To our knowledge, this is the first time that a deep learning framework and a large dataset are proposed to construct a high-d Euclidean embedding space for 3D anisotropic surface meshing. Experimental results are evaluated and compared with the state-of-the-art in anisotropic surface meshing on a large number of surface models from the Thingi10K dataset, and tested on extensive unseen 3D shapes from the Multi-Garment Network dataset and the FAUST human dataset., Comment: SIGGRAPH Asia 2024 (Conference Track)
- Published
- 2024
29. Towards Constraint-aware Learning for Resource Allocation in NFV-enabled Networks
- Author
-
Wang, Tianfu, Yang, Long, Wang, Chao, Qin, Chuan, Deng, Liwei, Shen, Li, and Xiong, Hui
- Subjects
Computer Science - Networking and Internet Architecture
- Abstract
Virtual Network Embedding (VNE) is a challenging combinatorial optimization problem that refers to resource allocation associated with hard and multifaceted constraints in network function virtualization (NFV). Existing works for VNE struggle to handle such complex constraints, leading to compromised system performance and stability. In this paper, we propose a CONstraint-Aware Learning framework for VNE, named CONAL, to achieve efficient constraint management. Concretely, we formulate the VNE problem as a constrained Markov decision process with violation tolerance. This modeling approach aims to improve both resource utilization and solution feasibility by precisely evaluating solution quality and the degree of constraint violation. We also propose a reachability-guided optimization with an adaptive reachability budget method that dynamically assigns budget values. This method achieves persistent zero violation to guarantee the feasibility of VNE solutions and more stable policy optimization by handling instances without any feasible solution. Furthermore, we propose a constraint-aware graph representation method to efficiently learn cross-graph relations and constrained path connectivity in VNE. Finally, extensive experimental results demonstrate the superiority of our proposed method over state-of-the-art baselines. Our code is available at https://github.com/GeminiLight/conal-vne.
- Published
- 2024
30. Observation of Anderson localization transitions in a two-dimensional conjugated metal-organic framework
- Author
-
Cheng, Jinhao, Wang, Chen, He, Wenxue, Wang, Jiaojiao, Pang, Yifan, Yang, Fan, Ding, Shuaishuai, Ren, Hechen, and Hu, Wenping
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
- Abstract
Anderson localization transitions are a universal quantum phenomenon sensitive to the disorder and dimensionality of electronic systems. Over the past decades, this intriguing topic has inspired overwhelmingly more theoretical studies than experimental verifications due to the difficulty of controlling a material's disorder or dimensionality without modifying its fundamental electronic properties. Organic crystals with their rich disorders would be terrific playgrounds to investigate such disorder-driven phase transitions except for their low conductivities which usually prohibit low-temperature measurements. Here, we conduct systematic transport experiments in mesoscopic devices made with copper benzenehexathiol thin films across a wide range of thicknesses. We find metal-insulator transitions both among three-dimensional samples with different disorder strengths and between three-dimensional and quasi-two-dimensional samples. Temperature-dependence analysis of the conductivities corroborates the dimensionality crossover. Moreover, our theoretical modeling provides a basis for understanding both types of metal-insulator transitions within the framework of Anderson localization transitions. Our findings establish for the first time that organic crystals such as conductive metal-organic frameworks can exhibit such quantum interference effects. With organic materials' versatile chemical designs and crystalline structures, our work opens new avenues to search for novel quantum phenomena in organic material platforms.
- Published
- 2024
31. Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration
- Author
-
Guan, Yanchu, Wang, Dong, Wang, Yan, Wang, Haiqing, Sun, Renen, Zhuang, Chenyi, Gu, Jinjie, and Chu, Zhixuan
- Subjects
Computer Science - Computation and Language
- Abstract
Autonomous mobile app interaction has become increasingly important with the growing complexity of mobile applications. Developing intelligent agents that can effectively navigate and interact with mobile apps remains a significant challenge. In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning from demonstrations to create intelligent and explainable agents for autonomous mobile app interaction. EBC-LLMAgent consists of three core modules: Demonstration Encoding, Code Generation, and UI Mapping, which work synergistically to capture user demonstrations, generate executable code, and establish accurate correspondence between code and UI elements. We introduce the Behavior Cloning Chain Fusion technique to enhance the generalization capabilities of the agent. Extensive experiments on five popular mobile applications from diverse domains demonstrate the superior performance of EBC-LLMAgent, achieving high success rates in task completion, efficient generalization to unseen scenarios, and the generation of meaningful explanations., Comment: 20 pages
- Published
- 2024
32. Non-contact Dexterous Micromanipulation with Multiple Optoelectronic Robots
- Author
-
Jia, Yongyi, Miao, Shu, Wang, Ao, Ni, Caiding, Feng, Lin, Wang, Xiaowo, and Li, Xiang
- Subjects
Computer Science - Robotics
- Abstract
Micromanipulation systems leverage automation and robotic technologies to improve the precision, repeatability, and efficiency of various tasks at the microscale. However, current approaches are typically limited to specific objects or tasks, which necessitates the use of custom tools and specialized grasping methods. This paper proposes a novel non-contact micromanipulation method based on optoelectronic technologies. The proposed method utilizes repulsive dielectrophoretic forces generated in the optoelectronic field to drive a microrobot, enabling the microrobot to push the target object in a cluttered environment without physical contact. The non-contact feature can minimize the risks of potential damage, contamination, or adhesion while largely improving the flexibility of manipulation. The feature enables the use of a general tool for indirect object manipulation, eliminating the need for specialized tools. A series of simulation studies and real-world experiments -- including non-contact trajectory tracking, obstacle avoidance, and reciprocal avoidance between multiple microrobots -- are conducted to validate the performance of the proposed method. The proposed formulation provides a general and dexterous solution for a range of objects and tasks at the micro scale., Comment: 8 pages, 10 figures
- Published
- 2024
33. Coexistence of superconductivity and sliding polar metal state in HgPSe3
- Author
-
Yu, Xiaohui, Zhong, Wei, Kawaguchi, Saori, Kadobayashi, Hirokazu, Wang, Xiaolin, Cheng, Zhenxiang, Chen, Changfeng, Yue, Binbin, Wang, Jian-Tao, Mao, Ho-Kwang, and Hong, Fang
- Subjects
Condensed Matter - Superconductivity, Condensed Matter - Materials Science, Condensed Matter - Strongly Correlated Electrons
- Abstract
The simultaneous presence of polarity and metallicity in a material signifies an exotic polar metal state, but such materials are extremely rare, especially in bulk form, due to the mutually exclusive nature of the fundamental defining properties. Here, we report experimental findings that HgPSe3 is a robust bulk polar metal at room temperature with a chiral structure stabilized by pressure and, remarkably, this polar metal hosts superconductivity with a critical temperature Tc up to 11 K. Theoretical analysis reveals a two-step interlayer sliding-then-compressing mechanism for the coexistence of polarity and metallicity in HgPSe3. This work unveils a new paradigm for creating the bulk polar metal state and the simultaneous presence of coexisting quantum orders, raising the prospect of discovering novel emergent physics using pressure as a tuning knob., Comment: 19 pages, 4 main figures + 6 extended figures
- Published
- 2024
34. Dual Contrastive Transformer for Hierarchical Preference Modeling in Sequential Recommendation
- Author
-
Huang, Chengkai, Wang, Shoujin, Wang, Xianzhi, and Yao, Lina
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
- Abstract
Sequential recommender systems (SRSs) aim to predict the subsequent items which may interest users by comprehensively modeling users' complex preferences embedded in the sequence of user-item interactions. However, most existing SRSs model only users' low-level preference based on item ID information while ignoring the high-level preference revealed by item attribute information, such as item category. Furthermore, they often utilize limited sequence context information to predict the next item while overlooking richer inter-item semantic relations. To this end, in this paper, we propose a novel hierarchical preference modeling framework to substantially model the complex low- and high-level preference dynamics for accurate sequential recommendation. Specifically, in the framework, a novel dual-transformer module and a novel dual contrastive learning scheme have been designed to discriminatively learn users' low- and high-level preferences and to effectively enhance both low- and high-level preference learning, respectively. In addition, a novel semantics-enhanced context embedding module has been devised to generate more informative context embeddings for further improving the recommendation performance. Extensive experiments on six real-world datasets have demonstrated both the superiority of our proposed method over the state-of-the-art ones and the rationality of our design.
- Published
- 2024
35. MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning
- Author
-
Wang, Xujia, Zhao, Haiyan, Wang, Shuo, Wang, Hanqing, and Liu, Zhiyuan
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning, I.2.7
- Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have significantly improved the adaptation of LLMs to downstream tasks in a resource-efficient manner. However, in multi-task scenarios, challenges such as training imbalance and the seesaw effect frequently emerge. Mixture-of-LoRA (MoLoRA), which combines LoRA with sparse Mixture-of-Experts, mitigates some of these issues by promoting task-specific learning across experts. Despite this, MoLoRA remains inefficient in terms of training speed, parameter utilization, and overall multi-task performance. In this paper, we propose Mixture of Asymmetric Low-Rank Adaptation (MALoRA), a flexible fine-tuning framework that leverages asymmetric optimization across LoRA experts. MALoRA reduces the number of trainable parameters by 30% to 48%, increases training speed by 1.2x, and matches the computational efficiency of single-task LoRA models. Additionally, MALoRA addresses overfitting issues commonly seen in high-rank configurations, enhancing performance stability. Extensive experiments across diverse multi-task learning scenarios demonstrate that MALoRA consistently outperforms all baseline methods in both inter-domain and intra-domain tasks., Comment: 14 pages, 5 figures
- Published
- 2024
36. FuseAnyPart: Diffusion-Driven Facial Parts Swapping via Multiple Reference Images
- Author
-
Yu, Zheng, Wang, Yaohua, Cui, Siying, Zhang, Aixi, Zheng, Wei-Long, and Wang, Senzhang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Facial parts swapping aims to selectively transfer regions of interest from the source image onto the target image while keeping the rest of the target image unchanged. Most face-swapping methods, designed specifically for full-face swapping, are either unable to swap individual facial parts or significantly limited in doing so, which hinders fine-grained and customized character designs. However, designing an approach specifically for facial parts swapping is challenged by the need for a reasonable fusion of features from multiple references, which must be both efficient and effective. To overcome this challenge, FuseAnyPart is proposed to facilitate the seamless "fuse-any-part" customization of the face. In FuseAnyPart, facial parts from different people are assembled into a complete face in latent space within the Mask-based Fusion Module. Subsequently, the consolidated feature is dispatched to the Addition-based Injection Module for fusion within the UNet of the diffusion model to create novel characters. Extensive experiments qualitatively and quantitatively validate the superiority and robustness of FuseAnyPart. Source codes are available at https://github.com/Thomas-wyh/FuseAnyPart., Comment: Accepted by NeurIPS 2024 (Spotlight). Homepage: https://thomas-wyh.github.io/
- Published
- 2024
37. P$^2$C$^2$Net: PDE-Preserved Coarse Correction Network for efficient prediction of spatiotemporal dynamics
- Author
-
Wang, Qi, Ren, Pu, Zhou, Hao, Liu, Xin-Yang, Deng, Zhiwen, Zhang, Yi, Chengze, Ruizhi, Liu, Hongsheng, Wang, Zidong, Wang, Jian-Xun, Wen, Ji-Rong, Sun, Hao, and Liu, Yang
- Subjects
Mathematics - Numerical Analysis, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
When solving partial differential equations (PDEs), classical numerical methods often require fine mesh grids and small time stepping to meet stability, consistency, and convergence conditions, leading to high computational cost. Recently, machine learning methods have been increasingly utilized to solve PDE problems, but they often encounter challenges related to interpretability, generalizability, and strong dependency on rich labeled data. Hence, we introduce a new PDE-Preserved Coarse Correction Network (P$^2$C$^2$Net) to efficiently solve spatiotemporal PDE problems on coarse mesh grids in small data regimes. The model consists of two synergistic modules: (1) a trainable PDE block that learns to update the coarse solution (i.e., the system state), based on a high-order numerical scheme with boundary condition encoding, and (2) a neural network block that consistently corrects the solution on the fly. In particular, we propose a learnable symmetric Conv filter, with weights shared over the entire model, to accurately estimate the spatial derivatives of PDE based on the neural-corrected system state. The resulting physics-encoded model is capable of handling limited training data (e.g., 3--5 trajectories) and accelerates the prediction of PDE solutions on coarse spatiotemporal grids while maintaining a high accuracy. P$^2$C$^2$Net achieves consistent state-of-the-art performance with over 50% gain (e.g., in terms of relative prediction error) across four datasets covering complex reaction-diffusion processes and turbulent flows.
- Published
- 2024
38. Personalization of Large Language Models: A Survey
- Author
-
Zhang, Zhehao, Rossi, Ryan A., Kveton, Branislav, Shao, Yijia, Yang, Diyi, Zamani, Hamed, Dernoncourt, Franck, Barrow, Joe, Yu, Tong, Kim, Sungchul, Zhang, Ruiyi, Gu, Jiuxiang, Derr, Tyler, Chen, Hongjie, Wu, Junda, Chen, Xiang, Wang, Zichao, Mitra, Subrata, Lipka, Nedim, Ahmed, Nesreen, and Wang, Yu
- Subjects
Computer Science - Computation and Language
- Abstract
Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we bridge the gap between these two separate main directions for the first time by introducing a taxonomy for personalized LLM usage and summarizing the key differences and challenges. We provide a formalization of the foundations of personalized LLMs that consolidates and expands notions of personalization of LLMs, defining and discussing novel facets of personalization, usage, and desiderata of personalized LLMs. We then unify the literature across these diverse fields and usage scenarios by proposing systematic taxonomies for the granularity of personalization, personalization techniques, datasets, evaluation methods, and applications of personalized LLMs. Finally, we highlight challenges and important open problems that remain to be addressed. By unifying and surveying recent research using the proposed taxonomies, we aim to provide a clear guide to the existing literature and different facets of personalization in LLMs, empowering both researchers and practitioners.
- Published
- 2024
39. Modeling Temporal Positive and Negative Excitation for Sequential Recommendation
- Author
-
Huang, Chengkai, Wang, Shoujin, Wang, Xianzhi, and Yao, Lina
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
- Abstract
Sequential recommendation aims to predict the next item which interests users by modeling their interest in items over time. Most of the existing works on sequential recommendation model users' dynamic interest in specific items while overlooking users' static interest revealed by some static attribute information of items, e.g., category or brand. Moreover, existing works often only consider the positive excitation of a user's historical interactions on his/her next choice among candidate items while ignoring the commonly existing negative excitation, resulting in insufficient modeling of dynamic interest. Overlooking static interest and negative excitation leads to incomplete interest modeling and thus impedes the recommendation performance. To this end, in this paper, we propose modeling both static interest and negative excitation for dynamic interest to further improve the recommendation performance. Accordingly, we design a novel Static-Dynamic Interest Learning (SDIL) framework featuring a novel Temporal Positive and Negative Excitation Modeling (TPNE) module for accurate sequential recommendation. TPNE is specially designed for comprehensively modeling dynamic interest based on temporal positive and negative excitation learning. Extensive experiments on three real-world datasets show that SDIL can effectively capture both static and dynamic interest and outperforms state-of-the-art baselines.
- Published
- 2024
40. ReDAN: An Empirical Study on Remote DoS Attacks against NAT Networks
- Author
-
Feng, Xuewei, Yang, Yuxiang, Li, Qi, Zhan, Xingxiang, Sun, Kun, Wang, Ziqiang, Wang, Ao, Du, Ganqiu, and Xu, Ke
- Subjects
Computer Science - Cryptography and Security, Computer Science - Networking and Internet Architecture
- Abstract
In this paper, we conduct an empirical study on remote DoS attacks targeting NAT networks. We show that Internet attackers operating outside local NAT networks can remotely identify a NAT device and subsequently terminate TCP connections initiated from the identified NAT device to external servers. Our attack involves two steps. First, we identify NAT devices on the Internet by exploiting inadequacies in the PMTUD mechanism within NAT specifications. This deficiency creates a fundamental side channel that allows Internet attackers to distinguish whether a public IPv4 address serves a NAT device or a separate IP host, aiding in the identification of target NAT devices. Second, we launch a remote DoS attack to terminate TCP connections on the identified NAT devices. While recent NAT implementations may include protective measures, such as packet legitimacy validation to prevent malicious manipulations on NAT mappings, we discover that these safeguards are not widely adopted in the real world. Consequently, attackers can send crafted packets to deceive NAT devices into erroneously removing innocent TCP connection mappings, thereby preventing NATed clients from accessing remote TCP servers. Our experimental results reveal widespread security vulnerabilities in existing NAT devices. After testing 8 types of router firmware and 30 commercial NAT devices from 14 vendors, we identify vulnerabilities in 6 firmware types and 29 NAT devices. Moreover, our measurements reveal a stark reality: 166 out of 180 (over 92%) tested real-world NAT networks, comprising 90 4G LTE/5G networks, 60 public Wi-Fi networks, and 30 cloud VPS networks, are susceptible to exploitation. We responsibly disclosed the vulnerabilities to affected vendors and received a significant number of acknowledgments. Finally, we propose our countermeasures against the identified DoS attack. Comment: Accepted by Network and Distributed System Security (NDSS) Symposium 2025
- Published
- 2024
41. Micro-Structures Graph-Based Point Cloud Registration for Balancing Efficiency and Accuracy
- Author
-
Zhang, Rongling, Yan, Li, Wei, Pengcheng, Xie, Hong, Wang, Pinzhuo, and Wang, Binbing
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Point Cloud Registration (PCR) is a fundamental and significant issue in photogrammetry and remote sensing, aiming to seek the optimal rigid transformation between sets of points. Achieving efficient and precise PCR poses a considerable challenge. We propose a novel micro-structures graph-based global point cloud registration method. The overall method is comprised of two stages. 1) Coarse registration (CR): We develop a graph incorporating micro-structures, employing an efficient graph-based hierarchical strategy to remove outliers for obtaining the maximal consensus set. We propose a robust GNC-Welsch estimator for optimization derived from a robust estimator to the outlier process in the Lie algebra space, achieving fast and robust alignment. 2) Fine registration (FR): To refine local alignment further, we use the octree approach to adaptive search plane features in the micro-structures. By minimizing the distance from the point-to-plane, we can obtain a more precise local alignment, and the process will also be addressed effectively by being treated as a planar adjustment algorithm combined with Anderson accelerated optimization (PA-AA). After extensive experiments on real data, our proposed method performs well on the 3DMatch and ETH datasets compared to the most advanced methods, achieving higher accuracy metrics and reducing the time cost by at least one-third.
- Published
- 2024
42. Search for $\Lambda$-$\bar{\Lambda} $ oscillation in $J/\psi\rightarrow\Lambda\bar{\Lambda}$ decay
- Author
-
BESIII Collaboration, Ablikim, M., Achasov, M. N., Adlarson, P., Afedulidis, O., Ai, X. C., Aliberti, R., Amoroso, A., An, Q., Bai, Y., Bakina, O., Balossino, I., Ban, Y., Bao, H. -R., Batozskaya, V., Begzsuren, K., Berger, N., Berlowski, M., Bertani, M., Bettoni, D., Bianchi, F., Bianco, E., Bortone, A., Boyko, I., Briere, R. A., Brueggemann, A., Cai, H., Cai, X., Calcaterra, A., Cao, G. F., Cao, N., Cetin, S. A., Chang, J. F., Che, G. R., Chelkov, G., Chen, C., Chen, C. H., Chen, Chao, Chen, G., Chen, H. S., Chen, H. Y., Chen, M. L., Chen, S. J., Chen, S. L., Chen, S. M., Chen, T., Chen, X. R., Chen, X. T., Chen, Y. B., Chen, Y. Q., Chen, Z. J., Chen, Z. Y., Choi, S. K., Cibinetto, G., Cossio, F., Cui, J. J., Dai, H. L., Dai, J. P., Dbeyssi, A., de Boer, R. E., Dedovich, D., Deng, C. Q., Deng, Z. Y., Denig, A., Denysenko, I., Destefanis, M., De Mori, F., Ding, B., Ding, X. X., Ding, Y., Dong, J., Dong, L. Y., Dong, M. Y., Dong, X., Du, M. C., Du, S. X., Duan, Y. Y., Duan, Z. H., Egorov, P., Fan, Y. H., Fang, J., Fang, S. S., Fang, W. X., Fang, Y., Fang, Y. Q., Farinelli, R., Fava, L., Feldbauer, F., Felici, G., Feng, C. Q., Feng, J. H., Feng, Y. T., Fritsch, M., Fu, C. D., Fu, J. L., Fu, Y. W., Gao, H., Gao, X. B., Gao, Y. N., Gao, Yang, Garbolino, S., Garzia, I., Ge, L., Ge, P. T., Ge, Z. W., Geng, C., Gersabeck, E. M., Gilman, A., Goetzen, K., Gong, L., Gong, W. X., Gradl, W., Gramigna, S., Greco, M., Gu, M. H., Gu, Y. T., Guan, C. Y., Guo, A. Q., Guo, L. B., Guo, M. J., Guo, R. P., Guo, Y. P., Guskov, A., Gutierrez, J., Han, K. L., Han, T. T., Hanisch, F., Hao, X. Q., Harris, F. A., He, K. K., He, K. L., Heinsius, F. H., Heinz, C. H., Heng, Y. K., Herold, C., Holtmann, T., Hong, P. C., Hou, G. Y., Hou, X. T., Hou, Y. R., Hou, Z. L., Hu, B. Y., Hu, H. M., Hu, J. F., Hu, S. L., Hu, T., Hu, Y., Huang, G. S., Huang, K. X., Huang, L. Q., Huang, X. T., Huang, Y. P., Huang, Y. S., Hussain, T., Hölzken, F., Hüsken, N., der Wiesche, N. 
in, Jackson, J., Janchiv, S., Jeong, J. H., Ji, Q., Ji, Q. P., Ji, W., Ji, X. B., Ji, X. L., Ji, Y. Y., Jia, X. Q., Jia, Z. K., Jiang, D., Jiang, H. B., Jiang, P. C., Jiang, S. S., Jiang, T. J., Jiang, X. S., Jiang, Y., Jiao, J. B., Jiao, J. K., Jiao, Z., Jin, S., Jin, Y., Jing, M. Q., Jing, X. M., Johansson, T., Kabana, S., Kalantar-Nayestanaki, N., Kang, X. L., Kang, X. S., Kavatsyuk, M., Ke, B. C., Khachatryan, V., Khoukaz, A., Kiuchi, R., Kolcu, O. B., Kopf, B., Kuessner, M., Kui, X., Kumar, N., Kupsc, A., Kühn, W., Lane, J. J., Lavezzi, L., Lei, T. T., Lei, Z. H., Lellmann, M., Lenz, T., Li, C., Li, C. H., Li, Cheng, Li, D. M., Li, F., Li, G., Li, H. B., Li, H. J., Li, H. N., Li, Hui, Li, J. R., Li, J. S., Li, K., Li, L. J., Li, L. K., Li, Lei, Li, M. H., Li, P. R., Li, Q. M., Li, Q. X., Li, R., Li, S. X., Li, T., Li, W. D., Li, W. G., Li, X., Li, X. H., Li, X. L., Li, X. Y., Li, X. Z., Li, Y. G., Li, Z. J., Li, Z. Y., Liang, C., Liang, H., Liang, Y. F., Liang, Y. T., Liao, G. R., Liao, Y. P., Libby, J., Limphirat, A., Lin, C. C., Lin, D. X., Lin, T., Liu, B. J., Liu, B. X., Liu, C., Liu, C. X., Liu, F., Liu, F. H., Liu, Feng, Liu, G. M., Liu, H., Liu, H. B., Liu, H. H., Liu, H. M., Liu, Huihui, Liu, J. B., Liu, J. Y., Liu, K., Liu, K. Y., Liu, Ke, Liu, L., Liu, L. C., Liu, Lu, Liu, M. H., Liu, P. L., Liu, Q., Liu, S. B., Liu, T., Liu, W. K., Liu, W. M., Liu, X., Liu, Y., Liu, Y. B., Liu, Z. A., Liu, Z. D., Liu, Z. Q., Lou, X. C., Lu, F. X., Lu, H. J., Lu, J. G., Lu, X. L., Lu, Y., Lu, Y. P., Lu, Z. H., Luo, C. L., Luo, J. R., Luo, M. X., Luo, T., Luo, X. L., Lyu, X. R., Lyu, Y. F., Ma, F. C., Ma, H., Ma, H. L., Ma, J. L., Ma, L. L., Ma, L. R., Ma, M. M., Ma, Q. M., Ma, R. Q., Ma, T., Ma, X. T., Ma, X. Y., Ma, Y., Ma, Y. M., Maas, F. E., Maggiora, M., Malde, S., Mao, Y. J., Mao, Z. P., Marcello, S., Meng, Z. X., Messchendorp, J. G., Mezzadri, G., Miao, H., Min, T. J., Mitchell, R. E., Mo, X. H., Moses, B., Muchnoi, N. 
Yu., Muskalla, J., Nefedov, Y., Nerling, F., Nie, L. S., Nikolaev, I. B., Ning, Z., Nisar, S., Niu, Q. L., Niu, W. D., Niu, Y., Olsen, S. L., Ouyang, Q., Pacetti, S., Pan, X., Pan, Y., Pathak, A., Pei, Y. P., Pelizaeus, M., Peng, H. P., Peng, Y. Y., Peters, K., Ping, J. L., Ping, R. G., Plura, S., Prasad, V., Qi, F. Z., Qi, H., Qi, H. R., Qi, M., Qi, T. Y., Qian, S., Qian, W. B., Qiao, C. F., Qiao, X. K., Qin, J. J., Qin, L. Q., Qin, L. Y., Qin, X. P., Qin, X. S., Qin, Z. H., Qiu, J. F., Qu, Z. H., Redmer, C. F., Ren, K. J., Rivetti, A., Rolo, M., Rong, G., Rosner, Ch., Ruan, S. N., Salone, N., Sarantsev, A., Schelhaas, Y., Schoenning, K., Scodeggio, M., Shan, K. Y., Shan, W., Shan, X. Y., Shang, Z. J., Shangguan, J. F., Shao, L. G., Shao, M., Shen, C. P., Shen, H. F., Shen, W. H., Shen, X. Y., Shi, B. A., Shi, H., Shi, H. C., Shi, J. L., Shi, J. Y., Shi, Q. Q., Shi, S. Y., Shi, X., Song, J. J., Song, T. Z., Song, W. M., Song, Y. J., Song, Y. X., Sosio, S., Spataro, S., Stieler, F., Su, Y. J., Sun, G. B., Sun, G. X., Sun, H., Sun, H. K., Sun, J. F., Sun, K., Sun, L., Sun, S. S., Sun, T., Sun, W. Y., Sun, Y., Sun, Y. J., Sun, Y. Z., Sun, Z. Q., Sun, Z. T., Tang, C. J., Tang, G. Y., Tang, J., Tang, M., Tang, Y. A., Tao, L. Y., Tao, Q. T., Tat, M., Teng, J. X., Thoren, V., Tian, W. H., Tian, Y., Tian, Z. F., Uman, I., Wan, Y., Wang, S. J., Wang, B., Wang, B. L., Wang, Bo, Wang, D. Y., Wang, F., Wang, H. J., Wang, J. J., Wang, J. P., Wang, K., Wang, L. L., Wang, M., Wang, N. Y., Wang, S., Wang, T., Wang, T. J., Wang, W., Wang, W. P., Wang, X., Wang, X. F., Wang, X. J., Wang, X. L., Wang, X. N., Wang, Y., Wang, Y. D., Wang, Y. F., Wang, Y. L., Wang, Y. N., Wang, Y. Q., Wang, Yaqian, Wang, Yi, Wang, Z., Wang, Z. L., Wang, Z. Y., Wang, Ziyi, Wei, D. H., Weidner, F., Wen, S. P., Wen, Y. R., Wiedner, U., Wilkinson, G., Wolke, M., Wollenberg, L., Wu, C., Wu, J. F., Wu, L. H., Wu, L. J., Wu, X., Wu, X. H., Wu, Y., Wu, Y. H., Wu, Y. J., Wu, Z., Xia, L., Xian, X. M., Xiang, B. 
H., Xiang, T., Xiao, D., Xiao, G. Y., Xiao, S. Y., Xiao, Y. L., Xiao, Z. J., Xie, C., Xie, X. H., Xie, Y., Xie, Y. G., Xie, Y. H., Xie, Z. P., Xing, T. Y., Xu, C. F., Xu, C. J., Xu, G. F., Xu, H. Y., Xu, M., Xu, Q. J., Xu, Q. N., Xu, W., Xu, W. L., Xu, X. P., Xu, Y. C., Xu, Z. S., Yan, F., Yan, L., Yan, W. B., Yan, W. C., Yan, X. Q., Yang, H. J., Yang, H. L., Yang, H. X., Yang, T., Yang, Y., Yang, Y. F., Yang, Y. X., Yang, Z. W., Yao, Z. P., Ye, M., Ye, M. H., Yin, J. H., Yin, Junhao, You, Z. Y., Yu, B. X., Yu, C. X., Yu, G., Yu, J. S., Yu, T., Yu, X. D., Yu, Y. C., Yuan, C. Z., Yuan, J., Yuan, L., Yuan, S. C., Yuan, Y., Yuan, Z. Y., Yue, C. X., Zafar, A. A., Zeng, F. R., Zeng, S. H., Zeng, X., Zeng, Y., Zeng, Y. J., Zhai, X. Y., Zhai, Y. C., Zhan, Y. H., Zhang, A. Q., Zhang, B. L., Zhang, B. X., Zhang, D. H., Zhang, G. Y., Zhang, H., Zhang, H. C., Zhang, H. H., Zhang, H. Q., Zhang, H. R., Zhang, H. Y., Zhang, J., Zhang, J. J., Zhang, J. L., Zhang, J. Q., Zhang, J. S., Zhang, J. W., Zhang, J. X., Zhang, J. Y., Zhang, J. Z., Zhang, Jianyu, Zhang, L. M., Zhang, Lei, Zhang, P., Zhang, Q. Y., Zhang, R. Y., Zhang, S. H., Zhang, Shulei, Zhang, X. D., Zhang, X. M., Zhang, X. Y., Zhang, Y., Zhang, Y. T., Zhang, Y. H., Zhang, Y. M., Zhang, Yan, Zhang, Z. D., Zhang, Z. H., Zhang, Z. L., Zhang, Z. Y., Zhang, Z. Z., Zhao, G., Zhao, J. Y., Zhao, J. Z., Zhao, L., Zhao, Lei, Zhao, M. G., Zhao, N., Zhao, R. P., Zhao, S. J., Zhao, Y. B., Zhao, Y. X., Zhao, Z. G., Zhemchugov, A., Zheng, B., Zheng, B. M., Zheng, J. P., Zheng, W. J., Zheng, Y. H., Zhong, B., Zhong, X., Zhou, H., Zhou, J. Y., Zhou, L. P., Zhou, S., Zhou, X., Zhou, X. K., Zhou, X. R., Zhou, X. Y., Zhou, Y. Z., Zhu, A. N., Zhu, J., Zhu, K., Zhu, K. J., Zhu, K. S., Zhu, L., Zhu, L. X., Zhu, S. H., Zhu, T. J., Zhu, W. D., Zhu, Y. C., Zhu, Z. A., Zou, J. H., and Zu, J.
- Subjects
High Energy Physics - Experiment
- Abstract
Using $(10087\pm44)\times 10^{6}$ $J/\psi$ decays collected by the BESIII detector at the BEPCII collider, we search for baryon number violation via $\Lambda-\bar{\Lambda}$ oscillation in the decay $J/\psi \to \Lambda \bar{\Lambda}$. No evidence for $\Lambda-\bar\Lambda$ oscillation is observed. The upper limit on the time-integrated probability of $\Lambda-\bar{\Lambda}$ oscillation is estimated to be $1.4\times 10^{-6}$, corresponding to an oscillation parameter less than $2.1\times 10^{-18}~\mathrm{GeV}$ at $90\%$ confidence level. Comment: 8 pages, 2 figures
- Published
- 2024
43. Probing the mass effect of heavy quark jets in high-energy nuclear collisions
- Author
-
Wang, Sa, Li, Shuang, Li, Yao, Zhang, Ben-Wei, and Wang, Enke
- Subjects
High Energy Physics - Phenomenology, Nuclear Theory
- Abstract
The production of heavy quark (HQ) jets provides a new arena to address the mass effect of jet quenching in heavy-ion physics. This paper presents a theoretical study of HQ jet yield suppression in Pb+Pb collisions at the LHC and focuses on the energy loss of HQ jets produced by different mechanisms. The p+p baseline is carried out by the SHERPA generator, and the jet-medium interactions are described by the SHELL transport model, which considers the elastic and inelastic partonic energy loss in the quark-gluon plasma (QGP). In p+p collisions, our numerical results indicate that the HQ jets from gluon splitting ($g \rightarrow Q$-jet) give the dominant contribution at high $p_T$, and it shows more dispersive structures than the HQ-initiated one ($Q \rightarrow Q$-jet). In nucleus-nucleus collisions, our calculations are consistent with the inclusive and b-jet $R_{AA}$ recently measured by the ATLAS collaboration, which suggests a remarkable manifestation of the mass effect of jet energy loss. As a result of the dispersive substructure, the $g \rightarrow Q$-jet will lose more energy than the $Q \rightarrow Q$-jet in the QGP. Due to the significant contribution of $g \rightarrow c$-jet, the $R_{AA}$ of c-jet will be comparable or even smaller than that of inclusive jet. To experimentally distinguish the $g \rightarrow Q$-jet and $Q \rightarrow Q$-jet, we propose the event selection strategies based on their topological features and test the performances. By isolating the $c \rightarrow c$-jet and $b \rightarrow b$-jet, the jets initiated by heavy quarks, we predict that the order of their $R_{AA}$ are in line with the mass hierarchy of energy loss. Future measurements on the $R_{AA}$ of $Q \rightarrow Q$-jet and $g \rightarrow Q$-jet will provide a unique chance to test the flavour/mass dependence of energy loss at the jet level. Comment: 11 pages, 12 figures
- Published
- 2024
44. Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models
- Author
-
Wang, Shaobo, Tang, Hongxuan, Wang, Mingyang, Zhang, Hongrui, Liu, Xuyang, Li, Weiya, Hu, Xuming, and Zhang, Linfeng
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computer Science and Game Theory
- Abstract
The debate between self-interpretable models and post-hoc explanations for black-box models is central to Explainable AI (XAI). Self-interpretable models, such as concept-based networks, offer insights by connecting decisions to human-understandable concepts but often struggle with performance and scalability. Conversely, post-hoc methods like Shapley values, while theoretically robust, are computationally expensive and resource-intensive. To bridge the gap between these two lines of research, we propose a novel method that combines their strengths, providing theoretically guaranteed self-interpretability for black-box models without compromising prediction accuracy. Specifically, we introduce a parameter-efficient pipeline, *AutoGnothi*, which integrates a small side network into the black-box model, allowing it to generate Shapley value explanations without changing the original network parameters. This side-tuning approach significantly reduces memory, training, and inference costs, outperforming traditional parameter-efficient methods, where full fine-tuning serves as the optimal baseline. *AutoGnothi* enables the black-box model to predict and explain its predictions with minimal overhead. Extensive experiments show that *AutoGnothi* offers accurate explanations for both vision and language tasks, delivering superior computational efficiency with comparable interpretability.
- Published
- 2024
45. Efficient Incremental Code Coverage Analysis for Regression Test Suites
- Author
-
Wang, Jiale Amber, Wang, Kaiyuan, and Nie, Pengyu
- Subjects
Computer Science - Software Engineering
- Abstract
Code coverage analysis has been widely adopted in the continuous integration of open-source and industry software repositories to monitor the adequacy of regression test suites. However, computing code coverage can be costly, introducing significant overhead during test execution. Plus, re-collecting code coverage for the entire test suite is usually unnecessary when only a part of the coverage data is affected by code changes. While regression test selection (RTS) techniques exist to select a subset of tests whose behaviors may be affected by code changes, they are not compatible with code coverage analysis techniques -- that is, simply executing RTS-selected tests leads to incorrect code coverage results. In this paper, we present the first incremental code coverage analysis technique, which speeds up code coverage analysis by executing a minimal subset of tests to update the coverage data affected by code changes. We implement our technique in a tool dubbed iJaCoCo, which builds on Ekstazi and JaCoCo -- the state-of-the-art RTS and code coverage analysis tools for Java. We evaluate iJaCoCo on 1,122 versions from 22 open-source repositories and show that iJaCoCo can speed up code coverage analysis time by an average of 1.86x and up to 8.20x compared to JaCoCo. Comment: Accepted as a conference paper at ASE 2024
- Published
- 2024
46. Spectral study of very high energy gamma rays from SS 433 with HAWC
- Author
-
Alfaro, R., Alvarez, C., Arteaga-Velázquez, J. C., Rojas, D. Avila, Solares, H. A. Ayala, Babu, R., Belmont-Moreno, E., Caballero-Mora, K. S., Capistrán, T., Carramiñana, A., Casanova, S., Cotzomi, J., De la Fuente, E., Depaoli, D., Di Lalla, N., Hernandez, R. Diaz, Dingus, B. L., DuVernois, M. A., Engel, K., Ergin, T., Espinoza, C., Fan, K. L., Fang, K., Fraija, N., Fraija, S., García-González, J. A., Muñoz, A. González, González, M. M., Goodman, J. A., Groetsch, S., Harding, J. P., Hernández-Cadena, S., Herzog, I., Huang, D., Hueyotl-Zahuantitla, F., Hüntemeyer, P., Iriarte, A., Kaufmann, S., Lara, A., Lee, W. H., Lee, J., de León, C., Vargas, H. León, Longinotti, A. L., Luis-Raya, G., Malone, K., Martínez-Castro, J., Matthews, J. A., Miranda-Romagnoli, P., Montes, J. A., Moreno, E., Mostafá, M., Nellen, L., Nisa, M. U., Noriega-Papaqui, R., Araujo, Y. Pérez, Pérez-Pérez, E. G., Rho, C. D., Rosa-González, D., Ruiz-Velasco, E., Salazar, H., Sandoval, A., Schneider, M., Serna-Franco, J., Smith, A. J., Son, Y., Springer, R. W., Tibolla, O., Tollefson, K., Torres, I., Torres-Escobedo, R., Turner, R., Ureña-Mena, F., Varela, E., Villaseñor, L., Wang, X., Wang, Z., Watson, I. J., Yu, S., Yun-Cárcamo, S., and Zhou, H.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena
- Abstract
Very-high-energy (0.1-100 TeV) gamma-ray emission was observed in HAWC data from the lobes of the microquasar SS 433, making them the first set of astrophysical jets that were resolved at TeV energies. In this work, we update the analysis of SS 433 using 2,565 days of data from the High Altitude Water Cherenkov (HAWC) observatory. Our analysis reports the detection of a point-like source in the east lobe at a significance of $6.6\,\sigma$ and in the west lobe at a significance of $8.2\,\sigma$. For each jet lobe, we localize the gamma-ray emission and identify a best-fit position. The locations are close to the X-ray emission sites "e1" and "w1" for the east and west lobes, respectively. We analyze the spectral energy distributions and find that the energy spectra of the lobes are consistent with a simple power-law $\text{d}N/\text{d}E\propto E^{\alpha}$ with $\alpha = -2.44^{+0.13+0.04}_{-0.12-0.04}$ and $\alpha = -2.35^{+0.12+0.03}_{-0.11-0.03}$ for the east and west lobes, respectively. The maximum energy of photons from the east and west lobes reaches 56 TeV and 123 TeV, respectively. We compare our observations to various models and conclude that the very-high-energy gamma-ray emission can be produced by a population of electrons that were efficiently accelerated.
- Published
- 2024
47. EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data
- Author
-
Yi, Zhonghua, Shi, Hao, Jiang, Qi, Yang, Kailun, Wang, Ze, Gu, Diyang, Zhang, Yufan, and Wang, Kaiwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Event cameras offer high temporal resolution and high dynamic range, yet research on inter-modality local feature extraction and matching of event-image data remains limited. We propose EI-Nexus, an unmediated and flexible framework that integrates two modality-specific keypoint extractors and a feature matcher. To achieve keypoint extraction across viewpoint and modality changes, we bring Local Feature Distillation (LFD), which transfers the viewpoint consistency from a well-learned image extractor to the event extractor, ensuring robust feature correspondence. Furthermore, with the help of Context Aggregation (CA), a remarkable enhancement is observed in feature matching. We further establish the first two inter-modality feature matching benchmarks, MVSEC-RPE and EC-RPE, to assess relative pose estimation on event-image data. Our approach outperforms traditional methods that rely on explicit modal transformation, offering more unmediated and adaptable feature extraction and matching, achieving better keypoint similarity and state-of-the-art results on the MVSEC-RPE and EC-RPE benchmarks. The source code and benchmarks will be made publicly available at https://github.com/ZhonghuaYi/EI-Nexus_official. Comment: Accepted to WACV 2025.
- Published
- 2024
48. FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution
- Author
-
Wang, Shuai, Li, Zexian, Song, Tianhui, Li, Xubin, Ge, Tiezheng, Zheng, Bo, and Wang, Limin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Arbitrary-resolution image generation still remains a challenging task in AIGC, as it requires handling varying resolutions and aspect ratios while maintaining high visual quality. Existing transformer-based diffusion methods suffer from quadratic computation cost and limited resolution extrapolation capabilities, making them less effective for this task. In this paper, we propose FlowDCN, a purely convolution-based generative model with linear time and memory complexity, that can efficiently generate high-quality images at arbitrary resolutions. Equipped with a new design of learnable group-wise deformable convolution block, our FlowDCN yields higher flexibility and capability to handle different resolutions with a single model. FlowDCN achieves the state-of-the-art 4.30 sFID on $256\times256$ ImageNet Benchmark and comparable resolution extrapolation results, surpassing transformer-based counterparts in terms of convergence speed (only $\frac{1}{5}$ images), visual quality, parameters ($8\%$ reduction) and FLOPs ($20\%$ reduction). We believe FlowDCN offers a promising solution to scalable and flexible image synthesis. Comment: Accepted on NeurIPS24
- Published
- 2024
49. Scaling LLM Inference with Optimized Sample Compute Allocation
- Author
-
Zhang, Kexun, Zhou, Shang, Wang, Danqing, Wang, William Yang, and Li, Lei
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Sampling is a basic operation in many inference-time algorithms of large language models (LLMs). To scale up inference efficiently with a limited compute, it is crucial to find an optimal allocation for sample compute budgets: Which sampling configurations (model, temperature, language, etc.) do we use? How many samples do we generate in each configuration? We formulate these choices as a learning problem and propose OSCA, an algorithm that Optimizes Sample Compute Allocation by finding an optimal mix of different inference configurations. Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration with 128x less compute on code generation and 25x less compute on 4 reasoning tasks. OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving a better accuracy on SWE-Bench with 3x less compute than the default configuration. Our code and generations are released at https://github.com/LeiLiLab/OSCA.
- Published
- 2024
50. Leveraging Recurrent Neural Networks for Predicting Motor Movements from Primate Motor Cortex Neural Recordings
- Author
-
Wang, Yuanxi, Wang, Zuowen, and Liu, Shih-Chii
- Subjects
Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning, Quantitative Biology - Neurons and Cognition
- Abstract
This paper presents an efficient deep learning solution for decoding motor movements from neural recordings in non-human primates. An Autoencoder Gated Recurrent Unit (AEGRU) model was adopted as the model architecture for this task. The autoencoder is only used during the training stage to achieve better generalization. Together with the preprocessing techniques, our model achieved 0.71 $R^2$ score, surpassing the baseline models in Neurobench and is ranked first for $R^2$ in the IEEE BioCAS 2024 Grand Challenge on Neural Decoding. Model pruning is also applied leading to a reduction of 41.4% of the multiply-accumulate (MAC) operations with little change in the $R^2$ score compared to the unpruned model.
- Published
- 2024