23,826 results on '"Wang, Han"'
Search Results
2. AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge
- Author
-
Wang, Han, Prasad, Archiki, Stengel-Eskin, Elias, and Bansal, Mohit
- Subjects
Computer Science - Computation and Language
- Abstract
Knowledge conflict arises from discrepancies between information in the context of a large language model (LLM) and the knowledge stored in its parameters. This can hurt performance when using standard decoding techniques, which tend to ignore the context. Existing test-time contrastive methods seek to address this by comparing the LLM's output distribution with and without the context and adjusting the model according to the contrast between them. However, we find that these methods frequently misjudge the degree of conflict and struggle to handle instances that vary in their amount of conflict, with static methods over-adjusting when conflict is absent. We propose a fine-grained, instance-level approach called AdaCAD, which dynamically infers the weight of adjustment based on the degree of conflict, as measured by the Jensen-Shannon divergence between distributions representing contextual and parametric knowledge. Our experiments across four models on six diverse question-answering (QA) datasets and three summarization tasks demonstrate that our training-free adaptive method consistently outperforms other decoding methods on QA, with average accuracy gains of 14.21% (absolute) over a static contrastive baseline, and improves the factuality of summaries by 5.59 (AlignScore). Furthermore, our analysis shows that while decoding with contrastive baselines hurts performance when conflict is absent, AdaCAD mitigates these losses, making it more applicable to real-world datasets in which some examples have conflict and others do not. Comment: 16 pages, Code: https://github.com/HanNight/AdaCAD
- Published
- 2024
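Illustrative sketch for result 2: the abstract above describes weighting a contrastive adjustment by the Jensen-Shannon divergence (JSD) between the with-context and without-context output distributions. The snippet below is a minimal sketch of that idea, not the authors' implementation. The function names, the base-2 JSD (which keeps the weight in [0, 1]), and the combination rule `(1 + alpha) * with_context - alpha * without_context` (a common contrastive-decoding form) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def adaptive_contrastive_logits(logits_ctx, logits_no_ctx):
    """Hypothetical helper: scale the contrastive adjustment by the JSD.

    Assumption: adjusted = (1 + alpha) * logits_ctx - alpha * logits_no_ctx,
    with alpha = JSD(p_ctx, p_no_ctx). When the two distributions agree,
    alpha is 0 and the with-context logits are returned unchanged.
    """
    alpha = jsd(softmax(logits_ctx), softmax(logits_no_ctx))
    adjusted = [
        (1 + alpha) * with_ctx - alpha * without_ctx
        for with_ctx, without_ctx in zip(logits_ctx, logits_no_ctx)
    ]
    return alpha, adjusted
```

With identical inputs the weight is zero, which mirrors the abstract's point that a static weight over-adjusts when no conflict is present.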
3. Reinforcement learning-based adaptive speed controllers in mixed autonomy condition
- Author
-
Wang, Han, Matin, Hossein Nick Zinat, and Monache, Maria Laura Delle
- Subjects
Electrical Engineering and Systems Science - Systems and Control
- Abstract
The integration of Automated Vehicles (AVs) into traffic flow holds the potential to significantly improve traffic congestion by enabling AVs to function as actuators within the flow. This paper introduces an adaptive speed controller tailored for scenarios of mixed autonomy, where AVs interact with human-driven vehicles. We model the traffic dynamics using a system of strongly coupled Partial and Ordinary Differential Equations (PDE-ODE), with the PDE capturing the general flow of human-driven traffic and the ODE characterizing the trajectory of the AVs. A speed policy for AVs is derived using a Reinforcement Learning (RL) algorithm structured within an Actor-Critic (AC) framework. This algorithm interacts with the PDE-ODE model to optimize the AV control policy. Numerical simulations are presented to demonstrate the controller's impact on traffic patterns, showing the potential of AVs to improve traffic flow and reduce congestion.
- Published
- 2024
4. Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
- Author
-
Chen, Peiyuan, Zhang, Zecheng, Dong, Yiping, Zhou, Li, and Wang, Han
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Visual Question Answering (VQA) is a challenging task that requires systems to provide accurate answers to questions based on image content. Current VQA models struggle with complex questions due to limitations in capturing and integrating multimodal information effectively. To address these challenges, we propose the Rank VQA model, which leverages a ranking-inspired hybrid training strategy to enhance VQA performance. The Rank VQA model integrates high-quality visual features extracted using the Faster R-CNN model and rich semantic text features obtained from a pre-trained BERT model. These features are fused through a sophisticated multimodal fusion technique employing multi-head self-attention mechanisms. Additionally, a ranking learning module is incorporated to optimize the relative ranking of answers, thus improving answer accuracy. The hybrid training strategy combines classification and ranking losses, enhancing the model's generalization ability and robustness across diverse datasets. Experimental results demonstrate the effectiveness of the Rank VQA model. Our model significantly outperforms existing state-of-the-art models on standard VQA datasets, including VQA v2.0 and COCO-QA, in terms of both accuracy and Mean Reciprocal Rank (MRR). The superior performance of Rank VQA is evident in its ability to handle complex questions that require understanding nuanced details and making sophisticated inferences from the image and text. This work highlights the effectiveness of a ranking-based hybrid training strategy in improving VQA performance and lays the groundwork for further research in multimodal learning methods. Comment: Visual Question Answering, Rank VQA, Faster R-CNN, BERT, Multimodal Fusion, Ranking Learning, Hybrid Training Strategy
- Published
- 2024
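Illustrative sketch for result 4: the abstract above reports Mean Reciprocal Rank (MRR) alongside accuracy. As a reminder of what that metric measures, the snippet below computes MRR for ranked answer lists using the standard definition. The function name and the toy data are hypothetical and are not taken from the paper.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """Average of 1/rank of the correct answer over all questions.

    ranked_answers: one list of candidate answers per question, best first.
    gold_answers:   the correct answer for each question.
    A question whose correct answer is missing from its ranking scores 0.
    """
    if not gold_answers:
        return 0.0
    total = 0.0
    for ranking, gold in zip(ranked_answers, gold_answers):
        for position, candidate in enumerate(ranking, start=1):
            if candidate == gold:
                total += 1.0 / position
                break
    return total / len(gold_answers)


# Toy example (made-up data): ranks 1, 3, and "not found".
rankings = [["cat", "dog"], ["red", "blue", "green"], ["two"]]
gold = ["cat", "green", "five"]
score = mean_reciprocal_rank(rankings, gold)  # (1 + 1/3 + 0) / 3
```

Unlike plain accuracy, MRR gives partial credit when the correct answer is ranked near the top but not first, which is why it suits models trained with a ranking loss.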
5. q-exponential family for policy optimization
- Author
-
Zhu, Lingwei, Shah, Haseeb, Wang, Han, and White, Martha
- Subjects
Computer Science - Machine Learning
- Abstract
Policy optimization methods benefit from a simple and tractable policy functional, usually the Gaussian for continuous action spaces. In this paper, we consider a broader policy family that remains tractable: the $q$-exponential family. This family of policies is flexible, allowing the specification of both heavy-tailed policies ($q>1$) and light-tailed policies ($q<1$). This paper examines the interplay between $q$-exponential policies for several actor-critic algorithms conducted on both online and offline problems. We find that heavy-tailed policies are more effective in general and can consistently improve on Gaussian. In particular, we find the Student's t-distribution to be more stable than the Gaussian across settings and that a heavy-tailed $q$-Gaussian for Tsallis Advantage Weighted Actor-Critic consistently performs well in offline benchmark problems. Our code is available at https://github.com/lingweizhu/qexp. Comment: 27 pages, 12 pages main text, 15 pages appendix
- Published
- 2024
6. Chat-like Asserts Prediction with the Support of Large Language Model
- Author
-
Wang, Han, Hu, Han, Chen, Chunyang, and Turhan, Burak
- Subjects
Computer Science - Software Engineering
- Abstract
Unit testing is an essential component of software testing, with assert statements playing an important role in determining whether the tested function operates as expected. Although research has explored automated test case generation, generating meaningful assert statements remains an ongoing challenge. While several studies have investigated assert statement generation in Java, limited work addresses this task in popular dynamically-typed programming languages like Python. In this paper, we introduce Chat-like execution-based Asserts Prediction, a novel Large Language Model-based approach for generating meaningful assert statements for Python projects. The approach utilizes the persona, Chain-of-Thought, and one-shot learning techniques in the prompt design, and conducts rounds of communication with the LLM and the Python interpreter to generate meaningful assert statements. We also present a Python assert statement dataset mined from GitHub. Our evaluation demonstrates that the approach achieves 64.7% accuracy for single assert statement generation and 62% for overall assert statement generation, outperforming the existing approaches. We also analyze the mismatched assert statements, which may still share the same functionality, and discuss the potential help the approach could offer to automated Python unit test generation. The findings indicate that the approach has the potential to benefit the SE community through more practical usage scenarios.
- Published
- 2024
7. MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili
- Author
-
Wang, Han, Yang, Tan Rui, Naseem, Usman, and Lee, Roy Ka-Wei
- Subjects
Computer Science - Multimedia, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, I.2.0
- Abstract
Hate speech is a pressing issue in modern society, with significant effects both online and offline. Recent research in hate speech detection has primarily centered on text-based media, largely overlooking multimodal content such as videos. Existing studies on hateful video datasets have predominantly focused on English content within a Western context and have been limited to binary labels (hateful or non-hateful), lacking detailed contextual information. This study presents MultiHateClip, a novel multilingual dataset created through hate lexicons and human annotation. It aims to enhance the detection of hateful videos on platforms such as YouTube and Bilibili, including content in both English and Chinese. Comprising 2,000 videos annotated for hatefulness, offensiveness, and normalcy, this dataset provides a cross-cultural perspective on gender-based hate speech. Through a detailed examination of human annotation results, we discuss the differences between Chinese and English hateful videos and underscore the importance of different modalities in hateful and offensive video analysis. Evaluations of state-of-the-art video classification models, such as VLM, GPT-4V and Qwen-VL, on MultiHateClip highlight the existing challenges in accurately distinguishing between hateful and offensive content and the urgent need for models that are both multimodally and culturally nuanced. MultiHateClip represents a foundational advance in enhancing hateful video detection by underscoring the necessity of a multimodal and culturally sensitive approach in combating online hate speech. Comment: 10 pages, 3 figures, ACM Multimedia 2024
- Published
- 2024
8. BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
- Author
-
Su, Hongjin, Yen, Howard, Xia, Mengzhou, Shi, Weijia, Muennighoff, Niklas, Wang, Han-yu, Liu, Haisu, Shi, Quan, Siegel, Zachary S., Tang, Michael, Sun, Ruoxi, Yoon, Jinsung, Arik, Sercan O., Chen, Danqi, and Yu, Tao
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
- Abstract
Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. BRIGHT is constructed from 1,398 real-world queries collected from diverse domains (such as economics, psychology, robotics, software engineering, earth sciences, etc.), sourced from naturally occurring or carefully curated human data. Extensive evaluation reveals that even state-of-the-art retrieval models perform poorly on BRIGHT. The leading model on the MTEB leaderboard [38], which achieves a score of 59.0 nDCG@10, produces a score of nDCG@10 of 18.0 on BRIGHT. We further demonstrate that augmenting queries with Chain-of-Thought reasoning generated by large language models (LLMs) improves performance by up to 12.2 points. Moreover, BRIGHT is robust against data leakage during pretraining of the benchmarked models, as we validate by showing similar performance even when documents from the benchmark are included in the training data. We believe that BRIGHT paves the way for future research on retrieval systems in more realistic and challenging settings. Our code and data are available at https://brightbenchmark.github.io. Comment: 50 pages
- Published
- 2024
9. Disentangling Masked Autoencoders for Unsupervised Domain Generalization
- Author
-
Zhang, An, Wang, Han, Wang, Xiang, and Chua, Tat-Seng
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Domain Generalization (DG), designed to enhance out-of-distribution (OOD) generalization, is all about learning invariance against domain shifts utilizing sufficient supervision signals. Yet, the scarcity of such labeled data has led to the rise of unsupervised domain generalization (UDG) - a more important yet challenging task in that models are trained across diverse domains in an unsupervised manner and eventually tested on unseen domains. UDG is fast gaining attention but is still far from well-studied. To close the research gap, we propose a novel learning framework designed for UDG, termed the Disentangled Masked Auto Encoder (DisMAE), aiming to discover the disentangled representations that faithfully reveal the intrinsic features and superficial variations without access to the class label. At its core is the distillation of domain-invariant semantic features, which cannot be distinguished by a domain classifier, while filtering out the domain-specific variations (for example, color schemes and texture patterns) that are unstable and redundant. Notably, DisMAE co-trains the asymmetric dual-branch architecture with semantic and lightweight variation encoders, offering dynamic data manipulation and representation-level augmentation capabilities. Extensive experiments on four benchmark datasets (i.e., DomainNet, PACS, VLCS, Colored MNIST) with both DG and UDG tasks demonstrate that DisMAE can achieve competitive OOD performance compared with the state-of-the-art DG and UDG baselines, which sheds light on a potential line of research for improving generalization ability with large-scale unlabeled data.
- Published
- 2024
10. Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey
- Author
-
Wang, Han, Nie, Yuman, Li, Yun, Liu, Hongjie, Liu, Min, Cheng, Wen, and Wang, Yaoxiong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Event-based cameras, inspired by the biological retina, have evolved into cutting-edge sensors distinguished by their minimal power requirements, negligible latency, superior temporal resolution, and expansive dynamic range. At present, cameras used for pedestrian detection are mainly frame-based imaging sensors, which have suffered from lethargic response times and hefty data redundancy. In contrast, event-based cameras address these limitations by eschewing extraneous data transmissions and obviating motion blur in high-speed imaging scenarios. On pedestrian detection via event-based cameras, this paper offers an exhaustive review of research and applications, particularly in the autonomous driving context. Through methodically scrutinizing relevant literature, the paper outlines the foundational principles, developmental trajectory, and the comparative merits and demerits of event-based detection relative to traditional frame-based methodologies. This review conducts thorough analyses of various event stream inputs and their corresponding network models to evaluate their applicability across diverse operational environments. It also delves into pivotal elements such as crucial datasets and data acquisition techniques essential for advancing this technology, as well as advanced algorithms for processing event stream data. Culminating with a synthesis of the extant landscape, the review accentuates the unique advantages and persistent challenges inherent in event-based pedestrian detection, offering a prognostic view on potential future developments in this fast-progressing field.
- Published
- 2024
11. A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
- Author
-
Lu, Jinghui, Yu, Haiyang, Wang, Yanjie, Ye, Yongjie, Tang, Jingqun, Yang, Ziwei, Wu, Binghong, Liu, Qi, Feng, Hao, Wang, Han, Liu, Hao, and Huang, Can
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Multimedia
- Abstract
Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial layouts with text have limitations, such as producing overly long text sequences or failing to fully leverage the autoregressive traits of LLMs. In this work, we introduce Interleaving Layout and Text in a Large Language Model (LayTextLLM) for document understanding. In particular, LayTextLLM projects each bounding box to a single embedding and interleaves it with text, efficiently avoiding long sequence issues while leveraging autoregressive traits of LLMs. LayTextLLM not only streamlines the interaction of layout and textual data but also shows enhanced performance in Key Information Extraction (KIE) and Visual Question Answering (VQA). Comprehensive benchmark evaluations reveal significant improvements, with a 27.2% increase on KIE tasks and 12.0% on VQA tasks compared to previous state-of-the-art document understanding MLLMs, as well as a 15.1% improvement over other SOTA OCR-based LLMs on KIE tasks.
- Published
- 2024
12. Sustained decline in hospitalisations for anogenital warts in Australia: Analysis of national hospital morbidity data 2003-2020
- Author
-
Rashid, Harunor, Dey, Aditi, Wang, Han, and Beard, Frank
- Published
- 2024
13. Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning
- Author
-
Sula, Nexhi, Kumar, Abhinav, Hou, Jie, Wang, Han, and Tourani, Reza
- Subjects
Computer Science - Machine Learning
- Abstract
With the continued advancement and widespread adoption of machine learning (ML) models across various domains, ensuring user privacy and data security has become a paramount concern. In compliance with data privacy regulations, such as GDPR, a secure machine learning framework should not only grant users the right to request the removal of their contributed data used for model training but also facilitate the elimination of sensitive data fingerprints within machine learning models to mitigate potential attacks - a process referred to as machine unlearning. In this study, we present a novel unlearning mechanism designed to effectively remove the impact of specific data samples from a neural network while considering the performance of the unlearned model on the primary task. In achieving this goal, we crafted a novel loss function tailored to eliminate privacy-sensitive information from weights and activation values of the target model by combining target classification loss and membership inference loss. Our adaptable framework can easily incorporate various privacy leakage approximation mechanisms to guide the unlearning process. We provide empirical evidence of the effectiveness of our unlearning approach with a theoretical upper-bound analysis through a membership inference mechanism as a proof of concept. Our results showcase the superior performance of our approach in terms of unlearning efficacy and latency as well as the fidelity of the primary task, across four datasets and four deep learning architectures. Comment: 17 pages, 14 figures, 6 tables
- Published
- 2024
14. Safe and Stable Filter Design Using a Relaxed Compatibility Control Barrier -- Lyapunov Condition
- Author
-
Wang, Han, Margellos, Kostas, and Papachristodoulou, Antonis
- Subjects
Electrical Engineering and Systems Science - Systems and Control, Mathematics - Optimization and Control
- Abstract
In this paper, we propose a quadratic programming-based filter for safe and stable controller design, via a Control Barrier Function (CBF) and a Control Lyapunov Function (CLF). Our method guarantees safety and local asymptotic stability without the need for an asymptotically stabilizing control law. Feasibility of the proposed program is ensured under a mild regularity condition, termed relaxed compatibility between the CLF and CBF. The resulting optimal control law is guaranteed to be locally Lipschitz continuous. We also analyze the closed-loop behaviour by characterizing the equilibrium points, and verifying that there are no equilibrium points in the interior of the control invariant set except at the origin. For a polynomial system and a semi-algebraic safe set, we provide a sum-of-squares program to design a relaxed compatible pair of CLF and CBF. In numerical examples, the proposed approach is compared with other methods in the literature; it exhibits superior filter performance and guarantees safety and local stability.
- Published
- 2024
15. MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
- Author
-
Zhang, Yuang, Gu, Jiaxi, Wang, Li-Wen, Wang, Han, Cheng, Junqi, Zhu, Yuefeng, and Zou, Fangyuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia
- Abstract
In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in various aspects, such as controllability, video length, and richness of details, which hinder the application and popularization of this technology. In this work, we propose a controllable video generation framework, dubbed MimicMotion, which can generate high-quality videos of arbitrary length mimicking specific motion guidance. Compared with previous methods, our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which significantly reduces image distortion. Lastly, for generating long and smooth videos, we propose a progressive latent fusion strategy. By this means, we can produce videos of arbitrary length with acceptable resource consumption. With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in various aspects. Detailed results and comparisons are available on our project page: https://tencent.github.io/MimicMotion .
- Published
- 2024
16. Towards Cyber Threat Intelligence for the IoT
- Author
-
Iacovazzi, Alfonso, Wang, Han, Butun, Ismail, and Raza, Shahid
- Subjects
Computer Science - Cryptography and Security
- Abstract
With the proliferation of digitization and its usage in critical sectors, it is necessary to include information about the occurrence and assessment of cyber threats in an organization's threat mitigation strategy. This Cyber Threat Intelligence (CTI) is becoming increasingly important, or rather necessary, for critical national and industrial infrastructures. Current CTI solutions are rather federated and unsuitable for sharing threat information from low-power IoT devices. This paper presents a taxonomy and analysis of the CTI frameworks and CTI exchange platforms available today. It proposes a new CTI architecture that relies on the MISP Threat Intelligence Sharing Platform, customized for and focused on IoT environments. The paper also introduces a tailored version of STIX (which we call tinySTIX), one of the most prominent standards adopted for CTI data modeling, optimized for low-power IoT devices using new lightweight encoding and cryptography solutions. The proposed CTI architecture will be very beneficial for securing IoT networks, especially the ones working in harsh and adversarial environments.
- Published
- 2024
17. Topology of the charged AdS black hole in restricted phase space
- Author
-
Wang, Han and Du, Yun-Zhi
- Subjects
High Energy Physics - Theory
- Abstract
The local topological properties of black hole systems can be expressed by the winding numbers as the defects. So far, AdS black hole thermodynamics has often been depicted by the dual parameters $(T,S)$, $(P,V)$, $(\Phi,Q)$ in the extended phase space, while there are several studies on black hole thermodynamics in the restricted phase space. In this paper, we analyze the topological properties of the charged AdS black holes in the restricted phase space in the framework of higher-dimensional and higher-order curvature gravities. The results show that the topological number of the charged black hole in the same canonical ensemble is a constant and is independent of the concrete dual thermodynamical parameters. However, the topological number in the grand canonical ensemble is different from that in the canonical ensemble for the same black hole system. Furthermore, these results are independent of the dimension $d$ and the highest order $k$ of the Lanczos-Lovelock densities.
- Published
- 2024
18. Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver
- Author
-
Chen, Hegan, Yang, Jichang, Chen, Jia, Wang, Songqi, Wang, Shaocong, Wang, Dingchen, Tian, Xinyu, Yu, Yifei, Chen, Xi, Lin, Yinan, He, Yangu, Wu, Xiaoshan, Li, Yi, Zhang, Xinyuan, Lin, Ning, Xu, Meng, Zhang, Xumeng, Wang, Zhongrui, Wang, Han, Shang, Dashan, Liu, Qi, Cheng, Kwang-Ting, and Liu, Ming
- Subjects
Computer Science - Hardware Architecture, Computer Science - Artificial Intelligence, Computer Science - Emerging Technologies, Computer Science - Neural and Evolutionary Computing
- Abstract
Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underlying continuous dynamics and struggles with modelling complex system behaviour. Additionally, the architecture of digital computers, with separate storage and processing units, necessitates frequent data transfers and Analogue-Digital (A/D) conversion, thereby significantly increasing both time and energy costs. Here, we introduce a memristive neural ordinary differential equation (ODE) solver for digital twins, which is capable of capturing continuous-time dynamics and facilitates the modelling of complex systems using an infinite-depth model. By integrating storage and computation within analogue memristor arrays, we circumvent the von Neumann bottleneck, thus enhancing both speed and energy efficiency. We experimentally validate our approach by developing a digital twin of the HP memristor, which accurately extrapolates its nonlinear dynamics, achieving a 4.2-fold projected speedup and a 41.4-fold projected decrease in energy consumption compared to state-of-the-art digital hardware, while maintaining an acceptable error margin. Additionally, we demonstrate scalability through experimentally grounded simulations of Lorenz96 dynamics, exhibiting projected performance improvements of 12.6-fold in speed and 189.7-fold in energy efficiency relative to traditional digital approaches. By harnessing the capabilities of fully analogue computing, our breakthrough accelerates the development of digital twins, offering an efficient and rapid solution to meet the demands of Industry 4.0. Comment: 14 pages, 4 figures
- Published
- 2024
19. Carbon Market Simulation with Adaptive Mechanism Design
- Author
-
Wang, Han, Li, Wenhao, Zha, Hongyuan, and Wang, Baoxiang
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
- Abstract
A carbon market is a market-based tool that incentivizes economic agents to align individual profits with the global utility, i.e., reducing carbon emissions to tackle climate change. Cap and trade stands as a critical principle based on allocating and trading carbon allowances (carbon emission credit), enabling economic agents to follow planned emissions and penalizing excess emissions. A central authority is responsible for introducing and allocating those allowances in cap and trade. However, the complexity of carbon market dynamics makes accurate simulation intractable, which in turn hinders the design of effective allocation strategies. To address this, we propose an adaptive mechanism design framework, simulating the market using hierarchical, model-free multi-agent reinforcement learning (MARL). Government agents allocate carbon credits, while enterprises engage in economic activities and carbon trading. This framework illustrates agents' behavior comprehensively. Numerical results show MARL enables government agents to balance productivity, equality, and carbon emissions. Our project is available at https://github.com/xwanghan/Carbon-Simulator. Comment: 10 pages, 4 figures
- Published
- 2024
20. A Multimodal Dangerous State Recognition and Early Warning System for Elderly with Intermittent Dementia
- Author
-
Deng, Liyun, Jin, Lei, Wang, Guangcheng, Shi, Quan, and Wang, Han
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In response to the growing number of vulnerable elderly people going missing as China's population ages, our team has developed a wearable anti-loss device and intelligent early warning system for elderly individuals with intermittent dementia using artificial intelligence and IoT technology. This system comprises an anti-loss smart helmet, a cloud computing module, and an intelligent early warning application on the caregiver's mobile device. The smart helmet integrates a miniature camera module, a GPS module, and a 5G communication module to collect first-person images and location information of the elderly. Data is transmitted remotely via 5G, FTP, and TCP protocols. In the cloud computing module, our team has proposed for the first time a multimodal dangerous state recognition network based on scene and location information to accurately assess the risk of elderly individuals going missing. Finally, the application software interface designed for the caregiver's mobile device implements multi-level early warnings. The system developed by our team requires no operation or response from the elderly, achieving fully automatic environmental perception, risk assessment, and proactive alarming. This overcomes the limitations of traditional monitoring devices, which require active operation and response, thus avoiding the issue of the digital divide for the elderly. It effectively prevents accidental loss and potential dangers for elderly individuals with dementia. Comment: 13 pages, 9 figures
- Published
- 2024
21. Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
- Author
-
Wang, Han, He, Sihong, Zhang, Zhili, Miao, Fei, and Anderson, James
- Subjects
Computer Science - Machine Learning, Computer Science - Multiagent Systems, Mathematics - Optimization and Control
- Abstract
We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or "similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy which maximizes the average performance across all potentially completely different environments, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that both FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, can exactly converge to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating the benefits of variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
- Published
- 2024
22. LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models
- Author
-
Yang, Qin, Mohammad, Meisam, Wang, Han, Payani, Ali, Kundu, Ashish, Shu, Kai, Yan, Yan, and Hong, Yuan
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Differentially Private Stochastic Gradient Descent (DP-SGD) and its variants have been proposed to ensure rigorous privacy for fine-tuning large-scale pre-trained language models. However, they rely heavily on the Gaussian mechanism, which may overly perturb the gradients and degrade the accuracy, especially in stronger privacy regimes (e.g., the privacy budget $\epsilon < 3$). To address such limitations, we propose a novel Language Model-based Optimal Differential Privacy (LMO-DP) mechanism, which takes the first step to enable the tight composition of accurately fine-tuning (large) language models with a sub-optimal DP mechanism, even in strong privacy regimes (e.g., $0.1\leq \epsilon<3$). Furthermore, we propose a novel offline optimal noise search method to efficiently derive the sub-optimal DP that significantly reduces the noise magnitude. For instance, fine-tuning RoBERTa-large (with 300M parameters) on the SST-2 dataset achieves an accuracy of 92.20% (given $\epsilon=0.3$, $\delta=10^{-10}$), drastically outperforming the Gaussian mechanism (e.g., $\sim 50\%$ for small $\epsilon$ and $\delta$). We also draw similar findings on text generation tasks with GPT-2. Finally, to the best of our knowledge, LMO-DP is also the first solution to accurately fine-tune Llama-2 with strong differential privacy guarantees. The code will be released soon and is available upon request., Comment: 18 pages, 15 figures
- Published
- 2024
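The Gaussian-mechanism baseline that the LMO-DP abstract compares against can be illustrated with a minimal sketch of one standard DP-SGD gradient step: clip each per-sample gradient, sum, add Gaussian noise scaled to the clipping norm, and average. This is not the LMO-DP mechanism itself; the function names and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions, and no privacy accounting is performed.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Return a noisy averaged gradient (Gaussian mechanism, illustrative only)."""
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping norm (assumed setup).
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]
    print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

A larger `noise_multiplier` corresponds to a stronger privacy regime and a more heavily perturbed gradient, which is the accuracy degradation the abstract attributes to the Gaussian mechanism.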
23. ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
- Author
-
Zheng, Jingnan, Wang, Han, Zhang, An, Nguyen, Tai D., Sun, Jun, and Chua, Tat-Seng
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Large Language Models (LLMs) can elicit unintended and even harmful content when misaligned with human values, posing severe risks to users and society. To mitigate these risks, current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values. However, the labor-intensive nature of these benchmarks limits their test scope, hindering their ability to generalize to the extensive variety of open-world use cases and identify rare but crucial long-tail risks. Additionally, these static tests fail to adapt to the rapid evolution of LLMs, making it hard to evaluate timely alignment issues. To address these challenges, we propose ALI-Agent, an evaluation framework that leverages the autonomous abilities of LLM-powered agents to conduct in-depth and adaptive alignment assessments. ALI-Agent operates through two principal stages: Emulation and Refinement. During the Emulation stage, ALI-Agent automates the generation of realistic test scenarios. In the Refinement stage, it iteratively refines the scenarios to probe long-tail risks. Specifically, ALI-Agent incorporates a memory module to guide test scenario generation, a tool-using module to reduce human labor in tasks such as evaluating feedback from target LLMs, and an action module to refine tests. Extensive experiments across three aspects of human values--stereotypes, morality, and legality--demonstrate that ALI-Agent, as a general evaluation framework, effectively identifies model misalignment. Systematic analysis also validates that the generated test scenarios represent meaningful use cases, as well as integrate enhanced measures to probe long-tail risks. Our code is available at https://github.com/SophieZheng998/ALI-Agent.git
- Published
- 2024
24. MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling
- Author
-
Feng, Sidong, Ma, Suyu, Wang, Han, Kong, David, and Chen, Chunyang
- Subjects
Computer Science - Human-Computer Interaction - Abstract
The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, such modeling requires a high-quality UI dataset. Existing datasets are often outdated, having been collected years ago, and are frequently noisy, with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ the best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLM-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset MUD of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks: element detection and UI retrieval, showcasing its potential to establish a foundation for future research into high-quality, modern UIs.
- Published
- 2024
25. Data-Driven Stable Neural Feedback Loop Design
- Author
-
Xiong, Zuxun, Wang, Han, Zhao, Liqun, and Papachristodoulou, Antonis
- Subjects
Mathematics - Optimization and Control - Abstract
This paper proposes a data-driven approach to design a feedforward Neural Network (NN) controller with a stability guarantee for systems with unknown dynamics. We first introduce data-driven representations of stability conditions for Neural Feedback Loops (NFLs) with linear plants. These conditions are then formulated into a semidefinite program (SDP). Subsequently, this SDP constraint is integrated into the NN training process resulting in a stable NN controller. We propose an iterative algorithm to solve this problem efficiently. Finally, we illustrate the effectiveness of the proposed method and its superiority compared to model-based methods via numerical examples.
- Published
- 2024
26. Multivariate Bayesian Last Layer for Regression: Uncertainty Quantification and Disentanglement
- Author
-
Wang, Han, Kawasaki, Eiji, Damblin, Guillaume, and Daniel, Geoffrey
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
We present new Bayesian Last Layer models in the setting of multivariate regression under heteroscedastic noise, and propose an optimization algorithm for parameter learning. Bayesian Last Layer combines Bayesian modelling of the predictive distribution with neural networks for parameterization of the prior, and has the attractive property of uncertainty quantification with a single forward pass. The proposed framework is capable of disentangling the aleatoric and epistemic uncertainty, and can be used to transfer a canonically trained deep neural network to new data domains with uncertainty-aware capability.
- Published
- 2024
27. PVF (Parameter Vulnerability Factor): A Scalable Metric for Understanding AI Vulnerability Against SDCs in Model Parameters
- Author
-
Jiao, Xun, Lin, Fred, Dixit, Harish D., Coburn, Joel, Pandey, Abhinav, Wang, Han, Ramesh, Venkat, Huang, Jianyu, Xu, Wang, Moore, Daniel, and Sankar, Sriram
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Hardware Architecture ,Computer Science - Machine Learning - Abstract
Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDC), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services. In light of the escalating threat, it is crucial to address key questions: How vulnerable are AI models to parameter corruptions, and how do different components (such as modules, layers) of the models exhibit varying vulnerabilities to parameter corruptions? To systematically address these questions, we propose a novel quantitative metric, Parameter Vulnerability Factor (PVF), inspired by the architectural vulnerability factor (AVF) from the computer architecture community, aiming to standardize the quantification of AI model vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular model parameter will result in an incorrect output. In this paper, we present several use cases on applying PVF to three types of tasks/models during inference -- recommendation (DLRM), vision classification (CNN), and text classification (BERT), while presenting an in-depth vulnerability analysis on DLRM. PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency such as mapping vulnerable AI parameter components to well-protected hardware modules. The PVF metric is applicable to any AI model and has the potential to help unify and standardize AI vulnerability/resilience evaluation practice.
- Published
- 2024
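The PVF definition in the abstract above (the probability that a corruption in a given parameter yields an incorrect output) lends itself to a Monte Carlo estimate. The sketch below is a toy illustration under stated assumptions, not the paper's methodology: the corruption model is a single random bit flip in a float32 value, "incorrect" means the prediction differs from the fault-free prediction, and the model is a hypothetical two-weight linear classifier. All names (`flip_bit`, `predict`, `estimate_pvf`) are illustrative.

```python
import random
import struct


def flip_bit(value, bit):
    """Flip one bit (0..31) of the float32 encoding of value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def predict(weights, x):
    """Toy binary classifier: sign of a dot product (assumed model)."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0


def estimate_pvf(weights, index, inputs, trials=1000, seed=0):
    """Fraction of injected bit flips in weights[index] that change the output."""
    rng = random.Random(seed)
    incorrect = 0
    for _ in range(trials):
        x = rng.choice(inputs)
        golden = predict(weights, x)  # fault-free reference output
        corrupted = list(weights)
        corrupted[index] = flip_bit(weights[index], rng.randrange(32))
        if predict(corrupted, x) != golden:
            incorrect += 1
    return incorrect / trials


if __name__ == "__main__":
    weights = [0.5, -0.25]
    inputs = [[1.0, 1.0], [2.0, -1.0]]
    print(estimate_pvf(weights, index=0, inputs=inputs))
```

Running the estimate per parameter (or per layer, by sampling parameters within it) would give the kind of component-level comparison the abstract describes, although the fault model used in practice may differ from the single bit flip assumed here.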
28. Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated Monkey
- Author
-
Hu, Han, Wang, Han, Dong, Ruiqi, Chen, Xiao, and Chen, Chunyang
- Subjects
Computer Science - Software Engineering ,Computer Science - Cryptography and Security - Abstract
Mobile apps are ubiquitous in our daily lives for supporting different tasks such as reading and chatting. Despite the availability of many GUI testing tools, app testers still struggle with low testing code coverage due to tools frequently getting stuck in loops or overlooking activities with concealed entries. This results in a significant amount of testing time being spent on redundant and repetitive exploration of a few GUI pages. To address this, we utilize Android's deep links, which assist in triggering Android intents to lead users to specific pages and introduce a deep link-enhanced exploration method. This approach, integrated into the testing tool Monkey, gives rise to Delm (Deep Link-enhanced Monkey). Delm oversees the dynamic exploration process, guiding the tool out of meaningless testing loops to unexplored GUI pages. We provide a rigorous activity context mock-up approach for triggering existing Android intents to discover more activities with hidden entrances. We conduct experiments to evaluate Delm's effectiveness on activity context mock-up, activity coverage, method coverage, and crash detection. The findings reveal that Delm can mock up more complex activity contexts and significantly outperform state-of-the-art baselines with 27.2% activity coverage, 21.13% method coverage, and 23.81% crash detection.
- Published
- 2024
29. Dflow, a Python framework for constructing cloud-native AI-for-Science workflows
- Author
-
Liu, Xinzijian, Han, Yanbo, Li, Zhuoyuan, Fan, Jiahao, Zhang, Chengqian, Zeng, Jinzhe, Shan, Yifan, Yuan, Yannan, Xu, Wei-Hong, Liu, Yun-Pei, Zhang, Yuzhi, Wen, Tongqi, York, Darrin M., Zhong, Zhicheng, Zheng, Hang, Cheng, Jun, Zhang, Linfeng, and Wang, Han
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple programming interfaces. It enables complex process control and task scheduling across a distributed, heterogeneous infrastructure, leveraging containers and Kubernetes for flexibility. Dflow is highly observable and can scale to thousands of concurrent nodes per workflow, enhancing the efficiency of complex scientific computing tasks. The basic unit in Dflow, known as an Operation (OP), is reusable and independent of the underlying infrastructure or context. Dozens of workflow projects have been developed based on Dflow, spanning a wide range of projects. We anticipate that the reusability of Dflow and its components will encourage more scientists to publish their workflows and OP components. These components, in turn, can be adapted and reused in various contexts, fostering greater collaboration and innovation in the scientific community.
- Published
- 2024
30. Anomalous Phonon in Charge-Density-Wave Phase of Kagome Metal CsV3Sb5
- Author
-
Wang, Han-Yu, Bai, Xiao-Cheng, Wu, Wen-Feng, Zeng, Zhi, Liu, Da-Yong, and Zou, Liang-Jian
- Subjects
Condensed Matter - Materials Science ,Condensed Matter - Superconductivity - Abstract
CsV3Sb5, a notable compound within the kagome family, is renowned for its topological and superconducting properties, as well as for the local magnetic field and anomalous Hall effect detected in experiments. However, the origin of this local magnetic field is still veiled. In this study, we employ first-principles calculations to investigate the atomic vibration in both the pristine and the charge-density-wave phases of CsV3Sb5. Our analysis reveals the presence of "anomalous phonons" in these structures. These phonons induce the circular vibration of atoms, contributing to the phonon magnetic moments and subsequently to the observed local magnetic fields. Additionally, we observe that lattice distortion in the charge-density-wave phase amplifies these circular vibrations, resulting in a stronger local magnetic field, particularly from the vanadium atoms. This investigation not only reveals the potential relation between lattice distortion and atomic polarization but also offers a novel idea to understand the origin of the local magnetic moment in CsV3Sb5., Comment: 6 pages, 5 figures
- Published
- 2024
31. An Extendable Cloud-Native Alloy Property Explorer
- Author
-
Li, Zhuoyuan, Wen, Tongqi, Zhang, Yuzhi, Liu, Xinzijian, Zhang, Chengqian, Pattamatta, A. S. L. Subrahmanyam, Gong, Xiaoguo, Ye, Beilin, Wang, Han, Zhang, Linfeng, and Srolovitz, David J.
- Subjects
Condensed Matter - Materials Science - Abstract
The ability to rapidly evaluate materials properties through atomistic simulation approaches is the foundation of many new artificial intelligence-based approaches to materials identification and design. This depends on the availability of accurate descriptions of atomic bonding through various forms of interatomic potentials. We present an efficient, robust platform for calculating materials properties, i.e., APEX, the Alloy Property Explorer. APEX enables the rapid evolution of interatomic potential development and optimization, which is of particular importance in fine-tuning new classes of general AI-based foundation models to forms that are readily applicable to impacting materials development. APEX is an open-source, extendable, and cloud-native platform for material property calculations using a range of atomistic simulation methodologies that effectively manages diverse computational resources and is built upon user-friendly features including automatic results visualization, web-based platforms, and a NoSQL database client. It is designed for expert and non-specialist users, and lowers the barrier to entry for interdisciplinary research within the "AI for Materials" framework. We describe the foundation and use of APEX, as well as provide an example of its application to properties of titanium for a wide range of bonding descriptions.
- Published
- 2024
32. Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks
- Author
-
He, Mingrui, Xu, Longting, Wang, Han, Zhang, Mingjun, and Das, Rohan Kumar
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequency device cepstral coefficient, derived from the graph frequency domain using a device-related linear transformation. We also introduce two novel representations: graph frequency logarithmic coefficient and graph frequency logarithmic device coefficient. We evaluate our methods using traditional Gaussian mixture model and light convolutional neural network systems as classifiers. On the ASVspoof 2017 V2, ASVspoof 2019 physical access, and ASVspoof 2021 physical access datasets, our proposed features outperform known front-ends, demonstrating their effectiveness for replay speech detection.
- Published
- 2024
33. Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior
- Author
-
Wang, Han, Chai, Xinning, Wang, Yiwen, Zhang, Yuhong, Xie, Rong, and Song, Li
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to generate satisfactory results due to incorrect semantic colors and unsaturated colors. In this work, we propose an automatic colorization pipeline to overcome these challenges. We leverage the extraordinary generative ability of the diffusion prior to synthesize color with plausible semantics. To overcome the artifacts introduced by the diffusion prior, we apply luminance conditional guidance. Moreover, we adopt multimodal high-level semantic priors to help the model understand the image content and deliver saturated colors. In addition, a luminance-aware decoder is designed to restore details and enhance overall visual quality. The proposed pipeline synthesizes saturated colors while maintaining plausible semantics. Experiments indicate that our proposed method considers both diversity and fidelity, surpassing previous methods in terms of perceptual realism and gaining the most human preference.
- Published
- 2024
34. Designing, simulating, and performing the 100-AV field test for the CIRCLES consortium: Methodology and Implementation of the Largest mobile traffic control experiment to date
- Author
-
Ameli, Mostafa, Mcquade, Sean, Lee, Jonathan W., Bunting, Matthew, Nice, Matthew, Wang, Han, Barbour, William, Weightman, Ryan, Denaro, Chris, Delorenzo, Ryan, Hornstein, Sharon, Davis, Jon F., Timsit, Dan, Wagner, Riley, Xu, Rita, Mahmood, Malaika, Mahmood, Mikail, Monache, Maria Laura Delle, Seibold, Benjamin, Work, Daniel B., Sprinkle, Jonathan, Piccoli, Benedetto, and Bayen, Alexandre M.
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
Previous controlled experiments on single-lane ring roads have shown that a single partially autonomous vehicle (AV) can effectively mitigate traffic waves. This naturally prompts the question of how these findings can be generalized to field operational, high-density traffic conditions. To address this question, the Congestion Impacts Reduction via CAV-in-the-loop Lagrangian Energy Smoothing (CIRCLES) Consortium conducted MegaVanderTest (MVT), a live traffic control experiment involving 100 vehicles near Nashville, TN, USA. This article is a tutorial for developing analytical and simulation-based tools essential for designing and executing a live traffic control experiment like the MVT. It presents an overview of the proposed roadmap and various procedures used in designing, monitoring, and conducting the MVT, which is the largest mobile traffic control experiment at the time. The design process is aimed at evaluating the impact of the CIRCLES AVs on surrounding traffic. The article discusses the agent-based traffic simulation framework created for this evaluation. A novel methodological framework is introduced to calibrate this microsimulation, aiming to accurately capture traffic dynamics and assess the impact of adding 100 vehicles to existing traffic. The calibration model's effectiveness is verified using data from a six-mile section of Nashville's I-24 highway. The results indicate that the proposed model establishes an effective feedback loop between the optimizer and the simulator, thereby calibrating flow and speed with different spatiotemporal characteristics to minimize the error between simulated and real-world data. Finally, we simulate AVs in multiple scenarios to assess their effect on traffic congestion. This evaluation validates the AV routes, thereby contributing to the execution of a safe and successful live traffic control experiment via AVs.
- Published
- 2024
35. Physical Vapor Deposition of High Mobility P-type Tellurium and its Applications for Gate-tunable van der Waals PN Photodiodes
- Author
-
Huang, Tianyi, Lin, Sen, Zou, Jingyi, Wang, Zexiao, Zhong, Yibai, Li, Jingwei, Wang, Ruixuan, Wang, Han, Li, Qing, Xu, Min, Shen, Sheng, and Zhang, Xu
- Subjects
Physics - Applied Physics - Abstract
Recently, tellurium (Te) has attracted resurgent interest due to its p-type characteristics and outstanding ambient environmental stability. Here we present a substrate-engineering-based physical vapor deposition method to synthesize high-quality Te nanoflakes and achieve a field-effect hole mobility of 1500 cm2/Vs, which is, to the best of our knowledge, the highest among the existing synthesized van der Waals p-type semiconductors. The high mobility Te enables the fabrication of Te/MoS2 pn diodes with highly gate-tunable electronic and optoelectronic characteristics. The Te/MoS2 heterostructure can be used as a visible range photodetector with a current responsivity up to 630 A/W, which is about one order of magnitude higher than the one achieved using p-type Si-MoS2 PN photodiodes. The photo response of the Te/MoS2 heterojunction also exhibits strong gate tunability due to its ultrathin thickness and unique band structure. The successful synthesis of high mobility Te and the enabled Te/MoS2 photodiodes show promise for the development of highly tunable and ultrathin photodetectors.
- Published
- 2024
36. Quantitative homogenization and hydrodynamic limit of non-gradient exclusion process
- Author
-
Funaki, Tadahisa, Gu, Chenlin, and Wang, Han
- Subjects
Mathematics - Probability ,Mathematics - Analysis of PDEs ,82C22, 35B27, 60K35 - Abstract
For the non-gradient exclusion process, we prove the quantitative homogenization in the approximation of the diffusion matrix and the conductivity by local functions. The proof relies on the renormalization approach developed by Armstrong, Kuusi, Mourrat, and Smart, while the new challenge here is the hard core constraint of particle number on every site. Therefore, a coarse-grained method is proposed to lift the configuration to a larger space without exclusion, and a gradient coupling between two systems is applied to capture the spatial cancellation. We then strengthen the convergence rate to be uniform concerning the density and integrate it into the work by Funaki, Uchiyama, and Yau [IMA Vol. Math. Appl., 77 (1996), pp. 1-40.] to yield a quantitative hydrodynamic limit. Our new approach avoids showing the characterization of closed forms and provides stronger results. The extension is discussed for the model in the presence of disorder on the bonds., Comment: 83 pages, 3 figures
- Published
- 2024
37. Active robustness against the detuning-error for Rydberg quantum gates
- Author
-
Hou, Qing-Ling, Wang, Han, and Qian, Jing
- Subjects
Quantum Physics - Abstract
Suppressing errors from experimental imperfections is a central challenge for useful quantum computing. Recent studies have shown the advantages of using single-modulated pulses based on optimal control which can realize high-fidelity two-qubit gates in neutral-atom arrays. However, typical optimization only minimizes the ideal gate error in the absence of any decay, which allows the gate to be passively influenced by all error sources, leading to an exponential increase of sensitivity when the error becomes larger. In the present work, we propose the realization of two-qubit CZ gates with active robustness against two-photon detuning errors. Our method depends on a modified cost function in numerical optimization for shaping gate pulses, which minimizes not only the ideal gate error but also the fluctuations of gate infidelity over a wide error range. We introduce a family of Rydberg blockade gates with active robustness towards the impacts of versatile noise sources such as Doppler dephasing and ac Stark shifts. The resulting gates with robust pulses can significantly increase the insensitivity to any type of error acting on the two-photon detuning, benefiting from a relaxed requirement of colder atomic temperatures or more stable lasers for current experimental technology., Comment: 13 pages, 7 figures, Physical Review Applied in press
- Published
- 2024
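The idea in the abstract above of a cost function that penalizes both the ideal gate error and the fluctuation of infidelity over a range of detuning errors can be sketched on a much simpler system. The example below is a toy under stated assumptions, not the authors' cost function or a CZ gate: it uses the textbook Rabi formula for a square pulse on a single two-level system, measures fluctuation as the standard deviation of the infidelity over a sampled detuning grid, and combines the two terms with a hypothetical `weight` parameter.

```python
import math


def transfer_infidelity(omega, duration, detuning):
    """1 - P(|0> -> |1>) for a square pulse (Rabi formula), with omega > 0."""
    generalized = math.hypot(omega, detuning)  # generalized Rabi frequency
    population = (omega / generalized) ** 2 * math.sin(generalized * duration / 2) ** 2
    return 1.0 - population


def robust_cost(omega, duration, detunings, weight=1.0):
    """Ideal (zero-detuning) error plus weighted spread of the error over detunings."""
    ideal = transfer_infidelity(omega, duration, 0.0)
    errors = [transfer_infidelity(omega, duration, d) for d in detunings]
    mean = sum(errors) / len(errors)
    spread = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    return ideal + weight * spread


if __name__ == "__main__":
    # A pi pulse is ideal at zero detuning but pays a penalty for its sensitivity.
    print(robust_cost(1.0, math.pi, [-0.2, -0.1, 0.0, 0.1, 0.2]))
```

In a pulse-shaping setting, an optimizer would minimize such a cost over the pulse parameters rather than over a fixed square pulse, so that a pulse with slightly higher ideal error but a flatter error profile can be preferred.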
38. Asynchronous Heterogeneous Linear Quadratic Regulator Design
- Author
-
Toso, Leonardo F., Wang, Han, and Anderson, James
- Subjects
Mathematics - Optimization and Control - Abstract
We address the problem of designing an LQR controller in a distributed setting, where M similar but not identical systems share their locally computed policy gradient (PG) estimates with a server that aggregates the estimates and computes a controller that, on average, performs well on all systems. Learning in a distributed setting has the potential to offer statistical benefits - multiple datasets can be leveraged simultaneously to produce more accurate policy gradient estimates. However, the interplay of heterogeneous trajectory data and varying levels of local computational power introduces bias to the aggregated PG descent direction, and prevents us from fully exploiting the parallelism in the distributed computation. The latter stems from synchronous aggregation, where straggler systems negatively impact the runtime. To address this, we propose an asynchronous policy gradient algorithm for LQR control design. By carefully controlling the "staleness" in the asynchronous aggregation, we show that the designed controller converges to each system's $\epsilon$-near optimal controller up to a heterogeneity bias. Furthermore, we prove that our asynchronous approach obtains exact local convergence at a sub-linear rate., Comment: Leonardo F. Toso and Han Wang contributed equally to this work
- Published
- 2024
39. Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model
- Author
-
Yang, Jichang, Chen, Hegan, Chen, Jia, Wang, Songqi, Wang, Shaocong, Yu, Yifei, Chen, Xi, Wang, Bo, Zhang, Xinyuan, Cui, Binbin, Li, Yi, Lin, Ning, Xu, Meng, Xu, Xiaoxin, Qi, Xiaojuan, Wang, Zhongrui, Zhang, Xumeng, Shang, Dashan, Wang, Han, Liu, Qi, Cheng, Kwang-Ting, and Liu, Ming
- Subjects
Computer Science - Hardware Architecture ,Computer Science - Artificial Intelligence ,Computer Science - Emerging Technologies ,Computer Science - Neural and Evolutionary Computing - Abstract
Human brains image complicated scenes when reading a novel. Replicating this imagination is one of the ultimate goals of AI-Generated Content (AIGC). However, current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. This deficiency is rooted in the difference between the brain and digital computers. Digital computers have physically separated storage and processing units, resulting in frequent data transfers during iterative calculations, incurring large time and energy overheads. This issue is further intensified by the conversion of inherently continuous and analog generation dynamics, which can be formulated by neural differential equations, into discrete and digital operations. Inspired by the brain, we propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion, employing emerging resistive memory. The integration of storage and computation within resistive memory synapses surmounts the von Neumann bottleneck, benefiting the generative speed and energy efficiency. The closed-loop feedback integrator is time-continuous, analog, and compact, physically implementing an infinite-depth neural network. Moreover, the software-hardware co-design is intrinsically robust to analog noise. We experimentally validate our solution with 180 nm resistive memory in-memory computing macros. Demonstrating equivalent generative quality to the software baseline, our system achieved remarkable enhancements in generative speed for both unconditional and conditional generation tasks, by factors of 64.8 and 156.5, respectively. Moreover, it accomplished reductions in energy consumption by factors of 5.2 and 4.1. Our approach heralds a new horizon for hardware solutions in edge computing for generative AI applications.
- Published
- 2024
40. Integrating AI in NDE: Techniques, Trends, and Further Directions
- Author
-
Pérez, Eduardo, Ardic, Cemil Emre, Çakıroğlu, Ozan, Jacob, Kevin, Kodera, Sayako, Pompa, Luca, Rachid, Mohamad, Wang, Han, Zhou, Yiming, Zimmer, Cyril, Römer, Florian, and Osman, Ahmad
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
The digital transformation is fundamentally changing our industries, affecting planning, execution as well as monitoring of production processes in a wide range of application fields. With product line-ups becoming more and more versatile and diverse, the necessary inspection and monitoring sparks significant novel requirements on the corresponding Nondestructive Evaluation (NDE) systems. The establishment of increasingly powerful approaches to incorporate Artificial Intelligence (AI) may provide just the needed innovation to solve some of these challenges. In this paper we provide a comprehensive survey about the usage of AI methods in NDE in light of the recent innovations towards NDE 4.0. Since we cannot discuss each NDE modality in one paper, we limit our attention to magnetic methods, ultrasound, thermography, as well as optical inspection. In addition to reviewing recent AI developments in each field, we draw common connections by pointing out NDE-related tasks that have a common underlying mathematical problem and categorizing the state of the art according to the corresponding sub-tasks. In so doing, interdisciplinary connections are drawn that provide a more complete overall picture.
- Published
- 2024
41. Remote-contact catalysis for target-diameter semiconducting carbon nanotube array
- Author
-
Wang, Jiangtao, Zheng, Xudong, Pitner, Gregory, Ji, Xiang, Zhang, Tianyi, Yao, Aijia, Zhu, Jiadi, Palacios, Tomás, Li, Lain-Jong, Wang, Han, and Kong, Jing
- Subjects
Physics - Applied Physics ,Condensed Matter - Materials Science - Abstract
Electrostatic catalysis has been an exciting development in chemical synthesis (beyond enzyme catalysis) in recent years, boosting reaction rates and selectively producing certain reaction products. Most of the studies to date have focused on using an external electric field (EEF) to rearrange the charge distribution in small molecule reactions such as Diels-Alder addition, carbene reaction, etc. However, in order for these EEFs to be effective, a field on the order of 1 V/nm (10 MV/cm) is required, and the direction of the EEF has to be aligned with the reaction axis. Such a large and oriented EEF will be challenging for large-scale implementation, or materials growth with multiple reaction axes or steps. Here, we demonstrate that the energy band at the tip of an individual single-walled carbon nanotube (SWCNT) can be spontaneously shifted in a high-permittivity growth environment, with its other end in contact with a low-work function electrode (e.g., hafnium carbide or titanium carbide). By adjusting the Fermi level at a point where there is a substantial disparity in the density of states (DOS) between semiconducting (s-) and metallic (m-) SWCNTs, we achieve effective electrostatic catalysis for s-SWCNT growth assisted by a weak EEF perturbation (200 V/cm). This approach enables the production of high-purity (99.92%) s-SWCNT horizontal arrays with narrow diameter distribution (0.95 ± 0.04 nm), targeting the requirement of advanced SWCNT-based electronics for future computing. These findings highlight the potential of electrostatic catalysis in precise materials growth, especially for s-SWCNTs, and pave the way for the development of advanced SWCNT-based electronics., Comment: 4 figures, 23 pages
- Published
- 2024
42. Understanding 2p core-level excitons of late transition metals by analysis of mixed-valence copper in a metal–organic framework
- Author
-
Wang, Han, Su, Gregory M, Barnett, Brandon R, Drisdell, Walter S, Long, Jeffrey R, and Prendergast, David
- Subjects
Inorganic Chemistry ,Physical Sciences ,Chemical Sciences ,Physical Chemistry ,Affordable and Clean Energy ,Chemical Physics - Abstract
The L2,3-edge X-ray absorption spectra of late transition metals such as Cu, Ag, and Au exhibit absorption onsets lower in energy for higher oxidation states, which is at odds with the measured spectra of earlier transition metals. Time-dependent density functional theory calculations for Cu2+/Cu+ reveal a larger 2p core-exciton binding energy for Cu2+, overshadowing shifts in single-particle excitation energies with respect to Cu+. We explore this phenomenon in a Cu+ metal-organic framework with ∼12% Cu2+ defects and find that corrections with self-consistent excited-state total energy differences provide accurate XAS peak alignment.
- Published
- 2024
43. Elysium: Exploring Object-level Perception in Videos via MLLM
- Author
-
Wang, Han, Wang, Yanjie, Ye, Yongjie, Nie, Yuxiang, and Huang, Can
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Multi-modal Large Language Models (MLLMs) have demonstrated their ability to perceive objects in still images, but their application in video-related tasks, such as object tracking, remains understudied. This lack of exploration is primarily due to two key challenges. Firstly, extensive pretraining on large-scale video datasets is required to equip MLLMs with the capability to perceive objects across multiple frames and understand inter-frame relationships. Secondly, processing a large number of frames within the context window of Large Language Models (LLMs) can impose a significant computational burden. To address the first challenge, we introduce ElysiumTrack-1M, a large-scale video dataset supported for three tasks: Single Object Tracking (SOT), Referring Single Object Tracking (RSOT), and Video Referring Expression Generation (Video-REG). ElysiumTrack-1M contains 1.27 million annotated video frames with corresponding object boxes and descriptions. Leveraging this dataset, we conduct training of MLLMs and propose a token-compression model T-Selector to tackle the second challenge. Our proposed approach, Elysium: Exploring Object-level Perception in Videos via MLLM, is an end-to-end trainable MLLM that attempts to conduct object-level tasks in videos without requiring any additional plug-in or expert models. All codes and datasets are available at https://github.com/Hon-Wong/Elysium.
- Published
- 2024
44. Convex Co-Design of Control Barrier Function and Safe Feedback Controller Under Input Constraints
- Author
-
Wang, Han, Margellos, Kostas, Papachristodoulou, Antonis, and De Persis, Claudio
- Subjects
Mathematics - Optimization and Control ,Electrical Engineering and Systems Science - Systems and Control - Abstract
We study the problem of co-designing control barrier functions (CBF) and linear state feedback controllers for continuous-time linear systems. We achieve this by means of a single semi-definite optimization program. Our formulation can handle mixed-relative degree problems without requiring an explicit safe controller. Different L-norm based input limitations can be introduced as convex constraints in the proposed program. We demonstrate our results on an omni-directional car numerical example., Comment: manuscript submitted to TAC
- Published
- 2024
45. A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization
- Author
-
Luo, Yudong, Pan, Yangchen, Wang, Han, Torr, Philip, and Poupart, Pascal
- Subjects
Computer Science - Machine Learning ,Mathematics - Optimization and Control - Abstract
Reinforcement learning algorithms utilizing policy gradients (PG) to optimize Conditional Value at Risk (CVaR) face significant challenges with sample inefficiency, hindering their practical applications. This inefficiency stems from two main facts: a focus on tail-end performance that overlooks many sampled trajectories, and the potential of gradient vanishing when the lower tail of the return distribution is overly flat. To address these challenges, we propose a simple mixture policy parameterization. This method integrates a risk-neutral policy with an adjustable policy to form a risk-averse policy. By employing this strategy, all collected trajectories can be utilized for policy updating, and the issue of vanishing gradients is counteracted by stimulating higher returns through the risk-neutral component, thus lifting the tail and preventing flatness. Our empirical study reveals that this mixture parameterization is uniquely effective across a variety of benchmark domains. Specifically, it excels in identifying risk-averse CVaR policies in some Mujoco environments where the traditional CVaR-PG fails to learn a reasonable policy., Comment: RLC 2024
- Published
- 2024
46. Efficient Convolutional Forward Modeling and Sparse Coding in Multichannel Imaging
- Author
-
Wang, Han, Kvich, Yhonatan, Pérez, Eduardo, Römer, Florian, and Eldar, Yonina C.
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
This study considers the Block-Toeplitz structural properties inherent in traditional multichannel forward model matrices, using Full Matrix Capture (FMC) in ultrasonic testing as a case study. We propose an analytical convolutional forward model that transforms reflectivity maps into FMC data. Our findings demonstrate that the convolutional model excels over its matrix-based counterpart in terms of computational efficiency and storage requirements. This accelerated forward modeling approach holds significant potential for various inverse problems, notably enhancing Sparse Signal Recovery (SSR) within the context of LASSO regression, which facilitates efficient Convolutional Sparse Coding (CSC) algorithms. Additionally, we explore the integration of Convolutional Neural Networks (CNNs) for the forward model, employing deep unfolding to implement the Learned Block Convolutional ISTA (BC-LISTA)., Comment: 5 pages, 7 figures, accepted by EUSIPCO-2024
- Published
- 2024
47. Performance assessment of the effective core potentials under the Fermionic neural network: first and second row elements
- Author
-
Wang, Mengsa, Zhou, Yuzhi, and Wang, Han
- Subjects
Physics - Computational Physics ,Physics - Chemical Physics - Abstract
The rapid development of deep learning techniques has driven the emergence of a neural network-based variational Monte Carlo method (referred to as FermiNet), which has manifested high accuracy and strong predictive power in the electronic structure calculations of atoms, molecules as well as some periodic systems. Recently, the implementation of the effective core potential (ECP) scheme in it further facilitates more efficient calculations in practice. But there still lack comprehensive assessments on the ECP's performance under the FermiNet. In this work, we set sail to fill this gap by conducting extensive tests on the first two row elements regarding their atomic spectral and molecular properties. Our major finding is that in general the qualities of ECPs have been correctly reflected under the FermiNet. Two recently built ECP tables, namely ccECP and eCEPP, seem to prevail on the overall performance. Specifically, ccECP performs slightly better on the spectral precision and covers more elements, while eCEPP is more systematically built from both shape and energy consistency, and better treats the core polarisation. On the other hand, the high accuracy of the all-electron calculations is hindered by the absence of relativistic effects as well as the numerical instabilities in some heavier elements. Finally, with further in-depth discussions, we generate possible directions for developing and improving the FermiNet in the near future.
- Published
- 2024
48. A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product
- Author
-
Xiang, Ao, Qi, Zongqing, Wang, Han, Yang, Qin, and Ma, Danqing
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper introduces a new multi-modal model based on the Transformer architecture and tensor product fusion strategy, combining BERT's text vectors and ViT's image vectors to classify students' psychological conditions, with an accuracy of 93.65%. The purpose of the study is to accurately analyze the mental health status of students from various data sources. This paper discusses modal fusion methods, including early, late and intermediate fusion, to overcome the challenges of integrating multi-modal information. Ablation studies compare the performance of different models and fusion techniques, showing that the proposed model outperforms existing methods such as CLIP and ViLBERT in terms of accuracy and inference speed. Conclusions indicate that while this model has significant advantages in emotion recognition, its potential to incorporate other data modalities provides areas for future research.
- Published
- 2024
49. Electrically Programmable Pixelated Graphene-Integrated Plasmonic Metasurfaces for Coherent Mid-Infrared Emission
- Author
-
Liu, Xiu, Zhong, Yibai, Wang, Zexiao, Huang, Tianyi, Lin, Sen, Zou, Jingyi, Wang, Haozhe, Wang, Zhien, Li, Zhuo, Luo, Xiao, Cheng, Rui, Li, Jiayu, Yun, Hyeong Seok, Wang, Han, Kong, Jing, Zhang, Xu, and Shen, Sheng
- Subjects
Physics - Optics ,Physics - Applied Physics - Abstract
Active metasurfaces have recently emerged as compact, lightweight, and efficient platforms for dynamic control of electromagnetic fields and optical responses. However, the complexities associated with their post-fabrication tunability significantly hinder their widespread applications, especially for the mid-infrared range due to material scarcity and design intricacy. Here, we experimentally demonstrate highly dynamic, pixelated modulations of coherent mid-infrared emission based on an electrically programmable plasmonic metasurface integrated with graphene field effect transistors (Gr-FETs). The ultrabroad infrared transparency of graphene allows for free-form control over plasmonic meta-atoms, thus achieving coherent mid-infrared states across a broad range of wavelengths and polarizations. The spatial temperature modulation generated by Gr-FETs is effectively synergized with the emissivity control by the localized surface plasmon polaritons from gold nanoantennas. This integrated temperature-emissivity modulation of metasurfaces is systematically extended to form a pixelated 2D array, envisioning new approaches toward scalable 2D electrical wiring for densely packed, independently controlled pixels., Comment: Needs more updates for the experiments
- Published
- 2024
50. Schema-Aware Multi-Task Learning for Complex Text-to-SQL
- Author
-
Wu, Yangjun and Wang, Han
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Databases - Abstract
Conventional text-to-SQL parsers are not good at synthesizing complex SQL queries that involve multiple tables or columns, due to the challenges inherent in identifying the correct schema items and performing accurate alignment between question and schema items. To address the above issue, we present a schema-aware multi-task learning framework (named MTSQL) for complicated SQL queries. Specifically, we design a schema linking discriminator module to distinguish the valid question-schema linkings, which explicitly instructs the encoder by distinctive linking relations to enhance the alignment quality. On the decoder side, we define 6-type relationships to describe the connections between tables and columns (e.g., WHERE_TC), and introduce an operator-centric triple extractor to recognize those associated schema items with the predefined relationship. Also, we establish a rule set of grammar constraints via the predicted triples to filter the proper SQL operators and schema items during the SQL generation. On Spider, a cross-domain challenging text-to-SQL benchmark, experimental results indicate that MTSQL is more effective than baselines, especially in extremely hard scenarios. Moreover, further analyses verify that our approach leads to promising improvements for complicated SQL queries., Comment: 8 pages
- Published
- 2024