Author: "WANG, Ye" - Searchworks@Jio Institute Digital Library Search Results

Your search for "WANG, Ye" returned 16,401 results

1. Visualizing a Framework in Teaching Literacy to Filipino Deaf Students in Multimodal Learning Spaces

Author: Francisco, Marian Patricia Bea U., Sulse, Leonides D., and Wang, Ye
Published: 2024

2. Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models

Author: Wang, Ye, Zheng, Sipeng, Cao, Bin, Wei, Qianshan, Jin, Qin, and Lu, Zongqing
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Inspired by the recent success of LLMs, the field of human motion understanding has increasingly shifted towards the development of large motion models. Despite some progress, current state-of-the-art works remain far from achieving truly generalist models, largely due to the lack of large-scale, high-quality motion data. To address this, we present MotionBase, the first million-level motion generation benchmark, offering 15 times the data volume of the previous largest dataset, and featuring multimodal data with hierarchically detailed text descriptions. By leveraging this vast dataset, our large motion model demonstrates strong performance across a broad range of motions, including unseen ones. Through systematic investigation, we underscore the importance of scaling both data and model size, with synthetic data and pseudo labels playing a crucial role in mitigating data acquisition costs. Moreover, our research reveals the limitations of existing evaluation metrics, particularly in handling out-of-domain text instructions -- an issue that has long been overlooked. In addition to these, we introduce a novel 2D lookup-free approach for motion tokenization, which preserves motion information and expands codebook capacity, further enhancing the representative ability of large motion models. The release of MotionBase and the insights gained from this study are expected to pave the way for the development of more powerful and versatile motion generation models.
Published: 2024

3. Dirichlet-Based Coarse-to-Fine Example Selection For Open-Set Annotation

Author: Wang, Ye-Wen, Zong, Chen-Chen, Xie, Ming-Kun, and Huang, Sheng-Jun
Subjects: Computer Science - Artificial Intelligence
Abstract: Active learning (AL) has achieved great success by selecting the most valuable examples from unlabeled data. However, AL methods usually deteriorate in real scenarios where open-set noise gets involved, which is studied as open-set annotation (OSA). In this paper, we attribute the deterioration to the unreliable predictions arising from softmax-based translation invariance and propose a Dirichlet-based Coarse-to-Fine Example Selection (DCFS) strategy accordingly. Our method introduces simplex-based evidential deep learning (EDL) to break translation invariance and distinguish known and unknown classes by considering evidence-based data and distribution uncertainty simultaneously. Furthermore, hard known-class examples are identified by model discrepancy generated from two classifier heads, where we amplify and alleviate the model discrepancy respectively for unknown and known classes. Finally, we combine the discrepancy with uncertainties to form a two-stage strategy, selecting the most informative examples from known classes. Extensive experiments on various openness ratio datasets demonstrate that DCFS achieves state-of-the-art performance.
Published: 2024

4. Convolutional Dictionary Learning Based Hybrid-Field Channel Estimation for XL-RIS-Aided Massive MIMO Systems

Author: Zheng, Peicong, Lyu, Xuantao, Wang, Ye, and Gong, Yi
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Extremely large reconfigurable intelligent surface (XL-RIS) is emerging as a promising key technology for 6G systems. To exploit XL-RIS's full potential, accurate channel estimation is essential. This paper investigates channel estimation in XL-RIS-aided massive MIMO systems under hybrid-field scenarios where far-field and near-field channels coexist. We formulate this problem using dictionary learning, which allows for joint optimization of the dictionary and estimated channel. To handle the high-dimensional nature of XL-RIS channels, we specifically adopt a convolutional dictionary learning (CDL) formulation. The CDL formulation is cast as a bilevel optimization problem, which we solve using a gradient-based approach. To address the challenge of computing the gradient of the upper-level objective, we introduce an unrolled optimization method based on proximal gradient descent (PGD) and its special case, the iterative soft-thresholding algorithm (ISTA). We propose two neural network architectures, Convolutional ISTA-Net and its enhanced version Convolutional ISTA-Net+, for end-to-end optimization of the CDL. To overcome the limitations of linear convolutional filters in capturing complex hybrid-field channel structures, we propose the CNN-CDL approach, which enhances PGD by replacing linear convolution filters with CNN blocks in its gradient descent step, employing a learnable proximal mapping module in its proximal mapping step, and incorporating cross-layer feature integration. Simulation results demonstrate the effectiveness of the proposed methods for channel estimation in hybrid-field XL-RIS systems.
Published: 2024

5. Knowledge Adaptation Network for Few-Shot Class-Incremental Learning

Author: Wang, Ye, Wang, Yaxiong, Zhao, Guoshuai, and Qian, Xueming
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Few-shot class-incremental learning (FSCIL) aims to incrementally recognize new classes using a few samples while maintaining the performance on previously learned classes. One of the effective methods to solve this challenge is to construct prototypical evolution classifiers. Despite the advancement achieved by most existing methods, the classifier weights are simply initialized using mean features. Because representations for new classes are weak and biased, we argue such a strategy is suboptimal. In this paper, we tackle this issue from two aspects. Firstly, thanks to the development of foundation models, we employ a foundation model, the CLIP, as the network pedestal to provide a general representation for each class. Secondly, to generate a more reliable and comprehensive instance representation, we propose a Knowledge Adapter (KA) module that summarizes the data-specific knowledge from training data and fuses it into the general representation. Additionally, to tune the knowledge learned from the base classes to the upcoming classes, we propose a mechanism of Incremental Pseudo Episode Learning (IPEL) by simulating the actual FSCIL. Taken together, our proposed method, dubbed as Knowledge Adaptation Network (KANet), achieves competitive performance on a wide range of datasets, including CIFAR100, CUB200, and ImageNet-R.
Comment: 13 pages; 6 figures
Published: 2024

6. Exploring User-level Gradient Inversion with a Diffusion Prior

Author: Li, Zhuohang, Lowy, Andrew, Liu, Jing, Koike-Akino, Toshiaki, Malin, Bradley, Parsons, Kieran, and Wang, Ye
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: We explore user-level gradient inversion as a new attack surface in distributed learning. We first investigate existing attacks on their ability to make inferences about private information beyond training data reconstruction. Motivated by the low reconstruction quality of existing methods, we propose a novel gradient inversion attack that applies a denoising diffusion model as a strong image prior in order to enhance recovery in the large batch setting. Unlike traditional attacks, which aim to reconstruct individual samples and suffer at large batch and image sizes, our approach instead aims to recover a representative image that captures the sensitive shared semantic information corresponding to the underlying user. Our experiments with face images demonstrate the ability of our methods to recover realistic facial images along with private user attributes.
Comment: Presented at the International Workshop on Federated Learning in the Age of Foundation Models in conjunction with NeurIPS 2023
Published: 2024

7. Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Author: Rashid, Md Rafi Ur, Liu, Jing, Koike-Akino, Toshiaki, Mehnaz, Shagufta, and Wang, Ye
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Fine-tuning large language models on private data for downstream applications poses significant privacy risks in potentially exposing sensitive information. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models, allowing anyone to publish without rigorous verification. This scenario creates a privacy threat, as pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model-unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.
Published: 2024

8. Analyzing Inference Privacy Risks Through Gradients in Machine Learning

Author: Li, Zhuohang, Lowy, Andrew, Liu, Jing, Koike-Akino, Toshiaki, Parsons, Kieran, Malin, Bradley, and Wang, Ye
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: In distributed learning settings, models are iteratively updated with shared gradients computed from potentially sensitive user data. While previous work has studied various privacy risks of sharing gradients, our paper aims to provide a systematic approach to analyze private information leakage from gradients. We present a unified game-based framework that encompasses a broad range of attacks including attribute, property, distributional, and user disclosures. We investigate how different uncertainties of the adversary affect their inferential power via extensive experiments on five datasets across various data modalities. Our results demonstrate the inefficacy of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning. We further evaluate five types of defenses, namely, gradient pruning, signed gradient descent, adversarial perturbations, variational information bottleneck, and differential privacy, under both static and adaptive adversary settings. We provide an information-theoretic view for analyzing the effectiveness of these defenses against inference from gradients. Finally, we introduce a method for auditing attribute inference privacy, improving the empirical estimation of worst-case privacy through crafting adversarial canary records.
Published: 2024

9. Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement

Author: Ou, Longshen, Zhao, Jingwei, Wang, Ziyu, Xia, Gus, and Wang, Ye
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Large language models have shown significant capabilities across various domains, including symbolic music generation. However, leveraging these pre-trained models for controllable music arrangement tasks, each requiring different forms of musical information as control, remains a novel challenge. In this paper, we propose a unified sequence-to-sequence framework that enables the fine-tuning of a symbolic music language model for multiple multi-track arrangement tasks, including band arrangement, piano reduction, drum arrangement, and voice separation. Our experiments demonstrate that the proposed approach consistently achieves higher musical quality compared to task-specific baselines across all four tasks. Furthermore, through additional experiments on probing analysis, we show the pre-training phase equips the model with essential knowledge to understand musical conditions, which is hard to acquire solely through task-specific fine-tuning.
Comment: Submitted to AAAI 2025
Published: 2024

10. Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models

Author: Liu, Hongfu, Xie, Yuxi, Wang, Ye, and Shieh, Michael
Subjects: Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) face safety concerns due to potential misuse by malicious users. Recent red-teaming efforts have identified adversarial suffixes capable of jailbreaking LLMs using the gradient-based search algorithm Greedy Coordinate Gradient (GCG). However, GCG struggles with computational inefficiency, limiting further investigations regarding suffix transferability and scalability across models and data. In this work, we bridge the connection between search efficiency and suffix transferability. We propose a two-stage transfer learning framework, DeGCG, which decouples the search process into behavior-agnostic pre-searching and behavior-relevant post-searching. Specifically, we employ direct first target token optimization in pre-searching to facilitate the search process. We apply our approach to cross-model, cross-data, and self-transfer scenarios. Furthermore, we introduce an interleaved variant of our approach, i-DeGCG, which iteratively leverages self-transferability to accelerate the search process. Experiments on HarmBench demonstrate the efficiency of our approach across various models and domains. Notably, our i-DeGCG outperforms the baseline on Llama2-chat-7b with ASRs of 43.9 (+22.2) and 39.0 (+19.5) on valid and test sets, respectively. Further analysis on cross-model transfer indicates the pivotal role of first target token optimization in leveraging suffix transferability for efficient searching.
Comment: Accepted to EMNLP 2024
Published: 2024

11. Quantifying the Blockchain Trilemma: A Comparative Analysis of Algorand, Ethereum 2.0, and Beyond

Author: Fu, Yihang, Jing, Mingwei, Zhou, Jiaolun, Wu, Peilin, Wang, Ye, Zhang, Luyao, and Hu, Chuang
Subjects: Economics - General Economics, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Cryptography and Security, Quantitative Finance - Computational Finance, Statistics - Computation
Abstract: Blockchain technology is essential for the digital economy and metaverse, supporting applications from decentralized finance to virtual assets. However, its potential is constrained by the "Blockchain Trilemma," which necessitates balancing decentralization, security, and scalability. This study evaluates and compares two leading proof-of-stake (PoS) systems, Algorand and Ethereum 2.0, against these critical metrics. Our research interprets existing indices to measure decentralization, evaluates scalability through transactional data, and assesses security by identifying potential vulnerabilities. Utilizing real-world data, we analyze each platform's strategies in a structured manner to understand their effectiveness in addressing trilemma challenges. The findings highlight each platform's strengths and propose general methodologies for evaluating key blockchain characteristics applicable to other systems. This research advances the understanding of blockchain technologies and their implications for the future digital economy. Data and code are available on GitHub as open source.
Published: 2024

12. Variational Randomized Smoothing for Sample-Wise Adversarial Robustness

Author: Hase, Ryo, Wang, Ye, Koike-Akino, Toshiaki, Liu, Jing, and Parsons, Kieran
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: Randomized smoothing is a defensive technique to achieve enhanced robustness against adversarial examples which are small input perturbations that degrade the performance of neural network models. Conventional randomized smoothing adds random noise with a fixed noise level for every input sample to smooth out adversarial perturbations. This paper proposes a new variational framework that uses a per-sample noise level suitable for each input by introducing a noise level selector. Our experimental results demonstrate enhancement of empirical robustness against adversarial attacks. We also provide and analyze the certified robustness for our sample-wise smoothing method.
Comment: 20 pages, under preparation
Published: 2024

13. Random Channel Ablation for Robust Hand Gesture Classification with Multimodal Biosignals

Author: Bimbraw, Keshav, Liu, Jing, Wang, Ye, and Koike-Akino, Toshiaki
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Biosignal-based hand gesture classification is an important component of effective human-machine interaction. For multimodal biosignal sensing, the modalities often face data loss due to missing channels in the data which can adversely affect the gesture classification performance. To make the classifiers robust to missing channels in the data, this paper proposes using Random Channel Ablation (RChA) during the training process. Ultrasound and force myography (FMG) data were acquired from the forearm for 12 hand gestures over 2 subjects. The resulting multimodal data had 16 total channels, 8 for each modality. The proposed method was applied to convolutional neural network architecture, and compared with baseline, imputation, and oracle methods. Using 5-fold cross-validation for the two subjects, on average, 12.2% and 24.5% improvement was observed for gesture classification with up to 4 and 8 missing channels respectively compared to the baseline. Notably, the proposed method is also robust to an increase in the number of missing channels compared to other methods. These results show the efficacy of using random channel ablation to improve classifier robustness for multimodal and multi-channel biosignal-based hand gesture classification.
Comment: 5 pages, 4 figures
Published: 2024

14. GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

Author: Bimbraw, Keshav, Wang, Ye, Liu, Jing, and Koike-Akino, Toshiaki
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors. Although such foundation models perform well in a wide range of general tasks, their capability without fine-tuning is often limited in specialized tasks. However, full fine-tuning of large foundation models is challenging due to enormous computation/memory/dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and improves with few-shot, in-context learning.
Comment: 8 pages, 9 figures
Published: 2024

15. KUNPENG: An Embodied Large Model for Intelligent Maritime

Author: Wang, Naiyao, Jiang, Tongbang, Wang, Ye, Qiu, Shaoyang, Zhang, Bo, Xie, Xinqiang, Li, Munan, Wang, Chunliu, Wang, Yiyang, Ren, Hongxiang, Wang, Ruili, Shan, Hongjun, and Liu, Hongbo
Subjects: Computer Science - Artificial Intelligence
Abstract: Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods, which covers multiple aspects such as smart vessels, route optimization, safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic maritime environment, along with diverse and heterogeneous large-scale data sources, present challenges for real-time decision-making in intelligent maritime. In this paper, we propose KUNPENG, the first-ever embodied large model for intelligent maritime in the smart ocean construction, which consists of six systems. The model perceives multi-source heterogeneous data for the cognition of environmental interaction and makes autonomous decision strategies, which are used for intelligent vessels to perform navigation behaviors under safety and emergency guarantees and continuously optimize power to achieve embodied intelligence in maritime. In comprehensive maritime task evaluations, KUNPENG has demonstrated excellent performance.
Comment: 9 pages, 3 figures
Published: 2024

16. Planning with Large Language Models for Conversational Agents

Author: Li, Zhigen, Peng, Jianxiang, Wang, Yanmeng, Shen, Tianhao, Zhang, Minghui, Su, Linxi, Wu, Shang, Wu, Yihang, Wang, Yuqian, Wang, Ye, Hu, Wei, Li, Jianfeng, Wang, Shaojun, Xiao, Jing, and Xiong, Deyi
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Controllability and proactivity are crucial properties of autonomous conversational agents (CAs). Controllability requires the CAs to follow the standard operating procedures (SOPs), such as verifying identity before activating credit cards. Proactivity requires the CAs to guide the conversation towards the goal during user uncooperation, such as persuasive dialogue. Existing research cannot be unified with controllability, proactivity, and low manual annotation. To bridge this gap, we propose a new framework for planning-based conversational agents (PCA) powered by large language models (LLMs), which only requires humans to define tasks and goals for the LLMs. Before the conversation, the LLM plans the core and necessary SOP for dialogue offline. During the conversation, the LLM plans the best action path online referring to the SOP, and generates responses to achieve process controllability. Subsequently, we propose a semi-automatic dialogue data creation framework and curate a high-quality dialogue dataset (PCA-D). Meanwhile, we develop multiple variants and evaluation metrics for PCA, e.g., planning with Monte Carlo Tree Search (PCA-M), which searches for the optimal dialogue action while satisfying SOP constraints and achieving dialogue proactivity. Experiment results show that LLMs finetuned on PCA-D can significantly improve the performance and generalize to unseen domains. PCA-M outperforms other CoT and ToT baselines in terms of conversation controllability, proactivity, task success rate, and overall logical coherence, and is applicable in industry dialogue scenarios. The dataset and codes are available at XXXX.
Published: 2024

17. QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

Author: Wang, Ye, Mei, Yuting, Zheng, Sipeng, and Jin, Qin
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-making; ii) mastering agile control of locomotion and path planning; iii) developing advanced cognition to execute long-term objectives. QuadrupedGPT processes human command and environmental contexts using a large multimodal model (LMM). Empowered by its extensive knowledge base, our agent autonomously assigns appropriate parameters for adaptive locomotion policies and guides the agent in planning a safe but efficient path towards the goal, utilizing semantic-aware terrain analysis. Moreover, QuadrupedGPT is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals through high-level reasoning. Extensive experiments across various benchmarks confirm that QuadrupedGPT can adeptly handle multiple tasks with intricate instructions, demonstrating a significant step towards the versatile quadruped agents in open-ended worlds. Our website and codes can be found at https://quadruped-hub.github.io/Quadruped-GPT/.
Comment: Under review
Published: 2024

18. Efficient Low-rank Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization

Author: Wang, Hao, Wang, Ye, and Yang, Xiangyu
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: This paper considers the problem of minimizing the sum of a smooth function and the Schatten-$p$ norm of the matrix. Our contribution involves proposing accelerated iteratively reweighted nuclear norm methods designed for solving the nonconvex low-rank minimization problem. Two major novelties characterize our approach. Firstly, the proposed method possesses a rank identification property, enabling the provable identification of the "correct" rank of the stationary point within a finite number of iterations. Secondly, we introduce an adaptive updating strategy for smoothing parameters. This strategy automatically fixes parameters associated with zero singular values as constants upon detecting the "correct" rank while quickly driving the rest of the parameters to zero. This adaptive behavior transforms the algorithm into one that effectively solves smooth problems after a few iterations, setting our work apart from existing iteratively reweighted methods for low-rank optimization. We prove the global convergence of the proposed algorithm, guaranteeing that every limit point of the iterates is a critical point. Furthermore, a local convergence rate analysis is provided under the Kurdyka-Łojasiewicz property. We conduct numerical experiments using both synthetic and real data to showcase our algorithm's efficiency and superiority over existing methods.
Comment: Copyright may be transferred without notice, after which this version may no longer be accessible
Published: 2024

19. EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

Author: Wang, Ye, Xun, Jiahao, Hong, Minjie, Zhu, Jieming, Jin, Tao, Lin, Wang, Li, Haoyuan, Li, Linjun, Xia, Yan, Zhao, Zhou, and Dong, Zhenhua
Subjects: Computer Science - Information Retrieval
Abstract: Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this limitation, we introduce EAGER, a novel generative recommendation framework that seamlessly integrates both behavioral and semantic information. Specifically, we identify three key challenges in combining these two types of information: a unified generative architecture capable of handling two feature types, ensuring sufficient and independent learning for each type, and fostering subtle interactions that enhance collaborative information utilization. To achieve these goals, we propose (1) a two-stream generation architecture leveraging a shared encoder and two separate decoders to decode behavior tokens and semantic tokens with a confidence-based ranking strategy; (2) a global contrastive task with summary tokens to achieve discriminative decoding for each type of information; and (3) a semantic-guided transfer task designed to implicitly promote cross-interactions through reconstruction and estimation objectives. We validate the effectiveness of EAGER on four public benchmarks, demonstrating its superior performance compared to existing methods.
Comment: Accepted by KDD 2024. Code available at https://reczoo.github.io/EAGER
Published: 2024

20. Joint Observer Gain and Input Design for Asymptotic Active Fault Diagnosis

Author: Xu, Feng, Wan, Yiming, Wang, Ye, and Puig, Vicenc
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: This paper proposes a joint gain and input design method for observer-based asymptotic active fault diagnosis, which is based on a newly-defined notion named the excluding degree of the origin from a zonotope. Using the excluding degree, a quantitative specification is obtained to characterize the performance of set-based robust fault diagnosis. Furthermore, a single gain design method and a joint gain and input design method are proposed, respectively. This is the first work to achieve a joint observer gain and input design for set-based active fault diagnosis. Compared with the existing methods that design gains and input separately, the proposed joint gain and input design method has advantages to exploit the fault diagnosis potential of observer-based schemes. Finally, several examples are used to illustrate the effectiveness of the proposed methods.
Published: 2024

21. Efficient Differentially Private Fine-Tuning of Diffusion Models

Author: Liu, Jing, Lowy, Andrew, Koike-Akino, Toshiaki, Parsons, Kieran, and Wang, Ye
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: The recent developments of Diffusion Models (DMs) enable generation of astonishingly high-quality synthetic samples. Recent work showed that the synthetic samples generated by the diffusion model, which is pre-trained on public data and fully fine-tuned with differential privacy on private data, can train a downstream classifier, while achieving a good privacy-utility tradeoff. However, fully fine-tuning such large diffusion models with DP-SGD can be very resource-demanding in terms of memory usage and computation. In this work, we investigate Parameter-Efficient Fine-Tuning (PEFT) of diffusion models using Low-Dimensional Adaptation (LoDA) with Differential Privacy. We evaluate the proposed method with the MNIST and CIFAR-10 datasets and demonstrate that such efficient fine-tuning can also generate useful synthetic samples for training downstream classifiers, with guaranteed privacy protection of fine-tuning data. Our source code will be made available on GitHub.
Published: 2024

22. StreamOptix: A Cross-layer Adaptive Video Delivery Scheme

Author: Liu, Mufan, Yang, Le, Wang, Yifan, Xu, Yiling, Wang, Ye-Kui, and Guan, Yunfeng
Subjects: Computer Science - Multimedia
Abstract: This paper presents a cross-layer video delivery scheme, StreamOptix, and proposes a joint optimization algorithm for video delivery that leverages the characteristics of the physical (PHY), medium access control (MAC), and application (APP) layers. Most existing methods for optimizing video transmission over different layers were developed individually. Realizing a cross-layer design has always been a significant challenge, mainly due to the complex interactions and mismatches in timescales between layers, as well as the presence of distinct objectives in different layers. To address these complications, we take a divide-and-conquer approach and break down the formulated cross-layer optimization problem for video delivery into three sub-problems. We then propose a three-stage closed-loop optimization framework, which consists of 1) an adaptive bitrate (ABR) strategy based on the link capacity information from PHY, 2) a video-aware resource allocation scheme accounting for the APP bitrate constraint, and 3) a link adaptation technique utilizing the soft acknowledgment feedback (soft-ACK). The proposed framework also supports the collections of the distorted bitstreams transmitted across the link. This allows a more reasonable assessment of video quality compared to many existing ABR methods that simply neglect the distortions occurring in the PHY layer. Experiments conducted under various network settings demonstrate the effectiveness and superiority of the new cross-layer optimization strategy. A byproduct of this study is the development of more comprehensive performance metrics on video delivery, which lays down the foundation for extending our system to multimodal communications in the future. Code for reproducing the experimental results is available at https://github.com/Evan-sudo/StreamOptix.
Comment: Under review in Transactions on Multimedia (TMM)
Published: 2024

23. Inspired by AI? A Novel Generative AI System To Assist Conceptual Automotive Design

Author: Wang, Ye, Damen, Nicole B., Gale, Thomas, Seo, Voho, and Shayani, Hooman
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence
Abstract: Design inspiration is crucial for establishing the direction of a design as well as evoking feelings and conveying meanings during the conceptual design process. Many practice designers use text-based searches on platforms like Pinterest to gather image ideas, followed by sketching on paper or using digital tools to develop concepts. Emerging generative AI techniques, such as diffusion models, offer a promising avenue to streamline these processes by swiftly generating design concepts based on text and image inspiration inputs, subsequently using the AI generated design concepts as fresh sources of inspiration for further concept development. However, applying these generative AI techniques directly within a design context has challenges. Firstly, generative AI tools may exhibit a bias towards particular styles, resulting in a lack of diversity of design outputs. Secondly, these tools may struggle to grasp the nuanced meanings of texts or images in a design context. Lastly, the lack of integration with established design processes within design teams can result in fragmented use scenarios. Focusing on these challenges, we conducted workshops, surveys, and data augmentation involving teams of experienced automotive designers to investigate their current practices in generating concepts inspired by texts and images, as well as their preferred interaction modes for generative AI systems to support the concept generation workflow. Finally, we developed a novel generative AI system based on diffusion models to assist conceptual automotive design.
Published: 2024

24. Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis

Author: Wang, Xintong, Shi, Mingqian, and Wang, Ye
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Mispronunciation Detection and Diagnosis (MDD) systems, leveraging Automatic Speech Recognition (ASR), face two main challenges in Mandarin Chinese: 1) The two-stage models create an information gap between the phoneme or tone classification stage and the MDD stage. 2) The scarcity of Mandarin MDD datasets limits model training. In this paper, we introduce a stateless RNN-T model for Mandarin MDD, utilizing HuBERT features with pitch embedding through a Pitch Fusion Block. Our model, trained solely on native speaker data, shows a 3% improvement in Phone Error Rate and a 7% increase in False Acceptance Rate over the state-of-the-art baseline in non-native scenarios., Comment: Accepted at Interspeech 2024
Published: 2024

25. A Unified Temporal Knowledge Graph Reasoning Model Towards Interpolation and Extrapolation

Author: Chen, Kai, Wang, Ye, Li, Yitong, Li, Aiping, Yu, Han, and Song, Xin
Subjects: Computer Science - Artificial Intelligence
Abstract: Temporal knowledge graph (TKG) reasoning has two settings: interpolation reasoning and extrapolation reasoning. Both of them draw plenty of research interest and have great significance. Methods of the former de-emphasize the temporal correlations among facts sequences, while methods of the latter require strict chronological order of knowledge and ignore inferring clues provided by missing facts of the past. These limit the practicability of TKG applications as almost all of the existing TKG reasoning methods are designed specifically to address either one setting. To this end, this paper proposes an original Temporal PAth-based Reasoning (TPAR) model for both the interpolation and extrapolation reasoning. TPAR performs a neural-driven symbolic reasoning fashion that is robust to ambiguous and noisy temporal data and with fine interpretability as well. Comprehensive experiments show that TPAR outperforms SOTA methods on the link prediction task for both the interpolation and the extrapolation settings. A novel pipeline experimental setting is designed to evaluate the performances of SOTA combinations and the proposed TPAR towards interpolation and extrapolation reasoning. More diverse experiments are conducted to show the robustness and interpretability of TPAR., Comment: To appear in ACL 2024 main conference
Published: 2024

26. Dishonest Approximate Computing: A Coming Crisis for Cloud Clients

Author: Wang, Ye, Dong, Jian, Han, Ming, Wu, Jin, and Qu, Gang
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: Approximate Computing (AC) has emerged as a promising technique for achieving energy-efficient architectures and is expected to become an effective technique for reducing the electricity cost for cloud service providers (CSP). However, the potential misuse of AC has not received adequate attention, which is a coming crisis behind the blueprint of AC. Driven by the pursuit of illegal financial profits, untrusted CSPs may deploy low-cost AC devices and deceive clients by presenting AC services as promised accurate computing products, while falsely claiming AC outputs as accurate results. This misuse of AC will cause both financial loss and computing degradation to cloud clients. In this paper, we define this malicious attack as DisHonest Approximate Computing (DHAC) and analyze the technical challenges faced by clients in detecting such attacks. To address this issue, we propose two golden model free detection methods: Residual Class Check (RCC) and Forward-Backward Check (FBC). RCC provides clients a low-cost approach to infer the residual class to which a legitimate accurate output should belong. By comparing the residual class of the returned result, clients can determine whether a computing service contains any AC elements. FBC detects potential DHAC by computing an invertible check branch using the intermediate values of the program. It compares the values before entering and after returning from the check branch to identify any discrepancies. Both RCC and FBC can be executed concurrently with real computing tasks, enabling real-time DHAC detection with current inputs. Our experimental results show that both RCC and FBC can detect over 96%-99% of DHAC cases without misjudging any legitimate accurate results., Comment: 12 pages, 9 figures
Published: 2024

27. End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding

Author: Zeng, Wei, He, Xian, and Wang, Ye
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Piano audio-to-score transcription (A2S) is an important yet underexplored task with extensive applications for music composition, practice, and analysis. However, existing end-to-end piano A2S systems faced difficulties in retrieving bar-level information such as key and time signatures, and have been trained and evaluated with only synthetic data. To address these limitations, we propose a sequence-to-sequence (Seq2Seq) model with a hierarchical decoder that aligns with the hierarchical structure of musical scores, enabling the transcription of score information at both the bar and note levels by multi-task learning. To bridge the gap between synthetic data and recordings of human performance, we propose a two-stage training scheme, which involves pre-training the model using an expressive performance rendering (EPR) system on synthetic audio, followed by fine-tuning the model using recordings of human performance. To preserve the voicing structure for score reconstruction, we propose a pre-processing method for **Kern scores in scenarios with an unconstrained number of voices. Experimental results support the effectiveness of our proposed approaches, in terms of both transcription performance on synthetic audio data in comparison to the current state-of-the-art, and the first experiment on human recordings., Comment: 8 pages, 5 figures, accepted by IJCAI 2024 - AI, Arts & Creativity Track
Published: 2024

28. ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

Author: Bou, Albert, Thomas, Morgan, Dittert, Sebastian, Ramírez, Carles Navarro, Majewski, Maciej, Wang, Ye, Patel, Shivam, Tresadern, Gary, Ahmad, Mazen, Moens, Vincent, Sherman, Woody, Sciabola, Simone, and De Fabritiis, Gianni
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Biomolecules
Abstract: In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at https://github.com/acellera/acegen-open and available for use under the MIT license.
Published: 2024

29. The Impact of Language Input on Deaf and Hard of Hearing Preschool Children Who Use Listening and Spoken Language

Author: Rufsvold, Ronda, Wang, Ye, Hartman, Maria C., Arora, Sonia B., and Smolen, Elaine R.
Published: 2018
Full Text: View/download PDF

30. A Grounded Theory of Effective Reading by Profoundly Deaf Adults

Author: Silvestri, Julia A. and Wang, Ye
Published: 2018
Full Text: View/download PDF

31. Selected Factors in Reading Comprehension for Deaf and Hearing Adults: Phonological Skills and Metacognition

Author: Wang, Ye, Silvestri, Julia A., and Jahromi, Laudan B.
Published: 2018
Full Text: View/download PDF

32. TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Author: Ni, Haomiao, Egger, Bernhard, Lohit, Suhas, Cherian, Anoop, Wang, Ye, Koike-Akino, Toshiaki, Huang, Sharon X., and Marks, Tim K.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free method that empowers a pretrained text-to-video (T2V) diffusion model to be conditioned on a provided image, enabling TI2V generation without any optimization, fine-tuning, or introducing external modules. Our approach leverages a pretrained T2V diffusion foundation model as the generative prior. To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image. To ensure temporal continuity, we employ a DDPM inversion strategy to initialize Gaussian noise for each newly synthesized frame and a resampling technique to help preserve visual details. We conduct comprehensive experiments on both domain-specific and open-domain datasets, where TI2V-Zero consistently outperforms a recent open-domain TI2V model. Furthermore, we show that TI2V-Zero can seamlessly extend to other tasks such as video infilling and prediction when provided with more images. Its autoregressive design also supports long video generation., Comment: CVPR 2024
Published: 2024

33. Prompt-tuning for Clickbait Detection via Text Summarization

Author: Deng, Haoxiang, Zhu, Yi, Wang, Ye, Qiang, Jipeng, Yuan, Yunhao, Li, Yun, and Zhang, Runmei
Subjects: Computer Science - Computation and Language
Abstract: Clickbaits are surprising social posts or deceptive news headlines that attempt to lure users for more clicks, and they have been posted at unprecedented rates for more profit or commercial revenue. The spread of clickbait has significant negative impacts on users, exposing them to misleading content or even click-jacking attacks. Different from fake news, the crucial problem in clickbait detection is determining whether the headline matches the corresponding content. Most existing methods compute the semantic similarity between the headlines and contents for detecting clickbait. However, due to significant differences in length and semantic features between headlines and contents, directly calculating semantic similarity often fails to capture the relationship between them. To address this problem, we propose a prompt-tuning method for clickbait detection via text summarization. In this paper, text summarization is introduced to summarize the contents, and clickbait detection is performed based on the similarity between the generated summary and the contents. Specifically, we first introduce a two-stage text summarization model to produce high-quality news summaries based on pre-trained language models, and then both the headlines and newly generated summaries are incorporated as the inputs for prompt-tuning. Additionally, a variety of strategies are employed to incorporate external knowledge for improving the performance of clickbait detection. Extensive experiments on well-known clickbait detection datasets demonstrate that our method achieves state-of-the-art performance.
Published: 2024

34. EVAN: Evolutional Video Streaming Adaptation via Neural Representation

Author: Liu, Mufan, Yang, Le, Xu, Yiling, Wang, Ye-kui, and Hwang, Jenq-Neng
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Adaptive bitrate (ABR) using conventional codecs cannot further modify the bitrate once a decision has been made, exhibiting limited adaptation capability. This may result in either overly conservative or overly aggressive bitrate selection, which could cause either inefficient utilization of the network bandwidth or frequent re-buffering, respectively. Neural representation for video (NeRV), which embeds the video content into neural network weights, allows video reconstruction with incomplete models. Specifically, the recovery of one frame can be achieved without relying on the decoding of adjacent frames. NeRV has the potential to provide high video reconstruction quality and, more importantly, pave the way for developing more flexible ABR strategies for video transmission. In this work, a new framework, named Evolutional Video streaming Adaptation via Neural representation (EVAN), which can adaptively transmit NeRV models based on soft actor-critic (SAC) reinforcement learning, is proposed. EVAN is trained with a more exploitative strategy and utilizes progressive playback to avoid re-buffering. Experiments showed that EVAN can outperform existing ABRs with 50% reduction in re-buffering and achieve nearly 20% ., Comment: accepted by ICME (conference)
Published: 2024

35. Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery

Author: Wang, Ye, Wang, Yaxiong, Wu, Yujiao, Zhao, Bingchen, and Qian, Xueming
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data, where the unlabelled data may come from known or novel classes. The prevailing approach generally involves clustering across all data and learning conceptions by prototypical contrastive learning. However, existing methods largely hinge on the performance of clustering algorithms and are thus subject to their inherent limitations. Firstly, the estimated cluster number is often smaller than the ground truth, making the existing methods suffer from the lack of prototypes for comprehensive conception learning. To address this issue, we propose an adaptive probing mechanism that introduces learnable potential prototypes to expand cluster prototypes (centers). As there is no ground truth for the potential prototype, we develop a self-supervised prototype learning framework to optimize the potential prototype in an end-to-end fashion. Secondly, clustering is computationally intensive, and the conventional strategy of clustering both labelled and unlabelled instances exacerbates this issue. To counteract this inefficiency, we opt to cluster only the unlabelled instances and subsequently expand the cluster prototypes with our introduced potential prototypes to fast explore novel classes. Despite the simplicity of our proposed method, extensive empirical analysis on a wide range of datasets confirms that our method consistently delivers state-of-the-art results. Specifically, our method surpasses the nearest competitor by a significant margin of 9.7% within the Stanford Cars dataset and 12x clustering efficiency within the Herbarium 19 dataset. We will make the code and checkpoints publicly available at https://github.com/xjtuYW/PNP.git., Comment: 9 pages, 7 figures
Published: 2024

36. Chiral two-dimensional MoS2 by molecular functionalization as ultra-sensitive detectors for circularly polarized light

Author: Wang, Ye, Zhu, Yiru, Yan, Han, Li, Yang, Wang, Yan, and Chhowalla, Manish
Subjects: Condensed Matter - Materials Science
Abstract: Inducing chirality in optically and electronically active materials is interesting for applications in sensing and quantum information transmission. Two-dimensional (2D) transition metal chalcogenides (TMDs) possess excellent electronic and optical properties but are achiral. Here we demonstrate chirality induction in atomically thin layers of 2D MoS2 by functionalization with chiral thiol molecules. Analysis of X-ray absorption near-edge structure and Raman optical activity with circularly polarized excitation suggests chemical and electronic interactions that lead to chirality transfer from the molecules to the MoS2. We confirm chirality induction in 2D MoS2 with circular dichroism measurements that show absorption bands at wavelengths of 380-520 nm and 520-600 nm with giant molar ellipticity of 10^8 deg cm2/dmol, 2-3 orders of magnitude higher than 3D chiral materials. Phototransistors fabricated from atomically thin chiral MoS2 for detection of circularly polarized light exhibit responsivity of >10^2 A/W and maximum anisotropy g-factor of 1.98, close to the theoretical maximum of 2.0, which indicates that the chiral states of photons are fully distinguishable by the photodetectors. Our results demonstrate that it is possible to achieve chirality induction in monolayer MoS2 by molecular functionalization and realise ultra-sensitive detectors for circularly polarized photons.
Published: 2024

37. Contouring Error Bounded Control for Biaxial Switched Linear Systems

Author: Yuan, Meng, Wang, Ye, Manzie, Chris, Xu, Zhezhuang, and Chai, Tianyou
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: Biaxial motion control systems are used extensively in manufacturing and printing industries. To improve throughput and reduce machine cost, lightweight materials are being proposed in structural components but may result in higher flexibility in the machine links. This flexibility is often position dependent and compromises precision of the end effector of the machine. To address the need for improved contouring accuracy in industrial machines with position-dependent structural flexibility, this paper introduces a novel contouring error-bounded control algorithm for biaxial switched linear systems. The proposed algorithm utilizes model predictive control to guarantee the satisfaction of state, input, and contouring error constraints for any admissible mode switching. In this paper, the switching signal remains unknown to the controller, although information about the minimum time the system is expected to stay in a specific mode is considered to be available. The proposed algorithm has the property of recursive feasibility and ensures the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated by applying it to a high-fidelity simulation of a dual-drive industrial laser machine. The results show that the contouring error is successfully bounded within the given tolerance.
Published: 2024

38. Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation

Author: Ataei, Mohammadmehdi, Cheong, Hyunmin, Grandi, Daniele, Wang, Ye, Morris, Nigel, and Tessier, Alexander
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: Requirements elicitation, a critical, yet time-consuming and challenging step in product development, often fails to capture the full spectrum of user needs. This may lead to products that fall short of expectations. This paper introduces a novel framework that leverages Large Language Models (LLMs) to automate and enhance the requirements elicitation process. LLMs are used to generate a vast array of simulated users (LLM agents), enabling the exploration of a much broader range of user needs and unforeseen use cases. These agents engage in product experience scenarios, through explaining their actions, observations, and challenges. Subsequent agent interviews and analysis uncover valuable user needs, including latent ones. We validate our framework with three experiments. First, we explore different methodologies for diverse agent generation, discussing their advantages and shortcomings. We measure the diversity of identified user needs and demonstrate that context-aware agent generation leads to greater diversity. Second, we show how our framework effectively mimics empathic lead user interviews, identifying a greater number of latent needs than conventional human interviews. Third, we showcase that LLMs can be used to analyze interviews, capture needs, and classify them as latent or not. Our work highlights the potential of using LLM agents to accelerate early-stage product development, reduce costs, and increase innovation.
Published: 2024

39. SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules

Author: Chen, Xiangyu, Liu, Jing, Wang, Ye, Wang, Pu Perry, Brand, Matthew, Wang, Guanghui, and Koike-Akino, Toshiaki
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers high flexibility compared with other LoRA variants and demonstrates superior performance for transfer learning tasks especially in the extremely few-parameter regimes., Comment: 33 pages, 29 figures
Published: 2024

40. Probabilistic reachable sets of stochastic nonlinear systems with contextual uncertainties

Author: Shen, Xun, Wang, Ye, Hashimoto, Kazumune, Wu, Yuhu, and Gros, Sebastien
Subjects: Electrical Engineering and Systems Science - Systems and Control, Mathematics - Dynamical Systems, Mathematics - Optimization and Control
Abstract: Validating and controlling safety-critical systems in uncertain environments necessitates probabilistic reachable sets of future state evolutions. The existing methods of computing probabilistic reachable sets normally assume that the uncertainties are independent of the state. However, this assumption falls short in many real-world applications, where uncertainties are state-dependent, referred to as contextual uncertainties. This paper formulates the problem of computing probabilistic reachable sets of stochastic nonlinear states with contextual uncertainties by seeking minimum-volume polynomial sublevel sets with contextual chance constraints. The formulated problem cannot be solved by the existing sample-based approximation method since the existing methods do not consider the conditional probability densities. To address this, we propose a consistent sample approximation of the original problem by leveraging the conditional density estimation and resampling. The obtained approximate problem is a tractable optimization problem. Additionally, we prove the almost uniform convergence of the proposed sample-based approximation, showing that it gives the optimal solution almost consistently with the original ones. Through a numerical example, we evaluate the effectiveness of the proposed method against existing approaches, highlighting its capability to significantly reduce the bias inherent in sample-based approximation without considering a conditional probability density.
Published: 2024

41. OSTAF: A One-Shot Tuning Method for Improved Attribute-Focused T2I Personalization

Author: Wang, Ye, Yi, Zili, and Ma, Rui
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Personalized text-to-image (T2I) models not only produce lifelike and varied visuals but also allow users to tailor the images to fit their personal taste. These personalization techniques can grasp the essence of a concept through a collection of images, or adjust a pre-trained text-to-image model with a specific image input for subject-driven or attribute-aware guidance. Yet, accurately capturing the distinct visual attributes of an individual image poses a challenge for these methods. To address this issue, we introduce OSTAF, a novel parameter-efficient one-shot fine-tuning method which only utilizes one reference image for T2I personalization. A novel hypernetwork-powered attribute-focused fine-tuning mechanism is employed to achieve the precise learning of various attribute features (e.g., appearance, shape or drawing style) from the reference image. Compared to existing image customization methods, our method shows significant superiority in attribute identification and application, and achieves a good balance between efficiency and output quality.
Published: 2024

42. AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs

Author: Ahmed, Md Rubel, Koike-Akino, Toshiaki, Parsons, Kieran, and Wang, Ye
Subjects: Computer Science - Hardware Architecture, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: High-level synthesis (HLS) is a design flow that leverages modern language features and flexibility, such as complex data structures, inheritance, templates, etc., to prototype hardware designs rapidly. However, exploring various design space parameters can take much time and effort for hardware engineers to meet specific design specifications. This paper proposes a novel framework called AutoHLS, which integrates a deep neural network (DNN) with Bayesian optimization (BO) to accelerate HLS hardware design optimization. Our tool focuses on HLS pragma exploration and operation transformation. It utilizes integrated DNNs to predict synthesizability within a given FPGA resource budget. We also investigate the potential of emerging quantum neural networks (QNNs) instead of classical DNNs for the AutoHLS pipeline. Our experimental results demonstrate up to a 70-fold speedup in exploration time., Comment: 5 pages, 6 figures, MWSCAS 2023
Published: 2024

43. DA-PFL: Dynamic Affinity Aggregation for Personalized Federated Learning

Author: Yang, Xu, Feng, Jiyuan, Guo, Songyue, Wang, Ye, Ding, Ye, Fang, Binxing, and Liao, Qing
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Personalized federated learning, which learns a personalized model for each client, has become a hot research topic. Existing personalized federated learning models prefer to aggregate clients with similar data distributions to improve the performance of learning models. However, similarity-based personalized federated learning methods may exacerbate the class imbalance problem. In this paper, we propose a novel Dynamic Affinity-based Personalized Federated Learning model (DA-PFL) to alleviate the class imbalance problem during federated learning. Specifically, we build an affinity metric from a complementary perspective to guide which clients should be aggregated. Then we design a dynamic aggregation strategy to dynamically aggregate clients based on the affinity metric in each round to reduce the class imbalance risk. Extensive experiments show that the proposed DA-PFL model can significantly improve the accuracy of each client on three real-world datasets compared with state-of-the-art methods.
Published: 2024

44. UniCode: Learning a Unified Codebook for Multimodal Large Language Models

Author: Zheng, Sipeng, Zhou, Bohan, Feng, Yicheng, Wang, Ye, and Lu, Zongqing
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: In this paper, we propose UniCode, a novel approach within the domain of multimodal large language models (MLLMs) that learns a unified codebook to efficiently tokenize visual, text, and potentially other types of signals. This innovation addresses a critical limitation in existing MLLMs: their reliance on a text-only codebook, which restricts MLLM's ability to generate images and texts in a multimodal context. Towards this end, we propose a language-driven iterative training paradigm, coupled with an in-context pre-training task we term "image decompression", enabling our model to interpret compressed visual data and generate high-quality images. The unified codebook empowers our model to extend visual instruction tuning to non-linguistic generation tasks. Moreover, UniCode is adaptable to diverse stacked quantization approaches in order to compress visual signals into a more compact token representation. Despite using significantly fewer parameters and less data during training, UniCode demonstrates promising capabilities in visual reconstruction and generation. It also achieves performances comparable to leading MLLMs across a spectrum of VQA benchmarks., Comment: 14 pages, 2 figures, 11 tables
Published: 2024

45. Quasi-spherical metrics and the static Minkowski inequality

Author: Harvie, Brian and Wang, Ye-Kai
Subjects: Mathematics - Differential Geometry, General Relativity and Quantum Cosmology, Mathematics - Analysis of PDEs
Abstract: We prove that equality in the Minkowski inequality for asymptotically flat static manifolds is achieved only by slices of Schwarzschild space. To show this, we establish uniqueness of *quasi-spherical* static metrics: rotationally symmetric regions of Schwarzschild are the only static vacuum metrics which are quasi-spherical with vanishing shear vector. In addition, we observe that the static Minkowski inequality extends to all dimensions for a connected boundary and to asymptotically flat static manifolds of any positive decay order. Altogether, this yields a robust rigidity criterion for Schwarzschild space. Using this criterion, we recover Israel's static black hole uniqueness theorem under this mild decay assumption. Likewise, the uniqueness theorems for photon surfaces and static metric extensions from the prequel extend to all dimensions under these weaker asymptotics. Finally, as a notable by-product of our analysis, we establish regularity of weak inverse mean curvature flow in asymptotically flat manifolds -- that is, a weak IMCF is eventually smooth in an arbitrary asymptotically flat background., Comment: 57 pages. The main rigidity theorem (theorem 1.3) has been strengthened -- the mean-convex boundary assumption may be removed if the manifold is asymptotically flat of order \tau >0 (see Section 2), which yields the complete rigidity statement for these manifolds. Because of the upgraded rigidity statement, we also included a new proof of static black hole uniqueness in section 8
Published: 2024

46. On some path-critical Ramsey numbers

Author: Wang, Ye and Song, Yanyan
Subjects: Mathematics - Combinatorics
Abstract: For graphs $G$ and $H$, the Ramsey number $R(G,H)$ is the smallest $r$ such that any red-blue edge coloring of $K_r$ contains a red $G$ or a blue $H$. The path-critical Ramsey number $R_{\pi}(G,H)$ is the largest $n$ such that any red-blue edge coloring of $K_r \setminus P_{n}$ contains a red $G$ or a blue $H$, where $r=R(G,H)$ and $P_{n}$ is a path of order $n$. In this note, we show a general upper bound for $R_{\pi}(G,H)$, and determine the exact values for some cases of $R_{\pi}(G,H)$.
Published: 2024

47. Can Large Language Models Recall Reference Location Like Humans?

Author: Wang, Ye, Xu, Xinrun, Xie, Rui, Hu, Wenxin, and Ye, Wei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: When completing knowledge-intensive tasks, humans sometimes need not just an answer but also a corresponding reference passage for auxiliary reading. Previous methods required obtaining pre-segmented article chunks through additional retrieval models. This paper explores leveraging the parameterized knowledge stored during the pre-training phase of large language models (LLMs) to independently recall reference passages from any starting position. We propose a two-stage framework that simulates the scenario of humans recalling easily forgotten references. Initially, the LLM is prompted to recall document title identifiers to obtain a coarse-grained document set. Then, based on the acquired coarse-grained document set, it recalls fine-grained passages. In the two-stage recall process, we use constrained decoding to ensure that content outside of the stored documents is not generated. To increase speed, we only recall a short prefix in the second stage, then locate its position to retrieve a complete passage. Experiments on KILT knowledge-sensitive tasks have verified that LLMs can independently recall reference passage locations in various task forms, and the obtained references significantly assist downstream tasks.
Published: 2024

48. Bidirectional Uncertainty-Based Active Learning for Open Set Annotation

Author: Zong, Chen-Chen, Wang, Ye-Wen, Ning, Kun-Peng, Ye, Hai-Bo, and Huang, Sheng-Jun
Subjects: Computer Science - Machine Learning
Abstract: Active learning (AL) in open set scenarios presents a novel challenge of identifying the most valuable examples in an unlabeled data pool that comprises data from both known and unknown classes. Traditional methods prioritize selecting informative examples with low confidence, with the risk of mistakenly selecting unknown-class examples with similarly low confidence. Recent methods favor the most probable known-class examples, with the risk of picking simple already mastered examples. In this paper, we attempt to query examples that are both likely from known classes and highly informative, and propose a Bidirectional Uncertainty-based Active Learning (BUAL) framework. Specifically, we achieve this by first pushing the unknown class examples toward regions with high-confidence predictions, i.e., the proposed Random Label Negative Learning method. Then, we propose a Bidirectional Uncertainty sampling strategy by jointly estimating uncertainty posed by both positive and negative learning to perform consistent and stable sampling. BUAL successfully extends existing uncertainty-based AL methods to complex open-set scenarios. Extensive experiments on multiple datasets with varying openness demonstrate that BUAL achieves state-of-the-art performance. The code is available at https://github.com/chenchenzong/BUAL.
Comment: Accepted to ECCV 2024
Published: 2024

49. TaylorGrid: Towards Fast and High-Quality Implicit Field Learning via Direct Taylor-based Grid Optimization

Author: Mao, Renyi, Xu, Qingshan, Zheng, Peng, Wang, Ye, Wu, Tieru, and Ma, Rui
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Coordinate-based neural implicit representations, or implicit fields, have been widely studied for 3D geometry representation or novel view synthesis. Recently, a series of efforts have been devoted to accelerating and improving the quality of coordinate-based implicit field learning. Instead of learning heavy MLPs to predict the neural implicit values for the query coordinates, neural voxels or grids combined with shallow MLPs have been proposed to achieve high-quality implicit field learning with reduced optimization time. On the other hand, lightweight field representations such as linear grids have been proposed to further improve the learning speed. In this paper, we aim for both fast and high-quality implicit field learning, and propose TaylorGrid, a novel implicit field representation which can be efficiently computed via direct Taylor expansion optimization on 2D or 3D grids. As a general representation, TaylorGrid can be adapted to different implicit field learning tasks such as SDF learning or NeRF. From extensive quantitative and qualitative comparisons, TaylorGrid achieves a balance between the linear grid and neural voxels, showing its superiority in fast and high-quality implicit field learning.
Published: 2024

50. Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?

Author: Lowy, Andrew, Li, Zhuohang, Liu, Jing, Koike-Akino, Toshiaki, Parsons, Kieran, and Wang, Ye
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, 68P27
Abstract: For small privacy parameter $\epsilon$, $\epsilon$-differential privacy (DP) provides a strong worst-case guarantee that no membership inference attack (MIA) can succeed at determining whether a person's data was used to train a machine learning model. The guarantee of DP is worst-case because: a) it holds even if the attacker already knows the records of all but one person in the data set; and b) it holds uniformly over all data sets. In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set. Such considerations have motivated the industrial deployment of DP models with large privacy parameter (e.g. $\epsilon \geq 7$), and it has been observed empirically that DP with large $\epsilon$ can successfully defend against state-of-the-art MIAs. Existing DP theory cannot explain these empirical findings: e.g., the theoretical privacy guarantees of $\epsilon \geq 7$ are essentially vacuous. In this paper, we aim to close this gap between theory and practice and understand why a large DP parameter can prevent practical MIAs. To tackle this problem, we propose a new privacy notion called practical membership privacy (PMP). PMP models a practical attacker's uncertainty about the contents of the private data. The PMP parameter has a natural interpretation in terms of the success rate of a practical MIA on a given data set. We quantitatively analyze the PMP parameter of two fundamental DP mechanisms: the exponential mechanism and Gaussian mechanism. Our analysis reveals that a large DP parameter often translates into a much smaller PMP parameter, which guarantees strong privacy against practical MIAs. Using our findings, we offer principled guidance for practitioners in choosing the DP parameter.
Comment: Accepted at PPAI-24: AAAI Workshop on Privacy-Preserving Artificial Intelligence
Published: 2024
