63,038 results on '"Wang, Yang"'
Search Results
2. Effects of paeonol sub-inhibitory concentration on Streptococcus suis biofilm and expression of virulence genes
- Author
Gao, Shuji, Li, Jinpeng, Fan, Qingying, Xue, Bingqian, Zhang, Xiaoling, Wang, Yang, and Wei, Ying
- Published
- 2022
3. Effect of biofilm formation on the Escherichia coli drug resistance of isolates from pigs in central China
- Author
Li, Jinpeng, Fan, Qingying, Mao, Chenlong, Jin, Manyu, Yi, Li, and Wang, Yang
- Published
- 2021
4. A Powder Diffraction-AI Solution for Crystalline Structure
- Author
Wu, Di, Wang, Pengkun, Zhou, Shiming, Zhang, Bochun, Yu, Liheng, Chen, Xi, Wang, Xu, Zhou, Zhengyang, Wang, Yang, Wang, Sujing, and Du, Jiangfeng
- Subjects
Condensed Matter - Materials Science
- Abstract
Determining the atomic-level structure of crystalline solids is critically important across a wide array of scientific disciplines. The challenges associated with obtaining samples suitable for single-crystal diffraction, coupled with the limitations inherent in classical structure determination methods that primarily utilize powder diffraction for most polycrystalline materials, underscore an urgent need to develop alternative approaches for elucidating the structures of commonly encountered crystalline compounds. In this work, we present an artificial intelligence-directed leapfrog model capable of accurately determining the structures of both organic and inorganic-organic hybrid crystalline solids through direct analysis of powder X-ray diffraction data. This model not only offers a comprehensive solution that effectively circumvents issues related to insoluble challenges in conventional structure solution methodologies but also demonstrates applicability to crystal structures across all conceivable space groups. Furthermore, it exhibits notable compatibility with routine powder diffraction data typically generated by standard instruments, featuring rapid data collection and normal resolution levels.
- Published
- 2024
5. NetSafe: Exploring the Topological Safety of Multi-agent Networks
- Author
Yu, Miao, Wang, Shilong, Zhang, Guibin, Mao, Junyuan, Yin, Chenlong, Liu, Qijiong, Wen, Qingsong, Wang, Kun, and Wang, Yang
- Subjects
Computer Science - Multiagent Systems; Computer Science - Artificial Intelligence
- Abstract
Large language models (LLMs) have empowered nodes within multi-agent networks with intelligence, showing growing applications in both academia and industry. However, how to prevent these networks from generating malicious information remains unexplored, and previous research on the safety of a single LLM is challenging to transfer. In this paper, we focus on the safety of multi-agent networks from a topological perspective, investigating which topological properties contribute to safer networks. To this end, we propose a general framework, NetSafe, along with an iterative RelCom interaction, to unify existing diverse LLM-based agent frameworks, laying the foundation for generalized topological safety research. We identify several critical phenomena when multi-agent networks are exposed to attacks involving misinformation, bias, and harmful information, termed Agent Hallucination and Aggregation Safety. Furthermore, we find that highly connected networks are more susceptible to the spread of adversarial attacks, with task performance in a Star Graph Topology decreasing by 29.7%. Besides, our proposed static metrics align more closely with real-world dynamic evaluations than traditional graph-theoretic metrics, indicating that networks with greater average distances from attackers exhibit enhanced safety. In conclusion, our work introduces a new topological perspective on the safety of LLM-based multi-agent networks and discovers several unreported phenomena, paving the way for future research to explore the safety of such networks.
- Published
- 2024
6. Aggregation of Bilinear Bipartite Equality Constraints and its Application to Structural Model Updating Problem
- Author
Dey, Santanu S, Han, Dahye, and Wang, Yang
- Subjects
Mathematics - Optimization and Control
- Abstract
In this paper, we study the strength of convex relaxations obtained by convexification of aggregation of constraints for a set $S$ described by two bilinear bipartite equalities. Aggregation is the process of rescaling the original constraints by scalar weights and adding the scaled constraints together. It is natural to study the aggregation technique as it yields a single bilinear bipartite equality whose convex hull is already understood from previous literature. On the theoretical side, we present sufficient conditions when $\text{conv}(S)$ can be described by the intersection of convex hulls of a finite number of aggregations, examples when $\text{conv}(S)$ can only be obtained as the intersection of the convex hull of an infinite number of aggregations, and examples when $\text{conv}(S)$ cannot be achieved exactly from the process of aggregation. Computationally, we explore different methods to derive aggregation weights in order to obtain tight convex relaxations. We show that even if an exact convex hull may not be achieved using aggregations, including the convex hull of an aggregation often significantly tightens the outer approximation of $\text{conv}(S)$. Finally, we apply the aggregation method to obtain convex relaxation for the structural model updating problem and show that this yields better bounds within a branch-and-bound tree as compared to not using aggregations.
- Published
- 2024
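Note on result 6: the aggregation step described in the abstract can be written out as a short worked equation. The generic form of the two equalities below (matrices $Q_k$, vectors $a_k$, $b_k$, scalars $c_k$) and the weights $\lambda_1, \lambda_2$ are assumptions made for illustration; the abstract does not give the authors' notation.

$$
S = \{(x, y) : x^\top Q_k\, y + a_k^\top x + b_k^\top y + c_k = 0,\ k = 1, 2\}
$$

$$
S_\lambda = \{(x, y) : x^\top (\lambda_1 Q_1 + \lambda_2 Q_2)\, y + (\lambda_1 a_1 + \lambda_2 a_2)^\top x + (\lambda_1 b_1 + \lambda_2 b_2)^\top y + (\lambda_1 c_1 + \lambda_2 c_2) = 0\}
$$

Every point of $S$ satisfies the aggregated equality, so $S \subseteq S_\lambda$ and hence $\text{conv}(S) \subseteq \text{conv}(S_\lambda)$ for any choice of weights. This is why intersecting the convex hulls of several aggregations gives an outer approximation of $\text{conv}(S)$.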
7. Future of Algorithmic Organization: Large-Scale Analysis of Decentralized Autonomous Organizations (DAOs)
- Author
Sharma, Tanusree, Potter, Yujin, Pongmala, Kornrapat, Wang, Henry, Miller, Andrew, Song, Dawn, and Wang, Yang
- Subjects
Computer Science - Social and Information Networks; Computer Science - Computational Engineering, Finance, and Science; Computer Science - Cryptography and Security; Computer Science - Computers and Society; Computer Science - Human-Computer Interaction
- Abstract
Decentralized Autonomous Organizations (DAOs) resemble early online communities, particularly those centered around open-source projects, and present a potential empirical framework for complex social-computing systems by encoding governance rules within "smart contracts" on the blockchain. A key function of a DAO is collective decision-making, typically carried out through a series of proposals where members vote on organizational events using governance tokens, signifying relative influence within the DAO. In just a few years, the deployment of DAOs surged with a total treasury of $24.5 billion and 11.1M governance token holders collectively managing decisions across over 13,000 DAOs as of 2024. In this study, we examine the operational dynamics of 100 DAOs, like pleasrdao, lexdao, lootdao, optimism collective, uniswap, etc. With large-scale empirical analysis of a diverse set of DAO categories and smart contracts and by leveraging on-chain (e.g., voting results) and off-chain data, we examine factors such as voting power, participation, and DAO characteristics dictating the level of decentralization, thus, the efficiency of management structures. As such, our study highlights that increased grassroots participation correlates with higher decentralization in a DAO, and lower variance in voting power within a DAO correlates with a higher level of decentralization, as consistently measured by Gini metrics. These insights closely align with key topics in political science, such as the allocation of power in decision-making and the effects of various governance models. We conclude by discussing the implications for researchers, and practitioners, emphasizing how these factors can inform the design of democratic governance systems in emerging applications that require active engagement from stakeholders in decision-making.
- Published
- 2024
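Note on result 7: the abstract measures decentralization with Gini metrics but does not say exactly how they are computed. As a minimal sketch, assuming the standard Gini coefficient applied to per-member voting power (the function name `gini` and the sample values are hypothetical), the calculation could look like this:

```python
def gini(values):
    """Standard Gini coefficient of non-negative values.

    Returns 0.0 for a perfectly equal distribution and approaches 1.0
    as the total concentrates in a single holder.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical voting-power distributions, not data from the paper.
equal_power = [1, 1, 1, 1]
concentrated_power = [0, 0, 0, 1]
print(gini(equal_power), gini(concentrated_power))
```

Under this reading, a lower Gini value for voting power corresponds to the "lower variance in voting power" that the abstract associates with higher decentralization.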
8. Task Consistent Prototype Learning for Incremental Few-shot Semantic Segmentation
- Author
Xu, Wenbo, Wu, Yanan, Jiang, Haoran, Wang, Yang, Wu, Qiang, and Zhang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence
- Abstract
Incremental Few-Shot Semantic Segmentation (iFSS) tackles a task that requires a model to continually expand its segmentation capability on novel classes using only a few annotated examples. Typical incremental approaches encounter a challenge that the objective of the base training phase (fitting base classes with sufficient instances) does not align with the incremental learning phase (rapidly adapting to new classes with less forgetting). This disconnect can result in suboptimal performance in the incremental setting. This study introduces a meta-learning-based prototype approach that encourages the model to learn how to adapt quickly while preserving previous knowledge. Concretely, we mimic the incremental evaluation protocol during the base training session by sampling a sequence of pseudo-incremental tasks. Each task in the simulated sequence is trained using a meta-objective to enable rapid adaptation without forgetting. To enhance discrimination among class prototypes, we introduce prototype space redistribution learning, which dynamically updates class prototypes to establish optimal inter-prototype boundaries within the prototype space. Extensive experiments on iFSS datasets built upon PASCAL and COCO benchmarks show the advanced performance of the proposed approach, offering valuable insights for addressing iFSS challenges., Comment: conference
- Published
- 2024
9. Caching Content Placement and Beamforming Co-design for IRS-Aided MIMO Systems with Imperfect CSI
- Author
Gao, Meng, Wang, Yang, Li, Huafu, and Guo, Junqi
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
When offloading links encounter deep fading and obstruction, edge caching cannot fully enhance wireless network performance and improve the QoS of edge nodes, as it fails to effectively reduce backhaul burden. The emerging technology of intelligent reflecting surfaces (IRS) compensates for this disadvantage by creating a smart and reconfigurable wireless environment. Subsequently, we jointly design content placement and active/passive beamforming to minimize network costs under imperfect channel state information (CSI) in the IRS-oriented edge caching system. This minimization problem is decomposed into two subproblems. The content placement subproblem is addressed by applying KKT optimality conditions. We then develop the alternating optimization method to resolve precoder and reflection beamforming. Specifically, we reduce transmission power by first fixing the phase shift, reducing the problem to a convex one relative to the precoder, which is solved through convex optimization. Next, we fix the precoder and resolve the resulting reflection beamforming problem using the penalty convex-concave procedure (CCP) method. Results demonstrate that our proposed method outperforms uniform caching and random phase approaches in reducing transmission power and saving network costs. Eventually, the proposed approach offers potential improvements in the caching optimization and transmission robustness of wireless communication with imperfect CSI.
- Published
- 2024
10. Get Rid of Task Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework
- Author
Yi, Zhongchao, Zhou, Zhengyang, Huang, Qihe, Chen, Yanjiang, Yu, Liheng, Wang, Xu, and Wang, Yang
- Subjects
Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract
Spatiotemporal learning has become a pivotal technique to enable urban intelligence. Traditional spatiotemporal models mostly focus on a specific task by assuming the same distribution between training and testing sets. However, given that urban systems are usually dynamic and multi-sourced with imbalanced data distributions, current task-specific models fail to generalize to new urban conditions and adapt to new domains without explicitly modeling interdependencies across various dimensions and types of urban data. To this end, we argue that it is essential to propose a Continuous Multi-task Spatio-Temporal learning framework (CMuST) to empower collective urban intelligence, which reforms urban spatiotemporal learning from single-domain to cooperatively multi-dimensional and multi-task learning. Specifically, CMuST proposes a new multi-dimensional spatiotemporal interaction network (MSTI) to allow cross-interactions between context and main observations as well as self-interactions within spatial and temporal aspects to be exposed, which is also the core for capturing task-level commonality and personalization. To ensure continuous task learning, a novel Rolling Adaptation training scheme (RoAda) is devised, which not only preserves task uniqueness by constructing data summarization-driven task prompts, but also harnesses correlated patterns among tasks by iterative model behavior modeling. We further establish a benchmark of three cities for multi-task spatiotemporal learning, and empirically demonstrate the superiority of CMuST via extensive evaluations on these datasets. Impressive improvements over existing SOTA methods are achieved on both few-shot streaming data and new domain tasks. Code is available at https://github.com/DILab-USTCSZ/CMuST., Comment: Accepted by NeurIPS 2024
- Published
- 2024
11. ERCache: An Efficient and Reliable Caching Framework for Large-Scale User Representations in Meta's Ads System
- Author
Zhou, Fang, Huang, Yaning, Liang, Dong, Li, Dai, Zhang, Zhongke, Wang, Kai, Xin, Xiao, Aboelela, Abdallah, Jiang, Zheliang, Wang, Yang, Song, Jeff, Zhang, Wei, Liang, Chen, Li, Huayu, Sun, ChongLin, Yang, Hang, Qu, Lei, Shu, Zhan, Yuan, Mindi, Maccherani, Emanuele, Hayat, Taha, Guo, John, Puvvada, Varna, and Pashkevich, Uladzimir
- Subjects
Computer Science - Information Retrieval; Computer Science - Artificial Intelligence; Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Machine Learning
- Abstract
The increasing complexity of deep learning models used for calculating user representations presents significant challenges, particularly with limited computational resources and strict service-level agreements (SLAs). Previous research efforts have focused on optimizing model inference but have overlooked a critical question: is it necessary to perform user model inference for every ad request in large-scale social networks? To address this question and these challenges, we first analyze user access patterns at Meta and find that most user model inferences occur within a short timeframe. This observation reveals a triangular relationship among model complexity, embedding freshness, and service SLAs. Building on this insight, we designed, implemented, and evaluated ERCache, an efficient and robust caching framework for large-scale user representations in ads recommendation systems on social networks. ERCache categorizes cache into direct and failover types and applies customized settings and eviction policies for each model, effectively balancing model complexity, embedding freshness, and service SLAs, even considering the staleness introduced by caching. ERCache has been deployed at Meta for over six months, supporting more than 30 ranking models while efficiently conserving computational resources and complying with service SLA requirements.
- Published
- 2024
12. ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving
- Author
Qiao, Yifan, Anzai, Shu, Yu, Shan, Ma, Haoran, Wang, Yang, Kim, Miryung, and Xu, Harry
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Machine Learning
- Abstract
Many applications are leveraging large language models (LLMs) for complex tasks, and they generally demand low inference latency and high serving throughput for interactive online jobs such as chatbots. However, the tight latency requirement and high load variance of applications pose challenges to serving systems in achieving high GPU utilization. Due to the high costs of scheduling and preemption, today's systems generally use separate clusters to serve online and offline inference tasks, and dedicate GPUs for online inferences to avoid interference. This approach leads to underutilized GPUs because one must reserve enough GPU resources for the peak expected load, even if the average load is low. This paper proposes to harvest stranded GPU resources for offline LLM inference tasks such as document summarization and LLM benchmarking. Unlike online inferences, these tasks usually run in a batch-processing manner with loose latency requirements, making them a good fit for stranded resources that are only available shortly. To enable safe and efficient GPU harvesting without interfering with online tasks, we built ConServe, an LLM serving system that contains (1) an execution engine that preempts running offline tasks upon the arrival of online tasks, (2) an incremental checkpointing mechanism that minimizes the amount of recomputation required by preemptions, and (3) a scheduler that adaptively batches offline tasks for higher GPU utilization. Our evaluation demonstrates that ConServe achieves strong performance isolation when co-serving online and offline tasks but at a much higher GPU utilization. When colocating practical online and offline workloads on popular models such as Llama-2-7B, ConServe achieves 2.35$\times$ higher throughput than state-of-the-art online serving systems and reduces serving latency by 84$\times$ compared to existing co-serving systems.
- Published
- 2024
13. From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls
- Author
Goldsack, Tomas, Wang, Yang, Lin, Chenghua, and Chen, Chung-Chi
- Subjects
Computer Science - Computation and Language
- Abstract
This paper explores the use of Large Language Models (LLMs) in the generation and evaluation of analytical reports derived from Earnings Calls (ECs). Addressing a current gap in research, we explore the generation of analytical reports with LLMs in a multi-agent framework, designing specialized agents that introduce diverse viewpoints and desirable topics of analysis into the report generation process. Through multiple analyses, we examine the alignment between generated and human-written reports and the impact of both individual and collective agents. Our findings suggest that the introduction of additional agents results in more insightful reports, although reports generated by human experts remain preferred in the majority of cases. Finally, we address the challenging issue of report evaluation, examining the limitations and strengths of LLMs in assessing the quality of generated reports in different settings and revealing a significant correlation with human experts across multiple dimensions., Comment: Pre-print
- Published
- 2024
14. Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation
- Author
Xu, Jingyi, Le, Hieu, Shu, Zhixin, Wang, Yang, Tsai, Yi-Hsuan, and Samaras, Dimitris
- Subjects
Computer Science - Sound; Computer Science - Artificial Intelligence; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Human emotional expression is inherently dynamic, complex, and fluid, characterized by smooth transitions in intensity throughout verbal communication. However, the modeling of such intensity fluctuations has been largely overlooked by previous audio-driven talking-head generation methods, which often results in static emotional outputs. In this paper, we explore how emotion intensity fluctuates during speech, proposing a method for capturing and generating these subtle shifts for talking-head generation. Specifically, we develop a talking-head framework that is capable of generating a variety of emotions with precise control over intensity levels. This is achieved by learning a continuous emotion latent space, where emotion types are encoded within latent orientations and emotion intensity is reflected in latent norms. In addition, to capture the dynamic intensity fluctuations, we adopt an audio-to-intensity predictor by considering the speaking tone that reflects the intensity. The training signals for this predictor are obtained through our emotion-agnostic intensity pseudo-labeling method without the need of frame-wise intensity labeling. Extensive experiments and analyses validate the effectiveness of our proposed method in accurately capturing and reproducing emotion intensity fluctuations in talking-head generation, thereby significantly enhancing the expressiveness and realism of the generated outputs.
- Published
- 2024
15. VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
- Author
Liu, Yifei, Wen, Jicheng, Wang, Yang, Ye, Shengyu, Zhang, Li Lyna, Cao, Ting, Li, Cheng, and Yang, Mao
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Scaling model size significantly challenges the deployment and inference of Large Language Models (LLMs). Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low-bit (even down to 2 bits). It reduces memory requirements, optimizes storage costs, and decreases memory bandwidth needs during inference. However, due to numerical representation limitations, traditional scalar-based weight quantization struggles to achieve such extreme low-bit. Recent research on Vector Quantization (VQ) for LLMs has demonstrated the potential for extremely low-bit model quantization by compressing vectors into indices using lookup tables. In this paper, we introduce Vector Post-Training Quantization (VPTQ) for extremely low-bit quantization of LLMs. We use Second-Order Optimization to formulate the LLM VQ problem and guide our quantization algorithm design by solving the optimization. We further refine the weights using Channel-Independent Second-Order Optimization for a granular VQ. In addition, by decomposing the optimization problem, we propose a brief and effective codebook initialization algorithm. We also extend VPTQ to support residual and outlier quantization, which enhances model accuracy and further compresses the model. Our experimental results show that VPTQ reduces model quantization perplexity by $0.01$-$0.34$ on LLaMA-2, $0.38$-$0.68$ on Mistral-7B, $4.41$-$7.34$ on LLaMA-3 over SOTA at 2-bit, with an average accuracy improvement of $0.79$-$1.5\%$ on LLaMA-2, $1\%$ on Mistral-7B, $11$-$22\%$ on LLaMA-3 on QA tasks on average. We only utilize $10.4$-$18.6\%$ of the quantization algorithm execution time, resulting in a $1.6$-$1.8\times$ increase in inference throughput compared to SOTA., Comment: EMNLP 2024, Main, Poster
- Published
- 2024
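Note on result 15: the abstract describes vector quantization as compressing vectors into indices using lookup tables. The sketch below illustrates only that generic idea with plain nearest-centroid assignment; it is not the VPTQ algorithm (it has no second-order optimization, codebook initialization, or residual/outlier handling), and the function names, codebook, and sample vectors are all hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(vectors, codebook):
    """Replace each vector by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k]))
        for vec in vectors
    ]


def dequantize(indices, codebook):
    """Reconstruct approximate vectors by looking indices up in the codebook."""
    return [list(codebook[k]) for k in indices]


# Hypothetical 2-entry codebook: each stored index needs only 1 bit.
codebook = [[0.0, 0.0], [1.0, 1.0]]
weights = [[0.1, -0.1], [0.9, 1.2]]

indices = quantize(weights, codebook)
approx = dequantize(indices, codebook)
print(indices, approx)
```

Storage then consists of the small codebook plus one index per vector, which is where the compression comes from; how the codebook is chosen is what distinguishes specific methods.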
16. NFTracer: Tracing NFT Impact Dynamics in Transaction-flow Substitutive Systems with Visual Analytics
- Author
Cao, Yifan, Shi, Qing, Shen, Lue, Chen, Kani, Wang, Yang, Zeng, Wei, and Qu, Huamin
- Subjects
Computer Science - Computational Engineering, Finance, and Science; Computer Science - Social and Information Networks
- Abstract
Impact dynamics are crucial for estimating the growth patterns of NFT projects by tracking the diffusion and decay of their relative appeal among stakeholders. Machine learning methods for impact dynamics analysis are incomprehensible and rigid in terms of their interpretability and transparency, whilst stakeholders require interactive tools for informed decision-making. Nevertheless, developing such a tool is challenging due to the substantial, heterogeneous NFT transaction data and the requirements for flexible, customized interactions. To this end, we integrate intuitive visualizations to unveil the impact dynamics of NFT projects. We first conduct a formative study and summarize analysis criteria, including substitution mechanisms, impact attributes, and design requirements from stakeholders. Next, we propose the Minimal Substitution Model to simulate substitutive systems of NFT projects that can be feasibly represented as node-link graphs. Particularly, we utilize attribute-aware techniques to embed the project status and stakeholder behaviors in the layout design. Accordingly, we develop a multi-view visual analytics system, namely NFTracer, allowing interactive analysis of impact dynamics in NFT transactions. We demonstrate the informativeness, effectiveness, and usability of NFTracer by performing two case studies with domain experts and one user study with stakeholders. The studies suggest that NFT projects featuring a higher degree of similarity are more likely to substitute each other. The impact of NFT projects within substitutive systems is contingent upon the degree of stakeholders' influx and projects' freshness., Comment: 25 pages, 13 figures, 3 tables, accepted by IEEE Transactions on Visualization and Computer Graphics (2024)
- Published
- 2024
17. One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance
- Author
Zhao, Minyi, Wang, Yang, Guan, Jihong, and Zhou, Shuigeng
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Scene text recognition (STR) from high-resolution (HR) images has been significantly successful; however, text reading on low-resolution (LR) images is still challenging due to insufficient visual information. Therefore, many scene text image super-resolution (STISR) models have recently been proposed to generate super-resolution (SR) images for the LR ones; STR is then done on the SR images, which boosts recognition performance. Nevertheless, these methods have two major weaknesses. On the one hand, STISR approaches may generate imperfect or even erroneous SR images, which mislead the subsequent recognition of STR models. On the other hand, as the STISR and STR models are jointly optimized, to pursue high recognition accuracy, the fidelity of SR images may be spoiled. As a result, neither the recognition performance nor the fidelity of STISR models is desirable. Then, can we achieve both high recognition performance and good fidelity? To this end, in this paper we propose a novel method called IMAGE (the abbreviation of Iterative MutuAl GuidancE) to effectively recognize and recover LR scene text images simultaneously. Concretely, IMAGE consists of a specialized STR model for recognition and a tailored STISR model to recover LR images, which are optimized separately. We also develop an iterative mutual guidance mechanism, with which the STR model provides high-level semantic information as a clue to the STISR model for better super-resolution, while the STISR model offers essential low-level pixel clues to the STR model for more accurate recognition. Extensive experiments on two LR datasets demonstrate the superiority of our method over the existing works on both recognition performance and super-resolution fidelity.
- Published
- 2024
18. Hybrid spin-phonon architecture for scalable solid-state quantum nodes
- Author
Peng, Ruoming, Wu, Xuntao, Wang, Yang, Zhang, Jixing, Geng, Jianpei, Dasari, Durga Bhaktavatsala Rao, Cleland, Andrew N., and Wrachtrup, Jörg
- Subjects
Quantum Physics
- Abstract
Solid-state spin systems hold great promise for quantum information processing and the construction of quantum networks. However, the considerable inhomogeneity of spins in solids poses a significant challenge to the scaling of solid-state quantum systems. A practical protocol to individually control and entangle spins remains elusive. To this end, we propose a hybrid spin-phonon architecture based on spin-embedded SiC optomechanical crystal (OMC) cavities, which integrates photonic and phononic channels allowing for interactions between multiple spins. With a Raman-facilitated process, the OMC cavities support coupling between the spin and the zero-point motion of the OMC cavity mode reaching 0.57 MHz, facilitating phonon preparation and spin Rabi swap processes. Based on this, we develop a spin-phonon interface that achieves a two-qubit controlled-Z gate with a simulated fidelity of $96.80\%$ and efficiently generates highly entangled Dicke states with over $99\%$ fidelity, by engineering the strongly coupled spin-phonon dark state which is robust against loss from excited state relaxation as well as spectral inhomogeneity of the defect centers. This provides a hybrid platform for exploring spin entanglement with potential scalability and full connectivity in addition to an optical link, and offers a pathway to investigate quantum acoustics in solid-state systems.
- Published
- 2024
19. Finite Element Modeling of Surface Traveling Wave Friction Driven for Rotary Ultrasonic Motor
- Author
Zhao, Zhanyue, Wang, Yang, Bales, Charles, Jiang, Yiwei, and Fischer, Gregory
- Subjects
Computer Science - Robotics
- Abstract
Finite element modeling (FEM) is a critical tool in the design and analysis of piezoelectric devices, offering detailed numerical simulations that guide various applications. While traditionally applied to eigenfrequency analysis and time-dependent studies for predicting excitation eigenfrequencies and estimating traveling wave amplitudes, FEM's potential extends to more sophisticated tasks. Advanced FEM applications, such as modeling friction-driven dynamic motion and reaction forces, are essential for accurately simulating the complex behaviors of piezoelectric actuators under real-world conditions. This paper presents a comprehensive motor model that encompasses the coupling dynamics between the stator and rotor in a piezoelectric ultrasonic motor (USM). Utilizing contact theory, the model simulates the complex conditions encountered during the USM's initial start-up phase and its transition to steady-state operation. Implemented in COMSOL Multiphysics, the model provides an in-depth analysis of a rotary piezoelectric actuator, capturing the dynamic interactions and reaction forces that influence its performance. The introduction of this FEM-based model represents a significant advancement in the simulation and understanding of piezoelectric actuators. By offering a more complete picture of the motor's behavior from start-up to steady state, this study enables more accurate control and optimization of piezoelectric devices, enhancing their efficiency and reliability in practical applications., Comment: 6 pages, 14 figures, 6 tables
- Published
- 2024
20. Learning Agile Swimming: An End-to-End Approach without CPGs
- Author
Lin, Xiaozhu, Liu, Xiaopei, and Wang, Yang
- Subjects
Computer Science - Robotics
- Abstract
The pursuit of agile and efficient underwater robots, especially bio-mimetic robotic fish, has been impeded by challenges in creating motion controllers that are able to fully exploit their hydrodynamic capabilities. This paper addresses these challenges by introducing a novel, model-free, end-to-end control framework that leverages Deep Reinforcement Learning (DRL) to enable agile and energy-efficient swimming of robotic fish. Unlike existing methods that rely on predefined trigonometric swimming patterns like Central Pattern Generators (CPG), our approach directly outputs low-level actuator commands without strong constraints, enabling the robotic fish to learn agile swimming behaviors. In addition, by integrating a high-performance Computational Fluid Dynamics (CFD) simulator with innovative sim-to-real strategies, such as normalized density matching and servo response matching, the proposed framework significantly mitigates the sim-to-real gap, facilitating direct transfer of control policies to real-world environments without fine-tuning. Comparative experiments demonstrate that our method achieves faster swimming speeds, smaller turning radii, and reduced energy consumption compared to the conventional CPG-PID-based controllers. Furthermore, the proposed framework shows promise in addressing complex tasks in diverse scenarios, paving the way for more effective deployment of robotic fish in real aquatic environments., Comment: 8 pages, 7 figures
- Published
- 2024
21. A Preliminary Add-on Differential Drive System for MRI-Compatible Prostate Robotic System
- Author
-
Zhao, Zhanyue, Jiang, Yiwei, Bales, Charles, Wang, Yang, and Fischer, Gregory
- Subjects
Computer Science - Robotics - Abstract
MRI-targeted biopsy has shown significant advantages over conventional random sextant biopsy, detecting more clinically significant cancers and improving risk stratification. However, needle targeting accuracy, especially in transperineal MRI-guided biopsies, presents a challenge due to needle deflection. This can negatively impact patient outcomes, leading to repeated sampling and inaccurate diagnoses if cancerous tissue isn't properly collected. To address this, we developed a novel differential drive prototype designed to improve needle control and targeting precision. This system, featuring a 2-degree-of-freedom (2-DOF) MRI-compatible cooperative needle driver, distances the robot from the MRI imaging area, minimizing image artifacts and distortions. By using two motors for simultaneous needle insertion and rotation without relative movement, the design reduces MRI interference. In this work, we introduced two mechanical differential drive designs: the ball screw/spline and lead screw/bushing types, and explored both hollow-type and side-pulley differentials. Validation through low-resolution rapid-prototyping demonstrated the feasibility of differential drives in prostate biopsies, with the custom hollow-type hybrid ultrasonic motor (USM) achieving a rotary speed of 75 rpm. The side-pulley differential further increased the speed to 168 rpm, ideal for needle rotation applications. Accuracy assessments showed minimal errors in both insertion and rotation motions, indicating that this proof-of-concept design holds great promise for further development. Ultimately, the differential drive offers a promising solution to the critical issue of needle targeting accuracy in MRI-guided prostate biopsies., Comment: 8 pages, 19 figures, 3 tables
- Published
- 2024
22. From Experts to the Public: Governing Multimodal Language Models in Politically Sensitive Video Analysis
- Author
-
Sharma, Tanusree, Potter, Yujin, Kilhoffer, Zachary, Huang, Yun, Song, Dawn, and Wang, Yang
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computers and Society - Abstract
This paper examines the governance of multimodal large language models (MM-LLMs) through individual and collective deliberation, focusing on analyses of politically sensitive videos. We conducted a two-step study: first, interviews with 10 journalists established a baseline understanding of expert video interpretation; second, 114 individuals from the general public engaged in deliberation using Inclusive.AI, a platform that facilitates democratic decision-making through decentralized autonomous organization (DAO) mechanisms. Our findings show that while experts emphasized emotion and narrative, the general public prioritized factual clarity, objectivity of the situation, and emotional neutrality. Additionally, we explored the impact of different governance mechanisms (quadratic vs. weighted ranking voting, and equal vs. 20-80 power distributions) on users' decision-making about how AI should behave. Specifically, quadratic voting enhanced perceptions of liberal democracy and political equality, and participants who were more optimistic about AI perceived the voting process to have a higher level of participatory democracy. Our results suggest the potential of applying DAO mechanisms to help democratize AI governance.
- Published
- 2024
23. Characterization and Design of A Hollow Cylindrical Ultrasonic Motor
- Author
-
Zhao, Zhanyue, Wang, Yang, Bales, Charles, Ruiz-Cadalso, Daniel, Zheng, Howard, Furlong-Vazquez, Cosme, and Fischer, Gregory
- Subjects
Computer Science - Robotics - Abstract
Piezoelectric ultrasonic motors offer a compact design, faster reaction time, and simpler setup compared to other motion units such as pneumatic and hydraulic motors; in particular, their non-ferromagnetic property makes them better suited to MRI-compatible robotic systems than traditional DC motors. Hollow shaft motors offer the advantages of being lightweight compared to solid shafts of the same diameter, low rotational inertia, high tolerance to rotational imbalance due to low weight, and tolerance to high temperature due to low specific mass. This article presents a prototype of a hollow cylindrical ultrasonic motor (HCM) to perform direct drive, eliminate mechanical non-linearity, and reduce the size and complexity of the actuator or end effector assembly. Two equivalent HCMs are presented in this work; under 50 g prepressure on the rotor, the motor achieved a rotation speed of 383.3333 rpm and a torque output of 57.3504 mNm at a driving voltage of 282 $V_{pp}$., Comment: 6 pages, 9 figures, 2 tables
- Published
- 2024
24. Insulator to Metal Transition under High Pressure in FeNb$_3$Se$_{10}$
- Author
-
Wang, Haozhe, Huyan, Shuyuan, Downey, Eoghan, Wang, Yang, Smolenski, Shane, Li, Du, Yang, Li, Bostwick, Aaron, Jozwiak, Chris, Rotenberg, Eli, Bud'ko, Sergey L., Canfield, Paul C., Cava, R. J., Jo, Na Hyun, and Xie, Weiwei
- Subjects
Condensed Matter - Strongly Correlated Electrons - Abstract
Non-magnetic FeNb$_3$Se$_{10}$ has been demonstrated to be an insulator at ambient pressure through both theoretical calculations and experimental measurements and it does not host topological surface states. Here we show that on the application of pressure, FeNb$_3$Se$_{10}$ transitions to a metallic state at around 3.0 GPa. With a further increase in pressure, its resistivity becomes independent of both temperature and pressure. Its crystal structure is maintained to at least 4.4 GPa., Comment: 20 pages, 5 figures
- Published
- 2024
25. Approximation Bounds for Recurrent Neural Networks with Application to Regression
- Author
-
Jiao, Yuling, Wang, Yang, and Yan, Bokai
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs. We derive upper bounds on the approximation error of RNNs for H\"older smooth functions, in the sense that the output at each time step of an RNN can approximate a H\"older function that depends only on past and current information, termed a past-dependent function. This allows a carefully constructed RNN to simultaneously approximate a sequence of past-dependent H\"older functions. We apply these approximation results to derive non-asymptotic upper bounds for the prediction error of the empirical risk minimizer in regression problems. Our error bounds achieve the minimax optimal rate under both exponentially $\beta$-mixing and i.i.d. data assumptions, improving upon existing ones. Our results provide statistical guarantees on the performance of RNNs.
- Published
- 2024
26. Development of Advanced FEM Simulation Technology for Pre-Operative Surgical Planning
- Author
-
Zhao, Zhanyue, Jiang, Yiwei, Bales, Charles, Wang, Yang, and Fischer, Gregory
- Subjects
Physics - Medical Physics ,Computer Science - Robotics - Abstract
Intracorporeal needle-based therapeutic ultrasound (NBTU) offers a minimally invasive approach for the thermal ablation of malignant brain tumors, including both primary and metastatic cancers. NBTU utilizes a high-frequency alternating electric field to excite a piezoelectric transducer, generating acoustic waves that cause localized heating and tumor cell ablation, and it provides a more precise ablation by delivering lower acoustic power doses directly to targeted tumors while sparing surrounding healthy tissue. Building on our previous work, this study introduces a database for optimizing pre-operative surgical planning by simulating ablation effects in varied tissue environments and develops an extended simulation model incorporating various tumor types and sizes to evaluate thermal damage under trans-tissue conditions. A comprehensive database is created from these simulations, detailing critical parameters such as CEM43 isodose maps, temperature changes, thermal dose areas, and maximum ablation distances for four directional probes. This database serves as a valuable resource for future studies, aiding in complex trajectory planning and parameter optimization for NBTU procedures. Moreover, a novel probe selection method is proposed to enhance pre-surgical planning, providing a strategic approach to selecting probes that maximize therapeutic efficiency and minimize ablation time. By avoiding unnecessary thermal propagation and optimizing probe angles, this method has the potential to improve patient outcomes and streamline surgical procedures. Overall, the findings of this study contribute significantly to the field of NBTU, offering a robust framework for enhancing treatment precision and efficacy in clinical settings., Comment: 8 pages, 17 figures, 2 tables
- Published
- 2024
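The CEM43 isodose maps mentioned in the abstract of entry 26 refer to cumulative equivalent minutes at 43 °C. As a minimal sketch, and not the authors' implementation, the widely used Sapareto-Dewey formulation sums R^(43 - T) over time, with R = 0.5 at or above 43 °C and R = 0.25 below. The function name, the sampling interval, and the temperature trace below are hypothetical.

```python
def cem43(temps_c, dt_s):
    """Cumulative equivalent minutes at 43 degC for a sampled temperature trace.

    temps_c: temperatures in degC, one per sample (hypothetical input).
    dt_s:    sampling interval in seconds.
    Uses R = 0.5 at or above 43 degC and R = 0.25 below (Sapareto-Dewey).
    """
    total_min = 0.0
    for t in temps_c:
        r = 0.5 if t >= 43.0 else 0.25
        # Each sample contributes its duration (in minutes) scaled by R^(43 - T).
        total_min += (dt_s / 60.0) * r ** (43.0 - t)
    return total_min


# Hypothetical trace: 5 minutes held at 44 degC, sampled once per second.
dose = cem43([44.0] * 300, dt_s=1.0)
```

Thresholding such a dose value at each pixel is one common way to draw an isodose contour; the thresholds used in the paper are not stated in the abstract.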
27. VQ-DeepVSC: A Dual-Stage Vector Quantization Framework for Video Semantic Communication
- Author
-
Miao, Yongyi, Li, Zhongdang, Wang, Yang, Hu, Die, Yan, Jun, and Wang, Youfang
- Subjects
Computer Science - Networking and Internet Architecture - Abstract
In response to the rapid growth of global video traffic and the limitations of traditional wireless transmission systems, we propose a novel dual-stage vector quantization framework, VQ-DeepVSC, tailored to enhance video transmission over wireless channels. In the first stage, we design the adaptive keyframe extractor and interpolator, deployed respectively at the transmitter and receiver, which intelligently select key frames to minimize inter-frame redundancy and mitigate the cliff-effect under challenging channel conditions. In the second stage, we propose the semantic vector quantization encoder and decoder, placed respectively at the transmitter and receiver, which efficiently compress key frames using advanced indexing and spatial normalization modules to reduce redundancy. Additionally, we propose adjustable index selection and recovery modules, enhancing compression efficiency and enabling flexible compression ratio adjustment. Compared to the joint source-channel coding (JSCC) framework, the proposed framework exhibits superior compatibility with current digital communication systems. Experimental results demonstrate that VQ-DeepVSC achieves substantial improvements over the H.265 standard in both Multi-Scale Structural Similarity (MS-SSIM) and Learned Perceptual Image Patch Similarity (LPIPS) metrics, particularly under low channel signal-to-noise ratio (SNR) or multi-path channels, highlighting the significantly enhanced transmission capabilities of our approach.
- Published
- 2024
28. Simple H\"uckel Molecular Orbital Theory for M\"obius Carbon Nanobelts
- Author
-
Wang, Yang
- Subjects
Physics - Chemical Physics ,Condensed Matter - Materials Science - Abstract
The recently synthesized M\"obius carbon nanobelts (CNBs) have gained attention owing to their unique $\pi$-conjugation topology, which results in distinctive electronic properties with both fundamental and practical implications. Although M\"obius conjugation with phase inversion in atomic orbital (AO) basis is well-established for monocyclic systems, the extension of this understanding to double-stranded M\"obius CNBs remains uncertain. This study thoroughly examines the simple H\"uckel molecular orbital (SHMO) theory for describing the $\pi$ electronic structures of M\"obius CNBs. We demonstrate that the adjacency matrix for any M\"obius CNB is isomorphism invariant under different placements of the sign inversion, ensuring identical SHMO results regardless of AO phase inversion location. Representative examples of M\"obius CNBs, including the experimentally synthesized one, show that the H\"uckel molecular orbitals (MOs) strikingly resemble the DFT-computed $\pi$ MOs, which were obtained using a herein proposed technique based on the localization and re-delocalization of DFT canonical MOs. Interestingly, the lower-lying $\pi$ MOs exhibit an odd number of nodal planes and are doubly quasidegenerate as a consequence of the phase inversion in M\"obius macrocycles, contrasting with macrocyclic H\"uckel systems. Coulson bond orders derived from SHMO theory correlate well with DFT-calculated Wiberg bond indices for all C-C bonds in tested M\"obius CNBs. Additionally, a remarkable correlation is observed between HOMO-LUMO gaps obtained from the SHMO and GFN2-xTB calculations for a large number of topoisomers of M\"obius CNBs. Thus, the SHMO model not only captures the essence of $\pi$ electronic structure of M\"obius CNBs, but also provides reliable quantitative predictions comparable to DFT results., Comment: 18 pages, 9 figures
- Published
- 2024
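The abstract of entry 28 states that SHMO results do not depend on where the sign inversion is placed. As a hedged illustration for the simpler monocyclic case only (not the double-stranded nanobelts treated in the paper), moving the inverted bond amounts to flipping the phase of some basis orbitals, that is, a transform D A D with a diagonal ±1 matrix D. Since D is its own inverse, this is a similarity transform and leaves the eigenvalues unchanged. The function names and the ring size below are hypothetical.

```python
def ring_matrix(n, inverted_bond):
    """Huckel-type adjacency matrix (in units of beta) for an n-membered ring.

    inverted_bond: index k of the bond (k, (k + 1) % n) that carries the
    Mobius sign inversion; all other bonds have sign +1.
    """
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        sign = -1 if i == inverted_bond else 1
        a[i][j] = a[j][i] = sign
    return a


def move_inversion(a, k):
    """Flip the phase of orbitals k+1..n-1, i.e. return D A D with D = diag(+1/-1)."""
    n = len(a)
    d = [1 if i <= k else -1 for i in range(n)]
    return [[d[i] * a[i][j] * d[j] for j in range(n)] for i in range(n)]


n = 8
a_last = ring_matrix(n, inverted_bond=n - 1)  # inversion on bond (n-1, 0)
a_moved = move_inversion(a_last, k=2)         # inversion now sits on bond (2, 3)
same = a_moved == ring_matrix(n, inverted_bond=2)
```

Here `same` compares the transformed matrix with the one built directly with the inversion on bond (2, 3); equality means the two placements differ only by a change of orbital phases.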
29. Deep Brain Ultrasound Ablation Thermal Dose Modeling with in Vivo Experimental Validation
- Author
-
Zhao, Zhanyue, Szewczyk, Benjamin, Tarasek, Matthew, Bales, Charles, Wang, Yang, Liu, Ming, Jiang, Yiwei, Bhushan, Chitresh, Fiveland, Eric, Campwala, Zahabiya, Trowbridge, Rachel, Johansen, Phillip M., Olmsted, Zachary, Ghoshal, Goutam, Heffter, Tamas, Gandomi, Katie, Tavakkolmoghaddam, Farid, Nycz, Christopher, Jeannotte, Erin, Mane, Shweta, Nalwalk, Julia, Burdette, E. Clif, Qian, Jiang, Yeo, Desmond, Pilitsis, Julie, and Fischer, Gregory S.
- Subjects
Physics - Medical Physics ,Computer Science - Robotics - Abstract
Intracorporeal needle-based therapeutic ultrasound (NBTU) is a minimally invasive option for intervening in malignant brain tumors, commonly used in thermal ablation procedures. This technique is suitable for both primary and metastatic cancers, utilizing a high-frequency alternating electric field (up to 10 MHz) to excite a piezoelectric transducer. The resulting rapid deformation of the transducer produces an acoustic wave that propagates through tissue, leading to localized high-temperature heating at the target tumor site and inducing rapid cell death. To optimize the design of NBTU transducers for thermal dose delivery during treatment, numerical modeling of the acoustic pressure field generated by the deforming piezoelectric transducer is frequently employed. The bioheat transfer process generated by the input pressure field is used to track the thermal propagation of the applicator over time. Magnetic resonance thermal imaging (MRTI) can be used to experimentally validate these models. Validation results using MRTI demonstrated the feasibility of this model, showing a consistent thermal propagation pattern. However, a thermal damage isodose map is more advantageous for evaluating therapeutic efficacy. To achieve a more accurate simulation based on the actual brain tissue environment, a new finite element method (FEM) simulation with enhanced damage evaluation capabilities was conducted. The results showed that the highest temperature and ablated volume differed between experimental and simulation results by 2.1884 $^\circ$C (3.71%) and 0.0631 cm$^3$ (5.74%), respectively. The lowest Pearson correlation coefficient (PCC) for peak temperature was 0.7117, and the lowest Dice coefficient for the ablated area was 0.7021, indicating good agreement between simulation and experiment., Comment: 9 pages, 9 figures, 7 tables
- Published
- 2024
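The abstract of entry 29 reports a Dice coefficient between simulated and measured ablated areas. As a minimal sketch of that overlap metric, and not the authors' evaluation code, the coefficient 2|A∩B| / (|A| + |B|) can be computed on flattened binary masks. The function name and the two example masks are hypothetical.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) for two equally sized binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if size == 0 else 2.0 * overlap / size


# Hypothetical flattened masks: 1 marks a pixel counted as ablated.
simulated = [0, 1, 1, 1, 0, 0]
measured = [0, 0, 1, 1, 1, 0]
score = dice(simulated, measured)
```

A value of 1 means the two regions coincide exactly and 0 means they do not overlap at all.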
30. Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction
- Author
-
Liu, Zhanwen, Li, Chao, Wang, Yang, Yang, Nan, Fan, Xing, Ma, Jiaqi, and Zhao, Xiangmo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Motion prediction plays an essential role in autonomous driving systems, enabling autonomous vehicles to achieve more accurate local-path planning and driving decisions based on predictions of the surrounding vehicles. However, existing methods neglect the potential missing values caused by object occlusion, perception failures, etc., which inevitably degrades the trajectory prediction performance in real traffic scenarios. To address this limitation, we propose a novel end-to-end framework for incomplete vehicle trajectory prediction, named Multi-scale Temporal Fusion Transformer (MTFT), which consists of the Multi-scale Attention Head (MAH) and the Continuity Representation-guided Multi-scale Fusion (CRMF) module. Specifically, the MAH leverages the multi-head attention mechanism to capture, in parallel, multi-scale motion representations of the trajectory at different temporal granularities, thus mitigating the adverse effect of missing values on prediction. Furthermore, the multi-scale motion representation is input into the CRMF module for multi-scale fusion to obtain a robust temporal feature of the vehicle. During the fusion process, the continuity representation of vehicle motion is first extracted across time steps to guide the fusion, ensuring that the resulting temporal feature incorporates both detailed information and the overall trend of vehicle motion, which facilitates the accurate decoding of a future trajectory that is consistent with the vehicle's motion trend. We evaluate the proposed model on four datasets derived from highway and urban traffic scenarios. The experimental results demonstrate its superior performance in the incomplete vehicle trajectory prediction task compared with state-of-the-art models, e.g., a comprehensive performance improvement of more than 39% on the HighD dataset.
- Published
- 2024
31. Scaler: Efficient and Effective Cross Flow Analysis
- Author
-
Steven, Tang, Xiang, Mingcan, Wang, Yang, Wu, Bo, Chen, Jianjun, and Liu, Tongping
- Subjects
Computer Science - Performance - Abstract
Performance analysis is challenging as different components (e.g., different libraries and applications) of a complex system can interact with each other. However, few existing tools focus on understanding such interactions. To bridge this gap, we propose a novel analysis method, "Cross Flow Analysis (XFA)", that monitors the interactions/flows across these components. We also built the Scaler profiler, which provides a holistic view of the time spent on each component (e.g., library or application) and every API inside each component. This paper proposes multiple new techniques, such as Universal Shadow Table and Relation-Aware Data Folding. These techniques enable Scaler to achieve low runtime overhead, low memory overhead, and high profiling accuracy. Based on our extensive experimental results, Scaler detects multiple unknown performance issues inside widely-used applications, and therefore will be a useful complement to existing work. The reproduction package, including the source code, benchmarks, and evaluation scripts, can be found at https://doi.org/10.5281/zenodo.13336658., Comment: Paper has been accepted by ASE'24 https://conf.researchr.org/details/ase-2024/ase-2024-research/73/Scaler-Efficient-and-Effective-Cross-Flow-Analysis
- Published
- 2024
32. Digital Parenting Burdens in China
- Author
-
Lim, Sun Sun and Wang, Yang
- Subjects
Parental Investment ,Intensive Parenting ,Edtech ,Home School Conferencing ,Performative Parenting ,Peer Pressure ,Academic Pressure ,Urban Middle Class ,Digitalisation of Family Life ,thema EDItEUR::J Society and Social Sciences::JB Society and culture: general::JBC Cultural and media studies::JBCT Media studies::JBCT1 Media studies: internet, digital media and society ,thema EDItEUR::J Society and Social Sciences::JH Sociology and anthropology::JHB Sociology::JHBK Sociology: family and relationships ,thema EDItEUR::J Society and Social Sciences::JB Society and culture: general::JBS Social groups, communities and identities::JBSP Age groups and generations::JBSP1 Age groups: children - Abstract
The ebook edition of this title is Open Access and freely available to read online. As a world leader in technology, China’s adoption of trend-setting innovations has led to the encroachment of digital technologies into the home. Digital Parenting Burdens in China is the first English language book to explore the impact of digitalisation on family life in China, including the phenomenon of ‘punch-in culture’ and its implications for family wellbeing. In an era of heightened digital connectivity via parent-teacher and parent-parent chatgroups and homework apps, how are Chinese parents coping with the challenges of parental accountability, peer pressure and performative parenting? Delving into 90 interviews from both before and during the Covid-19 pandemic, authors Sun Sun Lim and Yang Wang provide rich vignettes of family life in urban Chinese households in Beijing and Hangzhou to demonstrate how parents appropriate technology as they raise their children, steer them towards the social aspirations of academic achievement, and navigate the rocky terrains of children’s home-based learning during the pandemic lockdowns. Empirically grounded and theoretically informed, these vivid accounts serve as valuable insights into understanding how family life is shifting in the face of digitalisation, not only in China but globally.
- Published
- 2024
33. Norfloxacin Sub-Inhibitory Concentration Affects Streptococcus suis Biofilm Formation and Virulence Gene Expression
- Author
-
Liu, Baobao, Yi, Li, Li, Jinpeng, Gong, Shenglong, Dong, Xiao, Wang, Chen, and Wang, Yang
- Published
- 2020
34. 3D Lead‐Organoselenide‐Halide Perovskites and their Mixed‐Chalcogenide and Mixed‐Halide Alloys
- Author
-
Li, Jiayi, Wang, Yang, Saha, Santanu, Chen, Zhihengyu, Hofmann, Jan, Misleh, Jason, Chapman, Karena W, Reimer, Jeffrey A, Filip, Marina R, and Karunadasa, Hemamala I
- Subjects
Inorganic Chemistry ,Macromolecular and Materials Chemistry ,Chemical Sciences ,Physical Chemistry ,Organic Chemistry ,Chemical sciences - Abstract
We incorporate Se into the 3D halide perovskite framework using the zwitterionic ligand SeCYS (+NH3(CH2)2Se−), which occupies both the X− and A+ sites in the prototypical ABX3 perovskite. The new organoselenide‐halide perovskites, (SeCYS)PbX2 (X=Cl, Br), expand upon the recently discovered organosulfide‐halide perovskites. Single‐crystal X‐ray diffraction and pair distribution function analysis reveal the average structures of the organoselenide‐halide perovskites, whereas the local lead coordination environments and their distributions were probed through solid‐state 77Se and 207Pb NMR, complemented by theoretical simulations. Density functional theory calculations illustrate that the band structures of (SeCYS)PbX2 largely resemble those of their S analogs, with similar band dispersion patterns, yet with a considerable band gap decrease. Optical absorbance measurements indeed show band gaps of 2.07 and 1.86 eV for (SeCYS)PbX2 with X=Cl and Br, respectively. We further demonstrate routes to alloying the halides (Cl, Br) and chalcogenides (S, Se), continuously tuning the band gap from 1.86 to 2.31 eV, straddling the ideal range for tandem solar cells or visible‐light photocatalysis. The comprehensive description of the average and local structures, and how they can fine‐tune the band gap and potential trap states, respectively, establishes the foundation for understanding this new perovskite family, which combines solid‐state and organo‐main‐group chemistry.
- Published
- 2024
35. Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing
- Author
-
Liu, Qianhui, Wang, Jiadong, Wang, Yang, Yang, Xin, Pan, Gang, and Li, Haizhou
- Subjects
Computer Science - Multimedia ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Humans naturally perform audiovisual speech recognition (AVSR), enhancing the accuracy and robustness by integrating auditory and visual information. Spiking neural networks (SNNs), which mimic the brain's information-processing mechanisms, are well-suited for emulating the human capability of AVSR. Despite their potential, research on SNNs for AVSR is scarce, with most existing audio-visual multimodal methods focused on object or digit recognition. These models simply integrate features from both modalities, neglecting their unique characteristics and interactions. Additionally, they often rely on future information for current processing, which increases recognition latency and limits real-time applicability. Inspired by human speech perception, this paper proposes a novel human-inspired SNN named HI-AVSNN for AVSR, incorporating three key characteristics: cueing interaction, causal processing and spike activity. For cueing interaction, we propose a visual-cued auditory attention module (VCA2M) that leverages visual cues to guide attention to auditory features. We achieve causal processing by aligning the SNN's temporal dimension with that of visual and auditory features and applying temporal masking to utilize only past and current information. To implement spike activity, in addition to using SNNs, we leverage the event camera to capture lip movement as spikes, mimicking the human retina and providing efficient visual data. We evaluate HI-AVSNN on an audiovisual speech recognition dataset combining the DVS-Lip dataset with its corresponding audio samples. Experimental results demonstrate the superiority of our proposed fusion method, outperforming existing audio-visual SNN fusion methods and achieving a 2.27% improvement in accuracy over the only existing SNN-based AVSR method.
- Published
- 2024
36. Cross-sectional imaging of speed-of-sound distribution using photoacoustic reversal beacons
- Author
-
Wang, Yang, Wang, Danni, Zhong, Liting, Zhou, Yi, Wang, Qing, Chen, Wufan, and Qi, Li
- Subjects
Physics - Medical Physics ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Photoacoustic tomography (PAT) enables non-invasive cross-sectional imaging of biological tissues, but it fails to map the spatial variation of speed-of-sound (SOS) within tissues. Because SOS is intimately linked to the density and elastic modulus of tissues, imaging the SOS distribution serves as a complementary imaging modality to PAT. Moreover, an accurate SOS map can be leveraged to correct for PAT image degradation arising from acoustic heterogeneities. Herein, we propose a novel approach for SOS reconstruction using only the PAT imaging modality. Our method is based on photoacoustic reversal beacons (PRBs), which are small light-absorbing targets with strong photoacoustic contrast. We excite and scan a number of PRBs positioned at the periphery of the target, and the generated photoacoustic waves propagate through the target from various directions, thereby achieving spatial sampling of the internal SOS. We formulate a linear inverse model for pixel-wise SOS reconstruction and solve it with an iterative optimization technique. We validate the feasibility of the proposed method through simulations, phantoms, and ex vivo biological tissue tests. Experimental results demonstrate that our approach can achieve accurate reconstruction of the SOS distribution. Leveraging the obtained SOS map, we further demonstrate significantly enhanced PAT image reconstruction with acoustic correction.
- Published
- 2024
37. TripleMixer: A 3D Point Cloud Denoising Model for Adverse Weather
- Author
-
Zhao, Xiongwei, Wen, Congcong, Wang, Yang, Bai, Haojie, and Dou, Wenhao
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics - Abstract
LiDAR sensors are crucial for providing high-resolution 3D point cloud data in autonomous driving systems, enabling precise environmental perception. However, real-world adverse weather conditions, such as rain, fog, and snow, introduce significant noise and interference, degrading the reliability of LiDAR data and the performance of downstream tasks like semantic segmentation. Existing datasets often suffer from limited weather diversity and small dataset sizes, which restrict their effectiveness in training models. Additionally, current deep learning denoising methods, while effective in certain scenarios, often lack interpretability, complicating the ability to understand and validate their decision-making processes. To overcome these limitations, we introduce two large-scale datasets, Weather-KITTI and Weather-NuScenes, which cover three common adverse weather conditions: rain, fog, and snow. These datasets retain the original LiDAR acquisition information and provide point-level semantic labels for rain, fog, and snow. Furthermore, we propose a novel point cloud denoising model, TripleMixer, comprising three mixer layers: the Geometry Mixer Layer, the Frequency Mixer Layer, and the Channel Mixer Layer. These layers are designed to capture geometric spatial information, extract multi-scale frequency information, and enhance the multi-channel feature information of point clouds, respectively. Experiments conducted on the WADS dataset in real-world scenarios, as well as on our proposed Weather-KITTI and Weather-NuScenes datasets, demonstrate that our model achieves state-of-the-art denoising performance. Additionally, our experiments show that integrating the denoising model into existing segmentation frameworks enhances the performance of downstream tasks. The datasets and code will be made publicly available at https://github.com/Grandzxw/TripleMixer., Comment: 15 pages, submitted to IEEE TIP
- Published
- 2024
38. Motion-driven quantum dissipation in an open electronic system with nonlocal interaction
- Author
-
Wang, Yang, Zhang, Ruanjing, and Liu, Feiyi
- Subjects
Quantum Physics ,Condensed Matter - Other Condensed Matter - Abstract
In this paper, we study excitations and dissipation in two infinite parallel metallic plates with relative motion. We model the degrees of freedom of the electrons in both plates using the 1+2 dimensional Dirac field and select a nonlocal potential to describe the interaction between the two plates. The internal relative motion is introduced via a Galilean boost, assuming one plate slides relative to the other. We then calculate the effective action of the system and derive the vacuum occupation number in momentum space using a perturbative method. The numerical plots show that, as a function of momentum, the vacuum occupation number is isotropic for a motion speed v = 0 and anisotropic for nonzero v. Due to energy transfer between the plates, the process of relative motion induces on-shell excitations, similar to the dissipative process of the Schwinger effect. Therefore, we can study the motion-induced dissipation effects and the dissipative forces via the quantum action. The numerical results demonstrate that both the imaginary part of the quantum action for the motion boost and the dissipative force have a threshold as a function of v, and both are positively correlated with v.
- Published
- 2024
39. QMambaBSR: Burst Image Super-Resolution with Query State Space Model
- Author
-
Di, Xin, Peng, Long, Xia, Peizhe, Li, Wenbo, Pei, Renjing, Cao, Yang, Wang, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In BurstSR, the key challenge lies in extracting the base frame's content complementary sub-pixel details while simultaneously suppressing high-frequency noise disturbance. Existing methods attempt to extract sub-pixels by modeling inter-frame relationships frame by frame while overlooking the mutual correlations among multi-current frames and neglecting the intra-frame interactions, leading to inaccurate and noisy sub-pixels for base frame super-resolution. Further, existing methods mainly employ static upsampling with fixed parameters to improve spatial resolution for all scenes; they fail to perceive the sub-pixel distribution difference across multiple frames and cannot balance the fusion weights of different frames, resulting in over-smoothed details and artifacts. To address these limitations, we introduce a novel Query Mamba Burst Super-Resolution (QMambaBSR) network, which incorporates a Query State Space Model (QSSM) and Adaptive Up-sampling module (AdaUp). Specifically, based on the observation that sub-pixels have consistent spatial distribution while random noise is inconsistently distributed, a novel QSSM is proposed to efficiently extract sub-pixels through inter-frame querying and intra-frame scanning while mitigating noise interference in a single step. Moreover, AdaUp is designed to dynamically adjust the upsampling kernel based on the spatial distribution of multi-frame sub-pixel information in the different burst scenes, thereby facilitating the reconstruction of the spatial arrangement of high-resolution details. Extensive experiments on four popular synthetic and real-world benchmarks demonstrate that our method achieves a new state-of-the-art performance.
- Published
- 2024
40. Quantum key distribution based on mid-infrared and telecom band two-color entanglement source
- Author
-
Li, Wu-Zhen, Zhou, Chun, Wang, Yang, Chen, Li, Chen, Ren-Hui, Han, Zhao-Qi-Zhi, Gao, Ming-Yuan, Wang, Xiao-Hua, Zheng, Di-Yuan, Xie, Meng-Yu, Li, Yin-Hai, Zhou, Zhi-Yuan, Bao, Wan-Su, and Shi, Bao-Sen
- Subjects
Quantum Physics - Abstract
Due to the high noise caused by solar background radiation, the existing satellite-based free-space quantum key distribution (QKD) experiments are mainly carried out at night, hindering the establishment of a practical all-day real-time global-scale quantum network. Given that the 3-5 μm mid-infrared (MIR) band has extremely low solar background radiation and strong scattering resistance, it is one of the ideal bands for free-space quantum communication. Here, firstly, we report on the preparation of a high-quality MIR (3370 nm) and telecom band (1555 nm) two-color polarization-entangled photon source, then we use this source to realize a principle QKD based on free-space and fiber hybrid channels in a laboratory. The theoretical analysis clearly shows that a long-distance QKD over 500 km of free-space and 96 km of fiber hybrid channels can be reached simultaneously. This work represents a significant step toward developing all-day global-scale quantum communication networks., Comment: 24 pages, 9 figures
- Published
- 2024
41. PointNCBW: Towards Dataset Ownership Verification for Point Clouds via Negative Clean-label Backdoor Watermark
- Author
-
Wei, Cheng, Wang, Yang, Gao, Kuofeng, Shao, Shuo, Li, Yiming, Wang, Zhibo, and Qin, Zhan
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Recently, point clouds have been widely used in computer vision, whereas their collection is time-consuming and expensive. As such, point cloud datasets are the valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially for commercial or open-sourced ones that cannot be sold again or used commercially without permission, we intend to identify whether a suspicious third-party model is trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, which are susceptible to the number of categories, our method could watermark samples from all classes instead of only from the target one. Accordingly, it can still preserve high effectiveness even on large-scale datasets with many classes. Specifically, we perturb selected point clouds with non-target categories in both shape-wise and point-wise manners before inserting trigger patterns without changing their labels. The features of perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset will have a distinctive yet stealthy backdoor behavior, i.e., misclassifying samples from the target class whenever triggers appear, since the trained DNNs will treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification based on the proposed watermark. Extensive experiments on benchmark datasets are conducted, verifying the effectiveness of our method and its resistance to potential removal methods., Comment: This paper was accepted by IEEE Transactions on Information Forensics and Security (TIFS), 2024. 16 pages
- Published
- 2024
42. Early Risk Assessment Model for ICA Timing Strategy in Unstable Angina Patients Using Multi-Modal Machine Learning
- Author
-
Zheng, Candi, Liu, Kun, Wang, Yang, Chen, Shiyi, and Li, Hongli
- Subjects
Computer Science - Machine Learning - Abstract
Background: Invasive coronary arteriography (ICA) is recognized as the gold standard for diagnosing cardiovascular diseases, including unstable angina (UA). The challenge lies in determining the optimal timing for ICA in UA patients, balancing the need for revascularization in high-risk patients against the potential complications in low-risk ones. Unlike myocardial infarction, UA does not have specific indicators like ST-segment deviation or cardiac enzymes, making risk assessment complex. Objectives: Our study aims to enhance the early risk assessment for UA patients by utilizing machine learning algorithms. These algorithms can potentially identify patients who would benefit most from ICA by analyzing less specific yet related indicators that are challenging for human physicians to interpret. Methods: We collected data from 640 UA patients at Shanghai General Hospital, including medical history and electrocardiograms (ECG). Machine learning algorithms were trained using multi-modal demographic characteristics including clinical risk factors, symptoms, biomarker levels, and ECG features extracted by pre-trained neural networks. The goal was to stratify patients based on their revascularization risk. Additionally, we translated our models into applicable and explainable look-up tables through discretization for practical clinical use. Results: The study achieved an Area Under the Curve (AUC) of $0.719 \pm 0.065$ in risk stratification, significantly surpassing the widely adopted GRACE score's AUC of $0.579 \pm 0.044$. Conclusions: The results suggest that machine learning can provide superior risk stratification for UA patients. This improved stratification could help in balancing the risks, costs, and complications associated with ICA, indicating a potential shift in clinical assessment practices for unstable angina.
- Published
- 2024
43. CLIP-based Point Cloud Classification via Point Cloud to Image Translation
- Author
-
Ghose, Shuvozit, Li, Manyi, Qian, Yiming, and Wang, Yang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Point cloud understanding is an inherently challenging problem because of the sparse and unordered structure of the point cloud in the 3D space. Recently, Contrastive Vision-Language Pre-training (CLIP) based point cloud classification model i.e. PointCLIP has added a new direction in the point cloud classification research domain. In this method, at first multi-view depth maps are extracted from the point cloud and passed through the CLIP visual encoder. To transfer the 3D knowledge to the network, a small network called an adapter is fine-tuned on top of the CLIP visual encoder. PointCLIP has two limitations. Firstly, the point cloud depth maps lack image information which is essential for tasks like classification and recognition. Secondly, the adapter only relies on the global representation of the multi-view features. Motivated by this observation, we propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that produces generalized colored images along with additional salient visual cues to the point cloud depth maps so that it can achieve promising performance on point cloud classification and understanding. In addition, we propose a novel viewpoint adapter that combines the view feature processed by each viewpoint as well as the global intertwined knowledge that exists across the multi-view features. The experimental results demonstrate the superior performance of the proposed model over existing state-of-the-art CLIP-based models on ModelNet10, ModelNet40, and ScanobjectNN datasets., Comment: Accepted by ICPR2024
- Published
- 2024
44. The magnon mediated plasmon friction: a functional integral approach
- Author
-
Wang, Yang, Zhang, Ruanjing, and Liu, Feiyi
- Subjects
Condensed Matter - Statistical Mechanics ,Condensed Matter - Other Condensed Matter - Abstract
In this paper, we discuss quantum friction in a system formed by two metallic surfaces separated by a ferromagnetic intermedium of a certain thickness. The internal degrees of freedom in the two metallic surfaces are assumed to be plasmons, while the excitations in the intermediate material are magnons, modeling plasmons coupled to magnons. During relative sliding, one surface moves uniformly parallel to the other, causing friction in the system. By calculating the effective action of the magnons, we can determine the particle production probability, which shows a positive correlation between the probability and the sliding speed. Finally, we derive the frictional force of the system, with both theoretical and numerical results indicating that the friction, like the particle production probability, also has a positive correlation with the speed.
- Published
- 2024
45. BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments
- Author
-
Tseng, Yu-Yun, Sharma, Tanusree, Zhang, Lotus, Stangl, Abigale, Findlater, Leah, Wang, Yang, and Gurari, Danna
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Individuals who are blind or have low vision (BLV) are at a heightened risk of sharing private information if they share photographs they have taken. To facilitate developing technologies that can help preserve privacy, we introduce BIV-Priv-Seg, the first localization dataset originating from people with visual impairments that shows private content. It contains 1,028 images with segmentation annotations for 16 private object categories. We first characterize BIV-Priv-Seg and then evaluate modern models' performance for locating private content in the dataset. We find modern models struggle most with locating private objects that are not salient, small, and lack text as well as recognizing when private content is absent from an image. We facilitate future extensions by sharing our new dataset with the evaluation server at https://vizwiz.org/tasks-and-datasets/object-localization.
- Published
- 2024
46. TransFeat-TPP: An Interpretable Deep Covariate Temporal Point Processes
- Author
-
Meng, Zizhuo, Li, Boyu, Fan, Xuhui, Li, Zhidong, Wang, Yang, Chen, Fang, and Zhou, Feng
- Subjects
Computer Science - Machine Learning - Abstract
The classical temporal point process (TPP) constructs an intensity function by taking the occurrence times into account. Nevertheless, occurrence time may not be the only relevant factor; other contextual data, termed covariates, may also impact the event evolution. Incorporating such covariates into the model is beneficial, while distinguishing their relevance to the event dynamics is of great practical significance. In this work, we propose a Transformer-based covariate temporal point process (TransFeat-TPP) model to improve the interpretability of deep covariate-TPPs while maintaining powerful expressiveness. TransFeat-TPP can effectively model complex relationships between events and covariates, and provide enhanced interpretability by discerning the importance of various covariates. Experimental results on synthetic and real datasets demonstrate improved prediction accuracy and consistently interpretable feature importance when compared to existing deep covariate-TPPs.
- Published
- 2024
47. Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams
- Author
-
Wang, Ziqiang, Chi, Zhixiang, Wu, Yanan, Gu, Li, Liu, Zhi, Plataniotis, Konstantinos, and Wang, Yang
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Given a model trained on source data, Test-Time Adaptation (TTA) enables adaptation and inference in test data streams with domain shifts from the source. Current methods predominantly optimize the model for each incoming test data batch using self-training loss. While these methods yield commendable results in ideal test data streams, where batches are independently and identically sampled from the target distribution, they falter under more practical test data streams that are not independent and identically distributed (non-i.i.d.). The data batches in a non-i.i.d. stream display prominent label shifts relative to each other. It leads to conflicting optimization objectives among batches during the TTA process. Given the inherent risks of adapting the source model to unpredictable test-time distributions, we reverse the adaptation process and propose a novel Distribution Alignment loss for TTA. This loss guides the distributions of test-time features back towards the source distributions, which ensures compatibility with the well-trained source model and eliminates the pitfalls associated with conflicting optimization objectives. Moreover, we devise a domain shift detection mechanism to extend the success of our proposed TTA method in the continual domain shift scenarios. Our extensive experiments validate the logic and efficacy of our method. On six benchmark datasets, we surpass existing methods in non-i.i.d. scenarios and maintain competitive performance under the ideal i.i.d. assumption., Comment: Accepted to ECCV 2024
- Published
- 2024
48. SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling
- Author
-
Wang, Huizheng, Fang, Jiahao, Tang, Xinru, Yue, Zhiheng, Li, Jinxi, Qin, Yubin, Guan, Sihan, Yang, Qize, Wang, Yang, Li, Chao, Hu, Yang, and Yin, Shouyi
- Subjects
Computer Science - Hardware Architecture - Abstract
Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as the large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively handle LTPP, as they solely focus on separate stage optimization, and with most efforts confined to computational enhancements. By re-examining the end-to-end flow of dynamic sparse acceleration, we pinpoint an ever-overlooked opportunity that the LTPP can exploit the intrinsic coordination among stages to avoid excessive memory access and redundant computation. Motivated by our observation, we present SOFA, a cross-stage compute-memory efficient algorithm-hardware co-design, which is tailored to tackle the challenges posed by LTPP of Transformer inference effectively. We first propose a novel leading zero computing paradigm, which predicts attention sparsity by using log-based add-only operations to avoid the significant overhead of prediction. Then, a distributed sorting and a sorted updating FlashAttention mechanism are proposed with a cross-stage coordinated tiling principle, which enables fine-grained and lightweight coordination among stages, helping optimize memory access and latency. Further, we propose a SOFA accelerator to support these optimizations efficiently. Extensive experiments on 20 benchmarks show that SOFA achieves $9.5\times$ speed up and $71.5\times$ higher energy efficiency than Nvidia A100 GPU. Compared to 8 SOTA accelerators, SOFA achieves an average $15.8\times$ energy efficiency, $10.3\times$ area efficiency and $9.3\times$ speed up, respectively.
- Published
- 2024
49. Synthetic Data: Revisiting the Privacy-Utility Trade-off
- Author
-
Sarmin, Fatima Jahan, Sarkar, Atiquer Rahman, Wang, Yang, and Mohammed, Noman
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Machine Learning - Abstract
Synthetic data has been considered a better privacy-preserving alternative to traditionally sanitized data across various applications. However, a recent article challenges this notion, stating that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques, and that it leads to unpredictable utility loss and highly unpredictable privacy gain. The article also claims to have identified a breach in the differential privacy guarantees provided by PATEGAN and PrivBayes. When a study claims to refute or invalidate prior findings, it is crucial to verify and validate the study. In our work, we analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment, which limits the applicability of its findings to general cases. Our exploration also revealed that the game did not satisfy a crucial precondition concerning data distributions, which contributed to the perceived violation of the differential privacy guarantees offered by PATEGAN and PrivBayes. We also conducted a privacy-utility trade-off analysis in a more general and unconstrained environment. Our experimentation demonstrated that synthetic data achieves a more favorable privacy-utility trade-off compared to the provided implementation of k-anonymization, thereby reaffirming earlier conclusions.
- Published
- 2024
50. MSTF: Multiscale Transformer for Incomplete Trajectory Prediction
- Author
-
Liu, Zhanwen, Li, Chao, Yang, Nan, Wang, Yang, Ma, Jiaqi, Cheng, Guangliang, and Zhao, Xiangmo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assume complete observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such oversights inevitably compromise the accuracy of trajectory predictions. To tackle this challenge, we propose an end-to-end framework, termed Multiscale Transformer (MSTF), meticulously crafted for incomplete trajectory prediction. MSTF integrates a Multiscale Attention Head (MAH) and an Information Increment-based Pattern Adaptive (IIPA) module. Specifically, the MAH component concurrently captures multiscale motion representation of trajectory sequence from various temporal granularities, utilizing a multi-head attention mechanism. This approach facilitates the modeling of global dependencies in motion across different scales, thereby mitigating the adverse effects of missing values. Additionally, the IIPA module adaptively extracts continuity representation of motion across time steps by analyzing missing patterns in the data. The continuity representation delineates motion trend at a higher level, guiding MSTF to generate predictions consistent with motion continuity. We evaluate our proposed MSTF model using two large-scale real-world datasets. Experimental results demonstrate that MSTF surpasses state-of-the-art (SOTA) models in the task of incomplete trajectory prediction, showcasing its efficacy in addressing the challenges posed by missing values in motion forecasting for autonomous driving systems.
- Published
- 2024