25,030 results for "Yu, Tao"
Search Results
2. PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion
- Author
Li, Peng, Zheng, Wangguandong, Liu, Yuan, Yu, Tao, Li, Yangguang, Qi, Xingqun, Li, Mengfei, Chi, Xiaowei, Xia, Siyu, Xue, Wei, Luo, Wenhan, Liu, Qifeng, and Guo, Yike
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Detailed and photorealistic 3D human modeling is essential for various applications and has seen tremendous progress. However, full-body reconstruction from a monocular RGB image remains challenging due to the ill-posed nature of the problem and sophisticated clothing topology with self-occlusions. In this paper, we propose PSHuman, a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions, especially on generated faces. To address it, we propose a cross-scale diffusion that models the joint probability distribution of global full-body shape and local facial characteristics, enabling detailed and identity-preserved novel-view generation without any geometric distortion. Moreover, to enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X, which provide body priors and prevent unnatural views inconsistent with human anatomy. Leveraging the generated multi-view normal and color images, we present SMPLX-initialized explicit human carving to recover realistic textured human meshes efficiently. Extensive experimental results and quantitative evaluations on CAPE and THuman2.1 datasets demonstrate PSHuman's superiority in geometry details, texture fidelity, and generalization capability.
- Published
- 2024
3. Digital Twin-Empowered Routing Management for Reliable Multi-Hop Millimeter Wave V2X
- Author
Roongpraiwan, Supat, Li, Zongdian, Yu, Tao, and Sakaguchi, Kei
- Subjects
Computer Science - Networking and Internet Architecture
- Abstract
Digital twin (DT) technology can replicate physical entities in cyberspace. A mobility DT digitalizes connected and autonomous vehicles (CAVs) and their surrounding traffic environment, allowing to monitor the maneuvering and distribution of CAVs in real-time, which is crucial for managing vehicle-to-everything (V2X) connectivity, especially when millimeter wave (mmWave) is adopted. MmWave V2X relies on dynamic multi-hop communications to ensure high reliability. Therefore, in this paper, the challenges of mmWave V2X are presented to motivate the utilization of DT, and then we introduce the system model for DT-based multi-hop routing management, incorporating two different routing algorithms: with and without future trajectory prediction. For proof of concept, we implement the proposed DT system using Unity-based AWSIM and evaluate the proposed algorithms via simulations. The results show that, compared to the conventional routing algorithm in vehicular ad hoc networks (VANETs), the DT-based algorithms significantly improve the reliability of mmWave V2X, and such improvements can be seen in both fully connected and mixed traffic scenarios.
- Published
- 2024
4. Neural Octahedral Field: Octahedral prior for simultaneous smoothing and sharp edge regularization
- Author
Zheng, Ruichen and Yu, Tao
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
- Abstract
Neural implicit representation, the parameterization of distance function as a coordinate neural field, has emerged as a promising lead in tackling surface reconstruction from unoriented point clouds. To enforce consistent orientation, existing methods focus on regularizing the gradient of the distance function, such as constraining it to be of the unit norm, minimizing its divergence, or aligning it with the eigenvector of Hessian that corresponds to zero eigenvalue. However, under the presence of large scanning noise, they tend to either overfit the noise input or produce an excessively smooth reconstruction. In this work, we propose to guide the surface reconstruction under a new variant of neural field, the octahedral field, leveraging the spherical harmonics representation of octahedral frames originated in the hexahedral meshing. Such field automatically snaps to geometry features when constrained to be smooth, and naturally preserves sharp angles when interpolated over creases. By simultaneously fitting and smoothing the octahedral field alongside the implicit geometry, it behaves analogously to bilateral filtering, resulting in smooth reconstruction while preserving sharp edges. Despite being operated purely pointwise, our method outperforms various traditional and neural approaches across extensive experiments, and is very competitive with methods that require normal and data priors. Our full implementation is available at: https://github.com/Ankbzpx/frame-field.
- Comment
project page: https://github.com/Ankbzpx/frame-field
- Published
- 2024
5. Shape-restricted transfer learning analysis for generalized linear regression model
- Author
Li, Pengfei, Yu, Tao, Chen, Chixiang, and Qin, Jing
- Subjects
Statistics - Methodology, Mathematics - Statistics Theory
- Abstract
Transfer learning has emerged as a highly sought-after and actively pursued research area within the statistical community. The core concept of transfer learning involves leveraging insights and information from auxiliary datasets to enhance the analysis of the primary dataset of interest. In this paper, our focus is on datasets originating from distinct yet interconnected distributions. We assume that the training data conforms to a standard generalized linear model, while the testing data exhibit a connection to the training data based on a prior probability shift assumption. Ultimately, we discover that the two-sample conditional means are interrelated through an unknown, nondecreasing function. We integrate the power of generalized estimating equations with the shape-restricted score function, creating a robust framework for improved inference regarding the underlying parameters. We theoretically establish the asymptotic properties of our estimator and demonstrate, through simulation studies, that our method yields more accurate parameter estimates compared to those based solely on the testing or training data. Finally, we apply our method to a real-world example.
- Comment
35 pages, 2 tables, and 1 figure
- Published
- 2024
6. Persistent nodal magnon-photon polariton in ferromagnetic heterostructures
- Author
Qiu, Zhuolun, Zhou, Xi-Han, Wang, Hanchen, Yang, Guang, and Yu, Tao
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
Exceptional points with coalescence of eigenvalues and eigenvectors are spectral singularities in the parameter space, achieving which often needs fine-tuning of parameters in quantum systems. We predict a persistent realization of nodal magnon-photon polariton, i.e., a polariton without any gap splitting in a thin ferromagnetic insulator film sandwiched by two normal metals, which persistently exists when the ferromagnet is sufficiently thick $\sim 100$ nm. We perform the model calculation beyond the perturbation theory using a classical approach, develop a quantum scheme able to account for the Ohmic dissipation, and find ultrastrong coupling with coupling strength comparable to the bare magnon frequency. Via revealing a simple conversion relation we extend this formalism to superconductors and predict the gap opened by the ultrastrong coupling strongly depends on the direction of polariton propagation. Our findings may help search for robust non-Hermitian topological phases in magnonic and spintronic devices.
- Comment
16 pages, 6 figures
- Published
- 2024
7. Non-Overlapping Placement of Macro Cells based on Reinforcement Learning in Chip Design
- Author
Yu, Tao, Gao, Peng, Wang, Fei, and Yuan, Ru-Yue
- Subjects
Computer Science - Hardware Architecture, Computer Science - Artificial Intelligence
- Abstract
Due to the increasing complexity of chip design, existing placement methods still have many shortcomings in dealing with macro cells coverage and optimization efficiency. Aiming at the problems of layout overlap, inferior performance, and low optimization efficiency in existing chip design methods, this paper proposes an end-to-end placement method, SRLPlacer, based on reinforcement learning. First, the placement problem is transformed into a Markov decision process by establishing the coupling relationship graph model between macro cells to learn the strategy for optimizing layouts. Secondly, the whole placement process is optimized after integrating the standard cell layout. By assessing on the public benchmark ISPD2005, the proposed SRLPlacer can effectively solve the overlap problem between macro cells while considering routing congestion and shortening the total wire length to ensure routability.
- Published
- 2024
8. CSWin-UNet: Transformer UNet with Cross-Shaped Windows for Medical Image Segmentation
- Author
Liu, Xiao, Gao, Peng, Yu, Tao, Wang, Fei, and Yuan, Ru-Yue
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep learning, especially convolutional neural networks (CNNs) and Transformer architectures, have become the focus of extensive research in medical image segmentation, achieving impressive results. However, CNNs come with inductive biases that limit their effectiveness in more complex, varied segmentation scenarios. Conversely, while Transformer-based methods excel at capturing global and long-range semantic details, they suffer from high computational demands. In this study, we propose CSWin-UNet, a novel U-shaped segmentation method that incorporates the CSWin self-attention mechanism into the UNet to facilitate horizontal and vertical stripes self-attention. This method significantly enhances both computational efficiency and receptive field interactions. Additionally, our innovative decoder utilizes a content-aware reassembly operator that strategically reassembles features, guided by predicted kernels, for precise image resolution restoration. Our extensive empirical evaluations on diverse datasets, including synapse multi-organ CT, cardiac MRI, and skin lesions, demonstrate that CSWin-UNet maintains low model complexity while delivering high segmentation accuracy. Codes are available at https://github.com/eatbeanss/CSWin-UNet.
- Published
- 2024
9. Rethinking Domain Adaptation and Generalization in the Era of CLIP
- Author
Feng, Ruoyu, Yu, Tao, Jin, Xin, Yu, Xiaoyuan, Xiao, Lei, and Chen, Zhibo
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In recent studies on domain adaptation, significant emphasis has been placed on the advancement of learning shared knowledge from a source domain to a target domain. Recently, the large vision-language pre-trained model, i.e., CLIP has shown strong ability on zero-shot recognition, and parameter efficient tuning can further improve its performance on specific tasks. This work demonstrates that a simple domain prior boosts CLIP's zero-shot recognition in a specific domain. Besides, CLIP's adaptation relies less on source domain data due to its diverse pre-training dataset. Furthermore, we create a benchmark for zero-shot adaptation and pseudo-labeling based self-training with CLIP. Last but not least, we propose to improve the task generalization ability of CLIP from multiple unlabeled domains, which is a more practical and unique scenario. We believe our findings motivate a rethinking of domain adaptation benchmarks and the associated role of related algorithms in the era of CLIP.
- Published
- 2024
10. BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
- Author
Su, Hongjin, Yen, Howard, Xia, Mengzhou, Shi, Weijia, Muennighoff, Niklas, Wang, Han-yu, Liu, Haisu, Shi, Quan, Siegel, Zachary S., Tang, Michael, Sun, Ruoxi, Yoon, Jinsung, Arik, Sercan O., Chen, Danqi, and Yu, Tao
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
- Abstract
Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. BRIGHT is constructed from the 1,398 real-world queries collected from diverse domains (such as economics, psychology, robotics, software engineering, earth sciences, etc.), sourced from naturally occurring or carefully curated human data. Extensive evaluation reveals that even state-of-the-art retrieval models perform poorly on BRIGHT. The leading model on the MTEB leaderboard [38], which achieves a score of 59.0 nDCG@10, produces a score of nDCG@10 of 18.0 on BRIGHT. We further demonstrate that augmenting queries with Chain-of-Thought reasoning generated by large language models (LLMs) improves performance by up to 12.2 points. Moreover, BRIGHT is robust against data leakage during pretraining of the benchmarked models as we validate by showing similar performance even when documents from the benchmark are included in the training data. We believe that BRIGHT paves the way for future research on retrieval systems in more realistic and challenging settings. Our code and data are available at https://brightbenchmark.github.io.
- Comment
50 pages
- Published
- 2024
11. Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
- Author
Cao, Ruisheng, Lei, Fangyu, Wu, Haoyuan, Chen, Jixuan, Fu, Yeqiao, Gao, Hongcheng, Xiong, Xinzhuang, Zhang, Hanchong, Mao, Yuchen, Hu, Wenjing, Xie, Tianbao, Xu, Hongshen, Zhang, Danyang, Wang, Sida, Sun, Ruoxi, Yin, Pengcheng, Xiong, Caiming, Ni, Ansong, Liu, Qian, Zhong, Victor, Chen, Lu, Yu, Kai, and Yu, Tao
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivity of experts while democratizing access to large-scale data analysis. In this paper, we introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering workflows, featuring 494 real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications. These tasks, derived from real-world use cases, evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems. To balance realistic simulation with evaluation simplicity, we devote significant effort to developing automatic configurations for task setup and carefully crafting evaluation metrics for each task. Furthermore, we supplement multimodal agents with comprehensive documents of these enterprise data software systems. Our empirical evaluation reveals that existing state-of-the-art LLM/VLM-based agents do not reliably automate full data workflows (14.0% success). Even with step-by-step guidance, these agents still underperform in tasks that require fine-grained, knowledge-intensive GUI actions (16.2%) and involve remote cloud-hosted workspaces (10.6%). We hope that Spider2-V paves the way for autonomous multimodal agents to transform the automation of data science and engineering workflow. Our code and data are available at https://spider2-v.github.io.
- Comment
34 pages, 14 figures, 10 tables
- Published
- 2024
12. KnobCF: Uncertainty-aware Knob Tuning
- Author
Yan, Yu, Huang, Junfang, Wang, Hongzhi, Geng, Jian, Zhang, Kaixin, and Yu, Tao
- Subjects
Computer Science - Databases
- Abstract
The knob tuning aims to optimize database performance by searching for the most effective knob configuration under a certain workload. Existing works suffer two significant problems. On the one hand, there exist multiple similar even useless evaluations of knob tuning even with the diverse searching methods because of the different sensitivities of knobs on a certain workload. On the other hand, the single evaluation of knob configurations may bring overestimation or underestimation because of the query uncertainty performance. To solve the above problems, we propose a decoupled query uncertainty-aware knob classifier, called KnobCF, to enhance the knob tuning. Our method has three significant contributions: (1) We propose a novel concept of the uncertainty-aware knob configuration estimation to enhance the knob tuning process. (2) We provide an effective few-shot uncertainty knob estimator without extra time consumption in training data collection, which has a high time efficiency in practical tuning tasks. (3) Our method provides a general framework that could be easily deployed in any knob tuning task because we make no changes to the knob tuners and the database management system. Our experiments on four open-source benchmarks demonstrate that our method effectively reduces useless evaluations and improves the tuning results. Especially in TPCC, our method achieves competitive tuning results with only 60% to 70% time consumption compared to the full workload evaluations.
- Published
- 2024
13. The 3d-index of the 3d-skein module via the quantum trace map
- Author
Garoufalidis, Stavros and Yu, Tao
- Subjects
Mathematics - Geometric Topology, High Energy Physics - Theory
- Abstract
We define a map from the skein module of a cusped hyperbolic 3-manifold to the ring of Laurent series in one variable with integer coefficients that satisfies two properties: its evaluation at peripheral curves coincides with the Dimofte–Gaiotto–Gukov 3d-index, and it factors through the 3d-quantum trace map associated to a suitable ideal triangulation of the manifold. The map fulfills a supersymmetry prediction of mathematical physics and is part of a conjectural 3+1 dimensional topological quantum field theory.
- Comment
37 pages, 16 figures
- Published
- 2024
14. Spin-orbit locking of magnons with localized microwave fields
- Author
Cai, Chengyuan, Zhang, Zubiao, Zou, Ji, Bauer, Gerrit E. W., and Yu, Tao
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
We address the photonic spin-orbit coupling known from nano-optics and plasmonics in the microwave regime. The spin $\mathbf{S}$ and momentum $\mathbf{q}$ of microwaves emitted by an excited magnetic particle are locked by $\mathbf{q}\cdot\mathbf{S}=0$ with a fixed chirality $\hat{\mathbf{n}}\cdot(\hat{\mathbf{S}}\times\hat{\mathbf{q}})=1$ when evanescent along $\hat{\mathbf{n}}\perp \mathbf{q}$. This field excites magnons in a nearby magnetic film in the form of directional beams that rotate with the magnetization direction. The exchange of these magnons between two distant nanomagnets leads to a highly tunable strong coupling and entangles their excited states.
- Comment
11 pages, 8 figures
- Published
- 2024
15. HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
- Author
Li, Mengcheng, Zhang, Hongwen, Zhang, Yuxiang, Shao, Ruizhi, Yu, Tao, and Liu, Yebin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent years have witnessed a trend of the deep integration of the generation and reconstruction paradigms. In this paper, we extend the ability of controllable generative models for a more comprehensive hand mesh recovery task: direct hand mesh generation, inpainting, reconstruction, and fitting in a single framework, which we name as Holistic Hand Mesh Recovery (HHMR). Our key observation is that different kinds of hand mesh recovery tasks can be achieved by a single generative model with strong multimodal controllability, and in such a framework, realizing different tasks only requires giving different signals as conditions. To achieve this goal, we propose an all-in-one diffusion framework based on graph convolution and attention mechanisms for holistic hand mesh recovery. In order to achieve strong control generation capability while ensuring the decoupling of multimodal control signals, we map different modalities to a shared feature space and apply cross-scale random masking in both modality and feature levels. In this way, the correlation between different modalities can be fully exploited during the learning of hand priors. Furthermore, we propose Condition-aligned Gradient Guidance to enhance the alignment of the generated model with the control signals, which significantly improves the accuracy of the hand mesh reconstruction and fitting. Experiments show that our novel framework can realize multiple hand mesh recovery tasks simultaneously and outperform the existing methods in different tasks, which provides more possibilities for subsequent downstream applications including gesture recognition, pose generation, mesh editing, and so on.
- Comment
accepted in CVPR2024, project page: https://dw1010.github.io/project/HHMR/HHMR.html
- Published
- 2024
16. IREE Oriented Green 6G Networks: A Radial Basis Function Based Approach
- Author
Yu, Tao, Huang, Pengbo, Zhang, Shunqing, Chen, Xiaojing, Sun, Yanzan, and Wang, Xin
- Subjects
Computer Science - Networking and Internet Architecture
- Abstract
In order to provide design guidelines for energy efficient 6G networks, we propose a novel radial basis function (RBF) based optimization framework to maximize the integrated relative energy efficiency (IREE) metric. Different from the conventional energy efficient optimization schemes, we maximize the transformed utility for any given IREE using spectrum efficiency oriented RBF network and gradually update the IREE metric using proposed Dinkelbach's algorithm. The existence and uniqueness properties of RBF networks are provided, and the convergence conditions of the entire framework are discussed as well. Through some numerical experiments, we show that the proposed IREE outperforms many existing SE or EE oriented designs and find a new Jensen-Shannon (JS) divergence constrained region, which behaves differently from the conventional EE-SE region. Meanwhile, by studying IREE-SE trade-offs under different traffic requirements, we suggest that network operators shall spend more efforts to balance the distributions of traffic demands and network capacities in order to improve the IREE performance, especially when the spatial variations of the traffic distribution are significant.
- Published
- 2024
17. Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture
- Author
Jin, Yitong, Qiu, Zhiping, Shi, Yi, Sun, Shuangpeng, Wang, Chongwu, Pan, Donghao, Zhao, Jiachen, Liang, Zhenghao, Wang, Yuan, Li, Xiaobing, Yu, Feng, Yu, Tao, and Dai, Qionghai
- Subjects
Computer Science - Multimedia
- Abstract
In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different views, audio signals, and detailed 3D motion annotations of the body, hands, instrument, and bow. Moreover, to acquire the detailed motion annotations, we propose an audio-guided multi-modal motion capture framework that explicitly incorporates hand-string contacts detected from the audio signals for solving detailed hand poses. This framework serves as a baseline for string performance capture in a completely markerless manner without imposing any external devices on performers, eliminating the potential of introducing distortion in such delicate movements. We argue that the movements of performers, particularly the sound-producing gestures, contain subtle information often elusive to visual methods but can be inferred and retrieved from audio cues. Consequently, we refine the vision-based motion capture results through our innovative audio-guided approach, simultaneously clarifying the contact relationship between the performer and the instrument, as deduced from the audio. We validate the proposed framework and conduct ablation studies to demonstrate its efficacy. Our results outperform current state-of-the-art vision-based algorithms, underscoring the feasibility of augmenting visual motion capture with audio modality. To the best of our knowledge, SPD is the first dataset for musical instrument performance, covering fine-grained hand motion details in a multi-modal, large-scale collection.
- Comment
SIGGRAPH2024
- Published
- 2024
18. Collage: Light-Weight Low-Precision Strategy for LLM Training
- Author
Yu, Tao, Gupta, Gaurav, Gopalswamy, Karthick, Mamidala, Amith, Zhou, Hao, Huynh, Jeffrey, Park, Youngsuk, Diamant, Ron, Deoras, Anoop, and Huan, Luke
- Subjects
Computer Science - Machine Learning
- Abstract
Large models training is plagued by the intense compute cost and limited hardware memory. A practical solution is low-precision representation but is troubled by loss in numerical accuracy and unstable training rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process. We propose Collage which utilizes multi-component float representation in low-precision to accurately perform operations with numerical errors accounted. To understand the impact of imprecision to training, we propose a simple and novel metric which tracks the lost information during training as well as differentiates various precision strategies. Our method works with commonly used low-precision such as half-precision ($16$-bit floating points) and can be naturally extended to work with even lower precision such as $8$-bit. Experimental results show that pre-training using Collage removes the requirement of using $32$-bit floating-point copies of the model and attains similar/better training performance compared to $(16, 32)$-bit mixed-precision strategy, with up to $3.7\times$ speedup and $\sim 15\%$ to $23\%$ less memory usage in practice.
- Comment
ICML 2024
- Published
- 2024
19. Roadside Units Assisted Localized Automated Vehicle Maneuvering: An Offline Reinforcement Learning Approach
- Author
Wang, Kui, She, Changyang, Li, Zongdian, Yu, Tao, Li, Yonghui, and Sakaguchi, Kei
- Subjects
Electrical Engineering and Systems Science - Systems and Control
- Abstract
Traffic intersections present significant challenges for the safe and efficient maneuvering of connected and automated vehicles (CAVs). This research proposes an innovative roadside unit (RSU)-assisted cooperative maneuvering system aimed at enhancing road safety and traveling efficiency at intersections for CAVs. We utilize RSUs for real-time traffic data acquisition and train an offline reinforcement learning (RL) algorithm based on human driving data. Evaluation results obtained from hardware-in-loop autonomous driving simulations show that our approach employing the twin delayed deep deterministic policy gradient and behavior cloning (TD3+BC), achieves performance comparable to state-of-the-art autonomous driving systems in terms of safety measures while significantly enhancing travel efficiency by up to 17.38% in intersection areas. This paper makes a pivotal contribution to the field of intelligent transportation systems, presenting a breakthrough solution for improving urban traffic flow and safety at intersections.
- Comment
6 pages, 6 figures
- Published
- 2024
20. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- Author
Xie, Tianbao, Zhang, Danyang, Chen, Jixuan, Li, Xiaochuan, Zhao, Siheng, Cao, Ruisheng, Hua, Toh Jing, Cheng, Zhoujun, Shin, Dongchan, Lei, Fangyu, Liu, Yitao, Xu, Yiheng, Zhou, Shuyan, Savarese, Silvio, Xiong, Caiming, Zhong, Victor, and Yu, Tao
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability. To address this issue, we introduce OSWorld, the first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems such as Ubuntu, Windows, and macOS. OSWorld can serve as a unified, integrated computer environment for assessing open-ended computer tasks that involve arbitrary applications. Building upon OSWorld, we create a benchmark of 369 computer tasks involving real web and desktop apps in open domains, OS file I/O, and workflows spanning multiple applications. Each task example is derived from real-world computer use cases and includes a detailed initial state setup configuration and a custom execution-based evaluation script for reliable, reproducible evaluation. Extensive evaluation of state-of-the-art LLM/VLM-based agents on OSWorld reveals significant deficiencies in their ability to serve as computer assistants. While humans can accomplish over 72.36% of the tasks, the best model achieves only 12.24% success, primarily struggling with GUI grounding and operational knowledge. Comprehensive analysis using OSWorld provides valuable insights for developing multimodal generalist agents that were not possible with previous benchmarks. Our code, environment, baseline models, and data are publicly available at https://os-world.github.io.
- Comment
51 pages, 21 figures
- Published
- 2024
21. Spin Radiation of Electrons, Excitons, and Phonons
- Author
Cai, Chengyuan and Yu, Tao
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
In the celebrated Stern-Gerlach experiment an inhomogeneous static magnetic field separates a beam of charge-neutral atoms with opposite spins, thereby driving a "spin current" normal to the propagation direction. Here we generalize it to the dynamic scenario by demonstrating a spin transfer between an AC inhomogeneous magnetic field and intraband electrons or charge-neutral excitons and phonons. We predict that parametric pumping can efficiently radiate their DC spin currents from local AC magnetic sources with van der Waals semiconductors as prototypes. This mechanism brings a unified and efficient paradigm in the spin transport of distinct mobile carriers.
- Comment
7 pages, 4 figures
- Published
- 2024
22. Enhancement of Magnon Transport by Superconductor Meissner Screening
- Author
-
Zhou, Xi-Han, Ye, Xiyin, Bai, Lihui, and Yu, Tao
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Recent experiments observe the spin-wave-Meissner-current modes in ferromagnetic insulator-superconductor heterostructures, in which the coherently excited spin waves seemingly do not decay as usual beneath the superconductor strip [Borst et al., Science 382, 430 (2023)]. We interpret this phenomenon by demonstrating that the stray magnetic field emitted by the magnetization dynamics is reflected, focused, and enhanced inside the ferromagnet by the supercurrent induced in the superconductor, such that the group velocity of spin waves is strongly enhanced. Analytical and numerical calculations based on this model predict that the coherent transport of magnons is enhanced by close to 500% for yttrium iron garnet capped by superconducting NbN with a decay length exceeding millimeters. Our finding may augment the performance of magnons in quantum information and quantum transport processing., Comment: 7 pages, 4 figures
- Published
- 2024
23. MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
- Author
-
Zhang, He, Ren, Shenghao, Yuan, Haolei, Zhao, Jianhui, Li, Fan, Sun, Shuangpeng, Liang, Zhenghao, Yu, Tao, Shen, Qiu, and Cao, Xun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Foot contact is an important cue for human motion capture, understanding, and generation. Existing datasets tend to annotate dense foot contact using visual matching with thresholding or incorporating pressure signals. However, these approaches either suffer from low accuracy or are only designed for small-range and slow motion. There is still a lack of a vision-pressure multimodal dataset with large-range and fast human motion, as well as accurate and dense foot-contact annotation. To fill this gap, we propose a Multimodal MoCap Dataset with Vision and Pressure sensors, named MMVP. MMVP provides accurate and dense plantar pressure signals synchronized with RGBD observations, which is especially useful for plausible shape estimation, robust pose fitting without foot drifting, and accurate global translation tracking. To validate the dataset, we propose an RGBD-P SMPL fitting method and also a monocular-video-based baseline framework, VP-MoCap, for human motion capture. Experiments demonstrate that our RGBD-P SMPL fitting results significantly outperform pure visual motion capture. Moreover, VP-MoCap outperforms SOTA methods in foot-contact and global translation estimation accuracy. We believe the configuration of the dataset and the baseline frameworks will stimulate the research in this direction and also provide a good reference for MoCap applications in various domains. Project page: https://metaverse-ai-lab-thu.github.io/MMVP-Dataset/., Comment: CVPR2024
- Published
- 2024
24. Interaction-Induced Dimensional Crossover through Full 3D to 1D
- Author
-
Yu, Tao, Ye, Xiaoran, and Liang, Zhaoxin
- Subjects
Condensed Matter - Quantum Gases - Abstract
The exploration of dimensional crossover carries profound fundamental significance, serving as a crucial bridge in comprehending the remarkable disparities observed in transitional phenomena across the two distinct dimensions of a physical system. The prevalent strategy for manipulating the dimensionality involves meticulously controlling the external trapping geometry, thereby restricting the degrees of freedom of the kinetic energy from three-dimensional (3D) to lower-dimensional spaces, while maintaining the 3D nature of the interaction energy degrees of freedom. The aim of this work is to introduce an innovative scenario to achieve dimensional crossover, characterized by the lower-dimensional nature of both the kinetic and the interaction energy degrees of freedom. To accomplish this objective, we delve deeply into the realm of a 2D optically trapped Bose gas, focusing specifically on its finite-range interaction. Our emphasis lies in exploring the lattice-induced dimensional crossover from full 3D to 1D in both kinetic and interaction terms. Utilizing the functional path integral method, we derive the equation of state of the model system, encompassing crucial quantities such as the ground state energy and quantum depletion. These equations enable us to analyze the combined effects of finite-range interaction and an optical lattice on quantum fluctuations of the BEC system. Notably, our analytical findings reconcile the Lee-Huang-Yang (LHY) correction to the ground state energy in 3D and the Lieb-Liniger (LL) one in the 1D limit, thereby providing fresh insights into the intriguing disparities between LHY and LL corrections., Comment: 9 pages, 1 figure. In this revised version, we have incorporated five additional citations and provided detailed explanations of the experimental parameters
- Published
- 2024
25. A Quantum trace map for 3-manifolds
- Author
-
Garoufalidis, Stavros and Yu, Tao
- Subjects
Mathematics - Geometric Topology ,High Energy Physics - Theory - Abstract
We define a quantum trace map from the skein module of a 3-manifold with torus boundary components to a module (left and right quotient of a quantum torus) constructed from an ideal triangulation. Our map is a 3-dimensional version of the well-known quantum trace map on surfaces introduced by Bonahon and Wong and further developed by Lê., Comment: 42 pages, 63 figures
- Published
- 2024
26. Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience
- Author
-
Yu, Xiaohang, Yang, Zhengxian, Pan, Shi, Han, Yuqi, Wang, Haoxiang, Zhang, Jun, Yan, Shi, Lin, Borong, Yang, Lei, Yu, Tao, and Fang, Lu
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. Our aim is to contribute to the development of popular 3D scene reconstruction algorithms such as IBRNet, NeRF, and 3D Gaussian splatting. More importantly, the collected dataset, which is much denser than existing datasets, may also inspire space-oriented light field reconstruction, which is potentially different from object-centric 3D reconstruction, for immersive VR/AR experiences. We utilized a total of 40 GoPro 10 cameras, capturing images of 5k resolution. The number of photos captured for each scene is no less than 1000, and the average density (view number within a unit sphere) is 134.68. It is also worth noting that our system is capable of efficiently capturing large outdoor scenes. Addressing the current lack of large-space and dense light field datasets, we made efforts to include elements such as sky, reflections, lights and shadows that are of interest to researchers in the field of 3D reconstruction during the data capture process. Finally, we validated the effectiveness of our provided dataset on three popular algorithms and also integrated the reconstructed 3DGS results into the Unity engine, demonstrating the potential of utilizing our datasets to enhance the realism of virtual reality (VR) and create feasible interactive spaces. The dataset is available at our project website.
- Published
- 2024
27. VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark
- Author
-
Huang, Han, Zhong, Haitian, Yu, Tao, Liu, Qiang, Wu, Shu, Wang, Liang, and Tan, Tieniu
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Recently, knowledge editing on large language models (LLMs) has received considerable attention. Compared to this, editing Large Vision-Language Models (LVLMs) faces extra challenges from diverse data modalities and complicated model components, and data for LVLM editing are limited. The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls short in the quality of synthesized evaluation images and cannot assess whether models apply edited knowledge in relevant content. Therefore, we employ more reliable data collection methods to construct a new Large Vision-Language Model Knowledge Editing Benchmark, VLKEB, and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, our image data are bound with knowledge entities. This can be further used to extract entity-related knowledge, which constitutes the base of editing data. We conduct experiments of different editing methods on five LVLMs, and thoroughly analyze how they impact the models. The results reveal strengths and deficiencies of these methods and hopefully provide insights for future research. The codes and dataset are available at: https://github.com/VLKEB/VLKEB., Comment: 9+11 pages (main+appendix), 7 figures, 13 tables. Code and data: https://github.com/VLKEB/VLKEB
- Published
- 2024
28. Yi: Open Foundation Models by 01.AI
- Author
-
AI, 01., Young, Alex, Chen, Bei, Li, Chao, Huang, Chengen, Zhang, Ge, Zhang, Guanwei, Li, Heng, Zhu, Jiangcheng, Chen, Jianqun, Chang, Jing, Yu, Kaidong, Liu, Peng, Liu, Qiang, Yue, Shawn, Yang, Senbin, Yang, Shiming, Yu, Tao, Xie, Wen, Huang, Wenhao, Hu, Xiaohui, Ren, Xiaoyi, Niu, Xinyao, Nie, Pengcheng, Xu, Yuchi, Liu, Yudong, Wang, Yue, Cai, Yuxuan, Gu, Zhenyu, Liu, Zhiyuan, and Dai, Zonghong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU, and our finetuned chat models deliver strong human preference rate on major evaluation platforms like AlpacaEval and Chatbot Arena. Building upon our scalable super-computing infrastructure and the classical transformer architecture, we attribute the performance of Yi models primarily to its data quality resulting from our data-engineering efforts. For pretraining, we construct 3.1 trillion tokens of English and Chinese corpora using a cascaded data deduplication and quality filtering pipeline. For finetuning, we polish a small scale (less than 10K) instruction dataset over multiple iterations such that every single instance has been verified directly by our machine learning engineers. For vision-language, we combine the chat language model with a vision transformer encoder and train the model to align visual representations to the semantic space of the language model. We further extend the context length to 200K through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. We show that extending the depth of the pretrained checkpoint through continual pretraining further improves performance. We believe that given our current results, continuing to scale up model parameters using thoroughly optimized data will lead to even stronger frontier models.
- Published
- 2024
29. YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5
- Author
-
Ji, Chun-Lin, Yu, Tao, Gao, Peng, Wang, Fei, and Yuan, Ru-Yue
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Object detection, a crucial aspect of computer vision, has seen significant advancements in accuracy and robustness. Despite these advancements, practical applications still face notable challenges, primarily the inaccurate detection or missed detection of small objects. In this paper, we propose YOLO-TLA, an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture, thereby producing a feature map of a larger scale to discern finer features of small objects. Further, we integrate the C3CrossCovn module into the backbone network. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters, rendering the model more compact. Additionally, we have incorporated a global attention mechanism into the backbone network. This mechanism combines the channel information with global information to create a weighted feature map. This feature map is tailored to highlight the attributes of the object of interest, while effectively ignoring irrelevant details. In comparison to the baseline YOLOv5s model, our newly developed YOLO-TLA model has shown considerable improvements on the MS COCO validation dataset, with increases of 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95, all while keeping the model size compact at 9.49M parameters. Further extending these improvements to the YOLOv5m model, the enhanced version exhibited a 1.7% and 1.9% increase in mAP@0.5 and mAP@0.5:0.95, respectively, with a total of 27.53M parameters. These results validate the YOLO-TLA model's efficient and effective performance in small object detection, achieving high accuracy with fewer parameters and computational demands.
- Published
- 2024
30. Automated Design and Optimization of Distributed Filtering Circuits via Reinforcement Learning
- Author
-
Gao, Peng, Yu, Tao, Wang, Fei, and Yuan, Ru-Yue
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Hardware Architecture - Abstract
Designing distributed filter circuits (DFCs) is complex and time-consuming, involving setting and optimizing multiple hyperparameters. Traditional optimization methods, such as using the commercial finite element solver HFSS (High-Frequency Structure Simulator) to enumerate all parameter combinations with fixed steps and then simulate each combination, are not only time-consuming and labor-intensive but also rely heavily on the expertise and experience of electronics engineers, making it difficult to adapt to rapidly changing design requirements. Additionally, these commercial tools struggle with precise adjustments when parameters are sensitive to numerical changes, resulting in limited optimization effectiveness. This study proposes a novel end-to-end automated method for DFC design. The proposed method harnesses reinforcement learning (RL) algorithms, eliminating the dependence on the design experience of engineers. Thus, it significantly reduces the subjectivity and constraints associated with circuit design. The experimental findings demonstrate clear improvements in design efficiency and quality when comparing the proposed method with traditional engineer-driven methods. Furthermore, the proposed method achieves superior performance when designing complex or rapidly evolving DFCs, highlighting the substantial potential of RL in circuit design automation. In particular, compared to the existing DFC automation design method CircuitGNN, our method achieves an average performance improvement of 8.72%. Additionally, the execution efficiency of our method is 2000 times higher than CircuitGNN on the CPU and 241 times higher on the GPU.
- Published
- 2024
31. Smart Mobility Digital Twin Based Automated Vehicle Navigation System: A Proof of Concept
- Author
-
Wang, Kui, Li, Zongdian, Nonomura, Kazuma, Yu, Tao, Sakaguchi, Kei, Hashash, Omar, and Saad, Walid
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
Digital twins (DTs) have driven major advancements across various industrial domains over the past two decades. With the rapid advancements in autonomous driving and vehicle-to-everything (V2X) technologies, integrating DTs into vehicular platforms is anticipated to further revolutionize smart mobility systems. In this paper, a new smart mobility DT (SMDT) platform is proposed for the control of connected and automated vehicles (CAVs) over next-generation wireless networks. In particular, the proposed platform enables cloud services to leverage the abilities of DTs to promote the autonomous driving experience. To enhance traffic efficiency and road safety measures, a novel navigation system that exploits available DT information is designed. The SMDT platform and navigation system are implemented with state-of-the-art products, e.g., CAVs and roadside units (RSUs), and emerging technologies, e.g., cloud and cellular V2X (C-V2X). In addition, proof-of-concept (PoC) experiments are conducted to validate system performance. The performance of SMDT is evaluated from two standpoints: (i) the rewards of the proposed navigation system on traffic efficiency and safety and, (ii) the latency and reliability of the SMDT platform. Our experimental results using SUMO-based large-scale traffic simulations show that the proposed SMDT can reduce the average travel time and the blocking probability due to unexpected traffic incidents. Furthermore, the results record a peak overall latency for DT modeling and route planning services to be 155.15 ms and 810.59 ms, respectively, which validates that our proposed design aligns with the 3GPP requirements for emerging V2X use cases and fulfills the targets of the proposed design. Our demonstration video can be found at https://youtu.be/3waQwlaHQkk., Comment: 15 pages, 10 figures
- Published
- 2024
32. ARKS: Active Retrieval in Knowledge Soup for Code Generation
- Author
-
Su, Hongjin, Jiang, Shuyang, Lai, Yuhang, Wu, Haoyuan, Shi, Boao, Liu, Che, Liu, Qian, and Yu, Tao
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Recently the retrieval-augmented generation (RAG) paradigm has raised much attention for its potential in incorporating external knowledge into large language models (LLMs) without further training. While widely explored in natural language applications, its utilization in code generation remains under-explored. In this paper, we introduce Active Retrieval in Knowledge Soup (ARKS), an advanced strategy for generalizing large language models for code. In contrast to relying on a single source, we construct a knowledge soup integrating web search, documentation, execution feedback, and evolved code snippets. We employ an active retrieval strategy that iteratively refines the query and updates the knowledge soup. To assess the performance of ARKS, we compile a new benchmark comprising realistic coding problems associated with frequently updated libraries and long-tail programming languages. Experimental results on ChatGPT and CodeLlama demonstrate a substantial improvement in the average execution accuracy of ARKS on LLMs. The analysis confirms the effectiveness of our proposed knowledge soup and active retrieval strategies, offering rich insights into the construction of effective retrieval-augmented code generation (RACG) pipelines. Our model, code, and data are available at https://arks-codegen.github.io., Comment: Retrieval-augmented code generation
- Published
- 2024
33. Generative Representational Instruction Tuning
- Author
-
Muennighoff, Niklas, Su, Hongjin, Wang, Liang, Yang, Nan, Wei, Furu, Yu, Tao, Singh, Amanpreet, and Kiela, Douwe
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm., Comment: 66 pages (16 main), 25 figures, 34 tables
- Published
- 2024
34. Momentum Approximation in Asynchronous Private Federated Learning
- Author
-
Yu, Tao, Song, Congzheng, Wang, Jianyu, and Chitnis, Mona
- Subjects
Computer Science - Machine Learning - Abstract
Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It is still unclear how to effectively combine these two techniques to achieve a win-win. In this paper, we find that asynchrony introduces implicit bias to momentum updates. In order to address this problem, we propose momentum approximation that minimizes the bias by finding an optimal weighted average of all historical model updates. Momentum approximation is compatible with secure aggregation as well as differential privacy, and can be easily integrated in production FL systems with a minor communication and storage cost. We empirically demonstrate that on benchmark FL datasets, momentum approximation can achieve a 1.15–4× speedup in convergence compared to existing asynchronous FL optimizers with momentum.
- Published
- 2024
35. OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
- Author
-
Wu, Zhiyong, Han, Chengcheng, Ding, Zichen, Weng, Zhenmin, Liu, Zhoumianze, Yao, Shunyu, Yu, Tao, and Kong, Lingpeng
- Subjects
Computer Science - Artificial Intelligence - Abstract
Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents. However, most of these agents are designed to interact with a narrow domain, such as a specific software or website. This narrow focus constrains their applicability for general computer tasks. To this end, we introduce OS-Copilot, a framework to build generalist agents capable of interfacing with comprehensive elements in an operating system (OS), including the web, code terminals, files, multimedia, and various third-party applications. We use OS-Copilot to create FRIDAY, a self-improving embodied agent for automating general computer tasks. On GAIA, a general AI assistants benchmark, FRIDAY outperforms previous methods by 35%, showcasing strong generalization to unseen applications via accumulated skills from previous tasks. We also present numerical and quantitative evidence that FRIDAY learns to control and self-improve on Excel and Powerpoint with minimal supervision. Our OS-Copilot framework and empirical findings provide infrastructure and insights for future research toward more capable and general-purpose computer agents., Comment: Project page: https://os-copilot.github.io
- Published
- 2024
36. Nonuniversal Equation of State of a Quasi-2D Bose Gas in Dimensional Crossover
- Author
-
Ye, Xiaoran, Yu, Tao, and Liang, Zhaoxin
- Subjects
Condensed Matter - Quantum Gases - Abstract
Equation of state (EOS) for a pure two-dimensional (2D) Bose gas exhibits a logarithmic dependence on the s-wave scattering length [L. Salasnich, Phys. Rev. Lett. 118, 130402 (2017)]. The pronounced disparity between the EOS of a 2D Bose gas and its 3D counterpart underscores the significance of exploring the dimensional crossover between these two distinct dimensions. In this work, we are motivated to deduce nonuniversal corrections to EOS for an optically trapped Bose gas along the dimensional crossover from 3D to 2D, incorporating the finite-range effects of the interatomic potential. Employing the framework of effective field theory, we derive the analytical expressions for both the ground state energy and quantum depletion. The introduction of the lattice induces a transition from a 3D to a quasi-2D regime. In particular, we systematically analyze the asymptotic behaviors of both the 2D and 3D aspects of the model system, with a specific focus on the nonuniversal effects on the EOS arising from finite-range interactions. The nonuniversal effects proposed in this study along the dimensional crossover represent a significant stride toward unraveling the intricate interplay between dimensionality and quantum fluctuations., Comment: 9 pages, 1 figure
- Published
- 2024
37. Chiral-Damping-Enhanced Magnon Transmission
- Author
-
Ye, Xiyin, Xia, Ke, Bauer, Gerrit E. W., and Yu, Tao
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
The inevitable Gilbert damping in magnetization dynamics is usually regarded as detrimental to spin transport. Here we apply a general feature of chiral non-Hermitian dynamics to a ferromagnetic-insulator--normal-metal heterostructure to show that the strong momentum dependence and chirality of the eddy-current-induced damping also causes beneficial scattering properties: A potential barrier that reflects magnon wave packets becomes unidirectionally transparent in the presence of a metallic cap layer. Passive magnon gates that turn presumably harmful dissipation into useful functionalities should be useful for future quantum magnonic devices., Comment: 7 pages, 3 figures
- Published
- 2024
38. Stochastic Dynamic Power Dispatch with High Generalization and Few-Shot Adaption via Contextual Meta Graph Reinforcement Learning
- Author
-
Deng, Bairong, Yu, Tao, Pan, Zhenning, Zhang, Xuehan, Wu, Yufeng, and Ding, Qiaoyi
- Subjects
Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Reinforcement learning is an emerging approach to multi-stage sequential decision-making problems. This paper studies a real-time multi-stage stochastic power dispatch considering multivariate uncertainties. Current research suffers from low generalization and practicality: the learned dispatch policy can only handle a specific dispatch scenario, and its performance degrades significantly if actual samples and training samples are inconsistent. To fill these gaps, a novel contextual meta graph reinforcement learning (Meta-GRL) for a highly generalized multi-stage optimal dispatch policy is proposed. Specifically, a more general contextual Markov decision process (MDP) and scalable graph representation are introduced to achieve a more generalized multi-stage stochastic power dispatch modeling. An upper meta-learner is proposed to encode context for different dispatch scenarios and learn how to achieve dispatch task identification while the lower policy learner learns context-specified dispatch policy. After sufficient offline learning, this approach can rapidly adapt to unseen and undefined scenarios with only a few updates of the hypothesis judgments generated by the meta-learner. Numerical comparisons with state-of-the-art policies and traditional reinforcement learning verify the optimality, efficiency, adaptability, and scalability of the proposed Meta-GRL.
- Published
- 2024
39. Machine Learning of Knot Topology in Non-Hermitian Band Braids
- Author
-
Chen, Jiangzhi, Wang, Zi, Tan, Yu-Tao, Wang, Ce, and Ren, Jie
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
The deep connection among braids, knots and topological physics has provided valuable insights into studying topological states in various physical systems. However, identifying distinct braid groups and knot topology embedded in non-Hermitian systems is challenging and requires significant effort. Here, we demonstrate that unsupervised learning with the representation basis of $su(n)$ Lie algebra on $n$-fold extended non-Hermitian bands can fully classify braid group and knot topology therein, without requiring any prior mathematical knowledge or any pre-defined topological invariants. We demonstrate that the approach successfully identifies different topological elements, such as unlink, unknot, Hopf link, Solomon ring, trefoil, and so on, by employing generalized Gell-Mann matrices in non-Hermitian models with $n=2$ and $n=3$ energy bands. Moreover, since eigenstate information of non-Hermitian bands is incorporated in addition to eigenvalues, the approach distinguishes the different parity-time symmetry and breaking phases, recognizes the opposite chirality of braids and knots, and identifies distinct topological phases that were overlooked before. Our study shows significant potential of machine learning in classification of knots, braid groups, and non-Hermitian topological phases.
- Published
- 2024
40. Design of Ionic Liquids for HF/HFC-245fa Superefficient Separation: COSMO-RS Selection and Process Assessment
- Author
-
Liao, Yuan-Hao, Zeng, Jijun, Yang, Zhiqiang, Han, Sheng, Zhao, Bo, An, Yu, Tang, Xiaobo, Yu, Tao, Zhang, Wei, and Lu, Jian
- Published
- 2024
41. GEML: a graph-enhanced pre-trained language model framework for text classification via mutual learning
- Author
-
Yu, Tao, Song, Rui, Pinto, Sandro, Gomes, Tiago, Tavares, Adriano, and Xu, Hao
- Published
- 2024
42. Alterations in the Glymphatic System and Association with Brain Structure and Cognitive Function in Moyamoya Disease
- Author
-
Zhu, Huan, Zhu, Chenyu, Liu, Tong, Wang, Peijiong, Li, Wenjie, Zhang, Qihang, Zhao, Yahui, Yu, Tao, Liu, Xingju, Zhang, Qian, Zhao, Jizong, and Zhang, Yan
- Published
- 2024
43. Smartphone-based colorimetric detection of formaldehyde in the air
- Author
-
Yang, Meng, Ye, Jin, Yu, Tao, Song, Ying, Qian, Hua, Liu, Tianyi, Chen, Yang, Wang, Junqi, Cao, Shi-jie, and Liu, Cong
- Published
- 2024
44. GHG emission quantification and reduction pathway of subway shield tunnel engineering: a case study on Guangzhou Metro, China
- Author
-
Wu, Huanyu, Yang, Kehua, Chen, Kunyang, Zhou, Wenwen, Yu, Tao, and Wang, Kai
- Published
- 2024
45. A gene-encoded FRET fluorescent sensor based on hairpin design for sensitive detection of aflatoxin biosynthesis-related genes aflD in rice
- Author
-
Li, Yaqi, Yu, Tao, Jiang, Xinrong, Chen, Xin, Kong, Dezhao, Liu, Chang, Shi, Qiaoqiao, Zhang, Qi, Li, Shijie, and Liu, Guorui
- Published
- 2024
46. Sleep onset time as a mediator in the association between screen exposure and aging: a cross-sectional study
- Author
-
Lin, Senlin, Gao, Meng, Zhang, Juzhao, Wu, Yuting, Yu, Tao, Peng, Yajun, Jia, Yingnan, Zou, Haidong, Lu, Lina, Li, Deshang, and Ma, Yingyan
- Published
- 2024
47. Ultramorphology of the third instar larvae of Miridiba trichophora (Coleoptera: Melolonthinae: Rhizotrogini)
- Author
-
Zhang, Gui-Zhi, Wang, Xiao-Tong, Li, Yu-Tao, Fang, Hong, and Jiang, Lu
- Published
- 2024
48. Comprehensive assessment of fire hazard for polyurethane foam based on AHP-entropy-weighted TOPSIS
- Author
-
Qin, Rongshui, Shi, Chenchen, Yu, Tao, Ding, Chao, Zhan, Jing, Jiao, Yan, and Zhang, Zelong
- Published
- 2024
49. Dual-Specificity Phosphatase 4 Promotes Malignant Features in Colorectal Cancer Through Cyclic-AMP Response Element Binding Protein/Protein Kinase CAMP-Activated Catalytic Subunit Beta Activation
- Author
-
Pei, Wenju, Yin, Wanbin, Yu, Tao, Zhang, Xiaoyuan, Zhang, Qi, Yang, Xiaowen, Shi, Chunlei, Shen, Wenzhi, and Liu, Gang
- Published
- 2024
50. Influence of Thickness of the Partition Wall on the Bearing Capacity of Double-Arch Tunnel Under Different Buried Depth
- Author
-
Yao, Zhigang, Liu, Xuedan, Pu, Jiapeng, Yu, Tao, Fang, Yong, Yue, Jianhong, and Zhang, Weibo
- Published
- 2024