2,866,844 results on '"An, Yi"'
Search Results
2. Memory benefits of daily-living-related contextual cueing for individuals with subjective cognitive decline and mild cognitive impairment
- Author
- Liu, Chien-hsiou, Li, Kuan-yi, Liao, Wan-wen, Chuang, I-ching, Huang, Yan-hua, and Wu, Ching-yi
- Published
- 2024
3. BERT-like pre-training for symbolic piano music classification tasks
- Author
- Chou, Yi-Hui, Chen, I-Chun, Chang, Chin-Jui, Ching, Joann, and Yang, Yi-Hsuan
- Published
- 2024
4. Experimental electronic phase diagram in a diamond-lattice antiferromagnetic system
- Author
- Ji, Liang-Wen, Yang, Wu-Zhang, Lu, Yi-Ming, Lu, Jia-Yi, Li, Jing, Liu, Yi, Ren, Zhi, and Cao, Guang-Han
- Subjects
Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Materials Science
- Abstract
We report Ni-doping effect on the magnetic and electronic properties of thiospinel Co$_{1-x}$Ni$_x$[Co$_{0.3}$Ir$_{1.7}$]S$_4$ (0 $\leq x \leq$ 1). The parent compound Co[Co$_{0.3}$Ir$_{1.7}$]S$_4$ exhibits antiferromagnetic order below $T_\mathrm{N} \sim$ 292 K within the $A$-site diamond sublattice, along with a narrow charge-transfer gap. Upon Ni doping, an insulator-to-metal crossover occurs at $x \sim$ 0.35, and the antiferromagnetism is gradually suppressed, with $T_\mathrm{N}$ decreasing to 23 K at $x =$ 0.7. In the metallic state, a spin-glass-like transition emerges at low temperatures. The antiferromagnetic transition is completely suppressed at $x_\mathrm{c} \sim$ 0.95, around which a non-Fermi-liquid behavior emerges, evident from the $T^\alpha$ temperature dependence with $\alpha \approx$ 1.2-1.3 in resistivity and divergent behavior of $C/T$ in specific heat at low temperatures. Meanwhile, the electronic specific heat coefficient $\gamma$ increases substantially, signifying an enhancement of the quasiparticle effective mass. The magnetic phase diagram has been established, in which an antiferromagnetic quantum critical point is avoided at $x_\mathrm{c}$. Conversely, the observed glass-like tail above the critical concentration aligns more closely with theoretical predictions for an extended region of quantum Griffiths phase in the presence of strong disorder.
- Comment
8 pages, 5 figures
- Published
- 2024
5. Photometric and Spectroscopic Investigations of Three Large Amplitude Contact Binaries
- Author
- Xu, Xin, Li, Kai, Liu, Fei, Yan, Qian-Xue, Wang, Yi-Fan, Cui, Xin-Yu, Wang, Jing-Yi, Gao, Xing, Sun, Guo-You, Wu, Cheng-Yu, and Li, Mu-Zi-Mei
- Subjects
Astrophysics - Solar and Stellar Astrophysics
- Abstract
We performed photometric and spectroscopic studies of three large amplitude contact binaries, NSVS 2418361, ATLAS J057.1170+31.2384 and NSVS 7377875. The amplitudes of the three systems' light curves are more than 0.7 magnitude. We analyzed the light curves using the Wilson-Devinney code to yield physical parameters. The photometric solutions suggested that NSVS 7377875 belongs to the A-subtype contact binaries, while the others are classified as W-subtype ones. Furthermore, the mass ratio of NSVS 7377875 is higher than 0.72, so it belongs to the H-subtype contact binaries. Since their light curves have unequal heights at the two maxima, which is called the O'Connell effect, a dark spot on the primary component of each target was required to get a better fit of the light curves. The orbital period investigation shows that the period of NSVS 2418361 is increasing, indicating a mass transfer from the less massive component to the more massive one, while the other targets exhibit no long-term variation. Our spectral subtraction analysis of LAMOST spectra revealed excess emissions in the $H_\alpha$ line, indicating chromospheric activity in all three targets. The Gaia distance was applied to estimate the absolute parameters of the three targets, and we obtained their evolutionary state. The relationships between the energy transfer parameter of 76 H-subtype contact binaries and their bolometric luminosity ratios, as well as their contact degree, were presented. We discovered that H-subtype systems have a less efficient energy transfer rate, which is consistent with the conclusion proposed by Csizmadia & Klagyivik.
- Comment
21 pages, 10 figures, 12 tables, accepted by AJ
- Published
- 2024
6. How interfacial tension enhances drag in turbulent Taylor-Couette flow with neutrally buoyant and equally viscous droplets
- Author
- Su, Jinghong, Zhang, Yi-bao, Wang, Cheng, Yi, Lei, Xu, Fan, Fan, Yaning, Wang, Junwu, and Sun, Chao
- Subjects
Physics - Fluid Dynamics
- Abstract
The presence of dispersed-phase droplets can result in a notable increase in the system's drag. However, our understanding of the mechanism underlying this phenomenon remains limited. In this study, we use three-dimensional direct numerical simulations with a modified multi-marker volume-of-fluid method to investigate liquid-liquid two-phase turbulence in a Taylor-Couette geometry. The dispersed phase has the same density and viscosity as the continuous phase. The Reynolds number $Re\equiv r_i\omega_i d/\nu$ is fixed at 5200, the volume fraction of the dispersed phase is up to $40\%$, and the Weber number $We\equiv \rho u^2_\tau d/\sigma$ is around 8. It is found that the increase in the system's drag originates from the contribution of interfacial tension. Specifically, droplets experience significant deformation and stretching in the streamwise direction due to shear near the inner cylinder. Consequently, the rear end of the droplets lags behind the fore head. This causes opposing interfacial tension effects on the fore head and rear end of the droplets. For the fore head of the droplets, the effect of interfacial tension appears to act against the flow direction. For the rear end, the effect appears to act in the flow direction. The increase in the system's drag is primarily attributed to the effect of interfacial tension on the fore head of the droplets which leads to the hindering effect of the droplets on the surrounding continuous phase. This hindering effect disrupts the formation of high-speed streaks, favoring the formation of low-speed ones, which are generally associated with higher viscous stress and drag of the system. This study provides new insights into the mechanism of drag enhancement reported in our previous experiments.
- Published
- 2024
7. From Estimands to Robust Inference of Treatment Effects in Platform Trials
- Author
- Qian, Yuhan, Yi, Yifan, Shao, Jun, Yi, Yanyao, Mayer-Hamblett, Nicole, Heagerty, Patrick J., and Ye, Ting
- Subjects
Statistics - Methodology
- Abstract
A platform trial is an innovative clinical trial design that uses a master protocol (i.e., one overarching protocol) to evaluate multiple treatments in an ongoing manner and can accelerate the evaluation of new treatments. However, the flexibility that marks the potential of platform trials also creates inferential challenges. Two key challenges are the precise definition of treatment effects and the robust and efficient inference on these effects. To address these challenges, we first define a clinically meaningful estimand that characterizes the treatment effect as a function of the expected outcomes under two given treatments among concurrently eligible patients. Then, we develop weighting and post-stratification methods for estimation of treatment effects with minimal assumptions. To fully leverage the efficiency potential of data from concurrently eligible patients, we also consider a model-assisted approach for baseline covariate adjustment to gain efficiency while maintaining robustness against model misspecification. We derive and compare asymptotic distributions of proposed estimators in theory and propose robust variance estimators. The proposed estimators are empirically evaluated in a simulation study and illustrated using the SIMPLIFY trial. Our methods are implemented in the R package RobinCID.
- Published
- 2024
8. Look a Group at Once: Multi-Slide Modeling for Survival Prediction
- Author
- Li, Xinyang, Zhang, Yi, Xie, Yi, Yang, Jianfei, Wang, Xi, Chen, Hao, and Zhang, Haixian
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Survival prediction is a critical task in pathology. In clinical practice, pathologists often examine multiple cases, leveraging a broader spectrum of cancer phenotypes to enhance pathological assessment. Despite significant advancements in deep learning, current solutions typically model each slide as a sample, struggling to effectively capture comparable and slide-agnostic pathological features. In this paper, we introduce GroupMIL, a novel framework inspired by the clinical practice of collective analysis, which models multiple slides as a single sample and organizes groups of patches and slides sequentially to capture cross-slide prognostic features. We also present GPAMamba, a model designed to facilitate intra- and inter-slide feature interactions, effectively capturing local micro-environmental characteristics within slide-level graphs while uncovering essential prognostic patterns across an extended patch sequence within the group framework. Furthermore, we develop a dual-head predictor that delivers comprehensive survival risk and probability assessments for each patient. Extensive empirical evaluations demonstrate that our model significantly outperforms state-of-the-art approaches across five datasets from The Cancer Genome Atlas.
- Published
- 2024
9. Revisit of discrete energy bands in Galilean moon's footprint tails: remote signals of particle absorption
- Author
- Yang, Fan, Xuzhi-Zhou, Liu, Ying, Sun, Yi-Xin, Yin, Ze-Fan, Hao, Yi-Xin, Liu, Zhi-Yang, Blanc, Michel, Zhao, Jiu-Tong, He, Dong-Wen, Wu, Ya-Ze, Wang, Shan, Yue, Chao, and Zong, Qiu-Gang
- Subjects
Astrophysics - Earth and Planetary Astrophysics, Physics - Space Physics
- Abstract
Recent observations from the Juno spacecraft during its transit over flux tubes of the Galilean moons have identified sharp enhancements of particle fluxes at discrete energies. These banded structures have been suspected to originate from a bounce resonance between particles and standing Alfven waves generated by the moon-magnetospheric interaction. Here, we show that predictions from the above hypothesis are inconsistent with the observations, and propose an alternative interpretation that the banded structures are remote signals of particle absorption at the moons. In this scenario, whether a particle would encounter the moon before reaching Juno depends on the number of bounce cycles it experiences within a fixed section of drift motion determined by moon-spacecraft longitudinal separation. Therefore, the absorption bands are expected to appear at discrete, equally-spaced velocities consistent with the observations. This finding improves our understanding of moon-plasma interactions and provides a potential way to evaluate the Jovian magnetospheric models.
- Comment
15 pages, 4 figures
- Published
- 2024
10. SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms
- Author
- Chatterjee, Soumick, Mattern, Hendrik, Dörner, Marc, Sciarra, Alessandro, Dubost, Florian, Schnurre, Hannes, Khatun, Rupali, Yu, Chun-Chih, Hsieh, Tsung-Lin, Tsai, Yi-Shan, Fang, Yi-Zeng, Yang, Yung-Ching, Huang, Juinn-Dar, Xu, Marshall, Liu, Siyu, Ribeiro, Fernanda L., Bollmann, Saskia, Chintalapati, Karthikesh Varma, Radhakrishna, Chethan Mysuru, Kumara, Sri Chandana Hudukula Ram, Sutrave, Raviteja, Qayyum, Abdul, Mazher, Moona, Razzak, Imran, Rodero, Cristobal, Niederren, Steven, Lin, Fengming, Xia, Yan, Wang, Jiacheng, Qiu, Riyu, Wang, Liansheng, Panah, Arya Yazdan, Jurdi, Rosana El, Fu, Guanghui, Arslan, Janan, Vaillant, Ghislain, Valabregue, Romain, Dormont, Didier, Stankoff, Bruno, Colliot, Olivier, Vargas, Luisa, Chacón, Isai Daniel, Pitsiorlas, Ioannis, Arbeláez, Pablo, Zuluaga, Maria A., Schreiber, Stefanie, Speck, Oliver, and Nürnberger, Andreas
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
- Published
- 2024
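The Dice scores quoted in the SMILE-UHURA abstract above measure the overlap between a predicted segmentation mask and a reference mask. The following is a minimal sketch of that metric, not the challenge's evaluation code: the function name `dice_score`, the flat-list mask representation, and the convention for two empty masks are all assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two flat binary masks.

    `pred` and `target` are equal-length sequences of 0/1 (or bool) voxels.
    This is an illustrative helper, not the challenge's own implementation.
    """
    pred = [bool(p) for p in pred]
    target = [bool(t) for t in target]
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    # Count voxels marked as vessel in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # One shared voxel, three marked voxels in total: Dice = 2 * 1 / 3.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

A score of 1 means the two masks coincide and 0 means they share no voxels; a 3D volume would be flattened before calling the helper.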
11. DarkSHINE Baseline Design Report: Physics Prospects and Detector Technologies
- Author
- Chen, Jing, Chen, Ji-Yuan, Chen, Jun-Feng, Chen, Xiang, Fu, Chang-Bo, Guo, Jun, Guo, Yi-Han, Khaw, Kim Siang, Li, Jia-Lin, Li, Liang, Li, Shu, Lin, Yu-ming, Liu, Dan-Ning, Liu, Kang, Liu, Kun, Liu, Qi-Bin, Liu, Zhi, Lu, Ze-Jia, Lv, Meng, Song, Si-Yuan, Sun, Tong, Tang, Jian-Nan, Wan, Wei-Shi, Wang, Dong, Wang, Xiao-Long, Wang, Yu-Feng, Wang, Zhen, Wang, Zi-Rui, Wu, Wei-Hao, Yang, Hai-Jun, Yang, Lin, Yang, Yong, Yu, Dian, Yuan, Rui, Zhang, Jun-Hua, Zhang, Yu-Lei, Zhang, Yun-Long, Zhao, Zhi-Yu, Zhou, Bai-Hong, Zhu, Chun-Xiang, Zhu, Xu-Liang, and Zhu, Yi-Fan
- Subjects
Physics - Instrumentation and Detectors, High Energy Physics - Experiment
- Abstract
DarkSHINE is a newly proposed fixed-target experiment initiative to search for the invisible decay of the Dark Photon via missing energy/momentum signatures, based on the high repetition rate electron beam to be delivered by the Shanghai High repetition rate XFEL and Extreme light facility (SHINE). This report elaborates the baseline design of the DarkSHINE experiment by introducing the physics goals, experimental setups, technical design details of each sub-detector system, signal and background modeling, expected search sensitivities and future prospects, which mark an important step towards further prototyping and technical demonstrations.
- Published
- 2024
12. A fiber array architecture for atom quantum computing
- Author
- Li, Xiao, Hou, Jia-Yi, Wang, Jia-Chao, Wang, Guang-Wei, He, Xiao-Dong, Zhou, Feng, Wang, Yi-Bo, Liu, Min, Wang, Jin, Xu, Peng, and Zhan, Ming-Sheng
- Subjects
Quantum Physics
- Abstract
Arrays of single atoms trapped in optical tweezers are increasingly recognized as a promising platform for scalable quantum computing. In both the fault-tolerant and NISQ eras, the ability to individually control qubits is essential for the efficient execution of quantum circuits. Time-division multiplexed control schemes based on atom shuttling or beam scanning have been employed to build programmable neutral atom quantum processors, but achieving high-rate, highly parallel gate operations remains a challenge. Here, we propose a fiber array architecture for atom quantum computing capable of fully independent control of individual atoms. The trapping and addressing lasers for each individual atom are emitted from the same optical waveguide, enabling robust control through common-mode suppression of beam pointing noise. Using a fiber array, we experimentally demonstrate the trapping and independent control of ten single atoms in two-dimensional optical tweezers, achieving individually addressed single-qubit gates with an average fidelity of 0.9966(3). Moreover, we perform simultaneous arbitrary single-qubit gates on four randomly selected qubits, resulting in an average fidelity of 0.9961(4). Our work paves the way for time-efficient execution of quantum algorithms on neutral atom quantum computers.
- Comment
12 pages
- Published
- 2024
13. RoundTable: Investigating Group Decision-Making Mechanism in Multi-Agent Collaboration
- Author
- Cho, Young-Min, Shu, Raphael, Das, Nilaksh, Alkhouli, Tamer, Lai, Yi-An, Cai, Jason, Sunkara, Monica, and Zhang, Yi
- Subjects
Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence
- Abstract
This study investigates the efficacy of Multi-Agent Systems in eliciting cross-agent communication and enhancing collective intelligence through group decision-making in a decentralized setting. Unlike centralized mechanisms, where a fixed hierarchy governs social choice, decentralized group decision-making allows agents to engage in joint deliberation. Our research focuses on the dynamics of communication and decision-making within various social choice methods. By applying different voting rules in various environments, we find that moderate decision flexibility yields better outcomes. Additionally, exploring the linguistic features of agent-to-agent conversations reveals indicators of effective collaboration, offering insights into communication patterns that facilitate or hinder collaboration. Finally, we propose various methods for determining the optimal stopping point in multi-agent collaborations based on linguistic cues. Our findings contribute to a deeper understanding of how decentralized decision-making and group conversation shape multi-agent collaboration, with implications for the design of more effective MAS environments.
- Comment
preprint
- Published
- 2024
14. Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
- Author
- Yang, Chih-Kai, Fu, Yu-Kuan, Li, Chen-An, Lin, Yi-Cheng, Lin, Yu-Xiang, Chen, Wei-Chih, Chung, Ho Lam, Kuan, Chun-Yi, Huang, Wei-Ping, Lu, Ke-Han, Lin, Tzu-Quan, Wang, Hsiu-Hsuan, Hu, En-Pei, Hsu, Chan-Jan, Tseng, Liang-Hsuan, Chiu, I-Hsiang, Sanga, Ulin, Chen, Xuanjun, Hsu, Po-chun, Yang, Shu-wen, and Lee, Hung-yi
- Subjects
Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.
- Comment
Work in progress
- Published
- 2024
15. Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
- Author
- Huang, Chien-yu, Chen, Wei-Chih, Yang, Shu-wen, Liu, Andy T., Li, Chen-An, Lin, Yu-Xiang, Tseng, Wei-Cheng, Diwan, Anuj, Shih, Yi-Jen, Shi, Jiatong, Chen, William, Chen, Xuanjun, Hsiao, Chi-Yuan, Peng, Puyuan, Wang, Shih-Heng, Kuan, Chun-Yi, Lu, Ke-Han, Chang, Kai-Wei, Yang, Chih-Kai, Ritter-Gutierrez, Fabian, Chuang, Ming To, Huang, Kuan-Po, Arora, Siddhant, Lin, You-Kuan, Yeo, Eunjung, Chang, Kalvin, Chien, Chung-Ming, Choi, Kwanghee, Hsieh, Cheng-Hsiu, Lin, Yi-Cheng, Yu, Chee-En, Chiu, I-Hsiang, Guimarães, Heitor R., Han, Jionghao, Lin, Tzu-Quan, Lin, Tzu-Yuan, Chang, Homu, Chang, Ting-Wu, Chen, Chun Wei, Chen, Shou-Jen, Chen, Yu-Hua, Cheng, Hsi-Chun, Dhawan, Kunal, Fang, Jia-Lin, Fang, Shi-Xin, Chiang, Kuan-Yu Fang, Fu, Chi An, Hsiao, Hsien-Fu, Hsu, Ching Yu, Huang, Shao-Syuan, Wei, Lee Chen, Lin, Hsi-Che, Lin, Hsuan-Hao, Lin, Hsuan-Ting, Lin, Jian-Ren, Liu, Ting-Chun, Lu, Li-Chun, Pai, Tsung-Min, Pasad, Ankita, Kuan, Shih-Yun Shan, Shon, Suwon, Tang, Yuxun, Tsai, Yun-Shao, Wei, Jui-Chiang, Wei, Tzu-Chieh, Wu, Chengxi, Wu, Dien-Ruei, Yang, Chao-Han Huck, Yang, Chieh-Chi, Yip, Jia Qi, Yuan, Shao-Xiang, Noroozi, Vahid, Chen, Zhehuai, Wu, Haibin, Livescu, Karen, Harwath, David, Watanabe, Shinji, and Lee, Hung-yi
- Subjects
Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results indicate that none of the models performed well universally. SALMONN-13B excelled in English ASR, while WavLLM demonstrated high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We will soon open-source all task data and the evaluation pipeline.
- Published
- 2024
16. NeuroFly: A framework for whole-brain single neuron reconstruction
- Author
- Zhao, Rubin, Liu, Yang, Zhang, Shiqi, Yi, Zijian, Xiao, Yanyang, Xu, Fang, Yang, Yi, and Zhou, Pencheng
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Quantitative Biology - Quantitative Methods
- Abstract
Neurons, with their elongated, tree-like dendritic and axonal structures, enable efficient signal integration and long-range communication across brain regions. By reconstructing individual neurons' morphology, we can gain valuable insights into brain connectivity, revealing the structural basis of cognition, movement, and perception. Despite the accumulation of extensive 3D microscopic imaging data, progress has been considerably hindered by the absence of automated tools to streamline this process. Here we introduce NeuroFly, a validated framework for large-scale automatic single neuron reconstruction. This framework breaks down the process into three distinct stages: segmentation, connection, and proofreading. In the segmentation stage, we perform automatic segmentation followed by skeletonization to generate over-segmented neuronal fragments without branches. During the connection stage, we use a 3D image-based path following approach to extend each fragment and connect it with other fragments of the same neuron. Finally, human annotators are required only to proofread the few unresolved positions. The first two stages of our process are clearly defined computer vision problems, and we have trained robust baseline models to solve them. We validated NeuroFly's efficiency using in-house datasets that include a variety of challenging scenarios, such as dense arborizations, weak axons, and images with contamination. We will release the datasets along with a suite of visualization and annotation tools for better reproducibility. Our goal is to foster collaboration among researchers to address the neuron reconstruction challenge, ultimately accelerating advancements in neuroscience research. The dataset and code are available at https://github.com/beanli161514/neurofly
- Published
- 2024
17. Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
- Author
- Fu, Ao, Zhou, Yi, Zhou, Tao, Yang, Yi, Gao, Bojun, Li, Qun, Wu, Guobin, and Shao, Ling
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
- Abstract
World models and video generation are pivotal technologies in the domain of autonomous driving, each playing a critical role in enhancing the robustness and reliability of autonomous systems. World models, which simulate the dynamics of real-world environments, and video generation models, which produce realistic video sequences, are increasingly being integrated to improve situational awareness and decision-making capabilities in autonomous vehicles. This paper investigates the relationship between these two technologies, focusing on how their structural parallels, particularly in diffusion-based models, contribute to more accurate and coherent simulations of driving scenarios. We examine leading works such as JEPA, Genie, and Sora, which exemplify different approaches to world model design, thereby highlighting the lack of a universally accepted definition of world models. These diverse interpretations underscore the field's evolving understanding of how world models can be optimized for various autonomous driving tasks. Furthermore, this paper discusses the key evaluation metrics employed in this domain, such as Chamfer distance for 3D scene reconstruction and Fréchet Inception Distance (FID) for assessing the quality of generated video content. By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions, emphasizing the potential of these technologies to jointly advance the performance of autonomous driving systems. The findings presented in this paper aim to provide a comprehensive understanding of how the integration of video generation and world models can drive innovation in the development of safer and more reliable autonomous vehicles.
- Published
- 2024
18. Automated Vulnerability Detection Using Deep Learning Technique
- Author
- Yang, Guan-Yan, Ko, Yi-Heng, Wang, Farn, Yeh, Kuo-Hui, Chang, Haw-Shiang, and Chen, Hsueh-Yi
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Software Engineering, D.2.4, D.2.5
- Abstract
Our work explores the utilization of deep learning, specifically leveraging the CodeBERT model, to enhance code security testing for Python applications by detecting SQL injection vulnerabilities. Unlike traditional security testing methods that may be slow and error-prone, our approach transforms source code into vector representations and trains a Long Short-Term Memory (LSTM) model to identify vulnerable patterns. When compared with existing static application security testing (SAST) tools, our model displays superior performance, achieving higher precision, recall, and F1-score. The study demonstrates that deep learning techniques, particularly with CodeBERT's advanced contextual understanding, can significantly improve vulnerability detection, presenting a scalable methodology applicable to various programming languages and vulnerability types.
- Comment
4 pages, 1 figure; Presented at The 30th International Conference on Computational & Experimental Engineering and Sciences (ICCES2024)
- Published
- 2024
19. SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms
- Author
- Li, Shuzhen, Chen, Yuxin, Chen, Xuesong, Gao, Ruiyang, Zhang, Yupeng, Yu, Chao, Li, Yunfei, Ye, Ziyi, Huang, Weijun, Yi, Hongliang, Leng, Yue, and Wu, Yi
- Subjects
Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning
- Abstract
Sleep monitoring plays a crucial role in maintaining good health, with sleep staging serving as an essential metric in the monitoring process. Traditional methods, utilizing medical sensors like EEG and ECG, can be effective but often present challenges such as unnatural user experience, complex deployment, and high costs. Ballistocardiography (BCG), a type of piezoelectric sensor signal, offers a non-invasive, user-friendly, and easily deployable alternative for long-term home monitoring. However, reliable BCG-based sleep staging is challenging due to the limited sleep monitoring data available for BCG. A restricted training dataset prevents the model from generalizing across populations. Additionally, transferring to BCG makes it difficult to ensure model robustness when migrating from other data sources. To address these issues, we introduce SleepNetZero, a zero-shot learning based approach for sleep staging. To tackle the generalization challenge, we propose a series of BCG feature extraction methods that align BCG components with corresponding respiratory, cardiac, and movement channels in PSG. This allows models to be trained on large-scale PSG datasets that are diverse in population. For the migration challenge, we employ data augmentation techniques, significantly enhancing generalizability. We conducted extensive training and testing on large datasets (12393 records from 9637 different subjects), achieving an accuracy of 0.803 and a Cohen's Kappa of 0.718. SleepNetZero was also deployed in a real prototype (monitoring pads) and tested in actual hospital settings (265 users), demonstrating an accuracy of 0.697 and a Cohen's Kappa of 0.589. To the best of our knowledge, this work represents the first known reliable BCG-based sleep staging effort and marks a significant step towards in-home health monitoring.
- Comment
25 pages
- Published
- 2024
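The SleepNetZero abstract above reports Cohen's Kappa alongside accuracy. Kappa corrects the observed agreement between predicted and reference sleep stages for the agreement expected by chance. The following is a minimal sketch of the standard two-rater formula, not the authors' evaluation code: the function name `cohens_kappa`, the stage labels in the usage example, and the handling of the degenerate single-class case are assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two label sequences.

    p_o is the fraction of positions where the labels agree; p_e is the
    agreement expected if both sequences were drawn independently from
    their own label frequencies.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and equal length")

    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Assumed convention: both sequences use one identical class only.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    reference = ["wake", "wake", "nrem", "nrem"]   # hypothetical stages
    predicted = ["wake", "nrem", "nrem", "nrem"]
    print(cohens_kappa(reference, predicted))
```

In the usage example the labels agree on 3 of 4 epochs (p_o = 0.75) while chance agreement is 0.5, so kappa is 0.5, noticeably lower than the raw accuracy.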
20. YourSkatingCoach: A Figure Skating Video Benchmark for Fine-Grained Element Analysis
- Author
- Chen, Wei-Yi, Lin, Yi-Ling, Su, Yu-An, Yeh, Wei-Hsin, and Ku, Lun-Wei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Combining sports and machine learning involves leveraging ML algorithms and techniques to extract insight from sports-related data such as player statistics, game footage, and other relevant information. However, datasets related to figure skating in the literature focus primarily on element classification and are currently unavailable or exhibit only limited access, which greatly raises the entry barrier to developing visual sports technology for it. Moreover, when using such data to help athletes improve their skills, we find they are very coarse-grained: they work for learning what an element is, but they are poorly suited to learning whether the element is good or bad. Here we propose air time detection, a novel motion analysis task, the goal of which is to accurately detect the duration of the air time of a jump. We present YourSkatingCoach, a large, novel figure skating dataset which contains 454 videos of jump elements, the detected skater skeletons in each video, along with the gold labels of the start and ending frames of each jump, together as a video benchmark for figure skating. In addition, although this type of task is often viewed as classification, we cast it as a sequential labeling problem and propose a Transformer-based model to calculate the duration. Experimental results show that the proposed model yields favorable results against a strong baseline. To further verify the generalizability of the fine-grained labels, we apply the same process to other sports as cross-sports tasks but for the coarse-grained task of action classification. Here we fine-tune the classification to demonstrate that figure skating, as it contains the essential body movements, constitutes a strong foundation for adaptation to other sports.
- Published
- 2024
21. Gain-Loss Coupled Systems
- Author
-
Zhang, Chunlei, Kim, Mun, Zhang, Yi-Hui, Wang, Yi-Pu, Trivedi, Deepanshu, Krasnok, Alex, Wang, Jianbo, Isleifson, Dustin, Roshko, Roy, and Hu, Can-Ming
- Subjects
Quantum Physics ,Condensed Matter - Mesoscale and Nanoscale Physics ,Physics - Applied Physics ,Physics - Optics - Abstract
Achieving oscillations with small dimensions, high power, high coherence, and low phase noise has been a long-standing goal in wave physics, driving innovations across classical electromagnetic theory and quantum physics. Key applications include electronic oscillators, lasers, and spin-torque oscillations. In recent decades, physicists have increasingly focused on harnessing passive oscillatory modes to manipulate these oscillations, leading to the development of diverse gain-loss coupled systems, including photon-photon, exciton-photon, photon-magnon, magnon-phonon, and magnon-magnon couplings. This review provides a comprehensive overview of these systems, exploring their fundamental physical structures, key experimental observations, and theoretical insights. By synthesizing insights from these studies, we propose future research directions to further advance the understanding and application of gain-loss coupled systems for quantum science and quantum technologies. (The field of gain-loss coupled systems is vast. The authors welcome suggestions and feedback from the community to continuously improve this review article until it is published)., Comment: 20 pages, 10 figures
- Published
- 2024
22. Non-Hermitian Hamiltonian Approach for Two-Dimensional Spectroscopy
- Author
-
Zhang, Hao-Yue, Huang, Bin-Yao, Jin, Jing-Yi-Ran, Yao, Yi-Xuan, and Ai, Qing
- Subjects
Quantum Physics ,Physics - Optics - Abstract
Two-dimensional spectroscopy (2DS) offers significant advantages in terms of high temporal and frequency resolutions and signal-to-noise ratio. Until now, the response-function (RF) formalism has been the prevalent theoretical description. In this study, we compare the non-Hermitian Hamiltonian (NHH) method with the RF formalism in a three-level system with a constant control field. We obtain the signals from both approaches and compare their population dynamics and 2DS. We propose the quasi-Green function for the NHH method, which allows all possible Liouville paths to be inferred. Although the NHH method overestimates relaxations, it also provides a more comprehensive description. Our results demonstrate that the NHH method is more suitable than the RF formalism for investigating systems that are either dissipative or complex via 2DS.
- Published
- 2024
23. Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
- Author
-
Kuan, Chun-Yi and Lee, Hung-yi
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Computation and Language ,Computer Science - Sound - Abstract
Recent advancements in large audio-language models (LALMs) have shown impressive capabilities in understanding and reasoning about audio and speech information. However, these models still face challenges, including hallucinating non-existent sound events, misidentifying the order of sound events, and incorrectly attributing sound sources, which undermine their reliability and real-world application. To systematically evaluate these issues, we propose three distinct tasks: object existence, temporal order, and object attribute within audio. These tasks assess the models' comprehension of critical audio information aspects. Our experimental results reveal limitations in these fundamental tasks, underscoring the need for better models in recognizing specific sound events, determining event sequences, and identifying sound sources. To improve performance in these areas, we introduce a multi-turn chain-of-thought approach, which demonstrates significantly improved model performance across the proposed tasks., Comment: 5 pages, 1 figure
- Published
- 2024
24. A Study of Decay Rate of Bound Negative Muons
- Author
-
Deng, Jian-Bo, Deng, Miao-Yi, Ma, Shi-Jie, Wang, Rui-Bo, Fan, Qi-Qi, He, Peng-Zhang, He, Yi-Peng, Li, Shuo-Wen, and Hu, Xian-Ru
- Subjects
High Energy Physics - Phenomenology - Abstract
A number of experiments show that the decay lifetimes of muons bound to atomic nuclei are longer than the decay lifetimes of free muons. In this paper, a scheme of extending quantum mechanics (EQM) is proposed to resolve this problem. The Schrödinger equation is obtained to demonstrate the validity of this approach. The decay ratio of bound muons is also calculated in EQM, and the result is in good agreement with the experimental data., Comment: 5 pages, 1 figure, 2 tables
- Published
- 2024
25. Bots can Snoop: Uncovering and Mitigating Privacy Risks of Bots in Group Chats
- Author
-
Chou, Kai-Hsiang, Lin, Yi-Min, Wang, Yi-An, Li, Jonathan Weiping, Kim, Tiffany Hyun-Jin, and Hsiao, Hsu-Chun
- Subjects
Computer Science - Cryptography and Security - Abstract
New privacy concerns arise with chatbots on group messaging platforms. Chatbots may access information beyond their intended functionalities, such as messages unintended for chatbots or senders' identities. Chatbot operators may exploit such information to infer personal information and link users across groups, potentially leading to personal data breaches, pervasive tracking, and targeted advertising. Our analysis of conversation datasets shows that (1) chatbots often access far more messages than needed, and (2) when a user joins a new group with chatbots, there is a 3.4% chance that at least one of the chatbots can recognize and associate the user with their previous interactions in other groups. Although state-of-the-art group messaging protocols provide robust end-to-end security and some platforms have implemented policies to limit chatbot access, no platforms successfully combine these features. This paper introduces SnoopGuard, a secure group messaging protocol that ensures user privacy against chatbots while maintaining strong end-to-end security. Our method offers selective message access, preventing chatbots from accessing unrelated messages, and ensures sender anonymity within the group. SnoopGuard achieves $O(\log n + m)$ message-sending complexity for a group of $n$ users and $m$ chatbots, compared to $O(\log(n + m))$ in state-of-the-art protocols, with acceptable overhead for enhanced privacy. Our prototype implementation shows that sending a message in a group of 50 users and 10 chatbots takes about 30 milliseconds when integrated with Message Layer Security (MLS)., Comment: 18 pages, 5 figures
- Published
- 2024
26. First Very Long Baseline Interferometry Detections at 870$\mu$m
- Author
-
Raymond, Alexander W., Doeleman, Sheperd S., Asada, Keiichi, Blackburn, Lindy, Bower, Geoffrey C., Bremer, Michael, Broguiere, Dominique, Chen, Ming-Tang, Crew, Geoffrey B., Dornbusch, Sven, Fish, Vincent L., García, Roberto, Gentaz, Olivier, Goddi, Ciriaco, Han, Chih-Chiang, Hecht, Michael H., Huang, Yau-De, Janssen, Michael, Keating, Garrett K., Koay, Jun Yi, Krichbaum, Thomas P., Lo, Wen-Ping, Matsushita, Satoki, Matthews, Lynn D., Moran, James M., Norton, Timothy J., Patel, Nimesh, Pesce, Dominic W., Ramakrishnan, Venkatessh, Rottmann, Helge, Roy, Alan L., Sánchez, Salvador, Tilanus, Remo P. J., Titus, Michael, Torne, Pablo, Wagner, Jan, Weintroub, Jonathan, Wielgus, Maciek, Young, André, Akiyama, Kazunori, Albentosa-Ruíz, Ezequiel, Alberdi, Antxon, Alef, Walter, Algaba, Juan Carlos, Anantua, Richard, Azulay, Rebecca, Bach, Uwe, Baczko, Anne-Kathrin, Ball, David, Baloković, Mislav, Bandyopadhyay, Bidisha, Barrett, John, Bauböck, Michi, Benson, Bradford A., Bintley, Dan, Blundell, Raymond, Bouman, Katherine L., Boyce, Hope, Brissenden, Roger, Britzen, Silke, Broderick, Avery E., Bronzwaer, Thomas, Bustamante, Sandra, Carlstrom, John E., Chael, Andrew, Chan, Chi-kwan, Chang, Dominic O., Chatterjee, Koushik, Chatterjee, Shami, Chen, Yongjun, Cheng, Xiaopeng, Cho, Ilje, Christian, Pierre, Conroy, Nicholas S., Conway, John E., Crawford, Thomas M., Cruz-Osorio, Alejandro, Cui, Yuzhu, Dahale, Rohan, Davelaar, Jordy, De Laurentis, Mariafelicia, Deane, Roger, Dempsey, Jessica, Desvignes, Gregory, Dexter, Jason, Dhruv, Vedant, Dihingia, Indu K., Dzib, Sergio A., Eatough, Ralph P., Emami, Razieh, Falcke, Heino, Farah, Joseph, Fomalont, Edward, Fontana, Anne-Laure, Ford, H. Alyson, Foschi, Marianna, Fraga-Encinas, Raquel, Freeman, William T., Friberg, Per, Fromm, Christian M., Fuentes, Antonio, Galison, Peter, Gammie, Charles F., Georgiev, Boris, Gold, Roman, Gómez-Ruiz, Arturo I., Gómez, José L., Gu, Minfeng, Gurwell, Mark, Hada, Kazuhiro, Haggard, Daryl, Hesper, Ronald, Heumann, Dirk, Ho, Luis C., Ho, Paul, Honma, Mareki, Huang, Chih-Wei L., Huang, Lei, Hughes, David H., Ikeda, Shiro, Impellizzeri, C. M. Violette, Inoue, Makoto, Issaoun, Sara, James, David J., Jannuzi, Buell T., Jeter, Britton, Jiang, Wu, Jiménez-Rosales, Alejandra, Johnson, Michael D., Jorstad, Svetlana, Jones, Adam C., Joshi, Abhishek V., Jung, Taehyun, Karuppusamy, Ramesh, Kawashima, Tomohisa, Kettenis, Mark, Kim, Dong-Jin, Kim, Jae-Young, Kim, Jongsoo, Kim, Junhan, Kino, Motoki, Kocherlakota, Prashant, Kofuji, Yutaro, Koch, Patrick M., Koyama, Shoko, Kramer, Carsten, Kramer, Joana A., Kramer, Michael, Kubo, Derek, Kuo, Cheng-Yu, La Bella, Noemi, Lee, Sang-Sung, Levis, Aviad, Li, Zhiyuan, Lico, Rocco, Lindahl, Greg, Lindqvist, Michael, Lisakov, Mikhail, Liu, Jun, Liu, Kuo, Liuzzo, Elisabetta, Lobanov, Andrei P., Loinard, Laurent, Lonsdale, Colin J., Lowitz, Amy E., Lu, Ru-Sen, MacDonald, Nicholas R., Mahieu, Sylvain, Maier, Doris, Mao, Jirong, Marchili, Nicola, Markoff, Sera, Marrone, Daniel P., Marscher, Alan P., Martí-Vidal, Iván, Medeiros, Lia, Menten, Karl M., Mizuno, Izumi, Mizuno, Yosuke, Montgomery, Joshua, Moriyama, Kotaro, Moscibrodzka, Monika, Mulaudzi, Wanga, Müller, Cornelia, Müller, Hendrik, Mus, Alejandro, Musoke, Gibwa, Myserlis, Ioannis, Nagai, Hiroshi, Nagar, Neil M., Nakamura, Masanori, Narayanan, Gopal, Natarajan, Iniyan, Nathanail, Antonios, Fuentes, Santiago Navarro, Neilsen, Joey, Ni, Chunchong, Nowak, Michael A., Oh, Junghwan, Okino, Hiroki, Sánchez, Héctor Raúl Olivares, Oyama, Tomoaki, Özel, Feryal, Palumbo, Daniel C. M., Paraschos, Georgios Filippos, Park, Jongho, Parsons, Harriet, Pen, Ue-Li, Piétu, Vincent, PopStefanija, Aleksandar, Porth, Oliver, Prather, Ben, Principe, Giacomo, Psaltis, Dimitrios, Pu, Hung-Yi, Raffin, Philippe A., Rao, Ramprasad, Rawlings, Mark G., Ricarte, Angelo, Ripperda, Bart, Roelofs, Freek, Romero-Cañizales, Cristina, Ros, Eduardo, Roshanineshat, Arash, Ruiz, Ignacio, Ruszczyk, Chet, Rygl, Kazi L. J., Sánchez-Argüelles, David, Sánchez-Portal, Miguel, Sasada, Mahito, Satapathy, Kaushik, Savolainen, Tuomas, Schloerb, F. Peter, Schonfeld, Jonathan, Schuster, Karl-Friedrich, Shao, Lijing, Shen, Zhiqiang, Small, Des, Sohn, Bong Won, SooHoo, Jason, Salas, León David Sosapanta, Souccar, Kamal, Srinivasan, Ranjani, Stanway, Joshua S., Sun, He, Tazaki, Fumie, Tetarenko, Alexandra J., Tiede, Paul, Toma, Kenji, Toscano, Teresa, Traianou, Efthalia, Trent, Tyler, Trippe, Sascha, Turk, Matthew, van Bemmel, Ilse, van Langevelde, Huib Jan, van Rossum, Daniel R., Vos, Jesse, Ward-Thompson, Derek, Wardle, John, Washington, Jasmin E., Wharton, Robert, Wiik, Kaj, Witzel, Gunther, Wondrak, Michael F., Wong, George N., Wu, Qingwen, Yadlapalli, Nitika, Yamaguchi, Paul, Yfantis, Aristomenis, Yoon, Doosoo, Younsi, Ziri, Yu, Wei, Yuan, Feng, Yuan, Ye-Fei, Zensus, J. Anton, Zhang, Shuo, Zhao, Guang-Yao, and Zhao, Shan-Shan
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena - Abstract
The first very long baseline interferometry (VLBI) detections at 870$\mu$m wavelength (345$\,$GHz frequency) are reported, achieving the highest diffraction-limited angular resolution yet obtained from the surface of the Earth, and the highest-frequency example of the VLBI technique to date. These include strong detections for multiple sources observed on inter-continental baselines between telescopes in Chile, Hawaii, and Spain, obtained during observations in October 2018. The longest-baseline detections approach 11$\,$G$\lambda$ corresponding to an angular resolution, or fringe spacing, of 19$\mu$as. The Allan deviation of the visibility phase at 870$\mu$m is comparable to that at 1.3$\,$mm on the relevant integration time scales between 2 and 100$\,$s. The detections confirm that the sensitivity and signal chain stability of stations in the Event Horizon Telescope (EHT) array are suitable for VLBI observations at 870$\mu$m. Operation at this short wavelength, combined with anticipated enhancements of the EHT, will lead to a unique high angular resolution instrument for black hole studies, capable of resolving the event horizons of supermassive black holes in both space and time., Comment: Corresponding author: S. Doeleman
- Published
- 2024
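As a rough consistency check on the figures quoted in the abstract of entry 26 (a back-of-the-envelope sketch, not taken from the paper), the fringe spacing of a baseline of length $B$ observed at wavelength $\lambda$ is approximately $\lambda/B$ radians. Assuming the standard conversion of about $2.06 \times 10^{11}$ $\mu$as per radian and $c \approx 3.00 \times 10^{8}$ m/s, the quoted 11 G$\lambda$ baseline and 870 $\mu$m wavelength give:

```latex
\begin{align*}
\theta &\approx \frac{\lambda}{B} = \frac{1}{11 \times 10^{9}}\ \mathrm{rad}
        \approx 9.1 \times 10^{-11}\ \mathrm{rad} \\
       &\approx \left(9.1 \times 10^{-11}\right) \times \left(2.06 \times 10^{11}\right)\ \mu\mathrm{as}
        \approx 19\ \mu\mathrm{as}, \\[4pt]
\nu &= \frac{c}{\lambda}
     = \frac{3.00 \times 10^{8}\ \mathrm{m\,s^{-1}}}{870 \times 10^{-6}\ \mathrm{m}}
     \approx 345\ \mathrm{GHz}.
\end{align*}
```

Both values agree with the 19 $\mu$as fringe spacing and 345 GHz frequency stated in the abstract.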
27. Energy calibration of GTM on ground
- Author
-
Huang, Chien-You, Chang, Hsiang-Kuang, Lin, Chih-Hsun, Tsao, Che-Chih, Hu, Chin-Ping, Chang, Hao-Min, Chen, Yan-Fu, Feng, An-Hsuan, Huang, Yi-Wen, Lin, Tzu-Hsuan, Tsao, Yi-Ning, Wu, Chih-En, and Wu, Chun-Wei
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics - Abstract
The Gamma-ray Transients Monitor (GTM) on board the Formosat-8B (FS-8B) satellite is designed to detect and localize Gamma-Ray Bursts (GRBs). By utilizing 2+2 CITIROC chips to manipulate 4+4 detectors, which are composed of GAGG(Ce) scintillators coupled with Silicon Photomultipliers (SiPMs) and oriented in various directions to achieve all-sky coverage, the GRB saturation fluences of GTM in the 50 keV to 1 MeV range for Short GRBs (SGRBs) and Long GRBs (LGRBs) were estimated to be about $3.1 \times 10^{-4}$ and $5.0 \times 10^{-3}\ {\rm erg/cm^2}$, respectively, based on simulations. To precisely interpret the GTM readout signal in terms of energy, several measurements for isotope and gain calibration were conducted. Despite encountering issues with crosstalk and SiPM saturation effect in the data, the energy spectrum can still be recovered by appropriately discarding channel noise and mapping with the correct ADC-to-energy relation. This paper summarizes the energy resolution of GTM and the linear variations in the relationship between photon energy and readout signal. At 662 keV, the energy resolution is about 16 %. Also, it demonstrates that greater gain is achieved by increasing voltage or decreasing temperature.
- Published
- 2024
28. PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
- Author
-
Chen, Mengzhao, Liu, Yi, Wang, Jiahao, Bin, Yi, Shao, Wenqi, and Luo, Ping
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
Quantization is essential for deploying Large Language Models (LLMs) by enhancing memory efficiency and inference speed. Existing methods for activation quantization mainly address channel-wise outliers, often neglecting token-wise outliers, leading to reliance on costly per-token dynamic quantization. To address this, we introduce PrefixQuant, a novel technique that isolates outlier tokens offline without re-training. Specifically, PrefixQuant identifies high-frequency outlier tokens and prefixes them in the KV cache, preventing the generation of outlier tokens during inference and simplifying quantization. To our knowledge, PrefixQuant is the first to enable efficient per-tensor static quantization to outperform expensive per-token dynamic quantization. For instance, in W4A4KV4 (4-bit weight, 4-bit activation, and 4-bit KV cache) Llama-3-8B, PrefixQuant with per-tensor static quantization achieves a 7.43 WikiText2 perplexity and 71.08% average accuracy on 5 common-sense reasoning tasks, outperforming previous per-token dynamic quantization methods like QuaRot with 0.98 perplexity improvement and +5.98 points accuracy. Additionally, the inference speed of W4A4 quantized models using PrefixQuant is 1.60x to 2.81x faster than FP16 models and exceeds QuaRot models by 1.2x to 1.3x. Our code is available at https://github.com/ChenMnZ/PrefixQuant., Comment: A PTQ method to significantly boost the performance of static activation quantization
- Published
- 2024
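The distinction the abstract of entry 28 draws between per-tensor static and per-token dynamic quantization can be illustrated with a toy example. The sketch below is not the PrefixQuant implementation: the activation values, the helper names (`quantize`, `per_token_dynamic`, `per_tensor_static`, `mean_abs_error`), and the choice of symmetric signed 4-bit rounding are all assumptions made for illustration. It only shows why one outlier token inflates a shared static scale, and why a single fixed scale may work about as well as per-token scales once that token is kept out of the quantized tensor.

```python
# Illustrative only: symmetric signed 4-bit quantization of a tiny activation
# matrix (rows are tokens, columns are channels). All values are made up.

QMAX = 7  # largest positive level of a signed 4-bit integer (range -8..7)


def quantize(values, scale):
    """Round each value to the 4-bit grid defined by `scale`, then dequantize."""
    out = []
    for v in values:
        q = max(-QMAX - 1, min(QMAX, round(v / scale)))
        out.append(q * scale)
    return out


def per_token_dynamic(acts):
    """One scale per token (row), recomputed from that row at inference time."""
    return [quantize(row, max(abs(v) for v in row) / QMAX) for row in acts]


def per_tensor_static(acts, scale):
    """One scale for the whole tensor, fixed offline before inference."""
    return [quantize(row, scale) for row in acts]


def mean_abs_error(a, b):
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


normal_tokens = [[0.5, -1.0, 0.75], [0.25, 1.0, -0.5]]
outlier_token = [40.0, -35.0, 20.0]  # hypothetical token with huge activations

# Static scale calibrated on a tensor that still contains the outlier token.
scale_with_outlier = max(abs(v) for v in outlier_token) / QMAX
err_with_outlier = mean_abs_error(
    normal_tokens, per_tensor_static(normal_tokens, scale_with_outlier)
)

# Static scale calibrated after the outlier token has been isolated.
scale_isolated = max(abs(v) for row in normal_tokens for v in row) / QMAX
err_isolated = mean_abs_error(
    normal_tokens, per_tensor_static(normal_tokens, scale_isolated)
)

# Per-token dynamic quantization of the same normal tokens, for comparison.
err_dynamic = mean_abs_error(normal_tokens, per_token_dynamic(normal_tokens))

print("static, outlier in calibration:", err_with_outlier)
print("static, outlier isolated:      ", err_isolated)
print("per-token dynamic:             ", err_dynamic)
```

In this toy setting the shared scale set by the outlier rounds every ordinary activation to zero, whereas the scale computed without it matches the per-token result. Real models have far more varied activation statistics, so the numbers here carry no weight beyond the illustration.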
29. Estimating Body and Hand Motion in an Ego-sensed World
- Author
-
Yi, Brent, Ye, Vickie, Zheng, Maya, Müller, Lea, Pavlakos, Georgios, Ma, Yi, Malik, Jitendra, and Kanazawa, Angjoo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture the wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve the hands: the resulting kinematic and temporal constraints result in over 40% lower hand estimation errors compared to noisy monocular estimates. Project page: https://egoallo.github.io/, Comment: v2: fixed figures for Safari, typos
- Published
- 2024
30. Current Status of Inert Higgs Dark Matter with Dark Fermions
- Author
-
Fan, Yi-Zhong, Li, Yao-Yu, Lu, Chih-Ting, Luo, Xiao-Yi, Tang, Tian-Peng, Tran, Van Que, and Tsai, Yue-Lin Sming
- Subjects
High Energy Physics - Phenomenology - Abstract
The precision measurements of the muon magnetic moment and the $W$ boson mass have sparked interest in the potential deviations from standard model (SM) predictions. While it may be premature to attribute any excesses in these precision measurements to new physics, they do offer a valuable indication of potential directions for physics beyond the SM. Additionally, the particle nature of dark matter (DM) remains a crucial enigma. Despite the absence of any definitive DM signal in direct detection and collider experiments, the Galactic Center GeV $\gamma$-ray excess and the AMS-02 antiproton ($\overline{p}$) excess could potentially offer hints related to the evidence of DM. Motivated by these observations, we propose a simple DM model that addresses all these issues. This model extends the SM by incorporating singlet and doublet Dirac fermion fields, along with a doublet complex scalar field. For the viable parameter regions in this model, we find that future upgrades of the Large Hadron Collider and DM direct detection experiments can only partially probe them, while future high-energy muon colliders hold promise for exploring the unexplored parameter space., Comment: 33 pages, 10 figures, 2 tables. Comments are welcome
- Published
- 2024
31. Intestinal Symptoms among Children Aged 2-7 Years with Autism Spectrum Disorder in 13 Cities of China
- Author
-
Ting Yang, Qian Zhang, Li Chen, Ying Dai, Fei-Yong Jia, Yan Hao, Ling Li, Jie Zhang, Li-Jie Wu, Xiao-Yan Ke, Ming-Ji Yi, Qi Hong, Jin-Jin Chen, Shuan-Feng Fang, Yi-Chao Wang, Qi Wang, Chun-Hua Jin, Jie Chen, and Ting-Yu Li
- Abstract
Background: Autism spectrum disorder (ASD) is a multifactorial, pervasive, neurodevelopmental disorder, of which intestinal symptoms collectively represent one of the most common comorbidities. Methods: In this study, 1,222 children with ASD and 1,206 typically developing (TD) children aged 2-7 years were enrolled from 13 cities in China. Physical measurement and basic information questionnaires were conducted in ASD and TD children. The Childhood Autism Rating Scale (CARS), Social Responsiveness Scale (SRS), and Autism Behavior Checklist (ABC) were used to evaluate the clinical symptoms of children with ASD. The six-item Gastrointestinal Severity Index (6-GSI) was used to evaluate the prevalence of intestinal symptoms in the two groups. Results: The detection rates of constipation, stool odor, and total intestinal symptoms in ASD children were significantly higher than those in TD children (40.098% vs. 25.622%, 17.021% vs. 9.287%, and 53.601% vs. 41.294%, respectively). Autistic children presenting with intestinal comorbidity had significantly higher scores on the ABC, SRS, CARS, and multiple subscales than autistic children without intestinal symptoms, suggesting that intestinal comorbidity may exacerbate the core symptoms of ASD children. Conclusion: Intestinal dysfunction was significantly more common in autistic than in TD children. This dysfunction may aggravate the core symptoms of children with ASD.
- Published
- 2024
32. Advancing readiness for change in substance use for people with substance use disorders using the Kawa model based intervention program: A quasi-experimental study
- Author
-
Hsiao, Han-Yi, Wang, Tsui-Ying, Lee, Chun-Hung, Lu, Young-Chin, Huang, Yu-Chen, Chien, Ying-Chun, Potenza, Marc N, and Lin, Chung-Ying
- Published
- 2024
33. Comparative genomics revealing the genomic characteristics of 'Klebsiella variicola' clinical isolates in China
- Author
-
Yang, Fang, Liu, Fei-Yi, and Zhong, Yi-Ming
- Published
- 2024
34. Temporally distinct 3D multi-omic dynamics in the developing human brain
- Author
-
Heffel, Matthew G, Zhou, Jingtian, Zhang, Yi, Lee, Dong-Sung, Hou, Kangcheng, Pastor-Alonso, Oier, Abuhanna, Kevin D, Galasso, Joseph, Kern, Colin, Tai, Chu-Yi, Garcia-Padilla, Carlos, Nafisi, Mahsa, Zhou, Yi, Schmitt, Anthony D, Li, Terence, Haeussler, Maximilian, Wick, Brittney, Zhang, Martin Jinye, Xie, Fangming, Ziffra, Ryan S, Mukamel, Eran A, Eskin, Eleazar, Nowakowski, Tomasz J, Dixon, Jesse R, Pasaniuc, Bogdan, Ecker, Joseph R, Zhu, Quan, Bintu, Bogdan, Paredes, Mercedes F, and Luo, Chongyuan
- Subjects
Biological Sciences ,Genetics ,Brain Disorders ,Neurosciences ,Mental Illness ,Mental Health ,Human Genome ,2.1 Biological and endogenous factors ,1.1 Normal biological development and functioning ,Neurological ,Humans ,Cell Differentiation ,Chromatin ,Disease Susceptibility ,DNA Methylation ,Epigenesis ,Genetic ,Epigenomics ,Fetus ,Hippocampus ,Multiomics ,Neuroglia ,Neurons ,Prefrontal Cortex ,Schizophrenia ,Single Molecule Imaging ,Single-Cell Analysis ,Time Factors ,Infant ,Newborn ,General Science & Technology - Abstract
The human hippocampus and prefrontal cortex play critical roles in learning and cognition [1,2], yet the dynamic molecular characteristics of their development remain enigmatic. Here we investigated the epigenomic and three-dimensional chromatin conformational reorganization during the development of the hippocampus and prefrontal cortex, using more than 53,000 joint single-nucleus profiles of chromatin conformation and DNA methylation generated by single-nucleus methyl-3C sequencing (snm3C-seq3) [3]. The remodelling of DNA methylation is temporally separated from chromatin conformation dynamics. Using single-cell profiling and multimodal single-molecule imaging approaches, we have found that short-range chromatin interactions are enriched in neurons, whereas long-range interactions are enriched in glial cells and non-brain tissues. We reconstructed the regulatory programs of cell-type development and differentiation, finding putatively causal common variants for schizophrenia strongly overlapping with chromatin loop-connected, cell-type-specific regulatory regions. Our data provide multimodal resources for studying gene regulatory dynamics in brain development and demonstrate that single-cell three-dimensional multi-omics is a powerful approach for dissecting neuropsychiatric risk loci.
- Published
- 2024
35. Viral proteins resolve the virus-vector conundrum during hemipteran-mediated transmission by subverting salicylic acid signaling pathway.
- Author
-
Zhang, Jing-Ru, Liu, Yi-Ming, Li, Di, Wu, Yi-Jie, Zhao, Shi-Xing, Wang, Xiao-Wei, Liu, Shu-Sheng, Walling, Linda, and Pan, Li-Long
- Subjects
Salicylic Acid ,Animals ,Signal Transduction ,Plant Diseases ,Insect Vectors ,Begomovirus ,Viral Proteins ,Nicotiana ,Hemiptera ,Plant Proteins ,HSP90 Heat-Shock Proteins - Abstract
Hemipteran insects transmit viruses when infesting plants, during which vectors activate salicylic acid (SA)-regulated antiviral defenses. How vector-borne plant viruses circumvent these antiviral defenses is largely unexplored. During co-infections of begomoviruses and betasatellites in plants, betasatellite-encoded βC1 proteins interfere with SA signaling and reduce the activation of antiviral resistance. βC1 inhibits SA-induced degradation of NbNPR3 (Nicotiana benthamiana nonexpressor of pathogenesis-related genes 3), a negative regulator of SA signaling. βC1 does not bind directly to NbNPR3, but regulates NbNPR3 degradation via heat shock protein 90s (NbHSP90s). NbHSP90s bind to both NbNPR3 and βC1 and suppress SA signaling. This viral success strategy appears to be conserved as it is also documented for viral proteins encoded by two aphid-borne viruses. Our findings reveal an exquisite mechanism that facilitates the persistence of vector-borne plant viruses and provide important insights into the intricacies of the virus life cycle.
- Published
- 2024
36. CDK12 loss drives prostate cancer progression, transcription-replication conflicts, and synthetic lethality with paralog CDK13.
- Author
-
Tien, Jean, Luo, Jie, Chang, Yu, Zhang, Yuping, Cheng, Yunhui, Wang, Xiaoju, Yang, Jianzhang, Mannan, Rahul, Mahapatra, Somnath, Shah, Palak, Wang, Xiao-Ming, Todd, Abigail, Eyunni, Sanjana, Cheng, Caleb, Rebernick, Ryan, Xiao, Lanbo, Bao, Yi, Neiswender, James, Brough, Rachel, Pettitt, Stephen, Cao, Xuhong, Miner, Stephanie, Zhou, Licheng, Wu, Yi-Mi, Labanca, Estefania, Wang, Yuzhuo, Parolia, Abhijit, Cieslik, Marcin, Robinson, Dan, Wang, Zhen, Feng, Felix, Chou, Jonathan, Lord, Christopher, Ding, Ke, and Chinnaiyan, Arul
- Subjects
CDK12 ,CDK13 ,Cdk12 knockout ,R-loops ,paralog-based synthetic lethality ,prostate cancer ,transcription-replication conflicts ,Male ,Animals ,Humans ,Cyclin-Dependent Kinases ,Mice ,Synthetic Lethal Mutations ,Prostatic Neoplasms ,Tumor Suppressor Protein p53 ,Disease Progression ,PTEN Phosphohydrolase ,Genomic Instability ,Transcription ,Genetic ,Organoids ,Prostatic Neoplasms ,Castration-Resistant ,Cell Proliferation ,DNA Replication ,Mice ,Knockout ,Cell Line ,Tumor ,Mice ,Inbred C57BL ,CDC2 Protein Kinase - Abstract
Biallelic loss of cyclin-dependent kinase 12 (CDK12) defines a metastatic castration-resistant prostate cancer (mCRPC) subtype. It remains unclear, however, whether CDK12 loss drives prostate cancer (PCa) development or uncovers pharmacologic vulnerabilities. Here, we show Cdk12 ablation in murine prostate epithelium is sufficient to induce preneoplastic lesions with lymphocytic infiltration. In allograft-based CRISPR screening, Cdk12 loss associates positively with Trp53 inactivation but negatively with Pten inactivation. Moreover, concurrent Cdk12/Trp53 ablation promotes proliferation of prostate-derived organoids, while Cdk12 knockout in Pten-null mice abrogates prostate tumor growth. In syngeneic systems, Cdk12/Trp53-null allografts exhibit luminal morphology and immune checkpoint blockade sensitivity. Mechanistically, Cdk12 inactivation mediates genomic instability by inducing transcription-replication conflicts. Strikingly, CDK12-mutant organoids and patient-derived xenografts are sensitive to inhibition or degradation of the paralog kinase, CDK13. We therein establish CDK12 as a bona fide tumor suppressor, mechanistically define how CDK12 inactivation causes genomic instability, and advance a therapeutic strategy for CDK12-mutant mCRPC.
- Published
- 2024
37. Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
- Author
-
Chen, Zhe, Wang, Weiyun, Cao, Yue, Liu, Yangzhou, Gao, Zhangwei, Cui, Erfei, Zhu, Jinguo, Ye, Shenglong, Tian, Hao, Liu, Zhaoyang, Gu, Lixin, Wang, Xuehui, Li, Qingyun, Ren, Yimin, Chen, Zixuan, Luo, Jiapeng, Wang, Jiahao, Jiang, Tan, Wang, Bo, He, Conghui, Shi, Botian, Zhang, Xingcheng, Lv, Han, Wang, Yi, Shao, Wenqi, Chu, Pei, Tu, Zhongying, He, Tong, Wu, Zhiyong, Deng, Huipeng, Ge, Jiaye, Chen, Kai, Dou, Min, Lu, Lewei, Zhu, Xizhou, Lu, Tong, Lin, Dahua, Qiao, Yu, Dai, Jifeng, and Wang, Wenhai
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series that builds upon InternVL 2.0, maintaining its core model architecture while introducing significant enhancements in training and testing strategies as well as data quality. In this work, we delve into the relationship between model scaling and performance, systematically exploring the performance trends in vision encoders, language models, dataset sizes, and test-time configurations. Through extensive evaluations on a wide range of benchmarks, including multi-discipline reasoning, document understanding, multi-image / video understanding, real-world comprehension, multimodal hallucination detection, visual grounding, multilingual capabilities, and pure language processing, InternVL 2.5 exhibits competitive performance, rivaling leading commercial models such as GPT-4o and Claude-3.5-Sonnet. Notably, our model is the first open-source MLLM to surpass 70% on the MMMU benchmark, achieving a 3.7-point improvement through Chain-of-Thought (CoT) reasoning and showcasing strong potential for test-time scaling. We hope this model contributes to the open-source community by setting new standards for developing and applying multimodal AI systems. For the HuggingFace demo, see https://huggingface.co/spaces/OpenGVLab/InternVL, Comment: Technical Report
- Published
- 2024
38. DreamColour: Controllable Video Colour Editing without Training
- Author
-
Utintu, Chaitat, Chowdhury, Pinaki Nath, Sain, Aneeshan, Koley, Subhadeep, Bhunia, Ayan Kumar, and Song, Yi-Zhe
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Video colour editing is a crucial task for content creation, yet existing solutions either require painstaking frame-by-frame manipulation or produce unrealistic results with temporal artefacts. We present a practical, training-free framework that makes precise video colour editing accessible through an intuitive interface while maintaining professional-quality output. Our key insight is that by decoupling spatial and temporal aspects of colour editing, we can better align with users' natural workflow -- allowing them to focus on precise colour selection in key frames before automatically propagating changes across time. We achieve this through a novel technical framework that combines: (i) a simple point-and-click interface merging grid-based colour selection with automatic instance segmentation for precise spatial control, (ii) bidirectional colour propagation that leverages inherent video motion patterns, and (iii) motion-aware blending that ensures smooth transitions even with complex object movements. Through extensive evaluation on diverse scenarios, we demonstrate that our approach matches or exceeds state-of-the-art methods while eliminating the need for training or specialized hardware, making professional-quality video colour editing accessible to everyone., Comment: Project page available at https://chaitron.github.io/DreamColour-demo
- Published
- 2024
39. QueEn: A Large Language Model for Quechua-English Translation
- Author
-
Chen, Junhao, Shu, Peng, Li, Yiwei, Zhao, Huaqin, Jiang, Hanqi, Pan, Yi, Zhou, Yifan, Liu, Zhengliang, Howe, Lewis C, and Liu, Tianming
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Recent studies show that large language models (LLMs) are powerful tools for working with natural language, bringing advances in many areas of computational linguistics. However, these models face challenges when applied to low-resource languages due to limited training data and difficulty in understanding cultural nuances. In this paper, we propose QueEn, a novel approach for Quechua-English translation that combines Retrieval-Augmented Generation (RAG) with parameter-efficient fine-tuning techniques. Our method leverages external linguistic resources through RAG and uses Low-Rank Adaptation (LoRA) for efficient model adaptation. Experimental results show that our approach substantially exceeds baseline models, with a BLEU score of 17.6 compared to 1.5 for standard GPT models. The integration of RAG with fine-tuning allows our system to address the challenges of low-resource language translation while maintaining computational efficiency. This work contributes to the broader goal of preserving endangered languages through advanced language technologies.
- Published
- 2024
40. Non-Hermitian Generalization of Rayleigh-Schrödinger Perturbation Theory
- Author
-
Chen, Wei-Ming, Lin, Yen-Ting, and Ju, Chia-Yi
- Subjects
Quantum Physics - Abstract
While perturbation theories constitute a significant foundation of modern quantum system analysis, extending them from the Hermitian to the non-Hermitian regime remains a non-trivial task. In this work, we generalize the Rayleigh-Schrödinger perturbation theory to the non-Hermitian regime by employing a geometric formalism. This framework allows us to compute perturbative corrections to eigenstates and eigenvalues of Hamiltonians iteratively to any order. Furthermore, we observe that the recursion equation for the eigenstates resembles the form of the Girard-Newton formulas, which helps us uncover the general solution to the recursion equation. Moreover, we demonstrate that the perturbation method proposed in this paper reduces to the standard Rayleigh-Schrödinger perturbation theory in the Hermitian regime., Comment: 8 pages
- Published
- 2024
41. EvTTC: An Event Camera Dataset for Time-to-Collision Estimation
- Author
-
Sun, Kaizhen, Li, Jinghang, Dai, Kuan, Liao, Bangyan, Xiong, Wei, and Zhou, Yi
- Subjects
Computer Science - Robotics ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Time-to-Collision (TTC) estimation lies at the core of the forward collision warning (FCW) functionality, which is key to all Automatic Emergency Braking (AEB) systems. Although the success of solutions using frame-based cameras (e.g., Mobileye's solutions) has been witnessed in normal situations, some extreme cases, such as the sudden variation in the relative speed of leading vehicles and the sudden appearance of pedestrians, still pose significant risks that cannot be handled. This is due to the inherent imaging principles of frame-based cameras, where the time interval between adjacent exposures introduces considerable system latency to AEB. Event cameras, as a novel bio-inspired sensor, offer ultra-high temporal resolution and can asynchronously report brightness changes at the microsecond level. To explore the potential of event cameras in the above-mentioned challenging cases, we propose EvTTC, which is, to the best of our knowledge, the first multi-sensor dataset focusing on TTC tasks under high-relative-speed scenarios. EvTTC consists of data collected using standard cameras and event cameras, covering various potential collision scenarios in daily driving and involving multiple collision objects. Additionally, LiDAR and GNSS/INS measurements are provided for the calculation of ground-truth TTC. Considering the high cost of testing TTC algorithms on full-scale mobile platforms, we also provide a small-scale TTC testbed for experimental validation and data augmentation. All the data and the design of the testbed are open sourced, and they can serve as a benchmark that will facilitate the development of vision-based TTC techniques., Comment: 8 pages, 7 figures, 5 tables
- Published
- 2024
42. Ultrahigh-temperature ferromagnetism in ultrathin insulating films with ripple-infinite-layer structure
- Author
-
Yi, Yazhuo, Huang, Haoliang, Shao, Ruiwen, Liu, Yukuai, Chen, Guangzheng, Ou, Jiahui, Zhang, Xi, Hua, Ze, Chen, Lang, Leung, Chi Wah, Zeng, Xie-Rong, Rao, Feng, Liu, Nan, Wang, Heng, Si, Liang, An, Hongyu, Chen, Zhuoyu, and Huang, Chuanwei
- Subjects
Condensed Matter - Materials Science - Abstract
Ferromagnetism and electrical insulation are often at odds, signifying an inherent trade-off. The simultaneous optimization of both in one material, essential for advancing spintronics and topological electronics, necessitates the individual manipulation of various degrees of freedom of strongly correlated electrons. Here, by selective control of the spin exchange and Coulomb interactions, we report the achievement of SrFeO2 thin films with resistivity above 10^6 Ohm.cm and strong magnetization with Curie temperature extrapolated to be 1200 K. Robust ferromagnetism is obtained down to 1.0 nm thickness on substrate and 2.0 nm for freestanding films. Featuring an out-of-plane oriented ripple-infinite-layer structure, this ferromagnetic insulating phase is obtained through extensive reduction of as-grown brownmillerite SrFeO2.5 films at high compressive strains. A pronounced spin Hall magnetoresistance signal of up to 0.0026 is further demonstrated with a Pt Hall bar device. Our findings promise emerging spintronic and topological electronic functionalities harnessing spin dynamics with minimized power dissipation.
- Published
- 2024
43. VTD: Visual and Tactile Database for Driver State and Behavior Perception
- Author
-
Wang, Jie, Cai, Mobing, Zhu, Zhongpan, Ding, Hongjun, Yi, Jiwei, and Du, Aimin
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence - Abstract
In the domain of autonomous vehicles, the human-vehicle co-pilot system has garnered significant research attention. To address the subjective uncertainties in driver state and interaction behaviors, which are pivotal to the safety of Human-in-the-loop co-driving systems, we introduce a novel visual-tactile perception method. Utilizing a driving simulation platform, a comprehensive dataset has been developed that encompasses multi-modal data under fatigue and distraction conditions. The experimental setup integrates driving simulation with signal acquisition, yielding 600 minutes of fatigue detection data from 15 subjects and 102 takeover experiments with 17 drivers. The dataset, synchronized across modalities, serves as a robust resource for advancing cross-modal driver behavior perception algorithms.
- Published
- 2024
44. NebulaFL: Effective Asynchronous Federated Learning for JointCloud Computing
- Author
-
Gao, Fei, Hu, Ming, Xie, Zhiyu, Shi, Peichang, Xie, Xiaofei, Yi, Guodong, and Wang, Huaimin
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Artificial Intelligence ,Computer Science - Networking and Internet Architecture - Abstract
With advancements in AI infrastructure and Trusted Execution Environment (TEE) technology, Federated Learning as a Service (FLaaS) through JointCloud Computing (JCC) is promising to break through the resource constraints caused by heterogeneous edge devices in the traditional Federated Learning (FL) paradigm. Specifically, with the protection from TEE, data owners can achieve efficient model training with high-performance AI services in the cloud. By providing additional FL services, cloud service providers can achieve collaborative learning among data owners. However, FLaaS still faces three challenges, i.e., i) low training performance caused by heterogeneous data among data owners, ii) high communication overhead among different clouds (i.e., data centers), and iii) lack of efficient resource scheduling strategies to balance training time and cost. To address these challenges, this paper presents a novel asynchronous FL approach named NebulaFL for collaborative model training among multiple clouds. To address data heterogeneity issues, NebulaFL adopts a version control-based asynchronous FL training scheme in each data center to balance training time among data owners. To reduce communication overhead, NebulaFL adopts a decentralized model rotation mechanism to achieve effective knowledge sharing among data centers. To balance training time and cost, NebulaFL integrates a reward-guided strategy for data owner selection and resource scheduling. The experimental results demonstrate that, compared to the state-of-the-art FL methods, NebulaFL can achieve up to 5.71% accuracy improvement. In addition, NebulaFL can reduce communication overhead by up to 50% and costs by up to 61.94% under a target accuracy.
- Published
- 2024
45. MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
- Author
-
Fan, Lei, Fan, Dongdong, Hu, Zhiguang, Ding, Yiwen, Di, Donglin, Yi, Kai, Pagnucco, Maurice, and Song, Yang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We present MANTA, a visual-text anomaly detection dataset for tiny objects. The visual component comprises over 137.3K images across 38 object categories spanning five typical domains, of which 8.6K images are labeled as anomalous with pixel-level annotations. Each image is captured from five distinct viewpoints to ensure comprehensive object coverage. The text component consists of two subsets: Declarative Knowledge, including 875 words that describe common anomalies across various domains and specific categories, with detailed explanations for <what, why, how>, including causes and visual characteristics; and Constructivist Learning, providing 2K multiple-choice questions with varying levels of difficulty, each paired with images and corresponding answer explanations. We also propose a baseline for visual-text tasks and conduct extensive benchmarking experiments to evaluate advanced methods across different settings, highlighting the challenges and efficacy of our dataset., Comment: https://grainnet.github.io/MANTA
- Published
- 2024
46. Pushing Rendering Boundaries: Hard Gaussian Splatting
- Author
-
Xu, Qingshan, Cui, Jiequan, Yi, Xuanyu, Wang, Yuxuan, Zhou, Yuan, Ong, Yew-Soon, and Zhang, Hanwang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive Novel View Synthesis (NVS) results in a real-time rendering manner. During training, it relies heavily on the average magnitude of view-space positional gradients to grow Gaussians to reduce rendering loss. However, this average operation smooths the positional gradients from different viewpoints and rendering errors from different pixels, hindering the growth and optimization of many defective Gaussians. This leads to strong spurious artifacts in some areas. To address this problem, we propose Hard Gaussian Splatting, dubbed HGS, which considers multi-view significant positional gradients and rendering errors to grow hard Gaussians that fill the gaps of classical Gaussian Splatting on 3D scenes, thus achieving superior NVS results. In detail, we present positional gradient driven HGS, which leverages multi-view significant positional gradients to uncover hard Gaussians. Moreover, we propose rendering error guided HGS, which identifies noticeable pixel rendering errors and potentially over-large Gaussians to jointly mine hard Gaussians. By growing and optimizing these hard Gaussians, our method helps to resolve blurring and needle-like artifacts. Experiments on various datasets demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time efficiency.
- Published
- 2024
47. NLP-ADBench: NLP Anomaly Detection Benchmark
- Author
-
Li, Yuangang, Li, Jiaqi, Xiao, Zhuo, Yang, Tiankai, Nian, Yi, Hu, Xiyang, and Zhao, Yue
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Anomaly detection (AD) is a critical machine learning task with diverse applications in web systems, including fraud detection, content moderation, and user behavior analysis. Despite its significance, AD in natural language processing (NLP) remains underexplored, limiting advancements in detecting anomalies in text data such as harmful content, phishing attempts, or spam reviews. In this paper, we introduce NLP-ADBench, the most comprehensive benchmark for NLP anomaly detection (NLP-AD), comprising eight curated datasets and evaluations of nineteen state-of-the-art algorithms. These include three end-to-end methods and sixteen two-step algorithms that apply traditional anomaly detection techniques to language embeddings generated by bert-base-uncased and OpenAI's text-embedding-3-large models. Our results reveal critical insights and future directions for NLP-AD. Notably, no single model excels across all datasets, highlighting the need for automated model selection. Moreover, two-step methods leveraging transformer-based embeddings consistently outperform specialized end-to-end approaches, with OpenAI embeddings demonstrating superior performance over BERT embeddings. By releasing NLP-ADBench at https://github.com/USC-FORTIS/NLP-ADBench, we provide a standardized framework for evaluating NLP-AD methods, fostering the development of innovative approaches. This work fills a crucial gap in the field and establishes a foundation for advancing NLP anomaly detection, particularly in the context of improving the safety and reliability of web-based systems., Comment: The project is available at https://github.com/USC-FORTIS/NLP-ADBench
- Published
- 2024
48. Overlay Network Construction: Improved Overall and Node-Wise Message Complexity
- Author
-
Chang, Yi-Jun, Chen, Yanyu, and Mishra, Gopinath
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Data Structures and Algorithms - Abstract
We consider the problem of constructing distributed overlay networks, where nodes in a reconfigurable system can create or sever connections with nodes whose identifiers they know. Initially, each node knows only its own and its neighbors' identifiers, forming a local channel, while the evolving structure is termed the global channel. The goal is to reconfigure any connected graph into a desired topology, such as a bounded-degree expander graph or a well-formed tree with a constant maximum degree and logarithmic diameter, minimizing the total number of rounds and message complexity. This problem mirrors real-world peer-to-peer network construction, where creating robust and efficient systems is desired. We study the overlay reconstruction problem in a network of $n$ nodes in two models: GOSSIP-reply and HYBRID. In the GOSSIP-reply model, each node can send a message and receive a corresponding reply message in one round. In the HYBRID model, a node can send $O(1)$ messages to each neighbor in the local channel and a total of $O(\log n)$ messages in the global channel. In both models, we propose protocols with $O\left(\log^2 n\right)$ round complexities and $O\left(n \log^2 n\right)$ message complexities using messages of $O(\log n)$ bits. Both protocols use $O\left(n \log^3 n\right)$ bits of communication, which we conjecture to be optimal. Additionally, our approach ensures that the total number of messages for node $v$, with degree $\deg(v)$ in the initial topology, is bounded by $O\left(\deg(v) + \log^2 n\right)$ with high probability.
- Published
- 2024
49. Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance
- Author
-
Bao, Xuchan, Li, Judith Yue, Wan, Zhong Yi, Su, Kun, Denk, Timo, Lee, Joonseok, Kuzmin, Dima, and Sha, Fei
- Subjects
Computer Science - Sound ,Computer Science - Information Retrieval ,Computer Science - Multimedia ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Modern music retrieval systems often rely on fixed representations of user preferences, limiting their ability to capture users' diverse and uncertain retrieval needs. To address this limitation, we introduce Diff4Steer, a novel generative retrieval framework that employs lightweight diffusion models to synthesize diverse seed embeddings from user queries that represent potential directions for music exploration. Unlike deterministic methods that map a user query to a single point in embedding space, Diff4Steer provides a statistical prior on the target modality (audio) for retrieval, effectively capturing the uncertainty and multi-faceted nature of user preferences. Furthermore, Diff4Steer can be steered by image or text inputs, enabling more flexible and controllable music discovery combined with nearest neighbor search. Our framework outperforms deterministic regression methods and an LLM-based generative retrieval baseline in terms of retrieval and ranking metrics, demonstrating its effectiveness in capturing user preferences, leading to more diverse and relevant recommendations. Listening examples are available at tinyurl.com/diff4steer., Comment: NeurIPS 2024 Creative AI Track
- Published
- 2024
50. Robots in the Wild: Contextually-Adaptive Human-Robot Interactions in Urban Public Environments
- Author
-
Yu, Xinyan, Wang, Yiyuan, Tran, Tram Thi Minh, Zhao, Yi, Perez, Julie Stephany Berrio, Hoggenmuller, Marius, Humphry, Justine, Loke, Lian, Masuda, Lynn, Parker, Callum, Tomitsch, Martin, and Worrall, Stewart
- Subjects
Computer Science - Robotics ,Computer Science - Human-Computer Interaction - Abstract
The increasing transition of human-robot interaction (HRI) context from controlled settings to dynamic, real-world public environments calls for enhanced adaptability in robotic systems. This can go beyond algorithmic navigation or traditional HRI strategies in structured settings, requiring the ability to navigate complex public urban systems containing multifaceted dynamics and various socio-technical needs. Therefore, our proposed workshop seeks to extend the boundaries of adaptive HRI research beyond predictable, semi-structured contexts and highlight opportunities for adaptable robot interactions in urban public environments. This half-day workshop aims to explore design opportunities and challenges in creating contextually-adaptive HRI within these spaces and establish a network of interested parties within the OzCHI research community. By fostering ongoing discussions, sharing of insights, and collaborations, we aim to catalyse future research that empowers robots to navigate the inherent uncertainties and complexities of real-world public interactions.
- Published
- 2024