Search results: 85,188 results on "Wang, Yang"
2. Comments on the Du-Kakade-Wang-Yang Lower Bounds
- Author
- Van Roy, Benjamin and Dong, Shi
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Du, Kakade, Wang, and Yang recently established intriguing lower bounds on sample complexity, which suggest that reinforcement learning with a misspecified representation is intractable. Another line of work, which centers around a statistic called the eluder dimension, establishes tractability of problems similar to those considered in the Du-Kakade-Wang-Yang paper. We compare these results and reconcile interpretations.
- Published
- 2019
3. The Great Synthesis of Wang Yang Ming Neo-Confucianism in Korea: The Chonŏn (Testament) by Chŏng Chedu (Hagok) by Edward Y.J. Chung (review)
- Author
- Long, Maria Hasfeldt
- Published
- 2024
5. Wang, Yang
- Published
- 2022
6. Hybrid spin-phonon architecture for scalable solid-state quantum nodes
- Author
- Peng, Ruoming, Wu, Xuntao, Wang, Yang, Zhang, Jixing, Geng, Jianpei, Dasari, Durga Bhaktavatsala Rao, Cleland, Andrew N., and Wrachtrup, Jörg
- Subjects
- Quantum Physics
- Abstract
Solid-state spin systems hold great promise for quantum information processing and the construction of quantum networks. However, the considerable inhomogeneity of spins in solids poses a significant challenge to the scaling of solid-state quantum systems. A practical protocol to individually control and entangle spins remains elusive. To this end, we propose a hybrid spin-phonon architecture based on spin-embedded SiC optomechanical crystal (OMC) cavities, which integrates photonic and phononic channels allowing for interactions between multiple spins. With a Raman-facilitated process, the OMC cavities support coupling between the spin and the zero-point motion of the OMC cavity mode reaching 0.57 MHz, facilitating phonon preparation and spin Rabi swap processes. Based on this, we develop a spin-phonon interface that achieves a two-qubit controlled-Z gate with a simulated fidelity of $96.80\%$ and efficiently generates highly entangled Dicke states with over $99\%$ fidelity, by engineering the strongly coupled spin-phonon dark state which is robust against loss from excited state relaxation as well as spectral inhomogeneity of the defect centers. This provides a hybrid platform for exploring spin entanglement with potential scalability and full connectivity in addition to an optical link, and offers a pathway to investigate quantum acoustics in solid-state systems.
- Published
- 2024
7. Finite Element Modeling of Surface Traveling Wave Friction Driven for Rotary Ultrasonic Motor
- Author
- Zhao, Zhanyue, Wang, Yang, Bales, Charles, Jiang, Yiwei, and Fischer, Gregory
- Subjects
- Computer Science - Robotics
- Abstract
Finite element modeling (FEM) is a critical tool in the design and analysis of piezoelectric devices, offering detailed numerical simulations that guide various applications. While traditionally applied to eigenfrequency analysis and time-dependent studies for predicting excitation eigenfrequencies and estimating traveling wave amplitudes, FEM's potential extends to more sophisticated tasks. Advanced FEM applications, such as modeling friction-driven dynamic motion and reaction forces, are essential for accurately simulating the complex behaviors of piezoelectric actuators under real-world conditions. This paper presents a comprehensive motor model that encompasses the coupling dynamics between the stator and rotor in a piezoelectric ultrasonic motor (USM). Utilizing contact theory, the model simulates the complex conditions encountered during the USM's initial start-up phase and its transition to steady-state operation. Implemented in COMSOL Multiphysics, the model provides an in-depth analysis of a rotary piezoelectric actuator, capturing the dynamic interactions and reaction forces that influence its performance. The introduction of this FEM-based model represents a significant advancement in the simulation and understanding of piezoelectric actuators. By offering a more complete picture of the motor's behavior from start-up to steady state, this study enables more accurate control and optimization of piezoelectric devices, enhancing their efficiency and reliability in practical applications.
Comment: 6 pages, 14 figures, 6 tables
- Published
- 2024
8. Learning Agile Swimming: An End-to-End Approach without CPGs
- Author
- Lin, Xiaozhu, Liu, Xiaopei, and Wang, Yang
- Subjects
- Computer Science - Robotics
- Abstract
The pursuit of agile and efficient underwater robots, especially bio-mimetic robotic fish, has been impeded by challenges in creating motion controllers that are able to fully exploit their hydrodynamic capabilities. This paper addresses these challenges by introducing a novel, model-free, end-to-end control framework that leverages Deep Reinforcement Learning (DRL) to enable agile and energy-efficient swimming of robotic fish. Unlike existing methods that rely on predefined trigonometric swimming patterns such as Central Pattern Generators (CPG), our approach directly outputs low-level actuator commands without strong constraints, enabling the robotic fish to learn agile swimming behaviors. In addition, by integrating a high-performance Computational Fluid Dynamics (CFD) simulator with innovative sim-to-real strategies, such as normalized density matching and servo response matching, the proposed framework significantly mitigates the sim-to-real gap, facilitating direct transfer of control policies to real-world environments without fine-tuning. Comparative experiments demonstrate that our method achieves faster swimming speeds, smaller turning radii, and lower energy consumption than conventional CPG-PID-based controllers. Furthermore, the proposed framework shows promise in addressing complex tasks in diverse scenarios, paving the way for more effective deployment of robotic fish in real aquatic environments.
Comment: 8 pages, 7 figures
- Published
- 2024
9. A Preliminary Add-on Differential Drive System for MRI-Compatible Prostate Robotic System
- Author
- Zhao, Zhanyue, Jiang, Yiwei, Bales, Charles, Wang, Yang, and Fischer, Gregory
- Subjects
- Computer Science - Robotics
- Abstract
MRI-targeted biopsy has shown significant advantages over conventional random sextant biopsy, detecting more clinically significant cancers and improving risk stratification. However, needle targeting accuracy, especially in transperineal MRI-guided biopsies, presents a challenge due to needle deflection. This can negatively impact patient outcomes, leading to repeated sampling and inaccurate diagnoses if cancerous tissue isn't properly collected. To address this, we developed a novel differential drive prototype designed to improve needle control and targeting precision. This system, featuring a 2-degree-of-freedom (2-DOF) MRI-compatible cooperative needle driver, distances the robot from the MRI imaging area, minimizing image artifacts and distortions. By using two motors for simultaneous needle insertion and rotation without relative movement, the design reduces MRI interference. In this work, we introduced two mechanical differential drive designs: the ball screw/spline and lead screw/bushing types, and explored both hollow-type and side-pulley differentials. Validation through low-resolution rapid-prototyping demonstrated the feasibility of differential drives in prostate biopsies, with the custom hollow-type hybrid ultrasonic motor (USM) achieving a rotary speed of 75 rpm. The side-pulley differential further increased the speed to 168 rpm, ideal for needle rotation applications. Accuracy assessments showed minimal errors in both insertion and rotation motions, indicating that this proof-of-concept design holds great promise for further development. Ultimately, the differential drive offers a promising solution to the critical issue of needle targeting accuracy in MRI-guided prostate biopsies.
Comment: 8 pages, 19 figures, 3 tables
- Published
- 2024
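The differential principle behind the ball screw/spline design can be sketched as a simple kinematic model. This is a minimal sketch under stated assumptions, not the authors' controller: it assumes an ideal ball screw/spline shaft in which the spline nut sets the needle's rotation and the ball-screw nut's speed relative to the shaft sets insertion, a hypothetical lead of 2 mm/rev, and a sign convention that in practice depends on the thread hand. Driving both nuts at the same speed rotates the needle without inserting it.

```python
# Hypothetical lead of the ball-screw shaft in mm per revolution
# (an assumption for illustration, not a value from the paper).
LEAD_MM_PER_REV = 2.0


def forward(n_ball_rpm, n_spline_rpm, lead=LEAD_MM_PER_REV):
    """Needle motion from the two nut speeds of an ideal ball screw/spline.

    The spline nut fixes the shaft's rotation speed; only the speed
    difference between the ball-screw nut and the shaft produces axial
    travel. Returns (insertion speed in mm/s, rotation speed in rpm).
    """
    insertion_mm_s = lead * (n_ball_rpm - n_spline_rpm) / 60.0
    return insertion_mm_s, n_spline_rpm


def inverse(insertion_mm_s, rotation_rpm, lead=LEAD_MM_PER_REV):
    """Nut speeds (rpm) that produce the requested insertion and rotation."""
    n_spline = rotation_rpm
    n_ball = rotation_rpm + 60.0 * insertion_mm_s / lead
    return n_ball, n_spline
```

For example, `inverse(1.0, 168.0)` asks for 1 mm/s insertion while rotating at 168 rpm and returns `(198.0, 168.0)`; `inverse(0.0, 75.0)` returns `(75.0, 75.0)`, i.e. pure rotation with both motors at equal speed.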
10. Characterization and Design of A Hollow Cylindrical Ultrasonic Motor
- Author
- Zhao, Zhanyue, Wang, Yang, Bales, Charles, Ruiz-Cadalso, Daniel, Zheng, Howard, Furlong-Vazquez, Cosme, and Fischer, Gregory
- Subjects
- Computer Science - Robotics
- Abstract
Piezoelectric ultrasonic motors offer a compact design, faster reaction time, and simpler setup than other motion units such as pneumatic and hydraulic motors; in particular, being non-ferromagnetic, they suit MRI-compatible robotic systems far better than traditional DC motors. Hollow-shaft motors offer several advantages: they are lightweight yet comparable to solid shafts of the same diameter, have low rotational inertia, tolerate rotational imbalance well owing to their low weight, and tolerate high temperatures owing to their low specific mass. This article presents a prototype hollow cylindrical ultrasonic motor (HCM) designed to provide direct drive, eliminate mechanical nonlinearity, and reduce the size and complexity of the actuator or end-effector assembly. Two equivalent HCMs are presented in this work; under a 50 g prepressure on the rotor, the motor reached a rotation speed of 383.3333 rpm and a torque output of 57.3504 mNm at a driving voltage of 282 $V_{pp}$.
Comment: 6 pages, 9 figures, 2 tables
- Published
- 2024
11. Insulator to Metal Transition under High Pressure in FeNb$_3$Se$_{10}$
- Author
- Wang, Haozhe, Huyan, Shuyuan, Downey, Eoghan, Wang, Yang, Smolenski, Shane, Li, Du, Yang, Li, Bostwick, Aaron, Jozwiak, Chris, Rotenberg, Eli, Bud'ko, Sergey L., Canfield, Paul C., Cava, R. J., Jo, Na Hyun, and Xie, Weiwei
- Subjects
- Condensed Matter - Strongly Correlated Electrons
- Abstract
Non-magnetic FeNb$_3$Se$_{10}$ has been demonstrated to be an insulator at ambient pressure through both theoretical calculations and experimental measurements, and it does not host topological surface states. Here we show that under applied pressure, FeNb$_3$Se$_{10}$ transitions to a metallic state at around 3.0 GPa. With a further increase in pressure, its resistivity becomes independent of both temperature and pressure. Its crystal structure is maintained to at least 4.4 GPa.
Comment: 20 pages, 5 figures
- Published
- 2024
12. Approximation Bounds for Recurrent Neural Networks with Application to Regression
- Author
- Jiao, Yuling, Wang, Yang, and Yan, Bokai
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs. We derive upper bounds on the approximation error of RNNs for Hölder smooth functions, in the sense that the output at each time step of an RNN can approximate a Hölder function that depends only on past and current information, termed a past-dependent function. This allows a carefully constructed RNN to simultaneously approximate a sequence of past-dependent Hölder functions. We apply these approximation results to derive non-asymptotic upper bounds for the prediction error of the empirical risk minimizer in regression problems. Our error bounds achieve the minimax optimal rate under both exponentially $\beta$-mixing and i.i.d. data assumptions, improving upon existing ones. Our results provide statistical guarantees on the performance of RNNs.
- Published
- 2024
13. Development of Advanced FEM Simulation Technology for Pre-Operative Surgical Planning
- Author
- Zhao, Zhanyue, Jiang, Yiwei, Bales, Charles, Wang, Yang, and Fischer, Gregory
- Subjects
- Physics - Medical Physics, Computer Science - Robotics
- Abstract
Intracorporeal needle-based therapeutic ultrasound (NBTU) offers a minimally invasive approach for the thermal ablation of malignant brain tumors, including both primary and metastatic cancers. NBTU utilizes a high-frequency alternating electric field to excite a piezoelectric transducer, generating acoustic waves that cause localized heating and tumor cell ablation, and it provides a more precise ablation by delivering lower acoustic power doses directly to targeted tumors while sparing surrounding healthy tissue. Building on our previous work, this study introduces a database for optimizing pre-operative surgical planning by simulating ablation effects in varied tissue environments and develops an extended simulation model incorporating various tumor types and sizes to evaluate thermal damage under trans-tissue conditions. A comprehensive database is created from these simulations, detailing critical parameters such as CEM43 isodose maps, temperature changes, thermal dose areas, and maximum ablation distances for four directional probes. This database serves as a valuable resource for future studies, aiding in complex trajectory planning and parameter optimization for NBTU procedures. Moreover, a novel probe selection method is proposed to enhance pre-surgical planning, providing a strategic approach to selecting probes that maximize therapeutic efficiency and minimize ablation time. By avoiding unnecessary thermal propagation and optimizing probe angles, this method has the potential to improve patient outcomes and streamline surgical procedures. Overall, the findings of this study contribute significantly to the field of NBTU, offering a robust framework for enhancing treatment precision and efficacy in clinical settings.
Comment: 8 pages, 17 figures, 2 tables
- Published
- 2024
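The CEM43 isodose maps mentioned in this abstract rest on the cumulative-equivalent-minutes thermal dose. A minimal sketch of the standard Sapareto-Dewey definition, $\mathrm{CEM43} = \sum_i R^{\,43 - T_i}\,\Delta t$ with $R = 0.5$ for $T \ge 43\,^\circ$C and $R = 0.25$ below, is given here. It assumes uniformly sampled temperature histories per voxel; the function names and the threshold parameter are hypothetical and are not taken from the paper, whose exact dose computation is not described in the abstract.

```python
def cem43(temps_c, dt_min):
    """Cumulative equivalent minutes at 43 degC for one voxel.

    temps_c: temperatures (degC) sampled every dt_min minutes.
    Uses the Sapareto-Dewey constants R = 0.5 (T >= 43) and R = 0.25 (T < 43).
    """
    total = 0.0
    for t in temps_c:
        r = 0.5 if t >= 43.0 else 0.25
        total += dt_min * r ** (43.0 - t)
    return total


def isodose_mask(histories, dt_min, threshold_min):
    """Hypothetical helper: flag voxels whose CEM43 reaches threshold_min."""
    return [cem43(h, dt_min) >= threshold_min for h in histories]
```

One minute at 44 °C counts as two equivalent minutes (`cem43([44.0], 1.0)` gives `2.0`), while four minutes at 42 °C count as only one (`cem43([42.0] * 4, 1.0)` gives `1.0`), which is why isodose contours shrink quickly away from the probe.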
14. VQ-DeepVSC: A Dual-Stage Vector Quantization Framework for Video Semantic Communication
- Author
- Miao, Yongyi, Li, Zhongdang, Wang, Yang, Hu, Die, Yan, Jun, and Wang, Youfang
- Subjects
- Computer Science - Networking and Internet Architecture
- Abstract
In response to the rapid growth of global video traffic and the limitations of traditional wireless transmission systems, we propose a novel dual-stage vector quantization framework, VQ-DeepVSC, tailored to enhance video transmission over wireless channels. In the first stage, we design an adaptive keyframe extractor and interpolator, deployed at the transmitter and receiver respectively, which intelligently select key frames to minimize inter-frame redundancy and mitigate the cliff effect under challenging channel conditions. In the second stage, we propose a semantic vector quantization encoder and decoder, placed at the transmitter and receiver respectively, which efficiently compress key frames using advanced indexing and spatial normalization modules to reduce redundancy. Additionally, we propose adjustable index selection and recovery modules, which enhance compression efficiency and enable flexible adjustment of the compression ratio. Compared to the joint source-channel coding (JSCC) framework, the proposed framework exhibits superior compatibility with current digital communication systems. Experimental results demonstrate that VQ-DeepVSC achieves substantial improvements over the H.265 standard in both Multi-Scale Structural Similarity (MS-SSIM) and Learned Perceptual Image Patch Similarity (LPIPS) metrics, particularly under low channel signal-to-noise ratio (SNR) or multi-path channels, highlighting the significantly enhanced transmission capabilities of our approach.
- Published
- 2024
15. Simple Hückel Molecular Orbital Theory for Möbius Carbon Nanobelts
- Author
- Wang, Yang
- Subjects
- Physics - Chemical Physics, Condensed Matter - Materials Science
- Abstract
The recently synthesized Möbius carbon nanobelts (CNBs) have gained attention owing to their unique $\pi$-conjugation topology, which results in distinctive electronic properties with both fundamental and practical implications. Although Möbius conjugation with phase inversion in the atomic orbital (AO) basis is well established for monocyclic systems, the extension of this understanding to double-stranded Möbius CNBs remains uncertain. This study thoroughly examines the simple Hückel molecular orbital (SHMO) theory for describing the $\pi$ electronic structures of Möbius CNBs. We demonstrate that the adjacency matrix for any Möbius CNB is isomorphism invariant under different placements of the sign inversion, ensuring identical SHMO results regardless of the location of the AO phase inversion. Representative examples of Möbius CNBs, including the experimentally synthesized one, show that the Hückel molecular orbitals (MOs) strikingly resemble the DFT-computed $\pi$ MOs, which were obtained using a technique proposed herein based on the localization and re-delocalization of DFT canonical MOs. Interestingly, the lower-lying $\pi$ MOs exhibit an odd number of nodal planes and are doubly quasidegenerate as a consequence of the phase inversion in Möbius macrocycles, in contrast with macrocyclic Hückel systems. Coulson bond orders derived from SHMO theory correlate well with DFT-calculated Wiberg bond indices for all C-C bonds in the tested Möbius CNBs. Additionally, a remarkable correlation is observed between HOMO-LUMO gaps obtained from the SHMO and GFN2-xTB calculations for a large number of topoisomers of Möbius CNBs. Thus, the SHMO model not only captures the essence of the $\pi$ electronic structure of Möbius CNBs but also provides reliable quantitative predictions comparable to DFT results.
Comment: 18 pages, 9 figures
- Published
- 2024
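The sign-placement invariance claimed in this abstract can be illustrated on the simplest case it mentions, a monocyclic Möbius ring. Moving the negative bond amounts to flipping the phase of a block of AOs, a similarity transform $DAD$ with $D$ diagonal and entries $\pm 1$, so the spectrum cannot change. The sketch below illustrates only that monocyclic case and is not the paper's double-stranded CNB code: it builds Hückel adjacency matrices (an eigenvalue $x$ gives $E = \alpha + x\beta$), diagonalizes them with a small pure-Python Jacobi routine, and compares the results with the textbook closed forms $2\cos(2k\pi/n)$ for a Hückel ring and $2\cos((2k+1)\pi/n)$ for a Möbius ring. The ring size and function names are illustrative choices.

```python
import math


def ring_adjacency(n, flipped_bond=None):
    """Hückel adjacency matrix (units of beta) of an n-membered ring.

    If flipped_bond is an index i, the bond between atoms i and (i+1) % n
    gets a -1 entry: the single AO phase inversion of a Möbius ring.
    """
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        sign = -1.0 if i == flipped_bond else 1.0
        a[i][j] = a[j][i] = sign
    return a


def symmetric_eigenvalues(a, tol=1e-24, max_sweeps=100):
    """Sorted eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


n = 12
mobius_a = symmetric_eigenvalues(ring_adjacency(n, flipped_bond=0))
mobius_b = symmetric_eigenvalues(ring_adjacency(n, flipped_bond=5))
huckel = symmetric_eigenvalues(ring_adjacency(n))

# Closed forms for the ring eigenvalues x (E = alpha + x * beta).
huckel_exact = sorted(2 * math.cos(2 * k * math.pi / n) for k in range(n))
mobius_exact = sorted(2 * math.cos((2 * k + 1) * math.pi / n) for k in range(n))
```

Both placements of the negative bond give the same Möbius spectrum, and every Möbius level of this even ring comes out doubly degenerate, whereas the Hückel ring has nondegenerate levels at $x = \pm 2$.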
16. Deep Brain Ultrasound Ablation Thermal Dose Modeling with in Vivo Experimental Validation
- Author
- Zhao, Zhanyue, Szewczyk, Benjamin, Tarasek, Matthew, Bales, Charles, Wang, Yang, Liu, Ming, Jiang, Yiwei, Bhushan, Chitresh, Fiveland, Eric, Campwala, Zahabiya, Trowbridge, Rachel, Johansen, Phillip M., Olmsted, Zachary, Ghoshal, Goutam, Heffter, Tamas, Gandomi, Katie, Tavakkolmoghaddam, Farid, Nycz, Christopher, Jeannotte, Erin, Mane, Shweta, Nalwalk, Julia, Burdette, E. Clif, Qian, Jiang, Yeo, Desmond, Pilitsis, Julie, and Fischer, Gregory S.
- Subjects
- Physics - Medical Physics, Computer Science - Robotics
- Abstract
Intracorporeal needle-based therapeutic ultrasound (NBTU) is a minimally invasive option for intervening in malignant brain tumors, commonly used in thermal ablation procedures. This technique is suitable for both primary and metastatic cancers, utilizing a high-frequency alternating electric field (up to 10 MHz) to excite a piezoelectric transducer. The resulting rapid deformation of the transducer produces an acoustic wave that propagates through tissue, leading to localized high-temperature heating at the target tumor site and inducing rapid cell death. To optimize the design of NBTU transducers for thermal dose delivery during treatment, numerical modeling of the acoustic pressure field generated by the deforming piezoelectric transducer is frequently employed. The bioheat transfer process generated by the input pressure field is used to track the thermal propagation of the applicator over time. Magnetic resonance thermal imaging (MRTI) can be used to experimentally validate these models. Validation results using MRTI demonstrated the feasibility of this model, showing a consistent thermal propagation pattern. However, a thermal damage isodose map is more advantageous for evaluating therapeutic efficacy. To achieve a more accurate simulation based on the actual brain tissue environment, a new finite element method (FEM) simulation with enhanced damage evaluation capabilities was conducted. The results showed that the highest temperature and ablated volume differed between experimental and simulation results by 2.1884 °C (3.71%) and 0.0631 cm$^3$ (5.74%), respectively. The lowest Pearson correlation coefficient (PCC) for peak temperature was 0.7117, and the lowest Dice coefficient for the ablated area was 0.7021, indicating good agreement between simulation and experiment.
Comment: 9 pages, 9 figures, 7 tables
- Published
- 2024
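The two agreement metrics quoted in this abstract, the Pearson correlation coefficient and the Dice coefficient, have standard definitions, sketched minimally below. How the paper thresholds ablated areas or aligns simulated and measured temperature curves is not stated in the abstract, so the masks and series in the example are hypothetical.

```python
import math


def dice(mask_a, mask_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) of two equal-length boolean masks."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(bool(a) for a in mask_a) + sum(bool(b) for b in mask_b)
    return 2.0 * inter / total if total else 1.0  # two empty masks agree perfectly


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For instance, `dice([1, 1, 1, 0, 0], [0, 1, 1, 1, 0])` returns 2/3 (two shared voxels out of three in each mask), and `pearson([1, 2, 3], [2, 4, 6])` returns 1.0 up to rounding.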
17. Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction
- Author
- Liu, Zhanwen, Li, Chao, Wang, Yang, Yang, Nan, Fan, Xing, Ma, Jiaqi, and Zhao, Xiangmo
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Motion prediction plays an essential role in autonomous driving systems, enabling autonomous vehicles to achieve more accurate local path planning and driving decisions based on predictions of the surrounding vehicles. However, existing methods neglect potential missing values caused by object occlusion, perception failures, etc., which inevitably degrade trajectory prediction performance in real traffic scenarios. To address this limitation, we propose a novel end-to-end framework for incomplete vehicle trajectory prediction, named Multi-scale Temporal Fusion Transformer (MTFT), which consists of the Multi-scale Attention Head (MAH) and the Continuity Representation-guided Multi-scale Fusion (CRMF) module. Specifically, the MAH leverages the multi-head attention mechanism to capture, in parallel, multi-scale motion representations of the trajectory at different temporal granularities, thus mitigating the adverse effect of missing values on prediction. Furthermore, the multi-scale motion representation is fed into the CRMF module for multi-scale fusion to obtain a robust temporal feature of the vehicle. During fusion, the continuity representation of vehicle motion is first extracted across time steps to guide the fusion, ensuring that the resulting temporal feature incorporates both detailed information and the overall trend of vehicle motion, which facilitates accurate decoding of a future trajectory consistent with the vehicle's motion trend. We evaluate the proposed model on four datasets derived from highway and urban traffic scenarios. The experimental results demonstrate its superior performance in the incomplete vehicle trajectory prediction task compared with state-of-the-art models, e.g., a comprehensive performance improvement of more than 39% on the HighD dataset.
- Published
- 2024
18. Scaler: Efficient and Effective Cross Flow Analysis
- Author
- Steven, Tang, Xiang, Mingcan, Wang, Yang, Wu, Bo, Chen, Jianjun, and Liu, Tongping
- Subjects
- Computer Science - Performance
- Abstract
Performance analysis is challenging because different components (e.g., different libraries and applications) of a complex system can interact with each other. However, few existing tools focus on understanding such interactions. To bridge this gap, we propose a novel analysis method, "Cross Flow Analysis (XFA)", that monitors the interactions/flows across these components. We also built the Scaler profiler, which provides a holistic view of the time spent on each component (e.g., library or application) and every API inside each component. This paper proposes multiple new techniques, such as Universal Shadow Table and Relation-Aware Data Folding. These techniques enable Scaler to achieve low runtime overhead, low memory overhead, and high profiling accuracy. In our extensive experiments, Scaler detected multiple previously unknown performance issues inside widely used applications, and it will therefore be a useful complement to existing work. The reproduction package, including the source code, benchmarks, and evaluation scripts, can be found at https://doi.org/10.5281/zenodo.13336658.
Comment: Paper has been accepted by ASE'24
- Published
- 2024
19. Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing
- Author
- Liu, Qianhui, Wang, Jiadong, Wang, Yang, Yang, Xin, Pan, Gang, and Li, Haizhou
- Subjects
- Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Humans naturally perform audiovisual speech recognition (AVSR), enhancing the accuracy and robustness by integrating auditory and visual information. Spiking neural networks (SNNs), which mimic the brain's information-processing mechanisms, are well-suited for emulating the human capability of AVSR. Despite their potential, research on SNNs for AVSR is scarce, with most existing audio-visual multimodal methods focused on object or digit recognition. These models simply integrate features from both modalities, neglecting their unique characteristics and interactions. Additionally, they often rely on future information for current processing, which increases recognition latency and limits real-time applicability. Inspired by human speech perception, this paper proposes a novel human-inspired SNN named HI-AVSNN for AVSR, incorporating three key characteristics: cueing interaction, causal processing and spike activity. For cueing interaction, we propose a visual-cued auditory attention module (VCA2M) that leverages visual cues to guide attention to auditory features. We achieve causal processing by aligning the SNN's temporal dimension with that of visual and auditory features and applying temporal masking to utilize only past and current information. To implement spike activity, in addition to using SNNs, we leverage the event camera to capture lip movement as spikes, mimicking the human retina and providing efficient visual data. We evaluate HI-AVSNN on an audiovisual speech recognition dataset combining the DVS-Lip dataset with its corresponding audio samples. Experimental results demonstrate the superiority of our proposed fusion method, outperforming existing audio-visual SNN fusion methods and achieving a 2.27% improvement in accuracy over the only existing SNN-based AVSR method.
- Published
- 2024
20. Cross-sectional imaging of speed-of-sound distribution using photoacoustic reversal beacons
- Author
- Wang, Yang, Wang, Danni, Zhong, Liting, Zhou, Yi, Wang, Qing, Chen, Wufan, and Qi, Li
- Subjects
- Physics - Medical Physics, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Photoacoustic tomography (PAT) enables non-invasive cross-sectional imaging of biological tissues, but it cannot map the spatial variation of speed of sound (SOS) within tissues. Because SOS is intimately linked to the density and elastic modulus of tissues, imaging the SOS distribution serves as a complementary imaging modality to PAT. Moreover, an accurate SOS map can be leveraged to correct PAT image degradation arising from acoustic heterogeneities. Herein, we propose a novel approach for SOS reconstruction using only the PAT imaging modality. Our method is based on photoacoustic reversal beacons (PRBs), which are small light-absorbing targets with strong photoacoustic contrast. We excite and scan a number of PRBs positioned at the periphery of the target, and the generated photoacoustic waves propagate through the target from various directions, thereby achieving spatial sampling of the internal SOS. We formulate a linear inverse model for pixel-wise SOS reconstruction and solve it with an iterative optimization technique. We validate the feasibility of the proposed method through simulations, phantoms, and ex vivo biological tissue tests. Experimental results demonstrate that our approach can achieve accurate reconstruction of the SOS distribution. Leveraging the obtained SOS map, we further demonstrate significantly enhanced PAT image reconstruction with acoustic correction.
- Published
- 2024
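A toy version of the linear inverse model described in this abstract can be sketched under a straight-ray assumption: each beacon-to-detector time of flight is the sum over pixels of path length times slowness (1/SOS), giving a linear system $L\,s = t$. The abstract does not name its iterative solver, so Kaczmarz (ART) iterations stand in here; the 2×2-pixel phantom, the ray set, and the SOS values are all hypothetical.

```python
import math


def kaczmarz(rows, rhs, x0, sweeps=500):
    """Kaczmarz (ART) iterations for the linear system rows @ x = rhs."""
    x = list(x0)
    for _ in range(sweeps):
        for a, b in zip(rows, rhs):
            norm2 = sum(v * v for v in a)
            residual = b - sum(ai * xi for ai, xi in zip(a, x))
            step = residual / norm2
            # Project x onto the hyperplane defined by this ray's equation.
            x = [xi + step * ai for xi, ai in zip(x, a)]
    return x


# Hypothetical 2x2-pixel phantom (pixel size h in mm), pixels ordered
# [top-left, top-right, bottom-left, bottom-right]. Each row holds the
# straight-ray path length (mm) of one beacon-to-detector ray in each pixel.
h = 10.0
d = h * math.sqrt(2.0)
rays = [
    [h, h, 0.0, 0.0],  # horizontal ray through the top row
    [0.0, 0.0, h, h],  # horizontal ray through the bottom row
    [h, 0.0, h, 0.0],  # vertical ray through the left column
    [0.0, h, 0.0, h],  # vertical ray through the right column
    [d, 0.0, 0.0, d],  # diagonal ray, top-left to bottom-right corner
]

true_sos = [1500.0, 1540.0, 1480.0, 1600.0]  # m/s (= mm/ms)
true_slowness = [1000.0 / c for c in true_sos]  # microseconds per mm
tof = [sum(l * s for l, s in zip(r, true_slowness)) for r in rays]  # microseconds

start = [1000.0 / 1500.0] * 4  # water-like initial guess
slowness = kaczmarz(rays, tof, start)
recovered_sos = [1000.0 / s for s in slowness]
```

The four row and column rays alone leave one direction undetermined (their sums coincide), and the diagonal ray removes that ambiguity, so the iterations recover all four SOS values. Real PRB data would involve far more rays and would typically need regularization and, beyond the straight-ray model, refraction handling.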
21. TripleMixer: A 3D Point Cloud Denoising Model for Adverse Weather
- Author
- Zhao, Xiongwei, Wen, Congcong, Wang, Yang, Bai, Haojie, and Dou, Wenhao
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
- Abstract
LiDAR sensors are crucial for providing high-resolution 3D point cloud data in autonomous driving systems, enabling precise environmental perception. However, real-world adverse weather conditions, such as rain, fog, and snow, introduce significant noise and interference, degrading the reliability of LiDAR data and the performance of downstream tasks like semantic segmentation. Existing datasets often suffer from limited weather diversity and small dataset sizes, which restrict their effectiveness in training models. Additionally, current deep learning denoising methods, while effective in certain scenarios, often lack interpretability, making it difficult to understand and validate their decision-making processes. To overcome these limitations, we introduce two large-scale datasets, Weather-KITTI and Weather-NuScenes, which cover three common adverse weather conditions: rain, fog, and snow. These datasets retain the original LiDAR acquisition information and provide point-level semantic labels for rain, fog, and snow. Furthermore, we propose a novel point cloud denoising model, TripleMixer, comprising three mixer layers: the Geometry Mixer Layer, the Frequency Mixer Layer, and the Channel Mixer Layer. These layers are designed to capture geometric spatial information, extract multi-scale frequency information, and enhance the multi-channel feature information of point clouds, respectively. Experiments conducted on the WADS dataset in real-world scenarios, as well as on our proposed Weather-KITTI and Weather-NuScenes datasets, demonstrate that our model achieves state-of-the-art denoising performance. Additionally, our experiments show that integrating the denoising model into existing segmentation frameworks enhances the performance of downstream tasks. The datasets and code will be made publicly available at https://github.com/Grandzxw/TripleMixer.
Comment: 15 pages, submitted to IEEE TIP
- Published
- 2024
22. Motion-driven quantum dissipation in an open electronic system with nonlocal interaction
- Author
- Wang, Yang, Zhang, Ruanjing, and Liu, Feiyi
- Subjects
- Quantum Physics, Condensed Matter - Other Condensed Matter
- Abstract
In this paper, we study excitations and dissipation in two infinite parallel metallic plates in relative motion. We model the electronic degrees of freedom in both plates using the 1+2 dimensional Dirac field and select a nonlocal potential to describe the interaction between the two plates. The internal relative motion is introduced via a Galilean boost, assuming one plate slides relative to the other. We then calculate the effective action of the system and derive the vacuum occupation number in momentum space using a perturbative method. The numerical plots show that, as a function of momentum, the vacuum occupation number is isotropic for a motion speed v = 0 and anisotropic for nonzero v. Owing to energy transfer between the plates, the relative motion induces on-shell excitations, similar to the dissipative process of the Schwinger effect. Therefore, we can study motion-induced dissipation effects and dissipative forces via the quantum action. The numerical results demonstrate that both the imaginary part of the quantum action for the motion boost and the dissipative force have a threshold as a function of v, and both are positively correlated with v.
- Published
- 2024
23. QMambaBSR: Burst Image Super-Resolution with Query State Space Model
- Author
- Di, Xin, Peng, Long, Xia, Peizhe, Li, Wenbo, Pei, Renjing, Cao, Yang, Wang, Yang, and Zha, Zheng-Jun
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In BurstSR, the key challenge lies in extracting sub-pixel details complementary to the base frame's content while simultaneously suppressing high-frequency noise disturbance. Existing methods attempt to extract sub-pixels by modeling inter-frame relationships frame by frame while overlooking the mutual correlations among multi-current frames and neglecting the intra-frame interactions, leading to inaccurate and noisy sub-pixels for base frame super-resolution. Further, existing methods mainly employ static upsampling with fixed parameters to improve spatial resolution for all scenes; they fail to perceive the sub-pixel distribution difference across multiple frames and cannot balance the fusion weights of different frames, resulting in over-smoothed details and artifacts. To address these limitations, we introduce a novel Query Mamba Burst Super-Resolution (QMambaBSR) network, which incorporates a Query State Space Model (QSSM) and an Adaptive Up-sampling module (AdaUp). Specifically, based on the observation that sub-pixels have a consistent spatial distribution while random noise is inconsistently distributed, a novel QSSM is proposed to efficiently extract sub-pixels through inter-frame querying and intra-frame scanning while mitigating noise interference in a single step. Moreover, AdaUp is designed to dynamically adjust the upsampling kernel based on the spatial distribution of multi-frame sub-pixel information in different burst scenes, thereby facilitating the reconstruction of the spatial arrangement of high-resolution details. Extensive experiments on four popular synthetic and real-world benchmarks demonstrate that our method achieves new state-of-the-art performance.
- Published
- 2024
24. Quantum key distribution based on mid-infrared and telecom band two-color entanglement source
- Author
-
Li, Wu-Zhen, Zhou, Chun, Wang, Yang, Chen, Li, Chen, Ren-Hui, Han, Zhao-Qi-Zhi, Gao, Ming-Yuan, Wang, Xiao-Hua, Zheng, Di-Yuan, Xie, Meng-Yu, Li, Yin-Hai, Zhou, Zhi-Yuan, Bao, Wan-Su, and Shi, Bao-Sen
- Subjects
Quantum Physics - Abstract
Due to the high noise caused by solar background radiation, existing satellite-based free-space quantum key distribution (QKD) experiments are mainly carried out at night, hindering the establishment of a practical, all-day, real-time global-scale quantum network. Because the 3-5 μm mid-infrared (MIR) band has extremely low solar background radiation and strong resistance to scattering, it is one of the ideal bands for free-space quantum communication. Here, we first report the preparation of a high-quality MIR (3370 nm) and telecom-band (1555 nm) two-color polarization-entangled photon source, and then use this source to realize a proof-of-principle QKD over free-space and fiber hybrid channels in a laboratory. Theoretical analysis shows that long-distance QKD over hybrid channels comprising 500 km of free space and 96 km of fiber can be achieved simultaneously. This work represents a significant step toward developing all-day global-scale quantum communication networks., Comment: 24 pages, 9 figures
- Published
- 2024
25. PointNCBW: Towards Dataset Ownership Verification for Point Clouds via Negative Clean-label Backdoor Watermark
- Author
-
Wei, Cheng, Wang, Yang, Gao, Kuofeng, Shao, Shuo, Li, Yiming, Wang, Zhibo, and Qin, Zhan
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Recently, point clouds have been widely used in computer vision, but collecting them is time-consuming and expensive. As such, point cloud datasets are valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially commercial or open-source ones that cannot be resold or used commercially without permission, we aim to identify whether a suspicious third-party model was trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, whose effectiveness is sensitive to the number of categories, our method can watermark samples from all classes instead of only from the target one. Accordingly, it preserves high effectiveness even on large-scale datasets with many classes. Specifically, we perturb selected point clouds from non-target categories in both shape-wise and point-wise manners before inserting trigger patterns, without changing their labels. The features of the perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset will exhibit a distinctive yet stealthy backdoor behavior, i.e., misclassifying samples from the target class whenever triggers appear, since the trained DNNs treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification method based on the proposed watermark. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential removal methods., Comment: 12 pages
- Published
- 2024
26. Early Risk Assessment Model for ICA Timing Strategy in Unstable Angina Patients Using Multi-Modal Machine Learning
- Author
-
Zheng, Candi, Liu, Kun, Wang, Yang, Chen, Shiyi, and Li, Hongli
- Subjects
Computer Science - Machine Learning - Abstract
Background: Invasive coronary arteriography (ICA) is recognized as the gold standard for diagnosing cardiovascular diseases, including unstable angina (UA). The challenge lies in determining the optimal timing for ICA in UA patients, balancing the need for revascularization in high-risk patients against the potential complications in low-risk ones. Unlike myocardial infarction, UA does not have specific indicators like ST-segment deviation or cardiac enzymes, making risk assessment complex. Objectives: Our study aims to enhance the early risk assessment for UA patients by utilizing machine learning algorithms. These algorithms can potentially identify patients who would benefit most from ICA by analyzing less specific yet related indicators that are challenging for human physicians to interpret. Methods: We collected data from 640 UA patients at Shanghai General Hospital, including medical history and electrocardiograms (ECG). Machine learning algorithms were trained using multi-modal demographic characteristics including clinical risk factors, symptoms, biomarker levels, and ECG features extracted by pre-trained neural networks. The goal was to stratify patients based on their revascularization risk. Additionally, we translated our models into applicable and explainable look-up tables through discretization for practical clinical use. Results: The study achieved an Area Under the Curve (AUC) of $0.719 \pm 0.065$ in risk stratification, significantly surpassing the widely adopted GRACE score's AUC of $0.579 \pm 0.044$. Conclusions: The results suggest that machine learning can provide superior risk stratification for UA patients. This improved stratification could help in balancing the risks, costs, and complications associated with ICA, indicating a potential shift in clinical assessment practices for unstable angina.
- Published
- 2024
27. CLIP-based Point Cloud Classification via Point Cloud to Image Translation
- Author
-
Ghose, Shuvozit, Li, Manyi, Qian, Yiming, and Wang, Yang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Point cloud understanding is an inherently challenging problem because of the sparse and unordered structure of point clouds in 3D space. Recently, a CLIP (Contrastive Language-Image Pre-training)-based point cloud classification model, PointCLIP, has opened a new direction in point cloud classification research. In this method, multi-view depth maps are first extracted from the point cloud and passed through the CLIP visual encoder. To transfer 3D knowledge to the network, a small network called an adapter is fine-tuned on top of the CLIP visual encoder. PointCLIP has two limitations. First, the point cloud depth maps lack image information, which is essential for tasks like classification and recognition. Second, the adapter relies only on the global representation of the multi-view features. Motivated by these observations, we propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that produces generalized colored images along with additional salient visual cues to the point cloud depth maps, so that it can achieve promising performance on point cloud classification and understanding. In addition, we propose a novel viewpoint adapter that combines the view feature processed by each viewpoint with the global intertwined knowledge that exists across the multi-view features. The experimental results demonstrate the superior performance of the proposed model over existing state-of-the-art CLIP-based models on the ModelNet10, ModelNet40, and ScanObjectNN datasets., Comment: Accepted by ICPR2024
- Published
- 2024
28. The magnon mediated plasmon friction: a functional integral approach
- Author
-
Wang, Yang, Zhang, Ruanjing, and Liu, Feiyi
- Subjects
Condensed Matter - Statistical Mechanics ,Condensed Matter - Other Condensed Matter - Abstract
In this paper, we discuss quantum friction in a system formed by two metallic surfaces separated by a ferromagnetic intermediate layer of a given thickness. The internal degrees of freedom in the two metallic surfaces are assumed to be plasmons, while the excitations in the intermediate material are magnons, so the model describes plasmons coupled to magnons. During relative sliding, one surface moves uniformly parallel to the other, giving rise to friction in the system. By calculating the effective action of the magnons, we determine the particle production probability, which is positively correlated with the sliding speed. Finally, we derive the frictional force of the system, with both theoretical and numerical results indicating that the friction, like the particle production probability, is also positively correlated with the speed.
- Published
- 2024
29. BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments
- Author
-
Tseng, Yu-Yun, Sharma, Tanusree, Zhang, Lotus, Stangl, Abigale, Findlater, Leah, Wang, Yang, and Gurari, Danna
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Individuals who are blind or have low vision (BLV) are at a heightened risk of sharing private information when they share photographs they have taken. To facilitate developing technologies that can help preserve privacy, we introduce BIV-Priv-Seg, the first localization dataset originating from people with visual impairments that shows private content. It contains 1,028 images with segmentation annotations for 16 private object categories. We first characterize BIV-Priv-Seg and then evaluate modern models' performance for locating private content in the dataset. We find that modern models struggle most with locating private objects that are non-salient, small, and lacking text, as well as with recognizing when private content is absent from an image. We facilitate future extensions by sharing our new dataset with the evaluation server at https://vizwiz.org/tasks-and-datasets/object-localization.
- Published
- 2024
30. TransFeat-TPP: An Interpretable Deep Covariate Temporal Point Processes
- Author
-
Meng, Zizhuo, Li, Boyu, Fan, Xuhui, Li, Zhidong, Wang, Yang, Chen, Fang, and Zhou, Feng
- Subjects
Computer Science - Machine Learning - Abstract
The classical temporal point process (TPP) constructs an intensity function by taking the occurrence times into account. However, occurrence time may not be the only relevant factor; other contextual data, termed covariates, may also affect the event evolution. Incorporating such covariates into the model is beneficial, and distinguishing their relevance to the event dynamics is of great practical significance. In this work, we propose a Transformer-based covariate temporal point process (TransFeat-TPP) model to improve the interpretability of deep covariate-TPPs while maintaining powerful expressiveness. TransFeat-TPP can effectively model complex relationships between events and covariates, and provides enhanced interpretability by discerning the importance of various covariates. Experimental results on synthetic and real datasets demonstrate improved prediction accuracy and consistently interpretable feature importance when compared to existing deep covariate-TPPs.
- Published
- 2024
31. Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams
- Author
-
Wang, Ziqiang, Chi, Zhixiang, Wu, Yanan, Gu, Li, Liu, Zhi, Plataniotis, Konstantinos, and Wang, Yang
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Given a model trained on source data, Test-Time Adaptation (TTA) enables adaptation and inference on test data streams whose domain is shifted from the source. Current methods predominantly optimize the model for each incoming test data batch using a self-training loss. While these methods yield commendable results in ideal test data streams, where batches are independently and identically sampled from the target distribution, they falter under more practical test data streams that are not independent and identically distributed (non-i.i.d.). The data batches in a non-i.i.d. stream display prominent label shifts relative to each other, which leads to conflicting optimization objectives among batches during the TTA process. Given the inherent risks of adapting the source model to unpredictable test-time distributions, we reverse the adaptation process and propose a novel Distribution Alignment loss for TTA. This loss guides the distributions of test-time features back towards the source distributions, which ensures compatibility with the well-trained source model and eliminates the pitfalls associated with conflicting optimization objectives. Moreover, we devise a domain shift detection mechanism to extend the success of our proposed TTA method to continual domain shift scenarios. Our extensive experiments validate the logic and efficacy of our method. On six benchmark datasets, we surpass existing methods in non-i.i.d. scenarios and maintain competitive performance under the ideal i.i.d. assumption., Comment: Accepted to ECCV 2024
- Published
- 2024
32. SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling
- Author
-
Wang, Huizheng, Fang, Jiahao, Tang, Xinru, Yue, Zhiheng, Li, Jinxi, Qin, Yubin, Guan, Sihan, Yang, Qize, Wang, Yang, Li, Chao, Hu, Yang, and Yin, Shouyi
- Subjects
Computer Science - Hardware Architecture - Abstract
Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The demand for high-throughput inference grows as large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to handle LTPP effectively, as they focus solely on optimizing individual stages, with most efforts confined to computational enhancements. By re-examining the end-to-end flow of dynamic sparse acceleration, we pinpoint a previously overlooked opportunity: LTPP can exploit the intrinsic coordination among stages to avoid excessive memory access and redundant computation. Motivated by this observation, we present SOFA, a cross-stage compute-memory efficient algorithm-hardware co-design tailored to effectively tackle the challenges posed by LTPP in Transformer inference. We first propose a novel leading-zero computing paradigm, which predicts attention sparsity using log-based add-only operations to avoid the significant overhead of prediction. Then, a distributed sorting and a sorted-updating FlashAttention mechanism are proposed with a cross-stage coordinated tiling principle, which enables fine-grained and lightweight coordination among stages, helping optimize memory access and latency. Further, we propose a SOFA accelerator to support these optimizations efficiently. Extensive experiments on 20 benchmarks show that SOFA achieves a $9.5\times$ speedup and $71.5\times$ higher energy efficiency than the Nvidia A100 GPU. Compared to 8 SOTA accelerators, SOFA achieves on average $15.8\times$ higher energy efficiency, $10.3\times$ higher area efficiency, and a $9.3\times$ speedup.
- Published
- 2024
33. Synthetic Data: Revisiting the Privacy-Utility Trade-off
- Author
-
Sarmin, Fatima Jahan, Sarkar, Atiquer Rahman, Wang, Yang, and Mohammed, Noman
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Machine Learning - Abstract
Synthetic data has been considered a better privacy-preserving alternative to traditionally sanitized data across various applications. However, a recent article challenges this notion, stating that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques, and that it leads to unpredictable utility loss and highly unpredictable privacy gain. The article also claims to have identified a breach in the differential privacy guarantees provided by PATEGAN and PrivBayes. When a study claims to refute or invalidate prior findings, it is crucial to verify and validate the study. In our work, we analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment, which limits the applicability of its findings to general cases. Our exploration also revealed that the game did not satisfy a crucial precondition concerning data distributions, which contributed to the perceived violation of the differential privacy guarantees offered by PATEGAN and PrivBayes. We also conducted a privacy-utility trade-off analysis in a more general and unconstrained environment. Our experimentation demonstrated that synthetic data achieves a more favorable privacy-utility trade-off compared to the provided implementation of k-anonymization, thereby reaffirming earlier conclusions.
- Published
- 2024
34. MSTF: Multiscale Transformer for Incomplete Trajectory Prediction
- Author
-
Liu, Zhanwen, Li, Chao, Yang, Nan, Wang, Yang, Ma, Jiaqi, Cheng, Guangliang, and Zhao, Xiangmo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assume complete observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such oversights inevitably compromise the accuracy of trajectory predictions. To tackle this challenge, we propose an end-to-end framework, termed Multiscale Transformer (MSTF), meticulously crafted for incomplete trajectory prediction. MSTF integrates a Multiscale Attention Head (MAH) and an Information Increment-based Pattern Adaptive (IIPA) module. Specifically, the MAH component concurrently captures multiscale motion representation of trajectory sequence from various temporal granularities, utilizing a multi-head attention mechanism. This approach facilitates the modeling of global dependencies in motion across different scales, thereby mitigating the adverse effects of missing values. Additionally, the IIPA module adaptively extracts continuity representation of motion across time steps by analyzing missing patterns in the data. The continuity representation delineates motion trend at a higher level, guiding MSTF to generate predictions consistent with motion continuity. We evaluate our proposed MSTF model using two large-scale real-world datasets. Experimental results demonstrate that MSTF surpasses state-of-the-art (SOTA) models in the task of incomplete trajectory prediction, showcasing its efficacy in addressing the challenges posed by missing values in motion forecasting for autonomous driving systems.
- Published
- 2024
35. Brain‐age prediction: Systematic evaluation of site effects, and sample age range and size
- Author
-
Yu, Yuetong, Cui, Hao‐Qi, Haas, Shalaila S, New, Faye, Sanford, Nicole, Yu, Kevin, Zhan, Denghuang, Yang, Guoyuan, Gao, Jia‐Hong, Wei, Dongtao, Qiu, Jiang, Banaj, Nerisa, Boomsma, Dorret I, Breier, Alan, Brodaty, Henry, Buckner, Randy L, Buitelaar, Jan K, Cannon, Dara M, Caseras, Xavier, Clark, Vincent P, Conrod, Patricia J, Crivello, Fabrice, Crone, Eveline A, Dannlowski, Udo, Davey, Christopher G, de Haan, Lieuwe, de Zubicaray, Greig I, Di Giorgio, Annabella, Fisch, Lukas, Fisher, Simon E, Franke, Barbara, Glahn, David C, Grotegerd, Dominik, Gruber, Oliver, Gur, Raquel E, Gur, Ruben C, Hahn, Tim, Harrison, Ben J, Hatton, Sean, Hickie, Ian B, Pol, Hilleke E Hulshoff, Jamieson, Alec J, Jernigan, Terry L, Jiang, Jiyang, Kalnin, Andrew J, Kang, Sim, Kochan, Nicole A, Kraus, Anna, Lagopoulos, Jim, Lazaro, Luisa, McDonald, Brenna C, McDonald, Colm, McMahon, Katie L, Mwangi, Benson, Piras, Fabrizio, Rodriguez‐Cruces, Raul, Royer, Jessica, Sachdev, Perminder S, Satterthwaite, Theodore D, Saykin, Andrew J, Schumann, Gunter, Sevaggi, Pierluigi, Smoller, Jordan W, Soares, Jair C, Spalletta, Gianfranco, Tamnes, Christian K, Trollor, Julian N, Ent, Dennis Van't, Vecchio, Daniela, Walter, Henrik, Wang, Yang, Weber, Bernd, Wen, Wei, Wierenga, Lara M, Williams, Steven CR, Wu, Mon‐Ju, Zunta‐Soares, Giovana B, Bernhardt, Boris, Thompson, Paul, Frangou, Sophia, Ge, Ruiyang, and ENIGMA‐Lifespan Working Group
- Subjects
Biological Psychology ,Psychology ,Aging ,Clinical Research ,Neurosciences ,Neurological ,Mental health ,Humans ,Adolescent ,Female ,Aged ,Adult ,Child ,Young Adult ,Male ,Brain ,Aged, 80 and over ,Child, Preschool ,Middle Aged ,Magnetic Resonance Imaging ,Neuroimaging ,Sample Size ,benchmarking ,brain aging ,brainAGE ,ENIGMA‐Lifespan Working Group ,Cognitive Sciences ,Experimental Psychology ,Biological psychology ,Cognitive and computational psychology - Abstract
Structural neuroimaging data have been used to compute an estimate of the biological age of the brain (brain-age) which has been associated with other biologically and behaviorally meaningful measures of brain development and aging. The ongoing research interest in brain-age has highlighted the need for robust and publicly available brain-age models pre-trained on data from large samples of healthy individuals. To address this need we have previously released a developmental brain-age model. Here we expand this work to develop, empirically validate, and disseminate a pre-trained brain-age model to cover most of the human lifespan. To achieve this, we selected the best-performing model after systematically examining the impact of seven site harmonization strategies, age range, and sample size on brain-age prediction in a discovery sample of brain morphometric measures from 35,683 healthy individuals (age range: 5-90 years; 53.59% female). The pre-trained models were tested for cross-dataset generalizability in an independent sample comprising 2101 healthy individuals (age range: 8-80 years; 55.35% female) and for longitudinal consistency in a further sample comprising 377 healthy individuals (age range: 9-25 years; 49.87% female). This empirical examination yielded the following findings: (1) the accuracy of age prediction from morphometry data was higher when no site harmonization was applied; (2) dividing the discovery sample into two age-bins (5-40 and 40-90 years) provided a better balance between model accuracy and explained age variance than other alternatives; (3) model accuracy for brain-age prediction plateaued at a sample size exceeding 1600 participants. These findings have been incorporated into CentileBrain (https://centilebrain.org/#/brainAGE2), an open-science, web-based platform for individualized neuroimaging metrics.
- Published
- 2024
36. 3D Lead‐Organoselenide‐Halide Perovskites and their Mixed‐Chalcogenide and Mixed‐Halide Alloys
- Author
-
Karunadasa, Hemamala, Li, Jiayi, Wang, Yang, Saha, Santanu, Chen, Zhihengyu, Hofmann, Jan, Misleh, Jason, Chapman, Karena W, Reimer, Jeffrey A, and Filip, Marina R
- Subjects
Inorganic Chemistry ,Macromolecular and Materials Chemistry ,Chemical Sciences ,Physical Chemistry ,Organic Chemistry ,Chemical sciences - Abstract
We incorporate Se into the 3D halide perovskite framework using the zwitterionic ligand SeCYS (⁺NH₃(CH₂)₂Se⁻), which occupies both the X⁻ and A⁺ sites in the prototypical ABX₃ perovskite. The new organoselenide‐halide perovskites (SeCYS)PbX₂ (X = Cl, Br) expand upon the recently discovered organosulfide‐halide perovskites. Single‐crystal X‐ray diffraction and pair distribution function analysis reveal the average structures of the organoselenide‐halide perovskites, whereas the local lead coordination environments and their distributions were probed through solid‐state ⁷⁷Se and ²⁰⁷Pb NMR, complemented by theoretical simulations. Density functional theory calculations illustrate that the band structures of (SeCYS)PbX₂ largely resemble those of their S analogs, with similar band dispersion patterns, yet with a considerable bandgap decrease. Optical absorbance measurements indeed show bandgaps of 2.07 and 1.86 eV for (SeCYS)PbX₂ with X = Cl and Br, respectively. We further demonstrate routes to alloying the halides (Cl, Br) and chalcogenides (S, Se), continuously tuning the bandgap from 1.86 to 2.31 eV, straddling the ideal range for tandem solar cells or visible‐light photocatalysis. The comprehensive description of the average and local structures, and how they can fine‐tune the bandgap and potential trap states, respectively, establishes the foundation for understanding this new perovskite family, which combines solid‐state and organo‐main‐group chemistry.
- Published
- 2024
37. 3D Lead‐Organoselenide‐Halide Perovskites and their Mixed‐Chalcogenide and Mixed‐Halide Alloys
- Author
-
Li, Jiayi, Wang, Yang, Saha, Santanu, Chen, Zhihengyu, Hofmann, Jan, Misleh, Jason, Chapman, Karena W, Reimer, Jeffrey A, Filip, Marina R, and Karunadasa, Hemamala I
- Subjects
Inorganic Chemistry ,Macromolecular and Materials Chemistry ,Chemical Sciences ,Physical Chemistry ,Chalcogenide ,Halide ,Organoselenide ,Perovskite ,Selenide ,Organic Chemistry ,Chemical sciences - Abstract
We incorporate Se into the 3D halide perovskite framework using the zwitterionic ligand SeCYS (⁺NH₃(CH₂)₂Se⁻), which occupies both the X⁻ and A⁺ sites in the prototypical ABX₃ perovskite. The new organoselenide-halide perovskites (SeCYS)PbX₂ (X = Cl, Br) expand upon the recently discovered organosulfide-halide perovskites. Single-crystal X-ray diffraction and pair distribution function analysis reveal the average structures of the organoselenide-halide perovskites, whereas the local lead coordination environments and their distributions were probed through solid-state ⁷⁷Se and ²⁰⁷Pb NMR, complemented by theoretical simulations. Density functional theory calculations illustrate that the band structures of (SeCYS)PbX₂ largely resemble those of their S analogs, with similar band dispersion patterns, yet with a considerable bandgap decrease. Optical absorbance measurements indeed show bandgaps of 2.07 and 1.86 eV for (SeCYS)PbX₂ with X = Cl and Br, respectively. We further demonstrate routes to alloying the halides (Cl, Br) and chalcogenides (S, Se), continuously tuning the bandgap from 1.86 to 2.31 eV, straddling the ideal range for tandem solar cells or visible-light photocatalysis. The comprehensive description of the average and local structures, and how they can fine-tune the bandgap and potential trap states, respectively, establishes the foundation for understanding this new perovskite family, which combines solid-state and organo-main-group chemistry.
- Published
- 2024
38. Nonuniqueness in Defining the Polarization: Nonlocal Surface Charges and the Electrostatic, Energetic, and Transport Perspectives
- Author
-
Sen, Shoham, Wang, Yang, Breitzman, Timothy, and Dayal, Kaushik
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Ionic crystals play a central role in functional applications. Mesoscale descriptions of these crystals are based on the continuum polarization density field to represent the effective physics of charge distribution at the scale of the atomic lattice. However, a long-standing difficulty is that the classical electrostatic definition of the macroscopic polarization, as the dipole or first moment of the charge density in a unit cell, is not unique. This unphysical non-uniqueness has been shown to arise from starting directly with an infinite system rather than starting with a finite body and taking appropriate limits. This limit process shows that the electrostatic description requires not only the bulk polarization density but also the surface charge density as effective macroscopic descriptors; that is, it requires a nonlocal effective description. Other approaches to resolving this difficulty include relating the change in polarization to the transport of charge, or defining the polarization as the energy conjugate to the electric field. This work examines the relation between the classical electrostatic definition of polarization and the transport and energy-conjugate definitions. We show the following: (1) The transport of charge does not correspond to the change in polarization in general; instead, one requires additional simplifying assumptions on the electrostatic definition of polarization for these approaches to give rise to the same macroscopic electric fields. Thus, the electrostatic definition encompasses the transport definition as a special case. (2) The energy-conjugate definition has both bulk and surface contributions; while traditional approaches neglect the surface contribution, we find that accounting for the nonlocal surface contributions is essential to obtain the correct macroscopic electric fields., Comment: Accepted to appear in Journal of the Mechanics and Physics of Solids
- Published
- 2024
39. Evaluating Quality of Answers for Retrieval-Augmented Generation: A Strong LLM Is All You Need
- Author
-
Wang, Yang, Hernandez, Alberto Garcia, Kyslyi, Roman, and Kersting, Nicholas
- Subjects
Computer Science - Computation and Language - Abstract
We present a comprehensive study of answer quality evaluation in Retrieval-Augmented Generation (RAG) applications using vRAG-Eval, a novel grading system designed to assess correctness, completeness, and honesty. We further map the grades for these quality aspects to a binary score indicating an accept or reject decision, mirroring the intuitive "thumbs-up" or "thumbs-down" gesture commonly used in chat applications. This approach suits factual business settings where a clear decision is essential. Our assessment applies vRAG-Eval to two Large Language Models (LLMs), evaluating the quality of answers generated by a vanilla RAG application. We compare these evaluations with human expert judgments and find substantial alignment between GPT-4's assessments and those of human experts, reaching 83% agreement on accept or reject decisions. This study highlights the potential of LLMs as reliable evaluators in closed-domain, closed-ended settings, particularly when human evaluations require significant resources., Comment: 13 pages, 8 figures, 12 tables
- Published
- 2024
40. Hybrid Precoding With Low-Resolution PSs for Wideband Terahertz Communication Systems in The Face of Beam Squint
- Author
-
Wang, Yang, Yang, Chuang, and Peng, Mugen
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Terahertz (THz) communication is considered one of the most critical technologies for 6G because of its abundant bandwidth. To compensate for the high propagation loss of THz signals, analog/digital hybrid precoding for THz massive multiple-input multiple-output (MIMO) has been proposed to focus signals and extend the communication range. Notably, considering hardware cost and power consumption, infinite- or high-resolution phase shifters (PSs) are difficult to implement in THz massive MIMO, and low-resolution PSs are typically adopted in practice. However, low-resolution PSs cause severe performance degradation. Moreover, beam squint in wideband THz massive MIMO aggravates this degradation because the analog PSs are frequency independent. Motivated by these factors, in this paper we first propose a heuristic algorithm for the fully connected (FC) structure that optimizes the digital precoder and the analog precoder alternately. We then migrate the proposed heuristic algorithm to the partially connected (PC) architecture. To further improve performance, we extend our design to dynamic subarrays, in which each RF chain can be connected to any antennas that are not shared with another RF chain. Numerical results demonstrate that our proposed wideband hybrid precoding with low-resolution PSs outperforms the compared schemes for both the FC and PC structures.
- Published
- 2024
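The abstract above does not say how the low-resolution phase shifters are modeled. A common model of a b-bit PS allows only the 2^b uniformly spaced phases 2πk/2^b, k = 0, ..., 2^b - 1. The sketch below assumes that model and projects each analog precoder weight onto the nearest allowed phase. It covers only this quantization step, not the authors' alternating FC/PC/dynamic-subarray algorithm, and it omits the amplitude normalization (for example 1/√N) that many formulations apply; the function names are hypothetical.

```python
import cmath
import math
from typing import List


def quantize_phase(w: complex, bits: int) -> complex:
    """Project one analog weight onto the nearest phase of an assumed b-bit PS.

    Assumed codebook: unit-modulus values exp(j * 2*pi*k / 2**bits).
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    levels = 2 ** bits
    step = 2 * math.pi / levels
    angle = cmath.phase(w) % (2 * math.pi)  # phase() is in (-pi, pi]; wrap to [0, 2*pi)
    k = round(angle / step) % levels        # nearest level; index `levels` wraps to 0
    return cmath.exp(1j * k * step)


def quantize_analog_precoder(F: List[List[complex]], bits: int) -> List[List[complex]]:
    """Element-wise quantization of an analog precoder given as rows of weights."""
    return [[quantize_phase(w, bits) for w in row] for row in F]
```

With 1-bit PSs, every weight collapses to +1 or -1, which gives a sense of why the abstract calls the degradation from low-resolution PSs severe.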
41. Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation
- Author
-
Lyu, Yuanjie, Niu, Zihan, Xie, Zheyong, Zhang, Chao, Xu, Tong, Wang, Yang, and Chen, Enhong
- Subjects
Computer Science - Computation and Language
- Abstract
Despite significant progress on various tasks, large language models (LLMs) often produce factual errors due to their limited internal knowledge. Retrieval-Augmented Generation (RAG), which enhances LLMs with external knowledge sources, offers a promising solution. However, these methods can be misled by irrelevant paragraphs in retrieved documents. Because of the inherent uncertainty in LLM generation, inputting the entire document may introduce off-topic information, causing the model to drift from the central topic and reducing the relevance of the generated content. To address these issues, we propose the Retrieve-Plan-Generation (RPG) framework. In the plan stage, RPG generates plan tokens to guide subsequent generation. In the answer stage, the model selects relevant fine-grained paragraphs based on the plan and uses them to generate the answer. This plan-answer process is repeated iteratively until completion, improving relevance by focusing on specific topics. To implement this framework efficiently, we use a simple but effective multi-task prompt-tuning method that enables existing LLMs to handle both planning and answering. We comprehensively compare RPG with baselines across 5 knowledge-intensive generation tasks, demonstrating the effectiveness of our approach.
- Published
- 2024
42. FC3DNet: A Fully Connected Encoder-Decoder for Efficient Demoiréing
- Author
-
Du, Zhibo, Peng, Long, Wang, Yang, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Moiré patterns are commonly seen when taking photos of screens. Camera devices usually have limited hardware performance but take high-resolution photos. However, users are sensitive to photo processing time, which poses an efficiency challenge for demoiréing methods that has rarely been considered. To balance network speed and result quality, we propose a Fully Connected enCoder-deCoder based Demoiréing Network (FC3DNet). FC3DNet uses features at multiple scales in each decoder stage to gather comprehensive information, including long-range patterns and various local moiré styles, both of which are crucial in demoiréing. In addition, to make full use of multiple features, we design a Multi-Feature Multi-Attention Fusion (MFMAF) module that weighs the importance of each feature and compresses them for efficiency. These designs enable our network to achieve performance comparable to state-of-the-art (SOTA) methods on real-world datasets while using only a fraction of the parameters, FLOPs, and runtime., Comment: Accepted by ICIP2024
- Published
- 2024
43. Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition
- Author
-
Wang, Yang, Mei, Haiyang, Bao, Qirui, Wei, Ziqi, Shou, Mike Zheng, Li, Haizhou, Dong, Bo, and Yang, Xin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing
- Abstract
We introduce a novel multimodal synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition tasks. This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network. The core strength of this approach is its ability to exploit the ample, coarser temporal cues found in conventional frames for effective emotion recognition. Consequently, our method adeptly interprets both temporal and spatial information from the conventional frame domain, eliminating the need for specialized sensing devices such as event-based cameras. The effectiveness of our approach is demonstrated on both existing single-eye emotion recognition datasets and our newly compiled one, where it outperforms existing state-of-the-art methods in both accuracy and efficiency., Comment: Accepted by IJCAI 2024
- Published
- 2024
44. Personalized Music Recommendation with a Heterogeneity-aware Deep Bayesian Network
- Author
-
Jing, Erkang, Liu, Yezheng, Chai, Yidong, Yu, Shuo, Liu, Longshun, Jiang, Yuanchun, and Wang, Yang
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Music recommender systems are crucial in music streaming platforms, providing users with music they would enjoy. Recent studies have shown that user emotions can affect users' music mood preferences. However, existing emotion-aware music recommender systems (EMRSs) explicitly or implicitly assume that users' actual emotional states are homogeneous whenever they are expressed by the same emotion word. They also assume that users' music mood preferences are homogeneous under the same emotional state. In this article, we propose four types of heterogeneity that an EMRS should consider: emotion heterogeneity across users, emotion heterogeneity within a user, music mood preference heterogeneity across users, and music mood preference heterogeneity within a user. We further propose a Heterogeneity-aware Deep Bayesian Network (HDBN) to model these four types of heterogeneity. The HDBN mimics a user's music-selection decision process with four components: personalized prior user emotion distribution modeling, posterior user emotion distribution modeling, user grouping, and Bayesian neural network-based music mood preference prediction. We constructed a large-scale dataset called EmoMusicLJ to validate our method. Extensive experiments demonstrate that our method significantly outperforms baseline approaches on the widely used HR and NDCG recommendation metrics. Ablation experiments and case studies further validate the effectiveness of our HDBN. The source code is available at https://github.com/jingrk/HDBN., Comment: 34 pages, 19 figures
- Published
- 2024
45. The multi-component fitting to the star formation histories in the TNG simulation
- Author
-
Wang, Yang, Dong, Chengxing, Ruan, Hengxin, Lin, Qiufan, Zhang, Yucheng, and Chen, Shupei
- Subjects
Astrophysics - Astrophysics of Galaxies
- Abstract
The star formation history (SFH) is a key aspect of galaxy evolution. In this work, we developed a model based on a mixture of Gaussian and gamma functions to fit SFHs with varying numbers of components. Our primary objective was to use this model to reveal the shape of SFHs and the corresponding physical driving factors. Specifically, we applied this model to fit SFHs from the TNG100-1 simulation. Our study led to the following findings: 1) our model fits TNG star formation histories well, especially for high-mass and red galaxies; 2) a clear relationship exists between the number and shape of fitted components and the mass and color of galaxies, with notable differences between central/isolated and satellite galaxies; 3) our model makes it easy to extract the individual star formation episodes within an SFH and to analyze the duration and timing of each episode. Our findings indicated a strong relationship between the timing of each star formation episode and galaxy mass and color.
- Published
- 2024
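The SFH model in entry 45 above describes the star formation rate as a sum of Gaussian and gamma-function components, each of which can be read as one star formation episode. A minimal sketch of evaluating such a mixture follows. The parameterizations below (a peak-normalized Gaussian and an unnormalized gamma shape t^(k-1) e^(-t/θ)) are assumptions for illustration, not the paper's exact definitions, and the fitting step (for example least squares with a choice of component number) is not shown.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class GaussianEpisode:
    """Symmetric star formation episode (assumed parameterization)."""
    amp: float    # peak star formation rate of the episode
    mu: float     # time of the peak
    sigma: float  # width of the episode (> 0)

    def __call__(self, t: float) -> float:
        return self.amp * math.exp(-0.5 * ((t - self.mu) / self.sigma) ** 2)


@dataclass
class GammaEpisode:
    """Right-skewed episode for k > 1 (assumed parameterization)."""
    amp: float    # normalization
    k: float      # shape parameter (> 0)
    theta: float  # scale parameter (> 0)

    def __call__(self, t: float) -> float:
        if t <= 0.0:
            return 0.0  # the gamma shape is only defined for t > 0
        return self.amp * t ** (self.k - 1.0) * math.exp(-t / self.theta)


Component = Union[GaussianEpisode, GammaEpisode]


def model_sfh(t: float, components: Sequence[Component]) -> float:
    """Star formation rate at time t as the sum of all components."""
    return sum(c(t) for c in components)
```

Because each component is a separate object, the duration and timing of an episode can be read directly from its parameters (for example `mu` and `sigma`), which mirrors how the abstract describes extracting individual episodes.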
46. Exploring Parent-Child Perceptions on Safety in Generative AI: Concerns, Mitigation Strategies, and Design Implications
- Author
-
Yu, Yaman, Sharma, Tanusree, Hu, Melinda, Wang, Justin, and Wang, Yang
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
The widespread use of Generative Artificial Intelligence (GAI) among teenagers has led to significant misuse and safety concerns. To identify risks and understand parental-control challenges, we conducted a content analysis of Reddit and interviewed 20 participants (seven teenagers and 13 parents). Our study reveals a significant gap in parental awareness of the extensive ways children use GAI, such as interacting with character-based chatbots for emotional support or engaging in virtual relationships. Parents and children report differing perceptions of the risks associated with GAI. Parents primarily express concerns about data collection, misinformation, and exposure to inappropriate content. In contrast, teenagers are more concerned about becoming addicted to virtual relationships with GAI, the potential misuse of GAI to spread harmful content in social groups, and the invasion of privacy through unauthorized use of their personal data in GAI applications. The absence of parental control features on GAI platforms forces parents to rely on system-built controls, manually check histories, share accounts, and engage in active mediation. Despite these efforts, parents struggle to grasp the full spectrum of GAI-related risks and to perform effective real-time monitoring, mediation, and education. We provide design recommendations to improve parent-child communication and enhance the safety of GAI use., Comment: 13 pages
- Published
- 2024
47. ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation
- Author
-
Wu, Yong, Wang, Yang, Qu, Sanqing, Li, Zhijun, and Chen, Guang
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited by interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person to fine-tune the model at test time. However, this can be unrealistic in real-world applications, since it is cumbersome for an end user to provide labeled images. In addition, previous work requires the training data to have both gaze labels and person IDs, which makes some of the available data unusable. To tackle these challenges, this paper proposes a new problem called efficient label-free user adaptation in gaze estimation. Our model needs only a few unlabeled images of a target user for adaptation. During offline training, we have some labeled source data without person IDs and some unlabeled person-specific data. Our proposed method uses a meta-learning approach to learn how to adapt to a new user with only a few unlabeled images. Our key technical innovation is to use a generalization bound from domain adaptation to define the loss function in meta-learning, so that our method can effectively use both the labeled source data and the unlabeled person-specific data during training. Extensive experiments validate the effectiveness of our method on several challenging benchmarks., Comment: This paper has been accepted by IJCAI'24
- Published
- 2024
48. LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
- Author
-
Guan, Wenhao, Wang, Kaidi, Zhou, Wangjin, Wang, Yang, Deng, Feng, Wang, Hui, Li, Lin, Hong, Qingyang, and Qin, Yong
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
Recently, diffusion models have driven significant progress in speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement, and their effectiveness comes at the cost of a large number of sampling steps, which extends the synthesis time needed to generate high-quality audio. Previous Text-to-Audio (TTA) methods mostly used diffusion models in the latent space for audio generation. In this paper, we explore integrating the Flow Matching (FM) model into the audio latent space for audio generation. FM is an alternative, simulation-free method that trains continuous normalizing flows (CNFs) by regressing vector fields. We demonstrate that our model significantly improves the quality of generated audio samples, achieving better performance than prior models. Moreover, it reduces the number of inference steps to ten with almost no loss in performance., Comment: Accepted at Interspeech2024
- Published
- 2024
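The LAFMA abstract above describes flow matching as training a CNF by regressing a vector field and then sampling with about ten inference steps. A minimal sketch of both pieces follows, assuming the straight-line conditional path x_t = (1 - t)·x0 + t·x1, which is one common flow matching choice; the path, latent autoencoder, and text conditioning that LAFMA actually uses are not reproduced, and scalars stand in for latent vectors. Function names are hypothetical.

```python
from typing import Callable, Tuple

# v(x, t): a velocity field; in practice a trained network, here any callable.
VelocityField = Callable[[float, float], float]


def fm_regression_target(x0: float, x1: float, t: float) -> Tuple[float, float]:
    """Point on the assumed straight path from noise x0 to data x1 at time t,
    and the velocity a model would be regressed onto at that point."""
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0  # constant velocity along a straight path
    return x_t, u_t


def euler_sample(v: VelocityField, x0: float, steps: int = 10) -> float:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    dt = 1.0 / steps
    x = x0
    for i in range(steps):
        x = x + dt * v(x, i * dt)  # v is never evaluated at t = 1
    return x
```

As a sanity check, for a toy "dataset" containing the single point 3.0 the exact field is v(x, t) = (3.0 - x) / (1 - t), and `euler_sample(lambda x, t: (3.0 - x) / (1.0 - t), -1.0)` lands on 3.0 (up to floating-point rounding) after ten steps, because Euler steps follow a straight path exactly. Real data distributions give curved fields, so a small step count is an approximation there.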
49. Towards Realistic Data Generation for Real-World Super-Resolution
- Author
-
Peng, Long, Li, Wenbo, Pei, Renjing, Ren, Jingjing, Fu, Xueyang, Wang, Yang, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Existing image super-resolution (SR) techniques often fail to generalize effectively in complex real-world settings due to the significant divergence between training data and practical scenarios. To address this challenge, previous efforts have either manually simulated intricate physics-based degradations or used learning-based techniques, yet these approaches remain inadequate for simultaneously producing large-scale, realistic, and diverse data. In this paper, we introduce a novel Realistic Decoupled Data Generator (RealDGen), an unsupervised learning data generation framework designed for real-world super-resolution. We meticulously develop content and degradation extraction strategies, which are integrated into a novel content-degradation decoupled diffusion model to create realistic low-resolution (LR) images from unpaired real LR and high-resolution (HR) images. Extensive experiments demonstrate that RealDGen excels at generating large-scale, high-quality paired data that mirrors real-world degradations, significantly improving the performance of popular SR models on various real-world benchmarks.
- Published
- 2024
50. Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment
- Author
-
Zhang, Chen, He, Qiang, Yuan, Zhou, Liu, Elvis S., Wang, Hong, Zhao, Jian, and Wang, Yang
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
- Abstract
Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Shūkai quantifies the state to enhance generalizability and introduces Heterogeneous League Training (HELT) to balance competence, generalizability, and training efficiency. Furthermore, Shūkai implements specific rewards to align the agent's behavior with human expectations. Shūkai's ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 13% of them. Additionally, HELT yields a 22% improvement in sample efficiency. Shūkai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills., Comment: Accepted at ICML 2024
- Published
- 2024