3,082,486 results on '"A. Hong"'
Search Results
2. BESTAnP: Bi-Step Efficient and Statistically Optimal Estimator for Acoustic-n-Point Problem
- Author
Sheng, Wenliang, Zhao, Hongxu, Chen, Lingpeng, Zeng, Guangyang, Shao, Yunling, Hong, Yuze, Yang, Chao, Hong, Ziyang, and Wu, Junfeng
- Subjects
Computer Science - Robotics
- Abstract
We consider the acoustic-n-point (AnP) problem, which estimates the pose of a 2D forward-looking sonar (FLS) according to n 3D-2D point correspondences. We explore the nature of the measured partial spherical coordinates and reveal their inherent relationships to translation and orientation. Based on this, we propose a bi-step efficient and statistically optimal AnP (BESTAnP) algorithm that decouples the estimation of translation and orientation. Specifically, in the first step, the translation estimation is formulated as the range-based localization problem based on distance-only measurements. In the second step, the rotation is estimated via eigendecomposition based on azimuth-only measurements and the estimated translation. BESTAnP is the first AnP algorithm that gives a closed-form solution for the full six-degree pose. In addition, we conduct bias elimination for BESTAnP so that it is statistically consistent. Through simulation and real-world experiments, we demonstrate that compared with the state-of-the-art (SOTA) methods, BESTAnP is over ten times faster and runs in real time on resource-constrained platforms while exhibiting comparable accuracy. Moreover, for the first time, we embed BESTAnP into a sonar-based odometry which shows its effectiveness for trajectory estimation.
- Published
- 2024
3. Approximate model for the coupling of far-field wavefront errors and jitter in space-based gravitational wave laser interferometry
- Author
Tao, Ya-Zheng, Gao, Rui-Hong, Jin, Hong-Bo, Hao, Zhen-Xiang, Jin, Gang, and Wu, Yue-Liang
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics
- Abstract
Space-based gravitational wave observatories, such as LISA, Taiji, and TianQin, employ long-baseline laser interferometry, necessitating displacement measurement sensitivity at 1 pm/$\sqrt{Hz}$ level. A significant challenge in achieving this precision is the coupling noise arising from far-field wavefront errors (WFE) and laser pointing jitter. This paper presents a comprehensive noise model that incorporates three critical factors: transmitted WFE, static pointing angle, and laser beam jitter. Utilizing the Nijboer-Zernike diffraction theory, we derive an approximate expression for far-field WFE, ensuring minimal error and efficient computational performance. The approximate expression has convincing physical interpretability and reveals how various Zernike aberrations and their coupling impact far-field WFE. Furthermore, the study identifies that correcting optical axis deviations induced by $Z_3^{\pm1}$ through beam tilt exacerbates far-field WFE, underscoring the necessity for active suppression of $Z_3^{\pm1}$. The proposed model facilitates detailed system simulations of the laser link, evaluates Tilt-to-Length (TTL) noise, and offers theoretical insights for system optimization., Comment: 25 pages, 13 figures
- Published
- 2024
4. YOLO-TS: Real-Time Traffic Sign Detection with Enhanced Accuracy Using Optimized Receptive Fields and Anchor-Free Fusion
- Author
Chen, Junzhou, Huang, Heqiang, Zhang, Ronghui, Lyu, Nengchao, Guo, Yanyong, Dai, Hong-Ning, and Yan, Hong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Ensuring safety in both autonomous driving and advanced driver-assistance systems (ADAS) depends critically on the efficient deployment of traffic sign recognition technology. While current methods show effectiveness, they often compromise between speed and accuracy. To address this issue, we present a novel real-time and efficient road sign detection network, YOLO-TS. This network significantly improves performance by optimizing the receptive fields of multi-scale feature maps to align more closely with the size distribution of traffic signs in various datasets. Moreover, our innovative feature-fusion strategy, leveraging the flexibility of Anchor-Free methods, allows for multi-scale object detection on a high-resolution feature map abundant in contextual information, achieving remarkable enhancements in both accuracy and speed. To mitigate the adverse effects of the grid pattern caused by dilated convolutions on the detection of smaller objects, we have devised a unique module that not only mitigates this grid effect but also widens the receptive field to encompass an extensive range of spatial contextual information, thus boosting the efficiency of information usage. Evaluation on challenging public datasets, TT100K and CCTSDB2021, demonstrates that YOLO-TS surpasses existing state-of-the-art methods in terms of both accuracy and speed. The code for our method will be available., Comment: 13 pages, 9 figures and 7 tables
- Published
- 2024
5. A single-phase epitaxially grown ferroelectric perovskite nitride
- Author
Choi, Songhee, Jin, Qiao, Zi, Xian, Rong, Dongke, Fang, Jie, Zhang, Jinfeng, Zhang, Qinghua, Li, Wei, Xu, Shuai, Chen, Shengru, Hong, Haitao, Ting, Cui, Wang, Qianying, Tang, Gang, Ge, Chen, Wang, Can, Chen, Zhiguo, Gu, Lin, Li, Qian, Wang, Lingfei, Wang, Shanmin, Hong, Jiawang, Jin, Kuijuan, and Guo, Er-Jia
- Subjects
Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
The integration of ferroelectrics with semiconductors is crucial for developing functional devices, such as field-effect transistors, tunnel junctions, and nonvolatile memories. However, the synthesis of high-quality single-crystalline ferroelectric nitride perovskites has been limited, hindering a comprehensive understanding of their switching dynamics and potential applications. Here we report the synthesis and characterizations of epitaxial single-phase ferroelectric cerium tantalum nitride (CeTaN3) on both oxides and semiconductors. The polar symmetry of CeTaN3 was confirmed by observing the atomic displacement of central ions relative to the center of the TaN6 octahedra, as well as through optical second harmonic generation. We observed switchable ferroelectric domains in CeTaN3 films using piezo-response force microscopy, complemented by the characterization of square-like polarization-electric field hysteresis loops. The remanent polarization of CeTaN3 reaches approximately 20 uC/cm2 at room temperature, consistent with theoretical calculations. This work establishes a vital link between ferroelectric nitride perovskites and their practical applications, paving the way for next-generation information and energy-storage devices with enhanced performance, scalability, and manufacturability., Comment: 47 pages, 4 figures
- Published
- 2024
6. QCD sum rule analysis of $0^{+}$ fourquarks
- Author
Li, Shuang-Hong, Chen, Ze-Sheng, Chen, Yi-Xin, and Jin, Hong-Ying
- Subjects
High Energy Physics - Phenomenology
- Abstract
We present a comprehensive QCD sum rules analysis for all types of light $J^P=0^{+}$ four-quark states at next-to-leading order. Most of them have masses around $1-2\text{GeV}$ and can be interpreted as the $0^+$ mesons observed in experiments. We find a category of four-quark nonets with masses $\lesssim1\text{GeV}$, potentially corresponding to the light $0^+$ mesons $f_0(500)$, $K^*_0(700)$, $f_0(980)$ and $a_0(980)$. Additionally, the 27-fold states may also exist, which are heavier than the $0^+$ nonets. The main uncertainty in the results arises from the factorization deviation of the multi-quark condensates. We also find that the factorization of dimension-8 condensates involves an ambiguity larger than $O(1/N_C^2)$. As a byproduct, a simple trick for renormalizing multi-quark operators at one-loop level is proposed in this paper., Comment: 56 pages, 40 figures, 8 tables
- Published
- 2024
7. Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models
- Author
Li, Kaican, Xie, Weiyan, Huang, Yongxiang, Deng, Didan, Hong, Lanqing, Li, Zhenguo, Silva, Ricardo, and Zhang, Nevin L.
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Fine-tuning foundation models often compromises their robustness to distribution shifts. To remedy this, most robust fine-tuning methods aim to preserve the pre-trained features. However, not all pre-trained features are robust and those methods are largely indifferent to which ones to preserve. We propose dual risk minimization (DRM), which combines empirical risk minimization with worst-case risk minimization, to better preserve the core features of downstream tasks. In particular, we utilize core-feature descriptions generated by LLMs to induce core-based zero-shot predictions which then serve as proxies to estimate the worst-case risk. DRM balances two crucial aspects of model robustness: expected performance and worst-case performance, establishing a new state of the art on various real-world benchmarks. DRM significantly improves the out-of-distribution performance of CLIP ViT-L/14@336 on ImageNet (75.9 to 77.1), WILDS-iWildCam (47.1 to 51.8), and WILDS-FMoW (50.7 to 53.1); opening up new avenues for robust fine-tuning. Our code is available at https://github.com/vaynexie/DRM ., Comment: NeurIPS 2024
- Published
- 2024
8. SoK: Detection and Repair of Accessibility Issues
- Author
Nie, Liming, Liu, Hao, Sun, Jing, Said, Kabir Sulaiman, Hong, Shanshan, Xue, Lei, Wei, Zhiyuan, Zhao, Yangyang, and Li, Meng
- Subjects
Computer Science - Software Engineering, Computer Science - Human-Computer Interaction
- Abstract
There is an increasing global emphasis on information accessibility, with numerous researchers actively developing automated tools to detect and repair accessibility issues, thereby ensuring that individuals with diverse abilities can independently access software products and services. However, current research still encounters significant challenges in two key areas: the absence of a comprehensive taxonomy of accessibility issue types, and the lack of comprehensive analysis of the capabilities of detection and repair tools, as well as the status of corresponding datasets. To address these challenges, this paper introduces the Accessibility Issue Analysis (AIA) framework. Utilizing this framework, we develop a comprehensive taxonomy that categorizes 55 types of accessibility issues across four pivotal dimensions: Perceivability, Operability, Understandability, and Robustness. This taxonomy has been rigorously recognized through a questionnaire survey (n=130). Building on this taxonomy, we conduct an in-depth analysis of existing detection and repair tools, as well as the status of corresponding datasets. In terms of tools, our findings indicate that 14 detection tools can identify 31 issue types, achieving a 56.3% rate (31/55). Meanwhile, 9 repair tools address just 13 issue types, with a 23.6% rate. In terms of datasets, those for detection tools cover 21 issue types, at a 38.1% coverage rate, whereas those for repair tools cover only 7 types, at a 12.7% coverage rate., Comment: 16 pages, 3 figures
- Published
- 2024
9. Inverse Design of Mechanical Metamaterials Using a Point-Cloud-Based Deep Generative Model
- Author
Hong, Seungwook, Kim, Kijung, Jung, Wonjun, Kim, Wooseok, Kim, Namjung, and Lee, Howon
- Subjects
Condensed Matter - Soft Condensed Matter
- Abstract
Mechanical metamaterials have garnered attention for their engineered mechanical properties, enabling control over specific behaviors. Advances in additive manufacturing have expanded design freedom for complex metamaterials. However, the design process remains challenging due to the vast design space and numerous parameters. While artificial intelligence (AI) aids design, current approaches are often restricted to predefined, parameterized structures. This study introduces a parameter-free design strategy for 3D mechanical metamaterials using point-cloud-based deep generative networks. A library of widely known metamaterial structures was constructed to train the machine learning model. The trained latent space forms clusters of unit cell topologies with similar properties, enabling efficient exploration and smooth interpolation. Additionally, mechanical properties can be predicted faster than with conventional methods. This approach created metamaterials with targeted properties, unrestricted by parameterized constraints. Computational and experimental validations confirmed alignment with desired properties within acceptable error margins. We believe this work significantly enhances design flexibility in AI-driven metamaterials, expanding their potential applications across various fields.
- Published
- 2024
10. Measurement of the Inclusive Cross Sections of Prompt $J/\psi$ and $\psi(3686)$ Production in $e^{+}e^{-}$ Annihilation from $\sqrt{s}=3.808$ to $4.951$ GeV
- Author
BESIII Collaboration, Ablikim, M., Achasov, M. N., Adlarson, P., Ai, X. C., Aliberti, R., Amoroso, A., An, M. R., An, Q., Bai, Y., Bakina, O., Balossino, I., Ban, Y., Batozskaya, V., Begzsuren, K., Berger, N., Berlowski, M., Bertani, M., Bettoni, D., Bianchi, F., Bianco, E., Bortone, A., Boyko, I., Briere, R. A., Brueggemann, A., Cai, H., Cai, X., Calcaterra, A., Cao, G. F., Cao, N., Cetin, S. A., Chang, J. F., Chang, T. T., Chang, W. L., Che, G. R., Chelkov, G., Chen, C., Chen, Chao, Chen, G., Chen, H. S., Chen, M. L., Chen, S. J., Chen, S. M., Chen, T., Chen, X. R., Chen, X. T., Chen, Y. B., Chen, Y. Q., Chen, Z. J., Cheng, W. S., Choi, S. K., Chu, X., Cibinetto, G., Coen, S. C., Cossio, F., Cui, J. J., Dai, H. L., Dai, J. P., Dbeyssi, A., de Boer, R. E., Dedovich, D., Deng, Z. Y., Denig, A., Denysenko, I., Destefanis, M., De Mori, F., Ding, B., Ding, X. X., Ding, Y., Dong, J., Dong, L. Y., Dong, M. Y., Dong, X., Du, M. C., Du, S. X., Duan, Z. H., Egorov, P., Fan, Y. H. Y., Fan, Y. L., Fang, J., Fang, S. S., Fang, W. X., Fang, Y., Farinelli, R., Fava, L., Feldbauer, F., Felici, G., Feng, C. Q., Feng, J. H., Fischer, K, Fritsch, M., Fritzsch, C., Fu, C. D., Fu, J. L., Fu, Y. W., Gao, H., Gao, Y. N., Gao, Yang, Garbolino, S., Garzia, I., Ge, P. T., Ge, Z. W., Geng, C., Gersabeck, E. M., Gilman, A, Goetzen, K., Gong, L., Gong, W. X., Gradl, W., Gramigna, S., Greco, M., Gu, M. H., Guan, C. Y, Guan, Z. L., Guo, A. Q., Guo, L. B., Guo, M. J., Guo, R. P., Guo, Y. P., Guskov, A., Han, T. T., Han, W. Y., Hao, X. Q., Harris, F. A., He, K. K., He, K. L., Heinsius, F. H H., Heinz, C. H., Heng, Y. K., Herold, C., Holtmann, T., Hong, P. C., Hou, G. Y., Hou, X. T., Hou, Y. R., Hou, Z. L., Hu, H. M., Hu, J. F., Hu, T., Hu, Y., Huang, G. S., Huang, K. X., Huang, L. Q., Huang, X. T., Huang, Y. P., Hussain, T., Hüsken, N, Imoehl, W., Jackson, J., Jaeger, S., Janchiv, S., Jeong, J. H., Ji, Q., Ji, Q. P., Ji, X. B., Ji, X. L., Ji, Y. Y., Jia, X. Q., Jia, Z. K., Jiang, H. 
J., Jiang, P. C., Jiang, S. S., Jiang, T. J., Jiang, X. S., Jiang, Y., Jiao, J. B., Jiao, Z., Jin, S., Jin, Y., Jing, M. Q., Johansson, T., K., X., Kabana, S., Kalantar-Nayestanaki, N., Kang, X. L., Kang, X. S., Kavatsyuk, M., Ke, B. C., Khoukaz, A., Kiuchi, R., Kliemt, R., Kolcu, O. B., Kopf, B., Kuessner, M., Kupsc, A., Kühn, W., Lane, J. J., Larin, P., Lavania, A., Lavezzi, L., Lei, T. T., Lei, Z. H., Leithoff, H., Lellmann, M., Lenz, T., Li, C., Li, C. H., Li, Cheng, Li, D. M., Li, F., Li, G., Li, H., Li, H. B., Li, H. J., Li, H. N., Li, Hui, Li, J. R., Li, J. S., Li, J. W., Li, K. L., Li, Ke, Li, L. J, Li, L. K., Li, Lei, Li, M. H., Li, P. R., Li, Q. X., Li, S. X., Li, T., Li, W. D., Li, W. G., Li, X. H., Li, X. L., Li, Xiaoyu, Li, Y. G., Li, Z. J., Liang, C., Liang, H., Liang, Y. F., Liang, Y. T., Liao, G. R., Liao, L. Z., Liao, Y. P., Libby, J., Limphirat, A., Lin, D. X., Lin, T., Liu, B. J., Liu, B. X., Liu, C., Liu, C. X., Liu, F. H., Liu, Fang, Liu, Feng, Liu, G. M., Liu, H., Liu, H. M., Liu, Huanhuan, Liu, Huihui, Liu, J. B., Liu, J. L., Liu, J. Y., Liu, K., Liu, K. Y., Liu, Ke, Liu, L., Liu, L. C., Liu, Lu, Liu, M. H., Liu, P. L., Liu, Q., Liu, S. B., Liu, T., Liu, W. K., Liu, W. M., Liu, X., Liu, Y., Liu, Y. B., Liu, Z. A., Liu, Z. Q., Lou, X. C., Lu, F. X., Lu, H. J., Lu, J. G., Lu, X. L., Lu, Y., Lu, Y. P., Lu, Z. H., Luo, C. L., Luo, M. X., Luo, T., Luo, X. L., Lyu, X. R., Lyu, Y. F., Ma, F. C., Ma, H. L., Ma, J. L., Ma, L. L., Ma, M. M., Ma, Q. M., Ma, R. Q., Ma, R. T., Ma, X. Y., Ma, Y., Ma, Y. M., Maas, F. E., Maggiora, M., Malde, S., Malik, Q. A., Mangoni, A., Mao, Y. J., Mao, Z. P., Marcello, S., Meng, Z. X., Messchendorp, J. G., Mezzadri, G., Miao, H., Min, T. J., Mitchell, R. E., Mo, X. H., Muchnoi, N. Yu., Muskalla, J., Nefedov, Y., Nerling, F., Nikolaev, I. B., Ning, Z., Nisar, S., Niu, W. D., Niu, Y., Olsen, S. L., Ouyang, Q., Pacetti, S., Pan, X., Pan, Y., Pathak, A., Patteri, P., Pei, Y. P., Pelizaeus, M., Peng, H. 
P., Peters, K., Ping, J. L., Ping, R. G., Plura, S., Pogodin, S., Prasad, V., Qi, F. Z., Qi, H., Qi, H. R., Qi, M., Qi, T. Y., Qian, S., Qian, W. B., Qiao, C. F., Qin, J. J., Qin, L. Q., Qin, X. P., Qin, X. S., Qin, Z. H., Qiu, J. F., Qu, S. Q., Redmer, C. F., Ren, K. J., Rivetti, A., Rolo, M., Rong, G., Rosner, Ch., Ruan, S. N., Salone, N., Sarantsev, A., Schelhaas, Y., Schoenning, K., Scodeggio, M., Shan, K. Y., Shan, W., Shan, X. Y., Shangguan, J. F., Shao, L. G., Shao, M., Shen, C. P., Shen, H. F., Shen, W. H., Shen, X. Y., Shi, B. A., Shi, H. C., Shi, J. L., Shi, J. Y., Shi, Q. Q., Shi, R. S., Shi, X., Song, J. J., Song, T. Z., Song, W. M., Song, Y. J., Song, Y. X., Sosio, S., Spataro, S., Stieler, F., Su, Y. J., Sun, G. B., Sun, G. X., Sun, H., Sun, H. K., Sun, J. F., Sun, K., Sun, L., Sun, S. S., Sun, T., Sun, W. Y., Sun, Y., Sun, Y. J., Sun, Y. Z., Sun, Z. T., Tan, Y. X., Tang, C. J., Tang, G. Y., Tang, J., Tang, Y. A., Tao, L. Y, Tao, Q. T., Tat, M., Teng, J. X., Thoren, V., Tian, W. H., Tian, Y., Tian, Z. F., Uman, I., Wang, S. J., Wang, B., Wang, B. L., Wang, Bo, Wang, C. W., Wang, D. Y., Wang, F., Wang, H. J., Wang, H. P., Wang, J. P., Wang, K., Wang, L. L., Wang, M., Wang, Meng, Wang, S., Wang, T., Wang, T. J., Wang, W., Wang, W. P., Wang, X., Wang, X. F., Wang, X. J., Wang, X. L., Wang, Y., Wang, Y. D., Wang, Y. F., Wang, Y. H., Wang, Y. N., Wang, Y. Q., Wang, Yaqian, Wang, Yi, Wang, Z., Wang, Z. L., Wang, Z. Y., Wang, Ziyi, Wei, D., Wei, D. H., Weidner, F., Wen, S. P., Wenzel, C. W., Wiedner, U., Wilkinson, G., Wolke, M., Wollenberg, L., Wu, C., Wu, J. F., Wu, L. H., Wu, L. J., Wu, X., Wu, X. H., Wu, Y., Wu, Y. H., Wu, Y. J., Wu, Z., Xia, L., Xian, X. M., Xiang, T., Xiao, D., Xiao, G. Y., Xiao, S. Y., Xiao, Y. L., Xiao, Z. J., Xie, C., Xie, X. H., Xie, Y., Xie, Y. G., Xie, Y. H., Xie, Z. P., Xing, T. Y., Xu, C. F., Xu, C. J., Xu, G. F., Xu, H. Y., Xu, Q. J., Xu, Q. N., Xu, W., Xu, W. L., Xu, X. P., Xu, Y. C., Xu, Z. P., Xu, Z. 
S., Yan, F., Yan, L., Yan, W. B., Yan, W. C., Yan, X. Q., Yang, H. J., Yang, H. L., Yang, H. X., Yang, Tao, Yang, Y., Yang, Y. F., Yang, Y. X., Yang, Yifan, Yang, Z. W., Yao, Z. P., Ye, M., Ye, M. H., Yin, J. H., You, Z. Y., Yu, B. X., Yu, C. X., Yu, G., Yu, J. S., Yu, T., Yu, X. D., Yuan, C. Z., Yuan, L., Yuan, S. C., Yuan, X. Q., Yuan, Y., Yuan, Z. Y., Yue, C. X., Zafar, A. A., Zeng, F. R., Zeng, X., Zeng, Y., Zeng, Y. J., Zhai, X. Y., Zhai, Y. C., Zhan, Y. H., Zhang, A. Q., Zhang, B. L., Zhang, B. X., Zhang, D. H., Zhang, G. Y., Zhang, H., Zhang, H. H., Zhang, H. Q., Zhang, H. Y., Zhang, J., Zhang, J. J., Zhang, J. L., Zhang, J. Q., Zhang, J. W., Zhang, J. X., Zhang, J. Y., Zhang, J. Z., Zhang, Jianyu, Zhang, Jiawei, Zhang, L. M., Zhang, L. Q., Zhang, Lei, Zhang, P., Zhang, Q. Y., Zhang, Shuihan, Zhang, Shulei, Zhang, X. D., Zhang, X. M., Zhang, X. Y., Zhang, Xuyan, Zhang, Y., Zhang, Y. T., Zhang, Y. H., Zhang, Yan, Zhang, Yao, Zhang, Z. H., Zhang, Z. L., Zhang, Z. Y., Zhao, G., Zhao, J., Zhao, J. Y., Zhao, J. Z., Zhao, Lei, Zhao, Ling, Zhao, M. G., Zhao, S. J., Zhao, Y. B., Zhao, Y. X., Zhao, Z. G., Zhemchugov, A., Zheng, B., Zheng, J. P., Zheng, W. J., Zheng, Y. H., Zhong, B., Zhong, X., Zhou, H., Zhou, L. P., Zhou, X., Zhou, X. K., Zhou, X. R., Zhou, X. Y., Zhou, Y. Z., Zhu, J., Zhu, K., Zhu, K. J., Zhu, L., Zhu, L. X., Zhu, S. H., Zhu, S. Q., Zhu, T. J., Zhu, W. J., Zhu, Y. C., Zhu, Z. A., Zou, J. H., and Zu, J.
- Subjects
High Energy Physics - Experiment
- Abstract
The inclusive cross sections of prompt $J/\psi$ and $\psi(3686)$ production are measured at center-of-mass energies from 3.808 to 4.951 GeV. The dataset used is 22 fb$^{-1}$ of $e^{+}e^{-}$ annihilation data collected with the BESIII detector operating at the BEPCII storage ring. The results obtained are in agreement with the previous BESIII measurements of exclusive $J/\psi$ and $\psi(3686)$ production. The average values obtained for the cross sections measured in the center-of-mass energy ranges from 4.527 to 4.951 GeV for $J/\psi$ and from 4.843 to 4.951 GeV for $\psi(3686)$, where the impact of known resonances is negligible, are $14.0\pm1.7\pm3.1$ pb and $15.3\pm3.0$ pb, respectively. For $J/\psi$, the first and the second uncertainties are statistical and systematic, respectively. For $\psi(3686)$, the uncertainty is total. These values are useful for testing charmonium production models., Comment: 20 pages, 6 figures
- Published
- 2024
11. Massless Dirac equation on spinor bundles over real hyperbolic spaces
- Author
Meng, Long, Zhang, Hong-Wei, and Zhang, Junyong
- Subjects
Mathematics - Analysis of PDEs, 35Q41, 35L05, 35R01
- Abstract
We prove a sharp-in-time dispersive estimate of the Dirac equation on spinor bundles over the real hyperbolic space. Compared with the Euclidean counterparts, our result shows that the dispersive estimate differs between short and long times, reflecting the intuitive influence of negative curvature on the dispersion. Moreover, the well-known equivalence between dispersive estimates for Dirac and wave propagators in the Euclidean setting no longer holds in this context. This finding suggests that spinor fields are affected by the geometry at infinity of the manifold. As a key application, we establish an improved global-in-time Strichartz estimate, in the sense that there is no loss of angular derivatives and the admissible set is larger than previously known results in other settings.
- Published
- 2024
12. Tortho-Gaussian: Splatting True Digital Orthophoto Maps
- Author
Wang, Xin, Zhang, Wendi, Xie, Hong, Ai, Haibin, Yuan, Qiangqiang, and Zhan, Zongqian
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
True Digital Orthophoto Maps (TDOMs) are essential products for digital twins and Geographic Information Systems (GIS). Traditionally, TDOM generation involves a complex set of photogrammetric processes, whose results may deteriorate due to various challenges, including an inaccurate Digital Surface Model (DSM), degenerated occlusion detections, and visual artifacts in weak texture regions and reflective surfaces. To address these challenges, we introduce TOrtho-Gaussian, a novel method inspired by 3D Gaussian Splatting (3DGS) that generates TDOMs through orthogonal splatting of optimized anisotropic Gaussian kernels. More specifically, we first simplify the orthophoto generation by orthographically splatting the Gaussian kernels onto 2D image planes, formulating a geometrically elegant solution that avoids the need for explicit DSM and occlusion detection. Second, to produce TDOMs of large-scale areas, a divide-and-conquer strategy is adopted to optimize memory usage and time efficiency of training and rendering for 3DGS. Lastly, we design a fully anisotropic Gaussian kernel that adapts to the varying characteristics of different regions, particularly improving the rendering quality of reflective surfaces and slender structures. Extensive experimental evaluations demonstrate that our method outperforms existing commercial software in several aspects, including the accuracy of building boundaries and the visual quality of low-texture regions and building facades. These results underscore the potential of our approach for large-scale urban scene reconstruction, offering a robust alternative for enhancing TDOM quality and scalability., Comment: This work has been submitted to the IEEE Transactions on Geoscience and Remote Sensing for possible publication
- Published
- 2024
13. DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
- Author
Cho, Jungbin, Kim, Junwan, Kim, Jisoo, Kim, Minseo, Kang, Mingu, Hong, Sungeun, Oh, Tae-Hyun, and Yu, Youngjae
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Human motion, inherently continuous and dynamic, presents significant challenges for generative models. Despite their dominance, discrete quantization methods, such as VQ-VAEs, suffer from inherent limitations, including restricted expressiveness and frame-wise noise artifacts. Continuous approaches, while producing smoother and more natural motions, often falter due to high-dimensional complexity and limited training data. To resolve this "discord" between discrete and continuous representations, we introduce DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, a novel method that decodes discrete motion tokens into continuous motion through rectified flow. By employing an iterative refinement process in the continuous space, DisCoRD captures fine-grained dynamics and ensures smoother and more natural motions. Compatible with any discrete-based framework, our method enhances naturalness without compromising faithfulness to the conditioning signals. Extensive evaluations demonstrate that DisCoRD achieves state-of-the-art performance, with FID of 0.032 on HumanML3D and 0.169 on KIT-ML. These results solidify DisCoRD as a robust solution for bridging the divide between discrete efficiency and continuous realism. Our project page is available at: https://whwjdqls.github.io/discord.github.io/., Comment: 20 pages 18 figures
- Published
- 2024
14. ATOMS: ALMA Three-millimeter Observations of Massive Star-forming regions -- XIX. The origin of SiO emission
- Author
Liu, Rong, Liu, Tie, Jiménez-Serra, Izaskun, Li, Jin-Zeng, Martín-Pintado, Jesús, Liu, Xunchuan, Lee, Chang Won, Sanhueza, Patricio, Chibueze, James O., Rivilla, Víctor M., Juvela, Mika, Colzi, Laura, Bronfman, Leonardo, Liu, Hong-Li, Sanz-Novo, Miguel, López-Gallifa, Álvaro, Li, Shanghuo, Megías, Andrés, Andrés, David San, Garay, Guido, Hwang, Jihye, Zhou, Jianwen, Xu, Fengwei, Martínez-Henares, Antonio, Saha, Anindya, and Nazeer, Hafiz
- Subjects
Astrophysics - Astrophysics of Galaxies
- Abstract
The production of silicon monoxide (SiO) can be considered as a fingerprint of shock interaction. In this work, we use high-sensitivity observations of the SiO (2-1) and H$^{13}$CO$^{+}$ (1-0) emission to investigate the broad and narrow SiO emission toward 146 massive star-forming regions in the ATOMS survey. We detected SiO emission in 136 regions and distinguished broad and narrow components across the extension of 118 sources (including 58 UC $H_{II}$ regions) with an average angular resolution of 2.5$^{\prime\prime}$. The derived SiO luminosity ($L_{SiO}$) across the whole sample shows that the majority of $L_{SiO}$ (above 66$\%$) can be attributed to broad SiO, indicating its association with strong outflows. The comparison of the ALMA SiO images with the filamentary skeletons identified from H$^{13}$CO$^{+}$ and in the infrared data (at 4.5, 8, and 24 $\mu$m), further confirms that most SiO emission originates from outflows. However, note that for nine sources in our sample, the observed SiO emission may be generated by expanding UC $H_{II}$ regions. There is a moderate positive correlation between the bolometric luminosity ($L_{bol}$) and $L_{SiO}$ for both components (narrow and broad). The UC $H_{II}$ sources show a weaker positive correlation between $L_{bol}$ and $L_{SiO}$ and higher $L_{SiO}$ compared to the sources without UC $H_{II}$ regions. These results imply that the SiO emission from UC $H_{II}$ sources might be affected by UV-photochemistry induced by UC $H_{II}$ regions., Comment: 23 pages, 14 figures
- Published
- 2024
15. An unstructured adaptive mesh refinement for steady flows based on physics-informed neural networks
- Author
Zhu, Yongzheng, Zhao, Shiji, Zhou, Yuanye, Liang, Hong, and Bian, Xin
- Subjects
Physics - Fluid Dynamics, Physics - Computational Physics
- Abstract
Mesh generation is essential for accurate and efficient computational fluid dynamics simulations. To resolve critical features in the flow, adaptive mesh refinement (AMR) is routinely employed in certain regions of the computational domain, where gradients or error estimates of the solution are often considered as the refining criteria. In many scenarios, however, these indicators can lead to unnecessary refinement over a large region, making the process a matter of trial and error and resulting in slow convergence of the computation. To this end, we propose a heuristic strategy that employs the residuals of the governing partial differential equations (PDEs) as a novel criterion to adaptively guide the mesh refining process. In particular, we leverage physics-informed neural networks (PINNs) to integrate imprecise data obtained on a coarse mesh and the governing PDEs. Once trained, PINNs are capable of identifying regions of highest residuals of the Navier-Stokes/Euler equations and suggesting new potential vertices for the coarse mesh cells. Moreover, we put forth two schemes to maintain the quality of the refined mesh through the strategic insertion of vertices and the implementation of Delaunay triangulation. By applying the residuals-guided AMR to address a multitude of typical incompressible/compressible flow problems and comparing the outcomes with those of gradient-based methods, we illustrate that the former effectively attains a favorable balance between the computational accuracy and cost., Comment: 36 pages, 29 figures
- Published
- 2024
16. On the perturbations of Noetherian local domains
- Author
Nguyen, Hong Duc, Nguyen, Hop D., and Quy, Pham Hung
- Subjects
Mathematics - Commutative Algebra, Mathematics - Algebraic Geometry, 13D10, 13B40, 14B12
- Abstract
We study how the properties of being reduced, integral domain, and normal, behave under small perturbations of the defining equations of a Noetherian local ring. It is not hard to show that the property of being a local integral domain (reduced, normal ring) is not stable under small perturbations in general. We prove that perturbation stability holds in the following situations: (1) perturbation of being an integral domain for factorial excellent Henselian local rings; (2) perturbation of normality for excellent local complete intersections containing a field of characteristic zero; and (3) perturbation of reducedness for excellent local complete intersections containing a field of characteristic zero, and for factorial Nagata local rings., Comment: 15 pages, comments are very welcome
- Published
- 2024
17. Ground electron calibration of the Gamma-ray Transient Monitor onboard DRO-A Satellite
- Author
Feng, Pei-Yi, An, Zheng-Hua, Li, Yu-Hui, Le, Qi, Zhang, Da-Li, Li, Xin-Qiao, Xiong, Shao-Lin, Liu, Cong-Zhan, Liu, Wei-Bin, Wang, Jian-Li, Deng, Bing-Lin, Xu, He, and Lu, Hong
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics, High Energy Physics - Experiment, Physics - Accelerator Physics, Physics - Instrumentation and Detectors
- Abstract
The Gamma-Ray Transient Monitor (GTM) is an all-sky monitor onboard the Distant Retrograde Orbit-A (DRO-A) satellite, with the scientific objective of detecting gamma-ray bursts in the energy range of 20 keV to 1 MeV. The GTM is equipped with five Gamma-Ray Transient Probes (GTPs), utilizing silicon photomultiplier (SiPM) arrays coupled with NaI(Tl) scintillators for signal readout. To test the performance of the GTP in detecting electrons, we independently developed a continuous-energy-tunable, low-current, quasi-single-electron accelerator, and used this facility for ground-based electron calibration of the GTP. This paper provides a detailed description of the operational principles of the unique electron accelerator and comprehensively presents the process and results of electron calibration for the GTP. The calibration results indicate that the dead time for normal signals is less than 4 $\mu$s, while for overflow signals, it is approximately 70 $\mu$s, consistent with the design specifications. The GTP's time-recording capability is working correctly, accurately recording overflow events. The GTP responds normally to electrons in the 0.4-1.4 MeV energy range. The ground-based electron calibration validates the design of the GTP and enhances the probe's mass model, laying the foundation for payload development, in-orbit observation strategies, and scientific data analysis., Comment: 14 pages, 16 figures
- Published
- 2024
18. Maximal Steered Coherence in Accelerating Unruh-DeWitt Detectors
- Author
-
Li, Hong-Wei, Fan, Yi-Hao, Shen, Shu-Ting, Yan, Xiao-Jing, Li, Xi-Yun, Zhong, Wei, Sheng, Yu-Bo, Zhou, Lan, and Du, Ming-Ming
- Subjects
Quantum Physics - Abstract
Quantum coherence, a fundamental aspect of quantum mechanics, plays a crucial role in various quantum information tasks. However, preserving coherence under extreme conditions, such as relativistic acceleration, poses significant challenges. In this paper, we investigate the influence of Unruh temperature and energy levels on the evolution of maximal steered coherence (MSC) for different initial states. Our results reveal that MSC is strongly dependent on Unruh temperature, exhibiting behaviors ranging from monotonic decline to non-monotonic recovery, depending on the initial state parameter. Notably, when $\Delta=1$, MSC is generated as Unruh temperature increases. Additionally, we observe that higher energy levels help preserve or enhance MSC in the presence of Unruh effects. These findings offer valuable insights into the intricate relationship between relativistic effects and quantum coherence, with potential applications in developing robust quantum technologies for non-inertial environments., Comment: 6 pages, 2 figures
- Published
- 2024
19. Spanning trees and continued fractions
- Author
-
Chan, Swee Hong, Kontorovich, Alex, and Pak, Igor
- Subjects
Mathematics - Combinatorics ,Mathematics - Number Theory - Abstract
We prove the exponential growth of the cardinality of the set of numbers of spanning trees in simple (and planar) graphs on $n$ vertices, answering a question of Sedláček from 1969. The proof uses a connection with continued fractions, "thin orbits," and Zaremba's conjecture., Comment: 20 pages, 7 figures
- Published
- 2024
20. Polarization Calibration of the FAST L-band 19-beam Receiver: I. On-axis Mueller Matrix Parameters
- Author
-
Ching, Tao-Chung, Heiles, Carl, Li, Di, Robishaw, Timothy, Chen, Xunzhou, Meng, Lingqi, Yue, You-Ling, Qian, Lei, and Liu, Hong-Fei
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics - Abstract
We present the polarization calibration of the 19-beam receiver at 1420 MHz within the full illumination of the Five-hundred-meter Aperture Spherical Telescope from October 2018 to March 2023. We perform spider observations to characterize the on-axis Mueller matrix of the central beam. The calibrated polarization percentage and polarization angle of a source with strong linear polarization emission are about 0.2\% and 0.5$^{\circ}$. Several parameters of the central-beam Mueller matrix show time variability from months to years, suggesting relatively frequent polarization calibrations are needed. We obtain the Mueller matrix parameters of the 18 off-center beams with the combination of on-the-fly observations and spider observations. The polarization calibration provides consistent fractional Stokes parameters of the 19 beams, although the Mueller matrix parameters of the off-center beams are not as accurate as those of the central beam. The Mueller matrix parameters of the central beam do not show a strong dependence on the reflector surface. However, we notice different off-center Mueller matrix parameters between the eastern and western sides of the reflector surface. We provide average parameters of the 19-beam Mueller matrices which should be applicable to observations from 2020 to 2022 with several caveats. After applying the average parameters, on-axis fractional linear polarization measurements $\gtrsim$ 10\% and on-axis fractional circular polarization measurements $\gtrsim$ 1.5\% can be considered high-confidence detections. For sources with weak polarization, timely polarization calibrations using spider observations are required., Comment: 16 pages, 13 figures, Submitted to AJ
- Published
- 2024
21. Embodied Red Teaming for Auditing Robotic Foundation Models
- Author
-
Karnik, Sathwik, Hong, Zhang-Wei, Abhangi, Nishant, Lin, Yen-Chen, Wang, Tsun-Hsuan, and Agrawal, Pulkit
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Language-conditioned robot models (i.e., robotic foundation models) enable robots to perform a wide range of tasks based on natural language instructions. Despite strong performance on existing benchmarks, evaluating the safety and effectiveness of these models is challenging due to the complexity of testing all possible language variations. Current benchmarks have two key limitations: they rely on a limited set of human-generated instructions, missing many challenging cases, and they focus only on task performance without assessing safety, such as avoiding damage. To address these gaps, we introduce Embodied Red Teaming (ERT), a new evaluation method that generates diverse and challenging instructions to test these models. ERT uses automated red teaming techniques with Vision Language Models (VLMs) to create contextually grounded, difficult instructions. Experimental results show that state-of-the-art models frequently fail or behave unsafely on ERT tests, underscoring the shortcomings of current benchmarks in evaluating real-world performance and safety. Code and videos are available at: https://sites.google.com/view/embodiedredteam.
- Published
- 2024
22. Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
- Author
-
Hyung, Junha, Kim, Kinam, Hong, Susung, Kim, Min-Jung, and Choo, Jaegul
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Diffusion models have emerged as a powerful tool for generating high-quality images, videos, and 3D content. While sampling guidance techniques like CFG improve quality, they reduce diversity and motion. Autoguidance mitigates these issues but demands extra weak model training, limiting its practicality for large-scale models. In this work, we introduce Spatiotemporal Skip Guidance (STG), a simple training-free sampling guidance method for enhancing transformer-based video diffusion models. STG employs an implicit weak model via self-perturbation, avoiding the need for external models or additional training. By selectively skipping spatiotemporal layers, STG produces an aligned, degraded version of the original model to boost sample quality without compromising diversity or dynamic degree. Our contributions include: (1) introducing STG as an efficient, high-performing guidance technique for video diffusion models, (2) eliminating the need for auxiliary models by simulating a weak model through layer skipping, and (3) ensuring quality-enhanced guidance without compromising sample diversity or dynamics unlike CFG. For additional results, visit https://junhahyung.github.io/STGuidance., Comment: project page: https://junhahyung.github.io/STGuidance
- Published
- 2024
23. GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data
- Author
-
Wang, Wentao, Ye, Hang, Hong, Fangzhou, Yang, Xue, Zhang, Jianfu, Wang, Yizhou, Liu, Ziwei, and Pan, Liang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model. Existing methods face difficulties including a) the varying body proportions captured by in-the-wild human images; b) diverse personal belongings within the shot; and c) ambiguities in human postures and inconsistency in human textures. In addition, the scarcity of high-quality human data intensifies the challenge. To address these problems, we propose a Generalizable image-to-3D huMAN reconstruction framework, dubbed GeneMAN, building upon a comprehensive multi-source collection of high-quality human data, including 3D scans, multi-view videos, single photos, and our generated synthetic human data. GeneMAN encompasses three key modules. 1) Without relying on parametric human models (e.g., SMPL), GeneMAN first trains a human-specific text-to-image diffusion model and a view-conditioned diffusion model, serving as GeneMAN 2D human prior and 3D human prior for reconstruction, respectively. 2) With the help of the pretrained human prior models, the Geometry Initialization-&-Sculpting pipeline is leveraged to recover high-quality 3D human geometry given a single image. 3) To achieve high-fidelity 3D human textures, GeneMAN employs the Multi-Space Texture Refinement pipeline, consecutively refining textures in the latent and the pixel spaces. Extensive experimental results demonstrate that GeneMAN could generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods. Notably, GeneMAN could reveal much better generalizability in dealing with in-the-wild images, often yielding high-quality 3D human models in natural poses with common items, regardless of the body proportions in the input images., Comment: Project page: https://roooooz.github.io/GeneMAN/
- Published
- 2024
24. SoK: Watermarking for AI-Generated Content
- Author
-
Zhao, Xuandong, Gunn, Sam, Christ, Miranda, Fairoze, Jaiden, Fabrega, Andres, Carlini, Nicholas, Garg, Sanjam, Hong, Sanghyun, Nasr, Milad, Tramer, Florian, Jha, Somesh, Li, Lei, Wang, Yu-Xiang, and Song, Dawn
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
As the outputs of generative AI (GenAI) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI-generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAI, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of watermarking techniques for GenAI, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. Practical evaluation strategies are also explored, providing insights into the development of robust watermarking techniques capable of resisting various attacks. Additionally, we review recent representative works, highlight open challenges, and discuss potential directions for this emerging field. By offering a thorough understanding of watermarking in GenAI, this work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.
- Published
- 2024
25. A Multi-Agent Dual Dialogue System to Support Mental Health Care Providers
- Author
-
Kampman, Onno P., Phang, Ye Sheng, Han, Stanley, Xing, Michael, Hong, Xinyi, Hoosainsah, Hazirah, Tan, Caleb, Winata, Genta Indra, Wang, Skyler, Heaukulani, Creighton, Weng, Janice Huiqin, and Morris, Robert JT
- Subjects
Computer Science - Human-Computer Interaction - Abstract
We introduce a general-purpose, human-in-the-loop dual dialogue system to support mental health care professionals. The system, co-designed with care providers, is conceptualized to assist them in interacting with care seekers rather than functioning as a fully automated dialogue system solution. The AI assistant within the system reduces the cognitive load of mental health care providers by proposing responses, analyzing conversations to extract pertinent themes, summarizing dialogues, and recommending localized relevant content and internet-based cognitive behavioral therapy exercises. These functionalities are achieved through a multi-agent system design, where each specialized, supportive agent is characterized by a large language model. In evaluating the multi-agent system, we focused specifically on the proposal of responses to emotionally distressed care seekers. We found that the proposed responses matched a reasonable human quality in demonstrating empathy, showing its appropriateness for augmenting the work of mental health care providers., Comment: Update: Render figures properly and update title
- Published
- 2024
26. Krylov Complexity in early universe
- Author
-
Zhai, Ke-Hong and Liu, Lei-Hua
- Subjects
High Energy Physics - Theory ,Astrophysics - Cosmology and Nongalactic Astrophysics ,General Relativity and Quantum Cosmology ,High Energy Physics - Phenomenology ,Quantum Physics - Abstract
The Lanczos algorithm offers a method for constructing wave functions for both closed and open systems based on their Hamiltonians. Given that the entire early universe is fundamentally an open system, we apply the Lanczos algorithm to investigate Krylov complexity across different phases of the early universe, including inflation, the radiation-dominated period (RD), and the matter-dominated period (MD). Notably, we find that Krylov complexity differs between the closed and open system approaches. To effectively capture the impact of potentials during the RD and MD phases, we analyze various inflationary potentials, including the Higgs potential, the $R^2$ inflationary potential, and chaotic inflationary potential, taking into account the violations of slow-roll conditions. This analysis is conducted in terms of conformal time through the preheating process. Our numerical results indicate that the evolution of Krylov complexity and Krylov entropy is remarkably similar across both methods, regardless of the potential under consideration. Additionally, we rigorously construct what is referred to as an open two-mode squeezed state, utilizing the second kind of Meixner polynomials. Based on this construction, we are the first to calculate the evolution equations for $r_k$ and $\phi_k$ as they relate to the scale factor. Our findings suggest that dissipative effects lead to a rapid decoherence-like behavior. Moreover, our results indicate that inflation behaves as a strongly dissipative system, while both the RD and MD phases exhibit characteristics of weak dissipation. This research provides new insights into exploring the universe from the perspective of quantum information., Comment: 34 pages, 10 figures
- Published
- 2024
27. Progress on the spectroscopy of an Sp(4) gauge theory coupled to matter in multiple representations
- Author
-
Hsiao, Ho, Bennett, Ed, Forzano, Niccolò, Hong, Deog Ki, Lee, Jong-Wan, Lin, C. -J. David, Lucini, Biagio, Piai, Maurizio, Vadacchino, Davide, and Zierler, Fabian
- Subjects
High Energy Physics - Lattice ,High Energy Physics - Phenomenology - Abstract
We report progress on our lattice calculations for the mass spectra of low-lying composite states in the Sp(4) gauge theory coupled to two and three flavors of Dirac fermions transforming in the fundamental and the two-index antisymmetric representations, respectively. This theory provides an ultraviolet completion to the composite Higgs model with Goldstone modes in the SU(4)/Sp(4) coset and with partial compositeness for generating the top-quark mass. We measure the meson and chimera baryon masses. These masses are crucial for constructing the composite Higgs model. In particular, the chimera baryon masses are important inputs for implementing top partial compositeness. We employ Wilson fermions and the Wilson plaquette action in our simulations. Techniques such as APE and Wuppertal smearing, as well as the procedure of generalised eigenvalue problem, are implemented in our analysis., Comment: 12 pages, 8 figures, 1 table, Proceedings of the 41st International Symposium on Lattice Field Theory (Lattice 2024), July 28th - August 3rd, 2024, University of Liverpool, UK
- Published
- 2024
28. TransferFuzz: Fuzzing with Historical Trace for Verifying Propagated Vulnerability Code
- Author
-
Li, Siyuan, Li, Yuekang, Chen, Zuxin, Dong, Chaopeng, Wang, Yongpan, Li, Hong, Chen, Yongle, and Zhu, Hongsong
- Subjects
Computer Science - Software Engineering - Abstract
Code reuse in software development frequently facilitates the spread of vulnerabilities, making the scope of affected software in CVE reports imprecise. Traditional methods primarily focus on identifying reused vulnerability code within target software, yet they cannot verify if these vulnerabilities can be triggered in new software contexts. This limitation often results in false positives. In this paper, we introduce TransferFuzz, a novel vulnerability verification framework, to verify whether vulnerabilities propagated through code reuse can be triggered in new software. Innovatively, we collected runtime information during the execution or fuzzing of the basic binary (the vulnerable binary detailed in CVE reports). This process allowed us to extract historical traces, which proved instrumental in guiding the fuzzing process for the target binary (the new binary that reused the vulnerable function). TransferFuzz introduces a unique Key Bytes Guided Mutation strategy and a Nested Simulated Annealing algorithm, which transfers these historical traces to implement trace-guided fuzzing on the target binary, facilitating the accurate and efficient verification of the propagated vulnerability. Our evaluation, conducted on widely recognized datasets, shows that TransferFuzz can quickly validate vulnerabilities previously unverifiable with existing techniques. Its verification speed is 2.5 to 26.2 times faster than existing methods. Moreover, TransferFuzz has proven its effectiveness by expanding the impacted software scope for 15 vulnerabilities listed in CVE reports, increasing the number of affected binaries from 15 to 53. The datasets and source code used in this article are available at https://github.com/Siyuan-Li201/TransferFuzz., Comment: Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE'25)
- Published
- 2024
29. OSDFace: One-Step Diffusion Model for Face Restoration
- Author
-
Wang, Jingkai, Gong, Jue, Zhang, Lin, Chen, Zheng, Liu, Xing, Gu, Hong, Liu, Yutong, Zhang, Yulun, and Yang, Xiaokang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Diffusion models have demonstrated impressive performance in face restoration. Yet, their multi-step inference process remains computationally intensive, limiting their applicability in real-world scenarios. Moreover, existing methods often struggle to generate face images that are harmonious, realistic, and consistent with the subject's identity. In this work, we propose OSDFace, a novel one-step diffusion model for face restoration. Specifically, we propose a visual representation embedder (VRE) to better capture prior information and understand the input face. In VRE, low-quality faces are processed by a visual tokenizer and subsequently embedded with a vector-quantized dictionary to generate visual prompts. Additionally, we incorporate a facial identity loss derived from face recognition to further ensure identity consistency. We further employ a generative adversarial network (GAN) as a guidance model to encourage distribution alignment between the restored face and the ground truth. Experimental results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics, generating high-fidelity, natural face images with high identity consistency. The code and model will be released at https://github.com/jkwang28/OSDFace., Comment: 8 pages, 6 figures. The code and model will be available at https://github.com/jkwang28/OSDFace
- Published
- 2024
30. Space-borne Interferometers to Detect Thousands of Memory Signals Emitted by Stellar-mass Binary Black Holes
- Author
-
Hou, Shaoqi, Zhao, Zhi-Chao, Cao, Zhoujian, and Zhu, Zong-Hong
- Subjects
General Relativity and Quantum Cosmology ,High Energy Physics - Phenomenology ,High Energy Physics - Theory - Abstract
The gravitational memory effect manifests the nonlinearity of gravitation, reflects the degenerate gravitational vacua, and indicates the types of asymptotic symmetries. However, according to received wisdom, it would be challenging to detect. In this Letter, we envisioned employing space-borne interferometers, especially DECIGO, to detect memory signals generated by stellar-mass binary black hole (BBH) systems, the conventional targets of ground-based detectors. We estimated that during 5 years of observation, DECIGO could detect nearly 2,258 sufficiently loud memory signals. Among them, 102 have signal-to-noise ratios greater than 8. Our prediction was obtained with the BBH population model constrained by the recent gravitational wave observations in GWTC-3. The Power Law + Peak mass model and the DEFAULT spin model were employed, and the merger rate was chosen to be proportional to the Madau-Dickinson formation rate. In the analysis, the impact of the orbital eccentricity was also considered. The high rate is due to the sufficiently strong memory signal right in the bandwidth of DECIGO, and also to the enormous number of stellar-mass BBHs that surely exist. The substantial, and yet conservative, number of detections makes it possible to leverage statistical approaches to harness the memory effect for fundamental physics and astrophysics., Comment: 11 pages, comments very welcome
- Published
- 2024
31. Exclusion of a direct progenitor detection for the Type Ic SN 2017ein based on late-time observations
- Author
-
Zhao, Yi-Han, Sun, Ning-Chen, Wu, Junjie, Niu, Zexi, Hong, Xinyi, Huang, Yinhan, Maund, Justyn R., Xi, Qiang, Xiang, Danfeng, and Liu, Jifeng
- Subjects
Astrophysics - High Energy Astrophysical Phenomena ,Astrophysics - Solar and Stellar Astrophysics - Abstract
To date, SN 2017ein is the only Type Ic supernova with a directly identified progenitor candidate. This candidate points to a very massive ($>$45 $M_\odot$) Wolf-Rayet progenitor, but its disappearance after the explosion of SN 2017ein remains unconfirmed. In this work, we revisit SN 2017ein in late-time images acquired by the Hubble Space Telescope (HST) at 2.4--3.8 yrs after peak brightness. We find this source has not disappeared and its brightness and color remain almost the same as in the pre-explosion images. Thus, we conclude that the pre-explosion source is not the genuine progenitor of SN 2017ein. We exclude the possibility that it is a companion star of the progenitor, since it has a much lower extinction than SN 2017ein; its color is also inconsistent with a star cluster, as indicated by the newly added magnitude limit in F336W, in addition to F555W and F814W. We suggest, therefore, that this source is an unrelated star in chance alignment with SN 2017ein. Based on the low ejecta mass, we propose that SN 2017ein most likely originated from a moderately massive star with $M_{\rm ini}$ $\sim$ 8--20 $M_\odot$, stripped by binary interaction, rather than a very massive Wolf-Rayet progenitor., Comment: Submitted to ApJL. 8 pages, 4 figures
- Published
- 2024
32. Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey
- Author
-
Nguyen-Le, Hong-Hanh, Tran, Van-Tuan, Nguyen, Dinh-Thuc, and Le-Khac, Nhien-An
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Cryptography and Security - Abstract
In recent years, deepfakes (DFs) have been utilized for malicious purposes, such as individual impersonation, misinformation spreading, and artists' style imitation, raising ethical and security concerns. However, existing surveys have focused on the accuracy of passive DF detection approaches for single modalities, such as image, video or audio. This comprehensive survey explores passive approaches across multiple modalities, including image, video, audio, and multi-modal domains, and extends the discussion beyond detection accuracy to include generalization, robustness, attribution, and interpretability. Additionally, we discuss threat models for passive approaches, including potential adversarial strategies and different levels of adversary knowledge and capabilities. We also highlight current challenges in DF detection, including the lack of generalization across different generative models, the need for comprehensive trustworthiness evaluation, and the limitations of existing multi-modal approaches. Finally, we propose future research directions that address these unexplored and emerging issues in the field of passive DF detection, such as adaptive learning, dynamic benchmarks, holistic trustworthiness evaluation, and multi-modal detectors for talking-face video generation., Comment: 26 pages
- Published
- 2024
33. Bi-ICE: An Inner Interpretable Framework for Image Classification via Bi-directional Interactions between Concept and Input Embeddings
- Author
-
Hong, Jinyung, Kim, Yearim, Park, Keun Hee, Han, Sangyu, Kwak, Nojun, and Pavlic, Theodore P.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Inner interpretability is a promising field focused on uncovering the internal mechanisms of AI systems and developing scalable, automated methods to understand these systems at a mechanistic level. While significant research has explored top-down approaches starting from high-level problems or algorithmic hypotheses and bottom-up approaches building higher-level abstractions from low-level or circuit-level descriptions, most efforts have concentrated on analyzing large language models. Moreover, limited attention has been given to applying inner interpretability to large-scale image tasks, primarily focusing on architectural and functional levels to visualize learned concepts. In this paper, we first present a conceptual framework that supports inner interpretability and multilevel analysis for large-scale image classification tasks. We introduce the Bi-directional Interaction between Concept and Input Embeddings (Bi-ICE) module, which facilitates interpretability across the computational, algorithmic, and implementation levels. This module enhances transparency by generating predictions based on human-understandable concepts, quantifying their contributions, and localizing them within the inputs. Finally, we showcase enhanced transparency in image classification, measuring concept contributions and pinpointing their locations within the inputs. Our approach highlights algorithmic interpretability by demonstrating the process of concept learning and its convergence., Comment: The first two authors equally contributed to this work, 27 pages, 19 figures, 9 tables
- Published
- 2024
34. New Test-Time Scenario for Biosignal: Concept and Its Approach
- Author
-
Jo, Yong-Yeon, Lee, Byeong Tak, Kim, Beom Joon, Hong, Jeong-Ho, Lee, Hak Seung, and Kwon, Joon-myoung
- Subjects
Electrical Engineering and Systems Science - Signal Processing ,Computer Science - Machine Learning - Abstract
Online Test-Time Adaptation (OTTA) enhances model robustness by updating pre-trained models with unlabeled data during testing. In healthcare, OTTA is vital for real-time tasks like predicting blood pressure from biosignals, which demand continuous adaptation. We introduce a new test-time scenario with streams of unlabeled samples and occasional labeled samples. Our framework combines supervised and self-supervised learning, employing a dual-queue buffer and weighted batch sampling to balance data types. Experiments show improved accuracy and adaptability under real-world conditions., Comment: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 6 pages
- Published
- 2024
35. Efficient Multi-modal Large Language Models via Visual Token Grouping
- Author
-
Huang, Minbin, Huang, Runhui, Shi, Han, Chen, Yimeng, Zheng, Chuanyang, Sun, Xiangguo, Jiang, Xin, Li, Zhenguo, and Cheng, Hong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The development of Multi-modal Large Language Models (MLLMs) enhances Large Language Models (LLMs) with the ability to perceive data formats beyond text, significantly advancing a range of downstream applications, such as visual question answering and image captioning. However, the substantial computational costs associated with processing high-resolution images and videos pose a barrier to their broader adoption. To address this challenge, compressing vision tokens in MLLMs has emerged as a promising approach to reduce inference costs. While existing methods conduct token reduction in the feature alignment phase, in this paper we introduce VisToG, a novel grouping mechanism that leverages the capabilities of pre-trained vision encoders to group similar image segments without the need for segmentation masks. Specifically, we concatenate semantic tokens to represent image semantic segments after the linear projection layer before feeding into the vision encoder. In addition, with the isolated attention we adopt, VisToG can identify and eliminate redundant visual tokens utilizing the prior knowledge in the pre-trained vision encoder, which effectively reduces computational demands. Extensive experiments demonstrate the effectiveness of VisToG, maintaining 98.1% of the original performance while achieving a reduction of over 27% in inference time.
- Published
- 2024
36. SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates
- Author
-
Hong, Yijia, Guo, Yuan-Chen, Yi, Ran, Chen, Yulong, Cao, Yan-Pei, and Ma, Lizhuang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Decomposing physically-based materials from images into their constituent properties remains challenging, particularly when maintaining both computational efficiency and physical consistency. While recent diffusion-based approaches have shown promise, they face substantial computational overhead due to multiple denoising steps and separate models for different material properties. We present SuperMat, a single-step framework that achieves high-quality material decomposition with one-step inference. This enables end-to-end training with perceptual and re-render losses while decomposing albedo, metallic, and roughness maps at millisecond-scale speeds. We further extend our framework to 3D objects through a UV refinement network, enabling consistent material estimation across viewpoints while maintaining efficiency. Experiments demonstrate that SuperMat achieves state-of-the-art PBR material decomposition quality while reducing inference time from seconds to milliseconds per image, and completes PBR material estimation for 3D objects in approximately 3 seconds. The project page is at https://hyj542682306.github.io/SuperMat/., Comment: https://hyj542682306.github.io/SuperMat/
- Published
- 2024
37. CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics
- Author
-
Li, Yikun, Zhang, Ting, Widyasari, Ratnadira, Tun, Yan Naing, Nguyen, Huu Hung, Bui, Tan, Irsan, Ivana Clairine, Cheng, Yiran, Lan, Xiang, Ang, Han Wei, Liauw, Frank, Weyssow, Martin, Kang, Hong Jin, Ouh, Eng Lieh, Shar, Lwin Khin, and Lo, David
- Subjects
Computer Science - Software Engineering ,Computer Science - Cryptography and Security - Abstract
Accurate identification of software vulnerabilities is crucial for system integrity. Vulnerability datasets, often derived from the National Vulnerability Database (NVD) or directly from GitHub, are essential for training machine learning models to detect these security flaws. However, these datasets frequently suffer from significant noise, typically 40% to 75%, due primarily to the automatic and indiscriminate labeling of all changes in vulnerability-fixing commits (VFCs) as vulnerability-related. This misclassification occurs because not all changes in a commit aimed at fixing vulnerabilities pertain to security threats; many are routine updates like bug fixes or test improvements. This paper introduces the first methodology that uses the Large Language Model (LLM) with a heuristic enhancement to automatically identify vulnerability-fixing changes from VFCs, achieving an F1-score of 0.82. VulSifter was applied to a large-scale study, where we conducted a crawl of 127,063 repositories on GitHub, resulting in the acquisition of 5,352,105 commits. VulSifter involves utilizing an LLM to comprehend code semantics and contextual information, while applying heuristics to filter out unrelated changes. We then developed CleanVul, a high-quality dataset comprising 11,632 functions using our LLM heuristic enhancement approach, demonstrating Correctness (90.6%) comparable to established datasets such as SVEN and PrimeVul. To evaluate the CleanVul dataset, we conducted experiments focusing on fine-tuning various LLMs on CleanVul and other high-quality datasets. Evaluation results reveal that LLMs fine-tuned on CleanVul not only exhibit enhanced accuracy but also superior generalization capabilities compared to those trained on uncleaned datasets. Specifically, models trained on CleanVul and tested on PrimeVul achieve accuracy higher than those trained and tested exclusively on PrimeVul.
- Published
- 2024
38. VIRES: Video Instance Repainting with Sketch and Text Guidance
- Author
-
Weng, Shuchen, Zheng, Haojie, Zhan, Peixuan, Hong, Yuchen, Jiang, Han, Li, Si, and Shi, Boxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce VIRES, a video instance repainting method with sketch and text guidance, enabling video instance repainting, replacement, generation, and removal. Existing approaches struggle with temporal consistency and accurate alignment with the provided sketch sequence. VIRES leverages the generative priors of text-to-video models to maintain temporal consistency and produce visually pleasing results. We propose the Sequential ControlNet with the standardized self-scaling, which effectively extracts structure layouts and adaptively captures high-contrast sketch details. We further augment the diffusion transformer backbone with the sketch attention to interpret and inject fine-grained sketch semantics. A sketch-aware encoder ensures that repainted results are aligned with the provided sketch sequence. Additionally, we contribute the VireSet, a dataset with detailed annotations tailored for training and evaluating video instance editing methods. Experimental results demonstrate the effectiveness of VIRES, which outperforms state-of-the-art methods in visual quality, temporal consistency, condition alignment, and human ratings. Project page: https://suimuc.github.io/suimu.github.io/projects/VIRES/
- Published
- 2024
39. CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
- Author
-
Zhou, Yuan, Xu, Qingshan, Cui, Jiequan, Zhou, Junbao, Zhang, Jing, Hong, Richang, and Zhang, Hanwang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recently, large efforts have been made to design efficient linear-complexity visual Transformers. However, current linear attention models are generally unsuitable for deployment on resource-constrained mobile devices, because they suffer from either limited efficiency gains or significant accuracy drops. In this paper, we propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism, revealing that features' decoupling and interaction can fully unleash the power of linear attention. We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies, thereby preserving sufficient local and global information while effectively enhancing the efficiency of models. Then, a dynamic memory unit is employed to maintain critical information along the network pipeline. Moreover, we design a dual interaction module to effectively facilitate interaction between local inductive bias and long-range information as well as among features at different layers. By adopting a decoupled learning approach and fully exploiting complementarity across features, our method can achieve both high efficiency and accuracy. Extensive experiments on ImageNet-1K, COCO, and ADE20K datasets demonstrate the effectiveness of our approach, e.g., achieving $78.4/82.1\%$ top-1 accuracy on ImageNet-1K at the cost of only $0.7/1.9$ GMACs. Code will be released on GitHub.
- Published
- 2024
40. Control of ferromagnetism of Vanadium Oxide thin films by oxidation states
- Author
-
Park, Kwonjin, Cho, Jaeyong, Lee, Soobeom, Cho, Jaehun, Ha, Jae-Hyun, Jung, Jinyong, Kim, Dongryul, Choi, Won-Chang, Hong, Jung-Il, and You, Chun-Yeol
- Subjects
Condensed Matter - Materials Science - Abstract
Vanadium oxide (VOx) is a material of significant interest due to its metal-insulator transition (MIT) properties as well as its diverse stable antiferromagnetism depending on the valence states of V and O with distinct MIT transitions and Néel temperatures. Although several studies reported ferromagnetism in VOx, it was mostly associated with impurities or defects, and pure VOx has rarely been reported as ferromagnetic. Our research presents clear evidence of ferromagnetism in the VOx thin films, exhibiting a saturation magnetization of approximately 14 kA/m at 300 K. We fabricated 20-nm thick VOx thin films via reactive sputtering from a metallic vanadium target in various oxygen atmospheres. The oxidation states of ferromagnetic VOx films show an ill-defined stoichiometry of V2O3+p, where p = 0.05, 0.23, 0.49, with predominantly disordered microstructures. The ferromagnetic nature of these VOx films is confirmed through a strong antiferromagnetic exchange coupling with the neighboring ferromagnetic layer in the VOx/Co bilayers, in which the spin configurations of the Co layer are influenced strongly due to the additional anisotropy introduced by the VOx layer. The present study highlights the potential of VOx as an emerging functional magnetic material with tunability by oxidation states for modern spintronic applications., Comment: 6 figures, and supporting information with 3 figures
- Published
- 2024
41. Determination of the binding and $KD$ probability of the $D^{*}_{s0}(2317)$ from the $(\bar{D}\bar K)^-$ mass distributions in $\Lambda_{b}\to \Lambda_{c} (\bar{D}\bar K)^-$ decays
- Author
-
Li, Hai-Peng, Liang, Wei-Hong, Xiao, Chu-Wen, Xie, Ju-Jun, and Oset, Eulogio
- Subjects
High Energy Physics - Phenomenology - Abstract
We study the $\Lambda_{b}\to\Lambda_{c}\bar{D}^{0}K^{-}$ and $\Lambda_{b}\to \Lambda_{c}D^{-}\bar{K}^{0}$ reactions which proceed via a Cabibbo and $N_c$ favored process of external emission, and we determine the $\bar{D}^{0}K^{-}$ and $D^{-}\bar{K}^{0}$ mass distributions close to the $\bar{D} \bar{K}$ threshold. For this, we use the tree level contribution plus the rescattering of the meson-meson components, using the extension of the local hidden gauge approach to the charm sector that produces the $D^*_{s0}(2317)$ resonance. We observe a large enhancement of the mass distributions close to threshold due to the presence of this resonance below threshold. Next we undertake the inverse problem of extracting the maximum information on the interaction of the $\bar{D} \bar{K}$ channels from these distributions, and using the resampling method we find that from these data one can obtain precise values of the scattering lengths and effective ranges, the existence of an $I=0$ bound state with a precision of about $4 \;\rm MeV$ in the mass, plus the $\bar{D} \bar{K}$ molecular probability of this state with reasonable precision. Given the fact that the $\Lambda_{b}\to\Lambda_{c}\bar{D}^{0}K^{-}$ reaction is already measured by the LHCb collaboration, it is expected that in the next runs with more statistics of the reaction, these mass distributions can be measured with precision and the method proposed here can be used to determine the nature of the $D^*_{s0}(2317)$, which is still an issue of debate., Comment: 8 pages, 5 figures, 6 tables
- Published
- 2024
42. FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data
- Author
-
Im, Jiin, Son, Yongho, and Hong, Je Hyeong
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
While the mainstream research in anomaly detection has mainly followed the one-class classification, practical industrial environments often incur noisy training data due to annotation errors or lack of labels for new or refurbished products. To address these issues, we propose a novel learning-based approach for fully unsupervised anomaly detection with unlabeled and potentially contaminated training data. Our method is motivated by two observations, that i) the pairwise feature distances between the normal samples are on average likely to be smaller than those between the anomaly samples or heterogeneous samples and ii) pairs of features mutually closest to each other are likely to be homogeneous pairs, which hold if the normal data has smaller variance than the anomaly data. Building on the first observation that nearest-neighbor distances can distinguish between confident normal samples and anomalies, we propose a pseudo-labeling strategy using an iteratively reconstructed memory bank (IRMB). The second observation is utilized as a new loss function to promote class-homogeneity between mutually closest pairs thereby reducing the ill-posedness of the task. Experimental results on two public industrial anomaly benchmarks and semantic anomaly examples validate the effectiveness of FUN-AD across different scenarios and anomaly-to-normal ratios. Our code is available at https://github.com/HY-Vision-Lab/FUNAD., Comment: Accepted at WACV 2025. Supplementary material included after references. 17 pages, 7 figures, 14 tables
- Published
- 2024
43. Universal Wong formula for capture cross sections from light to super-heavy systems
- Author
-
Wang, Ning, Chen, Jinming, Wang, Yicheng, and Yao, Hong
- Subjects
Nuclear Theory - Abstract
A universal Wong formula is proposed with refined model parameters for a systematic description of the capture cross sections for heavy-ion fusion reactions from C+C to Ni+U, in which the barrier parameters and the barrier distribution are determined by the entrance-channel nucleus-nucleus potential based on the Skyrme energy density functional. By introducing a constraint on the width of the barrier distribution and a pocket-depth dependent barrier radius, the capture excitation functions for a number of fusion reactions involving different nuclear structure effects are remarkably well reproduced, particularly for the reactions between light nuclei and those forming super-heavy nuclei. The systematic decreasing behavior of the geometric radii with the depth of the capture pocket due to the influence of deep inelastic scattering is clearly observed in the TDHF calculations for super-heavy systems. The predicted capture cross sections for $^{54}$Cr + $^{238}$U at above-barrier energies are evidently smaller than the corresponding results of the more asymmetric projectile-target combination $^{50}$Ti + $^{242}$Pu due to the shallower capture pocket in Cr+U., Comment: 5 figures
- Published
- 2024
44. Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observation
- Author
-
Tang, Shengeng, He, Jiayi, Cheng, Lechao, Wu, Jingjing, Guo, Dan, and Hong, Richang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Generating continuous sign language videos from discrete segments is challenging due to the need for smooth transitions that preserve natural flow and meaning. Traditional approaches that simply concatenate isolated signs often result in abrupt transitions, disrupting video coherence. To address this, we propose a novel framework, Sign-D2C, that employs a conditional diffusion model to synthesize contextually smooth transition frames, enabling the seamless construction of continuous sign language sequences. Our approach transforms the unsupervised problem of transition frame generation into a supervised training task by simulating the absence of transition frames through random masking of segments in long-duration sign videos. The model learns to predict these masked frames by denoising Gaussian noise, conditioned on the surrounding sign observations, allowing it to handle complex, unstructured transitions. During inference, we apply a linearly interpolating padding strategy that initializes missing frames through interpolation between boundary frames, providing a stable foundation for iterative refinement by the diffusion model. Extensive experiments on the PHOENIX14T, USTC-CSL100, and USTC-SLR500 datasets demonstrate the effectiveness of our method in producing continuous, natural sign language videos., Comment: 10 pages, 4 figures
- Published
- 2024
45. UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
- Author
-
Li, Yiheng, Hou, Ruibing, Chang, Hong, Shan, Shiguang, and Chen, Xilin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Human pose plays a crucial role in the digital age. While recent works have achieved impressive progress in understanding and generating human poses, they often support only a single modality of control signals and operate in isolation, limiting their application in real-world scenarios. This paper presents UniPose, a framework employing Large Language Models (LLMs) to comprehend, generate, and edit human poses across various modalities, including images, text, and 3D SMPL poses. Specifically, we apply a pose tokenizer to convert 3D poses into discrete pose tokens, enabling seamless integration into the LLM within a unified vocabulary. To further enhance the fine-grained pose perception capabilities, we facilitate UniPose with a mixture of visual encoders, among them a pose-specific visual encoder. Benefiting from a unified learning strategy, UniPose effectively transfers knowledge across different pose-relevant tasks, adapts to unseen tasks, and exhibits extended capabilities. This work serves as the first attempt at building a general-purpose framework for pose comprehension, generation, and editing. Extensive experiments highlight UniPose's competitive and even superior performance across various pose-relevant tasks.
- Published
- 2024
46. Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- Author
-
Xi, Zhiheng, Yang, Dingwen, Huang, Jixuan, Tang, Jiafu, Li, Guanyu, Ding, Yiwen, He, Wei, Hong, Boyang, Do, Shihan, Zhan, Wenyu, Wang, Xiao, Zheng, Rui, Ji, Tao, Shi, Xiaowei, Zhai, Yitao, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang, Gui, Tao, Wu, Zuxuan, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing, and Jiang, Yu-Gang
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Training large language models (LLMs) to spend more time thinking and reflecting before responding is crucial for effectively solving complex reasoning tasks in fields such as science, coding, and mathematics. However, the effectiveness of mechanisms like self-reflection and self-correction depends on the model's capacity to accurately assess its own performance, which can be limited by factors such as initial accuracy, question difficulty, and the lack of external feedback. In this paper, we delve into a two-player paradigm that separates the roles of reasoning and critique models, where the critique model provides step-level feedback to supervise the reasoning (actor) model during both test-time and train-time. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data, resulting in a dataset of 76,321 responses paired with step-level feedback. Fine-tuning language models with this dataset enables them to generate natural language feedback for mathematical reasoning. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time, especially when scaling up inference-time computation. Motivated by these findings, we introduce the critique-based supervision to the actor's self-training process, and propose a critique-in-the-loop self-improvement method. Experiments show that the method improves the actor's exploration efficiency and solution diversity, especially on challenging queries, leading to a stronger reasoning model. Lastly, we take the preliminary step to explore training self-talk reasoning models via critique supervision and showcase its potential. Our code and datasets are at https://mathcritique.github.io/., Comment: Preprint
- Published
- 2024
47. Generalizable Deep Learning Approach for 3D Particle Imaging using Holographic Microscopy
- Author
-
Kumar, Shyam and Hong, Jiarong
- Subjects
Physics - Optics - Abstract
Despite its potential for label-free particle diagnostics, holographic microscopy is limited by specialized processing methods that struggle to generalize across diverse settings. We introduce a deep learning architecture leveraging human perception of longitudinal variation of diffracted patterns of particles, which enables highly generalizable analysis of 3D particle information with orders of magnitude improvement in processing speed. Trained with minimal synthetic and real holograms of simple particles, our method demonstrates exceptional performance on various challenging cases including those with high particle concentrations and noises and a wide range of particle sizes, complex shapes, and optical properties exceeding the diversity of the training datasets., Comment: 16 pages, 7 figures
- Published
- 2024
48. Frozen-field Modeling of Coronal Condensations with MPI-AMRVAC II: Optimization and application in three-dimensional models
- Author
-
Zhou, Yuhao, Li, Xiaohong, Jenkins, Jack M., Hong, Jie, and Keppens, Rony
- Subjects
Astrophysics - Solar and Stellar Astrophysics - Abstract
The frozen-field hydrodynamic (ffHD) model is a simplification of the full magnetohydrodynamical (MHD) equations under the assumption of a rigid magnetic field, which significantly reduces computational complexity and enhances efficiency. In this work, we combine the ffHD prescription with hyperbolic thermal conduction (TC) and the Transition Region Adaptive Conduction (TRAC) method to achieve further optimization. A series of two-dimensional tests are done to evaluate the performance of the hyperbolic TC and the TRAC method. The results indicate that hyperbolic TC, while showing limiter-affected numerical dissipation, delivers outcomes comparable to classic parabolic TC. The TRAC method effectively compensates for the underestimation of enthalpy flux in low-resolution simulations, as evaluated on tests that demonstrate prominence formation. We present an application of the ffHD model that forms a three-dimensional prominence embedded in a magnetic flux rope, which develops into a stable slab-like filament. The simulation reveals a prominence with an elongated spine and a width consistent with observations, highlighting the potential of the ffHD model in capturing the dynamics of solar prominences. Forward modeling of the simulation data produces synthetic images at various wavelengths, providing insights into the appearance of prominences and filaments in different observational contexts. The ffHD model, with its computational efficiency and the demonstrated capability to simulate complex solar phenomena, offers a valuable tool for solar physicists, and is implemented in the open-source MPI-AMRVAC framework., Comment: Accepted for publication in ApJ. 31 pages, 8 figures
- Published
- 2024
49. An exponential-free Runge--Kutta framework for developing third-order unconditionally energy stable schemes for the Cahn--Hilliard equation
- Author
-
Wang, Haifeng, Sun, Jingwei, Zhang, Hong, Qian, Xu, and Song, Songhe
- Subjects
Mathematics - Numerical Analysis ,65M12, 65M15, 65M70, 35Q92 ,G.1.8 - Abstract
In this work, we develop a class of up to third-order energy-stable schemes for the Cahn--Hilliard equation. Building on Lawson's integrating factor Runge--Kutta method, which is widely used for stiff semilinear equations, we discuss its limitations, such as the inability to preserve the equilibrium state and the oversmoothing of interfacial layers in the solution's profile because of the exponential damping effects. To overcome this drawback, we approximate the exponential term using a class of sophisticated Taylor polynomials, leading to a novel Runge--Kutta framework called exponential-free Runge--Kutta. By incorporating stabilization techniques, we analyze the energy stability of the proposed schemes and demonstrate that they preserve the original energy dissipation without time-step restrictions. Furthermore, we perform an analysis of the linear stability and establish an error estimate in the $\ell^2$ norm. A series of numerical experiments validate the high-order accuracy, mass conservation, and energy dissipation of our schemes., Comment: 29 pages, 11 figures
- Published
- 2024
50. DoubleCCA: Improving Foundation Model Group Robustness with Random Sentence Embeddings
- Author
-
Liu, Hong and Lu, Yitong
- Subjects
Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper presents a novel method to improve the robustness of foundation models to group-based biases. We propose a simple yet effective method, called DoubleCCA, that leverages random sentences and Canonical Correlation Analysis (CCA) to enrich the text embeddings of the foundation model. First, we generate various random sentences that augment the original prompts, which extends the original prompts with random words or character sequences. Second, we use an additional sentence embedding model to generate different text embeddings with respect to these random sentences. We then apply CCA twice to align the representations and reconstruct them back to the original representation space. We demonstrate the effectiveness of our method on a variety of tasks and datasets, showing that it outperforms existing methods in terms of both performance and robustness. Our method is simple to implement and can be easily integrated into existing models, making it a practical solution for improving the robustness of foundation models to group-based biases., Comment: 18 pages, 6 figures, 2 tables
- Published
- 2024