8,317,058 results on '"Automatic"'
Search Results
52. Design and simulation of an automatic bridge for efficient and safe railway platform crossing
- Author
-
Patil, Yogesh H. and Patil, Rachana Yogesh
- Published
- 2024
- Full Text
- View/download PDF
53. On the automatic design of multi-objective particle swarm optimizers: experimentation and analysis
- Author
-
Nebro, Antonio J., López-Ibáñez, Manuel, García-Nieto, José, and Coello Coello, Carlos A.
- Published
- 2024
- Full Text
- View/download PDF
54. The system of automatic control of the substrate temperature as part of the installation for the production of film material by spray pyrolysis
- Author
-
Pecherskaya, E. A., Semenov, A. D., Zinchenko, T. O., Danilov, A. A., and Tuzova, D. E.
- Published
- 2024
- Full Text
- View/download PDF
55. Automatic imitation is modulated by stimulus clarity but not by animacy
- Author
-
Wilt, Hannah, Wu, Yuchunzi, Trotter, Antony, and Adank, Patti
- Published
- 2024
- Full Text
- View/download PDF
56. The long-term effect of ostracism on cyber aggression: mutually predictive mediators of hostile automatic thoughts and personal relative deprivation
- Author
-
Sun, Lindan, Tian, Xue, and Zhu, Wenfeng
- Published
- 2024
- Full Text
- View/download PDF
57. Influence of Key Parameters of Medicinal Aluminum Tube on Automatic Casing Process
- Author
-
Yan, Guoping, Ming, Zhengjun, Zhou, Junhong, Tao, Qi, and Li, Shihuang
- Published
- 2024
- Full Text
- View/download PDF
58. Iterative Model Predictive Control for Automatic Carrier Landing of Carrier-Based Aircrafts Under Complex Surroundings and Constraints
- Author
-
Zhang, Xiaotian, He, Defeng, and Liao, Fei
- Published
- 2024
- Full Text
- View/download PDF
59. Unsupervised Machine Learning for Automatic Image Segmentation of Impact Damage in CFRP Composites
- Author
-
Zhupanska, Olesya and Krokhmal, Pavlo
- Published
- 2024
- Full Text
- View/download PDF
60. A FairMOT approach based on video recognition for real-time automatic incident detection on expressways
- Author
-
Xiao, Daiquan, Wang, Zeyu, Shen, Zhenwu, Xu, Xuecai, and Ma, Changxi
- Published
- 2024
- Full Text
- View/download PDF
61. Automatic adjustment of the parameters of the temperature measurement algorithm in a wide range
- Author
-
Bondar, O. G., Brezhneva, E. O., and Botikov, K. A.
- Published
- 2024
- Full Text
- View/download PDF
62. Improved wafer map defect pattern classification using automatic data augmentation based lightweight encoder network in contrastive learning
- Author
-
Sheng, Yi, Yan, Jinda, and Piao, Minghao
- Published
- 2024
- Full Text
- View/download PDF
63. A new thread-level speculative automatic parallelization model and library based on duplicate code execution
- Author
-
Martínez, Millán A., Fraguela, Basilio B., Cabaleiro, José C., and Rivera, Francisco F.
- Published
- 2024
- Full Text
- View/download PDF
64. Achieving improved stability for automatic voltage regulation with fractional-order PID plus double-derivative controller and mountain gazelle optimizer
- Author
-
Izci, Davut, Abualigah, Laith, Can, Özay, Andiç, Cenk, and Ekinci, Serdar
- Published
- 2024
- Full Text
- View/download PDF
65. Automatic Screening of COVID-19 Using an Optimized Generative Adversarial Network
- Author
-
Goel, Tripti, Murugan, R., Mirjalili, Seyedali, and Chakrabartty, Deba Kumar
- Published
- 2024
- Full Text
- View/download PDF
66. Automatic Tongue Delineation from MRI Images with a Convolutional Neural Network Approach
- Author
-
Isaieva, Karyna, Laprie, Yves, Turpault, Nicolas, Houssard, Alexis, Felblinger, Jacques, and Vuissoz, Pierre-André
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Tongue contour extraction from real-time magnetic resonance images is a nontrivial task due to the presence of artifacts manifesting in the form of blurring or ghostly contours. In this work, we present results of automatic tongue delineation achieved by means of a U-Net auto-encoder convolutional neural network. We present both intra- and inter-subject validation. We used real-time magnetic resonance images and manually annotated 1-pixel wide contours as inputs. Predicted probability maps were post-processed in order to obtain 1-pixel wide tongue contours. The results are very good and slightly outperform published results on automatic tongue segmentation.
- Published
- 2024
- Full Text
- View/download PDF
67. HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
- Author
-
Chu, Zedong, Xiong, Feng, Liu, Meiduo, Zhang, Jinzhi, Shao, Mingqi, Sun, Zhaoxu, Wang, Di, and Xu, Mu
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
With the rapid evolution of 3D generation algorithms, the cost of producing 3D humanoid character models has plummeted, yet the field is impeded by the lack of a comprehensive dataset for automatic rigging, which is a pivotal step in character animation. Addressing this gap, we present HumanRig, the first large-scale dataset specifically designed for 3D humanoid character rigging, encompassing 11,434 meticulously curated T-posed meshes adhering to a uniform skeleton topology. Capitalizing on this dataset, we introduce an innovative, data-driven automatic rigging framework, which overcomes the limitations of GNN-based methods in handling complex AI-generated meshes. Our approach integrates a Prior-Guided Skeleton Estimator (PGSE) module, which uses 2D skeleton joints to provide a preliminary 3D skeleton, and a Mesh-Skeleton Mutual Attention Network (MSMAN) that fuses skeleton features with 3D mesh features extracted by a U-shaped point transformer. This enables a coarse-to-fine 3D skeleton joint regression and a robust skinning estimation, surpassing previous methods in quality and versatility. This work not only remedies the dataset deficiency in rigging research but also propels the animation industry towards more efficient and automated character rigging pipelines.
- Comment
Website: https://github.com/c8241998/HumanRig
- Published
- 2024
68. What You See Is What You Get: Attention-based Self-guided Automatic Unit Test Generation
- Author
-
Yin, Xin, Ni, Chao, Xu, Xiaodan, and Yang, Xiaohu
- Subjects
Computer Science - Software Engineering
- Abstract
Software defects heavily affect software functionality and may cause huge losses. Recently, many AI-based approaches have been proposed to detect defects, which can be divided into two categories: software defect prediction and automatic unit test generation. While these approaches have made great progress in software defect detection, they still have several limitations in practical application, including the low confidence of prediction models and the inefficiency of unit testing models. To address these limitations, we propose a WYSIWYG (i.e., What You See Is What You Get) approach: Attention-based Self-guided Automatic Unit Test GenERation (AUGER), which contains two stages: defect detection and error triggering. In the former stage, AUGER first detects the proneness of defects. Then, in the latter stage, it guides the generation of unit tests that trigger such errors, using critical information obtained in the former stage. To evaluate the effectiveness of AUGER, we conduct a large-scale experiment comparing it with state-of-the-art (SOTA) approaches on widely used datasets (i.e., Bears, Bugs.jar, and Defects4J). AUGER achieves improvements of 4.7% to 35.3% and 17.7% to 40.4% in terms of F1-score and Precision in defect detection, and can trigger 23 to 84 more errors than SOTAs in unit test generation. In addition, we conduct a further study to verify generalization in practical usage by collecting a new dataset from real-world projects.
- Comment
Accepted By ICSE'25
- Published
- 2024
69. ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
- Author
-
Jia, Chengyou, Xia, Changliang, Dang, Zhuohang, Wu, Weijia, Qian, Hangwei, and Luo, Minnan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Despite the significant advancements in text-to-image (T2I) generative models, users often face a trial-and-error challenge in practical scenarios. This challenge arises from the complexity and uncertainty of tedious steps such as crafting suitable prompts, selecting appropriate models, and configuring specific arguments, making users resort to labor-intensive attempts for desired images. This paper proposes Automatic T2I generation, which aims to automate these tedious steps, allowing users to simply describe their needs in a freestyle chatting way. To systematically study this problem, we first introduce ChatGenBench, a novel benchmark designed for Automatic T2I. It features high-quality paired data with diverse freestyle inputs, enabling comprehensive evaluation of automatic T2I models across all steps. Additionally, recognizing Automatic T2I as a complex multi-step reasoning task, we propose ChatGen-Evo, a multi-stage evolution strategy that progressively equips models with essential automation skills. Through extensive evaluation across step-wise accuracy and image quality, ChatGen-Evo significantly enhances performance over various baselines. Our evaluation also uncovers valuable insights for advancing automatic T2I. All our data, code, and models will be available at https://chengyou-jia.github.io/ChatGen-Home
- Published
- 2024
70. MiceBoneChallenge: Micro-CT public dataset and six solutions for automatic growth plate detection in micro-CT mice bone scans
- Author
-
Burlutskiy, Nikolay, Kekic, Marija, de la Torre, Jordi, Plewa, Philipp, Boroumand, Mehdi, Jurkowska, Julia, Venovski, Borjan, Biagi, Maria Chiara, Hagos, Yeman Brhane, Malinowska-Traczyk, Roksana, Wang, Yibo, Zalewski, Jacek, Sawczuk, Paula, Pintarić, Karlo, Yousefi, Fariba, and Hultin, Leif
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
- Abstract
Detecting and quantifying bone changes in micro-CT scans of rodents is a common task in preclinical drug development studies. However, this task is manual, time-consuming and subject to inter- and intra-observer variability. In 2024, Anonymous Company organized an internal challenge to develop models for automatic bone quantification. We prepared and annotated a high-quality dataset of 3D μCT bone scans from 83 mice. The challenge attracted over 80 AI scientists from around the globe who formed 23 teams. The participants were tasked with developing a solution to identify the plane where the bone growth happens, which is essential for fully automatic segmentation of trabecular bone. As a result, six computer vision solutions were developed that can accurately identify the location of the growth plate plane. The solutions achieved a mean absolute error of 1.91 ± 0.87 planes from the ground truth on the test set, an accuracy level acceptable for practical use by a radiologist. The annotated 3D scan dataset, along with the six solutions and source code, is being made public, providing researchers with opportunities to develop and benchmark their own approaches. The code, trained models, and the data will be shared.
- Comment
Under Review
- Published
- 2024
71. Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
- Author
-
Ramprasad, Sanjana and Wallace, Byron C.
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Modern LLMs can now produce highly readable abstractive summaries, to the point where traditional automated metrics for evaluating summary quality, such as ROUGE, have become saturated. However, LLMs still sometimes introduce unwanted content into summaries, i.e., information inconsistent with or unsupported by their source. Measuring the occurrence of these often subtle "hallucinations" automatically has proved to be challenging. This in turn has motivated development of a variety of metrics intended to measure the factual consistency of generated summaries against their source. But are these approaches measuring what they purport to do? In this work, we stress-test automatic factuality metrics. Specifically, we investigate whether and to what degree superficial attributes of summary texts suffice to predict "factuality", finding that a (supervised) model using only such shallow features is reasonably competitive with SOTA factuality scoring methods. We then evaluate how factuality metrics respond to factual corrections in inconsistent summaries and find that only a few show meaningful improvements. In contrast, some metrics are more sensitive to benign, non-factual edits. Motivated by these insights, we show that one can "game" (most) automatic factuality metrics, i.e., reliably inflate "factuality" scores by appending innocuous sentences to generated summaries. Taken together, our results raise questions about the degree to which we should rely on existing automated factuality metrics and what exactly we want "factuality metrics" to measure.
- Published
- 2024
72. CellPilot: A unified approach to automatic and interactive segmentation in histopathology
- Author
-
Endres, Philipp, Koch, Valentin, Schnabel, Julia A., and Marr, Carsten
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Histopathology, the microscopic study of diseased tissue, is increasingly digitized, enabling improved visualization and streamlined workflows. An important task in histopathology is the segmentation of cells and glands, essential for determining shape and frequencies that can serve as indicators of disease. Deep learning tools are widely used in histopathology. However, variability in tissue appearance and cell morphology presents challenges for achieving reliable segmentation, often requiring manual correction to improve accuracy. This work introduces CellPilot, a framework that bridges the gap between automatic and interactive segmentation by providing initial automatic segmentation as well as guided interactive refinement. Our model was trained on over 675,000 masks of nine diverse cell and gland segmentation datasets, spanning 16 organs. CellPilot demonstrates superior performance compared to other interactive tools on three held-out histopathological datasets while enabling automatic segmentation. We make the model and a graphical user interface designed to assist practitioners in creating large-scale annotated datasets available as open-source, fostering the development of more robust and generalized diagnostic models.
- Published
- 2024
73. Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
- Author
-
Tu, Rong-Cheng, Ma, Zi-Ao, Lan, Tian, Zhao, Yuehao, Huang, Heyan, and Mao, Xian-Ling
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Driven by the remarkable progress in diffusion models, text-to-image generation has made significant strides, creating a pressing demand for automatic quality evaluation of generated images. Current state-of-the-art automatic evaluation methods heavily rely on Multi-modal Large Language Models (MLLMs), particularly powerful commercial models like GPT-4o. While these models are highly effective, their substantial costs limit scalability in large-scale evaluations. Adopting open-source MLLMs is an alternative; however, their performance falls short due to significant limitations in processing multi-modal data compared to commercial MLLMs. To tackle these problems, we first propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset, where the complex evaluation task is decoupled into simpler sub-tasks, effectively reducing the learning complexity. Based on this dataset, we design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6. Furthermore, to reliably and comprehensively assess prior works and our proposed model, we manually annotate a meta-evaluation benchmark that includes chain-of-thought explanations alongside quality scores for generated images. Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline, VIEScore, with over 4.6% improvement in Spearman and Kendall correlations with human judgments.
- Published
- 2024
74. Deep Learning-Based Automatic Delineation of Liver Domes in kV Triggered Images for Online Breath-hold Reproducibility Verification of Liver Stereotactic Body Radiation Therapy
- Author
-
Weragoda, Sugandima, Xia, Ping, Stephans, Kevin, Woody, Neil, Martens, Michael, Brown, Robert, and Guo, Bingqi
- Subjects
Physics - Medical Physics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Stereotactic Body Radiation Therapy (SBRT) can be a precise, minimally invasive treatment method for liver cancer and liver metastases. However, the effectiveness of SBRT relies on the accurate delivery of the dose to the tumor while sparing healthy tissue. Challenges persist in ensuring breath-hold reproducibility, with current methods often requiring manual verification of liver dome positions from kV-triggered images. To address this, we propose a proof-of-principle study of a deep learning-based pipeline to automatically delineate the liver dome from kV-planar images. From 24 patients who received SBRT for liver cancer or metastasis inside the liver, 711 kV-triggered images acquired for online breath-hold verification were included in the current study. We developed a pipeline comprising a trained U-Net for automatic liver dome region segmentation from the triggered images followed by extraction of the liver dome via thresholding, edge detection, and morphological operations. The performance and generalizability of the pipeline were evaluated using 2-fold cross validation. The training of the U-Net model for liver region segmentation took under 30 minutes and the automatic delineation of a liver dome for any triggered image took less than one second. The RMSE and rate of detection for Fold1 with 366 images were (6.4 +/- 1.6) mm and 91.7%, respectively. For Fold2 with 345 images, the RMSE and rate of detection were (7.7 +/- 2.3) mm and 76.3%, respectively.
- Published
- 2024
75. Automatic brain tumor segmentation in 2D intra-operative ultrasound images using MRI tumor annotations
- Author
-
Faanes, Mathilde, Helland, Ragnhild Holden, Solheim, Ole, and Reinertsen, Ingerid
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, I.4.6, J.3
- Abstract
Automatic segmentation of brain tumors in intra-operative ultrasound (iUS) images could facilitate localization of tumor tissue during resection surgery. The lack of large annotated datasets limits the performance of current models. In this paper, we investigate the use of tumor annotations in pre-operative MRI images, which are more easily accessible than annotations in iUS images, for training of deep learning models for iUS brain tumor segmentation. We used 180 annotated pre-operative MRI images with corresponding unannotated iUS images, and 29 annotated iUS images. Image registration was performed to transfer the MRI annotations to the corresponding iUS images before training models with the nnU-Net framework. To validate the use of MRI labels, the models were compared to a model trained with only US annotated tumors, and a model with both US and MRI annotated tumors. In addition, the results were compared to annotations validated by an expert neurosurgeon on the same test set to measure inter-observer variability. The results showed similar performance for a model trained with only MRI annotated tumors, compared to a model trained with only US annotated tumors. The model trained using both modalities obtained slightly better results with an average Dice score of 0.62, where external expert annotations achieved a score of 0.67. The results also showed that the deep learning models were comparable to expert annotation for larger tumors (> 200 mm²), but performed clearly worse for smaller tumors (< 200 mm²). This shows that MRI tumor annotations can be used as a substitute for US tumor annotations to train a deep learning model for automatic brain tumor segmentation in intra-operative ultrasound images. Small tumors are a limitation for the current models and will be the focus of future work.
The main models are available here: https://github.com/mathildefaanes/us_brain_tumor_segmentation
- Comment
19, 8 figures, submitted to International Journal of Computer Assisted Radiology and Surgery
- Published
- 2024
76. ComfyGI: Automatic Improvement of Image Generation Workflows
- Author
-
Sobania, Dominik, Briesch, Martin, and Rothlauf, Franz
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
- Abstract
Automatic image generation is no longer just of interest to researchers, but also to practitioners. However, current models are sensitive to the settings used and automatic optimization methods often require human involvement. To bridge this gap, we introduce ComfyGI, a novel approach, driven by techniques from genetic improvement, to automatically improve workflows for image generation without the need for human intervention. This enables image generation with significantly higher quality in terms of the alignment with the given description and the perceived aesthetics. On the performance side, we find that overall, the images generated with an optimized workflow are about 50% better compared to the initial workflow in terms of the median ImageReward score. These already good results are even surpassed in our human evaluation, as the participants preferred the images improved by ComfyGI in around 90% of the cases.
- Published
- 2024
77. Cyborg Insect Factory: Automatic Assembly System to Build up Insect-computer Hybrid Robot Based on Vision-guided Robotic Arm Manipulation of Custom Bipolar Electrodes
- Author
-
Lin, Qifeng, Vuong, Nghia, Song, Kewei, Tran-Ngoc, Phuoc Thanh, Nonato, Greg Angelo Gonzales, and Sato, Hirotaka
- Subjects
Computer Science - Robotics
- Abstract
The advancement of insect-computer hybrid robots holds significant promise for navigating complex terrains and enhancing robotics applications. This study introduced an automatic assembly method for insect-computer hybrid robots, which was accomplished by mounting a backpack with precise implantation of custom-designed bipolar electrodes. We developed a stimulation protocol for the intersegmental membrane between the pronotum and mesothorax of the Madagascar hissing cockroach, allowing for automatic implantation of the bipolar electrodes using a robotic arm. The assembly process was integrated with a deep learning-based vision system to accurately identify the implantation site, and a dedicated structure to fix the insect (68 s for the whole assembly process). The automatically assembled hybrid robots demonstrated steering control (over 70 degrees for 0.4 s stimulation) and deceleration control (68.2% speed reduction for 0.4 s stimulation), matching the performance of manually assembled systems. Furthermore, a multi-agent system consisting of 4 hybrid robots successfully covered obstructed outdoor terrain (80.25% for 10 minutes 31 seconds), highlighting the feasibility of mass-producing these systems for practical applications. The proposed automatic assembly strategy reduced preparation time for the insect-computer hybrid robots while maintaining their precise control, laying a foundation for scalable production and deployment in real-world applications.
- Published
- 2024
78. Classification of Stable Surfaces with respect to Automatic Continuity
- Author
-
Bestvina, Mladen, Domat, George, and Rafi, Kasra
- Subjects
Mathematics - Geometric Topology, Mathematics - Group Theory, 57S05, 57K20, 20F65, 22A05
- Abstract
We provide a complete classification of when the homeomorphism group of a stable surface, $\Sigma$, has the automatic continuity property: Any homomorphism from Homeo$(\Sigma)$ to a separable group is necessarily continuous. This result descends to a classification of when the mapping class group of $\Sigma$ has the automatic continuity property. Towards this classification, we provide a general framework for proving automatic continuity for groups of homeomorphisms. Applying this framework, we also show that the homeomorphism group of any stable second countable Stone space has the automatic continuity property. Under the presence of stability this answers two questions of Mann.
- Comment
37 pages, 5 figures
- Published
- 2024
79. Improving the solver for the Balitsky-Kovchegov evolution equation with Automatic Differentiation
- Author
-
Cougoulic, Florian, Korcyl, Piotr, and Stebel, Tomasz
- Subjects
High Energy Physics - Phenomenology
- Abstract
The Balitsky-Kovchegov (BK) evolution equation is an equation derived from perturbative Quantum Chromodynamics that allows one to calculate the scattering amplitude of a quark-antiquark pair off a hadron target, called the dipole amplitude, as a function of the collision energy. The initial condition, being a non-perturbative object, usually has to be modeled separately. Typically, the model contains several tunable parameters that are determined by fitting to experimental data. In this contribution, we propose an implementation of the BK solver using differentiable programming. Automatic differentiation offers the possibility that the first and second derivatives of the amplitude with respect to the initial condition parameters are automatically calculated at all stages of the simulation. This fact should considerably facilitate and speed up the fitting step. Moreover, in the context of Transverse Momentum Distributions (TMD), we demonstrate that automatic differentiation can be used to obtain the first and second derivatives of the amplitude with respect to the quark-antiquark separation. These derivatives can be used to relate various TMD functions to the dipole amplitude. Our C++ code for the solver, which is available in a public repository [1], includes the Balitsky one-loop running coupling prescription and the kinematic constraint. This version of the BK equation is widely used in the small-x evolution framework.
- Comment
17 pages, 6 figures, source code is published in the repository
- Published
- 2024
80. SG-LRA: Self-Generating Automatic Scoliosis Cobb Angle Measurement with Low-Rank Approximation
- Author
-
Shao, Zhiwen, Yuan, Yichen, Ma, Lizhuang, Yeung, Dit-Yan, and Zhu, Xiaojia
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Automatic Cobb angle measurement from X-ray images is crucial for scoliosis screening and diagnosis. However, most existing regression-based methods and segmentation-based methods struggle with inaccurate spine representations or mask connectivity/fragmentation issues. Besides, landmark-based methods suffer from insufficient training data and annotations. To address these challenges, we propose a novel framework comprising a Self-Generation pipeline and a Low-Rank Approximation representation (SG-LRA) for automatic Cobb angle measurement. Specifically, we propose a parameterized spine contour representation based on LRA, which enables eigen-spine decomposition and spine contour reconstruction. We can directly obtain the spine contour from only the regressed LRA coefficients, which form a more accurate spine representation than rectangular boxes. Also, we combine LRA coefficient regression with anchor box classification to solve inaccurate predictions and mask connectivity issues. Moreover, we develop a data engine with automatic annotation and automatic selection in an iterative manner, which is trained on a private Spinal2023 dataset. With our data engine, we generate the largest scoliosis X-ray dataset, named Spinal-AI2024, largely without privacy leaks. Extensive experiments on public AASCE2019, private Spinal2023, and generated Spinal-AI2024 datasets demonstrate that our method achieves state-of-the-art Cobb angle measurement performance. Our code and Spinal-AI2024 dataset are available at https://github.com/Ernestchenchen/SG-LRA and https://github.com/Ernestchenchen/Spinal-AI2024, respectively.
- Published
- 2024
81. Qurts: Automatic Quantum Uncomputation by Affine Types with Lifetime
- Author
-
Hirata, Kengo and Heunen, Chris
- Subjects
Computer Science - Programming Languages, Quantum Physics, D.3.1, F.3.1, F.3.2
- Abstract
Uncomputation is a feature in quantum programming that allows the programmer to discard a value without losing quantum information, and that allows the compiler to reuse resources. Whereas quantum information has to be treated linearly by the type system, automatic uncomputation enables the programmer to treat it affinely to some extent. Automatic uncomputation requires a substructural type system between linear and affine, a subtlety that has only been captured by existing languages in an ad hoc way. We extend the Rust type system to the quantum setting to give a uniform framework for automatic uncomputation called Qurts (pronounced quartz). Specifically, we parameterise types by lifetimes, permitting them to be affine during their lifetime, while being restricted to linear use outside their lifetime. We also provide two operational semantics: one based on classical simulation, and one that does not depend on any specific uncomputation strategy.
- Comment
59 pages
- Published
- 2024
82. HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis
- Author
-
Huang, Haoxu, Deniz, Cem M., Cho, Kyunghyun, Chopra, Sumit, and Madaan, Divyam
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Chest X-ray imaging is a widely accessible and non-invasive diagnostic tool for detecting thoracic abnormalities. While numerous AI models assist radiologists in interpreting these images, most overlook patients' historical data. To bridge this gap, we introduce the Temporal MIMIC dataset, which integrates five years of patient history, including radiographic scans and reports from MIMIC-CXR and MIMIC-IV, encompassing 12,221 patients and thirteen pathologies. Building on this, we present HIST-AID, a framework that enhances automatic diagnostic accuracy using historical reports. HIST-AID emulates the radiologist's comprehensive approach, leveraging historical data to improve diagnostic accuracy. Our experiments demonstrate significant improvements, with AUROC increasing by 6.56% and AUPRC by 9.51% compared to models that rely solely on radiographic scans. These gains were consistently observed across diverse demographic groups, including variations in gender, age, and racial categories. We show that while recent data boost performance, older data may reduce accuracy due to changes in patient conditions. Our work paves the way for incorporating historical data for more reliable automatic diagnosis, providing critical support for clinical decision-making.
- Comment
In Proceedings of Machine Learning for Health
- Published
- 2024
83. Interactive Cycle Model -- The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses
- Author
-
Wang, Libo
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
This research proposes the interaction loop model "ASR-LLM-Smart Glasses", which combines automatic speech recognition, a large language model, and smart glasses to facilitate seamless human-computer interaction. The methodology involves decomposing the interaction process into different stages and elements. Speech is captured and processed by ASR, then analyzed and interpreted by the LLM. The results are then transmitted to the smart glasses for display. The feedback loop is complete when the user interacts with the displayed data. Mathematical formulas are used to quantify the performance of the model around core evaluation points: accuracy, coherence, and latency during ASR speech-to-text conversion. The research results are provided theoretically to test and evaluate the feasibility and performance of the model. Although such human-computer interaction products have not yet appeared in the industry, the performance indicators of this model in enhancing user experience in fields that rely on human-computer interaction have also verified its utility as a technology to promote human-computer interaction. In addition, this research pioneers the idea of integrating cutting-edge technologies such as generative pre-trained Transformer models into unique interaction models; the LLM provides raw value through powerful evaluation techniques and innovative use, which provides a new perspective for evaluating and enhancing human-computer interaction. Keywords: Automatic speech recognition, Large Language Model, Smart glasses, Interaction mechanism
- Comment
OpenReview submitted. 11 pages of text and 1 figure
- Published
- 2024
84. CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design
- Author
-
Qiu, Ruidi, Zhang, Grace Li, Drechsler, Rolf, Schlichtmann, Ulf, and Li, Bing
- Subjects
Computer Science - Software Engineering
- Abstract
Functional simulation is an essential step in digital hardware design. Recently, there has been a growing interest in leveraging Large Language Models (LLMs) for hardware testbench generation tasks. However, the inherent instability associated with LLMs often leads to functional errors in the generated testbenches. Previous methods do not incorporate automatic functional correction mechanisms without human intervention and still suffer from low success rates, especially for sequential tasks. To address this issue, we propose CorrectBench, an automatic testbench generation framework with functional self-validation and self-correction. Utilizing only the RTL specification in natural language, the proposed approach can validate the correctness of the generated testbenches with a success rate of 88.85%. Furthermore, the proposed LLM-based corrector employs bug information obtained during the self-validation process to perform functional self-correction on the generated testbenches. The comparative analysis demonstrates that our method achieves a pass ratio of 70.13% across all evaluated tasks, compared with the previous LLM-based testbench generation framework's 52.18% and a direct LLM-based generation method's 33.33%. Specifically in sequential circuits, our work's performance is 62.18% higher than previous work in sequential tasks and almost 5 times the pass ratio of the direct method. The codes and experimental results are open-sourced at the link: https://github.com/AutoBench/CorrectBench
- Published
- 2024
85. CRepair: CVAE-based Automatic Vulnerability Repair Technology
- Author
-
Liu, Penghui, Bi, Yingzhou, Huang, Jiangtao, Jiang, Xinxin, and Wang, Lianmei
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence
- Abstract
Software vulnerabilities are flaws in computer software systems that pose significant threats to the integrity, security, and reliability of modern software and its application data. These vulnerabilities can lead to substantial economic losses across various industries. Manual vulnerability repair is not only time-consuming but also prone to errors. To address the challenges of vulnerability repair, researchers have proposed various solutions, with learning-based automatic vulnerability repair techniques gaining widespread attention. However, existing methods often focus on learning more vulnerability data to improve repair outcomes, while neglecting the diverse characteristics of vulnerable code, and suffer from imprecise vulnerability localization. To address these shortcomings, this paper proposes CRepair, a CVAE-based automatic vulnerability repair technology aimed at fixing security vulnerabilities in system code. We first preprocess the vulnerability data using a prompt-based method to serve as input to the model. Then, we apply causal inference techniques to map the vulnerability feature data to probability distributions. By employing multi-sample feature fusion, we capture diverse vulnerability feature information. Finally, conditional control is used to guide the model in repairing the vulnerabilities. Experimental results demonstrate that the proposed method significantly outperforms other benchmark models, achieving a perfect repair rate of 52%. The effectiveness of the approach is validated from multiple perspectives, advancing AI-driven code vulnerability repair and showing promising applications.
- Published
- 2024
86. Automatic Authoring of Physical and Perceptual/Affective Motion Effects for Virtual Reality
- Author
-
Lee, Jiwan and Choi, Seungmoon
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
This demo is about automatic authoring of various motion effects that are provided with audiovisual content to improve user experiences. Traditionally, motion effects have been used for simulators, e.g., flight simulators for pilots and astronauts, to present physically accurate vestibular feedback. At present, we have much wider use of motion effects for entertainment purposes, such as 4D rides in amusement parks and even shopping malls, 4D films in theaters, and relatively new virtual reality games with head-mounted displays and personal motion platforms. However, the production of motion effects is done solely by manual authoring or coding, and this costly process prevents the faster and wider dissemination of 4D content. It is imperative to facilitate motion effect production by providing automatic synthesis algorithms. This demo video presents nine different automatic synthesis algorithms for motion effects and a recorded demonstration of each.
Comment: Part of proceedings of 6th International Conference AsiaHaptics 2024
- Published
- 2024
87. A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model
- Author
-
Hu, Panwen, Xiao, Nan, Li, Feifei, Chen, Yongquan, and Huang, Rui
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this era of videos, automatic video editing techniques attract more and more attention from industry and academia since they can reduce workloads and lower the requirements for human editors. Existing automatic editing systems are mainly scene- or event-specific, e.g., soccer game broadcasting, yet the automatic systems for general editing, e.g., movie or vlog editing which covers various scenes and events, were rarely studied before, and converting the event-driven editing method to a general scene is nontrivial. In this paper, we propose a two-stage scheme for general editing. Firstly, unlike previous works that extract scene-specific features, we leverage the pre-trained Vision-Language Model (VLM) to extract the editing-relevant representations as editing context. Moreover, to close the gap between the professional-looking videos and the automatic productions generated with simple guidelines, we propose a Reinforcement Learning (RL)-based editing framework to formulate the editing problem and train the virtual editor to make better sequential editing decisions. Finally, we evaluate the proposed method on a more general editing task with a real movie dataset. Experimental results demonstrate the effectiveness and benefits of the proposed context representation and the learning ability of our RL-based editing framework.
- Published
- 2024
88. A multi-purpose automatic editing system based on lecture semantics for remote education
- Author
-
Hu, Panwen and Huang, Rui
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia
- Abstract
Remote teaching has become popular recently due to its convenience and safety, especially under extreme circumstances like a pandemic. However, online students usually have a poor experience since the information acquired from the views provided by the broadcast platforms is limited. One potential solution is to show more camera views simultaneously, but it is technically challenging and distracting for the viewers. Therefore, an automatic multi-camera directing/editing system, which aims at selecting the most concerned view at each time instance to guide the attention of online students, is in urgent demand. However, existing systems mostly make simple assumptions and focus on tracking the position of the speaker instead of the real lecture semantics, and therefore have limited capacities to deliver optimal information flow. To this end, this paper proposes an automatic multi-purpose editing system based on the lecture semantics, which can both direct the multiple video streams for real-time broadcasting and edit the optimal video offline for review purposes. Our system directs the views by semantically analyzing the class events while following the professional directing rules, mimicking a human director to capture the regions of interest from the viewpoint of the onsite students. We conduct both qualitative and quantitative analyses to verify the effectiveness of the proposed system and its components.
- Published
- 2024
89. Automatic Structured Pruning for Efficient Architecture in Federated Learning
- Author
-
Nguyen, Thai Vu, Le, Long Bao, and Avila, Anderson
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In Federated Learning (FL), training is conducted on client devices, typically with limited computational resources and storage capacity. To address these constraints, we propose an automatic pruning scheme tailored for FL systems. Our solution improves computation efficiency on client devices, while minimizing communication costs. One of the challenges of tuning pruning hyper-parameters in FL systems is the restricted access to local data. Thus, we introduce an automatic pruning paradigm that dynamically determines pruning boundaries. Additionally, we utilized a structured pruning algorithm optimized for mobile devices that lack hardware support for sparse computations. Experimental results demonstrate the effectiveness of our approach, achieving accuracy comparable to existing methods. Our method notably reduces the number of parameters by 89% and FLOPS by 90%, with minimal impact on the accuracy of the FEMNIST and CelebFaces datasets. Furthermore, our pruning method decreases communication overhead by up to 5x and halves inference time when deployed on Android devices.
- Published
- 2024
90. Automatic programming via large language models with population self-evolution for dynamic job shop scheduling problem
- Author
-
Huang, Jin, Li, Xinyu, Gao, Liang, Liu, Qihao, and Teng, Yue
- Subjects
Computer Science - Neural and Evolutionary Computing
- Abstract
Heuristic dispatching rules (HDRs) are widely regarded as effective methods for solving dynamic job shop scheduling problems (DJSSP) in real-world production environments. However, their performance is highly scenario-dependent, often requiring expert customization. To address this, genetic programming (GP) and gene expression programming (GEP) have been extensively used for automatic algorithm design. Nevertheless, these approaches often face challenges due to high randomness in the search process and limited generalization ability, hindering the application of trained dispatching rules to new scenarios or dynamic environments. Recently, the integration of large language models (LLMs) with evolutionary algorithms has opened new avenues for prompt engineering and automatic algorithm design. To enhance the capabilities of LLMs in automatic HDRs design, this paper proposes a novel population self-evolutionary (SeEvo) method, a general search framework inspired by the self-reflective design strategies of human experts. The SeEvo method accelerates the search process and enhances exploration capabilities. Experimental results show that the proposed SeEvo method outperforms GP, GEP, end-to-end deep reinforcement learning methods, and more than 10 common HDRs from the literature, particularly in unseen and dynamic scenarios.
- Published
- 2024
91. A Survey on Automatic Credibility Assessment of Textual Credibility Signals in the Era of Large Language Models
- Author
-
Srba, Ivan, Razuvayevskaya, Olesya, Leite, João A., Moro, Robert, Schlicht, Ipek Baris, Tonelli, Sara, García, Francisco Moreno, Lottmann, Santiago Barrio, Teyssou, Denis, Porcellini, Valentin, Scarton, Carolina, Bontcheva, Kalina, and Bielikova, Maria
- Subjects
Computer Science - Computation and Language
- Abstract
In the current era of social media and generative AI, the ability to automatically assess the credibility of online social media content is of tremendous importance. Credibility assessment is fundamentally based on aggregating credibility signals, which refer to small units of information, such as content factuality, bias, or the presence of persuasion techniques, into an overall credibility score. Credibility signals provide more granular, more easily explainable and widely utilizable information in contrast to currently predominant fake news detection, which utilizes various (mostly latent) features. A growing body of research on automatic credibility assessment and detection of credibility signals can be characterized as highly fragmented and lacking mutual interconnections. This issue is even more prominent due to a lack of an up-to-date overview of research works on automatic credibility assessment. In this survey, we provide such a systematic and comprehensive literature review of 175 research papers while focusing on textual credibility signals and Natural Language Processing (NLP), which is undergoing significant advancement due to Large Language Models (LLMs). While positioning the NLP research into the context of other multidisciplinary research works, we address approaches for credibility assessment as well as 9 categories of credibility signals (we provide a thorough analysis for 3 of them, namely: 1) factuality, subjectivity and bias, 2) persuasion techniques and logical fallacies, and 3) claims and veracity). Following the description of the existing methods, datasets and tools, we identify future challenges and opportunities, while paying specific attention to the recent rapid development of generative AI.
- Published
- 2024
92. Towards Fully Automatic Distributed Lower Bounds
- Author
-
Balliu, Alkida, Brandt, Sebastian, Kuhn, Fabian, Olivetti, Dennis, and Saarhelo, Joonatan
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
In the past few years, a successful line of research has led to lower bounds for several fundamental local graph problems in the distributed setting. These results were obtained via a technique called round elimination. On a high level, the round elimination technique can be seen as a recursive application of a function that takes as input a problem $\Pi$ and outputs a problem $\Pi'$ that is one round easier than $\Pi$. Applying this function recursively to concrete problems of interest can be highly nontrivial, which is one of the reasons that has made the technique difficult to approach. The contribution of our paper is threefold. Firstly, we develop a new and fully automatic method for finding lower bounds of $\Omega(\log_\Delta n)$ and $\Omega(\log_\Delta \log n)$ rounds for deterministic and randomized algorithms, respectively, via round elimination. Secondly, we show that this automatic method is indeed useful, by obtaining lower bounds for defective coloring problems. We show that the problem of coloring the nodes of a graph with $3$ colors and defect at most $(\Delta - 3)/2$ requires $\Omega(\log_\Delta n)$ rounds for deterministic algorithms and $\Omega(\log_\Delta \log n)$ rounds for randomized ones. We note that lower bounds for coloring problems are notoriously challenging to obtain, both in general, and via the round elimination technique. Both the first and (indirectly) the second contribution build on our third contribution -- a new and conceptually simple way to compute the one-round easier problem $\Pi'$ in the round elimination framework. This new procedure provides a clear and easy recipe for applying round elimination, thereby making a substantial step towards the greater goal of having a fully automatic procedure for obtaining lower bounds in the distributed setting.
- Published
- 2024
93. Automatic Extraction and Compensation of P-Bit Device Variations in Large Array Utilizing Boltzmann Machine Training
- Author
-
Zhang, Bolin, Liu, Yu, Gao, Tianqi, Yin, Jialiang, Guan, Zhenyu, Zhang, Deming, and Zeng, Lang
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Applied Physics
- Abstract
The Probabilistic Bit (P-Bit) device serves as the core hardware for implementing Ising computation. However, the severe intrinsic variations of stochastic P-Bit devices hinder the large-scale expansion of the P-Bit array, significantly limiting the practical usage of Ising computation. In this work, a behavioral model which attributes P-Bit variations to two parameters, $\alpha$ and $\Delta V$, is proposed. Then the weight compensation method is introduced, which can mitigate the $\alpha$ and $\Delta V$ variations of P-Bit devices by rederiving the weight matrix, enabling them to compute as ideal, identical P-Bits without the need for weight retraining. Accurately extracting $\alpha$ and $\Delta V$ simultaneously from a large P-Bit array, which is a prerequisite for the weight compensation method, is a crucial and challenging task. To overcome this obstacle, we present a novel automatic variation extraction algorithm which can extract device variations of each P-Bit in a large array based on Boltzmann machine learning. For accurate extraction of variations from an extendable P-Bit array, an Ising Hamiltonian based on a 3D ferromagnetic model is constructed, achieving precise and scalable array variation extraction. The proposed Automatic Extraction and Compensation algorithm is utilized to solve both a 16-city traveling salesman problem (TSP) and 21-bit integer factorization on a large P-Bit array with variation, demonstrating its accuracy, transferability, and scalability.
Comment: 15 pages, 17 figures
- Published
- 2024
94. ScreenWriter: Automatic Screenplay Generation and Movie Summarisation
- Author
-
Mahon, Louis and Lapata, Mirella
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
- Abstract
The proliferation of creative video content has driven demand for textual descriptions or summaries that allow users to recall key plot points or get an overview without watching. The volume of movie content and speed of turnover motivate automatic summarisation, which is nevertheless challenging, requiring identifying character intentions and very long-range temporal dependencies. The few existing methods attempting this task rely heavily on textual screenplays as input, greatly limiting their applicability. In this work, we propose the task of automatic screenplay generation, and a method, ScreenWriter, that operates only on video and produces output which includes dialogue, speaker names, scene breaks, and visual descriptions. ScreenWriter introduces a novel algorithm to segment the video into scenes based on the sequence of visual vectors, and a novel method for the challenging problem of determining character names, based on a database of actors' faces. We further demonstrate how these automatic screenplays can be used to generate plot synopses with a hierarchical summarisation method based on scene breaks. We test the quality of the final summaries on the recent MovieSum dataset, which we augment with videos, and show that they are superior to a number of comparison models which assume access to gold-standard screenplays.
- Published
- 2024
95. Automatic Navigation and Voice Cloning Technology Deployment on a Humanoid Robot
- Author
-
Han, Dongkun and Shao, Boyuan
- Subjects
Computer Science - Robotics
- Abstract
Mobile robots have shown immense potential and are expected to be widely used in the service industry. The importance of automatic navigation and voice cloning cannot be overstated as they enable functional robots to provide high-quality services. The objective of this work is to develop a control algorithm for the automatic navigation of a humanoid mobile robot called Cruzr, which is a service robot manufactured by Ubtech. Initially, a virtual environment is constructed in the simulation software Gazebo using Simultaneous Localization And Mapping (SLAM), and global path planning is carried out by means of local path tracking. The two-wheel differential chassis kinematics model is employed to ensure autonomous dynamic obstacle avoidance for the robot chassis. Furthermore, the mapping and trajectory generation algorithms developed in the simulation environment are successfully implemented on the real robot Cruzr. The performance of automatic navigation is compared between the Dynamic Window Approach (DWA) and Model Predictive Control (MPC) algorithms. Additionally, a mobile application for voice cloning is created based on a Hidden Markov Model, and the proposed Chatbot is also tested and deployed on Cruzr.
Comment: 7 pages, 6 figures
- Published
- 2024
96. FSOS-AMC: Few-Shot Open-Set Learning for Automatic Modulation Classification
- Author
-
Zhang, Hao, Zhou, Fuhui, Wu, Qihui, and Yuen, Chau
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
Automatic modulation classification (AMC) is essential for the advancement and efficiency of future wireless communication networks. Deep learning (DL)-based AMC frameworks have garnered extensive attention for their impressive classification performance. However, existing DL-based AMC frameworks rely on two assumptions, large-scale training data and the same class pool between the training and testing data, which are not suitable for few-shot and open-set scenarios. To address this issue, a novel few-shot open-set automatic modulation classification (FSOS-AMC) framework is proposed by exploiting a multi-scale attention network, meta-prototype training, and a modular open-set classifier. The multi-scale attention network is used to extract the features from the input signal, the meta-prototype training is adopted to train the feature extractor and the modular open-set classifier can be utilized to classify the testing data into one of the known modulations or potential unknown modulations. Extensive simulation results demonstrate that the proposed FSOS-AMC framework can achieve higher classification accuracy than the state-of-the-art methods for known modulations and unknown modulations in terms of accuracy and area under the receiver operating characteristic curve (AUROC). Moreover, the performance of the proposed FSOS-AMC framework under low signal-to-noise ratio (SNR) conditions is much better than the compared schemes.
Comment: accepted by 16th International Conference on Wireless Communications and Signal Processing (WCSP 2024)
- Published
- 2024
97. EasyHeC++: Fully Automatic Hand-Eye Calibration with Pretrained Image Models
- Author
-
Hong, Zhengdong, Zheng, Kangfu, and Chen, Linghao
- Subjects
Computer Science - Robotics
- Abstract
Hand-eye calibration plays a fundamental role in robotics by directly influencing the efficiency of critical operations such as manipulation and grasping. In this work, we present a novel framework, EasyHeC++, designed for fully automatic hand-eye calibration. In contrast to previous methods that necessitate manual calibration, specialized markers, or the training of arm-specific neural networks, our approach is the first system that enables accurate calibration of any robot arm in a marker-free, training-free, and fully automatic manner. Our approach employs a two-step process. First, we initialize the camera pose using a sampling or feature-matching-based method with the aid of pretrained image models. Subsequently, we perform pose optimization through differentiable rendering. Extensive experiments demonstrate the system's superior accuracy in both synthetic and real-world datasets across various robot arms and camera settings. Project page: https://ootts.github.io/easyhec_plus.
Comment: Accepted by IROS 2024
- Published
- 2024
98. AMPO: Automatic Multi-Branched Prompt Optimization
- Author
-
Yang, Sheng, Wu, Yurong, Gao, Yan, Zhou, Zineng, Zhu, Bin Benjamin, Sun, Xiaodi, Lou, Jian-Guang, Ding, Zhiming, Hu, Anbang, Fang, Yuan, Li, Yunsong, Chen, Junyan, and Yang, Linjun
- Subjects
Computer Science - Computation and Language
- Abstract
Prompt engineering is very important to enhance the performance of large language models (LLMs). When dealing with complex issues, prompt engineers tend to distill multiple patterns from examples and inject relevant solutions to optimize the prompts, achieving satisfying results. However, existing automatic prompt optimization techniques are limited to producing single-flow instructions and struggle to handle diverse patterns. In this paper, we present AMPO, an automatic prompt optimization method that can iteratively develop a multi-branched prompt using failure cases as feedback. Our goal is to explore a novel way of structuring prompts with multi-branches to better handle multiple patterns in complex tasks, for which we introduce three modules: Pattern Recognition, Branch Adjustment, and Branch Pruning. In experiments across five tasks, AMPO consistently achieves the best results. Additionally, our approach demonstrates significant optimization efficiency due to our adoption of a minimal search strategy.
Comment: 13 pages, 7 figures, 6 tables
- Published
- 2024
99. Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation
- Author
-
Agrawal, Sweta, de Souza, José G. C., Rei, Ricardo, Farinhas, António, Faria, Gonçalo, Fernandes, Patrick, Guerreiro, Nuno M, and Martins, Andre
- Subjects
Computer Science - Computation and Language
- Abstract
Alignment with human preferences is an important step in developing accurate and safe large language models. This is no exception in machine translation (MT), where better handling of language nuances and context-specific variations leads to improved quality. However, preference data based on human feedback can be very expensive to obtain and curate at a large scale. Automatic metrics, on the other hand, can induce preferences, but they might not match human expectations perfectly. In this paper, we propose an approach that leverages the best of both worlds. We first collect sentence-level quality assessments from professional linguists on translations generated by multiple high-quality MT systems and evaluate the ability of current automatic metrics to recover these preferences. We then use this analysis to curate a new dataset, MT-Pref (metric induced translation preference) dataset, which comprises 18k instances covering 18 language directions, using texts sourced from multiple domains post-2022. We show that aligning TOWER models on MT-Pref significantly improves translation quality on WMT23 and FLORES benchmarks.
Comment: Accepted at EMNLP Main 2024
- Published
- 2024
100. Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
- Author
-
Zheng, Xiaosen, Pang, Tianyu, Du, Chao, Liu, Qian, Jiang, Jing, and Lin, Min
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Machine Learning
- Abstract
Automatic LLM benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench, have become popular for evaluating language models due to their cost-effectiveness and scalability compared to human evaluation. Achieving high win rates on these benchmarks can significantly boost the promotional impact of newly released language models. This promotional benefit may motivate tricks, such as manipulating model output length or style to game win rates, even though several mechanisms have been developed to control length and disentangle style to reduce gameability. Nonetheless, we show that even a "null model" that always outputs a constant response (irrelevant to input instructions) can cheat automatic benchmarks and achieve top-ranked win rates: an 86.5% LC win rate on AlpacaEval 2.0; an 83.0 score on Arena-Hard-Auto; and a 9.55 score on MT-Bench. Moreover, the crafted cheating outputs are transferable because we assume that the instructions of these benchmarks (e.g., 805 samples of AlpacaEval 2.0) are private and cannot be accessed. While our experiments are primarily proof-of-concept, an adversary could use LLMs to generate more imperceptible cheating responses, unethically benefiting from high win rates and promotional impact. Our findings call for the development of anti-cheating mechanisms for reliable automatic benchmarks. The code is available at https://github.com/sail-sg/Cheating-LLM-Benchmarks.
- Published
- 2024