79,131 results on '"Automatic"'
Search Results
2. A Novel Automatic Real-time Motion Tracking Method for Magnetic Resonance Imaging-guided Radiotherapy: Leveraging the Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation
- Author
-
Chen, Shengqi, Wang, Zilin, Dai, Jianrong, Qin, Shirui, Cao, Ying, Zhao, Ruiao, Chen, Jiayun, Wu, Guohua, and Tang, Yuan
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Physics - Medical Physics ,Quantitative Biology - Tissues and Organs - Abstract
Objective: Ensuring the precision in motion tracking for MRI-guided Radiotherapy (MRIgRT) is crucial for the delivery of effective treatments. This study refined the motion tracking accuracy in MRIgRT through the innovation of an automatic real-time tracking method, leveraging an enhanced Tracking-Learning-Detection (ETLD) framework coupled with automatic segmentation. Methods: We developed a novel MRIgRT motion tracking method by integrating two primary methods: the ETLD framework and an improved Chan-Vese model (ICV), named ETLD+ICV. The TLD framework was upgraded to suit real-time cine MRI, including advanced image preprocessing, no-reference image quality assessment, an enhanced median-flow tracker, and a refined detector with dynamic search region adjustments. Additionally, ICV was combined for precise coverage of the target volume, which refined the segmented region frame by frame using tracking results, with key parameters optimized. Tested on 3.5D MRI scans from 10 patients with liver metastases, our method ensures precise tracking and accurate segmentation vital for MRIgRT. Results: An evaluation of 106,000 frames across 77 treatment fractions revealed sub-millimeter tracking errors of less than 0.8mm, with over 99% precision and 98% recall for all subjects, underscoring the robustness and efficacy of the ETLD. Moreover, the ETLD+ICV yielded a dice global score of more than 82% for all subjects, demonstrating the proposed method's extensibility and precise target volume coverage. Conclusions: This study successfully developed an automatic real-time motion tracking method for MRIgRT that markedly surpasses current methods. The novel method not only delivers exceptional precision in tracking and segmentation but also demonstrates enhanced adaptability to clinical demands, positioning it as an indispensable asset in the quest to augment the efficacy of radiotherapy treatments.
- Published
- 2024
3. Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges
- Author
-
Liu, Dancheng, Yang, Jason, Albrecht-Buehler, Ishan, Qin, Helen, Li, Sophie, Hu, Yuting, Nassereldine, Amir, and Xiong, Jinjun
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Computation and Language ,Quantitative Biology - Quantitative Methods - Abstract
Speech is a fundamental aspect of human life, crucial not only for communication but also for cognitive, social, and academic development. Children with speech disorders (SD) face significant challenges that, if unaddressed, can result in lasting negative impacts. Traditionally, speech and language assessments (SLA) have been conducted by skilled speech-language pathologists (SLPs), but there is a growing need for efficient and scalable SLA methods powered by artificial intelligence. This position paper presents a survey of existing techniques suitable for automating SLA pipelines, with an emphasis on adapting automatic speech recognition (ASR) models for children's speech, an overview of current SLAs and their automated counterparts to demonstrate the feasibility of AI-enhanced SLA pipelines, and a discussion of practical considerations, including accessibility and privacy concerns, associated with the deployment of AI-powered SLAs., Comment: AAAI-FSS 24
- Published
- 2024
4. ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
- Author
-
Jia, Chengyou, Xia, Changliang, Dang, Zhuohang, Wu, Weijia, Qian, Hangwei, and Luo, Minnan
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Despite the significant advancements in text-to-image (T2I) generative models, users often face a trial-and-error challenge in practical scenarios. This challenge arises from the complexity and uncertainty of tedious steps such as crafting suitable prompts, selecting appropriate models, and configuring specific arguments, making users resort to labor-intensive attempts for desired images. This paper proposes Automatic T2I generation, which aims to automate these tedious steps, allowing users to simply describe their needs in a freestyle chatting way. To systematically study this problem, we first introduce ChatGenBench, a novel benchmark designed for Automatic T2I. It features high-quality paired data with diverse freestyle inputs, enabling comprehensive evaluation of automatic T2I models across all steps. Additionally, recognizing Automatic T2I as a complex multi-step reasoning task, we propose ChatGen-Evo, a multi-stage evolution strategy that progressively equips models with essential automation skills. Through extensive evaluation across step-wise accuracy and image quality, ChatGen-Evo significantly enhances performance over various baselines. Our evaluation also uncovers valuable insights for advancing automatic T2I. All our data, code, and models will be available in \url{https://chengyou-jia.github.io/ChatGen-Home}
- Published
- 2024
5. MiceBoneChallenge: Micro-CT public dataset and six solutions for automatic growth plate detection in micro-CT mice bone scans
- Author
-
Burlutskiy, Nikolay, Kekic, Marija, de la Torre, Jordi, Plewa, Philipp, Boroumand, Mehdi, Jurkowska, Julia, Venovski, Borjan, Biagi, Maria Chiara, Hagos, Yeman Brhane, Malinowska-Traczyk, Roksana, Wang, Yibo, Zalewski, Jacek, Sawczuk, Paula, Pintarić, Karlo, Yousefi, Fariba, and Hultin, Leif
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Statistics - Machine Learning - Abstract
Detecting and quantifying bone changes in micro-CT scans of rodents is a common task in preclinical drug development studies. However, this task is manual, time-consuming and subject to inter- and intra-observer variability. In 2024, Anonymous Company organized an internal challenge to develop models for automatic bone quantification. We prepared and annotated a high-quality dataset of 3D $\mu$CT bone scans from $83$ mice. The challenge attracted over $80$ AI scientists from around the globe who formed $23$ teams. The participants were tasked with developing a solution to identify the plane where the bone growth happens, which is essential for fully automatic segmentation of trabecular bone. As a result, six computer vision solutions were developed that can accurately identify the location of the growth plate plane. The solutions achieved the mean absolute error of $1.91\pm0.87$ planes from the ground truth on the test set, an accuracy level acceptable for practical use by a radiologist. The annotated 3D scans dataset along with the six solutions and source code, is being made public, providing researchers with opportunities to develop and benchmark their own approaches. The code, trained models, and the data will be shared., Comment: Under Review
- Published
- 2024
6. Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
- Author
-
Ramprasad, Sanjana and Wallace, Byron C.
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Modern LLMs can now produce highly readable abstractive summaries, to the point where traditional automated metrics for evaluating summary quality, such as ROUGE, have become saturated. However, LLMs still sometimes introduce unwanted content into summaries, i.e., information inconsistent with or unsupported by their source. Measuring the occurrence of these often subtle ``hallucinations'' automatically has proved to be challenging. This in turn has motivated development of a variety of metrics intended to measure the factual consistency of generated summaries against their source. But are these approaches measuring what they purport to do? In this work, we stress-test automatic factuality metrics. Specifically, we investigate whether and to what degree superficial attributes of summary texts suffice to predict ``factuality'', finding that a (supervised) model using only such shallow features is reasonably competitive with SOTA factuality scoring methods. We then evaluate how factuality metrics respond to factual corrections in inconsistent summaries and find that only a few show meaningful improvements. In contrast, some metrics are more sensitive to benign, non-factual edits. Motivated by these insights, we show that one can ``game'' (most) automatic factuality metrics, i.e., reliably inflate ``factuality'' scores by appending innocuous sentences to generated summaries. Taken together, our results raise questions about the degree to which we should rely on existing automated factuality metrics and what exactly we want ``factuality metrics'' to measure.
- Published
- 2024
7. Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
- Author
-
Tu, Rong-Cheng, Ma, Zi-Ao, Lan, Tian, Zhao, Yuehao, Huang, Heyan, and Mao, Xian-Ling
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Driven by the remarkable progress in diffusion models, text-to-image generation has made significant strides, creating a pressing demand for automatic quality evaluation of generated images. Current state-of-the-art automatic evaluation methods heavily rely on Multi-modal Large Language Models (MLLMs), particularly powerful commercial models like GPT-4o. While these models are highly effective, their substantial costs limit scalability in large-scale evaluations. Adopting open-source MLLMs is an alternative; however, their performance falls short due to significant limitations in processing multi-modal data compared to commercial MLLMs. To tackle these problems, we first propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset, where the complex evaluation task is decoupled into simpler sub-tasks, effectively reducing the learning complexity. Based on this dataset, we design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6. Furthermore, to reliably and comprehensively assess prior works and our proposed model, we manually annotate a meta-evaluation benchmark that includes chain-of-thought explanations alongside quality scores for generated images. Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline, VIEScore, with over 4.6\% improvement in Spearman and Kendall correlations with human judgments.
- Published
- 2024
8. Deep Learning-Based Automatic Delineation of Liver Domes in kV Triggered Images for Online Breath-hold Reproducibility Verification of Liver Stereotactic Body Radiation Therapy
- Author
-
Weragoda, Sugandima, Xia, Ping, Stephans, Kevin, Woody, Neil, Martens, Michael, Brown, Robert, and Guo, Bingqi
- Subjects
Physics - Medical Physics ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Stereotactic Body Radiation Therapy (SBRT) can be a precise, minimally invasive treatment method for liver cancer and liver metastases. However, the effectiveness of SBRT relies on the accurate delivery of the dose to the tumor while sparing healthy tissue. Challenges persist in ensuring breath-hold reproducibility, with current methods often requiring manual verification of liver dome positions from kV-triggered images. To address this, we propose a proof-of-principle study of a deep learning-based pipeline to automatically delineate the liver dome from kV-planar images. From 24 patients who received SBRT for liver cancer or metastasis inside liver, 711 KV-triggered images acquired for online breath-hold verification were included in the current study. We developed a pipeline comprising a trained U-Net for automatic liver dome region segmentation from the triggered images followed by extraction of the liver dome via thresholding, edge detection, and morphological operations. The performance and generalizability of the pipeline was evaluated using 2-fold cross validation. The training of the U-Net model for liver region segmentation took under 30 minutes and the automatic delineation of a liver dome for any triggered image took less than one second. The RMSE and rate of detection for Fold1 with 366 images was (6.4 +/- 1.6) mm and 91.7%, respectively. For Fold2 with 345 images, the RMSE and rate of detection was (7.7 +/- 2.3) mm and 76.3% respectively.
- Published
- 2024
9. Automatic brain tumor segmentation in 2D intra-operative ultrasound images using MRI tumor annotations
- Author
-
Faanes, Mathilde, Helland, Ragnhild Holden, Solheim, Ole, and Reinertsen, Ingerid
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,I.4.6 ,J.3 - Abstract
Automatic segmentation of brain tumors in intra-operative ultrasound (iUS) images could facilitate localization of tumor tissue during resection surgery. The lack of large annotated datasets limits the current models performances. In this paper, we investigate the use of tumor annotations in pre-operative MRI images, which are more easily accessible than annotations in iUS images, for training of deep learning models for iUS brain tumor segmentation. We used 180 annotated pre-operative MRI images with corresponding unannotated iUS images, and 29 annotated iUS images. Image registration was performed to transfer the MRI annotations to the corresponding iUS images before training models with the nnU-Net framework. To validate the use of MRI labels, the models were compared to a model trained with only US annotated tumors, and a model with both US and MRI annotated tumors. In addition, the results were compared to annotations validated by an expert neurosurgeon on the same test set to measure inter-observer variability. The results showed similar performance for a model trained with only MRI annotated tumors, compared to a model trained with only US annotated tumors. The model trained using both modalities obtained slightly better results with an average Dice score of 0.62, where external expert annotations achieved a score of 0.67. The results also showed that the deep learning models were comparable to expert annotation for larger tumors (> 200 mm2), but perform clearly worse for smaller tumors (< 200 mm2). This shows that MRI tumor annotations can be used as a substitute for US tumor annotations to train a deep learning model for automatic brain tumor segmentation in intra-operative ultrasound images. Small tumors is a limitation for the current models and will be the focus of future work. The main models are available here: https://github.com/mathildefaanes/us_brain_tumor_segmentation., Comment: 19, 8 figures, submitted to International Journal of Computer Assisted Radiology and Surgery
- Published
- 2024
10. ComfyGI: Automatic Improvement of Image Generation Workflows
- Author
-
Sobania, Dominik, Briesch, Martin, and Rothlauf, Franz
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,Computer Science - Neural and Evolutionary Computing - Abstract
Automatic image generation is no longer just of interest to researchers, but also to practitioners. However, current models are sensitive to the settings used and automatic optimization methods often require human involvement. To bridge this gap, we introduce ComfyGI, a novel approach to automatically improve workflows for image generation without the need for human intervention driven by techniques from genetic improvement. This enables image generation with significantly higher quality in terms of the alignment with the given description and the perceived aesthetics. On the performance side, we find that overall, the images generated with an optimized workflow are about 50% better compared to the initial workflow in terms of the median ImageReward score. These already good results are even surpassed in our human evaluation, as the participants preferred the images improved by ComfyGI in around 90% of the cases.
- Published
- 2024
11. Cyborg Insect Factory: Automatic Assembly System to Build up Insect-computer Hybrid Robot Based on Vision-guided Robotic Arm Manipulation of Custom Bipolar Electrodes
- Author
-
Lin, Qifeng, Vuong, Nghia, Song, Kewei, Tran-Ngoc, Phuoc Thanh, Nonato, Greg Angelo Gonzales, and Sato, Hirotaka
- Subjects
Computer Science - Robotics - Abstract
The advancement of insect-computer hybrid robots holds significant promise for navigating complex terrains and enhancing robotics applications. This study introduced an automatic assembly method for insect-computer hybrid robots, which was accomplished by mounting backpack with precise implantation of custom-designed bipolar electrodes. We developed a stimulation protocol for the intersegmental membrane between pronotum and mesothorax of the Madagascar hissing cockroach, allowing for bipolar electrodes' automatic implantation using a robotic arm. The assembly process was integrated with a deep learning-based vision system to accurately identify the implantation site, and a dedicated structure to fix the insect (68 s for the whole assembly process). The automatically assembled hybrid robots demonstrated steering control (over 70 degrees for 0.4 s stimulation) and deceleration control (68.2% speed reduction for 0.4 s stimulation), matching the performance of manually assembled systems. Furthermore, a multi-agent system consisting of 4 hybrid robots successfully covered obstructed outdoor terrain (80.25% for 10 minutes 31 seconds), highlighting the feasibility of mass-producing these systems for practical applications. The proposed automatic assembly strategy reduced preparation time for the insect-computer hybrid robots while maintaining their precise control, laying a foundation for scalable production and deployment in real-world applications.
- Published
- 2024
12. Classification of Stable Surfaces with respect to Automatic Continuity
- Author
-
Bestvina, Mladen, Domat, George, and Rafi, Kasra
- Subjects
Mathematics - Geometric Topology ,Mathematics - Group Theory ,57S05, 57K20, 20F65, 22A05 - Abstract
We provide a complete classification of when the homeomorphism group of a stable surface, $\Sigma$, has the automatic continuity property: Any homomorphism from Homeo$(\Sigma)$ to a separable group is necessarily continuous. This result descends to a classification of when the mapping class group of $\Sigma$ has the automatic continuity property. Towards this classification, we provide a general framework for proving automatic continuity for groups of homeomorphisms. Applying this framework, we also show that the homeomorphism group of any stable second countable Stone space has the automatic continuity property. Under the presence of stability this answers two questions of Mann., Comment: 37 pages, 5 figures
- Published
- 2024
13. Improving the solver for the Balitsky-Kovchegov evolution equation with Automatic Differentiation
- Author
-
Cougoulic, Florian, Korcyl, Piotr, and Stebel, Tomasz
- Subjects
High Energy Physics - Phenomenology - Abstract
The Balitsky-Kovchegov (BK) evolution equation is an equation derived from perturbative Quantum Chromodynamics that allows one to calculate the scattering amplitude of a pair of quark and antiquark off a hadron target, called the dipole amplitude, as a function of the collision energy. The initial condition, being a non-perturbative object, usually has to be modeled separately. Typically, the model contains several tunable parameters that are determined by fitting to experimental data. In this contribution, we propose an implementation of the BK solver using differentiable programming. Automatic differentiation offers the possibility that the first and second derivatives of the amplitude with respect to the initial condition parameters are automatically calculated at all stages of the simulation. This fact should considerably facilitate and speed up the fitting step. Moreover, in the context of Transverse Momentum Dis- tributions (TMD), we demonstrate that automatic differentiation can be used to obtain the first and second derivatives of the amplitude with respect to the quark-antiquark separation. These derivatives can be used to relate various TMD functions to the dipole amplitude. Our C++ code for the solver, which is available in a public repository [1], includes the Balitsky one-loop running coupling prescription and the kinematic constraint. This version of the BK equation is widely used in the small-x evolution framework., Comment: 17 pages, 6 figures, source code is published in the repository
- Published
- 2024
14. SG-LRA: Self-Generating Automatic Scoliosis Cobb Angle Measurement with Low-Rank Approximation
- Author
-
Shao, Zhiwen, Yuan, Yichen, Ma, Lizhuang, Yeung, Dit-Yan, and Zhu, Xiaojia
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Automatic Cobb angle measurement from X-ray images is crucial for scoliosis screening and diagnosis. However, most existing regression-based methods and segmentation-based methods struggle with inaccurate spine representations or mask connectivity/fragmentation issues. Besides, landmark-based methods suffer from insufficient training data and annotations. To address these challenges, we propose a novel framework including Self-Generation pipeline and Low-Rank Approximation representation (SG-LRA) for automatic Cobb angle measurement. Specifically, we propose a parameterized spine contour representation based on LRA, which enables eigen-spine decomposition and spine contour reconstruction. We can directly obtain spine contour with only regressed LRA coefficients, which form a more accurate spine representation than rectangular boxes. Also, we combine LRA coefficient regression with anchor box classification to solve inaccurate predictions and mask connectivity issues. Moreover, we develop a data engine with automatic annotation and automatic selection in an iterative manner, which is trained on a private Spinal2023 dataset. With our data engine, we generate the largest scoliosis X-ray dataset named Spinal-AI2024 largely without privacy leaks. Extensive experiments on public AASCE2019, private Spinal2023, and generated Spinal-AI2024 datasets demonstrate that our method achieves state-of-the-art Cobb angle measurement performance. Our code and Spinal-AI2024 dataset are available at https://github.com/Ernestchenchen/SG-LRA and https://github.com/Ernestchenchen/Spinal-AI2024, respectively.
- Published
- 2024
15. Qurts: Automatic Quantum Uncomputation by Affine Types with Lifetime
- Author
-
Hirata, Kengo and Heunen, Chris
- Subjects
Computer Science - Programming Languages ,Quantum Physics ,D.3.1 ,F.3.1 ,F.3.2 - Abstract
Uncomputation is a feature in quantum programming that allows the programmer to discard a value without losing quantum information, and that allows the compiler to reuse resources. Whereas quantum information has to be treated linearly by the type system, automatic uncomputation enables the programmer to treat it affinely to some extent. Automatic uncomputation requires a substructural type system between linear and affine, a subtlety that has only been captured by existing languages in an ad hoc way. We extend the Rust type system to the quantum setting to give a uniform framework for automatic uncomputation called Qurts (pronounced quartz). Specifically, we parameterise types by lifetimes, permitting them to be affine during their lifetime, while being restricted to linear use outside their lifetime. We also provide two operational semantics: one based on classical simulation, and one that does not depend on any specific uncomputation strategy., Comment: 59 pages
- Published
- 2024
16. HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis
- Author
-
Huang, Haoxu, Deniz, Cem M., Cho, Kyunghyun, Chopra, Sumit, and Madaan, Divyam
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Chest X-ray imaging is a widely accessible and non-invasive diagnostic tool for detecting thoracic abnormalities. While numerous AI models assist radiologists in interpreting these images, most overlook patients' historical data. To bridge this gap, we introduce Temporal MIMIC dataset, which integrates five years of patient history, including radiographic scans and reports from MIMIC-CXR and MIMIC-IV, encompassing 12,221 patients and thirteen pathologies. Building on this, we present HIST-AID, a framework that enhances automatic diagnostic accuracy using historical reports. HIST-AID emulates the radiologist's comprehensive approach, leveraging historical data to improve diagnostic accuracy. Our experiments demonstrate significant improvements, with AUROC increasing by 6.56% and AUPRC by 9.51% compared to models that rely solely on radiographic scans. These gains were consistently observed across diverse demographic groups, including variations in gender, age, and racial categories. We show that while recent data boost performance, older data may reduce accuracy due to changes in patient conditions. Our work paves the potential of incorporating historical data for more reliable automatic diagnosis, providing critical support for clinical decision-making., Comment: In Proceedings of Machine Learning for Health
- Published
- 2024
17. Interactive Cycle Model -- The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses
- Author
-
Wang, Libo
- Subjects
Computer Science - Human-Computer Interaction - Abstract
This research proposes the interaction loop model "ASR-LLM-Smart Glasses", which model combines automatic speech recognition, large language model and smart glasses to facilitate seamless human-computer interaction. And the methodology of this research involves decomposing the interaction process into different stages and elements. Speech is captured and processed by ASR, then analyzed and interpreted by LLM. The results are then transmitted to smart glasses for display. The feedback loop is complete when the user interacts with the displayed data. Mathematical formulas are used to quantify the performance of the model that revolves around core evaluation points: accuracy, coherence, and latency during ASR speech-to-text conversion. The research results are provided theoretically to test and evaluate the feasibility and performance of the model. Although such human-computer interaction products have not yet appeared in the industry, the performance indicators of this model in enhancing user experience in fields that rely on human-computer interaction have also verified its utility as a technology to promote human-computer interaction. In addition, this research pioneered the idea of integrating cutting-edge technologies such as generative pre-trained Transformer models into unique interaction models, LLM provides raw value through powerful evaluation techniques and innovative use, which provides a new perspective to evaluate and enhanced human-computer interaction. Keywords: Automatic speech recognition, Large Language Model, Smart glasses, Interaction mechanism, Comment: OpenReview submitted. 11 pages of text and 1 figure
- Published
- 2024
18. CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design
- Author
-
Qiu, Ruidi, Zhang, Grace Li, Drechsler, Rolf, Schlichtmann, Ulf, and Li, Bing
- Subjects
Computer Science - Software Engineering - Abstract
Functional simulation is an essential step in digital hardware design. Recently, there has been a growing interest in leveraging Large Language Models (LLMs) for hardware testbench generation tasks. However, the inherent instability associated with LLMs often leads to functional errors in the generated testbenches. Previous methods do not incorporate automatic functional correction mechanisms without human intervention and still suffer from low success rates, especially for sequential tasks. To address this issue, we propose CorrectBench, an automatic testbench generation framework with functional self-validation and self-correction. Utilizing only the RTL specification in natural language, the proposed approach can validate the correctness of the generated testbenches with a success rate of 88.85%. Furthermore, the proposed LLM-based corrector employs bug information obtained during the self-validation process to perform functional self-correction on the generated testbenches. The comparative analysis demonstrates that our method achieves a pass ratio of 70.13% across all evaluated tasks, compared with the previous LLM-based testbench generation framework's 52.18% and a direct LLM-based generation method's 33.33%. Specifically in sequential circuits, our work's performance is 62.18% higher than previous work in sequential tasks and almost 5 times the pass ratio of the direct method. The codes and experimental results are open-sourced at the link: https://github.com/AutoBench/CorrectBench
- Published
- 2024
19. CRepair: CVAE-based Automatic Vulnerability Repair Technology
- Author
-
Liu, Penghui, Bi, Yingzhou, Huang, Jiangtao, Jiang, Xinxin, and Wang, Lianmei
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence - Abstract
Software vulnerabilities are flaws in computer software systems that pose significant threats to the integrity, security, and reliability of modern software and its application data. These vulnerabilities can lead to substantial economic losses across various industries. Manual vulnerability repair is not only time-consuming but also prone to errors. To address the challenges of vulnerability repair, researchers have proposed various solutions, with learning-based automatic vulnerability repair techniques gaining widespread attention. However, existing methods often focus on learning more vulnerability data to improve repair outcomes, while neglecting the diverse characteristics of vulnerable code, and suffer from imprecise vulnerability localization.To address these shortcomings, this paper proposes CRepair, a CVAE-based automatic vulnerability repair technology aimed at fixing security vulnerabilities in system code. We first preprocess the vulnerability data using a prompt-based method to serve as input to the model. Then, we apply causal inference techniques to map the vulnerability feature data to probability distributions. By employing multi-sample feature fusion, we capture diverse vulnerability feature information. Finally, conditional control is used to guide the model in repairing the vulnerabilities.Experimental results demonstrate that the proposed method significantly outperforms other benchmark models, achieving a perfect repair rate of 52%. The effectiveness of the approach is validated from multiple perspectives, advancing AI-driven code vulnerability repair and showing promising applications.
- Published
- 2024
20. Automatic Authoring of Physical and Perceptual/Affective Motion Effects for Virtual Reality
- Author
-
Lee, Jiwan and Choi, Seungmoon
- Subjects
Computer Science - Human-Computer Interaction - Abstract
This demo is about automatic authoring of various motion effects that are provided with audiovisual content to improve user experiences. Traditionally, motion effects have been used for simulators, e.g., flight simulators for pilots and astronauts, to present physically accurate vestibular feedback. At present, we have greatly wider use of motion effects for entertainment purposes, such as 4D rides in amusement parks and even shopping malls, 4D films in theaters, and relative new virtual reality games with head-mounted displays and personal motion platforms. However, the production of motion effects is done solely by manual authoring or coding, and this costly process prevents the faster and wider dissemination of 4D content. It is imperative to facilitate motion effect production by providing automatic synthesis algorithms. This demo video presents nine different automatic synthesis algorithms for motion effects and a recorded demonstration of each., Comment: Part of proceedings of 6th International Conference AsiaHaptics 2024
- Published
- 2024
21. A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model
- Author
-
Hu, Panwen, Xiao, Nan, Li, Feifei, Chen, Yongquan, and Huang, Rui
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this era of videos, automatic video editing techniques attract more and more attention from industry and academia since they can reduce workloads and lower the requirements for human editors. Existing automatic editing systems are mainly scene- or event-specific, e.g., soccer game broadcasting, yet the automatic systems for general editing, e.g., movie or vlog editing which covers various scenes and events, were rarely studied before, and converting the event-driven editing method to a general scene is nontrivial. In this paper, we propose a two-stage scheme for general editing. Firstly, unlike previous works that extract scene-specific features, we leverage the pre-trained Vision-Language Model (VLM) to extract the editing-relevant representations as editing context. Moreover, to close the gap between the professional-looking videos and the automatic productions generated with simple guidelines, we propose a Reinforcement Learning (RL)-based editing framework to formulate the editing problem and train the virtual editor to make better sequential editing decisions. Finally, we evaluate the proposed method on a more general editing task with a real movie dataset. Experimental results demonstrate the effectiveness and benefits of the proposed context representation and the learning ability of our RL-based editing framework.
- Published
- 2024
22. A multi-purpose automatic editing system based on lecture semantics for remote education
- Author
-
Hu, Panwen and Huang, Rui
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Multimedia - Abstract
Remote teaching has become popular recently due to its convenience and safety, especially under extreme circumstances like a pandemic. However, online students usually have a poor experience since the information acquired from the views provided by the broadcast platforms is limited. One potential solution is to show more camera views simultaneously, but it is technically challenging and distracting for the viewers. Therefore, an automatic multi-camera directing/editing system, which aims at selecting the most concerned view at each time instance to guide the attention of online students, is in urgent demand. However, existing systems mostly make simple assumptions and focus on tracking the position of the speaker instead of the real lecture semantics, and therefore have limited capacities to deliver optimal information flow. To this end, this paper proposes an automatic multi-purpose editing system based on the lecture semantics, which can both direct the multiple video streams for real-time broadcasting and edit the optimal video offline for review purposes. Our system directs the views by semantically analyzing the class events while following the professional directing rules, mimicking a human director to capture the regions of interest from the viewpoint of the onsite students. We conduct both qualitative and quantitative analyses to verify the effectiveness of the proposed system and its components.
- Published
- 2024
23. Automatic Structured Pruning for Efficient Architecture in Federated Learning
- Author
-
Nguyen, Thai Vu, Le, Long Bao, and Avila, Anderson
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In Federated Learning (FL), training is conducted on client devices, typically with limited computational resources and storage capacity. To address these constraints, we propose an automatic pruning scheme tailored for FL systems. Our solution improves computation efficiency on client devices, while minimizing communication costs. One of the challenges of tuning pruning hyper-parameters in FL systems is the restricted access to local data. Thus, we introduce an automatic pruning paradigm that dynamically determines pruning boundaries. Additionally, we utilized a structured pruning algorithm optimized for mobile devices that lack hardware support for sparse computations. Experimental results demonstrate the effectiveness of our approach, achieving accuracy comparable to existing methods. Our method notably reduces the number of parameters by 89% and FLOPS by 90%, with minimal impact on the accuracy of the FEMNIST and CelebFaces datasets. Furthermore, our pruning method decreases communication overhead by up to 5x and halves inference time when deployed on Android devices.
- Published
- 2024
24. Automatic programming via large language models with population self-evolution for dynamic job shop scheduling problem
- Author
-
Huang, Jin, Li, Xinyu, Gao, Liang, Liu, Qihao, and Teng, Yue
- Subjects
Computer Science - Neural and Evolutionary Computing - Abstract
Heuristic dispatching rules (HDRs) are widely regarded as effective methods for solving dynamic job shop scheduling problems (DJSSP) in real-world production environments. However, their performance is highly scenario-dependent, often requiring expert customization. To address this, genetic programming (GP) and gene expression programming (GEP) have been extensively used for automatic algorithm design. Nevertheless, these approaches often face challenges due to high randomness in the search process and limited generalization ability, hindering the application of trained dispatching rules to new scenarios or dynamic environments. Recently, the integration of large language models (LLMs) with evolutionary algorithms has opened new avenues for prompt engineering and automatic algorithm design. To enhance the capabilities of LLMs in automatic HDRs design, this paper proposes a novel population self-evolutionary (SeEvo) method, a general search framework inspired by the self-reflective design strategies of human experts. The SeEvo method accelerates the search process and enhances exploration capabilities. Experimental results show that the proposed SeEvo method outperforms GP, GEP, end-to-end deep reinforcement learning methods, and more than 10 common HDRs from the literature, particularly in unseen and dynamic scenarios.
- Published
- 2024
25. A Survey on Automatic Credibility Assessment of Textual Credibility Signals in the Era of Large Language Models
- Author
-
Srba, Ivan, Razuvayevskaya, Olesya, Leite, João A., Moro, Robert, Schlicht, Ipek Baris, Tonelli, Sara, García, Francisco Moreno, Lottmann, Santiago Barrio, Teyssou, Denis, Porcellini, Valentin, Scarton, Carolina, Bontcheva, Kalina, and Bielikova, Maria
- Subjects
Computer Science - Computation and Language - Abstract
In the current era of social media and generative AI, an ability to automatically assess the credibility of online social media content is of tremendous importance. Credibility assessment is fundamentally based on aggregating credibility signals, which refer to small units of information, such as content factuality, bias, or a presence of persuasion techniques, into an overall credibility score. Credibility signals provide a more granular, more easily explainable and widely utilizable information in contrast to currently predominant fake news detection, which utilizes various (mostly latent) features. A growing body of research on automatic credibility assessment and detection of credibility signals can be characterized as highly fragmented and lacking mutual interconnections. This issue is even more prominent due to a lack of an up-to-date overview of research works on automatic credibility assessment. In this survey, we provide such systematic and comprehensive literature review of 175 research papers while focusing on textual credibility signals and Natural Language Processing (NLP), which undergoes a significant advancement due to Large Language Models (LLMs). While positioning the NLP research into the context of other multidisciplinary research works, we tackle with approaches for credibility assessment as well as with 9 categories of credibility signals (we provide a thorough analysis for 3 of them, namely: 1) factuality, subjectivity and bias, 2) persuasion techniques and logical fallacies, and 3) claims and veracity). Following the description of the existing methods, datasets and tools, we identify future challenges and opportunities, while paying a specific attention to recent rapid development of generative AI.
- Published
- 2024
26. Towards Fully Automatic Distributed Lower Bounds
- Author
-
Balliu, Alkida, Brandt, Sebastian, Kuhn, Fabian, Olivetti, Dennis, and Saarhelo, Joonatan
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
In the past few years, a successful line of research has lead to lower bounds for several fundamental local graph problems in the distributed setting. These results were obtained via a technique called round elimination. On a high level, the round elimination technique can be seen as a recursive application of a function that takes as input a problem $\Pi$ and outputs a problem $\Pi'$ that is one round easier than $\Pi$. Applying this function recursively to concrete problems of interest can be highly nontrivial, which is one of the reasons that has made the technique difficult to approach. The contribution of our paper is threefold. Firstly, we develop a new and fully automatic method for finding lower bounds of $\Omega(\log_\Delta n)$ and $\Omega(\log_\Delta \log n)$ rounds for deterministic and randomized algorithms, respectively, via round elimination. Secondly, we show that this automatic method is indeed useful, by obtaining lower bounds for defective coloring problems. We show that the problem of coloring the nodes of a graph with $3$ colors and defect at most $(\Delta - 3)/2$ requires $\Omega(\log_\Delta n)$ rounds for deterministic algorithms and $\Omega(\log_\Delta \log n)$ rounds for randomized ones. We note that lower bounds for coloring problems are notoriously challenging to obtain, both in general, and via the round elimination technique. Both the first and (indirectly) the second contribution build on our third contribution -- a new and conceptually simple way to compute the one-round easier problem $\Pi'$ in the round elimination framework. This new procedure provides a clear and easy recipe for applying round elimination, thereby making a substantial step towards the greater goal of having a fully automatic procedure for obtaining lower bounds in the distributed setting.
- Published
- 2024
27. Automatic Extraction and Compensation of P-Bit Device Variations in Large Array Utilizing Boltzmann Machine Training
- Author
-
Zhang, Bolin, Liu, Yu, Gao, Tianqi, Yin, Jialiang, Guan, Zhenyu, Zhang, Deming, and Zeng, Lang
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Physics - Applied Physics - Abstract
Probabilistic Bit (P-Bit) device serves as the core hardware for implementing Ising computation. However, the severe intrinsic variations of stochastic P-Bit devices hinder the large-scale expansion of the P-Bit array, significantly limiting the practical usage of Ising computation. In this work, a behavioral model which attributes P-Bit variations to two parameters {\alpha} and {\Delta}V is proposed. Then the weight compensation method is introduced, which can mitigate {\alpha} and {\Delta}V of P-Bits device variations by rederiving the weight matrix, enabling them to compute as ideal identical PBits without the need for weights retraining. Accurately extracting the {\alpha} and {\Delta}V simultaneously from a large P-Bit array which is prerequisite for the weight compensation method is a crucial and challenging task. To solve this obstacle, we present the novel automatic variation extraction algorithm which can extract device variations of each P-Bit in a large array based on Boltzmann machine learning. In order for the accurate extraction of variations from an extendable P-Bit array, an Ising Hamiltonian based on 3D ferromagnetic model is constructed, achieving precise and scalable array variation extraction. The proposed Automatic Extraction and Compensation algorithm is utilized to solve both 16-city traveling salesman problem(TSP) and 21-bit integer factorization on a large P-Bit array with variation, demonstrating its accuracy, transferability, and scalability., Comment: 15 pages, 17 figures
- Published
- 2024
28. ScreenWriter: Automatic Screenplay Generation and Movie Summarisation
- Author
-
Mahon, Louis and Lapata, Mirella
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Multimedia - Abstract
The proliferation of creative video content has driven demand for textual descriptions or summaries that allow users to recall key plot points or get an overview without watching. The volume of movie content and speed of turnover motivates automatic summarisation, which is nevertheless challenging, requiring identifying character intentions and very long-range temporal dependencies. The few existing methods attempting this task rely heavily on textual screenplays as input, greatly limiting their applicability. In this work, we propose the task of automatic screenplay generation, and a method, ScreenWriter, that operates only on video and produces output which includes dialogue, speaker names, scene breaks, and visual descriptions. ScreenWriter introduces a novel algorithm to segment the video into scenes based on the sequence of visual vectors, and a novel method for the challenging problem of determining character names, based on a database of actors' faces. We further demonstrate how these automatic screenplays can be used to generate plot synopses with a hierarchical summarisation method based on scene breaks. We test the quality of the final summaries on the recent MovieSum dataset, which we augment with videos, and show that they are superior to a number of comparison models which assume access to goldstandard screenplays.
- Published
- 2024
29. Automatic Navigation and Voice Cloning Technology Deployment on a Humanoid Robot
- Author
-
Han, Dongkun and Shao, Boyuan
- Subjects
Computer Science - Robotics - Abstract
Mobile robots have shown immense potential and are expected to be widely used in the service industry. The importance of automatic navigation and voice cloning cannot be overstated as they enable functional robots to provide high-quality services. The objective of this work is to develop a control algorithm for the automatic navigation of a humanoid mobile robot called Cruzr, which is a service robot manufactured by Ubtech. Initially, a virtual environment is constructed in the simulation software Gazebo using Simultaneous Localization And Mapping (SLAM), and global path planning is carried out by means of local path tracking. The two-wheel differential chassis kinematics model is employed to ensure autonomous dynamic obstacle avoidance for the robot chassis. Furthermore, the mapping and trajectory generation algorithms developed in the simulation environment are successfully implemented on the real robot Cruzr. The performance of automatic navigation is compared between the Dynamic Window Approach (DWA) and Model Predictive Control (MPC) algorithms. Additionally, a mobile application for voice cloning is created based on a Hidden Markov Model, and the proposed Chatbot is also tested and deployed on Cruzr., Comment: 7 pages, 6 figures
- Published
- 2024
30. FSOS-AMC: Few-Shot Open-Set Learning for Automatic Modulation Classification
- Author
-
Zhang, Hao, Zhou, Fuhui, Wu, Qihui, and Yuen, Chau
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Automatic modulation classification (AMC) is essential for the advancement and efficiency of future wireless communication networks. Deep learning (DL)-based AMC frameworks have garnered extensive attention for their impressive classification performance. However, existing DL-based AMC frameworks rely on two assumptions, large-scale training data and the same class pool between the training and testing data, which are not suitable for \emph{few-shot and open-set} scenarios. To address this issue, a novel few-shot open-set automatic modulation classification (FSOS-AMC) framework is proposed by exploiting a multi-scale attention network, meta-prototype training, and a modular open-set classifier. The multi-scale attention network is used to extract the features from the input signal, the meta-prototype training is adopted to train the feature extractor and the modular open-set classifier can be utilized to classify the testing data into one of the known modulations or potential unknown modulations. Extensive simulation results demonstrate that the proposed FSOS-AMC framework can achieve higher classification accuracy than the state-of-the-art methods for known modulations and unknown modulations in terms of accuracy and area under the receiver operating characteristic curve (AUROC). Moreover, the performance of the proposed FSOS-AMC framework under low signal-to-noise ratio (SNR) conditions is much better than the compared schemes., Comment: accepted by 16th International Conference on Wireless Communications and Signal Processing (WCSP 2024)
- Published
- 2024
31. EasyHeC++: Fully Automatic Hand-Eye Calibration with Pretrained Image Models
- Author
-
Hong, Zhengdong, Zheng, Kangfu, and Chen, Linghao
- Subjects
Computer Science - Robotics - Abstract
Hand-eye calibration plays a fundamental role in robotics by directly influencing the efficiency of critical operations such as manipulation and grasping. In this work, we present a novel framework, EasyHeC++, designed for fully automatic hand-eye calibration. In contrast to previous methods that necessitate manual calibration, specialized markers, or the training of arm-specific neural networks, our approach is the first system that enables accurate calibration of any robot arm in a marker-free, training-free, and fully automatic manner. Our approach employs a two-step process. First, we initialize the camera pose using a sampling or feature-matching-based method with the aid of pretrained image models. Subsequently, we perform pose optimization through differentiable rendering. Extensive experiments demonstrate the system's superior accuracy in both synthetic and real-world datasets across various robot arms and camera settings. Project page: https://ootts.github.io/easyhec_plus., Comment: Accepted by IROS 2024
- Published
- 2024
32. AMPO: Automatic Multi-Branched Prompt Optimization
- Author
-
Yang, Sheng, Wu, Yurong, Gao, Yan, Zhou, Zineng, Zhu, Bin Benjamin, Sun, Xiaodi, Lou, Jian-Guang, Ding, Zhiming, Hu, Anbang, Fang, Yuan, Li, Yunsong, Chen, Junyan, and Yang, Linjun
- Subjects
Computer Science - Computation and Language - Abstract
Prompt engineering is very important to enhance the performance of large language models (LLMs). When dealing with complex issues, prompt engineers tend to distill multiple patterns from examples and inject relevant solutions to optimize the prompts, achieving satisfying results. However, existing automatic prompt optimization techniques are only limited to producing single flow instructions, struggling with handling diverse patterns. In this paper, we present AMPO, an automatic prompt optimization method that can iteratively develop a multi-branched prompt using failure cases as feedback. Our goal is to explore a novel way of structuring prompts with multi-branches to better handle multiple patterns in complex tasks, for which we introduce three modules: Pattern Recognition, Branch Adjustment, and Branch Pruning. In experiments across five tasks, AMPO consistently achieves the best results. Additionally, our approach demonstrates significant optimization efficiency due to our adoption of a minimal search strategy., Comment: 13 pages, 7 figures, 6 tables
- Published
- 2024
33. Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation
- Author
-
Agrawal, Sweta, de Souza, José G. C., Rei, Ricardo, Farinhas, António, Faria, Gonçalo, Fernandes, Patrick, Guerreiro, Nuno M, and Martins, Andre
- Subjects
Computer Science - Computation and Language - Abstract
Alignment with human preferences is an important step in developing accurate and safe large language models. This is no exception in machine translation (MT), where better handling of language nuances and context-specific variations leads to improved quality. However, preference data based on human feedback can be very expensive to obtain and curate at a large scale. Automatic metrics, on the other hand, can induce preferences, but they might not match human expectations perfectly. In this paper, we propose an approach that leverages the best of both worlds. We first collect sentence-level quality assessments from professional linguists on translations generated by multiple high-quality MT systems and evaluate the ability of current automatic metrics to recover these preferences. We then use this analysis to curate a new dataset, MT-Pref (metric induced translation preference) dataset, which comprises 18k instances covering 18 language directions, using texts sourced from multiple domains post-2022. We show that aligning TOWER models on MT-Pref significantly improves translation quality on WMT23 and FLORES benchmarks., Comment: Accepted at EMNLP Main 2024
- Published
- 2024
34. Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
- Author
-
Zheng, Xiaosen, Pang, Tianyu, Du, Chao, Liu, Qian, Jiang, Jing, and Lin, Min
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security ,Computer Science - Machine Learning - Abstract
Automatic LLM benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench, have become popular for evaluating language models due to their cost-effectiveness and scalability compared to human evaluation. Achieving high win rates on these benchmarks can significantly boost the promotional impact of newly released language models. This promotional benefit may motivate tricks, such as manipulating model output length or style to game win rates, even though several mechanisms have been developed to control length and disentangle style to reduce gameability. Nonetheless, we show that even a "null model" that always outputs a constant response (irrelevant to input instructions) can cheat automatic benchmarks and achieve top-ranked win rates: an 86.5% LC win rate on AlpacaEval 2.0; an 83.0 score on Arena-Hard-Auto; and a 9.55 score on MT-Bench. Moreover, the crafted cheating outputs are transferable because we assume that the instructions of these benchmarks (e.g., 805 samples of AlpacaEval 2.0) are private and cannot be accessed. While our experiments are primarily proof-of-concept, an adversary could use LLMs to generate more imperceptible cheating responses, unethically benefiting from high win rates and promotional impact. Our findings call for the development of anti-cheating mechanisms for reliable automatic benchmarks. The code is available at https://github.com/sail-sg/Cheating-LLM-Benchmarks.
- Published
- 2024
35. Automatic Instantiation of Assurance Cases from Patterns Using Large Language Models
- Author
-
Odu, Oluwafemi, Belle, Alvine B., Wang, Song, Kpodjedo, Segla, Lethbridge, Timothy C., and Hemmati, Hadi
- Subjects
Computer Science - Software Engineering - Abstract
An assurance case is a structured set of arguments supported by evidence, demonstrating that a system's non-functional requirements (e.g., safety, security, reliability) have been correctly implemented. Assurance case patterns serve as templates derived from previous successful assurance cases, aimed at facilitating the creation of new assurance cases. Despite the use of these patterns to generate assurance cases, their instantiation remains a largely manual and error-prone process that heavily relies on domain expertise. Thus, exploring techniques to support their automatic instantiation becomes crucial. This study aims to investigate the potential of Large Language Models (LLMs) in automating the generation of assurance cases that comply with specific patterns. Specifically, we formalize assurance case patterns using predicate-based rules and then utilize LLMs, i.e., GPT-4o and GPT-4 Turbo, to automatically instantiate assurance cases from these formalized patterns. Our findings suggest that LLMs can generate assurance cases that comply with the given patterns. However, this study also highlights that LLMs may struggle with understanding some nuances related to pattern-specific relationships. While LLMs exhibit potential in the automatic generation of assurance cases, their capabilities still fall short compared to human experts. Therefore, a semi-automatic approach to instantiating assurance cases may be more practical at this time.
- Published
- 2024
36. CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation
- Author
-
He, Han, Liu, Qianchu, Xu, Lei, Shivade, Chaitanya, Zhang, Yi, Srinivasan, Sundararajan, and Kirchhoff, Katrin
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Existing automatic prompt engineering methods are typically designed for discriminative tasks, where new task prompts are iteratively refined with limited feedback from a single metric reflecting a single aspect. However, these approaches are suboptimal for generative tasks, which require more nuanced guidance beyond a single numeric metric to improve the prompt and optimize multiple aspects of the generated text. To address these challenges, we propose a novel multi-aspect Critique-Suggestion-guided automatic Prompt Optimization (CriSPO) approach. CriSPO introduces a critique-suggestion module as its core component. This module spontaneously discovers aspects, and compares generated and reference texts across these aspects, providing specific suggestions for prompt modification. These clear critiques and actionable suggestions guide a receptive optimizer module to make more substantial changes, exploring a broader and more effective search space. To further improve CriSPO with multi-metric optimization, we introduce an Automatic Suffix Tuning (AST) extension to enhance the performance of task prompts across multiple metrics. We evaluate CriSPO on 4 state-of-the-art LLMs across 4 summarization and 5 QA datasets. Extensive experiments show 3-4\% ROUGE score improvement on summarization and substantial improvement of various metrics on QA.
- Published
- 2024
37. Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems
- Author
-
Iakovenko, Olga, Bondarenko, Ivan, Borovikova, Mariya, and Vodolazsky, Daniil
- Subjects
Computer Science - Computation and Language - Abstract
This paper presents an overview of rule-based system for automatic accentuation and phonemic transcription of Russian texts for speech connected tasks, such as Automatic Speech Recognition (ASR). Two parts of the developed system, accentuation and transcription, use different approaches to achieve correct phonemic representations of input phrases. Accentuation is based on "Grammatical dictionary of the Russian language" of A.A. Zaliznyak and wiktionary corpus. To distinguish homographs, the accentuation system also utilises morphological information of the sentences based on Recurrent Neural Networks (RNN). Transcription algorithms apply the rules presented in the monograph of B.M. Lobanov and L.I. Tsirulnik "Computer Synthesis and Voice Cloning". The rules described in the present paper are implemented in an open-source module, which can be of use to any scientific study connected to ASR or Speech To Text (STT) tasks. Automatically marked up text annotations of the Russian Voxforge database were used as training data for an acoustic model in CMU Sphinx. The resulting acoustic model was evaluated on cross-validation, mean Word Accuracy being 71.2%. The developed toolkit is written in the Python language and is accessible on GitHub for any researcher interested., Comment: Speech and Computer 20th International Conference, SPECOM 2018, Leipzig, Germany, Proceedings 20
- Published
- 2024
- Full Text
- View/download PDF
38. Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
- Author
-
Elangovan, Aparna, Ko, Jongwoo, Xu, Lei, Elyasi, Mahsa, Liu, Ling, Bodapati, Sravan, and Roth, Dan
- Subjects
Computer Science - Human-Computer Interaction ,Computer Science - Artificial Intelligence - Abstract
The effectiveness of automatic evaluation of generative models is typically measured by comparing it to human evaluation using correlation metrics. However, metrics like Krippendorff's $\alpha$ and Randolph's $\kappa$, originally designed to measure the reliability of human labeling, make assumptions about human behavior and the labeling process. In this paper, we show how *relying on a single aggregate correlation score* can obscure fundamental differences between human behavior and automatic evaluation methods, including LLM-as-a-Judge. Specifically, we demonstrate that when the proportion of samples with variation or uncertainty in human labels (gathered during human evaluation) is relatively high, machine labels (generated by automatic evaluation methods) may superficially appear to have similar or better correlation with the human majority label compared to human-to-human (HH) correlation. This can create the illusion that automatic evaluation approximates the human majority label. However, as the proportion of samples with consistent human labels increases, the correlation between machine and human labels fall well below HH correlation. Based on these findings, we first propose stratifying results by human label uncertainty to provide a more robust analysis of automatic evaluation performance. Second, recognizing that uncertainty and variation are inherent in perception-based human evaluations, such as those involving attitudes or preferences, we introduce a new metric - *binned Jensen-Shannon Divergence for perception* for such scenarios to better measure the effectiveness of automatic evaluations. Third, we present visualization techniques -- *perception charts*, to compare the strengths and limitations of automatic evaluation and to contextualize correlation measures appropriately
- Published
- 2024
39. HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation
- Author
-
Hu, Yuntong, Li, Zhuofeng, Zhang, Zheng, Ling, Chen, Kanjiani, Raasikh, Zhao, Boxin, and Zhao, Liang
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
In this work, we present HiReview, a novel framework for hierarchical taxonomy-driven automatic literature review generation. With the exponential growth of academic documents, manual literature reviews have become increasingly labor-intensive and time-consuming, while traditional summarization models struggle to generate comprehensive document reviews effectively. Large language models (LLMs), with their powerful text processing capabilities, offer a potential solution; however, research on incorporating LLMs for automatic document generation remains limited. To address key challenges in large-scale automatic literature review generation (LRG), we propose a two-stage taxonomy-then-generation approach that combines graph-based hierarchical clustering with retrieval-augmented LLMs. First, we retrieve the most relevant sub-community within the citation network, then generate a hierarchical taxonomy tree by clustering papers based on both textual content and citation relationships. In the second stage, an LLM generates coherent and contextually accurate summaries for clusters or topics at each hierarchical level, ensuring comprehensive coverage and logical organization of the literature. Extensive experiments demonstrate that HiReview significantly outperforms state-of-the-art methods, achieving superior hierarchical organization, content relevance, and factual accuracy in automatic literature review generation tasks.
- Published
- 2024
40. Automatic deductive coding in discourse analysis: an application of large language models in learning analytics
- Author
-
Zhang, Lishan, Wu, Han, Huang, Xiaoshan, Duan, Tengfei, and Du, Hanxiang
- Subjects
Computer Science - Computation and Language ,Computer Science - Human-Computer Interaction - Abstract
Deductive coding is a common discourse analysis method widely used by learning science and learning analytics researchers for understanding teaching and learning interactions. It often requires researchers to manually label all discourses to be analyzed according to a theoretically guided coding scheme, which is time-consuming and labor-intensive. The emergence of large language models such as GPT has opened a new avenue for automatic deductive coding to overcome the limitations of traditional deductive coding. To evaluate the usefulness of large language models in automatic deductive coding, we employed three different classification methods driven by different artificial intelligence technologies, including the traditional text classification method with text feature engineering, BERT-like pretrained language model and GPT-like pretrained large language model (LLM). We applied these methods to two different datasets and explored the potential of GPT and prompt engineering in automatic deductive coding. By analyzing and comparing the accuracy and Kappa values of these three classification methods, we found that GPT with prompt engineering outperformed the other two methods on both datasets with limited number of training samples. By providing detailed prompt structures, the reported work demonstrated how large language models can be used in the implementation of automatic deductive coding., Comment: 20 pages
- Published
- 2024
41. A C++ implementation of the discrete adjoint sensitivity analysis method for explicit adaptive Runge-Kutta methods enabled by automatic adjoint differentiation and SIMD vectorization
- Author
-
Martins, Rui and Lakshtanov, Evgeny
- Subjects
Mathematics - Numerical Analysis ,Computer Science - Mathematical Software ,34-04 (Primary) 65L06, 65K10, 90C31 (Secondary) ,G.1 ,G.4 - Abstract
A C++ library for sensitivity analysis of optimisation problems involving ordinary differential equations (ODEs) enabled by automatic differentiation (AD) and SIMD (Single Instruction, Multiple data) vectorization is presented. The discrete adjoint sensitivity analysis method is implemented for adaptive explicit Runge-Kutta (ERK) methods. Automatic adjoint differentiation (AAD) is employed for efficient evaluations of products of vectors and the Jacobian matrix of the right hand side of the ODE system. This approach avoids the low-level drawbacks of the black box approach of employing AAD on the entire ODE solver and opens the possibility to leverage parallelization. SIMD vectorization is employed to compute the vector-Jacobian products concurrently. We study the performance of other methods and implementations of sensitivity analysis and we find that our algorithm presents a small advantage compared to equivalent existing software., Comment: 30 pages, 15 figures, preprint
- Published
- 2024
42. DeepAerialMapper: Deep Learning-based Semi-automatic HD Map Creation for Highly Automated Vehicles
- Author
-
Krajewski, Robert and Kim, Huijo
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
High-definition maps (HD maps) play a crucial role in the development, safety validation, and operation of highly automated vehicles. Efficiently collecting up-to-date sensor data from road segments and obtaining accurate maps from these are key challenges in HD map creation. Commonly used methods, such as dedicated measurement vehicles and crowd-sourced data from series vehicles, often face limitations in commercial viability. Although high-resolution aerial imagery offers a cost-effective or even free alternative, it requires significant manual effort and time to transform it into maps. In this paper, we introduce a semi-automatic method for creating HD maps from high-resolution aerial imagery. Our method involves training neural networks to semantically segment aerial images into classes relevant to HD maps. The resulting segmentation is then hierarchically post-processed to generate a prototypical HD map of visible road elements. Exporting the map to the Lanelet2 format allows easy extension for different use cases using standard tools. To train and evaluate our method, we created a dataset using public aerial imagery of urban road segments in Germany. In our evaluation, we achieved an automatic mapping of lane markings and road borders with a recall and precision exceeding 96%. The source code for our method is publicly available at https://github.com/RobertKrajewski/DeepAerialMapper., Comment: For source code, see https://github.com/RobertKrajewski/DeepAerialMapper
- Published
- 2024
43. BioFace3D: A fully automatic pipeline for facial biomarkers extraction of 3D face reconstructions segmented from MRI
- Author
-
Heredia-Lidón, Álvaro, Echeverry-Quiceno, Luis M., González, Alejandro, Hostalet, Noemí, Pomarol-Clotet, Edith, Fortea, Juan, Fatjó-Vilas, Mar, Martínez-Abadías, Neus, and Sevillano, Xavier
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Quantitative Biology - Quantitative Methods - Abstract
Facial dysmorphologies have emerged as potential critical indicators in the diagnosis and prognosis of genetic, psychotic and rare disorders. While in certain conditions these dysmorphologies are severe, in other cases may be subtle and not perceivable to the human eye, requiring precise quantitative tools for their identification. Manual coding of facial dysmorphologies is a burdensome task and is subject to inter- and intra-observer variability. To overcome this gap, we present BioFace3D as a fully automatic tool for the calculation of facial biomarkers using facial models reconstructed from magnetic resonance images. The tool is divided into three automatic modules for the extraction of 3D facial models from magnetic resonance images, the registration of homologous 3D landmarks encoding facial morphology, and the calculation of facial biomarkers from anatomical landmarks coordinates using geometric morphometrics techniques.
- Published
- 2024
44. Polyp-SES: Automatic Polyp Segmentation with Self-Enriched Semantic Model
- Author
-
Nguyen, Quang Vinh, Vo, Thanh Hoang Son, Kang, Sae-Ryung, and Kim, Soo-Hyung
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,I.5.4 ,I.2.1 ,I.4.6 ,J.3 - Abstract
Automatic polyp segmentation is crucial for effective diagnosis and treatment in colonoscopy images. Traditional methods encounter significant challenges in accurately delineating polyps due to limitations in feature representation and the handling of variability in polyp appearance. Deep learning techniques, including CNN and Transformer-based methods, have been explored to improve polyp segmentation accuracy. However, existing approaches often neglect additional semantics, restricting their ability to acquire adequate contexts of polyps in colonoscopy images. In this paper, we propose an innovative method named ``Automatic Polyp Segmentation with Self-Enriched Semantic Model'' to address these limitations. First, we extract a sequence of features from an input image and decode high-level features to generate an initial segmentation mask. Using the proposed self-enriched semantic module, we query potential semantics and augment deep features with additional semantics, thereby aiding the model in understanding context more effectively. Extensive experiments show superior segmentation performance of the proposed method against state-of-the-art polyp segmentation baselines across five polyp benchmarks in both superior learning and generalization capabilities., Comment: Asian Conference on Computer Vision 2024
- Published
- 2024
45. DuMapper: Towards Automatic Verification of Large-Scale POIs with Street Views at Baidu Maps
- Author
-
Fan, Miao, Huang, Jizhou, and Wang, Haifeng
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Information Retrieval - Abstract
With the increased popularity of mobile devices, Web mapping services have become an indispensable tool in our daily lives. To provide user-satisfied services, such as location searches, the point of interest (POI) database is the fundamental infrastructure, as it archives multimodal information on billions of geographic locations closely related to people's lives, such as a shop or a bank. Therefore, verifying the correctness of a large-scale POI database is vital. To achieve this goal, many industrial companies adopt volunteered geographic information (VGI) platforms that enable thousands of crowdworkers and expert mappers to verify POIs seamlessly; but to do so, they have to spend millions of dollars every year. To save the tremendous labor costs, we devised DuMapper, an automatic system for large-scale POI verification with the multimodal street-view data at Baidu Maps. DuMapper takes the signboard image and the coordinates of a real-world place as input to generate a low-dimensional vector, which can be leveraged by ANN algorithms to conduct a more accurate search through billions of archived POIs in the database for verification within milliseconds. It can significantly increase the throughput of POI verification by $50$ times. DuMapper has already been deployed in production since \DuMPOnline, which dramatically improves the productivity and efficiency of POI verification at Baidu Maps. As of December 31, 2021, it has enacted over $405$ million iterations of POI verification within a 3.5-year period, representing an approximate workload of $800$ high-performance expert mappers.
- Published
- 2024
46. Disentangled-Transformer: An Explainable End-to-End Automatic Speech Recognition Model with Speech Content-Context Separation
- Author
-
Wang, Pu and Van hamme, Hugo
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
End-to-end transformer-based automatic speech recognition (ASR) systems often capture multiple speech traits in their learned representations that are highly entangled, leading to a lack of interpretability. In this study, we propose the explainable Disentangled-Transformer, which disentangles the internal representations into sub-embeddings with explicit content and speaker traits based on varying temporal resolutions. Experimental results show that the proposed Disentangled-Transformer produces a clear speaker identity, separated from the speech content, for speaker diarization while improving ASR performance., Comment: Accepted by the 6th IEEE International Conference on Image Processing Applications and Systems
- Published
- 2024
47. Automatic Skull Reconstruction by Deep Learnable Symmetry Enforcement
- Author
-
Wodzinski, Marek, Daniol, Mateusz, and Hemmerling, Daria
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Every year, thousands of people suffer from skull damage and require personalized implants to fill the cranial cavity. Unfortunately, the waiting time for reconstruction surgery can extend to several weeks or even months, especially in less developed countries. One factor contributing to the extended waiting period is the intricate process of personalized implant modeling. Currently, the preparation of these implants by experienced biomechanical experts is both costly and time-consuming. Recent advances in artificial intelligence, especially in deep learning, offer promising potential for automating the process. However, deep learning-based cranial reconstruction faces several challenges: (i) the limited size of training datasets, (ii) the high resolution of the volumetric data, and (iii) significant data heterogeneity. In this work, we propose a novel approach to address these challenges by enhancing the reconstruction through learnable symmetry enforcement. We demonstrate that it is possible to train a neural network dedicated to calculating skull symmetry, which can be utilized either as an additional objective function during training or as a post-reconstruction objective during the refinement step. We quantitatively evaluate the proposed method using open SkullBreak and SkullFix datasets, and qualitatively using real clinical cases. The results indicate that the symmetry-preserving reconstruction network achieves considerably better outcomes compared to the baseline (0.94/0.94/1.31 vs 0.84/0.76/2.43 in terms of DSC, bDSC, and HD95). Moreover, the results are comparable to the best-performing methods while requiring significantly fewer computational resources (< 500 vs > 100,000 GPU hours). The proposed method is a considerable contribution to the field of applied artificial intelligence in medicine and is a step toward automatic cranial defect reconstruction in clinical practice.
- Published
- 2024
48. CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics
- Author
-
Li, Yikun, Zhang, Ting, Widyasari, Ratnadira, Tun, Yan Naing, Nguyen, Huu Hung, Bui, Tan, Irsan, Ivana Clairine, Cheng, Yiran, Lan, Xiang, Ang, Han Wei, Liauw, Frank, Weyssow, Martin, Kang, Hong Jin, Ouh, Eng Lieh, Shar, Lwin Khin, and Lo, David
- Subjects
Computer Science - Software Engineering ,Computer Science - Cryptography and Security - Abstract
Accurate identification of software vulnerabilities is crucial for system integrity. Vulnerability datasets, often derived from the National Vulnerability Database (NVD) or directly from GitHub, are essential for training machine learning models to detect these security flaws. However, these datasets frequently suffer from significant noise, typically 40% to 75%, due primarily to the automatic and indiscriminate labeling of all changes in vulnerability-fixing commits (VFCs) as vulnerability-related. This misclassification occurs because not all changes in a commit aimed at fixing vulnerabilities pertain to security threats; many are routine updates like bug fixes or test improvements. This paper introduces the first methodology that uses the Large Language Model (LLM) with a heuristic enhancement to automatically identify vulnerability-fixing changes from VFCs, achieving an F1-score of 0.82. VulSifter was applied to a large-scale study, where we conducted a crawl of 127,063 repositories on GitHub, resulting in the acquisition of 5,352,105 commits. VulSifter involves utilizing an LLM to comprehend code semantics and contextual information, while applying heuristics to filter out unrelated changes. We then developed CleanVul, a high-quality dataset comprising 11,632 functions using our LLM heuristic enhancement approach, demonstrating Correctness (90.6%) comparable to established datasets such as SVEN and PrimeVul. To evaluate the CleanVul dataset, we conducted experiments focusing on fine-tuning various LLMs on CleanVul and other high-quality datasets. Evaluation results reveal that LLMs fine-tuned on CleanVul not only exhibit enhanced accuracy but also superior generalization capabilities compared to those trained on uncleaned datasets. Specifically, models trained on CleanVul and tested on PrimeVul achieve accuracy higher than those trained and tested exclusively on PrimeVul.
- Published
- 2024
49. Fully Automatic Deep Learning Pipeline for Whole Slide Image Quality Assessment
- Author
-
Jabar, Falah, Busund, Lill-Tove Rasmussen, Ricciuti, Biagio, Tafavvoghi, Masoud, Pøhl, Mette, Andersen, Sigve, Donnem, Tom, Kwiatkowski, David J., and Rakaee, Mehrdad
- Subjects
Computer Science - Multimedia - Abstract
In recent years, the use of deep learning (DL) methods, including convolutional neural networks (CNNs) and vision transformers (ViTs), has significantly advanced computational pathology, enhancing both diagnostic accuracy and efficiency. Hematoxylin and Eosin (H&E) Whole Slide Images (WSI) plays a crucial role by providing detailed tissue samples for the analysis and training of DL models. However, WSIs often contain regions with artifacts such as tissue folds, blurring, as well as non-tissue regions (background), which can negatively impact DL model performance. These artifacts are diagnostically irrelevant and can lead to inaccurate results. This paper proposes a fully automatic supervised DL pipeline for WSI Quality Assessment (WSI-QA) that uses a fused model combining CNNs and ViTs to detect and exclude WSI regions with artifacts, ensuring that only qualified WSI regions are used to build DL-based computational pathology applications. The proposed pipeline employs a pixel-based segmentation model to classify WSI regions as either qualified or non-qualified based on the presence of artifacts. The proposed model was trained on a large and diverse dataset and validated with internal and external data from various human organs, scanners, and H&E staining procedures. Quantitative and qualitative evaluations demonstrate the superiority of the proposed model, which outperforms state-of-the-art methods in WSI artifact detection. The proposed model consistently achieved over 95% accuracy, precision, recall, and F1 score across all artifact types. Furthermore, the WSI-QA pipeline shows strong generalization across different tissue types and scanning conditions., Comment: submitted to IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, November 25, 2024
- Published
- 2024
50. LRSAA: Large-scale Remote Sensing Image Target Recognition and Automatic Annotation
- Author
-
Dong, Wuzheng and Zhu, Yujuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper presents a method for object recognition and automatic labeling in large-area remote sensing images called LRSAA. The method integrates YOLOv11 and MobileNetV3-SSD object detection algorithms through ensemble learning to enhance model performance. Furthermore, it employs Poisson disk sampling segmentation techniques and the EIOU metric to optimize the training and inference processes of segmented images, followed by the integration of results. This approach not only reduces the demand for computational resources but also achieves a good balance between accuracy and speed. The source code for this project has been made publicly available on https://github.com/anaerovane/LRSAA., Comment: arXiv admin note: text overlap with arXiv:2411.07802
- Published
- 2024
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.