8,312,602 results for "Automatic"
Search Results
152. AdaS&S: a One-Shot Supernet Approach for Automatic Embedding Size Search in Deep Recommender System
- Author
- Wei, He, Yang, Yuekui, Zhang, Yang, Wu, Haiyang, Liu, Meixi, and Ma, Shaoping
- Subjects
- Computer Science - Information Retrieval, Computer Science - Machine Learning
- Abstract
Deep Learning Recommendation Models (DLRMs) utilize the embedding layer to represent various categorical features. Traditional DLRMs adopt a unified embedding size for all features, leading to suboptimal performance and redundant parameters. Thus, many Automatic Embedding size Search (AES) works focus on obtaining mixed embedding sizes with strong model performance. However, previous AES works can hardly address several challenges together: (1) the search results of embedding sizes are unstable; (2) the recommendation effect with AES results is unsatisfactory; (3) the memory cost of embeddings is uncontrollable. To address these challenges, we propose a novel one-shot AES framework called AdaS&S, in which a supernet encompassing various candidate embeddings is built and AES is performed as searching network architectures within it. Our framework contains two main stages: in the first stage, we decouple training parameters from searching embedding sizes, and propose the Adaptive Sampling method to yield a well-trained supernet, which further helps to produce stable AES results. In the second stage, to obtain embedding sizes that benefit the model effect, we design a reinforcement learning search process which utilizes the supernet trained previously. Meanwhile, to adapt the search to a specific resource constraint, we introduce a resource competition penalty to balance model effectiveness and the memory cost of embeddings. We conduct extensive experiments on public datasets to show the superiority of AdaS&S. Our method improves AUC by about 0.3% while saving about 20% of model parameters. Empirical analysis also shows that the stability of search results in AdaS&S significantly exceeds that of other methods.
- Published
- 2024
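The parameter saving that motivates mixed embedding sizes (entry 152) comes down to simple table arithmetic. A hypothetical sketch, with invented feature cardinalities and sizes rather than an actual AdaS&S search result:

```python
# Illustrative only: parameter cost of a unified embedding size versus a
# mixed-size assignment for a few categorical features. The cardinalities
# and the mixed sizes below are invented, not taken from the paper.

def embedding_params(vocab_sizes, dims):
    """Total embedding parameters: one vocab_size x dim table per feature."""
    return sum(v * d for v, d in zip(vocab_sizes, dims))

vocab_sizes = [1_000_000, 50_000, 500]               # hypothetical cardinalities
unified = embedding_params(vocab_sizes, [64] * 3)    # one size for all features
mixed = embedding_params(vocab_sizes, [48, 32, 8])   # hypothetical searched sizes

saving = 1 - mixed / unified
print(f"unified: {unified:,}  mixed: {mixed:,}  saving: {saving:.1%}")
```

Low-cardinality or low-information features tolerate much smaller tables, which is where the bulk of the saving comes from.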
153. Using Generative AI and Multi-Agents to Provide Automatic Feedback
- Author
- Guo, Shuchen, Latif, Ehsan, Zhou, Yifan, Huang, Xuan, and Zhai, Xiaoming
- Subjects
- Computer Science - Computation and Language
- Abstract
This study investigates the use of generative AI and multi-agent systems to provide automatic feedback in educational contexts, particularly for student-constructed responses in science assessments. The research addresses a key gap in the field by exploring how a multi-agent system, called AutoFeedback, can improve the quality of GenAI-generated feedback, overcoming known issues such as over-praise and over-inference that are common in single-agent large language models (LLMs). The study developed a multi-agent system consisting of two AI agents: one for generating feedback and another for validating and refining it. The system was tested on a dataset of 240 student responses, and its performance was compared to that of a single-agent LLM. Results showed that AutoFeedback significantly reduced the occurrence of over-praise and over-inference errors, providing more accurate and pedagogically sound feedback. The findings suggest that multi-agent systems can offer a more reliable solution for generating automated feedback in educational settings, highlighting their potential for scalable and personalized learning support. These results have important implications for educators and researchers seeking to leverage AI in formative assessments, offering a pathway to more effective feedback mechanisms that enhance student learning outcomes.
- Published
- 2024
154. METRIC: a complete methodology for performances evaluation of automatic target Detection, Recognition and Tracking algorithms in infrared imagery
- Author
- Gilles, Jérôme, Landeau, Stéphane, Dagobert, Tristan, Chevalier, Philippe, Stiée, Eric, Diaz, Damien, and Maillart, Jean-Luc
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this communication, we address the question of performance assessment for automatic target detection, recognition and tracking (ATD/R/T) algorithms. We propose a complete evaluation methodology covering the development of objective image datasets and the definition of metrics adapted to the different tasks (detection, recognition and tracking). We present some performance results currently being produced in a French MoD program called 2ACI ("Acquisition Automatique de Cibles par Imagerie").
- Published
- 2024
155. Comparative Study of Probabilistic Atlas and Deep Learning Approaches for Automatic Brain Tissue Segmentation from MRI Using N4 Bias Field Correction and Anisotropic Diffusion Pre-processing Techniques
- Author
- Hossain, Mohammad Imran, Amin, Muhammad Zain, Anyimadu, Daniel Tweneboah, and Suleiman, Taofik Ahmed
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Automatic brain tissue segmentation from Magnetic Resonance Imaging (MRI) images is vital for accurate diagnosis and further analysis in medical imaging. Despite advancements in segmentation techniques, a comprehensive comparison between traditional statistical methods and modern deep learning approaches using pre-processing techniques like N4 Bias Field Correction and Anisotropic Diffusion remains underexplored. This study provides a comparative analysis of various segmentation models, including Probabilistic ATLAS, U-Net, nnU-Net, and LinkNet, enhanced with these pre-processing techniques to segment brain tissues (white matter (WM), grey matter (GM) and cerebrospinal fluid (CSF)) on the Internet Brain Segmentation Repository (IBSR18) dataset. Our results demonstrate that the 3D nnU-Net model outperforms others, achieving the highest mean Dice Coefficient score (0.937 ± 0.012), while the 2D nnU-Net model recorded the lowest mean Hausdorff Distance (5.005 ± 0.343 mm) and the lowest mean Absolute Volumetric Difference (3.695 ± 2.931 mm) across five unseen test samples. The findings highlight the superiority of nnU-Net models in brain tissue segmentation, particularly when combined with N4 Bias Field Correction and Anisotropic Diffusion pre-processing techniques. Our implemented code can be accessed via GitHub.
- Published
- 2024
156. Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource Languages
- Author
- Pillai, Leena G, Manohar, Kavya, Raju, Basil K, and Sherly, Elizabeth
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
This paper presents a novel multistage fine-tuning strategy designed to enhance automatic speech recognition (ASR) performance in low-resource languages using OpenAI's Whisper model. In this approach, we aim to build ASR models for languages with limited digital resources by sequentially adapting the model across linguistically similar languages. We experimented with this approach on the Malasar language, a Dravidian language spoken by approximately ten thousand people in the Western Ghats of South India. The Malasar language faces critical challenges for technological intervention due to its lack of a native script and the absence of digital or spoken data resources. Working in collaboration with Wycliffe India and Malasar community members, we created a spoken Malasar corpus paired with transcriptions in Tamil script, a closely related major language. To build an ASR model for Malasar, we first build an intermediate Tamil ASR model, leveraging the higher availability of annotated Tamil speech. This intermediate model is subsequently fine-tuned on Malasar data, allowing for more effective ASR adaptation despite limited resources. The multistage fine-tuning strategy demonstrated significant improvements over direct fine-tuning on Malasar data alone, achieving a word error rate (WER) of 51.9%, a 4.5% absolute reduction compared to the direct fine-tuning method. A further WER reduction to 47.3% was achieved through punctuation removal in post-processing, which addresses formatting inconsistencies that impact evaluation. Our results underscore the effectiveness of sequential multistage fine-tuning combined with targeted post-processing as a scalable strategy for ASR system development in low-resource languages, especially where linguistic similarities can be leveraged to bridge gaps in training data.
- Published
- 2024
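The WER metric and the punctuation-removal post-processing discussed in entry 156 can be sketched compactly. This is a generic illustration (the example strings are invented, not from the Malasar corpus): WER is the word-level edit distance divided by the reference length, so stray punctuation attached to otherwise-correct words counts as substitutions until it is stripped.

```python
# Sketch of WER scoring and punctuation stripping; example text is invented.
import string

def wer(ref, hyp):
    """WER = (substitutions + insertions + deletions) / number of reference words."""
    r, h = ref.split(), hyp.split()
    # Standard Levenshtein dynamic program over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(r)][len(h)] / len(r)

def strip_punct(text):
    """Remove ASCII punctuation, the post-processing step before re-scoring."""
    return text.translate(str.maketrans("", "", string.punctuation))

ref = "avan veettil irukkan"    # invented reference transcription
hyp = "avan, veettil irukkan."  # same words, punctuation formatting differs
print(wer(ref, hyp))                             # punctuation counts as errors
print(wer(strip_punct(ref), strip_punct(hyp)))   # after post-processing: 0.0
```

This makes concrete why formatting inconsistencies alone can inflate WER by several points.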
157. Leveraging Transfer Learning and Multiple Instance Learning for HER2 Automatic Scoring of H&E Whole Slide Images
- Author
- Abdulsadig, Rawan S., Williams, Bryan M., and Burlutskiy, Nikolay
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods
- Abstract
Expression of human epidermal growth factor receptor 2 (HER2) is an important biomarker in breast cancer patients, who can benefit from cost-effective automatic Hematoxylin and Eosin (H&E) HER2 scoring. However, developing such scoring models requires large pixel-level annotated datasets. Transfer learning allows prior knowledge from different datasets to be reused, while multiple-instance learning (MIL) allows the lack of detailed annotations to be mitigated. The aim of this work is to examine the potential of transfer learning on the performance of deep learning models pre-trained on (i) Immunohistochemistry (IHC) images, (ii) H&E images and (iii) non-medical images. A MIL framework with an attention mechanism is developed using the pre-trained models as patch-embedding models. It was found that embedding models pre-trained on H&E images consistently outperformed the others, resulting in an average AUC-ROC value of 0.622 across the 4 HER2 scores (0.59-0.80 per HER2 score). Furthermore, it was found that using multiple-instance learning with an attention layer not only allows good classification results to be achieved, but can also help with producing a visual indication of HER2-positive areas in the H&E slide image by utilising the patch-wise attention weights.
- Published
- 2024
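The attention-based MIL pooling in entry 157 admits a minimal sketch. This is not the authors' implementation, just the common attention-MIL recipe with toy numbers: each patch embedding h_k receives a weight a_k = softmax_k(w · tanh(V h_k)), and the slide-level representation is the attention-weighted sum of patch embeddings; the same weights a_k are what can be visualized as a HER2-positivity heatmap.

```python
# Toy attention-MIL pooling (invented V, w, and patch embeddings).
import math

def attention_pool(patches, V, w):
    """Return (bag_embedding, attention_weights) for a list of patch vectors."""
    scores = []
    for h in patches:
        # hidden = tanh(V h), one value per row of V
        hidden = [math.tanh(sum(V[i][j] * h[j] for j in range(len(h))))
                  for i in range(len(V))]
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))
    # softmax over patches (max-shifted for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    attn = [e / sum(exps) for e in exps]
    # bag embedding = attention-weighted sum of patch embeddings
    bag = [sum(a * h[j] for a, h in zip(attn, patches))
           for j in range(len(patches[0]))]
    return bag, attn

patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 2-D patch embeddings
V = [[1.0, -1.0], [0.5, 0.5]]                   # toy projection matrix
w = [1.0, 1.0]                                  # toy attention vector
bag, attn = attention_pool(patches, V, w)
print(attn)  # weights sum to 1; larger weight = more influential patch
```

Because the weights are per patch, they double as a crude localization signal without any pixel-level labels.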
158. Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology
- Author
- Tonga, Junior Cedric, Clement, Benjamin, and Oudeyer, Pierre-Yves
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
The automatic generation of hints by Large Language Models (LLMs) within Intelligent Tutoring Systems (ITSs) has shown potential to enhance student learning. However, generating pedagogically sound hints that address student misconceptions and adhere to specific educational objectives remains challenging. This work explores using LLMs (GPT-4o and Llama-3-8B-Instruct) as teachers to generate effective hints for students simulated through LLMs (GPT-3.5-turbo, Llama-3-8B-Instruct, or Mistral-7B-Instruct-v0.3) tackling math exercises designed for human high-school students using cognitive science principles. We study several dimensions: 1) identifying error patterns made by simulated students on secondary-level math exercises; 2) developing various prompts for GPT-4o as a teacher and evaluating their effectiveness in generating hints that enable simulated students to self-correct; and 3) testing the best-performing prompts, based on their ability to produce relevant hints and facilitate error correction, with Llama-3-8B-Instruct as the teacher, allowing for a performance comparison with GPT-4o. The results show that model errors increase with higher temperature settings. Notably, when hints are generated by GPT-4o, the most effective prompts include prompts tailored to specific errors as well as prompts providing general hints based on common mathematical errors. Interestingly, Llama-3-8B-Instruct as a teacher showed better overall performance than GPT-4o. Also, the problem-solving and response revision capabilities of the LLMs as students, particularly GPT-3.5-turbo, improved significantly after receiving hints, especially at lower temperature settings. However, models like Mistral-7B-Instruct demonstrated a decline in performance as the temperature increased.
- Comment
- Accepted at NeurIPS 2024 Workshop on Large Foundation Models for Educational Assessment (FM-Assess)
- Published
- 2024
159. An Application-Agnostic Automatic Target Recognition System Using Vision Language Models
- Author
- Palladino, Anthony, Gajewski, Dana, Aronica, Abigail, Deptula, Patryk, Hamme, Alexander, Lee, Seiyoung C., Muri, Jeff, Nelling, Todd, Riley, Michael A., Wong, Brian, and Duff, Margaret
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present a novel Automatic Target Recognition (ATR) system using open-vocabulary object detection and classification models. A primary advantage of this approach is that target classes can be defined just before runtime by a non-technical end user, using either a few natural language text descriptions of the target, or a few image exemplars, or both. Nuances in the desired targets can be expressed in natural language, which is useful for unique targets with little or no training data. We also implemented a novel combination of several techniques to improve performance, such as leveraging the additional information in the sequence of overlapping frames to perform tubelet identification (i.e., sequential bounding box matching), bounding box re-scoring, and tubelet linking. Additionally, we developed a technique to visualize the aggregate output of many overlapping frames as a mosaic of the area scanned during the aerial surveillance or reconnaissance, and a kernel density estimate (or heatmap) of the detected targets. We initially applied this ATR system to the use case of detecting and clearing unexploded ordnance on airfield runways, and we are currently extending our research to other real-world applications.
- Comment
- Accepted to the Thirty-Seventh Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-25)
- Published
- 2024
160. Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?
- Author
- Xiao, Jingyu, Wan, Yuxuan, Huo, Yintong, Xu, Zhiyao, and Lyu, Michael R.
- Subjects
- Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
- Abstract
Converting webpage design into functional UI code is a critical step for building websites, which can be labor-intensive and time-consuming. To automate this design-to-code transformation, various automated methods using learning-based networks and multi-modal large language models (MLLMs) have been proposed. However, these studies were evaluated only on a narrow range of static web pages and ignored dynamic interaction elements, making them less practical for real-world website deployment. To fill this gap, we present the first systematic investigation of MLLMs in generating interactive webpages. Specifically, we first formulate the Interaction-to-Code task and build the Interaction2Code benchmark, which contains 97 unique web pages and 213 distinct interactions, spanning 15 webpage types and 30 interaction categories. We then conduct comprehensive experiments on three state-of-the-art (SOTA) MLLMs using both automatic metrics and human evaluations, summarizing six findings accordingly. Our experimental results highlight the limitations of MLLMs in generating fine-grained interactive features and managing interactions with complex transformations and subtle visual modifications. We further analyze failure cases and their underlying causes, identifying 10 common failure types and assessing their severity. Additionally, our findings reveal three critical influencing factors, i.e., prompts, visual saliency, and textual descriptions, that can enhance the interaction generation performance of MLLMs. Based on these findings, we elicit implications for researchers and developers, providing a foundation for future advancements in this field. Datasets and source code are available at https://github.com/WebPAI/Interaction2Code.
- Published
- 2024
161. MA^2: A Self-Supervised and Motion Augmenting Autoencoder for Gait-Based Automatic Disease Detection
- Author
- Liu, Yiqun, Zhang, Ke, and Zhu, Yin
- Subjects
- Physics - Biological Physics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Ground reaction force (GRF) is the force exerted by the ground on a body in contact with it. GRF-based automatic disease detection (ADD) has become an emerging medical diagnosis method, which aims to learn and identify disease patterns corresponding to different gait pressures based on deep learning methods. Although existing ADD methods can save doctors time in making diagnoses, training deep models still struggles with the labeling cost of large amounts of gait diagnostic data. On the other hand, the accuracy of deep models on the unified benchmark GRF dataset and their generalization ability on scalable gait datasets need to be further improved. To address these issues, we propose MA2, a GRF-based self-supervised and motion augmenting auto-encoder, which models the ADD task as an encoder-decoder paradigm. In the encoder, we introduce an embedding block, including a 3-layer 1D convolution for extracting tokens and a mask generator that randomly masks out the sequence of tokens, to maximize the model's potential to capture high-level, discriminative, intrinsic representations. Thereafter, the decoder utilizes this information to reconstruct the pixel sequence of the original input and calculates the reconstruction loss to optimize the network. Moreover, the backbone of the auto-encoder is multi-head self-attention, which can consider the global information of tokens from the input, not just the local neighborhood. This allows the model to capture generalized contextual information. Extensive experiments demonstrate that MA2 achieves SOTA performance of 90.91% accuracy on 1% of limited pathological GRF samples with labels, and good generalization ability of 78.57% accuracy on a scalable Parkinson's disease dataset.
- Comment
- 8 pages, 11 figures, article
- Published
- 2024
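The random token masking that drives the self-supervised training in entry 161 is a standard masked-autoencoder ingredient and can be sketched generically. The mask ratio and token stand-ins below are hypothetical, not MA2's actual configuration:

```python
# Generic sketch of masked-autoencoder-style random token masking
# (hypothetical mask ratio; tokens are integer stand-ins for GRF tokens).
import random

def random_mask(tokens, mask_ratio, seed=0):
    """Split a token sequence into visible tokens and masked positions."""
    rng = random.Random(seed)          # seeded for reproducibility
    n_mask = int(len(tokens) * mask_ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    visible = [t for i, t in enumerate(tokens) if i not in masked_idx]
    return visible, sorted(masked_idx)

tokens = list(range(10))               # stand-ins for a GRF token sequence
visible, masked = random_mask(tokens, 0.4)
print(len(visible), len(masked))       # 6 visible, 4 masked
```

The encoder sees only the visible tokens; the decoder must reconstruct the full sequence, which is what forces the representation to capture global structure.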
162. Applications of Automatic Differentiation in Image Registration
- Author
- Watson, Warin, Cherry, Cash, and Lang, Rachelle
- Subjects
- Mathematics - Optimization and Control
- Abstract
We demonstrate that automatic differentiation, which has become commonly available in machine learning frameworks, is an efficient way to explore ideas that lead to algorithmic improvement in multi-scale affine image registration and affine super-resolution problems. In our first experiment on multi-scale registration, we implement an ODE predictor-corrector method involving a derivative with respect to the scale parameter and the Hessian of an image registration objective function, both of which would be difficult to compute without AD. Our findings indicate that exact Hessians are necessary for the method to provide any benefits over a traditional multi-scale method; a Gauss-Newton Hessian approximation fails to provide such benefits. In our second experiment, we implement a variable projected Gauss-Newton method for super-resolution and use AD to differentiate through the iteratively computed projection, a method previously unaddressed in the literature. We show that Jacobians obtained without differentiating through the projection are poor approximations to the true Jacobians of the variable projected forward map and explore the performance of some other approximations. By addressing these problems, this work contributes to the application of AD in image registration and sets a precedent for further use of machine learning tools in this field.
- Comment
- 15 pages, 11 figures, to be submitted to SIAM Undergraduate Research Online
- Published
- 2024
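The AD machinery that entry 162 relies on can be illustrated with a toy forward-mode implementation using dual numbers. This is a pedagogical sketch only; the paper uses an ML framework's AD, not a hand-rolled class like this:

```python
# Toy forward-mode automatic differentiation via dual numbers.
class Dual:
    """Number a + b*eps with eps^2 = 0; the dot field carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (a + a'eps)(b + b'eps) = ab + (ab' + a'b)eps
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def derivative(f, x):
    """Seed the input with dot = 1 and read the derivative off the output."""
    return f(Dual(x, 1.0)).dot

# d/dx (3x^2 + 2x) = 6x + 2, exact to machine precision -- no finite differences
print(derivative(lambda x: 3 * x * x + 2 * x, 5.0))
```

Exactness is the point: as the abstract notes, a Gauss-Newton approximation to the Hessian was not good enough, and AD delivers exact derivatives of whatever the code actually computes.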
163. TransUNext: towards a more advanced U-shaped framework for automatic vessel segmentation in the fundus image
- Author
- Li, Xiang, Liu, Mingsi, and Duan, Lixin
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Purpose: Automatic and accurate segmentation of fundus vessel images has become an essential prerequisite for computer-aided diagnosis of ophthalmic diseases such as diabetes mellitus. High-precision retinal vessel segmentation remains difficult due to the low contrast between the branch ends of retinal vessels and the background, the long and thin vessel span, and the variable morphology of the optic disc and optic cup in fundus vessel images. Methods: We propose TransUNext, a more advanced U-shaped architecture for a hybrid Transformer and CNN, which integrates an Efficient Self-attention Mechanism into the encoder and decoder of U-Net to capture both local features and global dependencies with minimal computational overhead. Meanwhile, the Global Multi-Scale Fusion (GMSF) module is further introduced to upgrade skip-connections, fuse high-level semantic and low-level detailed information, and eliminate high- and low-level semantic differences. Inspired by ConvNeXt, the TransNeXt Block is designed to optimize the computational complexity of each base block in U-Net and avoid the information loss caused by the compressed dimension when information is converted between feature spaces of different dimensions. Results: We evaluated the proposed method on four public datasets: DRIVE, STARE, CHASE-DB1, and HRF. The AUC (area under the ROC curve) values were 0.9867, 0.9869, 0.9910, and 0.9887, respectively, exceeding other state-of-the-art methods.
- Published
- 2024
164. Advancing NASA-TLX: Automatic User Interaction Analysis for Workload Evaluation in XR Scenarios
- Author
- Vidal-Balea, Aida, Fraga-Lamas, Paula, and Fernandez-Carames, Tiago M.
- Subjects
- Computer Science - Human-Computer Interaction
- Abstract
Calculating the effort required to complete a task has always been difficult, as it depends on each person and is highly subjective. For this reason, different methodologies were developed to try to standardize these procedures. This article addresses some of the problems that arise when applying the NASA-Task Load Index (NASA-TLX), a methodology for calculating the mental workload of tasks performed in industrial environments. In addition, an improvement of this methodology is proposed to adapt it to emerging Extended Reality (XR) technologies. Finally, a system is proposed for the automatic collection of user performance metrics, providing an autonomous method that collects this information and does not depend on the users' willingness to fill in a feedback questionnaire.
- Comment
- Paper accepted in IEEE GEM 2024
- Published
- 2024
165. SpineFM: Leveraging Foundation Models for Automatic Spine X-ray Segmentation
- Author
- Simons, Samuel J. and Papież, Bartłomiej W.
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
This paper introduces SpineFM, a novel pipeline that achieves state-of-the-art performance in the automatic segmentation and identification of vertebral bodies in cervical and lumbar spine radiographs. SpineFM leverages the regular geometry of the spine, employing a novel inductive process to sequentially infer the location of each vertebra along the spinal column. Vertebrae are segmented using Medical-SAM-Adaptor, a robust foundation model that diverges from commonly used CNN-based models. We achieved outstanding results on two publicly available spine X-ray datasets, successfully identifying 97.8% and 99.6% of annotated vertebrae, respectively. Of these, our segmentations reached average Dice scores of 0.942 and 0.921, surpassing previous state-of-the-art methods.
- Comment
- 4 pages, 3 figures, submitted to ISBI 2025
- Published
- 2024
166. Bayesian Smoothing and Feature Selection Using Variational Automatic Relevance Determination
- Author
- Liu, Zihe, Saha, Diptarka, and Liang, Feng
- Subjects
- Statistics - Methodology
- Abstract
This study introduces Variational Automatic Relevance Determination (VARD), a novel approach tailored for fitting sparse additive regression models in high-dimensional settings. VARD distinguishes itself by its ability to independently assess the smoothness of each feature while enabling precise determination of whether a feature's contribution to the response is zero, linear, or nonlinear. Further, an efficient coordinate descent algorithm is introduced to implement VARD. Empirical evaluations on simulated and real-world data underscore VARD's superiority over alternative variable selection methods for additive models.
- Published
- 2024
167. Automatic feature selection and weighting using Differentiable Information Imbalance
- Author
- Wild, Romina, Del Tatto, Vittorio, Wodaczek, Felix, Cheng, Bingqing, and Laio, Alessandro
- Subjects
- Computer Science - Machine Learning, Physics - Computational Physics, Statistics - Machine Learning
- Abstract
Feature selection is a common process in many applications, but it is accompanied by uncertainties such as: What is the optimal dimensionality of an interpretable, reduced feature space to retain a maximum amount of information? How to account for different units of measure in features? How to weight different features according to their importance? To address these challenges, we introduce the Differentiable Information Imbalance (DII), an automatic data analysis method to rank information content between sets of features. Based on the nearest neighbors according to distances in the ground truth feature space, the method finds a low-dimensional subset of the input features, within which the pairwise distance relations are most similar to the ground truth. By employing the Differentiable Information Imbalance as a loss function, the relative feature weights of the inputs are optimized, simultaneously performing unit alignment and relative importance scaling, while preserving interpretability. Furthermore, this method can generate sparse solutions and determine the optimal size of the reduced feature space. We illustrate the usefulness of this approach on two prototypical benchmark problems: (1) Identifying a small set of collective variables capable of describing the conformational space of a biomolecule, and (2) selecting a subset of features for training a machine-learning force field. The results highlight the potential of the Differentiable Information Imbalance in addressing feature selection challenges and optimizing dimensionality in various applications. The method is implemented in the Python library DADApy.
- Published
- 2024
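The nearest-neighbor rank comparison underlying entry 167 can be sketched in its plain, non-differentiable form (the DII paper makes a differentiable, weighted version of this; the 1-D toy features below are invented). For each point, find its nearest neighbor in feature space A and ask what neighbor rank that point has in space B; small average ranks mean A is informative about B:

```python
# Plain information-imbalance sketch (not the differentiable DII itself).

def nn_index(points, i):
    """Index of the nearest neighbor of point i (squared Euclidean, excluding i)."""
    dists = [(sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
             for j in range(len(points)) if j != i]
    return min(dists)[1]

def rank_in(points, i, j):
    """Rank of point j among the neighbors of point i (1 = nearest)."""
    d = lambda k: sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
    return sorted((k for k in range(len(points)) if k != i), key=d).index(j) + 1

def information_imbalance(A, B):
    """~2/N when A's neighbors are also B's nearest; ~1 when uninformative."""
    n = len(A)
    mean_rank = sum(rank_in(B, i, nn_index(A, i)) for i in range(n)) / n
    return 2.0 * mean_rank / n

A = [[0.0], [1.0], [2.0], [3.0]]
B = [[0.0], [2.0], [4.0], [6.0]]   # B is a rescaled copy of A
print(information_imbalance(A, B))  # minimal: neighbor structure agrees exactly
```

Making this rank statistic differentiable with respect to per-feature weights is what lets DII optimize the weights by gradient descent, performing unit alignment and importance scaling at the same time.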
168. Optimized Flow Control based on Automatic Differentiation in Compressible Turbulent Channel Flows
- Author
- Wang, Wenkang and Chu, Xu
- Subjects
- Physics - Fluid Dynamics
- Abstract
This study presents an automatic differentiation (AD)-based optimization framework for flow control in compressible turbulent channel flows. We developed a fully differentiable boundary condition framework that allows for the precise calculation of gradients with respect to boundary control variables. This facilitates the efficient optimization of flow control methods. The framework's adaptability and effectiveness are demonstrated using two boundary conditions: opposition control and tunable permeable walls. Various optimization targets are evaluated, including wall friction and turbulent kinetic energy (TKE), across different time horizons. In each optimization, there were around 4×10^4 control variables and 3×10^9 state variables in a single episode. Results indicate that TKE-targeted opposition control achieves a more stable and significant reduction in drag, with effective suppression of turbulence throughout the channel. In contrast, strategies that focus directly on minimizing wall friction were found to be less effective, exhibiting instability and increased turbulence in the outer region. The tunable permeable walls also show potential to achieve stable drag reduction through a "flux-inducing" mechanism. This study demonstrates the advantages of AD-based optimization in complex flow control scenarios and provides physical insight into the choice of quantity of interest for improved optimization performance.
- Published
- 2024
169. Augmenting Polish Automatic Speech Recognition System With Synthetic Data
- Author
- Bondaruk, Łukasz, Kubiak, Jakub, and Czyżnikiewicz, Mateusz
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
This paper presents a system developed for submission to PolEval 2024, Task 3: Polish Automatic Speech Recognition Challenge. We describe a Voicebox-based speech synthesis pipeline and utilize it to augment Conformer and Whisper speech recognition models with synthetic data. We show that adding synthetic speech to training significantly improves the achieved results. We also present the final results achieved by our models in the competition.
- Published
- 2024
170. Automatic Estimation of Singing Voice Musical Dynamics
- Author
- Narang, Jyoti, Tamer, Nazif Can, De La Vega, Viviana, and Serra, Xavier
- Subjects
- Computer Science - Sound, Computer Science - Information Retrieval, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Musical dynamics form a core part of expressive singing voice performances. However, automatic analysis of musical dynamics for the singing voice has received limited attention, partly due to the scarcity of suitable datasets and a lack of clear evaluation frameworks. To address this challenge, we propose a methodology for dataset curation. Employing the proposed methodology, we compile a dataset comprising 509 singing voice performances annotated with musical dynamics, aligned with 163 score files, leveraging state-of-the-art source separation and alignment techniques. The scores are sourced from the OpenScore Lieder corpus of romantic-era compositions, widely known for its wealth of expressive annotations. Utilizing the curated dataset, we train a multi-head attention based CNN model with varying window sizes to evaluate the effectiveness of estimating musical dynamics. We explored two distinct perceptually motivated input representations for model training: the log-Mel spectrum and bark-scale based features. For testing, we manually curate another dataset of 25 performances annotated with musical dynamics, in collaboration with a professional vocalist. We conclude through our experiments that bark-scale based features outperform log-Mel features for the task of singing voice dynamics prediction. The dataset along with the code is shared publicly for further research on the topic.
- Comment
- To be published in ISMIR 2024, 6 pages
- Published
- 2024
171. AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs
- Author
- Siro, Clemencia, Yuan, Yifei, Aliannejadi, Mohammad, and de Rijke, Maarten
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
- Abstract
Generating diverse and effective clarifying questions is crucial for improving query understanding and retrieval performance in open-domain conversational search (CS) systems. We propose AGENT-CQ (Automatic GENeration and evaluaTion of Clarifying Questions), an end-to-end LLM-based framework addressing the challenges of scalability and adaptability faced by existing methods that rely on manual curation or template-based approaches. AGENT-CQ consists of two stages: a generation stage employing LLM prompting strategies to generate clarifying questions, and an evaluation stage (CrowdLLM) that simulates human crowdsourcing judgments using multiple LLM instances to assess generated questions and answers based on comprehensive quality metrics. Extensive experiments on the ClariQ dataset demonstrate CrowdLLM's effectiveness in evaluating question and answer quality. Human evaluation and CrowdLLM show that the AGENT-CQ generation stage consistently outperforms baselines in various aspects of question and answer quality. In retrieval-based evaluation, LLM-generated questions significantly enhance retrieval effectiveness for both BM25 and cross-encoder models compared to human-generated questions.
- Comment
- 23 pages
- Published
- 2024
172. An LLM Agent for Automatic Geospatial Data Analysis
- Author
-
Chen, Yuxing, Wang, Weijie, Lobry, Sylvain, and Kurtz, Camille
- Subjects
Computer Science - Computers and Society ,Computer Science - Computation and Language - Abstract
Large language models (LLMs) are being used in data science code generation tasks, but they often struggle with complex sequential tasks, leading to logical errors. Their application to geospatial data processing is particularly challenging due to difficulties in incorporating complex data structures and spatial constraints, effectively utilizing diverse function calls, and the tendency to hallucinate less-used geospatial libraries. To tackle these problems, we introduce GeoAgent, a new interactive framework designed to help LLMs handle geospatial data processing more effectively. GeoAgent pioneers the integration of a code interpreter, static analysis, and Retrieval-Augmented Generation (RAG) techniques within a Monte Carlo Tree Search (MCTS) algorithm, offering a novel approach to geospatial data processing. In addition, we contribute a new benchmark specifically designed to evaluate the LLM-based approach in geospatial tasks. This benchmark leverages a variety of Python libraries and includes both single-turn and multi-turn tasks such as data acquisition, data analysis, and visualization. By offering a comprehensive evaluation among diverse geospatial contexts, this benchmark sets a new standard for developing LLM-based approaches in geospatial data analysis tasks. Our findings suggest that relying solely on the knowledge of an LLM is insufficient for accurate geospatial task programming, which requires coherent multi-step processes and multiple function calls. Compared to the baseline LLMs, the proposed GeoAgent has demonstrated superior performance, yielding notable improvements in function calls and task completion. In addition, these results offer valuable insights for the future development of LLM agents in automatic geospatial data analysis task programming.
- Published
- 2024
173. Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts
- Author
-
Park, ChaeHun, Cho, Hojun, and Choo, Jaegul
- Subjects
Computer Science - Computation and Language - Abstract
This paper explores integrating Automatic Speech Recognition (ASR) into natural language query systems to improve weather forecasting efficiency for Korean meteorologists. We address challenges in developing ASR systems for the Korean weather domain, specifically specialized vocabulary and Korean linguistic intricacies. To tackle these issues, we constructed an evaluation dataset of spoken queries recorded by native Korean speakers. Using this dataset, we assessed various configurations of a multilingual ASR model family, identifying performance limitations related to domain-specific terminology. We then implemented a simple text-to-speech-based data augmentation method, which improved the recognition of specialized terms while maintaining general-domain performance. Our contributions include creating a domain-specific dataset, comprehensive ASR model evaluations, and an effective augmentation technique. We believe our work provides a foundation for future advancements in ASR for the Korean weather forecasting domain.
- Published
- 2024
174. Learning Structured Compressed Sensing with Automatic Resource Allocation
- Author
-
Wang, Han, Pérez, Eduardo, Huijben, Iris A. M., van Gorp, Hans, van Sloun, Ruud, and Römer, Florian
- Subjects
Computer Science - Machine Learning - Abstract
Multidimensional data acquisition often requires extensive time and poses significant challenges for hardware and software regarding data storage and processing. Rather than designing a single compression matrix as in conventional compressed sensing, structured compressed sensing yields dimension-specific compression matrices, reducing the number of optimizable parameters. Recent advances in machine learning (ML) have enabled task-based supervised learning of subsampling matrices, albeit at the expense of complex downstream models. Additionally, the sampling resource allocation across dimensions is often determined in advance through heuristics. To address these challenges, we introduce Structured COmpressed Sensing with Automatic Resource Allocation (SCOSARA) with an information theory-based unsupervised learning strategy. SCOSARA adaptively distributes samples across sampling dimensions while maximizing Fisher information content. Using ultrasound localization as a case study, we compare SCOSARA to state-of-the-art ML-based and greedy search algorithms. Simulation results demonstrate that SCOSARA can produce high-quality subsampling matrices that achieve lower Cram\'er-Rao Bound values than the baselines. In addition, SCOSARA outperforms other ML-based algorithms in terms of the number of trainable parameters, computational complexity, and memory requirements while automatically choosing the number of samples per axis., Comment: Unsupervised Learning, Information Theory, Compressed Sensing, Subsampling
- Published
- 2024
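The resource allocation described in the SCOSARA abstract above, distributing a sampling budget across dimensions to maximize information content, can be illustrated with a greedy sketch. This is not the paper's unsupervised learning strategy; the per-axis marginal gains here are made-up stand-ins for Fisher information contributions:

```python
def greedy_allocation(budget, gains):
    """Distribute a sampling budget across axes by repeatedly giving the
    next sample to the axis with the largest marginal information gain.

    `gains` maps axis name -> list of marginal gains; gains[a][k] is the
    information added by the (k+1)-th sample on axis a (assumed
    diminishing, as information typically is under subsampling).
    """
    counts = {axis: 0 for axis in gains}
    for _ in range(budget):
        # pick the axis whose next sample adds the most information
        best = max(
            (axis for axis in gains if counts[axis] < len(gains[axis])),
            key=lambda axis: gains[axis][counts[axis]],
        )
        counts[best] += 1
    return counts

# Hypothetical diminishing marginal gains for two sampling dimensions.
gains = {"time": [5.0, 3.0, 1.0, 0.5], "space": [4.0, 2.0, 0.8, 0.2]}
print(greedy_allocation(5, gains))  # {'time': 3, 'space': 2}
```

SCOSARA learns the allocation end-to-end rather than greedily, but the sketch shows why per-axis sample counts need not be fixed in advance by heuristics.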
175. Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages
- Author
-
Deoghare, Sourabh, Kanojia, Diptesh, and Bhattacharyya, Pushpak
- Subjects
Computer Science - Computation and Language - Abstract
This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit the linguistic similarities to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. While the experimental results underline the complementary nature of APE and QE, we also observe that QE-APE multitask learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by $2.5$ and $2.39$ TER points, respectively, with further notable improvements over the multilingual APE model observed through multi-task learning ($+1.29$ and $+1.44$ TER points), data augmentation ($+0.53$ and $+0.45$ TER points) and domain adaptation ($+0.35$ and $+0.45$ TER points). We release the synthetic data, code, and models accrued during this study publicly at https://github.com/cfiltnlp/Multilingual-APE., Comment: Accepted at Findings of EMNLP 2024
- Published
- 2024
176. Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models
- Author
-
Polok, Alexander, Kesiraju, Santosh, Beneš, Karel, Burget, Lukáš, and Černocký, Jan
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
This paper proposes a simple yet effective way of regularising encoder-decoder-based automatic speech recognition (ASR) models that enhances the robustness of the model and improves the generalisation to out-of-domain scenarios. The proposed approach is dubbed $\textbf{De}$coder-$\textbf{C}$entric $\textbf{R}$egularisation in $\textbf{E}$ncoder-$\textbf{D}$ecoder (DeCRED) architecture for ASR, where auxiliary classifiers are introduced in layers of the decoder module. Leveraging these classifiers, we propose two decoding strategies that re-estimate the next token probabilities. Using the recent E-branchformer architecture, we build strong ASR systems that obtain competitive WERs compared to Whisper-medium and outperform OWSM v3, while relying only on a fraction of the training data and model size. On top of such a strong baseline, we show that DeCRED can further improve the results and, moreover, generalise much better to out-of-domain scenarios, where we show an absolute reduction of 2.7 and 2.9 WERs on AMI and Gigaspeech datasets, respectively. We provide extensive analysis and accompanying experiments that support the benefits of the proposed regularisation scheme.
- Published
- 2024
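The re-estimation of next-token probabilities from auxiliary decoder classifiers, mentioned in the DeCRED abstract above, can be sketched minimally. The combination rule below (uniform average of auxiliary heads, then linear interpolation with the final head) is an assumption for illustration, not the paper's exact formula:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reestimate(final_logits, aux_logits_per_layer, weight=0.5):
    """Blend the final decoder head with auxiliary intermediate heads to
    produce re-estimated next-token probabilities."""
    final_p = softmax(final_logits)
    aux_ps = [softmax(l) for l in aux_logits_per_layer]
    # average the auxiliary distributions token-wise
    avg_aux = [sum(col) / len(aux_ps) for col in zip(*aux_ps)]
    # interpolate between the final and the averaged auxiliary distribution
    return [(1 - weight) * f + weight * a for f, a in zip(final_p, avg_aux)]
```

When an auxiliary head disagrees with the final head, the blended distribution is pulled toward the auxiliary view, which is one intuition for why such heads can act as a regulariser at decode time.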
177. Automatic Differentiation of Optimization Algorithms with Time-Varying Updates
- Author
-
Mehmood, Sheheryar and Ochs, Peter
- Subjects
Mathematics - Optimization and Control ,Computer Science - Machine Learning - Abstract
Numerous optimization algorithms have a time-varying update rule thanks to, for instance, a changing step size, momentum parameter, or Hessian approximation. In this paper, we apply unrolled or automatic differentiation to a time-varying iterative process and provide convergence (rate) guarantees for the resulting derivative iterates. We adapt these convergence results and apply them to proximal gradient descent with variable step size and FISTA when solving partly smooth problems. We confirm our findings numerically by solving $\ell_1$- and $\ell_2$-regularized linear and logistic regression, respectively. Our theoretical and numerical results show that the convergence rate of the algorithm is reflected in its derivative iterates., Comment: arXiv admin note: text overlap with arXiv:2208.03107
- Published
- 2024
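The idea in the abstract above, differentiating through an iterative process with a time-varying update and watching the derivative iterates converge, can be shown on a toy problem. This is a hand-rolled forward-mode sketch on f(x) = (x - theta)^2 / 2 with an assumed step-size schedule, not the paper's setting:

```python
def unrolled_derivative(theta, x0, steps):
    """Unrolled (forward-mode) differentiation of gradient descent with a
    time-varying step size, on the toy objective f(x) = (x - theta)^2 / 2.

    Alongside the iterate x_k we propagate d_k = dx_k/dtheta via the
    chain rule: x_{k+1} = x_k - t_k * (x_k - theta) gives
    d_{k+1} = (1 - t_k) * d_k + t_k.
    """
    x, d = x0, 0.0
    for k in range(steps):
        t = 1.0 / (k + 2)  # hypothetical time-varying step size
        x, d = x - t * (x - theta), (1 - t) * d + t
    return x, d
```

As the iterates x_k approach the minimizer x* = theta, the derivative iterates d_k approach dx*/dtheta = 1, illustrating the "convergence rate reflected in derivative iterates" claim on a case where everything is checkable by hand.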
178. ATOMIC: Automatic Tool for Memristive IMPLY-based Circuit-level Simulation and Validation
- Author
-
Seiler, Fabian and TaheriNejad, Nima
- Subjects
Computer Science - Emerging Technologies - Abstract
Since performance improvements of computers are stagnating, new technologies and computer paradigms are hot research topics. Memristor-based In-Memory Computing, which comes in many flavors, is one of the promising candidates for the post-CMOS era. Processing In-memory Array (PIA) is one such flavor; it is a relatively new approach and substantially different from traditional CMOS-based logic design. Consequently, there is a lack of publicly available CAD tools for memristive PIA design and evaluation. Here, we present ATOMIC: an Automatic Tool for Memristive IMPLY-based Circuit-level Simulation and Validation. Using our tool, a large portion of the simulation, evaluation, and validation process can be performed automatically, drastically reducing the development time for memristive PIA systems, in particular those using IMPLY logic. The code is available at https://github.com/fabianseiler/ATOMIC., Comment: 4 pages, 5 figures, Submitted and Presented at the Embedded Systems Software Competition 2024 at ESWEEK
- Published
- 2024
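The IMPLY logic that ATOMIC simulates is material implication, which together with a FALSE (reset) operation is functionally complete. A minimal logical sketch (ignoring memristor device physics, which is what a circuit-level tool like ATOMIC actually models):

```python
def imply(p, q):
    """Material implication, the primitive of IMPLY-based memristive
    logic: in hardware the result is written back into the target
    memristor holding q."""
    return (not p) or q

def nand(p, q):
    """NAND built from one FALSE and two IMPLY steps, the classic
    construction showing that IMPLY + FALSE is functionally complete."""
    work = False            # FALSE: initialize the work memristor to 0
    work = imply(p, work)   # work = NOT p
    return imply(q, work)   # (NOT q) OR (NOT p) = NAND(p, q)
```

Since NAND is universal, any Boolean circuit can be compiled into such IMPLY sequences; evaluating and validating those sequences at the circuit level is the tedious part that ATOMIC automates.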
179. End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach
- Author
-
Abdullah, Abdulhady Abas, Tabibian, Shima, Veisi, Hadi, Mahmudi, Aso, and Rashid, Tarik
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Computation and Language - Abstract
Automatic Speech Recognition (ASR) for low-resource languages remains a challenging task due to limited training data. This paper introduces a comprehensive study exploring the effectiveness of Whisper, a pre-trained ASR model, for Northern Kurdish (Kurmanji), an under-resourced language spoken in the Middle East. We investigate three fine-tuning strategies: vanilla, specific parameters, and additional modules. Using a Northern Kurdish fine-tuning speech corpus containing approximately 68 hours of validated transcribed data, our experiments demonstrate that the additional module fine-tuning strategy significantly improves ASR accuracy on a specialized test set, achieving a Word Error Rate (WER) of 10.5% and Character Error Rate (CER) of 5.7% with Whisper version 3. These results underscore the potential of sophisticated transformer models for low-resource ASR and emphasize the importance of tailored fine-tuning techniques for optimal performance.
- Published
- 2024
180. AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
- Author
-
Carvalho, Carlos and Abad, Alberto
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Sound - Abstract
Self-supervised learning (SSL) leverages large amounts of unlabelled data to learn rich speech representations, fostering improvements in automatic speech recognition (ASR), even when only a small amount of labelled data is available for fine-tuning. Despite the advances in SSL, a significant challenge remains when the data used for pre-training (source domain) mismatches the fine-tuning data (target domain). To tackle this domain mismatch challenge, we propose a new domain adaptation method for low-resource ASR focused on contrastive mixup for joint-embedding architectures named AC-Mix (agnostic contrastive mixup). In this approach, the SSL model is adapted through additional pre-training using mixed data views created by interpolating samples from the source and the target domains. Our proposed adaptation method consistently outperforms the baseline system, using approximately 11 hours of adaptation data and requiring only 1 hour of adaptation time on a single GPU with WavLM-Large.
- Published
- 2024
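The core of AC-Mix, as described above, is creating mixed data views by interpolating source-domain and target-domain samples. A minimal sketch of that interpolation, borrowing the Beta-distributed mixing coefficient from standard mixup (the alpha value and raw-feature mixing are illustrative assumptions, not AC-Mix's exact recipe):

```python
import random

def mixup_view(source, target, alpha=0.4, lam=None):
    """Interpolate a source-domain sample with a target-domain sample to
    form a mixed view for additional SSL pre-training.

    Following standard mixup, the coefficient lambda is drawn from
    Beta(alpha, alpha) when not supplied explicitly.
    """
    if lam is None:
        lam = random.betavariate(alpha, alpha)
    mixed = [lam * s + (1 - lam) * t for s, t in zip(source, target)]
    return mixed, lam
```

Pre-training the joint-embedding model on such views exposes it to points on the path between domains, which is the intuition behind using mixup for domain adaptation.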
181. Quantity vs. Quality of Monolingual Source Data in Automatic Text Translation: Can It Be Too Little If It Is Too Good?
- Author
-
Abdulmumin, Idris, Galadanci, Bashir Shehu, Aliyu, Garba, and Muhammad, Shamsuddeen Hassan
- Subjects
Computer Science - Computation and Language - Abstract
Monolingual data, being readily available in large quantities, has been used to upscale the scarcely available parallel data to train better models for automatic translation. Self-learning, where a model is made to learn from its output, is one approach to exploit such data. However, it has been shown that too much of this data can be detrimental to the performance of the model if the available parallel data is comparatively very small. In this study, we investigate whether the monolingual data can also be too little and if this reduction, based on quality, has any effect on the performance of the translation model. Experiments have shown that on English-German low-resource NMT, it is often better to select only the most useful additional data, based on quality or closeness to the domain of the test data, than to utilize all of the available data.
- Published
- 2024
- Full Text
- View/download PDF
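The data selection strategy in the abstract above, keeping only the most useful monolingual sentences rather than all available data, reduces to ranking by a quality or domain-closeness score and truncating. A generic sketch; the scoring callable is a placeholder for whatever quality measure (e.g. in-domain language-model score) a real system would use:

```python
def select_monolingual(sentences, score, keep_ratio=0.5):
    """Keep only the most useful fraction of monolingual sentences,
    ranked by a quality or domain-closeness score, instead of using
    all available data for self-learning.
    """
    ranked = sorted(sentences, key=score, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]
```

The study's finding is that a well-chosen `keep_ratio` can beat `keep_ratio=1.0`, i.e. less but better monolingual data can outperform all of it.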
182. An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
- Author
-
Chen, Junjie, Su, Weihang, Chu, Zhumin, Li, Haitao, Ai, Qinyao, Liu, Yiqun, Zhang, Min, and Ma, Shaoping
- Subjects
Computer Science - Computation and Language - Abstract
With the rapid development of large language models (LLMs), how to efficiently evaluate them has become an important research question. Existing evaluation methods often suffer from high costs, limited test formats, the need of human references, and systematic evaluation biases. To address these limitations, our study introduces the Auto-PRE, an automatic LLM evaluation framework based on peer review. In contrast to previous studies that rely on human annotations, Auto-PRE selects evaluator LLMs automatically based on their inherent traits including consistency, self-confidence, and pertinence. We conduct extensive experiments on three tasks: summary generation, non-factoid question-answering, and dialogue generation. Experimental results indicate our Auto-PRE achieves state-of-the-art performance at a lower cost. Moreover, our study highlights the impact of prompt strategies and evaluation formats on evaluation performance, offering guidance for method optimization in the future.
- Published
- 2024
183. AutoSimTTF: A Fully Automatic Pipeline for Electric Field Simulation and Treatment Planning of Tumor Treating Fields
- Author
-
Wang, Minmin, Xie, Xu, Fan, Zhengbo, Lan, Yue, Pan, Yun, Chen, Guangdi, Zhang, Shaomin, and Wang, Yuxing
- Subjects
Physics - Medical Physics - Abstract
Objective: Tumor Treating Fields (TTFields) is an emerging approach for cancer therapy that inhibits tumor cell proliferation by applying alternating electric fields (EF) of intermediate frequency and low intensity. The TTFields-induced electric field intensity at the tumor site is closely related to the therapeutic efficacy. Therefore, EF simulations based on realistic head models have been utilized for the dosage analysis and treatment optimization of TTFields. However, current modeling methods require manual segmentation of tumors and rely on commercial software, which is time-consuming and labor-intensive. Approach: We introduce AutoSimTTF, a fully automatic pipeline for simulating and optimizing the EF distribution for TTFields. The main steps of AutoSimTTF utilize open-source toolkits, enabling fully automated processing of individual MRI data for TTFields. Additionally, AutoSimTTF allows for parameter optimization based on individual anatomical information, thereby achieving a more focused and higher EF distribution at the tumor site. Main results: Compared to conventional EF calculation processes, deviations in AutoSimTTF are below 20%. The optimal treatment parameters generated by AutoSimTTF produce a higher EF intensity at the tumor site (111.9%) and better focality (19.4%) compared to traditional TTFields settings. Significance: AutoSimTTF provides significant reference value and guidance for the clinical application and treatment planning of TTFields., Comment: 8 pages, 6 figures
- Published
- 2024
184. A note on finite-dimensional quotients and the problem of automatic continuity for twisted convolution algebras
- Author
-
Flores, Felipe I.
- Subjects
Mathematics - Functional Analysis ,Mathematics - Operator Algebras ,Primary 43A20, Secondary 47L65, 46H40 - Abstract
In this note we show that the twisted convolution algebra $L^1_{\alpha,\omega}({\sf G},\mathfrak A)$ associated to a twisted action of a locally compact group ${\sf G}$ on a $C^*$-algebra $\mathfrak A$ has the following property: Every quotient by a closed two-sided ideal of finite codimension produces a semisimple algebra. We use this property, together with results of H. Dales and G. Willis, to build up on previous results of the author and produce large classes of examples of algebras with properties of automatic continuity., Comment: 7 pages. Comments are welcomed
- Published
- 2024
185. Information Importance-Aware Defense against Adversarial Attack for Automatic Modulation Classification: An XAI-Based Approach
- Author
-
Wang, Jingchun, Dong, Peihao, Zhou, Fuhui, and Wu, Qihui
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Deep learning (DL) has significantly improved automatic modulation classification (AMC) by leveraging neural networks as the feature extractor. However, as DL-based AMC becomes increasingly widespread, it faces severe security issues from various adversarial attacks. Existing defense methods often suffer from high computational cost, intractable parameter tuning, and insufficient robustness. This paper proposes an eXplainable artificial intelligence (XAI) defense approach, which uncovers the negative information caused by the adversarial attack by measuring the importance of input features based on the SHapley Additive exPlanations (SHAP). By properly removing the negative information in adversarial samples and then fine-tuning (FT) the model, the impact of the attacks on the classification result can be mitigated. Experimental results demonstrate that the proposed SHAP-FT improves the classification performance of the model by 15%-20% under different attack levels, which not only enhances model robustness against various attack levels but also reduces resource consumption, validating its effectiveness in safeguarding communication networks., Comment: Accepted by WCSP 2024
- Published
- 2024
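The "removal of negative information" step described in the SHAP-FT abstract above can be sketched as zeroing out input features whose attribution toward the true class is negative. Using a plain attribution list instead of actual SHAP values computed by an explainer is a simplification for illustration:

```python
def remove_negative_information(sample, importances):
    """Zero out input features whose attribution toward the true class
    is negative, approximating the removal of attack-injected
    information from an adversarial sample before fine-tuning.
    """
    return [x if imp >= 0 else 0.0 for x, imp in zip(sample, importances)]
```

The cleaned samples are then used to fine-tune the classifier, which is where the reported 15%-20% robustness gain comes from in the paper's pipeline.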
186. Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement
- Author
-
Shtok, Joseph, Alfassy, Amit, Dahood, Foad Abo, Schwartz, Eliyahu, Doveh, Sivan, and Arbelle, Assaf
- Subjects
Computer Science - Computation and Language - Abstract
It has been shown that Large Language Models' (LLMs) performance can be improved for many tasks using Chain of Thought (CoT) or In-Context Learning (ICL), which involve demonstrating the steps needed to solve a task using a few examples. However, while datasets with input-output pairs are relatively easy to produce, providing demonstrations which include intermediate steps requires cumbersome manual work. These steps may be executable programs, as in agentic flows, or step-by-step reasoning as in CoT. In this work, we propose Automatic Data Labeling and Refinement (ADLR), a method to automatically generate and filter demonstrations which include the above intermediate steps, starting from a small seed of manually crafted examples. We demonstrate the advantage of ADLR in code-based table QA and mathematical reasoning, achieving up to a 5.5% gain. The code implementing our method is provided in the Supplementary material and will be made available.
- Published
- 2024
187. Automatic Speech Recognition with BERT and CTC Transformers: A Review
- Author
-
Djeffal, Noussaiba, Kheddar, Hamza, Addou, Djamel, Mazari, Ahmed Cherif, and Himeur, Yassine
- Subjects
Computer Science - Computation and Language ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
This review paper provides a comprehensive analysis of recent advances in automatic speech recognition (ASR) with bidirectional encoder representations from transformers (BERT) and connectionist temporal classification (CTC) transformers. The paper first introduces the fundamental concepts of ASR and discusses the challenges associated with it. It then explains the architecture of BERT and CTC transformers and their potential applications in ASR. The paper reviews several studies that have used these models for speech recognition tasks and discusses the results obtained. Additionally, the paper highlights the limitations of these models and outlines potential areas for further research. All in all, this review provides valuable insights for researchers and practitioners who are interested in ASR with BERT and CTC transformers.
- Published
- 2024
- Full Text
- View/download PDF
188. A SAM based Tool for Semi-Automatic Food Annotation
- Author
-
Rahman, Lubnaa Abdur, Papathanail, Ioannis, Brigato, Lorenzo, and Mougiakakou, Stavroula
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
The advancement of artificial intelligence (AI) in food and nutrition research is hindered by a critical bottleneck: the lack of annotated food data. Despite the rise of highly efficient AI models designed for tasks such as food segmentation and classification, their practical application may require proficiency in AI and machine learning principles, which can be a challenge for non-AI experts in the field of nutritional sciences. This highlights the need to translate AI models into user-friendly tools that are accessible to all. To address this, we present a demo of a semi-automatic food image annotation tool leveraging the Segment Anything Model (SAM). The tool enables prompt-based food segmentation via user interactions, promoting user engagement and allowing them to further categorise food items within meal images and specify weight/volume if necessary. Additionally, we release a fine-tuned version of SAM's mask decoder, dubbed MealSAM, with the ViT-B backbone tailored specifically for food image segmentation. Our objective is not only to contribute to the field by encouraging participation, collaboration, and the gathering of more annotated food data but also to make AI technology available for a broader audience by translating AI into practical tools., Comment: Accepted Demo Paper - ECAI 2024
- Published
- 2024
- Full Text
- View/download PDF
189. Hey AI Can You Grade My Essay?: Automatic Essay Grading
- Author
-
Maliha, Maisha and Pramanik, Vishal
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Automatic essay grading (AEG) has attracted the attention of the NLP community because of its applications in education, such as scoring essays and short answers. AEG systems can save significant time and money when grading essays. In the existing works, the essays are graded by a single network responsible for the whole process, which may be ineffective because a single network may not be able to learn all the features of a human-written essay. In this work, we have introduced a new model that outperforms the state-of-the-art models in the field of AEG. We have used the concept of collaborative and transfer learning, where one network is responsible for checking the grammatical and structural features of the sentences of an essay while another network is responsible for scoring the overall idea present in the essay. These learnings are transferred to another network to score the essay. We also compared the performances of the different models mentioned in our work, and our proposed model has shown the highest accuracy of 85.50%., Comment: Accepted in ICAAAIML (4th International Conference on Advances and Applications of Artificial Intelligence and Machine Learning) 2023
- Published
- 2024
190. ACER: Automatic Language Model Context Extension via Retrieval
- Author
-
Gao, Luyu, Zhang, Yunyi, and Callan, Jamie
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
Long-context modeling is one of the critical capabilities of language AI for digesting and reasoning over complex information pieces. In practice, long-context capabilities are typically built into a pre-trained language model~(LM) through a carefully designed context extension stage, with the goal of producing generalist long-context capabilities. In our preliminary experiments, however, we discovered that the current open-weight generalist long-context models are still lacking in practical long-context processing tasks. While this means perfectly effective long-context modeling demands task-specific data, the cost can be prohibitive. In this paper, we draw inspiration from how humans process a large body of information: a lossy \textbf{retrieval} stage ranks a large set of documents while the reader ends up reading deeply only the top candidates. We build an \textbf{automatic} data synthesis pipeline that mimics this process using short-context LMs. The short-context LMs are further tuned using these self-generated data to obtain task-specific long-context capabilities. Similar to how pre-training learns from imperfect data, we hypothesize and further demonstrate that the short-context model can bootstrap over the synthetic data, outperforming not only long-context generalist models but also the retrieval and read pipeline used to synthesize the training data in real-world tasks such as long-context retrieval augmented generation.
- Published
- 2024
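The "lossy retrieval stage ranks a large set of documents while the reader reads deeply only the top candidates" process that ACER mimics can be sketched with a trivial ranker. Token overlap stands in for a real retriever here; it is an illustrative assumption, not ACER's pipeline:

```python
def retrieve_then_read(query, documents, top_k=2):
    """Lossy retrieval stage: rank documents by token overlap with the
    query and keep only the top candidates for the 'deep reading' stage.
    """
    q = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```

ACER's contribution is to run such a pipeline with short-context LMs to *synthesize* long-context training data, then tune the short-context model on its own outputs.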
191. Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities
- Author
-
Adila, Aulia, Lestari, Dessi, Purwarianti, Ayu, Tanaya, Dipta, Azizah, Kurniawati, and Sakti, Sakriani
- Subjects
Computer Science - Computation and Language ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
An ideal speech recognition model has the capability to transcribe speech accurately under various characteristics of speech signals, such as speaking style (read and spontaneous), speech context (formal and informal), and background noise conditions (clean and moderate). Building such a model requires a significant amount of training data with diverse speech characteristics. Currently, Indonesian data is dominated by read, formal, and clean speech, leading to a scarcity of Indonesian data with other speech variabilities. To develop Indonesian automatic speech recognition (ASR), we present our research on state-of-the-art speech recognition models, namely Massively Multilingual Speech (MMS) and Whisper, as well as compiling a dataset comprising Indonesian speech with variabilities to facilitate our study. We further investigate the models' predictive ability to transcribe Indonesian speech data across different variability groups. The best results were achieved by the Whisper fine-tuned model across datasets with various characteristics, as indicated by the decrease in word error rate (WER) and character error rate (CER). Moreover, we found that speaking style variability affected model performance the most., Comment: Accepted at O-COCOSDA 2024
- Published
- 2024
192. Synergizing Morphological Computation and Generative Design: Automatic Synthesis of Tendon-Driven Grippers
- Author
-
Zharkov, Kirill, Chaikovskii, Mikhail, Osipov, Yefim, Alshaowa, Rahaf, Borisov, Ivan, and Kolyubin, Sergey
- Subjects
Computer Science - Robotics - Abstract
Robots' behavior and performance are determined both by hardware and software. The design process of robotic systems is a complex journey that involves multiple phases. Throughout this process, the aim is to tackle various criteria simultaneously, even though they often contradict each other. The ultimate goal is to uncover the optimal solution that resolves these conflicting factors. Generative, computational, or automatic design paradigms aim to accelerate the whole design process. Within this paper, we propose a design methodology to generate linkage mechanisms for robots with morphological computation. We use a graph grammar and a heuristic search algorithm to create robot mechanism graphs that are converted into simulation models for testing the design output. To verify the design methodology we have applied it to a relatively simple quasi-static problem of object grasping. We found a way to automatically design an underactuated tendon-driven gripper that can grasp a wide range of objects. This is possible because of its structure, not because of sophisticated planning or learning.
- Published
- 2024
193. Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
- Author
-
Zhao, Zirui, Dong, Hanze, Saha, Amrita, Xiong, Caiming, and Sahoo, Doyen
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Statistics - Machine Learning - Abstract
Hallucinations (i.e., generating plausible but inaccurate content) and laziness (i.e., excessive refusals or defaulting to "I don't know") persist as major challenges in LLM reasoning. Current efforts to reduce hallucinations primarily focus on factual errors in knowledge-grounded tasks, often neglecting hallucinations related to faulty reasoning. Meanwhile, some approaches render LLMs overly conservative, limiting their problem-solving capabilities. To mitigate hallucination and laziness in reasoning tasks, we propose Automatic Curriculum Expert Iteration (Auto-CEI) to enhance LLM reasoning and align responses to the model's capabilities--assertively answering within its limits and declining when tasks exceed them. In our method, Expert Iteration explores the reasoning trajectories near the LLM policy, guiding incorrect paths back on track to reduce compounding errors and improve robustness; it also promotes appropriate "I don't know" responses after sufficient reasoning attempts. The curriculum automatically adjusts rewards, incentivizing extended reasoning before acknowledging incapability, thereby pushing the limits of LLM reasoning and aligning its behaviour with these limits. We compare Auto-CEI with various SOTA baselines across logical reasoning, mathematics, and planning tasks, where Auto-CEI achieves superior alignment by effectively balancing assertiveness and conservativeness., Comment: 20 pages
- Published
- 2024
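The curriculum reward adjustment described in the abstract above is not spelled out; a toy reward schedule in the same spirit (an assumption for illustration, not the paper's actual formula) might penalize refusals less the longer the model reasoned before refusing, and anneal that penalty over training:

```python
def curriculum_reward(correct, refused, num_steps, refusal_penalty):
    """Toy reward shaping: correct assertive answers are rewarded, confident
    wrong answers are penalized hardest, and "I don't know" is penalized
    less the more reasoning steps preceded it."""
    if correct:
        return 1.0
    if refused:
        # Refusing too early costs more, pushing for extended reasoning
        # before the model acknowledges incapability.
        return -refusal_penalty / (1 + num_steps)
    return -1.0  # confident but wrong answer

def anneal_penalty(penalty, avg_accuracy, target=0.7, rate=0.1):
    """Raise the refusal penalty when the policy refuses too readily
    (accuracy below target); relax it otherwise."""
    return penalty * (1 + rate) if avg_accuracy < target else penalty * (1 - rate)
```

The function names and constants here are hypothetical; they only illustrate how a curriculum can trade assertiveness against conservativeness.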
194. Automatic Summarization of Long Documents
- Author
-
Chhibbar, Naman and Kalita, Jugal
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
A vast amount of textual data is added to the internet daily, making utilization and interpretation of such data difficult and cumbersome. As a result, automatic text summarization is crucial for extracting relevant information, saving precious reading time. Although many transformer-based models excel in summarization, they are constrained by their input size, preventing them from processing texts longer than their context size. This study introduces three novel algorithms that allow any LLM to efficiently overcome its input size limitation, effectively utilizing its full potential without any architectural modifications. We test our algorithms on texts with more than 70,000 words, and our experiments show a significant increase in BERTScore with competitive ROUGE scores., Comment: 9 pages (including bibliography) with 6 figures. ACL 2023 proceedings format
- Published
- 2024
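The abstract above does not detail its three algorithms; a minimal chunk-then-summarize sketch of the general idea, assuming a hypothetical `summarize` callable backed by any fixed-context LLM, could look like:

```python
def chunk_text(words, chunk_size, overlap=50):
    """Split a word list into overlapping chunks that fit a model's context."""
    step = chunk_size - overlap
    return [words[i:i + chunk_size] for i in range(0, len(words), step)]

def hierarchical_summary(text, summarize, chunk_size=1000):
    """Recursively summarize chunks until the text fits one context window.

    `summarize` is a placeholder for any fixed-context LLM call; the real
    paper's algorithms are not reproduced here.
    """
    words = text.split()
    while len(words) > chunk_size:
        chunks = chunk_text(words, chunk_size)
        partials = [summarize(" ".join(c)) for c in chunks]
        words = " ".join(partials).split()
    return summarize(" ".join(words))
```

The overlap between chunks is one common way to avoid losing context at chunk boundaries; the chunk size would be set from the model's actual context limit.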
195. Windshield Integration of Thermal and Color Fusion for Automatic Emergency Braking in Low Visibility Conditions
- Author
-
Jobert, Gabriel, Delubac, Guillaume, Matias, Jessy, Noir, Quentin, Brenière, Xavier, Girard, Pauline, Severin-Fabiani, Tatiana, and Tinnes, Sebastien
- Subjects
Physics - Optics - Abstract
The new NHTSA regulations require Automatic Emergency Braking (AEB) systems to operate at night, to protect pedestrians in the deadliest conditions. We propose thermal imaging as a new sensor to complement the AEB sensor suite, alongside the visible front camera and RADAR. This paper explores the benefits of visible-thermal fusion, proposes a windshield integration for such a system, and evaluates the minimum performance requirements for a thermal camera compliant with the NHTSA standards, based on a field study of pedestrian detection range., Comment: SIA VISION 2024, Oct 2024, Paris, France
- Published
- 2024
196. ProtocoLLM: Automatic Evaluation Framework of LLMs on Domain-Specific Scientific Protocol Formulation Tasks
- Author
-
Yi, Seungjun, Lim, Jaeyoung, and Yoon, Juyong
- Subjects
Computer Science - Computation and Language - Abstract
Automated generation of scientific protocols executable by robots can significantly accelerate scientific research processes. Large Language Models (LLMs) excel at Scientific Protocol Formulation Tasks (SPFT), but the evaluation of their capabilities relies on human evaluation. Here, we propose a flexible, automatic framework to evaluate LLM capability on SPFT: ProtocoLLM. This framework prompts the target model and GPT-4 to extract pseudocode from biology protocols using only predefined lab actions, and evaluates the output of the target model using LLAM-EVAL, with the pseudocode generated by GPT-4 serving as a baseline and Llama-3 acting as the evaluator. Our adaptable prompt-based evaluation method, LLAM-EVAL, offers significant flexibility in terms of evaluation model, material, and criteria, and is free of cost. We evaluate GPT variations, Llama, Mixtral, Gemma, Cohere, and Gemini. Overall, we find that GPT and Cohere are powerful scientific protocol formulators. We also introduce BIOPROT 2.0, a dataset with biology protocols and corresponding pseudocode, which can aid LLMs in the formulation and evaluation of SPFT. Our work is extensible to assess LLMs on SPFT across various domains and other fields that require protocol generation for specific goals., Comment: Submitted to 2024 ACL Rolling Review June Cycle
- Published
- 2024
197. Radio-opaque artefacts in digital mammography: automatic detection and analysis of downstream effects
- Author
-
Schueppert, Amelia, Glocker, Ben, and Roschewitz, Mélanie
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
This study investigates the effects of radio-opaque artefacts, such as skin markers, breast implants, and pacemakers, on mammography classification models. After manually annotating 22,012 mammograms from the publicly available EMBED dataset, a robust multi-label artefact detector was developed to identify five distinct artefact types (circular and triangular skin markers, breast implants, support devices, and spot compression structures). Subsequent experiments on two clinically relevant tasks (breast density assessment and cancer screening) revealed that these artefacts can significantly affect model performance, alter classification thresholds, and distort output distributions. These findings underscore the importance of accurate automatic artefact detection for developing reliable and robust classification models in digital mammography. To facilitate future research, our annotations, code, and model predictions are made publicly available., Comment: Code available at https://github.com/biomedia-mira/mammo-artifacts
- Published
- 2024
198. Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation
- Author
-
Leemann, Tobias, Petridis, Periklis, Vietri, Giuseppe, Manousakas, Dionysis, Roth, Aaron, and Aydore, Sergul
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
While retrieval augmented generation (RAG) has been shown to enhance factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. One common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. Alternatively, lightweight natural language inference (NLI) models for efficient grounding verification can be used at inference time. While existing pre-trained NLI models offer potential solutions, their performance remains subpar compared to larger models on realistic RAG inputs. RAG inputs are more complex than most datasets used for training NLI models and have characteristics specific to the underlying knowledge base, requiring adaptation of the NLI models to a specific target domain. Additionally, the lack of labeled instances in the target domain makes supervised domain adaptation, e.g., through fine-tuning, infeasible. To address these challenges, we introduce Automatic Generative Domain Adaptation (Auto-GDA). Our framework enables unsupervised domain adaptation through synthetic data generation. Unlike previous methods that rely on handcrafted filtering and augmentation strategies, Auto-GDA employs an iterative process to continuously improve the quality of generated samples using weak labels from less efficient teacher models and discrete optimization to select the most promising augmented samples. Experimental results demonstrate the effectiveness of our approach, with models fine-tuned on synthetic data using Auto-GDA often surpassing the performance of the teacher model and reaching the performance level of LLMs at 10% of their computational cost.
- Published
- 2024
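The iterative augment-score-select loop described in the abstract above can be sketched minimally as follows; `augment` and `teacher_score` are hypothetical callables standing in for the paper's augmentation strategies and weak teacher labels, and the top-k cut is a simple stand-in for its discrete optimization step:

```python
def auto_gda_step(samples, augment, teacher_score, budget):
    """One iteration of a synthetic-data improvement loop (illustrative):
    augment each sample, score all candidates with a weak teacher, and
    keep only the `budget` most promising ones."""
    candidates = samples + [augment(s) for s in samples]
    scored = sorted(candidates, key=teacher_score, reverse=True)
    return scored[:budget]  # discrete selection of the best candidates

def generate_adapted_data(seed_samples, augment, teacher_score, budget, rounds=3):
    """Repeat the augment/score/select step to iteratively improve quality."""
    data = list(seed_samples)
    for _ in range(rounds):
        data = auto_gda_step(data, augment, teacher_score, budget)
    return data
```

In practice the samples would be (evidence, claim) pairs for NLI fine-tuning rather than the toy values used here.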
199. Cross-lingual Transfer for Automatic Question Generation by Learning Interrogative Structures in Target Languages
- Author
-
Hwang, Seonjeong, Kim, Yunsu, and Lee, Gary Geunbae
- Subjects
Computer Science - Computation and Language - Abstract
Automatic question generation (QG) serves a wide range of purposes, such as augmenting question-answering (QA) corpora, enhancing chatbot systems, and developing educational materials. Despite its importance, most existing datasets predominantly focus on English, resulting in a considerable gap in data availability for other languages. Cross-lingual transfer for QG (XLT-QG) addresses this limitation by allowing models trained on high-resource language datasets to generate questions in low-resource languages. In this paper, we propose a simple and efficient XLT-QG method that operates without the need for monolingual, parallel, or labeled data in the target language, utilizing a small language model. Our model, trained solely on English QA datasets, learns interrogative structures from a limited set of question exemplars, which are then applied to generate questions in the target language. Experimental results show that our method outperforms several XLT-QG baselines and achieves performance comparable to GPT-3.5-turbo across different languages. Additionally, the synthetic data generated by our model proves beneficial for training multilingual QA models. With significantly fewer parameters than large language models and without requiring additional training for target languages, our approach offers an effective solution for QG and QA tasks across various languages., Comment: EMNLP 2024
- Published
- 2024
200. An Online Automatic Modulation Classification Scheme Based on Isolation Distributional Kernel
- Author
-
Li, Xinpeng, Jiang, Zile, Ting, Kai Ming, and Zhu, Ye
- Subjects
Computer Science - Machine Learning - Abstract
Automatic Modulation Classification (AMC), as a crucial technique in modern non-cooperative communication networks, plays a key role in various civil and military applications. However, existing AMC methods are usually complicated and, due to their high computational complexity, can work only in batch mode. This paper introduces a new online AMC scheme based on the Isolation Distributional Kernel. Our method stands out in two aspects. Firstly, it is the first proposal to represent baseband signals using a distributional kernel. Secondly, it introduces a pioneering AMC technique that works well in online settings under realistic time-varying channel conditions. Through extensive experiments in online settings, we demonstrate the effectiveness of the proposed classifier. Our results indicate that the proposed approach outperforms existing baseline models, including two state-of-the-art deep learning classifiers. Moreover, it distinguishes itself as the first online classifier for AMC with linear time complexity, which marks a significant efficiency boost for real-time applications.
- Published
- 2024
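The abstract above does not reproduce the Isolation Distributional Kernel; as a much simpler stand-in for the online-classification idea, a running-mean nearest-centroid classifier over signal embeddings (wholly an illustrative assumption, not the paper's method) updates in constant time per sample:

```python
import numpy as np

class OnlineCentroidAMC:
    """Minimal online modulation classifier sketch: maintains a running mean
    embedding per modulation class and predicts by nearest centroid."""

    def __init__(self):
        self.centroids = {}
        self.counts = {}

    def update(self, label, embedding):
        """Incorporate one labeled signal embedding via an incremental mean."""
        e = np.asarray(embedding, dtype=float)
        if label not in self.centroids:
            self.centroids[label] = e.copy()
            self.counts[label] = 1
        else:
            self.counts[label] += 1
            self.centroids[label] += (e - self.centroids[label]) / self.counts[label]

    def predict(self, embedding):
        """Return the class whose centroid is nearest in Euclidean distance."""
        e = np.asarray(embedding, dtype=float)
        return min(self.centroids, key=lambda k: np.linalg.norm(e - self.centroids[k]))
```

The embeddings here would come from some feature map of the baseband signal; the paper's distributional-kernel representation is the part this sketch deliberately omits.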