Author: "Huck P." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Huck P."' showing total 3,215 results

1. EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation

Author: Liu, Shih-Yang, Yang, Huck, Wang, Chein-Yi, Fung, Nai Chit, Yin, Hongxu, Sakr, Charbel, Muralidharan, Saurav, Cheng, Kwang-Ting, Kautz, Jan, Wang, Yu-Chiang Frank, Molchanov, Pavlo, and Chen, Min-Hung
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In this work, we re-formulate the model compression problem into the customized compensation problem: Given a compressed model, we aim to introduce residual low-rank paths to compensate for compression errors under customized requirements from users (e.g., tasks, compression ratios), resulting in greater flexibility in adjusting overall capacity without being constrained by specific compression formats. However, naively applying SVD to derive residual paths causes suboptimal utilization of the low-rank representation capacity. Instead, we propose Training-free Eigenspace Low-Rank Approximation (EoRA), a method that directly minimizes compression-induced errors without requiring gradient-based training, achieving fast optimization in minutes using a small amount of calibration data. EoRA projects compression errors into the eigenspace of input activations, leveraging eigenvalues to effectively prioritize the reconstruction of high-importance error components. Moreover, EoRA can be seamlessly integrated with fine-tuning and quantization to further improve effectiveness and efficiency. EoRA consistently outperforms previous methods in compensating errors for compressed LLaMA2/3 models on various tasks, such as language generation, commonsense reasoning, and math reasoning tasks (e.g., 31.31%/12.88% and 9.69% improvements on ARC-Easy/ARC-Challenge and MathQA when compensating LLaMA3-8B that is quantized to 4-bit and pruned to 2:4 sparsity). EoRA offers a scalable, training-free solution to compensate for compression errors, making it a powerful tool to deploy LLMs in various capacity and efficiency requirements.
Published: 2024

2. Difficulties Constructing Lattices with Exponential Kissing Number from Codes

Author: Bennett, Huck, Golovnev, Alexander, and Stephens-Davidowitz, Noah
Subjects: Mathematics - Metric Geometry, Computer Science - Information Theory, Mathematics - Number Theory
Abstract: In this note, we present examples showing that several natural ways of constructing lattices from error-correcting codes do not in general yield a correspondence between minimum-weight non-zero codewords and shortest non-zero lattice vectors. From these examples, we conclude that the main results in two works of Vlăduţ (Moscow J. Comb. Number Th., 2019 and Discrete Comput. Geom., 2021) on constructing lattices with exponential kissing number from error-correcting codes are invalid. Exhibiting a family of lattices with exponential kissing number therefore remains an open problem.
Published: 2024

3. Towards Neural Scaling Laws for Time Series Foundation Models

Author: Yao, Qingren, Yang, Chao-Han Huck, Jiang, Renhe, Liang, Yuxuan, Jin, Ming, and Pan, Shirui
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Scaling laws offer valuable insights into the design of time series foundation models (TSFMs). However, previous research has largely focused on the scaling laws of TSFMs for in-distribution (ID) data, leaving their out-of-distribution (OOD) scaling behavior and the influence of model architectures less explored. In this work, we examine two common TSFM architectures, encoder-only and decoder-only Transformers, and investigate their scaling behavior on both ID and OOD data. These models are trained and evaluated across varying parameter counts, compute budgets, and dataset sizes. Our experiments reveal that the log-likelihood loss of TSFMs exhibits similar scaling behavior in both OOD and ID settings. We further compare the scaling properties across different architectures, incorporating two state-of-the-art TSFMs as case studies, showing that model architecture plays a significant role in scaling. The encoder-only Transformers demonstrate better scalability than the decoder-only Transformers, while the architectural enhancements in the two advanced TSFMs primarily improve ID performance but reduce OOD scalability. While scaling up TSFMs is expected to drive performance breakthroughs, the lack of a comprehensive understanding of TSFM scaling laws has hindered the development of a robust framework to guide model scaling. We fill this gap in this work by synthesizing our findings and providing practical guidelines for designing and scaling larger TSFMs with enhanced model capabilities.
Published: 2024

4. A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning

Author: Liu, Chen-Yu, Yang, Chao-Han Huck, Hsieh, Min-Hsiu, and Goan, Hsi-Sheng
Subjects: Quantum Physics
Abstract: Quantum-centric supercomputing presents a compelling framework for large-scale hybrid quantum-classical tasks. Although quantum machine learning (QML) offers theoretical benefits in various applications, challenges such as large-size data encoding in the input stage and the reliance on quantum resources in the inference stage limit its practicality for tasks like fine-tuning large language models (LLMs). Quantum parameter generation, a novel approach of QML, addresses these limitations by using quantum neural networks (QNNs) to generate classical model weights (parameters) exclusively during training, thereby decoupling inference from quantum hardware. In this work, we introduce Quantum Parameter Adaptation (QPA) in the framework of quantum parameter generation, which integrates QNNs with a classical multi-layer perceptron mapping model to generate parameters for fine-tuning methods. Using Gemma-2 and GPT-2 as case studies, QPA demonstrates significant parameter reduction for parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), while maintaining comparable or improved performance in text generation tasks. Specifically, QPA reduces the number of parameters to 52.06% of the original LoRA for GPT-2 with a slight performance gain of 0.75%, and to 16.84% for Gemma-2, with a marginal performance improvement of 0.07%. These results highlight QPA's ability to achieve efficient parameter reduction without sacrificing performance in the quantum parameter generation framework. This work showcases the potential of quantum-enhanced parameter reduction, offering a scalable quantum-classical solution for fine-tuning LLMs while preserving the feasibility of inference on classical hardware.
Comment: 21 pages, 6 figures
Published: 2024

5. FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model

Author: Lu, Yichen, Song, Jiaqi, Yang, Chao-Han Huck, and Watanabe, Shinji
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: In this study, we explore efficient inference for multitask Speech Language Models (SpeechLMs) via token reduction. Unlike other modalities such as vision or text, speech has unique temporal dependencies, so previous efficient-inference work on other modalities is not directly applicable. Furthermore, methods for efficient SpeechLM inference on long sequences and sparse signals remain largely unexplored. We therefore propose FastAdaSP, a weighted token merging framework specifically designed for various speech-related tasks to improve the trade-off between efficiency and performance. Experimental results on WavLLM and Qwen-Audio show that our method achieves the state-of-the-art (SOTA) efficiency-performance trade-off compared with other baseline methods. Specifically, FastAdaSP achieved 7x memory efficiency and 1.83x decoding throughput without any degradation on tasks like Emotion Recognition (ER) and Spoken Question Answering (SQA). The code will be available at https://github.com/yichen14/FastAdaSP.
Comment: EMNLP 2024 Industry Track
Published: 2024

6. Post-edits Are Preferences Too

Author: Berger, Nathaniel, Riezler, Stefan, Exel, Miriam, and Huck, Matthias
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Preference Optimization (PO) techniques are currently among the state-of-the-art techniques for fine-tuning large language models (LLMs) on pairwise preference feedback from human annotators. However, in machine translation, this sort of feedback can be difficult to solicit. Additionally, Kreutzer et al. (2018) have shown that, for machine translation, pairwise preferences are less reliable than other forms of human feedback, such as 5-point ratings. We examine post-edits to see if they can be a source of reliable human preferences by construction. In PO, a human annotator is shown sequences $s_1$ and $s_2$ and asked for a preference judgment, while for post-editing, editors create $s_1$ and know that it should be better than $s_2$. We attempt to use these implicit preferences for PO and show that it helps the model move towards post-edit-like hypotheses and away from machine translation-like hypotheses. Furthermore, we show that best results are obtained by pre-training the model with supervised fine-tuning (SFT) on post-edits in order to promote post-edit-like hypotheses to the top output ranks.
Comment: To appear at the Ninth Conference on Machine Translation (WMT24)
Published: 2024

7. Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Author: Lu, Ke-Han, Chen, Zhehuai, Fu, Szu-Wei, Yang, Chao-Han Huck, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank, and Lee, Hung-yi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs) by incorporating pre-trained speech models. However, these SLMs often undergo extensive speech instruction-tuning to bridge the gap between speech and text modalities. This requires significant annotation efforts and risks catastrophic forgetting of the original language capabilities. In this work, we present a simple yet effective automatic process for creating speech-text pair data that carefully injects speech paralinguistic understanding abilities into SLMs while preserving the inherent language capabilities of the text-based LLM. Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data, achieving impressive performance on Dynamic-SUPERB and AIR-Bench-Chat benchmarks. Furthermore, our model exhibits the ability to follow complex instructions derived from LLMs, such as specific output formatting and chain-of-thought reasoning. Our approach not only enhances the versatility and effectiveness of SLMs but also reduces reliance on extensive annotated datasets, paving the way for more efficient and capable speech understanding systems.
Comment: Submitted to ICASSP 2025
Published: 2024

8. Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction

Author: Li, Yuanchao, Gong, Yuan, Yang, Chao-Han Huck, Bell, Peter, and Lai, Catherine
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Multimedia, Computer Science - Sound
Abstract: Annotating and recognizing speech emotion using prompt engineering has recently emerged with the advancement of Large Language Models (LLMs), yet its efficacy and reliability remain questionable. In this paper, we conduct a systematic study on this topic, beginning with the proposal of novel prompts that incorporate emotion-specific knowledge from acoustics, linguistics, and psychology. Subsequently, we examine the effectiveness of LLM-based prompting on Automatic Speech Recognition (ASR) transcription, contrasting it with ground-truth transcription. Furthermore, we propose a Revise-Reason-Recognize prompting pipeline for robust LLM-based emotion recognition from spoken language with ASR errors. Additionally, experiments on context-aware learning, in-context learning, and instruction tuning are performed to examine the usefulness of LLM training schemes in this direction. Finally, we investigate the sensitivity of LLMs to minor prompt variations. Experimental results demonstrate the efficacy of the emotion-specific prompts, ASR error correction, and LLM training schemes for LLM-based emotion recognition. Our study aims to refine the use of LLMs in emotion recognition and related domains.
Published: 2024

9. Chain-of-Thought Prompting for Speech Translation

Author: Hu, Ke, Chen, Zhehuai, Yang, Chao-Han Huck, Żelasko, Piotr, Hrinchuk, Oleksii, Lavrukhin, Vitaly, Balam, Jagadeesh, and Ginsburg, Boris
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have demonstrated remarkable advancements in language understanding and generation. Building on the success of text-based LLMs, recent research has adapted these models to use speech embeddings for prompting, resulting in Speech-LLM models that exhibit strong performance in automatic speech recognition (ASR) and automatic speech translation (AST). In this work, we propose a novel approach to leverage ASR transcripts as prompts for AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM model consists of a speech encoder and an encoder-decoder structure Megatron-T5. By first decoding speech to generate ASR transcripts and subsequently using these transcripts along with encoded speech for prompting, we guide the speech translation in a two-step process like chain-of-thought (CoT) prompting. Low-rank adaptation (LoRA) is used for the T5 LLM for model adaptation and shows superior performance to full model fine-tuning. Experimental results show that the proposed CoT prompting significantly improves AST performance, achieving an average increase of 2.4 BLEU points across 6 En->X or X->En AST tasks compared to speech prompting alone. Additionally, compared to a related CoT prediction method that predicts a concatenated sequence of ASR and AST transcripts, our method performs better by an average of 2 BLEU points.
Published: 2024

10. Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Author: Yang, Chao-Han Huck, Park, Taejin, Gong, Yuan, Li, Yuanchao, Chen, Zhehuai, Lin, Yen-Ting, Chen, Chen, Hu, Yuchen, Dhawan, Kunal, Żelasko, Piotr, Zhang, Chao, Chen, Yun-Nung, Tsao, Yu, Balam, Jagadeesh, Ginsburg, Boris, Siniscalchi, Sabato Marco, Chng, Eng Siong, Bell, Peter, Lai, Catherine, Watanabe, Shinji, and Stolcke, Andreas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new capabilities in language modeling for speech processing, we introduce the generative speech transcription error correction (GenSEC) challenge. This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. These tasks aim to emulate future LLM-based agents handling voice-based interfaces while remaining accessible to a broad audience by utilizing open pretrained language models or agent-based APIs. We also discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
Comment: IEEE SLT 2024. The initial draft version has been done in December 2023. Post-ASR Text Processing and Understanding Community and LlaMA-7B pre-training correction model: https://huggingface.co/GenSEC-LLM/SLT-Task1-Llama2-7b-HyPo-baseline
Published: 2024

11. TALOS (Total Automation of LabVIEW Operations for Science): A framework for autonomous control systems for complex experiments

Author: Volponi, M., Zieliński, J., Rauschendorfer, T., Huck, S., Caravita, R., Auzins, M., Bergmann, B., Burian, P., Brusa, R. S., Camper, A., Castelli, F., Cerchiari, G., Ciuryło, R., Consolati, G., Doser, M., Eliaszuk, K., Giszczak, A., Glöggler, L. T., Graczykowski, Ł., Grosbart, M., Guatieri, F., Gusakova, N., Gustafsson, F., Haider, S., Janik, M. A., Januszek, T., Kasprowicz, G., Khatri, G., Kłosowski, Ł., Kornakov, G., Krumins, V., Lappo, L., Linek, A., Malamant, J., Mariazzi, S., Penasa, L., Petracek, V., Piwiński, M., Pospisil, S., Povolo, L., Prelz, F., Rangwala, S. A., Rawat, B. S., Rienäcker, B., Rodin, V., Røhne, O. M., Sandaker, H., Smolyanskiy, P., Sowiński, T., Tefelski, D., Vafeiadis, T., Welsch, C. P., Wolz, T., Zawada, M., and Zurlo, N.
Subjects: Physics - Instrumentation and Detectors, Physics - Atomic Physics
Abstract: Modern physics experiments are frequently very complex, relying on multiple simultaneous events to obtain the desired result. The experiment control system plays a central role in orchestrating the measurement setup; however, its development is often treated as secondary with respect to the hardware, its importance becoming evident only during the operational phase. Therefore, the AEgIS (Antimatter Experiment: Gravity, Interferometry, Spectroscopy) collaboration has created a framework for easily coding control systems, specifically targeting atomic, quantum, and antimatter experiments. This framework, called Total Automation of LabVIEW Operations for Science (TALOS), unifies all the machines of the experiment in a single entity, thus enabling complex high-level decisions to be taken, and it consists of separate modules, called MicroServices, that run concurrently and asynchronously. This enhances the stability and reproducibility of the system while allowing for continuous integration and testing while the control system is running. The system demonstrated high stability and reproducibility, running completely unsupervised during the night and weekends of the data-taking campaigns. The results demonstrate the suitability of TALOS to manage an entire physics experiment in full autonomy: being open-source, experiments other than the AEgIS experiment can benefit from it.
Published: 2024

12. Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction

Author: Ko, Yuka, Li, Sheng, Yang, Chao-Han Huck, and Kawahara, Tatsuya
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: With the strong representational power of large language models (LLMs), generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements to address ASR errors. This work explores how LLM-based GER can enhance and expand the capabilities of Japanese language processing, presenting the first GER benchmark for Japanese ASR with 0.9-2.6k text utterances. We also introduce a new multi-pass augmented generative error correction (MPA GER) by integrating multiple system hypotheses on the input side with corrections from multiple LLMs on the output side and then merging them. To the best of our knowledge, this is the first investigation of the use of LLMs for Japanese GER, which involves second-pass language modeling on the output transcriptions generated by the ASR system (e.g., N-best hypotheses). Our experiments show that the proposed methods improve ASR quality and generalization on both the SPREDS-U1-ja and CSJ data.
Published: 2024

13. Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies

Author: Koneru, Sai, Huck, Matthias, Exel, Miriam, and Niehues, Jan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent advancements in NLP have resulted in models with specialized strengths, such as processing multimodal inputs or excelling in specific domains. However, real-world tasks, like multimodal translation, often require a combination of these strengths, such as handling both translation and image processing. While individual translation and vision models are powerful, they typically lack the ability to perform both tasks in a single system. Combining these models poses challenges, particularly due to differences in their vocabularies, which limit the effectiveness of traditional ensemble methods to post-generation techniques like N-best list re-ranking. In this work, we propose a novel zero-shot ensembling strategy that allows for the integration of different models during the decoding phase without the need for additional training. Our approach re-ranks beams during decoding by combining scores at the word level, using heuristics to predict when a word is completed. We demonstrate the effectiveness of this method in machine translation scenarios, showing that it enables the generation of translations that are both speech- and image-aware while also improving overall translation quality. (Footnote: We will release the code upon paper acceptance.)
Comment: Under Review
Published: 2024

14. Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction

Author: Sachdev, Rithik, Wang, Zhong-Qiu, and Yang, Chao-Han Huck
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern automatic speech recognition (ASR) systems. One representative approach is to leverage in-context learning to prompt LLMs so that a better hypothesis can be generated by the LLMs based on a carefully designed prompt and an N-best list of hypotheses produced by ASR systems. However, it is not yet known whether the existing prompts are the most effective ones for the task of post-ASR error correction. In this context, this paper first explores alternative prompts to identify an initial set of effective prompts, and then proposes to employ an evolutionary prompt optimization algorithm to refine the initial prompts. Evaluation results on the CHiME-4 subset of Task 1 of the SLT 2024 GenSEC challenge show the effectiveness and potential of the proposed algorithms.
Comment: in submission
Published: 2024

15. QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train

Author: Liu, Chen-Yu, Lin, Chu-Hsuan Abraham, Yang, Chao-Han Huck, Chen, Kuan-Cheng, and Hsieh, Min-Hsiu
Subjects: Quantum Physics
Abstract: Quantum reinforcement learning utilizes quantum layers to process information within a machine learning model. However, both pure and hybrid quantum reinforcement learning face challenges such as data encoding and the use of quantum computers during the inference stage. We apply the Quantum-Train method to reinforcement learning tasks, called QTRL, training the classical policy network model using a quantum machine learning model with polylogarithmic parameter reduction. This QTRL approach eliminates the data encoding issues of conventional quantum machine learning and reduces the training parameters of the corresponding classical policy network. Most importantly, the training result of QTRL is a classical model, meaning the inference stage requires only a classical computer. This is extremely practical and cost-efficient for reinforcement learning tasks, where low-latency feedback from the policy model is essential.
Comment: 6 pages, 1 figure
Published: 2024

16. Shower Separation in Five Dimensions for Highly Granular Calorimeters using Machine Learning

Author: Lai, S., Utehs, J., Wilhahn, A., Fouz, M. C., Bach, O., Brianne, E., Ebrahimi, A., Gadow, K., Göttlicher, P., Hartbrich, O., Heuchel, D., Irles, A., Krüger, K., Kvasnicka, J., Lu, S., Neubüser, C., Provenza, A., Reinecke, M., Sefkow, F., Schuwalow, S., De Silva, M., Sudo, Y., Tran, H. L., Liu, L., Masuda, R., Murata, T., Ootani, W., Seino, T., Takatsu, T., Tsuji, N., Pöschl, R., Richard, F., Zerwas, D., Hummer, F., Simon, F., Boudry, V., Brient, J-C., Nanni, J., Videau, H., Buhmann, E., Garutti, E., Huck, S., Kasieczka, G., Martens, S., Rolph, J., Wellhausen, J., Bilki, B., Northacker, D., Onel, Y., Emberger, L., and Graf, C.
Subjects: Physics - Instrumentation and Detectors
Abstract: To achieve state-of-the-art jet energy resolution for Particle Flow, sophisticated energy clustering algorithms must be developed that can fully exploit available information to separate energy deposits from charged and neutral particles. Three published neural network-based shower separation models were applied to simulation and experimental data to measure the performance of the highly granular CALICE Analogue Hadronic Calorimeter (AHCAL) technological prototype in distinguishing the energy deposited by a single charged and single neutral hadron for Particle Flow. The performance of models trained using only standard spatial and energy and charged track position information from an event was compared to models trained using timing information available from AHCAL, which is expected to improve sensitivity to shower development and, therefore, aid in clustering. Both simulation and experimental data were used to train and test the models and their performances were compared. The best-performing neural network achieved significantly superior event reconstruction when timing information was utilised in training for the case where the charged hadron had more energy than the neutral one, motivating temporally sensitive calorimeters. All models under test were observed to tend to allocate energy deposited by the more energetic of the two showers to the less energetic one. Similar shower reconstruction performance was observed for a model trained on simulation and applied to data and a model trained and applied to data.
Published: 2024

17. Understanding the Impact of openPMD on BIT1, a Particle-in-Cell Monte Carlo Code, through Instrumentation, Monitoring, and In-Situ Analysis

Author: Williams, Jeremy J., Costea, Stefan, Malony, Allen D., Tskhakaya, David, Kos, Leon, Podolnik, Ales, Hromadka, Jakub, Huck, Kevin, Laure, Erwin, and Markidis, Stefano
Subjects: Physics - Computational Physics, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, Physics - Plasma Physics
Abstract: Particle-in-Cell Monte Carlo simulations on large-scale systems play a fundamental role in understanding the complexities of plasma dynamics in fusion devices. Efficient handling and analysis of vast datasets are essential for advancing these simulations. Previously, we addressed this challenge by integrating openPMD with BIT1, a Particle-in-Cell Monte Carlo code, streamlining data streaming and storage. This integration not only enhanced data management but also improved write throughput and storage efficiency. In this work, we delve deeper into the impact of BIT1 openPMD BP4 instrumentation, monitoring, and in-situ analysis. Utilizing cutting-edge profiling and monitoring tools such as gprof, CrayPat, Cray Apprentice2, IPM, and Darshan, we dissect BIT1's performance post-integration, shedding light on computation, communication, and I/O operations. Fine-grained instrumentation offers insights into BIT1's runtime behavior, while immediate monitoring aids in understanding system dynamics and resource utilization patterns, facilitating proactive performance optimization. Advanced visualization techniques further enrich our understanding, enabling the optimization of BIT1 simulation workflows aimed at controlling plasma-material interfaces with improved data analysis and visualization at every checkpoint without causing any interruption to the simulation.
Comment: Accepted by the Euro-Par 2024 workshops (PHYSHPC 2024), prepared in the standardized Springer LNCS format and consists of 12 pages, which includes the main text, references, and figures
Published: 2024

18. Real-time antiproton annihilation vertexing with sub-micron resolution

Author: Berghold, M., Orsucci, D., Guatieri, F., Alfaro, S., Auzins, M., Bergmann, B., Burian, P., Brusa, R. S., Camper, A., Caravita, R., Castelli, F., Cerchiari, G., Ciuryło, R., Chehaimi, A., Consolati, G., Doser, M., Eliaszuk, K., Ferguson, R., Germann, M., Giszczak, A., Glöggler, L. T., Graczykowski, Ł., Grosbart, M., Gusakova, N., Gustafsson, F., Haider, S., Huck, S., Hugenschmidt, C., Janik, M. A., Januszek, T., Kasprowicz, G., Kempny, K., Khatri, G., Kłosowski, Ł., Kornakov, G., Krumins, V., Lappo, L., Linek, A., Mariazzi, S., Moskal, P., Nowicka, D., Pandey, P., Pęcak, D., Penasa, L., Petracek, V., Piwiński, M., Pospisil, S., Povolo, L., Prelz, F., Rangwala, S. A., Rauschendorfer, T., Rawat, B. S., Rienäcker, B., Rodin, V., Røhne, O. M., Sandaker, H., Sharma, S., Smolyanskiy, P., Sowiński, T., Tefelski, D., Vafeiadis, T., Volponi, M., Welsch, C. P., Zawada, M., Zielinski, J., and Zurlo, N.
Subjects: Physics - Instrumentation and Detectors
Abstract: The primary goal of the AEgIS experiment is to precisely measure the free fall of antihydrogen within Earth's gravitational field. To this end, a cold ~50 K antihydrogen beam has to pass through two grids forming a moiré deflectometer before annihilating onto a position-sensitive detector, which shall determine the vertical position of the annihilation vertex relative to the grids with micrometric accuracy. Here we introduce a vertexing detector based on a modified mobile camera sensor and experimentally demonstrate that it can measure the position of antiproton annihilations with an accuracy of $0.62^{+0.40}_{-0.22}\,\mu\mathrm{m}$, which represents a 35-fold improvement over the previous state-of-the-art for real-time antiproton vertexing. Importantly, these antiproton detection methods are directly applicable to antihydrogen. Moreover, the sensitivity to light of the sensor enables the in-situ calibration of the moiré deflectometer, significantly reducing systematic errors. This sensor emerges as a breakthrough technology for achieving the AEgIS scientific goals and has been selected as the basis for the development of a large-area detector for conducting antihydrogen gravity measurements.
Comment: 21 pages, 4 figures, 2 tables
Published: 2024

19. From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment

Author: Hirota, Yusuke, Hachiuma, Ryo, Yang, Chao-Han Huck, and Nakashima, Yuta
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large language models (LLMs) have enhanced the capacity of vision-language models to caption visual text. This generative approach to image caption enrichment further makes textual captions more descriptive, improving alignment with the visual context. However, while many studies focus on benefits of generative caption enrichment (GCE), are there any negative side effects? We compare standard-format captions and recent GCE processes from the perspectives of "gender bias" and "hallucination", showing that enriched captions suffer from increased gender bias and hallucination. Furthermore, models trained on these enriched captions amplify gender bias by an average of 30.9% and increase hallucination by 59.5%. This study serves as a caution against the trend of making captions more descriptive.
Published: 2024

20. Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation

Author: Berger, Nathaniel, Riezler, Stefan, Exel, Miriam, and Huck, Matthias
Subjects: Computer Science - Computation and Language
Abstract: While large language models (LLMs) pre-trained on massive amounts of unpaired language data have reached the state-of-the-art in machine translation (MT) of general domain texts, post-editing (PE) is still required to correct errors and to enhance term translation quality in specialized domains. In this paper we present a pilot study of enhancing translation memories (TM) produced by PE (source segments, machine translations, and reference translations, henceforth called PE-TM) for the needs of correct and consistent term translation in technical domains. We investigate a light-weight two-step scenario where, at inference time, a human translator marks errors in the first translation step, and in a second step a few similar examples are extracted from the PE-TM to prompt an LLM. Our experiment shows that the additional effort of augmenting translations with human error markings guides the LLM to focus on a correction of the marked errors, yielding consistent improvements over automatic PE (APE) and MT from scratch., Comment: To appear at The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024)
Published: 2024

21. Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Author: Hu, Yuchen, Chen, Chen, Yang, Chao-Han Huck, Qin, Chengwei, Chen, Pin-Yu, Chng, Eng Siong, and Zhang, Chao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level quality of pseudo labels without ground truth, thereby guiding model updates for effective unsupervised adaptation. Experimental results show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains, and it sometimes even approaches the upper-bound performance of supervised adaptation. Surprisingly, we also observe that STAR protects the adapted model from the common catastrophic forgetting problem without recalling source-domain data. Furthermore, STAR exhibits high data efficiency, requiring less than one hour of unlabeled data, and generalizes seamlessly to alternative large speech models and speech translation tasks. We aim to open-source our code to the research community., Comment: 23 pages, Preprint
Published: 2024

22. An Investigation of Incorporating Mamba for Speech Enhancement

Author: Chao, Rong, Cheng, Wen-Huang, La Quatra, Moreno, Siniscalchi, Sabato Marco, Yang, Chao-Han Huck, Fu, Szu-Wei, and Tsao, Yu
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the properties of Mamba by integrating it as the core model in both basic and advanced SE systems, along with utilizing signal-level distances as well as metric-oriented loss functions. SEMamba demonstrates promising results and attains a PESQ score of 3.55 on the VoiceBank-DEMAND dataset. When combined with the perceptual contrast stretching technique, the proposed SEMamba yields a new state-of-the-art PESQ score of 3.69.
Published: 2024

23. Computing distances on Riemann surfaces

Author: Stepanyants, Huck, Beardon, Alan, Paton, Jeremy, and Krioukov, Dmitri
Subjects: Mathematics - Geometric Topology, Mathematics - Differential Geometry
Abstract: Riemann surfaces are among the simplest and most basic geometric objects. They appear as key players in many branches of physics, mathematics, and other sciences. Despite their widespread significance, how to compute distances between pairs of points on compact Riemann surfaces is surprisingly unknown, unless the surface is a sphere or a torus. This is because on higher-genus surfaces, the distance formula involves an infimum over infinitely many terms, so it cannot be evaluated in practice. Here we derive a computable distance formula for a broad class of Riemann surfaces. The formula reduces the infimum to a minimum over an explicit set consisting of finitely many terms. We also develop a distance computation algorithm, which cannot be expressed as a formula, but which is more computationally efficient on surfaces with high genuses. We illustrate both the formula and the algorithm in application to generalized Bolza surfaces, which are a particular class of highly symmetric compact Riemann surfaces of any genus greater than 1.
Published: 2024

24. Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

Author: Wang, Siyin, Yang, Chao-Han Huck, Wu, Ji, and Zhang, Chao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) based on a few examples presented in dialogue history without any model parameter update. Despite such convenience, the performance of ICL heavily depends on the quality of the in-context examples presented, which makes the in-context example selection approach a critical choice. This paper proposes a novel Bayesian in-Context example Selection method (ByCS) for ICL. Extending the inference probability conditioned on in-context examples based on Bayes' theorem, ByCS focuses on the inverse inference conditioned on test input. Following the assumption that accurate inverse inference probability (likelihood) will result in accurate inference probability (posterior), in-context examples are selected based on their inverse inference results. Diverse and extensive cross-tasking and cross-modality experiments are performed with speech, text, and image examples. Experimental results show the efficacy and robustness of our ByCS method on various models, tasks and modalities., Comment: 17 pages, 6 figures
Published: 2024

25. Software Compensation for Highly Granular Calorimeters using Machine Learning

Author: Lai, S., Utehs, J., Wilhahn, A., Bach, O., Brianne, E., Ebrahimi, A., Gadow, K., Göttlicher, P., Hartbrich, O., Heuchel, D., Irles, A., Krüger, K., Kvasnicka, J., Lu, S., Neubüser, C., Provenza, A., Reinecke, M., Sefkow, F., Schuwalow, S., De Silva, M., Sudo, Y., Tran, H. L., Buhmann, E., Garutti, E., Huck, S., Kasieczka, G., Martens, S., Rolph, J., Wellhausen, J., Blazey, G. C., Dyshkant, A., Francis, K., Zutshi, V., Bilki, B., Northacker, D., Onel, Y., Hummer, F., Simon, F., Kawagoe, K., Onoe, T., Suehara, T., Tsumura, S., Yoshioka, T., Fouz, M. C., Emberger, L., Graf, C., Wagner, M., Pöschl, R., Richard, F., Zerwas, D., Boudry, V., Brient, J-C., Nanni, J., Videau, H., Liu, L., Masuda, R., Murata, T., Ootani, W., Takatsu, T., Tsuji, N., Chadeeva, M., Danilov, M., Korpachev, S., and Rusinov, V.
Subjects: Physics - Instrumentation and Detectors
Abstract: A neural network for software compensation was developed for the highly granular CALICE Analogue Hadronic Calorimeter (AHCAL). The neural network uses spatial and temporal event information from the AHCAL and energy information, which is expected to improve sensitivity to shower development and the neutron fraction of the hadron shower. The neural network method produced a depth-dependent energy weighting and a time-dependent threshold for enhancing energy deposits consistent with the timescale of evaporation neutrons. Additionally, it was observed to learn an energy-weighting indicative of longitudinal leakage correction. In addition, the method produced a linear detector response and outperformed a published control method regarding resolution for every particle energy studied.
Published: 2024

26. Location-Dependent Phase Transformation Kinetics During Laser Wire Deposition Additive Manufacturing of Ti–6Al–4V

Author: Huck, Andrew, Verma, Amit K., O’Donnell, Katie, Smith, Lonnie, Karra, Venkata Satya Surya Amaranth, Guzel, Ali, Chen, Hangman, Pistorius, Petrus C., Webler, Bryan A., and Rollett, Anthony D.
Published: 2024

27. Engineering hyaluronic acid-based nanoassemblies for monoclonal antibody delivery – design, characterization, and biological insights

Author: López-Estévez, Ana M., Zhang, Y., Medel, María, Arriaga, Iker, Sanjurjo, Lucía, Huck-Iriart, Cristian, Abrescia, Nicola G. A., Vicent, María J., Ouyang, Defang, Torres, Dolores, and Alonso, María José
Published: 2024

28. GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Author: Hu, Yuchen, Chen, Chen, Yang, Chao-Han Huck, Li, Ruizhe, Zhang, Dong, Chen, Zhehuai, and Chng, Eng Siong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent advances in large language models (LLMs) have pushed forward the development of multilingual speech and machine translation through reduced representation errors and incorporated external knowledge. However, both translation tasks typically utilize beam search decoding and top-1 hypothesis selection for inference. These techniques struggle to fully exploit the rich information in the diverse N-best hypotheses, making them less optimal for translation tasks that require a single, high-quality output sequence. In this paper, we propose a new generative paradigm for translation tasks, namely "GenTranslate", which builds upon LLMs to generate better results from the diverse translation versions in the N-best list. Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result. Furthermore, to support LLM finetuning, we build and release a HypoTranslate dataset that contains over 592K hypotheses-translation pairs in 11 languages. Experiments on various speech and machine translation benchmarks (e.g., FLEURS, CoVoST-2, WMT) demonstrate that our GenTranslate significantly outperforms the state-of-the-art model., Comment: 18 pages, Accepted by ACL 2024. This work is open sourced at: https://github.com/YUCHEN005/GenTranslate
Published: 2024

29. It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Author: Chen, Chen, Li, Ruizhe, Hu, Yuchen, Siniscalchi, Sabato Marco, Chen, Pin-Yu, Chng, Ensiong, and Yang, Chao-Han Huck
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output. Specifically, an LLM is utilized to carry out a direct mapping from the N-best hypotheses list generated by an ASR system to the predicted output transcription. However, despite its effectiveness, GER introduces extra data uncertainty since the LLM is trained without taking into account acoustic information available in the speech signal. In this work, we aim to overcome such a limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF). UADF is a multimodal fusion approach implemented into an auto-regressive decoding process and works in two stages: (i) It first analyzes and calibrates the token-level LLM decision, and (ii) it then dynamically assimilates the information from the acoustic modality. Experimental evidence collected from various ASR tasks shows that UADF surpasses existing fusion mechanisms in several ways. It yields significant improvements in word error rate (WER) while mitigating data uncertainty issues in LLM and addressing the poor generalization that comes from relying on a single modality during fusion. We also demonstrate that UADF seamlessly adapts to audio-visual speech recognition., Comment: Accepted to ICLR 2024, 17 pages. This work will be open sourced under MIT license
Published: 2024

30. CIRCUS: an autonomous control system for antimatter, atomic and quantum physics experiments

Author: Volponi, Marco, Huck, Saiva, Caravita, Ruggero, Zielinski, Jakub, Kornakov, Georgy, Kasprowicz, Grzegorz, Nowicka, Dorota, Rauschendorfer, Tassilo, Rienäcker, Benjamin, Prelz, Francesco, Auzins, Marcis, Bergmann, Benedikt, Burian, Petr, Brusa, Roberto Sennen, Camper, Antoine, Castelli, Fabrizio, Ciuryło, Roman, Consolati, Giovanni, Doser, Michael, Glöggler, Lisa, Graczykowski, Łukasz, Grosbart, Malgorzata, Guatieri, Francesco, Gusakova, Nataly, Gustafsson, Fredrik, Haider, Stefan, Janik, Malgorzata, Khatri, Gunn, Kłosowski, Łukasz, Krumins, Valts, Lappo, Lidia, Linek, Adam, Malamant, Jan, Mariazzi, Sebastiano, Penasa, Luca, Petracek, Vojtech, Piwiński, Mariusz, Pospisil, Stanislav, Povolo, Luca, Rangwala, Sadiqali, Rawat, Bharat, Rodin, Volodymyr, Røhne, Ole, Sandaker, Heidi, Smolyanskiy, Petr, Sowiński, Tomasz, Tefelski, Dariusz, Vafeiadis, Theodoros, Welsch, Carsten, Wolz, Tim, Zawada, Michal, and Zurlo, Nicola
Subjects: Quantum Physics, General Relativity and Quantum Cosmology, Physics - Atomic Physics, Physics - Instrumentation and Detectors
Abstract: A powerful and robust control system is a crucial, often neglected, pillar of any modern, complex physics experiment that requires the management of a multitude of different devices and their precise time synchronisation. The AEgIS collaboration presents CIRCUS, a novel, autonomous control system optimised for time-critical experiments such as those at CERN's Antiproton Decelerator and, more broadly, in atomic and quantum physics research. Its setup is based on Sinara/ARTIQ and TALOS, integrating the ALPACA analysis pipeline, the last two developed entirely in AEgIS. It is suitable for strict synchronicity requirements and repeatable, automated operation of experiments, culminating in autonomous parameter optimisation via feedback from real-time data analysis. CIRCUS has been successfully deployed and tested in AEgIS; being experiment-agnostic and released open-source, other experiments can leverage its capabilities.
Published: 2024

31. The ab initio amorphous materials database: Empowering machine learning to decode diffusivity

Author: Zheng, Hui, Sivonxay, Eric, Gallant, Max, Luo, Ziyao, McDermott, Matthew, Huck, Patrick, and Persson, Kristin A.
Subjects: Condensed Matter - Materials Science
Abstract: Amorphous materials exhibit unique properties that make them suitable for various applications in science and technology, ranging from optical and electronic devices and solid-state batteries to protective coatings. However, data-driven exploration and design of amorphous materials is hampered by the absence of a comprehensive database covering a broad chemical space. In this work, we present the largest computed amorphous materials database to date, generated from systematic and accurate ab initio molecular dynamics (AIMD) calculations. We also show how the database can be used in simple machine-learning models to connect properties to composition and structure, here specifically targeting ionic conductivity. These models predict the Li-ion diffusivity with speed and accuracy, offering a cost-effective alternative to expensive density functional theory (DFT) calculations. Furthermore, the process of computationally quenching amorphous materials provides a unique sampling of out-of-equilibrium structures, energies, and force landscape, and we anticipate that the corresponding trajectories will inform future work in universal machine learning potentials, impacting design beyond that of non-crystalline materials., Comment: 28 pages, 7 figures
Published: 2024

32. Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations

Author: Boito, Francieli, Brandt, Jim, Cardellini, Valeria, Carns, Philip, Ciorba, Florina M., Egan, Hilary, Eleliemy, Ahmed, Gentile, Ann, Gruber, Thomas, Hanson, Jeff, Haus, Utz-Uwe, Huck, Kevin, Ilsche, Thomas, Jakobsche, Thomas, Jones, Terry, Karlsson, Sven, Mueen, Abdullah, Ott, Michael, Patki, Tapasya, Peng, Ivy, Raghavan, Krishnan, Simms, Stephen, Shoga, Kathleen, Showerman, Michael, Tiwari, Devesh, Wilde, Torsten, and Yamamoto, Keiji
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Many High Performance Computing (HPC) facilities have developed and deployed frameworks in support of continuous monitoring and operational data analytics (MODA) to help improve efficiency and throughput. Because of the complexity and scale of systems and workflows and the need for low-latency response to address dynamic circumstances, automated feedback and response have the potential to be more effective than current human-in-the-loop approaches which are laborious and error prone. Progress has been limited, however, by factors such as the lack of infrastructure and feedback hooks, and successful deployment is often site- and case-specific. In this position paper we report on the outcomes and plans from a recent Dagstuhl Seminar, seeking to carve a path for community progress in the development of autonomous feedback loops for MODA, based on the established formalism of similar (MAPE-K) loops in autonomous computing and self-adaptive systems. By defining and developing such loops for significant cases experienced across HPC sites, we seek to extract commonalities and develop conventions that will facilitate interoperability and interchangeability with system hardware, software, and applications across different sites, and will motivate vendors and others to provide telemetry interfaces and feedback hooks to enable community development and pervasive deployment of MODA autonomy loops.
Published: 2024

33. Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

Author: Hu, Yuchen, Chen, Chen, Yang, Chao-Han Huck, Li, Ruizhe, Zhang, Chao, Chen, Pin-Yu, and Chng, EnSiong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark with the HyPoradise dataset to learn the mapping from ASR N-best hypotheses to ground-truth transcription by efficient LLM finetuning, which shows great effectiveness but lacks specificity on noise-robust ASR. In this work, we extend the benchmark to noisy conditions and investigate if we can teach LLMs to perform denoising for GER just as robust ASR systems do, where one solution is introducing noise information as a conditioner into the LLM. However, directly incorporating noise embeddings from the audio encoder could harm the LLM tuning due to the cross-modality gap. To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER. Furthermore, in order to enhance its representation ability of audio noise, we design a knowledge distillation (KD) approach via mutual information estimation to distill the real noise information in audio embeddings to our language embedding. Experiments on various latest LLMs demonstrate our approach achieves a new breakthrough with up to 53.9% correction improvement in terms of word error rate, even with limited training data. Analysis shows that our language-space noise embedding can well represent the noise conditions of source speech, under which off-the-shelf LLMs show strong ability of language-space denoising., Comment: Accepted to ICLR 2024, Spotlight top 5%, 24 pages. This work will be open sourced at: https://github.com/YUCHEN005/RobustGER under MIT license
Published: 2024

34. Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

Author: Yu, Yu, Yang, Chao-Han Huck, Dinh, Tuan, Ryu, Sungho, Kolehmainen, Jari, Ren, Roger, Filimonov, Denis, Shivakumar, Prashanth G., Gandhe, Ankur, Rastrow, Ariya, Xu, Jia, Bulyko, Ivan, and Stolcke, Andreas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50% on the public Librispeech dataset and of 3.67% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based second-pass speech recognition models, we examine robustness against input perturbations. These perturbations are rooted in homophone replacements and a novel metric called N-best Perturbation-based Rescoring Robustness (NPRR), both designed to measure the relative degradation in the performance of rescoring models. Our experimental results indicate that while advanced variants of LoRA, such as dynamic rank-allocated LoRA, lead to performance degradation in 1-best perturbation, they alleviate the degradation in N-best perturbation. This finding is in comparison to fully-tuned models and vanilla LoRA tuning baselines, suggesting that a comprehensive selection is needed when using LoRA-based adaptation for compute-cost savings and robust language modeling.
Published: 2024

35. 2024 Roadmap on Magnetic Microscopy Techniques and Their Applications in Materials Science

Author: Christensen, D. V., Staub, U., Devidas, T. R., Kalisky, B., Nowack, K. C., Webb, J. L., Andersen, U. L., Huck, A., Broadway, D. A., Wagner, K., Maletinsky, P., van der Sar, T., Du, C. R., Yacoby, A., Collomb, D., Bending, S., Oral, A., Hug, H. J., Mandru, A. -O., Neu, V., Schumacher, H. W., Sievers, S., Saito, H., Khajetoorians, A. A., Hauptmann, N., Baumann, S., Eichler, A., Degen, C. L., McCord, J., Vogel, M., Fiebig, M., Fischer, P., Hierro-Rodriguez, A., Finizio, S., Dhesi, S. S., Donnelly, C., Büttner, Felix, Kfir, O., Hu, W., Zayko, S., Eisebitt, S., Pfau, B., Frömter, R., Kläui, M., Yasin, F. S., McMorran, B. J., Seki, S., Yu, X., Lubk, A., Wolf, D., Pryds, N., Makarov, D., and Poggio, M.
Subjects: Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Superconductivity, Quantum Physics
Abstract: Considering the growing interest in magnetic materials for unconventional computing, data storage, and sensor applications, there is active research not only on material synthesis but also characterisation of their properties. In addition to structural and integral magnetic characterisations, imaging of magnetization patterns, current distributions and magnetic fields at nano- and microscale is of major importance to understand the material responses and qualify them for specific applications. In this roadmap, we aim to cover a broad portfolio of techniques to perform nano- and microscale magnetic imaging using SQUIDs, spin center and Hall effect magnetometries, scanning probe microscopies, x-ray- and electron-based methods as well as magnetooptics and nanoMRI. The roadmap is intended as a single access point of information for experts in the field as well as the young generation of students, outlining prospects of the development of magnetic imaging technologies for the upcoming decade with a focus on physics, materials science, and chemistry of planar, 3D and geometrically curved objects of different material classes including 2D materials, complex oxides, semi-metals, multiferroics, skyrmions, antiferromagnets, frustrated magnets, magnetic molecules/nanoparticles, ionic conductors, superconductors, spintronic and spinorbitronic materials.
Published: 2024

36. Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

Author: Everson, Kevin, Gu, Yile, Yang, Huck, Shivakumar, Prashanth Gurunath, Lin, Guan-Ting, Kolehmainen, Jari, Bulyko, Ivan, Gandhe, Ankur, Ghosh, Shalini, Hamza, Wael, Lee, Hung-yi, Rastrow, Ariya, and Stolcke, Andreas
Subjects: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world scenarios, prior to input into an LLM, an automated speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors can degrade subsequent SLU tasks. Here we introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities and enhance SLU outcomes. Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts with the help of word confusion networks from lattices, bridging the SLU performance gap between using the top ASR hypothesis and an oracle upper bound. Additionally, we delve into the LLM's robustness to varying ASR performance conditions and scrutinize the aspects of in-context learning which prove the most influential., Comment: Accepted to ICASSP 2024
Published: 2024

37. Electrochemical Removal of HF from Carbonate-based LiPF₆-containing Li-ion Battery Electrolytes

Author: Ge, Xiaokun, Huck, Marten, Kuhlmann, Andreas, Tiemann, Michael, Weinberger, Christian, Xu, Xiaodan, Zhao, Zhenyu, and Steinrück, Hans-Georg
Subjects: Condensed Matter - Materials Science
Abstract: Due to the hydrolytic instability of LiPF₆ in carbonate-based solvents, HF is a typical impurity in Li-ion battery electrolytes. HF significantly influences the performance of Li-ion batteries, for example by impacting the formation of the solid electrolyte interphase at the anode and by affecting transition metal dissolution at the cathode. Additionally, HF complicates studying fundamental interfacial electrochemistry of Li-ion battery electrolytes, such as direct anion reduction, because it is electrocatalytically relatively unstable, resulting in LiF passivation layers. Methods to selectively remove ppm levels of HF from LiPF₆-containing carbonate-based electrolytes are limited. We introduce and benchmark a simple yet efficient electrochemical in situ method to selectively remove ppm amounts of HF from LiPF₆-containing carbonate-based electrolytes. The basic idea is the application of a suitable potential to a high surface-area metallic electrode upon which only HF reacts (electrocatalytically) while all other electrolyte components are unaffected under the respective conditions., Comment: 31 pages, 19 figures
Published: 2024

38. Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

Author: Lin, Guan-Ting, Shivakumar, Prashanth Gurunath, Gandhe, Ankur, Yang, Chao-Han Huck, Gu, Yile, Ghosh, Shalini, Stolcke, Andreas, Lee, Hung-yi, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), an LLM that utilizes text and speech modalities to better model the linguistic content and paralinguistic attributes of spoken dialogue. The model takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking multimodal framework. Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning. We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset. Experimental results indicate the proposed serialized multitasking method outperforms typical sequence classification techniques on current and response sentiment classification. Furthermore, leveraging conversational context and speech embeddings significantly improves both response text generation and sentiment prediction. Our proposed framework achieves relative improvements of 6.7%, 12.0%, and 3.5% in current sentiment accuracy, response sentiment accuracy, and response text BLEU score, respectively., Comment: Accepted by ICASSP 2024. Camera-ready version
Published: 2023

39. Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Author: Sundar, Anirudh S., Yang, Chao-Han Huck, Chan, David M., Ghosh, Shalini, Ravichandran, Venkatesh, and Nidadavolu, Phani Sankar
Subjects: Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Training large foundation models using self-supervised objectives on unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a standard procedure. Unfortunately, the efficacy of this approach is often constrained by both limited fine-tuning compute and scarcity in labeled downstream data. We introduce Multimodal Attention Merging (MAM), an attempt that facilitates direct knowledge transfer from attention matrices of models rooted in high resource modalities, text and images, to those in resource-constrained domains, speech and audio, employing a zero-shot paradigm. MAM reduces the relative Word Error Rate (WER) of an Automatic Speech Recognition (ASR) model by up to 6.70%, and relative classification error of an Audio Event Classification (AEC) model by 10.63%. In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning., Comment: 5 pages, 1 figure, ICASSP 2024 Workshop on Self-supervision in Audio, Speech and Beyond
Published: 2023

40. A Call for Change: Disrupting White Supremacy Culture in Dispositional Expectations of Teacher Candidates

Author: Stevens, Elizabeth Y., Driskill, Kristen M., Huck, Adam, Abbott, Diana, Robinson, Emily E., Barrett, Maryanne, Johnson, Denise, Polisseni, Amy, and Rushforth, Holley
Abstract: Today, in the context of the Black Lives Matter movement and an increased focus on antiracism, P-12 and higher education institutions are engaged in studying practices and resources from an (in)equity lens. This study explores disposition expectations for teacher candidates noted in the form of a rubric drawing on Critical Race Theory (Ladson-Billings & Tate, 1995). Characteristics of White Supremacy Culture (Okun, 2021) also grounded the study and were used as themes determined a priori. Researchers engaged in document analysis to analyze and code the rubric (Bowen, 2009; Corbin & Strauss, 2007). Findings show evidence of white supremacy culture in dispositional expectations. These findings reveal the need to challenge current expectations for teacher candidates to disrupt the white supremacy culture that permeates teacher education. Implications provide ideas for future research and practices that are flexible, collaborative, and critical.
Published: 2023

41. Development of an Instrument to Assess Teacher Perceptions of Social Emotional Learning (SEL) in PK-12 Schools

Author: Huck, Carla, Zhang, Jingshun, Garby, Lisa, and Li, Xiangming
Abstract: Research from numerous studies worldwide consistently shows that integrating social emotional learning (SEL) development into the structures and practices of schools is a path to creating safe, supportive, and inclusive environments. Researchers developed and validated an instrument to examine teachers' perceptions of SEL needs in their schools; their knowledge, skills, training, and experiences with SEL in their classrooms; and barriers to implementing practices or receiving professional development. A pilot study was conducted to assess the feasibility of the survey questionnaire, participant recruitment, and data collection and analysis processes. This paper describes the pilot testing process to ensure methodological rigor and content and face validity of the instrument before commencing the main research project surveying PK-12 teachers in Florida. This tool can be used in multiple sites and contexts to assess readiness and barriers to SEL program implementation, providing formative feedback for school leaders, curriculum developers, and teacher educators.
Published: 2023

42. Atomistic investigation of interface-dominated deformation mechanisms in nanolayered Cu–Ag eutectic alloy

Author: Xie, Hongtao, Zhou, Haofei, Chew, Huck Beng, and Li, Ruizhi
Published: 2024
Full Text: View/download PDF

43. Accelerating breast MRI acquisition with generative AI models

Author: Okolie, Augustine, Dirrichs, Timm, Huck, Luisa Charlotte, Nebelung, Sven, Arasteh, Soroosh Tayebi, Nolte, Teresa, Han, Tianyu, Kuhl, Christiane Katharina, and Truhn, Daniel
Published: 2024
Full Text: View/download PDF

44. Simulating stellar merger using HPX/Kokkos on A64FX on Supercomputer Fugaku

Author: Diehl, Patrick, Daiß, Gregor, Huck, Kevin, Marcello, Dominic, Shiber, Sagiv, Kaiser, Hartmut, and Pflüger, Dirk
Published: 2024
Full Text: View/download PDF

45. Chemical reservoir computation in a self-organizing reaction network

Author: Baltussen, Mathieu G., de Jong, Thijs J., Duez, Quentin, Robinson, William E., and Huck, Wilhelm T. S.
Published: 2024
Full Text: View/download PDF

46. Interfacial energy-mediated bulk transport across artificial cell membranes

Author: Tian, Jia-Qi, Chang, Mu-Yueh, Chen, Chen, Luo, Zhen-Hong, Huck, Wilhelm T. S., and Deng, Nan-Nan
Published: 2024
Full Text: View/download PDF

47. Diameter of Compact Riemann Surfaces

Author: Stepanyants, Huck, Beardon, Alan, Paton, Jeremy, and Krioukov, Dmitri
Published: 2024
Full Text: View/download PDF

48. Conditional Modeling Based Automatic Video Summarization

Author: Huang, Jia-Hong, Yang, Chao-Han Huck, Chen, Pin-Yu, Chen, Min-Hung, and Worring, Marcel
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story. Video summarization methods mainly rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video. There are other non-visual factors, such as interestingness, representativeness, and storyline consistency that should also be considered for generating high-quality video summaries. Current methods do not adequately take into account these non-visual factors, resulting in suboptimal performance. In this work, a new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries. The method utilizes a conditional modeling perspective and introduces multiple meaningful random variables and joint distributions to characterize the key components of video summarization. Helper distributions are employed to improve the training of the model. A conditional attention module is designed to mitigate potential performance degradation in the presence of multi-modal input. The proposed video summarization method incorporates the above innovative design choices that aim to narrow the gap between human-generated and machine-generated video summaries. Extensive experiments show that the proposed approach outperforms existing methods and achieves state-of-the-art performance on commonly used video summarization datasets. Comment: This work has been submitted to the IEEE for possible publication. arXiv admin note: substantial text overlap with arXiv:2305.00455
Published: 2023

49. Reinforcement Learning for Safety Testing: Lessons from A Mobile Robot Case Study

Author: Huck, Tom P., Kaiser, Martin, Cronrath, Constantin, Lennartson, Bengt, Kröger, Torsten, and Asfour, Tamim
Subjects: Computer Science - Robotics
Abstract: Safety-critical robot systems need thorough testing to expose design flaws and software bugs which could endanger humans. Testing in simulation is becoming increasingly popular, as it can be applied early in the development process and does not endanger any real-world operators. However, not all safety-critical flaws become immediately observable in simulation. Some may only become observable under certain critical conditions. If these conditions are not covered, safety flaws may remain undetected. Creating critical tests is therefore crucial. In recent years, there has been a trend towards using Reinforcement Learning (RL) for this purpose. Guided by domain-specific reward functions, RL algorithms are used to learn critical test strategies. This paper presents a case study in which the collision avoidance behavior of a mobile robot is subjected to RL-based testing. The study confirms prior research which shows that RL can be an effective testing tool. However, the study also highlights certain challenges associated with RL-based testing, namely (i) a possible lack of diversity in test conditions and (ii) the phenomenon of reward hacking where the RL agent behaves in undesired ways due to a misalignment of reward and test specification. The challenges are illustrated with data and examples from the experiments, and possible mitigation strategies are discussed.
Published: 2023

50. Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing

Author: Koneru, Sai, Exel, Miriam, Huck, Matthias, and Niehues, Jan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their significant performance in tasks demanding a broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigate using LLMs for MT and explore recent parameter-efficient fine-tuning techniques. Surprisingly, our initial experiments find that fine-tuning for translation purposes even led to performance degradation. To overcome this, we propose an alternative approach: adapting LLMs as Automatic Post-Editors (APE) rather than direct translators. Building on the LLM's exceptional ability to process and generate lengthy sequences, we also propose extending our approach to document-level translation. We show that leveraging Low-Rank-Adapter fine-tuning for APE can yield significant improvements across both sentence and document-level metrics while generalizing to out-of-domain data. Most notably, we achieve a state-of-the-art accuracy rate of 89% on the ContraPro test set, which specifically assesses the model's ability to resolve pronoun ambiguities when translating from English to German. Lastly, we investigate a practical scenario involving manual post-editing for document-level translation, where reference context is made available. Here, we demonstrate that leveraging human corrections can significantly reduce the number of edits required for subsequent translations (Interactive Demo for integrating manual feedback can be found here: https://huggingface.co/spaces/skoneru/contextual_refinement_ende). Comment: NAACL 2024
Published: 2023
