Author: "Cucu, Horia" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Cucu, Horia"' showing total 147 results

1. Easy, Interpretable, Effective: openSMILE for voice deepfake detection

Author: Pascu, Octavian, Oneata, Dan, Cucu, Horia, and Müller, Nicolas M.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Sound
Abstract: In this paper, we demonstrate that attacks in the latest ASVspoof5 dataset -- a de facto standard in the field of voice authenticity and deepfake detection -- can be identified with surprising accuracy using a small subset of very simplistic features. These are derived from the openSMILE library, and are scalar-valued, easy to compute, and human interpretable. For example, attack A10's unvoiced segments have a mean length of 0.09 ± 0.02, while bona fide instances have a mean length of 0.18 ± 0.07. Using this feature alone, a threshold classifier achieves an Equal Error Rate (EER) of 10.3% for attack A10. Similarly, across all attacks, we achieve up to 0.8% EER, with an overall EER of 15.7 ± 6.0%. We explore the generalization capabilities of these features and find that some of them transfer effectively between attacks, primarily when the attacks originate from similar Text-to-Speech (TTS) architectures. This finding may indicate that voice anti-spoofing is, in part, a problem of identifying and remembering signatures or fingerprints of individual TTS systems. This allows us to better understand anti-spoofing models and their challenges in real-world applications.
Published: 2024
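
Note: the single-feature threshold classifier and the Equal Error Rate named in the abstract of record 1 can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `equal_error_rate`, the rule "flag as spoof when the feature is below the threshold", and the sample values are illustrative assumptions, and the EER is approximated at the candidate threshold where the two error rates are closest.

```python
def equal_error_rate(bona_fide, spoof):
    """Approximate EER for the rule: flag a sample as spoof if feature < threshold.

    bona_fide, spoof: non-empty lists of scalar feature values (hypothetical data).
    """
    candidates = sorted(set(bona_fide) | set(spoof))
    best_gap, best_eer = float("inf"), None
    for t in candidates:
        # False rejection rate: bona fide samples wrongly flagged as spoof.
        frr = sum(x < t for x in bona_fide) / len(bona_fide)
        # False acceptance rate: spoofed samples that pass as bona fide.
        far = sum(x >= t for x in spoof) / len(spoof)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Made-up feature values (e.g. mean unvoiced segment length), for illustration only.
bona = [0.18, 0.25, 0.11, 0.20]
fake = [0.09, 0.07, 0.12, 0.10]
print(equal_error_rate(bona, fake))
```

Sweeping only the observed feature values keeps the sketch short; a finer estimate would interpolate between neighbouring thresholds.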

2. WavLM model ensemble for audio deepfake detection

Author: Combei, David, Stan, Adriana, Oneata, Dan, and Cucu, Horia
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Audio deepfake detection has become a pivotal task over the last couple of years, as many recent speech synthesis and voice cloning systems generate highly realistic speech samples, thus enabling their use in malicious activities. In this paper we address the issue of audio deepfake detection as it was set in the ASVspoof5 challenge. First, we benchmark ten types of pretrained representations and show that the self-supervised representations stemming from the wav2vec2 and wavLM families perform best. Of the two, wavLM is better when restricting the pretraining data to LibriSpeech, as required by the challenge rules. To further improve performance, we finetune the wavLM model for the deepfake detection task. We extend the ASVspoof5 dataset with samples from other deepfake detection datasets and apply data augmentation. Our final challenge submission consists of a late fusion combination of four models and achieves an equal error rate of 6.56% and 17.08% on the two evaluation sets.
Comment: Accepted at ASVspoof Workshop 2024
Published: 2024

3. Towards generalisable and calibrated synthetic speech detection with self-supervised representations

Author: Pascu, Octavian, Stan, Adriana, Oneata, Dan, Oneata, Elisabeta, and Cucu, Horia
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deepfake detectors. However, recent studies have shown that the current audio deepfake models fall short of this desideratum. In this work we investigate the potential of pretrained self-supervised representations in building general and calibrated audio deepfake detection models. We show that large frozen representations coupled with a simple logistic regression classifier are extremely effective in achieving strong generalisation capabilities: compared to the RawNet2 model, this approach reduces the equal error rate from 30.9% to 8.8% on a benchmark of eight deepfake datasets, while learning less than 2k parameters. Moreover, the proposed method produces considerably more reliable predictions compared to previous approaches, making it more suitable for realistic use.
Comment: Accepted at Interspeech 2024
Published: 2023

4. Adaptation of Whisper models to child speech recognition

Author: Jain, Rishabh, Barcovschi, Andrei, Yiwere, Mariam, Corcoran, Peter, and Cucu, Horia
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence
Abstract: Automatic Speech Recognition (ASR) systems often struggle with transcribing child speech due to the lack of large child speech datasets required to accurately train child-friendly ASR models. However, there are huge amounts of annotated adult speech datasets which were used to create multilingual ASR models, such as Whisper. Our work aims to explore whether such models can be adapted to child speech to improve ASR for children. In addition, we compare Whisper child-adaptations with finetuned self-supervised models, such as wav2vec2. We demonstrate that finetuning Whisper on child speech yields significant improvements in ASR performance on child speech, compared to non-finetuned Whisper models. Additionally, utilizing self-supervised wav2vec2 models that have been finetuned on child speech outperforms Whisper finetuning.
Comment: Accepted in Interspeech 2023
Published: 2023

5. Adaptive Planning Search Algorithm for Analog Circuit Verification

Author: Manolache, Cristian, Andronache, Cristina, Caranica, Alexandru, Cucu, Horia, Buzo, Andi, Diaconu, Cristian, and Pelz, Georg
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Integrated circuit verification has gathered considerable interest in recent times. Since these circuits keep growing in complexity year by year, pre-Silicon (pre-SI) verification becomes ever more important, in order to ensure proper functionality. Thus, in order to reduce the time needed for manually verifying ICs, we propose a machine learning (ML) approach, which uses fewer simulations. This method relies on an initial evaluation set of operating condition configurations (OCCs), in order to train Gaussian process (GP) surrogate models. By using surrogate models, we can propose further, more difficult OCCs. Repeating this procedure for several iterations has shown better GP estimation of the circuit's responses, on both synthetic and real circuits, resulting in a better chance of finding the worst case, or even failures, for certain circuit responses. Thus, we show that the proposed approach is able to provide OCCs closer to the specifications for all circuits and identify a failure (specification violation) for one of the responses of a real circuit.
Published: 2023

6. FlexLip: A Controllable Text-to-Lip System

Author: Oneata, Dan, Lorincz, Beata, Stan, Adriana, and Cucu, Horia
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: The task of converting text input into video content is becoming an important topic for synthetic media generation. Several methods have been proposed with some of them reaching close-to-natural performances in constrained tasks. In this paper, we tackle a subissue of the text-to-video generation problem, by converting the text into lip landmarks. However, we do this using a modular, controllable system architecture and evaluate each of its individual components. Our system, entitled FlexLip, is split into two separate modules: text-to-speech and speech-to-lip, both having underlying controllable deep neural network architectures. This modularity enables the easy replacement of each of its components, while also ensuring the fast adaptation to new speaker identities by disentangling or projecting the input features. We show that by using as little as 20 min of data for the audio generation component, and as little as 5 min for the speech-to-lip component, the objective measures of the generated lip landmarks are comparable with those obtained when using a larger set of training samples. We also introduce a series of objective evaluation measures over the complete flow of our system by taking into consideration several aspects of the data and system configuration. These aspects pertain to the quality and amount of training data, the use of pretrained models, and the data contained therein, as well as the identity of the target speaker; with regard to the latter, we show that we can perform zero-shot lip adaptation to an unseen identity by simply updating the shape of the lips in our model.
Comment: 16 pages, 4 tables, 4 figures
Published: 2022

7. Automated Circuit Sizing with Multi-objective Optimization based on Differential Evolution and Bayesian Inference

Author: Visan, Catalin, Pascu, Octavian, Stanescu, Marius, Sandru, Elena-Diana, Diaconu, Cristian, Buzo, Andi, Pelz, Georg, and Cucu, Horia
Subjects: Computer Science - Machine Learning
Abstract: With the ever increasing complexity of specifications, manual sizing for analog circuits recently became very challenging. Especially for innovative, large-scale circuit designs, with tens of design variables, operating conditions and conflicting objectives to be optimized, design engineers spend many weeks, running time-consuming simulations, in their attempt at finding the right configuration. Recent years brought machine learning and optimization techniques to the field of analog circuit design, with evolutionary algorithms and Bayesian models showing good results for circuit sizing. In this context, we introduce a design optimization method based on Generalized Differential Evolution 3 (GDE3) and Gaussian Processes (GPs). The proposed method is able to perform sizing for complex circuits with a large number of design variables and many conflicting objectives to be optimized. While state-of-the-art methods reduce multi-objective problems to single-objective optimization and potentially induce a prior bias, we search directly over the multi-objective space using Pareto dominance and ensure that diverse solutions are provided to the designers to choose from. To the best of our knowledge, the proposed method is the first to specifically address the diversity of the solutions, while also focusing on minimizing the number of simulations required to reach feasible configurations. We evaluate the introduced method on two voltage regulators showing different levels of complexity and we highlight that the proposed innovative candidate selection method and survival policy lead to obtaining feasible solutions, with a high degree of diversity, much faster than with GDE3 or Bayesian Optimization-based algorithms.
Comment: 48 pages, 13 figures, submitted to Knowledge Based Systems
Published: 2022
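
Note: the Pareto dominance relation that the abstract of record 7 relies on can be sketched in a few lines of Python. This shows only the dominance test and a brute-force non-dominated filter, assuming every objective is minimized; it is not GDE3 and not the authors' candidate selection method. The names `dominates` and `pareto_front` and the sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Assumes all objectives are to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates (brute force, O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) pairs for four candidate configurations.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(candidates))
```

Keeping the whole non-dominated set, rather than collapsing objectives into one weighted score, is what lets a designer choose among diverse trade-offs.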

8. Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

Author: Oneata, Dan and Cucu, Horia
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Multimodal speech recognition aims to improve the performance of automatic speech recognition (ASR) systems by leveraging additional visual information that is usually associated with the audio input. While previous approaches make crucial use of strong visual representations, e.g. by finetuning pretrained image recognition networks, significantly less attention has been paid to its counterpart: the speech component. In this work, we investigate ways of improving the base speech recognition system by following similar techniques to the ones used for the visual encoder, namely, transferring representations and data augmentation. First, we show that starting from a pretrained ASR significantly improves the state-of-the-art performance; remarkably, even when building upon a strong unimodal system, we still find gains by including the visual modality. Second, we employ speech data augmentation techniques to encourage the multimodal system to attend to the visual stimuli. This technique replaces previously used word masking and comes with the benefits of being conceptually simpler and yielding consistent improvements in the multimodal setting. We provide empirical results on three multimodal datasets, including the newly introduced Localized Narratives.
Comment: Accepted at the Multimodal Learning and Applications Workshop (MULA) from CVPR 2022
Published: 2022

9. A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition

Author: Jain, Rishabh, Barcovschi, Andrei, Yiwere, Mariam, Bigioi, Dan, Corcoran, Peter, and Cucu, Horia
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Despite recent advancements in deep learning technologies, Child Speech Recognition remains a challenging task. Current Automatic Speech Recognition (ASR) models require substantial amounts of annotated data for training, which is scarce. In this work, we explore using the ASR model, wav2vec2, with different pretraining and finetuning configurations for self-supervised learning (SSL) toward improving automatic child speech recognition. The pretrained wav2vec2 models were finetuned using different amounts of child speech training data, adult speech data, and a combination of both, to discover the optimum amount of data required to finetune the model for the task of child ASR. Our trained model achieves the best Word Error Rate (WER) of 7.42 on the MyST child speech dataset, 2.99 on the PFSTAR dataset and 12.47 on the CMU KIDS dataset as compared to any other previous methods. Our models outperformed the wav2vec2 BASE 960 on child speech which is considered a state-of-the-art ASR model on adult speech by just using 10 hours of child speech data in finetuning. The analysis of different types of training data and their effect on inference is also provided by using a combination of datasets in pretraining, finetuning and inference.
Comment: Preprint, Submitted to IEEE Access
Published: 2022
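
Note: the Word Error Rate (WER) reported in the abstract of record 9 is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The Python sketch below shows that standard definition only; the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions, and the paper's own scoring setup (text normalization, tooling) is not described in this record.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes a non-empty reference and simple whitespace tokenization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences, not taken from any of the datasets named above.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Multiplying the result by 100 gives the percentage form in which WER figures are usually quoted.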

10. A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis

Author: Jain, Rishabh, Yiwere, Mariam, Bigioi, Dan, Corcoran, Peter, and Cucu, Horia
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech synthesis has come a long way as current text-to-speech (TTS) models can now generate natural human-sounding speech. However, most of the TTS research focuses on using adult speech data and there has been very limited work done on child speech synthesis. This study developed and validated a training pipeline for fine-tuning state-of-the-art (SOTA) neural TTS models using child speech datasets. This approach adopts a multi-speaker TTS retuning workflow to provide a transfer-learning pipeline. A publicly available child speech dataset was cleaned to provide a smaller subset of approximately 19 hours, which formed the basis of our fine-tuning experiments. Both subjective and objective evaluations were performed using a pretrained MOSNet for objective evaluation and a novel subjective framework for mean opinion score (MOS) evaluations. Subjective evaluations achieved the MOS of 3.95 for speech intelligibility, 3.89 for voice naturalness, and 3.96 for voice consistency. Objective evaluation using a pretrained MOSNet showed a strong correlation between real and synthetic child voices. Speaker similarity was also verified by calculating the cosine similarity between the embeddings of utterances. An automatic speech recognition (ASR) model is also used to provide a word error rate (WER) comparison between the real and synthetic child voices. The final trained TTS model was able to synthesize child-like speech from reference audio samples as short as 5 seconds.
Comment: Submitted to IEEE ACCESS
Published: 2022

11. Speaker disentanglement in video-to-speech conversion

Author: Oneata, Dan, Stan, Adriana, and Cucu, Horia
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: The task of video-to-speech aims to translate silent video of lip movement to its corresponding audio signal. Previous approaches to this task are generally limited to the case of a single speaker, but a method that accounts for multiple speakers is desirable as it allows one to i) leverage datasets with multiple speakers or few samples per speaker; and ii) control speaker identity at inference time. In this paper, we introduce a new video-to-speech architecture and explore ways of extending it to the multi-speaker scenario: we augment the network with an additional speaker-related input, through which we feed either a discrete identity or a speaker embedding. Interestingly, we observe that the visual encoder of the network is capable of learning the speaker identity from the lip region of the face alone. To better disentangle the two inputs -- linguistic content and speaker identity -- we add adversarial losses that dispel the identity from the video embeddings. To the best of our knowledge, the proposed method is the first to provide important functionalities such as i) control of the target voice and ii) speech synthesis for unseen identities over the state-of-the-art, while still maintaining the intelligibility of the spoken output.
Comment: To appear in Proc of EUSIPCO 2021
Published: 2021

12. An evaluation of word-level confidence estimation for end-to-end automatic speech recognition

Author: Oneata, Dan, Caranica, Alexandru, Stan, Adriana, and Cucu, Horia
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Quantifying the confidence (or conversely the uncertainty) of a prediction is a highly desirable trait of an automatic system, as it improves the robustness and usefulness in downstream tasks. In this paper we investigate confidence estimation for end-to-end automatic speech recognition (ASR). Previous work has addressed confidence measures for lattice-based ASR, while current machine learning research mostly focuses on confidence measures for unstructured deep learning. However, as the ASR systems are increasingly being built upon deep end-to-end methods, there is little work that tries to develop confidence measures in this context. We fill this gap by providing an extensive benchmark of popular confidence methods on four well-known speech datasets. There are two challenges we overcome in adapting existing methods: working on structured data (sequences) and obtaining confidences at a coarser level than the predictions (words instead of tokens). Our results suggest that a strong baseline can be obtained by scaling the logits by a learnt temperature, followed by estimating the confidence as the negative entropy of the predictive distribution and, finally, sum pooling to aggregate at word level.
Comment: Accepted at SLT 2021
Published: 2021
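
Note: the baseline summarized at the end of the abstract of record 12 (temperature-scaled logits, negative entropy as confidence, sum pooling to word level) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the temperature is passed in as a fixed number rather than learnt, and the function names, the `word_ids` token-to-word mapping, and the toy logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (a fixed value here)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def negative_entropy(probs):
    """Sum of p * log(p): 0 for a one-hot distribution, lower when more uncertain."""
    return sum(p * math.log(p) for p in probs if p > 0)


def word_confidences(token_logits, word_ids, temperature=1.0):
    """Sum-pool per-token negative entropies into one score per word.

    token_logits: one list of vocabulary logits per predicted token.
    word_ids: index of the word each token belongs to (assumed mapping).
    """
    scores = {}
    for logits, word in zip(token_logits, word_ids):
        conf = negative_entropy(softmax(logits, temperature))
        scores[word] = scores.get(word, 0.0) + conf
    return [scores[word] for word in sorted(scores)]


# Toy example: three tokens over a two-symbol vocabulary, forming two words.
logits = [[0.0, 0.0], [10.0, -10.0], [0.0, 0.0]]
print(word_confidences(logits, word_ids=[0, 0, 1], temperature=1.0))
```

With sum pooling, each extra token can only lower a word's score, so words split into many tokens are penalized relative to short ones.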

13. The Quo Vadis submission at Traffic4cast 2019

Author: Oneata, Dan, Alexandru, Cosmin George, Stanescu, Marius, Pascu, Octavian, Magan, Alexandru, Postelnicu, Adrian, and Cucu, Horia
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We describe the submission of the Quo Vadis team to the Traffic4cast competition, which was organized as part of the NeurIPS 2019 series of challenges. Our system consists of a temporal regression module, implemented as $1\times1$ 2d convolutions, augmented with spatio-temporal biases. We have found that using biases is a straightforward and efficient way to include seasonal patterns and to improve the performance of the temporal regression model. Our implementation obtains a mean squared error of $9.47\times 10^{-3}$ on the test data, placing us in eighth place team-wise. We also present our attempts at incorporating spatial correlations into the model; however, contrary to our expectations, adding this type of auxiliary information did not benefit the main system. Our code is available at https://github.com/danoneata/traffic4cast.
Comment: Extended abstract for the Traffic4cast competition from NeurIPS 2019
Published: 2019

14. Kite: Automatic speech recognition for unmanned aerial vehicles

Author: Oneata, Dan and Cucu, Horia
Subjects: Computer Science - Sound, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper addresses the problem of building a speech recognition system attuned to the control of unmanned aerial vehicles (UAVs). Even though UAVs are becoming widespread, the task of creating voice interfaces for them is largely unaddressed. To this end, we introduce a multi-modal evaluation dataset for UAV control, consisting of spoken commands and associated images, which represent the visual context of what the UAV "sees" when the pilot utters the command. We provide baseline results and address two research directions: (i) how robust the language models are, given an incomplete list of commands at train time; (ii) how to incorporate visual information in the language model. We find that recurrent neural networks (RNNs) are a solution to both tasks: they can be successfully adapted using a small number of commands and they can be extended to use visual cues. Our results show that the image-based RNN outperforms its text-only counterpart even if the command-image training associations are automatically generated and inherently imperfect. The dataset and our code are available at http://kite.speed.pub.ro.
Comment: 5 pages, accepted at Interspeech 2019
Published: 2019

15. Multimodal speech recognition for unmanned aerial vehicles

Author: Oneață, Dan and Cucu, Horia
Published: 2021

16. Synthetic Benchmark for Data-Driven Pre-Si Analogue Circuit Verification

Author: Manolache, Cristian, Andronache, Cristina, Guzu, Alexandru, Caranica, Alexandru, Cucu, Horia, Buzo, Andi, and Pelz, Georg
Subjects: MATHEMATICAL functions, OPTIMIZATION algorithms, GAUSSIAN processes, BENCHMARK problems (Computer science)
Abstract: As the demand for more complex circuits increases, so does the duration of creating and testing them. The most time-consuming task in circuit development is notoriously the verification process, primarily due to the large number of simulations (hundreds or even thousands) required to ensure that the circuits adhere to the specifications regardless of the operating conditions. In order to decrease the number of required simulations, various verification algorithms have been proposed over the years, but this comes with an additional issue: the thorough validation of the algorithms. As simulations on real circuits are significantly time-consuming, synthetic circuits can offer precious insights into the capabilities of the verification algorithm. In this paper, we propose a benchmark of synthetic circuits that can be used to exhaustively validate pre-silicon (Pre-Si) verification algorithms. The newly created benchmark consists of 900 synthetic circuits (mathematical functions) with input dimensions (variables) ranging from 2 to 10. We design the benchmark to include functions of varying complexities, reflecting real-world circuit expectations. Eventually, we use this benchmark to evaluate a previously proposed state-of-the-art Pre-Si circuit verification algorithm. We show that this algorithm generally obtains relative verification errors below 2% with fewer than 150 simulations if the circuits have fewer than six to seven operating conditions. In addition, we demonstrate that some of the most complex circuits in the benchmark pose serious problems to the verification algorithm: the worst case is not found even when 200 simulations are used.
Published: 2024

17. Exploring Native and Non-Native English Child Speech Recognition With Whisper

Author: Jain, Rishabh, Barcovschi, Andrei, Yiwere, Mariam Yahayah, Corcoran, Peter, and Cucu, Horia
Published: 2024

18. Performance vs. hardware requirements in state-of-the-art automatic speech recognition

Author: Georgescu, Alexandru-Lucian, Pappalardo, Alessandro, Cucu, Horia, and Blott, Michaela
Published: 2021

19. Autonomous System for Performing Dexterous, Human-Level Manipulation Tasks as Response to External Stimuli in Real Time

Author: Neacșu, Ana, Burileanu, Corneliu, and Cucu, Horia
Series Editor: Akan, Ozgur, Bellavista, Paolo, Cao, Jiannong, Coulson, Geoffrey, Dressler, Falko, Ferrari, Domenico, Gerla, Mario, Kobayashi, Hisashi, Palazzo, Sergio, Sahni, Sartaj, Shen, Xuemin (Sherman), Stan, Mircea, Xiaohua, Jia, and Zomaya, Albert Y.
Editor: Fratu, Octavian, Militaru, Nicolae, and Halunga, Simona
Published: 2018

20. Recent Experiments and Findings in Baby Cry Classification

Author: Șandru, Elena-Diana, Buzo, Andi, Cucu, Horia, and Burileanu, Corneliu
Series Editor: Akan, Ozgur, Bellavista, Paolo, Cao, Jiannong, Coulson, Geoffrey, Dressler, Falko, Ferrari, Domenico, Gerla, Mario, Kobayashi, Hisashi, Palazzo, Sergio, Sahni, Sartaj, Shen, Xuemin (Sherman), Stan, Mircea, Xiaohua, Jia, and Zomaya, Albert Y.
Editor: Fratu, Octavian, Militaru, Nicolae, and Halunga, Simona
Published: 2018

21. Multilingual Low-Resourced Prototype System for Voice-Controlled Intelligent Building Applications

Author: Caranica, Alexandru, Georgescu, Lucian, Vulpe, Alexandru, and Cucu, Horia
Series Editor: Kacprzyk, Janusz
Advisory Editor: Pal, Nikhil R., Bello Perez, Rafael, Corchado, Emilio S., Hagras, Hani, Kóczy, László T., Kreinovich, Vladik, Lin, Chin-Teng, Lu, Jie, Melin, Patricia, Nedjah, Nadia, Nguyen, Ngoc Thanh, and Wang, Jun
Editor: Rocha, Álvaro, Adeli, Hojjat, Reis, Luís Paulo, and Costanzo, Sandra
Published: 2018

22. A Study on Initial Population Sampling for Multi-Objective Optimization based on Differential Evolution and Bayesian Inference

Author: Nicolae, Georgian, Visan, Catalin, Curavale, Dan, Boldeanu, Mihai, Cucu, Horia, Buzo, Andi, and Pelz, Georg
Published: 2023

23. Applying Multi-objective Acquisition Function Ensemble for a candidate proposal algorithm

Author: Manolache, Cristian, Andronache, Cristina Maria, Caranica, Alexandru, Cucu, Horia, Buzo, Andi, Diaconu, Cristian Vasile, and Pelz, Georg
Published: 2023

24. Adaptation of Whisper models to child speech recognition

Author: Jain, Rishabh, Barcovschi, Andrei, Yiwere, Mariam, Corcoran, Peter, and Cucu, Horia
Published: 2023

25. The SpeeD--ZevoTech submission at DISPLACE 2023

Author: Pirlogeanu, Gabriel, Oneata, Dan, Georgescu, Alexandru-Lucian, and Cucu, Horia
Published: 2023

26. Efficient Multi-Objective Optimization for PVT Variation-Aware Circuit Sizing Using Surrogate Models and Smart Corner Sampling

Author: Pascu, Octavian, Visan, Catalin, Nicolae, Georgian, Boldeanu, Mihai, Cucu, Horia, Diaconu, Cristian, Buzo, Andi, and Pelz, Georg
Published: 2023

27. Towards generalisable and calibrated synthetic speech detection with self-supervised representations

Author: Oneata, Dan, Stan, Adriana, Pascu, Octavian, Oneata, Elisabeta, and Cucu, Horia
Abstract: Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deep fake detectors. However, recent studies have shown that the current audio deep fake models fall short of this desideratum. In this paper we show that pretrained self-supervised representations followed by a simple logistic regression classifier achieve strong generalisation capabilities, reducing the equal error rate from 30% to 8% on the newly introduced In-the-Wild dataset. Importantly, this approach also produces considerably better calibrated models when compared to previous approaches. This means that we can trust our model's predictions more and use these for downstream tasks, such as uncertainty estimation. In particular, we show that the entropy of the estimated probabilities provides a reliable way of rejecting uncertain samples and further improving the accuracy.
Comment: Submitted to ICASSP 2024
Published: 2023

28. A WAV2VEC2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition

Author: Jain, Rishabh, Barcovschi, Andrei, Yiwere, Mariam Yahayah, Bigioi, Dan, Corcoran, Peter, and Cucu, Horia
Published: 2023

29. Augmentation Techniques for Adult-Speech to Generate Child-Like Speech Data Samples at Scale

Author: Yiwere, Mariam Y., Barcovschi, Andrei, Jain, Rishabh, Cucu, Horia, and Corcoran, Peter
Published: 2023

30. Automated circuit sizing with multi-objective optimization based on differential evolution and Bayesian inference

Author: Vişan, Cătălin, Pascu, Octavian, Stănescu, Marius, Şandru, Elena-Diana, Diaconu, Cristian, Buzo, Andi, Pelz, Georg, and Cucu, Horia
Published: 2022

31. Statistical Error Correction Methods for Domain-Specific ASR Systems

Author: Cucu, Horia, Buzo, Andi, Besacier, Laurent, and Burileanu, Corneliu
Editor: Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Goebel, Randy, Siekmann, Jörg, Wahlster, Wolfgang, Dediu, Adrian-Horia, Martín-Vide, Carlos, Mitkov, Ruslan, and Truthe, Bianca
Published: 2013

32. Efficient Modeling of PVT Variation for Mixed-Signal Circuit Sizing

Author: Pascu, Octavian, Visan, Catalin, Stanescu, Marius, Cucu, Horia, Diaconu, Cristian, Buzo, Andi, and Pelz, Georg
Published: 2022
Full Text: View/download PDF

33. Enhanced Candidate Selection Algorithm for Analog Circuit Verification

Author: Manolache, Cristian, primary, Caranica, Alexandru, additional, Cucu, Horia, additional, Buzo, Andi, additional, Diaconu, Cristian, additional, and Pelz, Georg, additional
Published: 2022
Full Text: View/download PDF

34. SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian

Author: Cucu, Horia, Buzo, Andi, Besacier, Laurent, and Burileanu, Corneliu
Published: 2014
Full Text: View/download PDF

35. Unsupervised deep learning models for aerosol layers segmentation

Author: Manolache, Cristian, primary, Boldeanu, Mihai, additional, Talianu, Camelia, additional, and Cucu, Horia, additional
Published: 2022
Full Text: View/download PDF

36. Advanced Operating Conditions Search applied in Analog Circuit Verification

Author: Manolache, Cristian, primary, Caranica, Alexandru, additional, Stanescu, Marius, additional, Cucu, Horia, additional, Buzo, Andi, additional, Diaconu, Cristian, additional, and Pelz, Georg, additional
Published: 2022
Full Text: View/download PDF

37. Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

Author: Oneata, Dan, primary and Cucu, Horia, additional
Published: 2022
Full Text: View/download PDF

38. FlexLip: A Controllable Text-to-Lip System

Author: Oneață, Dan, primary, Lőrincz, Beáta, additional, Stan, Adriana, additional, and Cucu, Horia, additional
Published: 2022
Full Text: View/download PDF

39. A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis

Author: Jain, Rishabh, primary, Yiwere, Mariam Yahayah, additional, Bigioi, Dan, additional, Corcoran, Peter, additional, and Cucu, Horia, additional
Published: 2022
Full Text: View/download PDF

40. Automatic Pollen Classification and Segmentation Using U-Nets and Synthetic Data

Author: Boldeanu, Mihai, primary, Gonzalez-Alonso, Monica, additional, Cucu, Horia, additional, Burileanu, Corneliu, additional, Maya-Manzano, Jose Maria, additional, and Buters, Jeroen Titus Maria, additional
Published: 2022
Full Text: View/download PDF

41. Multi-Input Convolutional Neural Networks for Automatic Pollen Classification

Author: Boldeanu, Mihai, primary, Cucu, Horia, additional, Burileanu, Corneliu, additional, and Mărmureanu, Luminița, additional
Published: 2021
Full Text: View/download PDF

42. Versatility and Population Diversity of Evolutionary Algorithms in Automated Circuit Sizing Applications

Author: Visan, Catalin, primary, Pascu, Octavian, additional, Stanescu, Marius, additional, Cucu, Horia, additional, Diaconu, Cristian, additional, Buzo, Andi, additional, and Pelz, Georg, additional
Published: 2021
Full Text: View/download PDF

43. Improvements of SpeeD’s Romanian ASR system during ReTeRom project

Author: Georgescu, Alexandru-Lucian, primary, Cucu, Horia, additional, and Burileanu, Corneliu, additional
Published: 2021
Full Text: View/download PDF

44. MARS: the First Romanian Pollen Dataset using a Rapid-E Particle Analyzer

Author: Boldeanu, Mihai, primary, Marin, Cristina, additional, Ene, Dragos, additional, Marmureanu, Luminita, additional, Cucu, Horia, additional, and Burileanu, Corneliu, additional
Published: 2021
Full Text: View/download PDF

45. Multi-Objective Optimization Algorithms for Automated Circuit Sizing of Analog/Mixed-Signal Circuits

Author: Stanescu, Marius, primary, Visan, Catalin, additional, Sandu, Gabriel, additional, Cucu, Horia, additional, Diaconu, Cristian, additional, Buzo, Andi, additional, and Pelz, Georg, additional
Published: 2021
Full Text: View/download PDF

46. Speaker disentanglement in video-to-speech conversion

Author: Oneata, Dan, primary, Stan, Adriana, additional, and Cucu, Horia, additional
Published: 2021
Full Text: View/download PDF

47. Automatic Pollen Classification Using Convolutional Neural Networks

Author: Boldeanu, Mihai, primary, Cucu, Horia, additional, Burileanu, Corneliu, additional, and Marmureanu, Luminita, additional
Published: 2021
Full Text: View/download PDF

48. Statistical Error Correction Methods for Domain-Specific ASR Systems

Author: Cucu, Horia, primary, Buzo, Andi, additional, Besacier, Laurent, additional, and Burileanu, Corneliu, additional
Published: 2013
Full Text: View/download PDF

49. Revisiting SincNet: An Evaluation of Feature and Network Hyperparameters for Speaker Recognition

Author: Oneata, Dan, primary, Georgescu, Lucian, additional, Cucu, Horia, additional, Burileanu, Dragos, additional, and Burileanu, Corneliu, additional
Published: 2021
Full Text: View/download PDF

50. Data-Filtering Methods for Self-Training of Automatic Speech Recognition Systems

Author: Georgescu, Alexandru-Lucian, primary, Manolache, Cristian, additional, Oneata, Dan, additional, Cucu, Horia, additional, and Burileanu, Corneliu, additional
Published: 2021
Full Text: View/download PDF
