Author: "Font, Frederic" - Searchworks@Jio Institute Digital Library Search Results

Search keyword: "Font, Frederic" (94 results in total)

1. The language of sound search: Examining User Queries in Audio Search Engines

Author: Weck, Benno and Font, Frederic
Subjects: Computer Science - Computation and Language, Computer Science - Human-Computer Interaction, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This study examines textual, user-written search queries within the context of sound search engines, encompassing various applications such as foley, sound effects, and general audio retrieval. Current research inadequately addresses real-world user needs and behaviours in designing text-based audio retrieval systems. To bridge this gap, we analysed search queries from two sources: a custom survey and Freesound website query logs. The survey was designed to collect queries for an unrestricted, hypothetical sound search engine, resulting in a dataset that captures user intentions without the constraints of existing systems. This dataset is also made available for sharing with the research community. In contrast, the Freesound query logs encompass approximately 9 million search requests, providing a comprehensive view of real-world usage patterns. Our findings indicate that survey queries are generally longer than Freesound queries, suggesting users prefer detailed queries when not limited by system constraints. Both datasets predominantly feature keyword-based queries, with few survey participants using full sentences. Key factors influencing survey queries include the primary sound source, intended usage, perceived location, and the number of sound sources. These insights are crucial for developing user-centred, effective text-based audio retrieval systems, enhancing our understanding of user behaviour in sound search contexts.
Comment: Accepted at DCASE 2024. Supplementary materials at https://doi.org/10.5281/zenodo.13622537
Published: 2024

2. Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset

Author: Anastasopoulou, Panagiota, Torrey, Jessica, Serra, Xavier, and Font, Frederic
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Automatic sound classification has a wide range of applications in machine listening, enabling context-aware sound processing and understanding. This paper explores methodologies for automatically classifying heterogeneous sounds characterized by high intra-class variability. Our study evaluates the classification task using the Broad Sound Taxonomy, a two-level taxonomy comprising 28 classes designed to cover a heterogeneous range of sounds with semantic distinctions tailored for practical user applications. We construct a dataset through manual annotation to ensure accuracy, diverse representation within each class and relevance in real-world scenarios. We compare a variety of both traditional and modern machine learning approaches to establish a baseline for the task of heterogeneous sound classification. We investigate the role of input features, specifically examining how acoustically derived sound representations compare to embeddings extracted with pre-trained deep neural networks that capture both acoustic and semantic information about sounds. Experimental results illustrate that audio embeddings encoding acoustic and semantic information achieve higher accuracy in the classification task. After careful analysis of classification errors, we identify some underlying reasons for failure and propose actions to mitigate them. The paper highlights the need for deeper exploration of all stages of classification, understanding the data and adopting methodologies capable of effectively handling data complexity and generalizing in real-world sound environments.
Comment: DCASE2024, post-print, 5 pages, 2 figures
Published: 2024

3. Evaluating Neural Networks Architectures for Spring Reverb Modelling

Author: Papaleo, Francesco, Lizarraga-Seijas, Xavier, and Font, Frederic
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence
Abstract: Reverberation is a key element in spatial audio perception, historically achieved with the use of analogue devices, such as plate and spring reverb, and in the last decades with digital signal processing techniques that have allowed different approaches for Virtual Analogue Modelling (VAM). The electromechanical functioning of the spring reverb makes it a nonlinear system that is difficult to fully emulate in the digital domain with white-box modelling techniques. In this study, we compare five different neural network architectures, including convolutional and recurrent models, to assess their effectiveness in replicating the characteristics of this audio effect. The evaluation is conducted on two datasets at sampling rates of 16 kHz and 48 kHz. This paper specifically focuses on neural audio architectures that offer parametric control, aiming to advance the boundaries of current black-box modelling techniques in the domain of spring reverberation.
Comment: 8 pages, 7 figures, 2 tables
Published: 2024

4. FSD50K: An Open Dataset of Human-Labeled Sound Events

Author: Fonseca, Eduardo, Favory, Xavier, Pons, Jordi, Font, Frederic, and Serra, Xavier
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Machine Learning
Abstract: Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes. However, AudioSet is not an open dataset as its official release consists of pre-computed audio features. Downloading the original audio tracks can be problematic due to YouTube videos gradually disappearing and usage rights issues. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including challenges encountered and solutions adopted. We include a comprehensive dataset characterization along with discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight on the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset to be widely adopted by the community as a new open benchmark for SER research.
Comment: Accepted version in TASLP. Main updates include: estimation of the amount of label noise in FSD50K, SNR comparison between FSD50K and AudioSet, improved description of evaluation metrics including equations, clarification of experimental methodology and some results, some content moved to Appendix for readability. https://ieeexplore.ieee.org/document/9645159
Published: 2020

5. The Freesound Loop Dataset and Annotation Tool

Author: Ramires, Antonio, Font, Frederic, Bogdanov, Dmitry, Smith, Jordan B. L., Yang, Yi-Hsuan, Ching, Joann, Chen, Bo-Yu, Wu, Yueh-Kao, Hsu, Wei-Han, and Serra, Xavier
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Music loops are essential ingredients in electronic music production, and there is a high demand for pre-recorded loops in a variety of styles. Several commercial and community databases have been created to meet this demand, but most are not suitable for research due to their strict licensing. We present the Freesound Loop Dataset (FSLD), a new large-scale dataset of music loops annotated by experts. The loops originate from Freesound, a community database of audio recordings released under Creative Commons licenses, so the audio in our dataset may be redistributed. The annotations include instrument, tempo, meter, key and genre tags. We describe the methodology used to assemble and annotate the data, and report on the distribution of tags in the data and inter-annotator agreement. We also present to the community an online loop annotator tool that we developed. To illustrate the usefulness of FSLD, we present short case studies on using it to estimate tempo and key, generate music tracks, and evaluate a loop separation algorithm. We anticipate that the community will find yet more uses for the data, in applications from automatic loop characterisation to algorithmic composition.
Comment: This work will be presented in the 21st International Society for Music Information Retrieval (ISMIR2020). Annotator website: http://mtg.upf.edu/fslannotator Dataset: https://zenodo.org/record/3967852
Published: 2020

6. Search Result Clustering in Collaborative Sound Collections

Author: Favory, Xavier, Font, Frederic, and Serra, Xavier
Subjects: Computer Science - Information Retrieval, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning, Computer Science - Sound, H.3.3
Abstract: The large size of today's online multimedia databases makes retrieving their content a difficult and time-consuming task. Users of online sound collections typically submit search queries that express a broad intent, often making the system return large and unmanageable result sets. Search Result Clustering is a technique that organises search-result content into coherent groups, which allows users to identify useful subsets in their results. Obtaining coherent and distinctive clusters that can be explored with a suitable interface is crucial for making this technique a useful complement to traditional search engines. In our work, we propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases. We propose an approach to assess the performance of different features at scale, by taking advantage of the metadata associated with each sound. This analysis is complemented with an evaluation using ground-truth labels from manually annotated datasets. We show that using a confidence measure for discarding inconsistent clusters improves the quality of the partitions. After identifying the most appropriate features for clustering, we conduct an experiment with users performing a sound design task, in order to evaluate our approach and its user interface. A qualitative analysis is carried out including usability questionnaires and semi-structured interviews. This provides us with valuable new insights regarding the features that promote efficient interaction with the clusters.
Comment: 8 pages, 4 figures, Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20), June 8-11, 2020, Dublin, Ireland. ACM, New York, NY, USA
Published: 2020

7. Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers

Author: Fonseca, Eduardo, Font, Frederic, and Serra, Xavier
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Machine Learning
Abstract: Label noise is emerging as a pressing issue in sound event classification. This arises as we move towards larger datasets that are difficult to annotate manually, but it is even more severe if datasets are collected automatically from online repositories, where labels are inferred through automated heuristics applied to the audio content or metadata. While learning from noisy labels has been an active area of research in computer vision, it has received little attention in sound event classification. Most recent computer vision approaches against label noise are relatively complex, requiring complex networks or extra data resources. In this work, we evaluate simple and efficient model-agnostic approaches to handling noisy labels when training sound event classifiers, namely label smoothing regularization, mixup and noise-robust loss functions. The main advantage of these methods is that they can be easily incorporated into existing deep learning pipelines without the need for network modifications or extra resources. We report results from experiments conducted with the FSDnoisy18k dataset. We show that these simple methods can be effective in mitigating the effect of label noise, providing an accuracy boost of up to 2.5% when incorporated into two different CNNs, while requiring minimal intervention and computational overhead.
Comment: WASPAA 2019
Published: 2019

8. Audio tagging with noisy labels and minimal supervision

Author: Fonseca, Eduardo, Plakal, Manoj, Font, Frederic, Ellis, Daniel P. W., and Serra, Xavier
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Machine Learning
Abstract: This paper introduces Task 2 of the DCASE2019 Challenge, titled "Audio tagging with noisy labels and minimal supervision". This task was hosted on the Kaggle platform as "Freesound Audio Tagging 2019". The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sound classes. In addition, the proposed dataset poses an acoustic mismatch problem between the noisy train set and the test set due to the fact that they come from different web audio sources. This reflects a realistic scenario, given the difficulty of gathering large amounts of manually labeled data. We present the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network. All these resources are freely available.
Comment: DCASE2019 Workshop
Published: 2019

9. Learning Sound Event Classifiers from Web Audio with Noisy Labels

Author: Fonseca, Eduardo, Plakal, Manoj, Ellis, Daniel P. W., Font, Frederic, Favory, Xavier, and Serra, Xavier
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Machine Learning
Abstract: As sound event classification moves towards larger datasets, issues of label noise become inevitable. Web sites can supply large volumes of user-contributed audio and metadata, but inferring labels from this metadata introduces errors due to unreliable inputs, and limitations in the mapping. There is, however, little research into the impact of these errors. To foster the investigation of label noise in sound event classification we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data. We characterize the label noise empirically, and provide a CNN baseline system. Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data. We also show that noise-robust loss functions can be effective in improving performance in presence of corrupted labels.
Comment: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)
Published: 2019

10. Facilitating the Manual Annotation of Sounds When Using Large Taxonomies

Author: Favory, Xavier, Fonseca, Eduardo, Font, Frederic, and Serra, Xavier
Subjects: Computer Science - Information Retrieval, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Properly annotated multimedia content is crucial for supporting advances in many Information Retrieval applications. It enables, for instance, the development of automatic tools for the annotation of large and diverse multimedia collections. In the context of everyday sounds and online collections, the content to describe is very diverse and involves many different types of concepts, often organised in large hierarchical structures called taxonomies. This makes the task of manually annotating content arduous. In this paper, we present our user-centered development of two tools for the manual annotation of audio content from a wide range of types. We conducted a preliminary evaluation of functional prototypes involving real users. The goal is to evaluate them in a real context, engage in discussions with users, and inspire new ideas. A qualitative analysis was carried out including usability questionnaires and semi-structured interviews. This revealed interesting aspects to consider when developing tools for the manual annotation of audio content with labels drawn from large hierarchical taxonomies.
Comment: 5 pages, 5 figures, IEEE FRUCT International Workshop on Semantic Audio and the Internet of Things
Published: 2018

11. General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

Author: Fonseca, Eduardo, Plakal, Manoj, Font, Frederic, Ellis, Daniel P. W., Favory, Xavier, Pons, Jordi, and Serra, Xavier
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Machine Learning
Abstract: This paper describes Task 2 of the DCASE 2018 Challenge, titled "General-purpose audio tagging of Freesound content with AudioSet labels". This task was hosted on the Kaggle platform as "Freesound General-Purpose Audio Tagging Challenge". The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology. We present the task, the dataset prepared for the competition, and a baseline system.
Comment: Camera ready for DCASE Workshop 2018
Published: 2018

12. Sound Sharing and Retrieval

Author: Font, Frederic, Roma, Gerard, Serra, Xavier, Virtanen, Tuomas, editor, Plumbley, Mark D., editor, and Ellis, Dan, editor
Published: 2018

13. Embodying Theoretical Research in Music Cognition: Four Proposals for Theory-Driven Experimentation

Author: Ballus, Andreu, Arnau, Eric, Nieto, Oriol, Font, Frederic, and Torrents, Alba
Published: 2014

14. The Internet of Sounds: Convergent Trends, Insights, and Future Directions

Author: Turchet, Luca, Lagrange, Mathieu, Rottondi, Cristina, Fazekas, György, Peters, Nils, Østergaard, Jan, Font, Frederic, Bäckström, Tom, and Fischione, Carlo
Published: 2023

15. Leveraging Online Audio Commons Content for Media Production

Author: Xambó, Anna, Font, Frederic, Fazekas, György, and Barthet, Mathieu
Published: 2019

16. The Internet of Sounds: Convergent Trends, Insights, and Future Directions

Author: Turchet, Luca, Lagrange, Mathieu, Rottondi, Cristina, Fazekas, György, Peters, Nils, Østergaard, Jan, Font, Frederic, Bäckström, Tom, and Fischione, Carlo
Abstract: Current sound-based practices and systems developed in both academia and industry point to convergent research trends that bring together the field of sound and music computing with that of the Internet of Things. This article proposes a vision for the emerging field of the Internet of Sounds (IoS), which stems from such disciplines. The IoS relates to the network of Sound Things, i.e., devices capable of sensing, acquiring, processing, actuating, and exchanging data serving the purpose of communicating sound-related information. In the IoS paradigm, which merges under a unique umbrella the emerging fields of the Internet of Musical Things and the Internet of Audio Things, heterogeneous devices dedicated to musical and nonmusical tasks can interact and cooperate with one another and with other things connected to the Internet to facilitate sound-based services and applications that are globally available to the users. We survey the state of the art in this space, discuss the technological and nontechnological challenges ahead of us and propose a comprehensive research agenda for the field.
Published: 2023

17. Extending Sound Sample Descriptions through the Extraction of Community Knowledge

Author: Font, Frederic, Serra, Xavier, Konstan, Joseph A., editor, Conejo, Ricardo, editor, Marzo, José L., editor, and Oliver, Nuria, editor
Published: 2011

18. Class-based tag recommendation and user-based evaluation in online audio clip sharing

Author: Font, Frederic, Serrà, Joan, and Serra, Xavier
Published: 2014

19. Sound Sharing and Retrieval

Author: Font, Frederic, Roma, Gerard, and Serra, Xavier
Published: 2017

20. FSD50K: An Open Dataset of Human-Labeled Sound Events

Author: Fonseca, Eduardo, Favory, Xavier, Pons, Jordi, Font, Frederic, and Serra, Xavier
Published: 2022

21. Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021)

Author: Font, Frederic, Mesaros, Annamaria, Ellis, Daniel P. W., Fonseca, Eduardo, Fuentes, Magdalena, and Elizalde, Benjamin
Abstract: This volume is a collection of the papers presented at the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE 2021) in Barcelona (online), Spain, on November 15-19, 2021.
Published: 2021

22. SOURCE: a Freesound Community Music Sampler

Author: Font, Frederic
Published: 2021

23. Toward interpretable polyphonic sound event detection with attention maps based on local prototypes

Author: Zinemanas, Pablo, Rocamora, Martín, Fonseca, Eduardo, Font, Frederic, and Serra, Xavier
Subjects: Sound event detection, Prototypes, Interpretability
Abstract: Paper presented at DCASE 2021, held virtually from 15 to 19 November 2021. Understanding the reasons behind the predictions of deep neural networks is a pressing concern as it can be critical in several application scenarios. In this work, we present a novel interpretable model for polyphonic sound event detection. It tackles one of the limitations of our previous work, i.e. the difficulty to deal with a multi-label setting properly. The proposed architecture incorporates a prototype layer and an attention mechanism. The network learns a set of local prototypes in the latent space representing a patch in the input representation. In addition, it learns attention maps for positioning the local prototypes and reconstructing the latent space. The predictions are then based solely on the attention maps. Thus, the explanations provided are the attention maps and the corresponding local prototypes. Moreover, one can reconstruct the prototypes to the audio domain for inspection. The obtained results in urban sound event detection are comparable to those of two opaque baselines but with fewer parameters while offering interpretability.
Published: 2021

24. An Interpretable Deep Learning Model for Automatic Sound Classification

Author: Zinemanas, Pablo, Rocamora, Martín, Miron, Marius, Font, Frederic, and Serra, Xavier
Published: 2021

25. Freesound: 15 anys de sons Creative Commons (Freesound: 15 years of Creative Commons sounds)

Author: Font, Frederic, Universitat Pompeu Fabra. Music Technology Group, and Intensiu de Col·leccions Digitals (5th, 2020, Barcelona)
Subjects: Digital repositories, Digitization, 02 - Library science. Documentation
Published: 2020

26. Extending Sound Sample Descriptions through the Extraction of Community Knowledge

Author: Font, Frederic and Serra, Xavier
Published: 2011

27. Search Result Clustering in Collaborative Sound Collections

Author: Favory, Xavier, Font, Frederic, and Serra, Xavier
Published: 2020

28. Model-Agnostic Approaches To Handling Noisy Labels When Training Sound Event Classifiers

Author: Fonseca, Eduardo, Font, Frederic, and Serra, Xavier
Published: 2019

29. Learning Sound Event Classifiers from Web Audio with Noisy Labels

Author: Fonseca, Eduardo, Plakal, Manoj, Ellis, Daniel P. W., Font, Frederic, Favory, Xavier, and Serra, Xavier
Published: 2019

30. Audio Tagging with Noisy Labels and Minimal Supervision

Author: Fonseca, Eduardo, Plakal, Manoj, Font, Frederic, Ellis, Daniel P. W., and Serra, Xavier
Published: 2019

31. Improving Audio Retrieval through Loudness Profile Categorization

Author: Parekh, Sanjeel, Font, Frederic, and Serra, Xavier
Published: 2016

32. Analysis of the impact of a tag recommendation system in a real-world Folksonomy

Author: Consejo Superior de Investigaciones Científicas (España), Generalitat de Catalunya, Ministerio de Ciencia e Innovación (España), Ministerio de Economía y Competitividad (España), Font, Frederic, Serrà, Joan, and Serra, Xavier
Abstract: Collaborative tagging systems have emerged as a successful solution for annotating contributed resources to online sharing platforms, facilitating searching, browsing, and organizing their contents. To aid users in the annotation process, several tag recommendation methods have been proposed. It has been repeatedly hypothesized that these methods should contribute to improving annotation quality and reducing the cost of the annotation process. It has also been hypothesized that these methods should contribute to the consolidation of the vocabulary of collaborative tagging systems. However, to date, no empirical and quantitative result supports these hypotheses. In this work, we analyze in depth the impact of a tag recommendation system in the folksonomy of Freesound, a real-world and large-scale online sound sharing platform. Our results suggest that tag recommendation effectively increases vocabulary sharing among users of the platform. In addition, tag recommendation is shown to contribute to the convergence of the vocabulary as well as to a partial increase in the quality of annotations. However, according to our analysis, the cost of the annotation process does not seem to be effectively reduced. Our work contributes to a better understanding of the nature of tag recommendation systems and points to future directions for the further development of those systems and their analysis. © 2015 ACM.
Published: 2015

33. Freesound Datasets: A Platform for the Creation of Open Audio Datasets

Author: Fonseca, Eduardo, Pons, Jordi, Favory, Xavier, Font, Frederic, Bogdanov, Dmitry, Ferraro, Andres, Oramas, Sergio, Porter, Alastair, and Serra, Xavier
Subjects: Big data, Machine learning, Scientific community, Musical analysis, Information retrieval
Abstract: Openly available datasets are a key factor in the advancement of data-driven research approaches, including many of the ones used in sound and music computing. In the last few years, quite a number of new audio datasets have been made available, but many of them still have major shortcomings that limit their research impact. Among the common shortcomings are the lack of transparency in their creation and the difficulty of making them completely open and sharable. They often do not include clear mechanisms to amend errors, and they are often not large enough for current machine learning needs. This paper introduces Freesound Datasets, an online platform for the collaborative creation of open audio datasets based on principles of transparency, openness, dynamic character, and sustainability. As a proof of concept, we present an early snapshot of a large-scale audio dataset built using this platform. It consists of audio samples from Freesound organised in a hierarchy based on the AudioSet Ontology. We believe that building and maintaining datasets following the outlined principles and using open tools and collaborative approaches like the ones presented here will have a significant impact in our research community.
Published: 2017

34. Analysis of the Impact of a Tag Recommendation System in a Real-World Folksonomy

Author: Font, Frederic, Serrà, Joan, and Serra, Xavier
Published: 2015

35. Class-based tag recommendation and user-based evaluation in online audio clip sharing

Author: Consejo Superior de Investigaciones Científicas (España), European Commission, Ministerio de Economía y Competitividad (España), Generalitat de Catalunya, Ministerio de Ciencia e Innovación (España), Font, Frederic, Serrà, Joan, and Serra, Xavier
Abstract: Online sharing platforms often rely on collaborative tagging systems for annotating content. In this way, users themselves annotate and describe the shared contents using textual labels, commonly called tags. These annotations typically suffer from a number of issues such as tag scarcity or ambiguous labelling. Hence, to minimise some of these issues, tag recommendation systems can be employed to suggest potentially relevant tags during the annotation process. In this work, we present a tag recommendation system and evaluate it in the context of an online platform for audio clip sharing. By exploiting domain-specific knowledge, the system we present is able to classify an audio clip among a number of predefined audio classes and to produce specific tag recommendations for the different classes. We perform an in-depth user-based evaluation of the recommendation method along with two baselines and a former version that we described in previous work. This user-based evaluation is further complemented with a prediction-based evaluation following standard information retrieval methodologies. Results show that the proposed tag recommendation method brings a statistically significant improvement over the previous method and the baselines. In addition, we report a number of findings based on the detailed analysis of user feedback provided during the evaluation process. The considered methods, when applied to real-world collaborative tagging systems, should serve the purpose of consolidating the tagging vocabulary and improving the quality of content annotations. © 2014 Elsevier B.V. All rights reserved.
Published: 2014

36. Tempo Estimation for Music Loops and a Simple Confidence Measure

Author: Font, Frederic and Serra, Xavier
Subjects: TEMPO (Music theory), LOOPS (Group theory), INFORMATION retrieval, ALGORITHMS, MUSIC archives
Abstract: Tempo estimation is a common task within the music information retrieval community, but existing works are rarely evaluated with datasets of music loops, and the algorithms are not tailored to this particular type of content. In addition, existing works on tempo estimation do not emphasise providing a confidence value that indicates how reliable their tempo estimations are. In current music creation contexts, it is common for users to search for and use loops shared in online repositories. These loops are typically not produced by professionals and lack annotations. Hence, reliable tempo estimation algorithms become necessary to enhance the reusability of loops shared in such repositories. In this paper, we test six existing tempo estimation algorithms against four music loop datasets containing more than 35k loops. We also propose a simple and computationally cheap confidence measure that can be applied to any existing algorithm to estimate the reliability of its tempo predictions when applied to music loops. We analyse the accuracy of the algorithms in combination with our proposed confidence measure, and show that we can significantly improve the algorithms' performance when only considering music loops with high estimated confidence. [ABSTRACT FROM AUTHOR]
Published: 2016

37. Folksonomy-based tag recommendation for collaborative tagging systems

Author: Font, Frederic, Serra, Joan, and Serra, Xavier
Abstract: Collaborative tagging has emerged as a common solution for labelling and organising online digital content. However, collaborative tagging systems typically suffer from a number of issues such as tag scarcity or ambiguous labelling. As a result, the organisation and browsing of tagged content is far from optimal. In this work the authors present a general scheme for building a folksonomy-based tag recommendation system to help users tag online content resources. Based on this general scheme, the authors describe eight tag recommendation methods and extensively evaluate them with data from two real-world, large-scale datasets of tagged images and sound clips. Their results show that the proposed methods can effectively recommend relevant tags, given a set of input tags and tag co-occurrence information. Moreover, the authors show how novel strategies for selecting the appropriate number of tags to recommend can significantly improve the methods' performance. Approaches such as the one presented here can be useful to obtain more comprehensive and coherent descriptions of tagged resources, thus allowing better organisation, browsing and reuse of online content. They can also increase the value of folksonomies as reliable sources for knowledge mining. Copyright © 2013, IGI Global.
Published: 2013

38. Fungal postoperative spondylodiscitis due to Scedosporium prolificans

Author: Garcia-Vidal, Carolina, Cabellos, Carmen, Ayats, Josefina, Font, Frederic, Ferran, Enrique, and Fernandez-Viladrich, Pedro
Published: 2009

39. Freesound technical demo

Author: Font, Frederic, primary, Roma, Gerard, additional, and Serra, Xavier, additional
Published: 2013

40. Folksonomy-Based Tag Recommendation for Collaborative Tagging Systems

Author: Font, Frederic, primary, Serrà, Joan, additional, and Serra, Xavier, additional
Published: 2013

41. Characterization of the Freesound online community

Author: Font, Frederic, primary, Roma, Gerard, additional, Herrera, Perfecto, additional, and Serra, Xavier, additional
Published: 2012

42. Small world networks and creativity in audio clip sharing

Author: Roma, Gerard, primary, Herrera, Perfecto, additional, Zanin, Massimiliano, additional, Toral, Sergio L., additional, Font, Frederic, additional, and Serra, Xavier, additional
Published: 2012

43. Audio Commons: Bringing Creative Commons Audio Content to the Creative Industries

Author: Font, Frederic

44. Audio-visual scene classification: analysis of DCASE 2021 Challenge submissions

Author: Wang, Shanshan, Heittola, Toni, Mesaros, Annamaria, Virtanen, Tuomas, Font, Frederic, P.W. Ellis, Daniel, Fonseca, Eduardo, Fuentes, Magdalena, Elizalde, Benjamin, Tampere University, and Computing Sciences
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), 213 Electronic, automation and communications engineering, electronics, FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper presents the details of the Audio-Visual Scene Classification task in the DCASE 2021 Challenge (Task 1 Subtask B). The task concerns classification using both audio and video modalities, based on a dataset of synchronized recordings. The task attracted 43 submissions from 13 teams around the world, and more than half of the submitted systems outperformed the baseline. Techniques common among the top systems include large pretrained models such as ResNet or EfficientNet, trained for the specific task. Fine-tuning, transfer learning, and data augmentation are also employed to boost performance. More importantly, all of the top 5 teams used multi-modal methods combining audio and video. The best system achieved a log loss of 0.195 and an accuracy of 93.8%, compared with a log loss of 0.662 and an accuracy of 77.1% for the baseline system.
Published: 2021

45. Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach

Author: Berg, Jan, Drossos, Konstantinos, Font, Frederic, Mesaros, Annamaria, P.W. Ellis, Daniel, Fonseca, Eduardo, Fuentes, Magdalena, Elizalde, Benjamin, Tampere University, and Computing Sciences
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Machine Learning, 213 Electronic, automation and communications engineering, electronics, ComputingMilieux_LEGALASPECTSOFCOMPUTING, Clotho, Computer Science - Sound, Machine Learning (cs.LG), learning without forgetting, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, WaveTransformer, ComputingMilieux_COMPUTERSANDSOCIETY, AudioCaps, automated audio captioning, continual learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Automated audio captioning (AAC) is the task of automatically creating textual descriptions (i.e. captions) for the contents of a general audio signal. Most AAC methods use existing datasets for optimization and/or evaluation. Given the limited information held by AAC datasets, AAC methods very likely learn only the information contained in the datasets they use. In this paper we present a first approach for continuously adapting an AAC method to new information, using a continual learning method. In our scenario, a pre-optimized AAC method is applied to unseen general audio signals and can update its parameters to adapt to the new information, given a new reference caption. We evaluate our method using a freely available, pre-optimized AAC method and two freely available AAC datasets. We compare our proposed method with three scenarios: two in which the method is trained on one of the datasets and evaluated on the other, and a third in which it is trained on one dataset and fine-tuned on the other. The results show that our method achieves a good balance between distilling new knowledge and retaining previous knowledge. The authors wish to acknowledge CSC-IT Center for Science, Finland, for computational resources. K. Drossos has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 957337, project MARVEL.
Published: 2021

46. Diversity and bias in audio captioning datasets

Author: Martin Morato, Irene, Mesaros, Annamaria, Font, Frederic, P.W. Ellis, Daniel, Fonseca, Eduardo, Fuentes, Magdalena, Elizalde, Benjamin, Tampere University, and Computing Sciences
Subjects: 213 Electronic, automation and communications engineering, electronics
Abstract: Describing soundscapes in sentences allows a better understanding of the acoustic scene than a single label indicating the acoustic scene class or a set of audio tags indicating the sound events active in the audio clip. In addition, the richness of natural language allows a range of possible descriptions for the same acoustic scene. In this work, we examine the diversity of the descriptions obtained when collecting soundscape descriptions through crowdsourcing. We study how much the collection of audio captions can be guided by the instructions given in the annotation task, by analysing the possible bias introduced by auxiliary information provided in the annotation process. Our study shows that even when given hints on the audio content, different annotators describe the same soundscape using different vocabulary. In automatic captioning, hints provided as audio tags represent grounding textual information that helps guide the captioning output towards specific concepts. We also release a new dataset of audio captions and audio tags produced by multiple annotators for a subset of the TAU Urban Acoustic Scenes 2018 dataset, suitable for studying guided captioning.
Published: 2021

47. Fairness and underspecification in acoustic scene classification: The case for disaggregated evaluations

Author: Triantafyllopoulos, Andreas, Milling, Manuel, Drossos, Konstantinos, Schuller, Björn W., Font, Frederic, Mesaros, Annamaria, P.W. Ellis, Daniel, Fonseca, Eduardo, Fuentes, Magdalena, Elizalde, Benjamin, Tampere University, and Computing Sciences
Subjects: FOS: Computer and information sciences, transparency, Computer Science - Machine Learning, Computer Science - Computers and Society, evaluation, 213 Electronic, automation and communications engineering, electronics, Computers and Society (cs.CY), fairness, acoustic scene classification, ddc:004, ethics, Machine Learning (cs.LG)
Abstract: Underspecification and fairness in machine learning (ML) applications have recently become two prominent issues in the ML community. Acoustic scene classification (ASC) applications have so far remained unaffected by this discussion, but are now increasingly used in real-world systems where fairness and reliability are critical. In this work, we argue for the need for a more holistic evaluation process for ASC models through disaggregated evaluations. This entails taking into account performance differences across several factors, such as city, location, and recording device. Although these factors play a well-understood role in the performance of ASC models, most works report single evaluation metrics computed over all strata of a particular dataset. We argue that metrics computed on specific sub-populations of the underlying data contain valuable information about the expected real-world behaviour of proposed systems, and that reporting them could improve the transparency and trustworthiness of such systems. We demonstrate the effectiveness of the proposed evaluation process in uncovering underspecification and fairness problems exhibited by several standard ML architectures when trained on two widely used ASC datasets. Our evaluation shows that all examined architectures exhibit large biases across all factors taken into consideration, particularly with respect to recording location. Additionally, different architectures exhibit different biases even when trained with the same experimental configurations.
Published: 2021

48. Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection

Author: Sudarsanam, Parthasaarathy, Politis, Archontis, Drossos, Konstantinos, Font, Frederic, Mesaros, Annamaria, P.W. Ellis, Daniel, Fonseca, Eduardo, Fuentes, Magdalena, Elizalde, Benjamin, Tampere University, and Computing Sciences
Subjects: FOS: Computer and information sciences, Sound (cs.SD), acoustic scene analysis, Audio and Speech Processing (eess.AS), 213 Electronic, automation and communications engineering, electronics, FOS: Electrical engineering, electronic engineering, information engineering, Self-attention, Sound event localization and detection, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Joint sound event localization and detection (SELD) is an emerging audio signal processing task that adds spatial dimensions to acoustic scene analysis and sound event detection. A popular approach to modeling SELD jointly is to use convolutional recurrent neural network (CRNN) models, where CNNs learn high-level features from multi-channel audio input and RNNs learn temporal relationships from these high-level features. However, RNNs have some drawbacks, such as a limited capability to model long temporal dependencies and slow training and inference due to their sequential processing. Recently, a few SELD studies used multi-head self-attention (MHSA), among other innovations, in their models. MHSA and the related transformer networks have shown state-of-the-art performance in various domains; they can model long temporal dependencies and can also be parallelized efficiently. In this paper, we study in detail the effect of MHSA on the SELD task. Specifically, we examine the effects of replacing the RNN blocks with self-attention layers. We study the influence of stacking multiple self-attention blocks, of using multiple attention heads in each self-attention block, and of position embeddings and layer normalization. Evaluation on the DCASE 2021 SELD (task 3) development dataset shows a significant improvement in all employed metrics compared to the baseline CRNN accompanying the task. The authors wish to acknowledge CSC-IT Center for Science, Finland, for computational resources. K. Drossos has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 957337, project MARVEL.
Published: 2021

49. Low-complexity acoustic scene classification for multi-device audio: analysis of DCASE 2021 Challenge systems

Author: Martin Morato, Irene, Heittola, Toni, Mesaros, Annamaria, Virtanen, Tuomas, Font, Frederic, P.W. Ellis, Daniel, Fonseca, Eduardo, Fuentes, Magdalena, Elizalde, Benjamin, Tampere University, and Computing Sciences
Subjects: Audio and Speech Processing (eess.AS), 213 Electronic, automation and communications engineering, electronics, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper presents the details of Task 1A Acoustic Scene Classification in the DCASE 2021 Challenge. The task targeted the development of low-complexity solutions with good generalization properties. The provided baseline system is based on a CNN architecture and post-training quantization of parameters. The system is trained using all the available training data, without any specific technique for handling device mismatch, and obtains an overall accuracy of 47.7%, with a log loss of 1.473. The task received 99 submissions from 30 teams, and most of the submitted systems outperformed the baseline. The most widely used techniques among the submissions were residual networks and weight quantization, with the top systems reaching over 70% accuracy and a log loss under 0.8. Acoustic scene classification remained a popular task in the challenge, despite the increasing difficulty of the setup.
Published: 2021

50. Training sound event classifiers using different types of supervision

Author: Fonseca, Eduardo, Serra, Xavier, Font, Frederic, and Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions
Subjects: Audio dataset, Audio representation learning, Environmental sound, Supervision, Contrastive learning, Sound event, Classification, Shift invariance, Self-supervision, Dataset creation, Weak labels, Tagging, Label noise, Data collection, Convolutional neural networks
Abstract: The automatic recognition of sound events has gained attention in the past few years, motivated by emerging applications in fields such as healthcare, smart homes, or urban planning. When the work for this thesis started, research on sound event classification was mainly focused on supervised learning using small datasets, often carefully annotated with vocabularies limited to specific domains (e.g., urban or domestic). However, such small datasets do not support training classifiers able to recognize hundreds of sound events occurring in our everyday environment, such as kettle whistles, bird tweets, cars passing by, or different types of alarms. At the same time, large amounts of environmental sound data are hosted in websites such as Freesound or YouTube, which can be convenient for training large-vocabulary classifiers, particularly using data-hungry deep learning approaches. To advance the state-of-the-art in sound event classification, this thesis investigates several strands of dataset creation as well as supervised and unsupervised learning to train large-vocabulary sound event classifiers, using different types of supervision in novel and alternative ways. Specifically, we focus on supervised learning using clean and noisy labels, as well as self-supervised representation learning from unlabeled data. The first part of this thesis focuses on the creation of FSD50K, a large-vocabulary dataset with over 100h of audio manually labeled using 200 classes of sound events. We provide a detailed description of the creation process and a comprehensive characterization of the dataset. In addition, we explore architectural modifications to increase shift invariance in CNNs, improving robustness to time/frequency shifts in input spectrograms. In the second part, we focus on training sound event classifiers using noisy labels. First, we propose a dataset that supports the investigation of real label noise. 
Then, we explore network-agnostic approaches to mitigate the effect of label noise during training, including regularization techniques, noise-robust loss functions, and strategies to reject noisy labeled examples. Further, we develop a teacher-student framework to address the problem of missing labels in sound event datasets. In the third part, we propose algorithms to learn audio representations from unlabeled data. In particular, we develop self-supervised contrastive learning frameworks, where representations are learned by comparing pairs of examples computed via data augmentation and automatic sound separation methods. Finally, we report on the organization of two DCASE Challenge Tasks on automatic audio tagging with noisy labels. By providing data resources as well as state-of-the-art approaches and audio representations, this thesis contributes to the advancement of open sound event research, and to the transition from traditional supervised learning using clean labels to other learning strategies less dependent on costly annotation efforts.
Published: 2021
