24 results for "audio search"
Search Results
2. Evaluation Measures for Audio Search
- Author
Mary, Leena; G, Deekshitha; and Neustein, Amy (Series Editor)
- Published
- 2019
- Full Text
- View/download PDF
3. Features, Representations, and Matching Techniques for Audio Search
- Author
Mary, Leena; G, Deekshitha; and Neustein, Amy (Series Editor)
- Published
- 2019
- Full Text
- View/download PDF
4. Audio Search Techniques
- Author
Mary, Leena; G, Deekshitha; and Neustein, Amy (Series Editor)
- Published
- 2019
- Full Text
- View/download PDF
5. Modelling the Microphone-Related Timbral Brightness of Recorded Signals.
- Author
Pearce, Andy, Brookes, Tim, and Mason, Russell
- Subjects
MICROPHONES, METADATA, DATABASE searching, AUDITORY perception, PREDICTION models, INFORMATION retrieval
- Abstract
Brightness is one of the most common timbral descriptors used for searching audio databases, and is also the timbral attribute of recorded sound that is most affected by microphone choice, making a brightness prediction model desirable for automatic metadata generation. A model, sensitive to microphone-related as well as source-related brightness, was developed based on a novel combination of the spectral centroid and the ratio of the total magnitude of the signal above 500 Hz to that of the full signal. This model performed well on training data (r = 0.922). Validating it on new data showed a slight gradient error but good linear correlation across source types and overall (r = 0.955). On both training and validation data, the new model out-performed metrics previously used for brightness prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
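The abstract above names two concrete spectral quantities: the spectral centroid, and the ratio of signal magnitude above 500 Hz to the full-band magnitude. As a rough illustration only (the function name and parameters are assumptions; the published model calibrates and combines these quantities in its own way), they can be computed like this:

```python
import numpy as np

def brightness_features(signal, sample_rate):
    """Compute the two spectral quantities named in the abstract: the
    spectral centroid, and the ratio of magnitude above 500 Hz to the
    full-band magnitude. A sketch only -- the published model calibrates
    and combines these; see the paper for the actual formulation."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)  # Hz
    ratio_above_500 = np.sum(spectrum[freqs > 500.0]) / np.sum(spectrum)
    return centroid, ratio_above_500
```

For a pure 1 kHz tone, the centroid sits at 1 kHz and nearly all magnitude lies above 500 Hz, as expected.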
6. Modelling the Microphone-Related Timbral Brightness of Recorded Signals
- Author
Andy Pearce, Tim Brookes, and Russell Mason
- Subjects
audio characterisation, audio indexing, audio search, auditory perception, machine learning, music information retrieval
- Abstract
Brightness is one of the most common timbral descriptors used for searching audio databases, and is also the timbral attribute of recorded sound that is most affected by microphone choice, making a brightness prediction model desirable for automatic metadata generation. A model, sensitive to microphone-related as well as source-related brightness, was developed based on a novel combination of the spectral centroid and the ratio of the total magnitude of the signal above 500 Hz to that of the full signal. This model performed well on training data (r = 0.922). Validating it on new data showed a slight gradient error but good linear correlation across source types and overall (r = 0.955). On both training and validation data, the new model out-performed metrics previously used for brightness prediction.
- Published
- 2021
- Full Text
- View/download PDF
7. Vocal Tract Length Normalization Features for Audio Search
- Author
Madhavi, Maulik C., Sharma, Shubham, Patil, Hemant A., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Král, Pavel, editor, and Matoušek, Václav, editor
- Published
- 2015
- Full Text
- View/download PDF
8. Voice Technology to Enable Sophisticated Access to Historical Audio Archive of the Czech Radio
- Author
Nouza, Jan, Blavka, Karel, Bohac, Marek, Cerva, Petr, Zdansky, Jindrich, Silovsky, Jan, Prazak, Jan, Grana, Costantino, editor, and Cucchiara, Rita, editor
- Published
- 2012
- Full Text
- View/download PDF
9. Discretion of Speech Units for the Text Post-processing Phase of Automatic Transcription (in the Czech Language)
- Author
Škodová, Svatava, Kuchařová, Michaela, Šeps, Ladislav, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Sojka, Petr, editor, Horák, Aleš, editor, Kopeček, Ivan, editor, and Pala, Karel, editor
- Published
- 2012
- Full Text
- View/download PDF
10. Olaf: a lightweight, portable audio search system
- Author
Six, Joren
- Subjects
Music Information Retrieval, Technology and Engineering, Audio search, Acoustic Fingerprinting
- Abstract
Olaf stands for Overly Lightweight Acoustic Fingerprinting and solves the problem of finding short audio fragments in large digital audio archives. The content-based audio search algorithm implemented in Olaf can identify a short audio query in a large database of thousands of hours of audio using an acoustic fingerprinting technique.
- Published
- 2023
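The abstract does not spell out Olaf's fingerprint design, but peak-pair ("landmark") hashing is the classic way such systems index audio. A toy sketch of that generic idea, with all names and parameters assumed and no claim that Olaf works exactly this way:

```python
import numpy as np

def peak_pair_hashes(signal, n_fft=1024, hop=512, fan_out=3):
    """Toy landmark fingerprinting: take the strongest spectral peak in
    each frame, then hash pairs of nearby peaks as (f1, f2, dt). This is
    the generic peak-pair idea many fingerprinters build on; Olaf's
    actual scheme and matching stage are in the cited work."""
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft, hop)]
    peaks = [int(np.argmax(np.abs(np.fft.rfft(f)))) for f in frames]
    hashes = []
    for t1, f1 in enumerate(peaks):
        for dt in range(1, fan_out + 1):
            if t1 + dt < len(peaks):
                hashes.append(((f1, peaks[t1 + dt], dt), t1))  # (key, anchor time)
    return hashes
```

A query clip is hashed the same way; matching keys whose anchor-time differences agree then vote for a position in the database.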
11. 'The Truth is Out There': Using Advanced Speech Analytics to Learn Why Customers Call Help-line Desks and How Effectively They Are Being Served by the Call Center Agent
- Author
Gavalda, Marsal, Schlueter, Jeff, and Neustein, Amy, editor
- Published
- 2010
- Full Text
- View/download PDF
12. An efficient search method for the content-based identification of telephone-SPAM.
- Author
Strobl, Julian, Mainka, Bernhard, Grutzek, Gary, and Knospe, Heiko
- Abstract
With the help of VoIP technology, large numbers of unsolicited calls can be conveniently placed and SPAM over Internet Telephony may become a major nuisance and threat. Various mitigation methods have been proposed which are mostly based on a pattern analysis of the signaling traffic. This contribution shows that an analysis of the audio content is also feasible and can provide protection against replayed calls. In order to identify similar or equal audio data, spectral features are extracted and a short and robust audio fingerprint is computed. The definition of the fingerprint is optimized for a fast index-based search. Then, the matching of telephone speech data is based on the intersection of inverted files of audio fingerprints. Furthermore, the system design of a working prototype is explained and experimental results on the recognition rate and the performance of the system are presented. It can be shown that the search method is suitable for an efficient identification of SPAM calls. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
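The matching step this abstract describes — intersecting inverted files of audio fingerprints — can be sketched as a minimal inverted index. The class name, integer fingerprints, and overlap threshold are illustrative assumptions; the paper's fingerprints are derived from spectral features:

```python
from collections import defaultdict

class FingerprintIndex:
    """Minimal inverted-file index over fingerprint values, sketching the
    intersection-based matching the abstract describes. Fingerprints are
    modelled as plain integers for brevity."""

    def __init__(self):
        self.postings = defaultdict(set)  # fingerprint value -> set of call IDs

    def add(self, call_id, fingerprints):
        for fp in fingerprints:
            self.postings[fp].add(call_id)

    def query(self, fingerprints, min_overlap=0.5):
        # Count how many query fingerprints each stored call shares,
        # then keep calls whose overlap exceeds the threshold.
        counts = defaultdict(int)
        for fp in set(fingerprints):
            for call_id in self.postings.get(fp, ()):
                counts[call_id] += 1
        need = min_overlap * len(set(fingerprints))
        return {c for c, n in counts.items() if n >= need}
```

Replayed SPAM calls share most fingerprints with a stored call and clear the threshold; unrelated calls do not.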
13. The Spoken Web Search Task at MediaEval 2011.
- Author
Metze, Florian, Rajput, Nitendra, Anguera, Xavier, Davel, Marelie, Gravier, Guillaume, van Heerden, Charl, Mantena, Gautam V., Muscariello, Armando, Prahallad, Kishore, Szoke, Igor, and Tejedor, Javier
- Abstract
In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from “spoken web” material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
14. BABAZ: A large scale audio search system for video copy detection.
- Author
Jegou, Herve, Delhumeau, Jonathan, Yuan, Jiangbo, Gravier, Guillaume, and Gros, Patrick
- Abstract
This paper presents BABAZ, an audio search system to search modified segments in large databases of music or video tracks. It is based on an efficient audio feature matching system which exploits the reciprocal nearest neighbors to produce a per-match similarity score. Temporal consistency is taken into account based on the audio matches, and boundary estimation allows the precise localization of the matching segments. The method is mainly intended for video retrieval based on their audio track, as typically evaluated in the copy detection task of TRECVID evaluation campaigns. The evaluation conducted on music retrieval shows that our system is comparable to a reference audio fingerprinting system for music retrieval, and significantly outperforms it on audio-based video retrieval, as shown by our experiments conducted on the dataset used in the copy detection task of TRECVID'2010 campaign. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
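The "reciprocal nearest neighbors" criterion this abstract mentions keeps only matches where a query feature and a database feature each pick the other as nearest neighbour. A small sketch under simplifying assumptions (Euclidean distance, dense arrays); BABAZ's actual features and per-match scoring are richer:

```python
import numpy as np

def reciprocal_nn_pairs(q_vecs, db_vecs):
    """Return (query index, database index) pairs that are mutual nearest
    neighbours under Euclidean distance -- the reciprocal-NN filter the
    abstract refers to, in toy form."""
    d = np.linalg.norm(q_vecs[:, None, :] - db_vecs[None, :, :], axis=2)
    nn_q = d.argmin(axis=1)   # nearest database vector for each query vector
    nn_db = d.argmin(axis=0)  # nearest query vector for each database vector
    return [(i, int(nn_q[i])) for i in range(len(q_vecs)) if nn_db[nn_q[i]] == i]
```

Mutual agreement suppresses one-sided matches, which is why it yields a usable per-match similarity signal before temporal-consistency checks.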
15. "The Truth is Out There": Using Advanced Speech Analytics to Learn Why Customers Call Help-line Desks and How Effectively They Are Being Served by the Call Center Agent.
- Author
Gavalda, Marsal and Schlueter, Jeff
- Abstract
In this chapter, we describe our novel work in phonetic-based indexing and search, which is designed for extremely fast searching through vast amounts of media. This method makes it possible to search for words, phrases, jargon, slang, and other terminology not readily found in a speech-to-text dictionary. The most advanced phonetic-based speech analytics solutions, such as ours, are those that are robust to noisy channel conditions and dialectal variations; those that can extract information beyond words and phrases; and those that do not require the creation or maintenance of lexicons or language models. Such well-performing speech analytics programs offer unprecedented levels of accuracy, scale, ease of deployment, and overall effectiveness in the mining of live and recorded calls. Given that speech analytics has become a sine qua non for understanding how to achieve a high rate of customer satisfaction and cost containment, we demonstrate in this chapter how our data mining technology is used to produce sophisticated analyses and reports (including visualizations of call-category trends and correlations or statistical metrics), while preserving the ability at any time to drill down to individual calls and listen to the specific evidence that supports the particular categorization or data point in question, all of which allows for a deep and fact-based understanding of contact center dynamics. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
16. Making Czech Historical Radio Archive Accessible and Searchable for Wide Public.
- Author
Nouza, Jan, Blavka, Karel, Cerva, Petr, Zdansky, Jindrich, Silovsky, Jan, Bohac, Marek, and Prazak, Jan
- Subjects
ACCESS to archives, TRANSCRIPTION (Linguistics), DIGITIZATION of archival materials, RADIO (Medium), SOUND recordings, PUBLIC broadcasting, INFORMATION retrieval
- Abstract
In this paper we describe a complex software platform being developed for the automatic transcription and indexing of the Czech Radio archive of spoken documents. The archive contains more than 100,000 hours of audio recordings covering almost ninety years of public broadcasting in the Czech Republic and the former Czechoslovakia. The platform is based on modern speech processing technology and includes modules for speech, speaker, and language recognition, as well as tools for multimodal information retrieval. The aim of the project, supported by the Czech Ministry of Culture, is to make the archive accessible and searchable for researchers and the general public alike. After the project's first year, the key modules have already been implemented and tested on a 27,400-hour subset of the archive. A web-based full-text search engine demonstrates the project's current state. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
17. A Cyclic Interface for the Presentation of Multiple Music Files.
- Author
Ali, S. and Aarabi, P.
- Abstract
This paper proposes a novel cyclic interface for browsing through a song database. The method, which sums multiple audio streams on a server and broadcasts only a single summed stream, allows the user to hear different parts of each audio stream by cycling through all available streams. Songs are summed into a single stream based on a combination of spectral entropy and local power of each song's waveform. Perceptual parameters of the system are determined based on experiments conducted on 20 users, for three, four, and five songs. Results illustrate that the proposed methodology requires less listening time as compared to traditional list-based interfaces when the desired audio clip is among one of the audio streams. Applications of this methodology include any search system which returns multiple audio search results, including music query by example. The proposed methodology can be used for real-time searching with an ordinary internet browser. [ABSTRACT FROM PUBLISHER]
- Published
- 2008
- Full Text
- View/download PDF
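The core operation this abstract relies on — summing multiple audio streams into a single broadcast stream, weighted by per-song salience — can be sketched as follows. Passing the weights in directly is a simplifying assumption; the paper derives them from spectral entropy and local power:

```python
import numpy as np

def sum_streams(streams, weights):
    """Mix several equal-length audio streams into one stream, weighting
    each by a salience score. The abstract's system combines spectral
    entropy and local power per song; here the weights are assumed given."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the mix stays in range
    return sum(wi * np.asarray(s, dtype=float) for wi, s in zip(w, streams))
```

Cycling the interface then amounts to rotating which stream gets the dominant weight over time.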
18. Generalized Time-Series Active Search With Kullback-Leibler Distance for Audio Fingerprinting.
- Author
Hui Lin, Zhijian Ou, and Xi Xiao
- Subjects
MP3 (Audio coding standard), TIME series analysis, MATHEMATICAL statistics, PROBABILITY theory, DIGITAL audio standards
- Abstract
In this letter, a new audio fingerprinting approach is presented. We investigate improving robustness through more precise statistical fingerprint modeling with common component Gaussian mixture models (CCGMMs) and the Kullback-Leibler (KL) distance, which is better suited to measuring the dissimilarity between two probabilistic models. To address the resulting complexity, generalized time-series active search is proposed, which supports a wide variety of distance measures between two CCGMMs, including L1, L2, KL, etc. Experiments show that the new approach with KL distance increases robustness to distortions (including low-quality MP3 compression, small room echo, and play-and-record) while achieving efficient search. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
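The KL distance at the heart of this letter has a closed form only for single Gaussians; for the letter's CCGMMs it must be approximated. A sketch of the Gaussian building block, with a symmetrised variant since fingerprint matching needs a distance-like quantity:

```python
import math

def kl_gaussian(mu1, var1, mu2, var2):
    """Closed-form KL divergence KL(N1 || N2) between two univariate
    Gaussians -- the building block behind GMM-based distances. For GMMs,
    KL has no closed form and must be approximated; this sketch covers
    only the single-Gaussian case."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def symmetric_kl(p, q):
    """Symmetrise KL so it behaves like a distance: KL(p||q) + KL(q||p).
    p and q are (mean, variance) tuples."""
    return kl_gaussian(*p, *q) + kl_gaussian(*q, *p)
```

For identical Gaussians the divergence is zero; shifting one mean by a standard deviation gives KL = 0.5 in each direction.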
19. Zoeken in historisch videomateriaal [Searching in Historical Video Material]
- Subjects
Audio search, Cultural Heritage, HMI-SLT: Speech and Language Technology, HMI-MR: Multimedia Retrieval
- Abstract
On attaching automatic search functionality to historical video archives
- Published
- 2000
20. The spoken web search task at MediaEval 2011
- Author
Armando Muscariello, Javier Tejedor, Florian Metze, Gautam Varma Mantena, Guillaume Gravier, Charl van Heerden, Marelie H. Davel, Xavier Anguera, Igor Szöke, Nitendra Rajput, and Kishore Prahallad
- Subjects
spoken term detection, spoken web, low-resource speech recognition, audio search, search engine, hidden Markov model, evaluation, benchmark (computing), mobile phone, the Internet, artificial intelligence, natural language processing, multimedia
- Abstract
In this paper, we describe the "Spoken Web Search" Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from "spoken web" material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research.
- Published
- 2012
21. Two-stream indexing for spoken web search
- Author
Nitendra Rajput, Shrey Sahay, Kundan Srivastava, Mayank Shrivastava, Jitendra Ajmera, Sougata Mukherjea, and Anupam Joshi
- Subjects
Audio mining, mobile phone, information retrieval, multimedia, developing regions, human factors, search engine indexing, World Wide Telecom Web, literacy, Spoken Web, noise, indexing, audio search, Web navigation, Web services, algorithms, experimentation
- Abstract
Proceedings of the 20th International Conference Companion on World Wide Web. This paper presents two-stream processing of audio to index audio content for Spoken Web search. The first stream indexes the meta-data associated with a particular audio document. The meta-data is usually very sparse, but accurate; it therefore yields a high-precision, low-recall index. The second stream uses a novel language-independent speech recognizer to generate text to be indexed. Owing to the multiple languages and the noise in user-generated content on the Spoken Web, the speech recognition accuracy of such systems is not high, so they yield a low-precision, high-recall index. The paper uses these two complementary streams to generate a combined index that increases precision-recall performance in audio content search. The problem of audio content search is motivated by the real-world implications of the Web in developing regions, where, due to literacy and affordability issues, people use the Spoken Web, which consists of interconnected VoiceSites whose content is audio. The experiments are based on more than 20,000 audio documents spanning seven live VoiceSites and four different languages. The results show significant improvement over a meta-data-only or a speech-recognition-only system, justifying the two-stream approach. Audio content search is a growing problem area, and this paper aims to be a first step toward solving it at large scale, across languages, in a Web context.
- Published
- 2011
- Full Text
- View/download PDF
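The combination of a high-precision metadata index with a high-recall ASR index can be sketched as weighted score fusion. The linear rule and the weights are illustrative assumptions, not the paper's actual combination scheme:

```python
def merge_streams(meta_hits, asr_hits, w_meta=0.7, w_asr=0.3):
    """Fuse a high-precision metadata index with a high-recall ASR index
    by weighted linear score fusion, returning documents ranked by fused
    score. meta_hits / asr_hits: dicts mapping document ID -> score in [0, 1]."""
    docs = set(meta_hits) | set(asr_hits)
    fused = {d: w_meta * meta_hits.get(d, 0.0) + w_asr * asr_hits.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Taking the union of both hit sets preserves the ASR stream's recall, while the metadata weight keeps its precise hits near the top.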
22. Speech Indexing
- Author
Ordelman, Roeland J.F., de Jong, Franciska M.G., van Leeuwen, D.A., Blanken, Henk, de Vries, A.P., Blok, H.E., and Feng, L.
- Subjects
Speech Recognition, Audio search, Speech Indexing, Spoken Document Retrieval, HMI-SLT: Speech and Language Technology, HMI-MR: Multimedia Retrieval
- Abstract
This chapter focuses on the automatic extraction of information from the speech in multimedia documents. This approach is often referred to as speech indexing, and it can be regarded as a subfield of audio indexing that also incorporates, for example, the analysis of music and sounds. If the objective of recognizing the words spoken is to support retrieval, one commonly speaks of spoken document retrieval (SDR). If the objective is the coupling of various media types, the term media mining or even cross-media mining is used. Most attention in this chapter goes to SDR. The focus is less on searching (an index of) a multimedia database than on enabling multiple views on the data by cross-linking all the available multifaceted information sources in a multimedia database. In section 1.6 cross-media mining is discussed in more detail.
- Published
- 2007
23. Speech Recognition Issues for Dutch Spoken Document Retrieval
- Author
Ordelman, Roeland J.F., van Hessen, Adrianus J., de Jong, Franciska M.G., Matousek, Vaclav, Mautner, Pavel, Moucek, Roman, and Tauser, Karel
- Subjects
Audio search, Spoken Document Retrieval, Speech recognition, Language model, Document retrieval, Natural language processing, HMI-SLT: Speech and Language Technology, HMI-MR: Multimedia Retrieval
- Abstract
In this paper, ongoing work on the development of the speech recognition modules of a multimedia retrieval environment for Dutch is described. The work on the generation of acoustic models and language models along with their current performance is presented. Some characteristics of the Dutch language and of the target video archives that require special treatment are discussed.
- Published
- 2001
- Full Text
- View/download PDF
24. Compound decomposition in Dutch large vocabulary speech recognition
- Author
Ordelman, Roeland; van Hessen, A.; and de Jong, F.
- Subjects
Audio search, Spoken Document Retrieval, HMI-SLT: Speech and Language Technology, HMI-MR: Multimedia Retrieval
- Abstract
This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were created using original text versions and text versions decomposed with a data-driven compound splitting algorithm. Language model performance was compared in terms of out-of-vocabulary rates and word error rates in a real-world broadcast news transcription task. It was concluded that compound splitting does improve ASR performance. Best results were obtained when frequent compounds were not decomposed.
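A data-driven compound splitter of the kind the abstract describes can be sketched as follows. The thresholds, scoring rule, and function name are illustrative assumptions; the guard that leaves frequent compounds whole mirrors the abstract's finding that they are best left intact:

```python
def split_compound(word, lexicon, min_part=3, min_freq=5):
    """Toy data-driven compound splitter: split a word in two when both
    parts are frequent in a corpus lexicon, preferring the split whose
    rarer part is most frequent. lexicon: dict word -> corpus frequency."""
    if lexicon.get(word, 0) >= min_freq:
        return [word]  # frequent compounds stay whole, per the paper's finding
    best = None
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if lexicon.get(head, 0) >= min_freq and lexicon.get(tail, 0) >= min_freq:
            score = min(lexicon[head], lexicon[tail])
            if best is None or score > best[0]:
                best = (score, [head, tail])
    return best[1] if best else [word]
```

Decomposing training text this way shrinks the vocabulary, which is how splitting reduces out-of-vocabulary rates in Dutch ASR.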