17 results for "Yoshiaki, Itoh"
Search Results
2. A Rescoring Method Using Web Search and Word Vectors for Spoken Term Detection
- Author
-
Yoshiaki Itoh, Kazunori Kojima, Hiroaki Nanjo, Haruka Tanji, and Shi-wook Lee
- Subjects
Computer science, Feature extraction, Text recognition, Degree of similarity, Word2vec, Artificial intelligence, Hidden Markov model, Natural language processing - Abstract
We propose a rescoring method for spoken term detection (STD) that uses word vectors together with words related to a query obtained by Web search. We define “words related to the query” as words that are associated with the topic of the speech data, co-occur with the query, and appear multiple times in the speech data. To identify these related words, we introduce distributed representations of words obtained by Word2vec [1] [2]: each word in the word-recognition results of the speech data is first converted into a word vector, which is then compared with the word vector of the query, and related words are determined by the degree of similarity between the two vectors. However, a word vector cannot be obtained in this manner for an out-of-vocabulary (OOV) query, since OOV queries do not appear in the word-recognition results. For such OOV queries, we perform a Web search using the query and extract texts that include it. The recognition results of the speech data and the extracted texts are then combined and used to train Word2vec, so that a word vector of the OOV query can be obtained. The distances between the related words and the detection candidates in a document are then exploited in rescoring. Experiments were conducted to evaluate the proposed method using the open test collections of the NTCIR-10 [3] and NTCIR-12 [4] workshops. The proposed method improved retrieval accuracy by 3.2 points in mean average precision.
- Published
- 2019
- Full Text
- View/download PDF
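To make the rescoring idea of entry 2 above concrete, here is a minimal Python sketch. It assumes a dictionary of word vectors (for example, from a Word2vec model trained on the combined recognition results and Web-search text); the candidate fields `score` and `context_words`, the threshold, and the bonus are hypothetical names and values, not the authors' implementation.

```python
import numpy as np

def related_words(word_vectors, query, sim_threshold=0.5):
    """word_vectors: dict mapping word -> np.ndarray (e.g., from a trained
    Word2vec model). Returns words whose cosine similarity to the query
    vector exceeds the threshold."""
    q = word_vectors[query]
    qn = q / np.linalg.norm(q)
    related = []
    for w, v in word_vectors.items():
        if w == query:
            continue
        sim = float(np.dot(qn, v / np.linalg.norm(v)))
        if sim >= sim_threshold:
            related.append((w, sim))
    return related

def rescore(candidates, related, bonus=0.1):
    """candidates: list of dicts with a base 'score' and the 'context_words'
    recognized around the detected segment (both names are hypothetical).
    Adds a small bonus per related word found near the candidate."""
    related_set = {w for w, _ in related}
    out = []
    for cand in candidates:
        hits = len(related_set & set(cand["context_words"]))
        out.append({**cand, "score": cand["score"] + bonus * hits})
    return sorted(out, key=lambda c: c["score"], reverse=True)
```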
3. Acceleration for query-by-example using posteriorgram of deep neural network
- Author
-
Munehiro Moriya, Ryota Konno, Kazunori Kojima, Yoshiaki Itoh, Kazuyo Tanaka, Masato Obara, and Shi-wook Lee
- Subjects
Matching, Artificial neural network, Computer science, Speech recognition, Matrix (mathematics), Memory management, Query by Example, Hidden Markov model, Bitwise operation - Abstract
Much research has been conducted on spoken term detection. Query-by-example, in which the query itself is spoken, has also become an important topic in spoken term detection. A previous study examined posteriorgrams, which are sequences of output probabilities generated by a deep neural network from spoken queries and speech data. Although posteriorgram matching between a spoken query and speech data improves retrieval accuracy, the time required to search with a spoken query is long, even for a relatively small quantity of speech data. Reducing retrieval time is thus a crucial problem. In this paper, we propose two methods for reducing retrieval time in posteriorgram matching. The first, a "posteriorgram bit operation," accelerates matching by transforming the posteriorgrams of both spoken queries and speech data into bit matrices. The second is a sparse vector method that retains only a small number of high-probability elements in each posteriorgram vector. Because most of the elements in a sparse vector are zero, the thousands of output probabilities in the posteriorgram are reduced to only a handful. Evaluation experiments were carried out using open test collections (the SpokenDoc tasks of the NTCIR-10 workshop) [1,2], and the results demonstrate the effectiveness of the proposed methods.
- Published
- 2017
- Full Text
- View/download PDF
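A rough sketch of the two acceleration ideas described in entry 3 above, assuming NumPy posteriorgrams of shape (frames x states); the binarization threshold and the top-k value are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def to_bit_matrix(posteriorgram, threshold=0.1):
    """Binarize a (frames x states) posteriorgram: set a bit for every state
    whose posterior exceeds the threshold, packed into bytes so that
    frame-to-frame similarity becomes a bitwise AND plus a popcount."""
    bits = (posteriorgram > threshold).astype(np.uint8)
    return np.packbits(bits, axis=1)  # one packed row per frame

def bit_frame_similarity(packed_a, packed_b):
    """Number of states active in both frames (popcount of the AND)."""
    return int(np.unpackbits(packed_a & packed_b).sum())

def sparsify(posteriorgram, top_k=5):
    """Keep only the top-k posteriors per frame and zero the rest, so that
    later distance computations touch only a handful of the thousands of
    DNN outputs."""
    sparse = np.zeros_like(posteriorgram)
    rows = np.arange(posteriorgram.shape[0])[:, None]
    top = np.argsort(posteriorgram, axis=1)[:, -top_k:]
    sparse[rows, top] = posteriorgram[rows, top]
    return sparse
```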
4. Rescoring by a deep neural network for spoken term detection
- Author
-
Kazuyo Tanaka, Kazunori Kojima, Yoshiaki Itoh, Shi-wook Lee, and Ryota Konno
- Subjects
Matching, Sequence, Artificial neural network, Computer science, Speech recognition, Posterior probability, Pattern recognition, Artificial intelligence, Hidden Markov model - Abstract
In spoken term detection (STD), the detection of out-of-vocabulary (OOV) query terms is crucial because query terms are likely to be OOV terms. This paper proposes a rescoring method that uses the posterior probabilities output by a deep neural network (DNN) to improve detection accuracy for OOV query terms. Conventional STD methods for OOV query terms search the subword sequences of the speech data, obtained with an automatic speech recognizer, for the query's subword sequence. In the proposed method, detailed matching is performed using the probabilities output by the DNN: a pseudo query at the frame or state level is generated so that it can be aligned with the frame-level probabilities. To reduce the computational burden on the DNN, we apply the proposed method only to the top candidate utterances, which can be found quickly by a conventional STD method. Experiments were conducted to evaluate the performance of the proposed method, using the open test collections for the SpokenDoc tasks of the NTCIR-9 and NTCIR-10 workshops as benchmarks. The proposed method improved mean average precision by 5 to 20 points, surpassing the best accuracy obtained at the workshops. These results demonstrate the effectiveness of the proposed method.
- Published
- 2015
- Full Text
- View/download PDF
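A simplified illustration of frame-level rescoring with DNN posteriors for the top candidates described in entry 4 above. The proportional stretching below is a crude stand-in for the alignment described in the abstract, and all names are hypothetical.

```python
import numpy as np

def expand_query_states(query_states, num_frames):
    """Stretch a state-level pseudo query to the candidate's length by
    repeating each state proportionally (a rough substitute for alignment)."""
    idx = np.floor(np.linspace(0, len(query_states), num_frames,
                               endpoint=False)).astype(int)
    return [query_states[i] for i in idx]

def dnn_rescore(candidate_posteriors, query_states):
    """candidate_posteriors: (frames x states) DNN output for one top-ranked
    candidate utterance; query_states: state indices of the pseudo query.
    Returns the average log posterior along the stretched state sequence."""
    frames = candidate_posteriors.shape[0]
    path = expand_query_states(query_states, frames)
    logp = np.log(candidate_posteriors[np.arange(frames), path] + 1e-10)
    return float(logp.mean())
```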
5. Effective combination of heterogeneous subword-based spoken term detection systems
- Author
-
Kazuyo Tanaka, Shi-wook Lee, and Yoshiaki Itoh
- Subjects
Correlation, System combination, Computer science, Speech recognition, Keyword spotting, Pattern recognition, Artificial intelligence - Abstract
Combining heterogeneous systems has been shown to provide significant improvement in the spoken term detection (STD) task. However, there has been little research into why system combination improves STD performance. In this paper, we analyze the heterogeneity of the systems by calculating the correlation between their scores and evaluating the effectiveness of the combined subword-based systems. We investigate both heterogeneous detection schemes and heterogeneous subword units, using the NTCIR-10 task as a test bed. Experimental analysis shows that higher improvement rates are achieved by combining more heterogeneous systems, that is, systems whose scores are less correlated and therefore carry a larger amount of complementary information. Compared with the best-performing individual system, a parallel combination of heterogeneous subword units improves STD performance by 13.59%, and an efficient cascaded combination of heterogeneous subword units and heterogeneous detection schemes improves it by 12.79%. Finally, a state-of-the-art average maximum F-measure of 74.07 on the NTCIR-10 task is achieved by combining heterogeneous subword units and heterogeneous detection schemes.
- Published
- 2014
- Full Text
- View/download PDF
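The correlation analysis described in entry 5 above can be reproduced in a few lines; this sketch assumes each system assigns a score to the same list of detection candidates, and the example numbers are made up.

```python
import numpy as np

def score_correlation(scores_a, scores_b):
    """Pearson correlation between the scores that two STD systems assign to
    the same candidate list; a lower value suggests more complementary
    (i.e., more heterogeneous) systems."""
    return float(np.corrcoef(scores_a, scores_b)[0, 1])

# Example: two subword-based systems scoring the same five candidates.
# A higher value means less complementary information to gain by combining.
print(score_correlation([0.9, 0.2, 0.7, 0.1, 0.5],
                        [0.8, 0.4, 0.6, 0.3, 0.2]))
```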
6. High priority in highly ranked documents in spoken term detection
- Author
-
Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee, Kazuma Konno, Yoshiaki Itoh, and Kazunori Kojima
- Subjects
Matching, Vocabulary, Query expansion, Information retrieval, Document handling, Ranking, Computer science, Information storage and retrieval, Speech processing - Abstract
In spoken term detection, the retrieval of OOV (out-of-vocabulary) query terms is very important because query terms are likely to be OOV terms. To improve retrieval performance for OOV query terms, this paper proposes a re-scoring method applied after the candidate segments have been determined. Each candidate segment has a matching score and a segment number. Because highly ranked candidates are usually reliable, and because a user is assumed to select query terms that are specific to the target documents and appear frequently in them, we give high priority to candidate segments contained in highly ranked documents by adjusting their matching scores. We evaluated the proposed method using the open test collections for SpokenDoc-2 in NTCIR-10. The results showed that retrieval performance improved by more than 7.0 points for two test sets in the test collections, demonstrating the effectiveness of the proposed method.
- Published
- 2013
- Full Text
- View/download PDF
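A minimal sketch of the re-scoring step in entry 6 above: documents are ranked by their best candidate, and candidates belonging to the top-ranked documents receive a score bonus. The field names, the bonus value, and the top-n cutoff are illustrative assumptions, not the authors' settings.

```python
def rerank_by_document(candidates, top_n=10, bonus=0.2):
    """candidates: list of dicts with 'score' and 'doc_id' (hypothetical names).
    Documents are ranked by their best candidate score; candidates that fall
    in the top-ranked documents then get their matching score boosted."""
    best_per_doc = {}
    for c in candidates:
        best_per_doc[c["doc_id"]] = max(best_per_doc.get(c["doc_id"],
                                                         float("-inf")),
                                        c["score"])
    ranked_docs = sorted(best_per_doc.items(), key=lambda kv: kv[1],
                         reverse=True)
    top_docs = {doc for doc, _ in ranked_docs[:top_n]}
    boosted = [{**c, "score": c["score"] +
                (bonus if c["doc_id"] in top_docs else 0.0)}
               for c in candidates]
    return sorted(boosted, key=lambda c: c["score"], reverse=True)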
7. Time-space acoustical feature for fast video copy detection
- Author
-
Yoshiaki Itoh, Masaaki Ishigame, Masahiro Erokuumae, Kazuyo Tanaka, and Kazunori Kojima
- Subjects
Video capture, Computer science, Video copy detection, Video processing, Video tracking, Computer vision, Artificial intelligence, Block-matching algorithm - Abstract
We propose a new time-space acoustical feature for fast video copy detection, which searches a large number of video streams for a given video segment, for example to find illegal copies on Internet video sites. We extract a small number of feature vectors at acoustically peculiar points, namely local maxima and minima in the time sequence of the acoustic power envelope of the video data. Relative values between the feature points are extracted (the proposed time-space acoustical feature), because the volume of a video stream differs across recording environments. The features can be obtained quickly compared with representative features such as MFCCs, and they require little processing time for matching because both the number and the dimension of the feature vectors are small. The accuracy and computation time of the proposed method are evaluated using recorded TV movie programs as input data and 30-second to 3-minute DVD segments as reference data, assuming that the copyright holder of a movie searches video streams for illegal copies. We confirmed that the proposed method completed all processing within the computation time of the conventional feature extraction alone, with an F-measure of 93.2% for 3-minute video segment detection.
- Published
- 2010
- Full Text
- View/download PDF
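A rough illustration of how a feature like the one in entry 7 above could be computed, assuming the audio track is available as a NumPy array; the frame sizes and the particular relative values used here are assumptions, not the paper's specification.

```python
import numpy as np

def power_envelope(signal, frame_len=1024, hop=512):
    """Short-time log power of the audio track (a simple stand-in for the
    power envelope used in the paper)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    return np.array([10.0 * np.log10(np.mean(f ** 2) + 1e-10) for f in frames])

def peculiar_points(envelope):
    """Indices of local maxima/minima of the power envelope."""
    d = np.diff(envelope)
    return [i + 1 for i in range(len(d) - 1) if d[i] * d[i + 1] < 0]

def time_space_feature(envelope, points):
    """Relative time gaps and level differences between neighbouring extrema,
    which stay comparable across recordings made at different volumes."""
    return np.array([(b - a, envelope[b] - envelope[a])
                     for a, b in zip(points[:-1], points[1:])])
```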
8. Open vocabulary spoken document retrieval by subword sequence obtained from speech recognizer
- Author
-
G. Kuriki, Shi-wook Lee, Yoshiaki Itoh, Kazunori Kojima, Masaaki Ishigame, and Kazuyo Tanaka
- Subjects
Sequence, Vocabulary, Computer science, Speech recognition, Out of vocabulary, Phone, Artificial intelligence, Document retrieval, Natural language processing - Abstract
We present a method for open-vocabulary spoken document retrieval (SDR) based on subword models. The paper proposes a new approach to open-vocabulary SDR that uses subword models without requiring a separate subword recognizer. Instead, subword sequences are obtained from the phone sequences of a word-based speech recognizer's output: when the speech contains an out-of-vocabulary (OOV) word, the recognizer outputs a word sequence whose phone sequence is expected to be similar to that of the OOV word. When OOV words are given in a query, the proposed system retrieves the target section by comparing the phone sequence of the query with that of the word sequence generated by the speech recognizer.
- Published
- 2008
- Full Text
- View/download PDF
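The phone-sequence comparison at the core of entry 8 above can be illustrated with a plain Levenshtein distance; in practice a spotting variant with free start and end points in the document would be used, which is not shown here.

```python
def phone_edit_distance(query_phones, doc_phones):
    """Levenshtein distance between the query's phone sequence and a phone
    sequence derived from the recognizer's word output; a low distance over
    some window suggests the (possibly OOV) query occurs there."""
    m, n = len(query_phones), len(doc_phones)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if query_phones[i - 1] == doc_phones[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]
```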
9. Highlight scene extraction of sports broadcasts using sports news programs
- Author
-
Masaaki Ishigame, Kazunori Kojima, Yoshiaki Itoh, and S. Sakaki
- Subjects
Multimedia, Computer science, Broadcast data, Feature extraction, Image processing and computer vision, Visualization - Abstract
This paper proposes a new approach for extracting highlight scenes from sports broadcasts by using sports news programs. To extract highlight scenes from sports broadcasts reliably, we use sports news programs that cover those broadcasts and identify identical or similar sections between the broadcasts and the news programs. To extract identical or similar sections between two video data sets efficiently, we developed a two-step method that combines relay-CDP and active search. We evaluated this method in terms of the extraction accuracy of the highlight scenes and the computation time, through experiments using actual broadcast data sets.
- Published
- 2008
- Full Text
- View/download PDF
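A toy coarse-to-fine matching loop in the spirit of the two-step method of entry 9 above (this is not relay-CDP or active search themselves): a coarse pass on subsampled frame features proposes start positions in the broadcast, and a fine pass re-checks them frame by frame. All thresholds are assumptions.

```python
import numpy as np

def coarse_candidates(broadcast_feats, news_feats, step=10, threshold=0.9):
    """First (coarse) pass: compare subsampled feature frames and keep start
    positions in the broadcast that look similar to the news section."""
    cands = []
    for i in range(0, len(broadcast_feats) - len(news_feats), step):
        a = broadcast_feats[i:i + len(news_feats):step]
        b = news_feats[::step]
        sim = float(np.mean(np.sum(a * b, axis=1) /
                    (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
                     + 1e-10)))
        if sim >= threshold:
            cands.append(i)
    return cands

def refine(broadcast_feats, news_feats, candidates):
    """Second (fine) pass: re-check every coarse candidate frame by frame and
    return the best-matching start position with its similarity."""
    best, best_sim = None, -1.0
    for i in candidates:
        a = broadcast_feats[i:i + len(news_feats)]
        sim = float(np.mean(np.sum(a * news_feats, axis=1) /
                    (np.linalg.norm(a, axis=1) *
                     np.linalg.norm(news_feats, axis=1) + 1e-10)))
        if sim > best_sim:
            best, best_sim = i, sim
    return best, best_sim
```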
10. Combining Multiple Subword Representations for Open-Vocabulary Spoken Document Retrieval
- Author
-
Yoshiaki Itoh, Kazuyo Tanaka, and Shi-wook Lee
- Subjects
Vocabulary, Computer science, Speech recognition, Information storage and retrieval, Artificial intelligence, Document retrieval, Natural language processing - Abstract
The paper describes subword-based approaches for open-vocabulary spoken document retrieval. First, the feasibility of subword units in spoken document retrieval is investigated, and our previously proposed sub-phonetic segment units are compared with typical subword units such as syllables, phonemes, and triphones. Next, we explore the linear combination of retrieval scores from multiple subword representations to improve retrieval performance. Experimental evaluation on open-vocabulary spoken document retrieval tasks demonstrates that the proposed sub-phonetic segment units are more effective than typical subword units, and that the linear combination of multiple subword representations yields a consistent improvement in F-measure.
- Published
- 2006
- Full Text
- View/download PDF
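The linear combination of retrieval scores from multiple subword representations described in entry 10 above can be sketched as a weighted sum of min-max-normalized score lists; uniform weights are only a default assumption here.

```python
import numpy as np

def combine_subword_scores(score_lists, weights=None):
    """score_lists: one score array per subword representation (e.g. syllable,
    triphone, sub-phonetic segment), all over the same candidate list.
    Returns their weighted linear combination after min-max normalization."""
    k = len(score_lists)
    weights = weights or [1.0 / k] * k
    combined = np.zeros(len(score_lists[0]))
    for w, scores in zip(weights, score_lists):
        s = np.asarray(scores, dtype=float)
        s = (s - s.min()) / (s.max() - s.min() + 1e-10)
        combined += w * s
    return combined
```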
11. Speech data retrieval system constructed on a universal phonetic code domain
- Author
-
H. Kojima, K. Tanaka, Yoshiaki Itoh, and N. Fujimura
- Subjects
Voice activity detection, Computer science, Speech recognition, Speech coding, Speech synthesis, Speech corpus, Speech processing, International Phonetic Alphabet, Speech analytics, Artificial intelligence, Natural language processing - Abstract
We propose a novel speech processing framework in which all speech data are encoded into universal phonetic code (UPC) sequences, and speech processing systems, such as speech recognition, retrieval, and digesting, are constructed on this UPC domain. As a first step, we introduce a sub-phonetic segment (SPS) set based on the IPA (International Phonetic Alphabet) to deal with multilingual speech, and develop a procedure to estimate acoustic models of the SPS from IPA-like phone models. The key point of the framework is to incorporate environment adaptation into the SPS encoding stage. This makes it possible to normalize acoustic variations and to extract the language factor contained in speech signals as encoded SPS sequences. We confirm these characteristics by constructing a speech retrieval system on the SPS domain. The system can retrieve key phrases, given by speech, from speech data recorded in different environments under a vocabulary-free condition. We show several preliminary experimental results on this system using Japanese and English sentence speech sets.
- Published
- 2005
- Full Text
- View/download PDF
12. A proposal of novel information architecture-open cooperative work space
- Author
-
T. Endo, Yoshiaki Itoh, S. Nagaya, Ryuichi Oka, and Jiro Kiyama
- Subjects
Collaborative software, Human–computer interaction, Computer science, Information architecture, Computer-supported cooperative work, Cooperative work, Information integration - Abstract
We propose a novel information integration architecture for a man-machine interface and a new CSCW system called the open cooperative work space (OCoWS) for its implementation. The architecture provides users with a cooperative work space in the style of an ordinary meeting, by using human sensing techniques as the system's input interface and by integrating their results at several levels. As a result, users can devote themselves to mutual understanding among meeting members without having to consciously direct the CSCW system.
- Published
- 2002
- Full Text
- View/download PDF
13. A proposal for a new algorithm of reference interval-free continuous DP for real-time speech or text retrieval
- Author
-
Ryuichi Oka, Jiro Kiyama, S. Seki, Yoshiaki Itoh, and H. Kojima
- Subjects
Identification, Computer science, Speech recognition, Pattern recognition, Mobile robot, Interval, Spotting, Algorithm, Utterance - Abstract
This paper proposes a new frame-synchronous algorithm for spotting similar intervals by comparing arbitrary intervals in a reference pattern sequence with arbitrary intervals in an input pattern sequence. The algorithm is called Reference Interval-Free Continuous DP (RIFCDP), and the experimental results show that RIFCDP successfully detects similar intervals between a reference pattern and an input. We have applied the algorithm to retrieval from a speech database and shown the possibility of real-time speech/text retrieval. The proposed algorithm offers a wide range of applications, such as digesting continuous speech by detecting duplicated input (utterances of the same word) and location identification for a mobile robot.
- Published
- 2002
- Full Text
- View/download PDF
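A toy local-alignment sketch that conveys the idea of spotting similar intervals between a reference and an input sequence, as in entry 13 above; it is not the frame-synchronous RIFCDP recurrence itself, and the bias and gap penalties are assumptions.

```python
import numpy as np

def spot_similar_interval(ref_feats, in_feats, bias=0.7, gap=0.5):
    """Toy local alignment over frame similarities: frames with cosine
    similarity above `bias` extend a path, dissimilar frames and gaps
    penalize it, and the cell with the highest accumulated score marks the
    end of the best-matching interval pair."""
    def cos(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

    R, I = len(ref_feats), len(in_feats)
    score = np.zeros((R + 1, I + 1))
    best, best_end = 0.0, (0, 0)
    for r in range(1, R + 1):
        for i in range(1, I + 1):
            s = cos(ref_feats[r - 1], in_feats[i - 1]) - bias
            score[r, i] = max(0.0,
                              score[r - 1, i - 1] + s,
                              score[r - 1, i] - gap,
                              score[r, i - 1] - gap)
            if score[r, i] > best:
                best, best_end = score[r, i], (r, i)
    return best, best_end  # a traceback from best_end recovers both intervals
```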
14. Sentence spotting applied to partial sentences and unknown words
- Author
-
Ryuichi Oka, Jiro Kiyama, and Yoshiaki Itoh
- Subjects
Sequence, Computer science, Speech recognition, Artificial intelligence, Spotting, Natural language processing, Sentence - Abstract
Itoh et al. (1993) developed a system that uses vector continuous dynamic programming (VCDP). This system works well for sentence spotting in spontaneous speech. Partial sentences intended to convey complete meaning, as well as words unknown to the system, appear quite often in spontaneous speech. The present authors have extended the sentence spotting algorithm so that it can accept partial sentences and detect unknown words. The paper proposes a method for spotting partial sentences by constructing networks to represent them, and for coping with unknown words by detecting the section containing the unknown word and generating an appropriate demi-phoneme sequence. The results show that spotting partial sentences attains the same level of performance as spotting complete sentences, and that unknown words can be detected reliably.
- Published
- 2002
- Full Text
- View/download PDF
15. A matching algorithm between arbitrary sections of two speech data sets for speech retrieval
- Author
-
Yoshiaki Itoh
- Subjects
Computer science, Speech recognition, Speech coding, Image retrieval, Natural language - Abstract
We propose a matching algorithm that retrieves speech information from a speech database using a spoken query and allows continuous input. The algorithm is called Shift Continuous DP (Shift CDP). Shift CDP extracts similar sections between two speech data sets, which serve as the reference pattern (regarded as the speech database) and the input speech, respectively. Shift CDP applies CDP to fixed-length unit reference patterns and provides a fast match between arbitrary sections of the reference pattern and the input speech. The algorithm accepts unbounded input and responds to the spoken query in real time. Experiments were conducted on conversational speech, and the results showed that Shift CDP successfully detects similar sections between arbitrary sections of the reference speech and arbitrary sections of the input speech. The method can be applied to all kinds of time-sequence data, such as moving images.
- Published
- 2002
- Full Text
- View/download PDF
16. Modeling of sequential control system with cyclic scan by Petri net
- Author
-
Iko Miyazawa, Yoshiaki Itoh, and Takashi Sekiguchi
- Subjects
Sequential control, Programming language, Computer science, Ladder logic, Programmable logic controller, Electrical and Electronic Engineering, Petri net, IEC standards, Industrial and Manufacturing Engineering - Abstract
We propose a method for modeling a ladder diagram (LD) with the cyclic scan of a programmable controller (PC) by Petri nets (PNs). First, some experiments are conducted to examine the relationship between the cyclic scan and the behavior of LDs. Second, PNs are used for qualitative modeling of the behavior of LDs based on the IEC 61131-3 PC programming languages standard. In particular, in addition to the LD Boolean logic expressions previously modeled by PNs, we consider PC control components such as the cyclic scan, peripherals, memories, and the scanning of ladder rungs. Finally, the experimental results are qualitatively explained by the PNs.
- Published
- 2002
- Full Text
- View/download PDF
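A minimal Petri net interpreter that fires transitions once per "scan", loosely mimicking a PLC's cyclic scan as discussed in entry 16 above; it is an illustration of the modeling idea only, not the paper's IEC 61131-3 ladder-diagram models, and the example net is made up.

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    inputs: list   # names of input places (arc weight 1)
    outputs: list  # names of output places (arc weight 1)

@dataclass
class PetriNet:
    marking: dict                                     # place name -> tokens
    transitions: dict = field(default_factory=dict)   # name -> Transition

    def scan(self):
        """One scan cycle: test and fire each transition once, in
        declaration order, like rungs evaluated top to bottom."""
        for t in self.transitions.values():
            if all(self.marking.get(p, 0) > 0 for p in t.inputs):
                for p in t.inputs:
                    self.marking[p] -= 1
                for p in t.outputs:
                    self.marking[p] = self.marking.get(p, 0) + 1

# Example: a start button energizes a motor coil on the next scan.
net = PetriNet(marking={"start_pressed": 1, "motor_off": 1, "motor_on": 0})
net.transitions["energize"] = Transition(inputs=["start_pressed", "motor_off"],
                                         outputs=["motor_on"])
net.scan()
print(net.marking)  # {'start_pressed': 0, 'motor_off': 0, 'motor_on': 1}
```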
17. Speech labeling and the most frequent phrase extraction using same section in a presentation speech
- Author
-
Kazuyo Tanaka and Yoshiaki Itoh
- Subjects
Phrase, Matching, Voice activity detection, Computer science, Speech recognition, Feature extraction, Phrase extraction, Presentation, Artificial intelligence, Natural language processing - Abstract
This paper discusses the possibility of speech labeling by utilizing identical sections, such as the same words or phrases, that are repeated in a speech. These identical sections are detected in a presentation speech. For this purpose, we propose a new efficient algorithm called Shift Continuous DP (Shift CDP), an extension of Continuous DP (CDP). Shift CDP realizes fast matching between arbitrary sections of the reference pattern and the input speech, and extracts similar sections frame-synchronously. The algorithm is extended and applied to extract the repeated sections in a presentation speech and to identify the most frequent phrase in the talk. Experiments were conducted on presentation speech, and the results showed that Shift CDP successfully detects similar sections and identifies the most frequent phrase in the presentation.
- Published
- 2002
- Full Text
- View/download PDF