Author: "Wang, Hsin-Min" / Database: Academic Search Index - Searchworks@Jio Institute Digital Library Search Results

19 results on '"Wang, Hsin-Min"'

1. Time-Series Linear Search for Video Copies Based on Compact Signature Manipulation and Containment Relation Modeling.

Author: Chiu, Chih-Yi and Wang, Hsin-Min
Subjects: *VIDEO recording piracy, *DIGITAL video editing, *TIME series analysis, *LINEAR statistical models, *HUMAN fingerprints, *COMPUTER algorithms, *DATA mining, *STREAMING video & television
Abstract: This paper presents a novel time-series linear search (TLS) method for detecting video copies. The method utilizes a sliding window to locate window sequences that are near-duplicates of a given query sequence. We address two issues of the conventional TLS method in order to strengthen its video copy detection capability. First, to accelerate the TLS process, we use a sequence-level signature as a compact representation of a video sequence based on the min-hash theory, and develop an efficient heap manipulation technique for fast generation of each window sequence's signature. Second, to improve the robustness of the TLS method, we use two techniques, namely, window length estimation and threshold transform, to resolve the containment relation problem caused by various types of video transformation and editing, such as frame cropping and speed change. The results of experiments on the MUSCLE-VCD-2007 dataset demonstrate that the proposed method is efficient and robust against different types of video transformation and editing. [ABSTRACT FROM PUBLISHER]
Published: 2010
Full Text: View/download PDF

2. Evolutionary minimization of the Rand index for speaker clustering

Author: Tsai, Wei-Ho and Wang, Hsin-Min
Subjects: *SPEECH perception, *CLUSTER analysis (Statistics), *GENETIC algorithms, *BAYESIAN analysis
Abstract: We propose an effective method for clustering unknown speech utterances based on their associated speakers. The method jointly optimizes the generated clusters and the required number of clusters by estimating and minimizing the Rand index. The metric reflects the clustering errors that arise when utterances from the same speaker are placed in different clusters; or when utterances from different speakers are placed in the same cluster. One useful characteristic of the Rand index is that its value only reaches the minimum when the number of clusters is equal to the size of the true speaker population. We approximate the Rand index by a function of the similarity measures between utterances and then use a genetic algorithm to determine the cluster in which each utterance should be located, such that the function is minimized. Our experiment results show that this novel speaker-clustering method outperforms conventional methods that use the Bayesian information criterion to determine the required number of clusters. [Copyright Elsevier]
Published: 2009
Full Text: View/download PDF

3. Automatic Identification of the Sung Language in Popular Music Recordings.

Author: Tsai, Wei-Ho and Wang, Hsin-Min
Subjects: *MUSIC data processing, *MUSIC & language, *INFORMATION retrieval, *POPULAR music, *SONG lyrics, *MUSICAL accompaniment
Abstract: As part of the research into content-based music information retrieval (MIR), this paper presents a preliminary attempt to automatically identify the language sung in popular music recordings. It is assumed that each language has its own set of constraints that specify the sequence of basic linguistic events when lyrics are sung. Thus, the acoustic structure of individual languages may be characterized by statistically modelling those constraints. To achieve this, the proposed method employs vector clustering to convert a singing signal from its spectrum-based feature representation into a sequence of smaller basic phonological units. The dynamic characteristics of the sequence are then analysed using bigram language models. As vector clustering is performed in an unsupervised manner, the resulting system does not need sophisticated linguistic knowledge; therefore, it is easily portable to new language sets. In addition, to eliminate interference from background music, we leverage the statistical estimation of the background musical accompaniment of a song so that the vector clustering truly reflects the solo singing voices in the accompanied signals. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

4. Content-based Language Models for Spoken Document Retrieval.

Author: Wang, Hsin-Min and Chen, Berlin
Subjects: *INFORMATION retrieval, *MULTIMEDIA systems, *SPEECH perception
Abstract: Spoken document retrieval (SDR) has been extensively studied in recent years because of its potential use in navigating large multimedia collections in the near future. This paper presents a novel concept of applying content-based language models to spoken document retrieval. In an example task for retrieval of Mandarin Chinese broadcast news data, the content-based language models, either trained on automatic transcriptions of spoken documents or adapted from baseline language models using automatic transcriptions of spoken documents, were used to create more accurate recognition results and indexing terms from both spoken documents and speech queries. We report on some interesting findings obtained in this research. [ABSTRACT FROM AUTHOR]
Published: 2001

5. Browsing the Chinese Web Pages Using Mandarin Speech.

Author: Wang, Hsin-Min, Chou, Yu-Hsueh, and Chen, Berlin
Subjects: *MANDARIN dialects, *COMPUTER interfaces
Abstract: A speech interface that allows easy access to information on the WWW has the potential to make the browser more friendly and powerful. This paper thus presents a working Mandarin speech interface that uses unconstrained Mandarin speech to control the WWW browser for conveniently browsing Chinese Web pages. The interface currently provides speakable commands, bookmarks, and links. The experimental results show that our approach, which specifically considers the characteristics of the Chinese language, is very effective, and very high accuracy can be achieved. [ABSTRACT FROM AUTHOR]
Published: 2000

6. Mapping potentially inappropriate medications in older adults using the Anatomical Therapeutic Chemical (ATC) classification system.

Author: Ndai, Asinamai, Al Bahou, Julie, Morris, Earl, Wang, Hsin‐Min, Marcum, Zach, Hung, Anna, Brandt, Nicole, Steinman, Michael A., and Vouri, Scott Martin
Subjects: *PROFESSIONAL practice, *FEE for service (Medical fees), *POLYPHARMACY, *SEROTONIN uptake inhibitors, *INAPPROPRIATE prescribing (Medicine), *MEDICAL protocols, *DRUGS, *DRUG prescribing, *DRUG utilization, *MEDICAID, *DRUG side effects, *PHYSICIAN practice patterns, *MEDICARE, *ELDER care, *OLD age
Abstract: Background: Potentially inappropriate medications (PIMs) in older adults are medications in which risks often outweigh benefits and are suggested to be avoided. Worldwide, many distinct guidelines and tools classify PIMs in older adults. Collating these guidelines and tools, mapping them to a medication classification system, and creating a crosswalk will enhance the utility of PIM guidance for research and clinical practice. Methods: We used the Anatomical Therapeutic Chemical (ATC) Classification System, a hierarchical classification system, to map PIMs from eight distinct guidelines and tools (2019 Beers Criteria, Screening Tool for Older Person's Appropriate Prescriptions [STOPP], STOPP‐Japan, German PRISCUS, European Union‐7 Potentially Inappropriate Medication [PIM] list, Centers for Medicare & Medicaid Services [CMS] High‐Risk Medication, Anticholinergic Burden Scale, and Drug Burden Index). Each PIM was mapped to ATC Level 5 (drug) and to ATC Level 4 (drug class). We then used the crosswalk (1) to compare PIMs and PIM drug classes across guidelines and tools to determine the number of PIMs that were index (drug‐induced adverse event) or marker (treatment of drug‐induced adverse event) drug of prescribing cascades, and (2) estimate the prevalence of PIM use in older adults continuously enrolled with fee‐for‐service Medicare in 2018 as use cases. Data visualization and descriptive statistics were used to assess guidelines and tools for both use cases. Results: Out of 480 unique PIMs identified, only three medications—amitriptyline, clomipramine, and imipramine and two drug classes—N06AA (tricyclic antidepressants) and N06AB (selective serotonin reuptake inhibitors), were noted in all eight guidelines and tools. Using the crosswalk, 50% of classes of index drugs and 47% of classes of marker drugs of known prescribing cascades were PIMs. Additionally, 88% of Medicare beneficiaries were dispensed ≥1 PIM across the eight guidelines and tools. 
Conclusion: We created a crosswalk of eight PIM guidelines and tools to the ATC classification system and created two use cases. Our findings could be used to expand the ease of PIM identification and harmonization for research and clinical practice purposes. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Generalized k-Labelsets Ensemble for Multi-Label and Cost-Sensitive Classification.

Author: Lo, Hung-Yi, Lin, Shou-De, and Wang, Hsin-Min
Subjects: *MACHINE learning, *TAGS (Metadata), *HYPERGRAPHS, *LEARNING classifier systems, *STATISTICAL ensembles, *INFORMATION technology
Abstract: Label powerset (LP) method is one category of multi-label learning algorithm. This paper presents a basis expansions model for multi-label classification, where a basis function is an LP classifier trained on a random k-labelset. The expansion coefficients are learned to minimize the global error between the prediction and the ground truth. We derive an analytic solution to learn the coefficients efficiently. We further extend this model to handle the cost-sensitive multi-label classification problem, and apply it in social tagging to handle the issue of the noisy training set by treating the tag counts as the misclassification costs. We have conducted experiments on several benchmark datasets and compared our method with other state-of-the-art multi-label learning methods. Experimental results on both multi-label classification and cost-sensitive social tagging demonstrate that our method has better performance than other methods. [ABSTRACT FROM PUBLISHER]
Published: 2014
Full Text: View/download PDF

8. Improving GMM–UBM speaker verification using discriminative feedback adaptation

Author: Chao, Yi-Hsiang, Tsai, Wei-Ho, and Wang, Hsin-Min
Subjects: *ENGINEERING databases, *DIALOG (Information retrieval system), *INFORMATION storage & retrieval systems, *SYSTEMS design
Abstract: The Gaussian mixture model – Universal background model (GMM–UBM) system is one of the predominant approaches for text-independent speaker verification, because both the target speaker model and the impostor model (UBM) have generalization ability to handle “unseen” acoustic patterns. However, since GMM–UBM uses a common anti-model, namely UBM, for all target speakers, it tends to be weak in rejecting impostors’ voices that are similar to the target speaker’s voice. To overcome this limitation, we propose a discriminative feedback adaptation (DFA) framework that reinforces the discriminability between the target speaker model and the anti-model, while preserving the generalization ability of the GMM–UBM approach. This is achieved by adapting the UBM to a target speaker dependent anti-model based on a minimum verification squared-error criterion, rather than estimating the model from scratch by applying the conventional discriminative training schemes. The results of experiments conducted on the NIST2001-SRE database show that DFA substantially improves the performance of the conventional GMM–UBM approach. [Copyright Elsevier]
Published: 2009
Full Text: View/download PDF

9. Improving the characterization of the alternative hypothesis via minimum verification error training with applications to speaker verification

Author: Chao, Yi-Hsiang, Tsai, Wei-Ho, Wang, Hsin-Min, and Chang, Ruei-Chuan
Subjects: *STATISTICAL hypothesis testing, *AUTOMATIC speech recognition, *PATTERN perception, *HYPOTHESIS, *GENETIC algorithms, *MACHINE learning
Abstract: Speaker verification is usually formulated as a statistical hypothesis testing problem and solved by a log-likelihood ratio (LLR) test. A speaker verification system’s performance is highly dependent on modeling the target speaker’s voice (the null hypothesis) and characterizing non-target speakers’ voices (the alternative hypothesis). However, since the alternative hypothesis involves unknown impostors, it is usually difficult to characterize a priori. In this paper, we propose a framework to better characterize the alternative hypothesis with the goal of optimally distinguishing the target speaker from impostors. The proposed framework is built on a weighted arithmetic combination (WAC) or a weighted geometric combination (WGC) of useful information extracted from a set of pre-trained background models. The parameters associated with WAC or WGC are then optimized using two discriminative training methods, namely, the minimum verification error (MVE) training method and the proposed evolutionary MVE (EMVE) training method, such that both the false acceptance probability and the false rejection probability are minimized. Our experiment results show that the proposed framework outperforms conventional LLR-based approaches. [Copyright Elsevier]
Published: 2009
Full Text: View/download PDF

10. Quadriceps muscle volume positively contributes to ACL volume.

Author: Shultz, Sandra J., Schmitz, Randy J., Kulas, Anthony S., Labban, Jeffrey D., and Wang, Hsin‐Min
Subjects: *QUADRICEPS muscle, *ANTERIOR cruciate ligament, *MUSCLE mass, *THIGH, *HAMSTRING muscle, *MAGNETIC resonance imaging
Abstract: Females have smaller anterior cruciate ligaments (ACLs) than males and smaller ACLs have been associated with a greater risk of ACL injury. Overall body dimensions do not adequately explain these sex differences. This study examined the extent to which quadriceps muscle volume (VOLQUAD) positively predicts ACL volume (VOLACL) once sex and other body dimensions were accounted for. Physically active males (N = 10) and females (N = 10) were measured for height, weight, and body mass index (BMI). Three‐Tesla magnetic resonance images of their dominant and nondominant thigh and knee were then obtained to measure VOLACL, quadriceps, and hamstring muscle volumes, femoral notch width, and femoral notch width index. Separate three‐step regressions estimated associations between VOLQUAD and VOLACL (third step), after controlling for sex (first step) and one body dimension (second step). When controlling for sex and sex plus BMI, VOLHAM, notch width, or notch width index, VOLQUAD consistently exhibited a positive association with VOLACL in the dominant leg, nondominant leg, and leg‐averaged models (p < 0.05). Findings were inconsistent when controlling for sex and height (p = 0.038–0.102). Once VOLQUAD was included, only notch width and notch width index retained a statistically significant individual association with VOLACL (p < 0.01). Statement of Clinical Significance: The positive association between VOLQUAD and VOLACL suggests ACL size may in part be modifiable. Future studies are needed to determine the extent to which an appropriate training stimulus (focused on optimizing overall lower extremity muscle mass development) can positively impact ACL size and structure in young females. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

11. Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics.

Author: Tsai, Wei-Ho, Rodgers, Dwight, and Wang, Hsin-Min
Subjects: *COMPUTER music, *SINGERS, *POPULAR music, *VOCAL music, *ELECTRONIC music
Abstract: This study examined the feasibility of unsupervised clustering of music data based on their associated singer. The music data used in this study consisted of 416 tracks from Mandarin pop music compact discs. Most of the tracks were lyrical ballads, but around ten percent were folk-like songs accompanied by a disco beat, and another ten percent were essentially stylistic imitations of Western pop blended with hip hop and rock. The average length of the tracks was about three minutes. It has been shown that the characteristics of a singer's voice can be extracted from music via vocal segment detection followed by solo-vocal signal modeling. Singer-based clustering was formulated and solved using a vector-clustering framework with reliable estimation of the correct number of clusters. Although fairly good results have been reported in this article, more work is needed to validate the proposed methods for a wider variety of music data, such as larger singer populations and richer songs with different music styles. Furthermore, future work for singer-based clustering will extend the current system to handle duets, chorus, background vocals, or other music data with multiple simultaneous or non-simultaneous singers.
Published: 2004
Full Text: View/download PDF

12. A hierarchical tag-graph search scheme with layered grammar rules for spontaneous speech understanding

Author: Lin, Bor-shen, Chen, Berlin, Wang, Hsin-min, and Lee, Lin-shan
Subjects: *SPEECH, *COMPUTER systems
Abstract: It has always been difficult for language understanding systems to handle spontaneous speech with satisfactory robustness, primarily due to such problems as fragments, disfluencies, out-of-vocabulary words, and ill-formed sentence structures. Also, the search schemes used are usually not flexible enough in accepting different input linguistic units, and great efforts are therefore required when they are used with different acoustic front ends in different tasks, especially in multi-modal and multi-lingual systems. In this paper, a new hierarchical tag-graph-based search scheme for spontaneous speech understanding is proposed. This scheme is based on a layered hierarchy of grammar rules, and therefore can integrate all the statistical and rule-based knowledge including acoustic scores, language model scores and grammar rules into the search process. More robust speech understanding is thus achievable. In addition, this scheme can accept graphs of different linguistic units such as phonemes, syllables, characters, words, spotted keywords, or phrases as the input; it is thus compatible with different acoustic front ends, and multi-modal and multi-lingual applications can be easily developed. This search scheme has been successfully applied to a multi-domain, multi-modal dialogue system. [Copyright Elsevier]
Published: 2002
Full Text: View/download PDF

13. Syllable-Based Chinese Text/Spoken Document Retrieval Using Text/Speech Queries.

Author: Bai, Bo-ren, Chen, Berlin, and Wang, Hsin-min
Subjects: *QUERY languages (Computer science), *INFORMATION retrieval
Abstract: In light of the rapid growth of Chinese information resources on the Internet, this study investigates a novel approach that deals with the problem of Chinese text and spoken document retrieval using both text and speech queries. By properly utilizing the monosyllabic structure of the Chinese language, the proposed approach estimates the statistical similarity between the text/speech queries and the text/spoken documents at the phonetic level using the syllable-based statistical information. The investigation successfully implemented a prototype system with an interface supporting some user-friendly functions and the initial test results demonstrate the feasibility of the proposed approach. [ABSTRACT FROM AUTHOR]
Published: 2000
Full Text: View/download PDF

14. Bilateral quadriceps and hamstrings muscle volume asymmetries in healthy individuals.

Author: Kulas, Anthony S., Schmitz, Randy J., Shultz, Sandra J., Waxman, Justin P., Wang, Hsin‐min, Kraft, Robert A., and Partington, Heath S.
Subjects: *QUADRICEPS muscle physiology, *HAMSTRING muscle, *MAGNETIC resonance imaging, *ANTERIOR cruciate ligament injuries, *ANATOMY
Abstract: Determining the magnitude of quadriceps and hamstring muscle volume asymmetries in healthy individuals is a critical first step toward interpreting asymmetries as compensatory or abnormal in pathological populations. The purpose of this study was to determine the magnitude of whole and individual muscle volume asymmetries, quantified as right–left volume differences, for the quadriceps and hamstring muscles in a young and healthy population. Twenty‐one healthy individuals participated: Eleven females age = 22.6 ± 2.9 years and 10 males age = 23.2 ± 3.4 years. Whole muscle group and individual muscle volume asymmetries were quantified within the context of absolute measurement error using a 95% Limits of Agreement approach. Mean muscle asymmetries ranged from −3.0 to 6.0% for all individual and whole muscle groups. Whole muscle group 95% limits of agreements represented ±11.4% and ±8.8% volume asymmetries for the hamstrings and quadriceps, respectively. Individual muscle asymmetry 95% limits of agreements ranged from approximately ±11–13% for the vastii muscles while the biceps femoris short‐head (±33.5%), long‐head (±20.9%), and the rectus femoris (±21.4%) displayed the highest relative individual asymmetries. Individual muscle asymmetries exceeded absolute measurement error in 70% of all cases, with 26% of all cases exceeding 10% asymmetry. Although whole muscle group asymmetries appear to be near the 10% assumed clinical threshold of normality, the greater magnitude of individual muscle asymmetries highlights the subject‐ and muscle‐specific variability in volume asymmetry. Future research is warranted to determine if volume asymmetry thresholds exist that discriminate between healthy and pathological populations. Statement of Clinical Significance: Muscle volume asymmetries displayed in healthy individuals provide a reference for interpreting asymmetries in pathological populations. © 2017 Orthopaedic Research Society. Published by Wiley Periodicals, Inc. J Orthop Res 36:963–970, 2018. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

15. A Rotational Actuator Using a Thermomagnetic-Induced Magnetic Force Interaction.

Author: Cheng, Chih-Cheng, Chung, Tien-Kan, Chen, Chin-Chung, and Wang, Hsin-Min
Subjects: *GADOLINIUM compounds, *STAINLESS steel, *THERMOELECTRIC generators, *MAGNETISM, *TEMPERATURE distribution, *MAGNETIC actuators
Abstract: In this paper, we demonstrate a rotational actuator using a thermomagnetic-induced magnetic force interaction. The actuator consists of a magnetic rotary beam, stainless-steel bearing, mechanical frame, thermomagnetic Gadolinium sheets, and thermoelectric generators (TEGs). Experimental results show that applying a sequence of currents to the TEGs successfully produces sequential magnetic forces. Consequently, these sequential magnetic forces rotate the beam for revolutions. When applying a sequence set of currents of −0.5 and 1.3 A, the maximum rotation speed and maximum stall torque of the actuator is 3.81 rpm and 136.2 μNm, respectively. Most importantly, the operating temperatures of other thermomagnetic (and electrothermal) actuators are usually high, but the operating temperature of our actuator is approximately room temperature (13 °C–27 °C). Therefore, our actuators have more practical applications. According to the above-mentioned features, we believe our actuator is an important alternative approach to developing future rotational actuators and motors. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

16. Exploring the use of unsupervised query modeling techniques for speech recognition and summarization.

Author: Chen, Kuan-Yu, Liu, Shih-Hung, Chen, Berlin, Wang, Hsin-Min, and Chen, Hsin-Hsi
Subjects: *SPEECH perception, *INFORMATION retrieval, *QUERY (Information retrieval system), *SEARCH algorithms, *MANIFOLDS (Mathematics), *LANGUAGE & languages, *QUERY languages (Computer science), *MATHEMATICAL models
Abstract: Statistical language modeling (LM) that intends to quantify the acceptability of a given piece of text has long been an interesting yet challenging research area. In particular, language modeling for information retrieval (IR) has enjoyed remarkable empirical success; one emerging stream of the LM approach for IR is to employ the pseudo-relevance feedback process to enhance the representation of an input query so as to improve retrieval effectiveness. This paper presents a continuation of such a general line of research and the major contributions are three-fold. First, we propose a principled framework which can unify the relationships among several widely-cited query modeling formulations. Second, on top of this successfully developed framework, two extensions have been proposed. On one hand, we present an extended query modeling formulation by incorporating critical query-specific information cues to guide the model estimation. On the other hand, a word-based relevance modeling has also been leveraged to overcome the obstacle of time-consuming model estimation when the framework is being utilized for practical applications. In addition, we further adopt and formalize such a framework to the speech recognition and summarization tasks. A series of experiments reveal the empirical potential of such an LM framework and the performance merits of the deduced models on these two tasks. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

17. Fluent speech prosody: Framework and modeling

Author: Tseng, Chiu-yu, Pin, Shao-huang, Lee, Yehlin, Wang, Hsin-min, and Chen, Yong-cheng
Subjects: *SPEECH, *INTONATION (Phonetics), *VERSIFICATION, *RHYTHM
Abstract: The prosody of fluent connected speech is much more complicated than concatenating individual sentence intonations into strings. We analyzed speech corpora of read Mandarin Chinese discourses from a top–down perspective on perceived units and boundaries, and consistently identified speech paragraphs of multiple phrases that reflected discourse rather than sentence effects in fluent speech. Subsequent cross-speaker and cross-speaking-rate acoustic analyses of identified speech paragraphs revealed systematic cross-phrase prosodic patterns in every acoustic parameter, namely, F0 contours, duration adjustment, intensity patterns, and in addition, boundary breaks. We therefore argue for a higher prosodic node that governs, constrains, and groups phrases to derive speech paragraphs. A hierarchical multi-phrase framework is constructed to account for the governing effect, with complementary production and perceptual evidence. We show how cross-phrase F0 and syllable duration pattern templates are derived to account for the tune and rhythm characteristic to fluent speech prosody, and argue for a prosody framework that specifies phrasal intonations as subjacent sister constituents subject to higher terms. Output fluent speech prosody is thus the cumulative result of contributions from every prosodic layer. To test our framework, we further construct a modular prosody model of multiple-phrase grouping with four corresponding acoustic modules and begin testing the model with speech synthesis. To conclude, we argue that any prosody framework of fluent speech should include prosodic contributions above individual sentences in production, with considerations of its perceptual effects to on-line processing; and development of unlimited TTS could benefit most appreciably by capturing and including cross-phrase relationships in prosody modeling. [Copyright Elsevier]
Published: 2005
Full Text: View/download PDF

18. Mandarin–English Information (MEI): investigating translingual speech retrieval

Author: Meng, Helen M., Chen, Berlin, Khudanpur, Sanjeev, Levow, Gina-Anne, Lo, Wai-Kit, Oard, Douglas, Schone, Patrick, Tang, Karen, Wang, Hsin-min, and Wang, Jianqiang
Subjects: *INTELLIGIBILITY of speech, *CROSS-language information retrieval, *INFORMATION retrieval, *CHINESE people
Abstract: This paper describes the Mandarin–English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English–Chinese CL-SDR systems. Our system accepts an entire English news story (text) as query, and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to our problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by means of a dictionary-based approach, where we have integrated phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks – multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that the use of phrase-based translation and subword translation gave performance gains, and multi-scale retrieval outperforms word-based retrieval. [Copyright Elsevier]
Published: 2004
Full Text: View/download PDF

19. ASVspoof 2019: A large-scale public database of synthetized, converted and replayed speech.

Author: Wang, Xin, Yamagishi, Junichi, Todisco, Massimiliano, Delgado, Héctor, Nautsch, Andreas, Evans, Nicholas, Sahidullah, Md, Vestman, Ville, Kinnunen, Tomi, Lee, Kong Aik, Juvela, Lauri, Alku, Paavo, Peng, Yu-Huai, Hwang, Hsin-Te, Tsao, Yu, Wang, Hsin-Min, Maguer, Sébastien Le, Becker, Markus, Henderson, Fergus, and Clark, Rob
Subjects: *PHISHING prevention, *AUTOMATIC speech recognition, *DATABASES, *HUMAN voice, *SPEECH synthesis
Abstract: • We describe the protocol and design of the ASVspoof Challenge 2019 database. • We detail the speech synthesis and voice conversion algorithms used in the database. • We detail the carefully controlled simulation to generate replay spoofing speech. • We evaluate baseline countermeasure and ASV systems on the database. • Human assessment found that one spoofing system can fool human listeners. Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, just like all other biometric systems, ASV is vulnerable to spoofing, also referred to as "presentation attacks." These vulnerabilities are generally unacceptable and call for spoofing countermeasures or "presentation attack detection" systems. In addition to impersonation, ASV systems are vulnerable to replay, speech synthesis, and voice conversion attacks. The ASVspoof challenge initiative was created to foster research on anti-spoofing and to provide common platforms for the assessment and comparison of spoofing countermeasures. The first edition, ASVspoof 2015, focused upon the study of countermeasures for detecting text-to-speech synthesis (TTS) and voice conversion (VC) attacks. The second edition, ASVspoof 2017, focused instead upon replay spoofing attacks and countermeasures. The ASVspoof 2019 edition is the first to consider all three spoofing attack types within a single challenge. While they originate from the same source database and same underlying protocol, they are explored in two specific use case scenarios. Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques. Replay spoofing attacks within a physical access (PA) scenario are generated through carefully controlled simulations that support much more revealing analysis than possible previously. Also new to the 2019 edition is the use of the tandem detection cost function metric, which reflects the impact of spoofing and countermeasures on the reliability of a fixed ASV system. This paper describes the database design, protocol, spoofing attack implementations, and baseline ASV and countermeasure results. It also describes a human assessment on spoofed data in logical access. It was demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona fide utterances even by human subjects. It is expected that the ASVspoof 2019 database, with its varied coverage of different types of spoofing data, could further foster research on anti-spoofing. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
