16 results for "VOICEPRINTS"
Search Results
2. An MFCC-based text-independent speaker identification system for access control.
- Author
- Liu, Jung‐Chun, Leu, Fang‐Yie, Lin, Guan‐Liang, and Susanto, Heru
- Subjects
- COMPUTER access control, VOICEPRINTS, PATTERN recognition systems, FOURIER transforms, GAUSSIAN mixture models, ACOUSTIC models
- Abstract
In recent years, by merit of their convenient and unique features, bio-authentication techniques have been applied to identify and authenticate a person based on his/her spoken words and/or sentences. Among these techniques, speaker recognition/identification is the most convenient one, providing a secure and strong authentication solution viable for a wide range of applications. In this paper, to safeguard real-world objects, like buildings, we develop a speaker identification system named the mel frequency cepstral coefficients (MFCC)-based speaker identification system for access control (MSIAC for short), which identifies a speaker U by first collecting U's voice signals and converting the signals to the frequency domain. An MFCC-based human auditory filtering model is utilized to adjust the energy levels of different frequencies as U's quantified voice features. Next, a Gaussian mixture model is employed to represent the distribution of the logarithmic features as U's specific acoustic model. When a person, e.g., x, would like to access a real-world object protected by the MSIAC, x's acoustic model is compared with known people's acoustic models. Based on the identification result, the MSIAC determines whether the access is accepted or denied. [ABSTRACT FROM AUTHOR]
- Published
- 2018
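The MFCC front end and GMM back end described in this abstract can be sketched compactly. This is a minimal illustration under stated assumptions, not the MSIAC implementation: the frame and filterbank sizes, the 4-component diagonal GMM, and the use of scikit-learn's `GaussianMixture` are all placeholder choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def mfcc(signal, sr, n_fft=512, hop=256, n_filters=26, n_ceps=13):
    # Frame, window, power spectrum, mel-warp, log, then DCT-II
    # (done as a matrix product to avoid extra dependencies).
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    power = np.abs(np.fft.rfft(frames * np.hamming(n_fft), n_fft)) ** 2
    logmel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return logmel @ dct.T

def enroll(speakers, sr):
    # One diagonal-covariance GMM per enrolled speaker.
    return {name: GaussianMixture(4, covariance_type="diag", random_state=0)
            .fit(mfcc(sig, sr)) for name, sig in speakers.items()}

def identify(models, signal, sr):
    # Accept the enrolled model with the highest average log-likelihood.
    feats = mfcc(signal, sr)
    return max(models, key=lambda name: models[name].score(feats))
```

An access-control layer like MSIAC would then also compare the winning score against a threshold before granting entry.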
3. Automatic speaker verification on narrowband and wideband lossy coded clean speech.
- Author
- Jarina, Roman, Polacký, Jozef, Počta, Peter, and Chmulík, Michal
- Subjects
- BIOMETRIC identification, VOICEPRINTS, CODECS, COMPRESSED speech, VOICE mail systems
- Abstract
Substantial progress has been achieved in voice-based biometrics in recent times, but a variety of challenges still remain for the speech research community. One such obstacle is reliable speaker authentication from speech signals degraded by lossy compression. Compression is commonplace in modern telecommunications, such as mobile telephony, VoIP services, teleconferencing, voice messaging, or gaming. In this study, the authors investigate the effect of lossy speech compression on text-independent speaker verification. Voice biometrics performance is evaluated on clean speech signals distorted by state-of-the-art narrowband (NB) as well as wideband (WB) speech codecs. The tests are performed in both channel-matched and channel-mismatched scenarios. The test results show that coded WB speech improves voice authentication precision by 1-3% in equal error rate over coded NB speech, even at the lowest investigated bitrates. It is also shown that the enhanced voice services codec does not provide better results than the other codecs involved in this study. [ABSTRACT FROM AUTHOR]
- Published
- 2017
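Equal error rate (EER), the metric used above, is the operating point where the false-accept and false-reject rates coincide. A small sketch of computing it from verification scores; the score arrays in the test are made up for illustration and are not tied to the paper's codecs or data:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    # Sweep the threshold over all observed scores; the EER is read
    # off where the false-accept rate (impostors accepted) and the
    # false-reject rate (genuine trials rejected) cross.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= s).mean() for s in thresholds])
    frr = np.array([(genuine < s).mean() for s in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```

On real systems the threshold grid is usually finer (or a ROC convex hull is used), but the idea is the same.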
4. Speaker Recognition Using Wavelet Packet Entropy, I-Vector, and Cosine Distance Scoring.
- Author
- Lei, Lei and Kun, She
- Subjects
- VOICEPRINTS, WAVELETS (Mathematics), ENTROPY (Information theory), COSINE function, VECTOR analysis
- Abstract
Today, more and more people have benefited from speaker recognition. However, the accuracy of speaker recognition often drops off rapidly because of low-quality speech and noise. This paper proposes a new speaker recognition model based on wavelet packet entropy (WPE), i-vector, and cosine distance scoring (CDS). In the proposed model, WPE transforms the speech into short-term spectral feature vectors (short vectors) and resists noise. An i-vector is generated from those short vectors and characterizes the speech to improve recognition accuracy. CDS quickly compares the difference between two i-vectors to produce the recognition result. The proposed model is evaluated on the TIMIT speech database. The experimental results show that the proposed model obtains good performance in clean and noisy environments and is insensitive to low-quality speech, but the time cost of the model is high. To reduce the time cost, parallel computation is used. [ABSTRACT FROM AUTHOR]
- Published
- 2017
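Cosine distance scoring (CDS) itself is simple: length-normalise the two i-vectors and take their dot product as the verification score. A hedged sketch, where the 0.5 decision threshold and the toy vectors in the test are arbitrary placeholders rather than values from the paper:

```python
import numpy as np

def cds(ivec_test, ivec_enrolled, threshold=0.5):
    # Cosine similarity between length-normalised i-vectors;
    # accept when the score clears the decision threshold.
    a = ivec_test / np.linalg.norm(ivec_test)
    b = ivec_enrolled / np.linalg.norm(ivec_enrolled)
    score = float(a @ b)
    return score, score >= threshold
```

In practice the threshold is calibrated on held-out trials (e.g. at the EER operating point).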
5. Speaker Recognition Using Wavelet Cepstral Coefficient, I-Vector, and Cosine Distance Scoring and Its Application for Forensics.
- Author
- Lei, Lei and Kun, She
- Subjects
- VOICEPRINTS, CEPSTRUM analysis (Mechanics), FORENSIC sciences, COSINE function, COEFFICIENTS (Statistics), WAVELETS (Mathematics), DISCRIMINANT analysis
- Abstract
An important application of speaker recognition is forensics. However, the accuracy of speaker recognition in forensic cases often drops off rapidly because of the ill effects of ambient noise, variable channels, different durations of speech data, and so on. Therefore, finding a robust speaker recognition model is very important for forensics. This paper builds a new speaker recognition model based on the wavelet cepstral coefficient (WCC), i-vector, and cosine distance scoring (CDS). The model first uses the WCC to transform the speech into spectral feature vectors and then uses those spectral feature vectors to train the i-vectors that represent speeches of different durations. CDS is used to compare the i-vectors to produce the evidence. Moreover, linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) are added to the CDS algorithm to deal with the channel variability problem. Finally, the likelihood ratio estimates the strength of the evidence. We use the TIMIT database to evaluate the performance of the proposed model. The experimental results show that the proposed model can effectively handle the difficulties of the forensic scenario, but the time cost of the method is high. [ABSTRACT FROM AUTHOR]
- Published
- 2016
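Of the components named above, within-class covariance normalization (WCCN) is easy to sketch: estimate the average per-speaker covariance of the i-vectors and take the Cholesky factor of its inverse as the scoring projection. The i-vector dimensionality, the small ridge term, and the synthetic data in the test are assumptions of this illustration, not the paper's configuration:

```python
import numpy as np

def wccn_transform(ivectors, labels):
    # Average the per-speaker (within-class) covariances, then return
    # B = chol(W^-1); scoring is done on the projected vectors X @ B,
    # whose within-class covariance becomes the identity.
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        W += np.cov(ivectors[labels == c].T, bias=True)
    W /= len(classes)
    # Tiny ridge keeps the inverse numerically safe.
    return np.linalg.cholesky(np.linalg.inv(W + 1e-6 * np.eye(dim)))
```

After projection, cosine scoring is much less affected by directions that vary within a single speaker (e.g. channel effects).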
6. An Approach to Speaker Identification.
- Author
- Hollien, Harry
- Subjects
- VOICEPRINTS, SPEECH processing systems, FORENSIC phonetics, AUTOMATIC speech recognition, NATURAL language processing
- Abstract
This presentation will provide standards upon which any attempts to meet the challenge of identifying speakers by voice should be based. It is organized into a model based on (i) application of a rigorous research program validating the system, (ii) an upgrading of the organization of the SI area, and (iii) exploitation of new technology. The second part of the presentation will describe an illustrative speech/voice approach to SI development. This effort is also based on an extensive corpus of research. It is suggested that application of the cited standards, plus the illustrative model, will permit reasonable progress to be made. Finally, a number of procedural recommendations are made; they should enhance the efficacy of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2016
7. Biometrics.
- Author
- Caelli, William J.
- Subjects
- BIOMETRY, IDENTIFICATION, PERSONAL identification numbers, IDENTITY (Psychology), DNA fingerprinting, SMART cards, VOICEPRINTS, AUTOMATION
- Abstract
The term "biometrics" is defined in the USA's NCSS Instruction 4009, an IT assurance glossary of terms, as follows: "Biometrics: Automated methods of authenticating or verifying an individual based upon a physical or behavioural characteristic." The term relates to the original definition, which emerged in the 1970s, of three ways to verify a claim of identity. These are proof of identity claimed by: • What you know, such as a password, personal identification number (PIN), etc. • What you possess, such as a smart card, mechanical/electronic token, etc. and/or • What you are, such as a fingerprint, eye retinal pattern, voiceprint, DNA pattern, etc. [ABSTRACT FROM AUTHOR]
- Published
- 2005
8. A Hierarchical Framework Approach for Voice Activity Detection and Speech Enhancement.
- Author
- Yan Zhang, Zhen-min Tang, Yan-ping Li, and Yang Luo
- Subjects
- SPEECH perception, VOICEPRINTS, WIENER filters (Signal processing), NOISE control, FEATURE selection
- Abstract
Accurate and effective voice activity detection (VAD) is a fundamental step for robust speech or speaker recognition. In this study, we propose a hierarchical framework approach for VAD and speech enhancement. The modified Wiener filter (MWF) approach is utilized for noise reduction in the speech enhancement block. In the feature selection and voting block, several discriminating features are employed in a voting paradigm chosen for reliability and discriminative power. The effectiveness of the proposed approach is evaluated against other VAD techniques using two well-known databases, namely the TIMIT and NOISEX-92 databases. Experimental results show that the proposed method performs well under a variety of noisy conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2014
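As a rough illustration of the two blocks described above (not the authors' modified Wiener filter or their feature-voting scheme), one can apply the classic Wiener gain SNR/(1+SNR) per spectral bin, with the noise spectrum assumed known, and a simple energy-threshold VAD; the 2x-median threshold is a placeholder assumption:

```python
import numpy as np

def wiener_enhance(frames_power, noise_power):
    # A-posteriori SNR estimated naively from the power spectrum;
    # the gain G = SNR / (1 + SNR) attenuates noise-dominated bins.
    snr = np.maximum(frames_power / noise_power - 1.0, 0.0)
    gain = snr / (1.0 + snr)
    return gain * frames_power

def energy_vad(frames_power, ratio=2.0):
    # Flag a frame as speech when its energy exceeds `ratio` times
    # the median frame energy (assumes speech is a minority of frames).
    energy = frames_power.sum(axis=1)
    return energy > ratio * np.median(energy)
```

A hierarchical system would run the enhancement first and then vote over several such features rather than thresholding raw energy alone.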
9. Cost-Sensitive Learning for Emotion Robust Speaker Recognition.
- Author
- Dongdong Li, Yingchun Yang, and Weihui Dai
- Subjects
- VOICEPRINTS, MACHINE learning, INFORMATION technology security, BIOMETRIC identification, PROBABILITY theory
- Abstract
In the field of information security, voice is one of the most important modalities in biometrics. In particular, with the development of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technology, a new architecture of the recognition system and its components is proposed in this paper. An experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% in identification rate over traditional speaker recognition is achieved. [ABSTRACT FROM AUTHOR]
- Published
- 2014
10. Mobile user authentication system in cloud environment.
- Author
- Yeh, Her-Tyan, Chen, Bing-Chang, and Wu, Yi-Cong
- Subjects
- INTERNET security, COMPUTER passwords, CAPTCHA (Challenge-response test), VISUAL cryptography, SMARTPHONES, VOICEPRINTS
- Abstract
To provide an environment that can be used safely on the Internet and to guard against phishing attacks, the system integrates several features, including one-time passwords, a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), voiceprint identification as a biometric, and visual cryptography, into a scheme in which users do not need to remember any accounts or passwords when they access the Internet through mobile devices; it targets smartphones and the cloud. The scheme addresses the problems of rampant phishing and of password management. Technically, on one hand it uses PIN-based visual passwords on the cell phone to improve account security; on the other hand, it uses voiceprint identification so that the system center can verify the user's identity rather than only the mobile device, closing a gap in mobile-device authentication. The voiceprint captured at login is then used to produce a one-time password, which lowers the risk of the account and passwords being phished. Through the framework of this research, cell phones are protected against loss and misuse, accounts and passwords are protected from phishing, the problem of users forgetting accounts and passwords is solved, and the operational burden on cell phones is reduced. It also prevents cloud servers from incurring many malicious registrations and logins, keeping them working efficiently. Copyright © 2012 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2013
11. Passive Acoustic Source Tracking Using Underwater Distributed Sensors.
- Author
- Seung-Yong Chun and Ki-Man Kim
- Subjects
- DISTRIBUTED sensors, SPECTROGRAMS, SOUND spectrography, VOICEPRINTS, WAVEGUIDES, COAXIAL waveguides, WAVEGUIDE theory
- Abstract
Passive acoustic source tracking using underwater distributed sensors is a challenging problem because of the complexity of the underwater channel and the limited resources of the sensors. In this paper, we propose an acoustic source tracking algorithm using underwater distributed sensors. According to waveguide invariant theory, the slope of the interference pattern seen in a spectrogram is proportional to the range of the acoustic source. The proposed algorithm matches the interference patterns at the distributed sensors and calculates a distance ratio between the source and the sensors. A locus of the source is then estimated by the principle of the circle of Apollonius. The Apollonius circle, however, leaves an ambiguity about the correct source location. Therefore, a hyperbola equation is additionally introduced into the localization algorithm by estimating the time difference of arrival between the signals received at the distributed sensors. Finally, the intersection of the circle and the hyperbola is estimated as the position of the acoustic source. The proposed algorithm is tested on sea trial data for acoustic source ranges of 400-2,000 m and frequencies from 50 to 750 Hz. The results show that the proposed algorithm successfully estimates the source location within an error bound of 7.3%. [ABSTRACT FROM AUTHOR]
- Published
- 2013
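The circle of Apollonius used above has a closed form: for sensors s1 and s2 and distance ratio k = |P-s1|/|P-s2| (k != 1), expanding |P-s1|² = k²|P-s2|² gives a circle whose centre and radius follow directly. A sketch under those definitions; the sensor coordinates in the test are made up:

```python
import numpy as np

def apollonius_circle(s1, s2, k):
    # Locus of points P with |P - s1| / |P - s2| = k (k != 1):
    # centre c = (s1 - k^2 s2) / (1 - k^2),
    # radius^2 = c.c - (s1.s1 - k^2 s2.s2) / (1 - k^2).
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    c = (s1 - k * k * s2) / (1.0 - k * k)
    r2 = c @ c - (s1 @ s1 - k * k * (s2 @ s2)) / (1.0 - k * k)
    return c, np.sqrt(r2)
```

Intersecting this circle with the TDOA hyperbola, as the paper describes, resolves the remaining positional ambiguity.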
12. Multimedia mug books: how multi should the media be?
- Author
- McAllister, Hunter A., Blair, Mark J., Cerone, Laura G., and Laurent, Mark J.
- Subjects
- COGNITIVE psychology, EYEWITNESS identification, MULTIMEDIA systems, VOICEPRINTS, IDENTIFICATION
- Abstract
The impact of allowing witnesses to choose the type of cues presented in multimedia mug books was explored in two experiments. In Experiment 1, participants viewed a videotaped crime and attempted to identify the perpetrator from one of three types of mug books: (a) dynamic-combined—participants could choose to follow static mug shots with a computerized video clip combining three types of dynamic cues: the person walking, talking, and rotating; (b) dynamic-separable—participants could limit the types of dynamic cues presented; and (c) static—just the static mug shot was presented. The dynamic-separable condition produced significantly fewer false positive foil identifications than the static condition. Within the dynamic-separable condition, voice was the most preferred cue. Experiment 2 explored the contribution of the individual cues. Participants attempted identifications from single dynamic cue mug books where only one type of cue was presented if a participant chose additional information. It was found that providing individual cues did not improve performance over the static mug book control. Based on the potential danger of witnesses choosing to rely on single dynamic cues, it was suggested that multimedia mug books should present dynamic cues in combination. Copyright © 2000 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2000
13. Voice Identification of an Abductor.
- Author
- Yarmey, A. Daniel and Matthys, Eva
- Subjects
- AUTOMATIC speech recognition, PATTERN recognition systems, VOICEPRINTS, IDENTIFICATION of criminals, LANGUAGE & languages, HUMAN voice
- Abstract
A total of 288 male and 288 female undergraduate students heard a taped voice of a mock abductor for a total of 18 seconds, 36 seconds, 120 seconds, or 6 minutes. One-third of all subjects heard the voice in one massed trial, one-third in two equal periods separated by a 5-minute interval, and one-third in three equal periods separated by 5-minute intervals. Voice identification and confidence of response were tested immediately after observation, 24 hours later, or 1 week later. Hit rates were significantly greater with longer voice-sample durations, and were superior with two distributed exposures to the suspect's voice in contrast to one massed exposure or three distributed exposures. However, the false alarm rate in the suspect-present line-up differed significantly as a function of voice-sample durations and retention intervals, and of voice-sample durations and frequency of distributed exposures. False alarms in the suspect-absent line-up were consistently high (overall M = .58) and exceeded the overall hit rate (M = .40). Confidence of response was negatively correlated with suspect identification in the 18-second voice-sample condition, but was positively correlated with it for voice-sample durations of 120 seconds and 6 minutes. [ABSTRACT FROM AUTHOR]
- Published
- 1992
14. A Language Effect in Voice Identification.
- Author
- Thompson, Charles P.
- Subjects
- LANGUAGE & languages, HUMAN voice, IDENTIFICATION, VOICEPRINTS, BILINGUALISM, MONOLINGUALISM
- Abstract
Bilingual students recorded messages in English, Spanish, and English with a strong Spanish accent. Monolingual English-speaking subjects heard a single message and attempted to identify the voice in a six-person line-up 1 week later. The line-up message was delivered in the same language and accent as the initial message. Voices were identified best when speaking English and worst when speaking Spanish. Identification accuracy was intermediate for the accent condition. There were no reliable differences among conditions in false alarms when the target voice was absent from the line-up. The effect of language was replicated using a 30-min retention interval. Familiarity with the language and language constraints on voice characteristics were discussed as possible explanations of the language effect. [ABSTRACT FROM AUTHOR]
- Published
- 1987
15. Robust speaker recognition using library of cross-domain variation compensation transforms.
- Author
- Houjun Huang, Shengyu Yao, Ruohua Zhou, and Yonghong Yan
- Subjects
- VOICEPRINTS, DISCRIMINANT analysis, MULTIVARIATE analysis, ANALYSIS of covariance, SCATTERING (Physics)
- Abstract
Although state-of-the-art i-vector-based probabilistic linear discriminant analysis systems have yielded promising performance in the National Institute of Standards and Technology speaker recognition evaluations, the impact of domain mismatch, when the system development data and the evaluation data are collected from different sources, remains a challenging problem. This issue was a focus of the Johns Hopkins University 2013 speaker recognition workshop, where a domain adaptation challenge (DAC13) corpus was created to address it. The cross-domain variation compensation (CDVC) approach was recently proposed to address this mismatch when in-domain development data are available. The work reported by the present authors addresses the case where in-domain development data are unavailable, using a library of CDVC transforms. This approach is evaluated on the DAC13 corpus and is shown to be more powerful than nuisance attribute projection-based inter-dataset variability compensation and the whitening library. [ABSTRACT FROM AUTHOR]
- Published
- 2016
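The "whitening library" baseline mentioned above builds on the standard whitening transform: centre the development i-vectors and rescale along the covariance eigenvectors so that the result has identity covariance. A generic sketch (the eigenvalue ridge and the synthetic test data are assumptions of this illustration, not the DAC13 setup):

```python
import numpy as np

def whitening_transform(X):
    # Centre the vectors, then scale along the covariance
    # eigenvectors: W = V diag(1/sqrt(lambda)) V^T, so that the
    # whitened data has (approximately) identity covariance.
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov((X - mu).T, bias=True))
    W = vecs @ np.diag(1.0 / np.sqrt(vals + 1e-10)) @ vecs.T
    return mu, W

def whiten(X, mu, W):
    return (X - mu) @ W
```

A whitening *library* keeps one (mu, W) pair per development source and picks the best-matching one for unseen evaluation data.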
16. Spectral shifting of speaker-specific information for narrow band telephonic speaker recognition.
- Author
- Thiruvaran, T., Sethu, V., Ambikairajah, E., and Li, H.
- Subjects
- VOICEPRINTS, TELEPHONES, ERROR rates, RADIUS (Computer network protocol), BANDWIDTHS
- Abstract
Speech-based authentication systems can perform remote authentication over telephone channels. However, telephone channels are restricted to a bandwidth of roughly 0-4 kHz, while studies on the distribution of speaker-specific information in the speech spectrum strongly suggest that useful speaker-specific information is present above 4 kHz. A method is proposed to shift part of this speaker-specific information from above 4 kHz into the telephone bandwidth, in place of less speaker-specific information originally present below 4 kHz. Speaker recognition experiments conducted using the proposed method lead to an approximately 18.5% relative improvement in equal error rate when compared to a system using the conventional telephone band speech, as evaluated on the Intelligence Advanced Research Projects Activity (IARPA) Babel Program Tamil language collection. [ABSTRACT FROM AUTHOR]
- Published
- 2015
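The core idea, relocating a high-frequency band into the telephone band, can be illustrated with a crude FFT band copy. This is only a toy version of the method: the band edges below are placeholders, the paper's actual shifting is more sophisticated, and a real system would also account for what the telephone channel does to the result:

```python
import numpy as np

def shift_band(signal, sr, src=(4000, 6000), dst=(2000, 4000)):
    # Overwrite the destination band of the spectrum with the
    # (equal-width) source band, then resynthesise. The source band
    # is left in place here; a channel bandlimited to 4 kHz would
    # discard it anyway.
    spec = np.fft.rfft(signal)
    hz = sr / len(signal)  # FFT bin width in Hz
    s0, s1 = int(src[0] / hz), int(src[1] / hz)
    d0 = int(dst[0] / hz)
    spec[d0:d0 + (s1 - s0)] = spec[s0:s1]
    return np.fft.irfft(spec, len(signal))
```

After such a shift, energy that originally sat above 4 kHz survives transmission through a narrowband channel.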
Discovery Service for Jio Institute Digital Library