Author: "Nam, KiHyun" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Nam, KiHyun"' showing total 16 results


1. Disentangled Representation Learning for Environment-agnostic Speaker Recognition

Author: Nam, KiHyun, Heo, Hee-Soo, Jung, Jee-weon, and Chung, Joon Son
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into components related to the speaker and other residual information. We employ a group of objective functions to ensure that the auto-encoder's code representation (used as the refined embedding) condenses only the speaker characteristics. We show the versatility of our framework through its compatibility with any existing speaker embedding extractor, requiring no structural modifications or adaptations for integration. We validate the effectiveness of our framework by incorporating it into two popularly used embedding extractors and conducting experiments across various benchmarks. The results show a performance improvement of up to 16%. Our code is available at https://github.com/kaistmm/voxceleb-disentangler. Comment: Interspeech 2024. The official webpage can be found at https://mm.kaist.ac.kr/projects/voxceleb-disentangler/
Published: 2024

2. Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

Author: Heo, Hee-Soo, Nam, KiHyun, Lee, Bong-Jin, Kwon, Youngki, Lee, Minjae, Kim, You Jin, and Chung, Joon Son
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor, which remains fixed during this training process. This results in two similarity scores: one for the speaker information and one for the session information. The latter score acts as a compensator for the former, which might be skewed by session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining the embedding extractor.
Published: 2023
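
The two-score idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the scores are combined, so the linear subtraction below, the `weight` parameter, and the function names are all assumptions made for illustration, not the authors' method. Embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compensated_score(spk_a, spk_b, ses_a, ses_b, weight=0.5):
    """Hypothetical compensation: lower the speaker score when the two
    utterances also look alike in session terms.

    spk_*: speaker embeddings from the frozen extractor.
    ses_*: session embeddings from the auxiliary network.
    weight: assumed trade-off factor (not taken from the paper).
    """
    speaker_score = cosine(spk_a, spk_b)
    session_score = cosine(ses_a, ses_b)
    return speaker_score - weight * session_score
```

Under this assumed rule, two utterances with identical speaker embeddings score lower when their session embeddings also match than when the sessions differ, which is the direction of correction the abstract describes.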

3. TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

Author: Jung, Chaeyoung, Lee, Suyeon, Nam, Kihyun, Rho, Kyeongha, Kim, You Jin, Jang, Youngjoon, and Chung, Joon Son
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures, while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is only applied to the parts of the full segments where a person on the screen is actually speaking. This encourages the model to learn effective representations through the natural correspondence of speech and facial movements. Our loss can be jointly optimized with the existing objectives for training ASD models without the need for additional supervision or training data. The experiments demonstrate that our loss can be easily integrated into existing ASD frameworks, improving their performance. Our method achieves state-of-the-art performance on the AVA-ActiveSpeaker and ASW datasets.
Published: 2023
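
A talk-aware contrastive loss of the kind this abstract describes might look like the sketch below. It is a generic InfoNCE-style loss restricted to speaking frames. The audio-to-visual direction, the cosine similarity, the `temperature` value, and all names are assumptions, since the abstract gives no formula. Embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def talk_aware_nce(audio, visual, speaking, temperature=0.1):
    """InfoNCE-style loss over speaking frames only (illustrative).

    audio, visual: per-frame embeddings, one list per frame.
    speaking: per-frame booleans, True where the person is talking.
    For each speaking frame, the same-frame visual embedding is the
    positive and the other speaking frames serve as negatives.
    """
    active = [i for i, flag in enumerate(speaking) if flag]
    if len(active) < 2:
        return 0.0  # no negatives available, so skip the loss
    total = 0.0
    for i in active:
        logits = [cosine(audio[i], visual[j]) / temperature for j in active]
        peak = max(logits)
        # Numerically stable log-sum-exp over the candidate frames.
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        positive = cosine(audio[i], visual[i]) / temperature
        total += log_denominator - positive
    return total / len(active)
```

In this sketch the loss is near zero when each speaking frame's audio embedding matches its own visual embedding better than those of other speaking frames, and it grows when the pairing is scrambled. Non-speaking frames contribute nothing.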

4. Disentangled representation learning for multilingual speaker recognition

Author: Nam, Kihyun, Kim, Youkyum, Huh, Jaesung, Heo, Hee Soo, Jung, Jee-weon, and Chung, Joon Son
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: The goal of this paper is to learn robust speaker representations for bilingual speaking scenarios. The majority of the world's population speak at least two languages; however, most speaker recognition systems fail to recognise the same speaker when speaking in different languages. Popular speaker recognition evaluation sets do not consider the bilingual scenario, making it difficult to analyse the effect of bilingual speakers on speaker recognition performance. In this paper, we publish a large-scale evaluation set named VoxCeleb1-B, derived from VoxCeleb, that considers bilingual scenarios. We introduce an effective disentanglement learning strategy that combines adversarial and metric learning-based methods. This approach addresses the bilingual situation by disentangling language-related information from the speaker representation while ensuring stable speaker representation learning. Our language-disentangled learning method only uses language pseudo-labels without manual information. Comment: Interspeech 2023
Published: 2022

5. ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers

Author: Ha, Jung-Woo, Nam, Kihyun, Kang, Jingu, Lee, Sang-Woo, Yang, Sohee, Jung, Hyunhoon, Kim, Eunmi, Kim, Hyeji, Kim, Soojin, Kim, Hyun Ah, Doh, Kyoungtae, Lee, Chan Kyu, Sung, Nako, and Kim, Sunghun
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Sound, Statistics - Machine Learning
Abstract: Automatic speech recognition (ASR) via call is essential for various applications, including AI for contact center (AICC) services. Despite the advancement of ASR, however, most publicly available call-based speech corpora such as Switchboard are old-fashioned. Also, most existing call corpora are in English and mainly focus on open domain dialog or general scenarios such as audiobooks. Here we introduce the ClovaCall corpus, a new large-scale Korean call-based speech corpus under a goal-oriented dialog scenario, collected from more than 11,000 people. ClovaCall includes approximately 60,000 pairs of a short sentence and its corresponding spoken utterance in a restaurant reservation domain. We validate the effectiveness of our dataset with intensive experiments using two standard ASR models. Furthermore, we release our ClovaCall dataset and baseline source code at https://github.com/ClovaAI/ClovaCall. Comment: 5 pages, 2 figures, 4 tables. The first two authors contributed equally to this work
Published: 2020

6. TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

Author: Jung, Chaeyoung, primary, Lee, Suyeon, additional, Nam, Kihyun, additional, Rho, Kyeongha, additional, Kim, You Jin, additional, Jang, Youngjoon, additional, and Chung, Joon Son, additional
Published: 2024

7. VoxMM: Rich Transcription of Conversations in the Wild

Author: Kwak, Doyeop, primary, Jung, Jaemin, additional, Nam, Kihyun, additional, Jang, Youngjoon, additional, Jung, Jee-Weon, additional, Watanabe, Shinji, additional, and Chung, Joon Son, additional
Published: 2024

8. Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

Author: Heo, Hee-Soo, primary, Nam, KiHyun, additional, Lee, Bong-Jin, additional, Kwon, Youngki, additional, Lee, Minjae, additional, Kim, You Jin, additional, and Chung, Joon Son, additional
Published: 2024

9. Disentangled Representation Learning for Multilingual Speaker Recognition

Author: Nam, Kihyun, primary, Kim, Youkyum, additional, Huh, Jaesung, additional, Heo, Hee-Soo, additional, Jung, Jee-weon, additional, and Chung, Joon Son, additional
Published: 2023

10. ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers

Author: Ha, Jung-Woo, primary, Nam, Kihyun, additional, Kang, Jingu, additional, Lee, Sang-Woo, additional, Yang, Sohee, additional, Jung, Hyunhoon, additional, Kim, Hyeji, additional, Kim, Eunmi, additional, Kim, Soojin, additional, Kim, Hyun Ah, additional, Doh, Kyoungtae, additional, Lee, Chan Kyu, additional, Sung, Nako, additional, and Kim, Sunghun, additional
Published: 2020

11. The High Court of Governor-General of Korea's Ruling on Land Ownership (1918∼1921) and Its Characteristics

Author: Nam, Kihyun, primary
Published: 2020

12. The Enactment of the Laws and Regulations on Land Ownership and Change in Its Meaning in the Late Great Han Empire and Early Japanese Colony : with Centering on the Interpretation of the High Court of the Japanese Government General of Korea

Author: Nam, Kihyun, primary
Published: 2019

13. Anaphoric Expressions of Korean Sign Language

Author: Nam, Kihyun, primary and Cho, JunMo, additional
Published: 2018

14. A study on the Variant Entries in Dictionary of Korean Sign Language

Author: Nam, Kihyun, primary
Published: 2018

15. A Proposal for the Concept of the Idiomatic Expressions in Korean Sign Language

Author: Nam, Kihyun, primary
Published: 2018

16. A Study on the Productive Lexicon in Korean Sign Language

Author: Nam, Kihyun, primary
Published: 2018
