Author: "Ke, Dengfeng" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ke, Dengfeng"' showing total 131 results

1. CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Author: Deng, Yayue, Xue, Jinlong, Jia, Yukang, Li, Qifei, Han, Yichen, Wang, Fengping, Gao, Yingming, Ke, Dengfeng, and Li, Ya
Subjects: Computer Science - Computation and Language, Computer Science - Human-Computer Interaction
Abstract: Conversational speech synthesis (CSS) incorporates historical dialogue as supplementary information with the aim of generating speech that has dialogue-appropriate prosody. While previous methods have already delved into enhancing context comprehension, context representation still lacks effective representation capabilities and context-sensitive discriminability. In this paper, we introduce a contrastive learning-based CSS framework, CONCSS. Within this framework, we define an innovative pretext task specific to CSS that enables the model to perform self-supervised learning on unlabeled conversational datasets to boost the model's context understanding. Additionally, we introduce a sampling strategy for negative sample augmentation to enhance context vectors' discriminability. This is the first attempt to integrate contrastive learning into CSS. We conduct ablation studies on different contrastive learning strategies and comprehensive experiments in comparison with prior CSS systems. Results demonstrate that the synthesized speech from our proposed method exhibits more contextually appropriate and sensitive prosody.
Comment: 5 pages, 2 figures, 3 tables, Accepted by ICASSP 2024
Published: 2023

2. Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Author: Ke, Dengfeng, Deng, Yayue, Jia, Yukang, Xue, Jinlong, Luo, Qi, Li, Ya, Sun, Jianqing, Liang, Jiaen, and Lin, Binghuai
Subjects: Computer Science - Artificial Intelligence
Abstract: Regressive text-to-speech (TTS) systems use an attention mechanism to generate the alignment between text and the acoustic feature sequence. Alignment determines synthesis robustness (e.g., the occurrence of skipping, repeating, and collapse) and rhythm via duration control. However, current attention algorithms used in speech synthesis cannot control rhythm using external duration information to generate natural speech while ensuring robustness. In this study, we propose Rhythm-controllable Attention (RC-Attention) based on Tacotron2, which improves robustness and naturalness simultaneously. The proposed attention adopts a trainable scalar learned from four kinds of information to achieve rhythm control, which makes rhythm control more robust and natural, even when synthesized sentences are much longer than those in the training corpus. We use word error counting and an AB preference test to measure the robustness of the proposed method and the naturalness of the synthesized speech, respectively. Results show that RC-Attention has the lowest word error rate of nearly 0.6%, compared with 11.8% for the baseline system. Moreover, nearly 60% of subjects prefer the speech synthesized with RC-Attention to that with Forward Attention, because the former has a more natural rhythm.
Comment: 5 pages, 3 figures, Published in: 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Published: 2023
Full Text: View/download PDF

3. MaskMel-Prosody-CycleGAN-VC: High-Quality Cross-Lingual Voice Conversion

Author: Yan, Siqi, Chen, Senda, Xu, Yanyan, Ke, Dengfeng, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Yadav, Sanjay, editor, Arya, Yogendra, editor, Pandey, Shailesh M., editor, Gherabi, Noredine, editor, and Karras, Dimitrios A., editor
Published: 2024
Full Text: View/download PDF

4. Fine-Grained Style Control in VITS-Based Text-to-Speech Synthesis

Author: Huihang, Zhong, Ke, Dengfeng, Ya, Li, Yao, Wenhan, Bao, Wenqian, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Zhang, Min, editor, Xu, Bin, editor, Hu, Fuyuan, editor, Lin, Junyu, editor, Song, Xianhua, editor, and Lu, Zeguang, editor
Published: 2024
Full Text: View/download PDF

5. Text-Aware End-to-end Mispronunciation Detection and Diagnosis

Author: Peng, Linkai, Gao, Yingming, Lin, Binghuai, Ke, Dengfeng, Xie, Yanlu, and Zhang, Jinsong
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training (CAPT) systems. In the field of assessing the pronunciation quality of constrained speech, the given transcriptions can play the role of a teacher. Conventional methods have fully utilized the prior texts for model construction or for improving system performance, e.g., forced alignment and extended recognition networks. Recently, some end-to-end methods have attempted to incorporate the prior texts into model training and have shown preliminary effectiveness. However, previous studies mostly apply a raw attention mechanism to fuse audio representations with text representations, without taking possible text-pronunciation mismatch into account. In this paper, we present a gating strategy that assigns more importance to the relevant audio features while suppressing irrelevant text information. Moreover, given the transcriptions, we design an extra contrastive loss to reduce the gap between the learning objective of phoneme recognition and MDD. We conducted experiments using two publicly available datasets (TIMIT and L2-Arctic) and our best model improved the F1 score from 57.51% to 61.75% compared to the baselines. Besides, we provide a detailed analysis to shed light on the effectiveness of the gating mechanism and contrastive learning on MDD.
Comment: Rejected by Interspeech2022
Published: 2022

6. Fine-Grained Style Control in VITS-Based Text-to-Speech Synthesis

Author: Huihang, Zhong, primary, Ke, Dengfeng, additional, Ya, Li, additional, Yao, Wenhan, additional, and Bao, Wenqian, additional
Published: 2023
Full Text: View/download PDF

7. Three-stage training and orthogonality regularization for spoken language recognition

Author: Li, Zimu, Xu, Yanyan, Ke, Dengfeng, and Su, Kaile
Published: 2023
Full Text: View/download PDF

8. An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures

Author: Ke, Dengfeng, Lu, Yuxing, Liu, Xudong, Xu, Yanyan, Sun, Jing, and Cai, Cheng-Hao
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: With the rapid development of neural network architectures and speech processing models, singing voice synthesis with neural networks is becoming the cutting-edge technique of digital music production. In this work, in order to explore how to improve the quality and efficiency of singing voice synthesis, we use encoder-decoder neural models and a number of vocoders to achieve singing voice synthesis. We conduct experiments to demonstrate that the models can be trained using voice data with pitch information, lyrics and beat information, and the trained models can produce smooth, clear and natural singing voice that is close to real human voice. As the models work in an end-to-end manner, they allow users who are not domain experts to directly produce singing voice by arranging pitches, lyrics and beats.
Comment: 27 pages, 4 figures, 5 tables
Published: 2021

9. PLDE: A lightweight pooling layer for spoken language recognition

Author: Li, Zimu, Xu, Yanyan, Ke, Dengfeng, and Su, Kaile
Published: 2024
Full Text: View/download PDF

10. Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

Author: Ke, Dengfeng, Zhang, Jinsong, Xie, Yanlu, Xu, Yanyan, and Lin, Binghuai
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Single-channel speech enhancement is a challenging task in the speech community. Recently, various neural network based methods have been applied to speech enhancement. Among these models, PHASEN and T-GSA achieve state-of-the-art performance on the publicly available VoiceBank+DEMAND corpus. Both models reach a COVL score of 3.62. PHASEN achieves the highest CSIG score of 4.21 while T-GSA gets the highest PESQ score of 3.06. However, both models are very large. The contradiction between model performance and model size is hard to reconcile. In this paper, we introduce three kinds of techniques to shrink the PHASEN model and improve the performance. Firstly, separable polling attention is proposed to replace the frequency transformation blocks in PHASEN. Secondly, global layer normalization followed with PReLU is used to replace batch normalization followed with ReLU. Finally, the BLSTM in PHASEN is replaced with a Conv2d operation and the phase stream is simplified. With all these modifications, the size of the PHASEN model is shrunk from 33M parameters to 5M parameters, while the performance on VoiceBank+DEMAND is improved to a CSIG score of 4.30, a PESQ score of 3.07 and a COVL score of 3.73.
Published: 2021

11. A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augmentation Techniques

Author: Fu, Kaiqi, Lin, Jones, Ke, Dengfeng, Xie, Yanlu, Zhang, Jinsong, and Lin, Binghuai
Subjects: Computer Science - Computation and Language
Abstract: Recently, end-to-end mispronunciation detection and diagnosis (MD&D) systems have become a popular alternative that greatly simplifies the model-building process of conventional hybrid DNN-HMM systems by representing complicated modules with a single deep network architecture. In this paper, in order to utilize the prior text in the end-to-end structure, we present a novel text-dependent model which differs from sed-mdd: the model achieves a fully end-to-end system by aligning the audio with the phoneme sequences of the prior text inside the model through the attention mechanism. Moreover, using the prior text as input introduces an imbalance between positive and negative samples in the phoneme sequence. To alleviate this problem, we propose three simple data augmentation methods, which effectively improve the ability of the model to capture mispronounced phonemes. We conduct experiments on L2-ARCTIC, and our best performance improved from 49.29% to 56.08% in the F-measure metric compared to the CNN-RNN-CTC model.
Comment: Submitted to INTERSPEECH2021
Published: 2021

12. Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

Author: Dou, Yongqiang, Yang, Haocheng, Yang, Maolin, Xu, Yanyan, and Ke, Dengfeng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Machine Learning
Abstract: It has become urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while the lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, more attention should be paid to indistinguishable samples than to easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, which leverages a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than those trained with the conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.
Comment: The 25th International Conference on Pattern Recognition (ICPR2020)
Published: 2020
Full Text: View/download PDF

13. Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism

Author: Dai, Wang, Zhang, Jinsong, Gao, Yingming, Wei, Wei, Ke, Dengfeng, Lin, Binghuai, and Xie, Yanlu
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Formant tracking is one of the most fundamental problems in speech processing. Traditionally, formants are estimated using signal processing methods. Recent studies showed that generic convolutional architectures can outperform recurrent networks on temporal tasks such as speech synthesis and machine translation. In this paper, we explored the use of Temporal Convolutional Network (TCN) for formant tracking. In addition to the conventional implementation, we modified the architecture from three aspects. First, we turned off the "causal" mode of dilated convolution, making the dilated convolution see the future speech frames. Second, each hidden layer reused the output information from all the previous layers through dense connection. Third, we also adopted a gating mechanism to alleviate the problem of gradient disappearance by selectively forgetting unimportant information. The model was validated on the open access formant database VTR. The experiment showed that our proposed model was easy to converge and achieved an overall mean absolute percent error (MAPE) of 8.2% on speech-labeled frames, compared to three competitive baselines of 9.4% (LSTM), 9.1% (Bi-LSTM) and 8.9% (TCN).
Comment: Accepted by Interspeech 2020
Published: 2020

14. A lattice-transformer-graph deep learning model for Chinese named entity recognition

Author: Lin Min, Xu Yanyan, Cai Chenghao, Ke Dengfeng, and Su Kaile
Subjects: chinese named entity recognition, lattice, transformer, graph convolutional networks, 68t50, Science, Electronic computers. Computer science, QA75.5-76.95
Abstract: Named entity recognition (NER) is the localization and classification of entities with specific meanings in text data, usually used for applications such as relation extraction, question answering, etc. Chinese is a language with Chinese characters as the basic unit, but a Chinese named entity is normally a word containing several characters, so both the relationships between words and those between characters play an important role in Chinese NER. At present, a large number of studies have demonstrated that reasonable word information can effectively improve deep learning models for Chinese NER. Besides, graph convolution can help deep learning models perform better for sequence labeling. Therefore, in this article, we combine word information and graph convolution and propose our Lattice-Transformer-Graph (LTG) deep learning model for Chinese NER. The proposed model pays more attention to additional word information through position-attention, and therefore can learn relationships between characters by using lattice-transformer. Moreover, the adapted graph convolutional layer enables the model to learn both richer character relationships and word relationships and hence helps to recognize Chinese named entities better. Our experiments show that compared with 12 other state-of-the-art models, LTG achieves the best results on the public datasets of Microsoft Research Asia, Resume, and WeiboNER, with the F1 score of 95.89%, 96.81%, and 72.32%, respectively.
Published: 2023
Full Text: View/download PDF

15. Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis

Author: Chen, Feiyang, Luo, Ziqian, Xu, Yanyan, and Ke, Dengfeng
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Sentiment analysis, mostly based on text, has been rapidly developing in the last decade and has attracted widespread attention in both academia and industry. However, the information in the real world usually comes from multiple modalities, such as audio and text. Therefore, in this paper, based on audio and text, we consider the task of multimodal sentiment analysis and propose a novel fusion strategy including both multi-feature fusion and multi-modality fusion to improve the accuracy of audio-text sentiment analysis. We call it the DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion) model, which consists of two parallel branches, the audio modality based branch and the text modality based branch. Its core mechanisms are the fusion of multiple feature vectors and multiple modality attention. Experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both collected from YouTube for sentiment analysis, show the very competitive results of our DFF-ATMF model. Furthermore, by virtue of attention weight distribution heatmaps, we also demonstrate the deep features learned by using DFF-ATMF are complementary to each other and robust. Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy also has a good generalization ability for multimodal emotion recognition.
Comment: Accepted by AAAI2020 Workshop: AffCon2020
Published: 2019

16. Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training

Author: Liu, Bin, Nie, Shuai, Zhang, Yaping, Ke, Dengfeng, Liang, Shan, and Liu, Wenju
Subjects: Computer Science - Sound, Computer Science - Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In realistic environments, speech is usually corrupted by various kinds of noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the most common way is to use a well-designed speech enhancement approach as the front-end of ASR. However, such methods require more complex pipelines, more computation and even higher hardware costs (microphone arrays). In addition, speech enhancement can result in speech distortions and mismatches with the training data. In this paper, we propose an adversarial training method to directly boost the noise robustness of the acoustic model. Specifically, a jointly compositional scheme of generative adversarial net (GAN) and neural network-based acoustic model (AM) is used in the training phase. The GAN is used to generate clean feature representations from noisy features under the guidance of a discriminator that tries to distinguish between the true clean signals and generated signals. The joint optimization of generator, discriminator and AM concentrates the strengths of both GAN and AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of the AM and achieves average relative error rate reductions of 23.38% and 11.54% on the development and test sets, respectively.
Published: 2018

17. Multi-domain Attention Fusion Network For Language Recognition

Author: Ju, Minghang, Xu, Yanyan, Ke, Dengfeng, and Su, Kaile
Published: 2023
Full Text: View/download PDF

18. Masked multi-center angular margin loss for language recognition

Author: Ju, Minghang, Xu, Yanyan, Ke, Dengfeng, and Su, Kaile
Published: 2022
Full Text: View/download PDF

19. Trainable back-propagated functional transfer matrices

Author: Cai, Cheng-Hao, Xu, Yanyan, Ke, Dengfeng, Su, Kaile, and Sun, Jing
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Connections between nodes of fully connected neural networks are usually represented by weight matrices. In this article, functional transfer matrices are introduced as alternatives to the weight matrices: Instead of using real weights, a functional transfer matrix uses real functions with trainable parameters to represent connections between nodes. Multiple functional transfer matrices are then stacked together with bias vectors and activations to form deep functional transfer neural networks. These neural networks can be trained within the framework of back-propagation, based on a revision of the delta rules and the error transmission rule for functional connections. In experiments, it is demonstrated that the revised rules can be used to train a range of functional connections: 20 different functions are applied to neural networks with up to 10 hidden layers, and most of them gain high test accuracies on the MNIST database. It is also demonstrated that a functional transfer matrix with a memory function can roughly memorise a non-cyclical sequence of 400 digits.
Comment: 39 pages, 4 figures, submitted as a journal article
Published: 2017
Full Text: View/download PDF

20. Learning of Human-like Algebraic Reasoning Using Deep Feedforward Neural Networks

Author: Cai, Cheng-Hao, Ke, Dengfeng, Xu, Yanyan, and Su, Kaile
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Logic in Computer Science, I.2.0, I.2.3, I.2.4, I.2.6, I.2.8, I.5.0, I.5.1, I.5.2, I.5.4, F.4.1
Abstract: There is a wide gap between symbolic reasoning and deep learning. In this research, we explore the possibility of using deep learning to improve symbolic reasoning. Briefly, in a reasoning system, a deep feedforward neural network is used to guide rewriting processes after learning from algebraic reasoning examples produced by humans. To enable the neural network to recognise patterns of algebraic expressions with non-deterministic sizes, reduced partial trees are used to represent the expressions. Also, to represent both top-down and bottom-up information of the expressions, a centralisation technique is used to improve the reduced partial trees. Besides, symbolic association vectors and rule application records are used to improve the rewriting processes. Experimental results reveal that the algebraic reasoning examples can be accurately learnt only if the feedforward neural network has enough hidden layers. Also, the centralisation technique, the symbolic association vectors and the rule application records can reduce error rates of reasoning. In particular, the above approaches have led to 4.6% error rate of reasoning on a dataset of linear equations, differentials and integrals.
Comment: 8 pages, 7 figures
Published: 2017
Full Text: View/download PDF

21. WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization

Author: Huang, Shengjie, Chen, Mingjie, Xu, Yanyan, Ke, Dengfeng, Hain, Thomas, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Pham, Duc Nghia, editor, Theeramunkong, Thanaruk, editor, Governatori, Guido, editor, and Liu, Fenrong, editor
Published: 2021
Full Text: View/download PDF

22. Concss: Contrastive-based Context Comprehension for Dialogue-Appropriate Prosody in Conversational Speech Synthesis

Author: Deng, Yayue, primary, Xue, Jinlong, additional, Jia, Yukang, additional, Li, Qifei, additional, Han, Yichen, additional, Wang, Fengping, additional, Gao, Yingming, additional, Ke, Dengfeng, additional, and Li, Ya, additional
Published: 2024
Full Text: View/download PDF

23. μ-law SGAN for generating spectra with more details in speech enhancement

Author: Li, Hongfeng, Xu, Yanyan, Ke, Dengfeng, and Su, Kaile
Published: 2021
Full Text: View/download PDF

24. A Dual-Branch Speech Enhancement Model with Harmonic Repair

Author: Jia, Lizhen, primary, Xu, Yanyan, additional, and Ke, Dengfeng, additional
Published: 2024
Full Text: View/download PDF

25. WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization

Author: Huang, Shengjie, primary, Chen, Mingjie, additional, Xu, Yanyan, additional, Ke, Dengfeng, additional, and Hain, Thomas, additional
Published: 2021
Full Text: View/download PDF

26. Learning of human-like algebraic reasoning using deep feedforward neural networks

Author: Cai, Cheng-Hao, Xu, Yanyan, Ke, Dengfeng, and Su, Kaile
Published: 2018
Full Text: View/download PDF

27. Trainable back-propagated functional transfer matrices

Author: Cai, Cheng-Hao, Xu, Yanyan, Ke, Dengfeng, Su, Kaile, and Sun, Jing
Published: 2019
Full Text: View/download PDF

28. Dual Audio Encoders Based Mandarin Prosodic Boundary Prediction by Using Multi-Granularity Prosodic Representations

Author: Li, Ruishan, primary, Gao, Yingming, additional, Xie, Yanlu, additional, Ke, Dengfeng, additional, and Zhang, Jinsong, additional
Published: 2023
Full Text: View/download PDF

29. LIMI-VC: A Light Weight Voice Conversion Model with Mutual Information Disentanglement

Author: Huang, Liangjie, primary, Yuan, Tian, additional, Liang, Yunming, additional, Chen, Zeyu, additional, Wen, Can, additional, Xie, Yanlu, additional, Zhang, Jinsong, additional, and Ke, Dengfeng, additional
Published: 2023
Full Text: View/download PDF

30. Contextualized Latent Semantic Indexing: A New Approach to Automated Chinese Essay Scoring

Author: Xu Yanyan, Ke Dengfeng, and Su Kaile
Subjects: automated chinese essay scoring, latent semantic indexing, n-gram language model, weighted finite-state transducer, natural language processing, Science, Electronic computers. Computer science, QA75.5-76.95
Abstract: The writing part in Chinese language tests is badly in need of a mature automated essay scoring system. In this paper, we propose a new approach applied to automated Chinese essay scoring (ACES), called contextualized latent semantic indexing (CLSI), of which Genuine CLSI and Modified CLSI are two versions. The n-gram language model and the weighted finite-state transducer (WFST), two critical components, are used to extract context information in our ACES system. Not only does CLSI improve conventional latent semantic indexing (LSI), but it also bridges the gap between latent semantics and their context information, which is absent in LSI. Moreover, CLSI can score essays from the perspectives of language fluency and contents, and address the local overrating and underrating problems caused by LSI. Experimental results show that CLSI outperforms LSI, Regularized LSI, and latent Dirichlet allocation in many aspects, and thus proves to be an effective approach.
Published: 2017
Full Text: View/download PDF

31. Punctuation Prediction for Chinese Spoken Sentence Based on Model Combination

Author: Chen, Xiao, Ke, Dengfeng, Xu, Bo, Kacprzyk, Janusz, Series editor, Wen, Zhenkun, editor, and Li, Tianrui, editor
Published: 2014
Full Text: View/download PDF

32. Fast Learning of Deep Neural Networks via Singular Value Decomposition

Author: Cai, Chenghao, Ke, Dengfeng, Xu, Yanyan, Su, Kaile, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Pham, Duc-Nghia, editor, and Park, Seong-Bae, editor
Published: 2014
Full Text: View/download PDF

33. Plde: A Lightweight Pooling Layer for Spoken Language Recognition

Author: Li, Zimu, primary, Xu, Yanyan, additional, Ke, Dengfeng, additional, and Su, Kaile, additional
Published: 2023
Full Text: View/download PDF

34. Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Author: Ke, Dengfeng, primary, Deng, Yayue, additional, Jia, Yukang, additional, Xue, Jinlong, additional, Luo, Qi, additional, Li, Ya, additional, Sun, Jianqing, additional, Liang, Jiaen, additional, and Lin, Binghuai, additional
Published: 2022
Full Text: View/download PDF

35. A New Spoken Language Teaching Tech: Combining Multi-attention and AdaIN for One-shot Cross Language Voice Conversion

Author: Ke, Dengfeng, primary, Yao, Wenhan, additional, Hu, Ruixin, additional, Huang, Liangjie, additional, Luo, Qi, additional, and Shu, Wentao, additional
Published: 2022
Full Text: View/download PDF

36. AdaptiveFormer: A Few-shot Speaker Adaptive Speech Synthesis Model based on FastSpeech2

Author: Ke, Dengfeng, primary, Hu, Ruixin, additional, Luo, Qi, additional, Huang, Liangjie, additional, Yao, Wenhan, additional, Shu, Wentao, additional, Zhang, Jinsong, additional, and Xie, Yanlu, additional
Published: 2022
Full Text: View/download PDF

37. StyleFormerGAN-VC: Improving Effect of few shot Cross-Lingual Voice Conversion Using VAE-StarGAN and Attention-AdaIN

Author: Ke, Dengfeng, primary, Yao, Wenhan, additional, Hu, Ruixin, additional, Huang, Liangjie, additional, Luo, Qi, additional, and Shu, Wentao, additional
Published: 2022
Full Text: View/download PDF

38. Compact WFSA Based Language Model and Its Application in Statistical Machine Translation

Author: Fu, Xiaoyin, Wei, Wei, Lu, Shixiang, Ke, Dengfeng, Xu, Bo, Zhou, Ming, editor, Zhou, Guodong, editor, Zhao, Dongyan, editor, Liu, Qun, editor, and Zou, Lei, editor
Published: 2012
Full Text: View/download PDF

39. Multi-domain Attention Fusion Network For Language Recognition

Author: Ju, Minghang, primary, Xu, Yanyan, additional, Ke, Dengfeng, additional, and Su, Kaile, additional
Published: 2022
Full Text: View/download PDF

40. Voicifier-LN: An Novel Approach to Elevate the Speaker Similarity for General Zero-shot Multi-Speaker TTS

Author: Ke, Dengfeng, primary, Huang, Liangjie, additional, Yao, Wenhan, additional, Hu, Ruixin, additional, Zu, Xueyin, additional, Xie, Yanlu, additional, and Zhang, Jinsong, additional
Published: 2022
Full Text: View/download PDF

41. Solving Size and Performance Dilemma by Reversible and Invertible Recurrent Network for Speech Enhancement

Author: Ke, Dengfeng, primary, Xie, Yanlu, additional, Zhang, Jinsong, additional, and Huang, Liangjie, additional
Published: 2022
Full Text: View/download PDF

42. CM-CIF: Cross-Modal for Unaligned Modality Fusion with Continuous Integrate-and-Fire

Author: Jiang, Zheng, primary, Xu, Yang, additional, Xu, Yanyan, additional, Ke, Dengfeng, additional, and Su, Kaile, additional
Published: 2022
Full Text: View/download PDF

43. Research on Multi-round Dialogue Tasks Based on Sequicity

Author: Pang, Yingrui, primary, Gong, Zhenni, additional, Zhao, Zixuan, additional, Xu, Yanyan, additional, Ke, Dengfeng, additional, and Su, Kaile, additional
Published: 2022
Full Text: View/download PDF

44. Speech Stimulus Continuum Generation: A Deep Learning Approach

Author: Li, Zhu, primary, Xie, Yanlu, additional, and Ke, Dengfeng, additional
Published: 2021
Full Text: View/download PDF

45. A Study on Fine-Tuning wav2vec2.0 Model for the Task of Mispronunciation Detection and Diagnosis

Author: Peng, Linkai, primary, Fu, Kaiqi, additional, Lin, Binghuai, additional, Ke, Dengfeng, additional, and Zhan, Jinsong, additional
Published: 2021
Full Text: View/download PDF

46. μ-law SGAN for generating spectra with more details in speech enhancement

Author: Li, Hongfeng, primary, Xu, Yanyan, additional, Ke, Dengfeng, additional, and Su, Kaile, additional
Published: 2021
Full Text: View/download PDF

47. Multi-Scale Model for Mandarin Tone Recognition

Author: Peng, Linkai, primary, Dai, Wang, additional, Ke, Dengfeng, additional, and Zhang, Jinsong, additional
Published: 2021
Full Text: View/download PDF

48. Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

Author: Dou, Yongqiang, primary, Yang, Haocheng, additional, Yang, Maolin, additional, Xu, Yanyan, additional, and Ke, Dengfeng, additional
Published: 2021
Full Text: View/download PDF

49. Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism

Author: Dai, Wang, primary, Zhang, Jinsong, additional, Gao, Yingming, additional, Wei, Wei, additional, Ke, Dengfeng, additional, Lin, Binghuai, additional, and Xie, Yanlu, additional
Published: 2020
Full Text: View/download PDF

50. Improving speech enhancement by focusing on smaller values using relative loss

Author: Li, Hongfeng, primary, Xu, Yanyan, additional, Ke, Dengfeng, additional, and Su, Kaile, additional
Published: 2020
Full Text: View/download PDF
