251 results for "Jen-Tzung Chien"
Search Results
2. Self-Supervised Adversarial Training for Contrastive Sentence Embedding
- Author
-
Jen-Tzung Chien and Yuan-An Chen
- Published
- 2023
- Full Text
- View/download PDF
3. Meta Learning for Domain Agnostic Soft Prompt
- Author
-
Ming-Yen Chen, Mahdin Rohmatillah, Ching-Hsien Lee, and Jen-Tzung Chien
- Published
- 2023
- Full Text
- View/download PDF
4. Contrastive Adversarial Domain Adaptation Networks for Speaker Recognition
- Author
-
Man-Wai Mak, Jen-Tzung Chien, and Longxin Li
- Subjects
Computer Networks and Communications, Computer science, Speech recognition, Feature extraction, Posterior probability, Speaker recognition, Computer Science Applications, Artificial Intelligence, Feature (machine learning), Neural Networks (Computer), Software - Abstract
Domain adaptation aims to reduce the mismatch between the source and target domains. A domain adversarial network (DAN) has recently been proposed to incorporate adversarial learning into deep neural networks to create a domain-invariant space. However, DAN's major drawback is that it is difficult to find the domain-invariant space by using a single feature extractor. In this article, we propose to split the feature extractor into two contrastive branches, with one branch dedicated to class-dependence in the latent space and the other focusing on domain-invariance. The feature extractor achieves these contrastive goals by sharing the first and last hidden layers but possessing decoupled branches in the middle hidden layers. To encourage the feature extractor to produce class-discriminative embedded features, the label predictor is adversarially trained to produce equal posterior probabilities across all of the outputs instead of producing one-hot outputs. We refer to the resulting domain adaptation network as the "contrastive adversarial domain adaptation network (CADAN)." We evaluated the embedded features' domain-invariance via a series of speaker identification experiments under both clean and noisy conditions. Results demonstrate that the embedded features produced by CADAN lead to a 33% improvement in speaker identification accuracy compared with the conventional DAN.
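A minimal sketch of the two ideas in this abstract, assuming PyTorch; the layer sizes, the two-branch wiring, and the KL-to-uniform adversarial loss are illustrative assumptions, not the paper's exact CADAN recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchExtractor(nn.Module):
    """Shared first/last layers with decoupled middle branches."""
    def __init__(self, dim_in=256, dim_hid=128, dim_out=64):
        super().__init__()
        self.shared_in = nn.Linear(dim_in, dim_hid)
        self.class_branch = nn.Linear(dim_hid, dim_hid)   # class-dependence
        self.domain_branch = nn.Linear(dim_hid, dim_hid)  # domain-invariance
        self.shared_out = nn.Linear(dim_hid, dim_out)

    def forward(self, x):
        h = torch.relu(self.shared_in(x))
        hc = torch.relu(self.class_branch(h))   # contrastive branch 1
        hd = torch.relu(self.domain_branch(h))  # contrastive branch 2
        return self.shared_out(hc + hd)

def uniform_posterior_loss(logits):
    """Adversarial target: equal posterior probability over every class."""
    log_q = F.log_softmax(logits, dim=-1)
    uniform = torch.full_like(log_q, 1.0 / logits.size(-1))
    return F.kl_div(log_q, uniform, reduction="batchmean")
```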
- Published
- 2022
- Full Text
- View/download PDF
5. Learning Flow-Based Disentanglement
- Author
-
Jen-Tzung Chien and Sheng-Jhe Huang
- Subjects
Artificial Intelligence, Computer Networks and Communications, Software, Computer Science Applications - Abstract
Face reenactment aims to generate talking-face images of a target person given a face image of a source person. It is crucial to learn latent disentanglement to tackle such a challenging task through domain mapping between source and target images. The attributes or talking features due to domains or conditions become adjustable to generate target images from source images. This article presents an information-theoretic attribute factorization (AF) where the mixed features are disentangled for flow-based face reenactment. The latent variables of the flow model are factorized into attribute-relevant and attribute-irrelevant components without the need for paired face images. In particular, domain knowledge is learned to provide the condition for identifying the talking attributes from real face images. The AF is guided by multiple losses for source structure, target structure, random-pair reconstruction, and sequential classification. The random-pair reconstruction loss is calculated by exchanging the attribute-relevant components within a sequence of face images. In addition, a new mutual information flow is constructed for disentanglement toward domain mapping, condition irrelevance, and condition relevance. The disentangled features are learned and controlled to generate image sequences with meaningful interpretation. Experiments on mouth reenactment illustrate the merit of the individual and hybrid models for conditional generation and mapping based on the informative AF.
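A minimal sketch of the random-pair exchange that drives the reconstruction loss above, assuming PyTorch; the split into attribute-relevant and attribute-irrelevant components and the tensor shapes are placeholders for the paper's flow model.

```python
import torch

def random_pair_swap(z_rel, z_irr):
    """Shuffle attribute-relevant components across frames of one sequence.

    z_rel, z_irr: (frames, dim) latent components from the flow model.
    A decoder fed the swapped pair should still reconstruct valid frames,
    which is what the random-pair reconstruction loss scores.
    """
    perm = torch.randperm(z_rel.size(0))
    return torch.cat([z_rel[perm], z_irr], dim=-1)

z = random_pair_swap(torch.randn(8, 32), torch.randn(8, 32))  # -> (8, 64)
```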
- Published
- 2022
- Full Text
- View/download PDF
6. Derivation of chest-lead ECG from limb-lead using temporal convolutional network in variational mode decomposition domain
- Author
-
Yu-Hung Chuang, Yu-Chieh Huang, Wen-Whei Chang, and Jen-Tzung Chien
- Subjects
Electrical and Electronic Engineering - Published
- 2022
- Full Text
- View/download PDF
7. Low-Resource Speech Synthesis with Speaker-Aware Embedding
- Author
-
Li-Jen Yang, I-Ping Yeh, and Jen-Tzung Chien
- Published
- 2022
- Full Text
- View/download PDF
8. Flow-Based Variational Sequence Autoencoder
- Author
-
Jen-Tzung Chien and Tien-Ching Luo
- Published
- 2022
- Full Text
- View/download PDF
9. Graph Evolving and Embedding in Transformer
- Author
-
Jen-Tzung Chien and Chia-Wei Tsao
- Published
- 2022
- Full Text
- View/download PDF
10. Bayesian asymmetric quantized neural networks
- Author
-
Jen-Tzung Chien and Su-Ting Chang
- Subjects
Artificial Intelligence, Signal Processing, Computer Vision and Pattern Recognition, Software - Published
- 2023
- Full Text
- View/download PDF
11. Bayesian Transformer Using Disentangled Mask Attention
- Author
-
Jen-Tzung Chien and Yu-Han Huang
- Published
- 2022
- Full Text
- View/download PDF
12. Augmentation Strategy Optimization for Language Understanding
- Author
-
Chang-Ting Chu, Mahdin Rohmatillah, Ching-Hsien Lee, and Jen-Tzung Chien
- Published
- 2022
- Full Text
- View/download PDF
13. Adversarial Mask Transformer for Sequential Learning
- Author
-
Hou Lio, Shang-En Li, and Jen-Tzung Chien
- Published
- 2022
- Full Text
- View/download PDF
14. The Role of Accent and Grouping Structures in Estimating Musical Meter
- Author
-
Han Ying Lin, Chien Chieh Huang, Jen-Tzung Chien, and Wen-Whei Chang
- Subjects
Computer science, Applied Mathematics, Speech recognition, Signal Processing, Stress (linguistics), Metre, Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design - Published
- 2020
- Full Text
- View/download PDF
15. Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification
- Author
-
Youzhi Tu, Jen-Tzung Chien, and Man-Wai Mak
- Subjects
Acoustics and Ultrasonics, Artificial neural network, Computer science, Speech recognition, Gaussian, Maximization, Mutual information, Domain (software engineering), Constraint (information theory), Computational Mathematics, Discriminative model, Computer Science (miscellaneous), Electrical and Electronic Engineering, Communication channel - Abstract
Domain mismatch is a common problem in speaker verification (SV) and often causes performance degradation. For systems relying on a Gaussian PLDA backend to suppress channel variability, performance is further limited if there is no Gaussianity constraint on the learned embeddings. This paper proposes an information-maximized variational domain adversarial neural network (InfoVDANN) that incorporates an InfoVAE into domain adversarial training (DAT) to reduce domain mismatch and simultaneously meet the Gaussianity requirement of the PLDA backend. Specifically, DAT is applied to produce speaker-discriminative and domain-invariant features, while the InfoVAE performs variational regularization on the embedded features so that they follow a Gaussian distribution. Another benefit of the InfoVAE is that it avoids posterior collapse in VAEs by preserving the mutual information between the embedded features and the training set, so that extra speaker information can be retained in the features. Experiments on both SRE16 and SRE18-CMN2 show that the InfoVDANN outperforms the recent VDANN, which suggests that increasing the mutual information between the embedded features and input features enables the InfoVDANN to extract extra speaker information that is otherwise not possible.
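A minimal sketch of how the three ingredients described above could combine into one training loss, assuming PyTorch; the loss weights, and the convention that the domain term reaches the feature extractor adversarially through a gradient-reversal layer, are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def gaussian_kl(mu, logvar):
    """KL(q(z|x) || N(0, I)): the Gaussianity constraint for the PLDA backend."""
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def infovdann_loss(spk_logits, spk_y, dom_logits, dom_y, mu, logvar,
                   lam_dom=0.1, lam_kl=0.01):
    """Speaker classification + adversarial domain term + variational term.

    The domain cross-entropy is assumed to be back-propagated through a
    gradient-reversal layer, which is what makes it adversarial.
    """
    loss_spk = F.cross_entropy(spk_logits, spk_y)
    loss_dom = F.cross_entropy(dom_logits, dom_y)
    return loss_spk + lam_dom * loss_dom + lam_kl * gaussian_kl(mu, logvar)
```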
- Published
- 2020
- Full Text
- View/download PDF
16. Learning Continuous-Time Dynamics with Attention
- Author
-
Jen-Tzung Chien and Yi-Hsiang Chen
- Subjects
Computational Theory and Mathematics ,Artificial Intelligence ,Applied Mathematics ,Computer Vision and Pattern Recognition ,Software - Abstract
Learning the hidden dynamics from sequence data is crucial. An attention mechanism can be introduced to spotlight the region of interest for sequential learning. Traditional attention was measured between a query and a sequence based on a discrete-time state trajectory. Such a mechanism cannot characterize irregularly-sampled sequence data. This paper presents an attentive differential network (ADN) where attention over continuous-time dynamics is developed. The continuous-time attention is performed over the dynamics at all times. The missing information in irregular or sparse samples can be seamlessly compensated and attended. Self-attention is computed to find the attended state trajectory. However, the memory cost of the attention scores between a query and a sequence is demanding, since self-attention treats all time instants as query points in an ordinary differential equation solver. This issue is tackled by imposing the causality constraint in the causal ADN (CADN), where the query is merged up to the current time. To enhance model robustness, this study further explores a latent CADN where the attended dynamics are calculated in an encoder-decoder structure. Experiments on irregularly-sampled actions, dialogues and bio-signals illustrate the merits of the proposed methods in action recognition, emotion recognition and mortality prediction, respectively.
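A minimal sketch of attending over a continuous-time state trajectory, assuming the torchdiffeq package is available; the dynamics network, state size, and the use of only the current-time state as query (the causality constraint) are illustrative assumptions, not the paper's exact CADN.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class Dynamics(nn.Module):
    """ODE right-hand side governing the hidden state h(t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())

    def forward(self, t, h):
        return self.net(h)

dim = 16
h0 = torch.zeros(1, dim)
t_obs = torch.tensor([0.0, 0.3, 1.1, 2.0])   # irregular observation times
traj = odeint(Dynamics(dim), h0, t_obs)      # (T, 1, dim) state trajectory
states = traj.squeeze(1)                     # (T, dim)
query = states[-1]                           # causal query: current time only
scores = torch.softmax(states @ query / dim ** 0.5, dim=0)
attended = scores.unsqueeze(1).mul(states).sum(0)   # attention over the path
```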
- Published
- 2022
17. Bayesian Multi-Temporal-Difference Learning
- Author
-
Yi-Chung Chiu and Jen-Tzung Chien
- Subjects
Signal Processing, Information Systems - Published
- 2022
- Full Text
- View/download PDF
18. Variational Sequential Modeling, Learning and Understanding
- Author
-
Jen-Tzung Chien and Chih-Jung Tsai
- Published
- 2021
- Full Text
- View/download PDF
19. Multitask Generative Adversarial Imitation Learning for Multi-Domain Dialogue System
- Author
-
Chuan-En Hsu, Mahdin Rohmatillah, and Jen-Tzung Chien
- Published
- 2021
- Full Text
- View/download PDF
20. Corrective Guidance and Learning for Dialogue Management
- Author
-
Jen-Tzung Chien and Mahdin Rohmatillah
- Subjects
Computer science, Reinforcement learning, Human-in-the-loop, Corrective feedback, Artificial intelligence, Language model, Dialog system - Abstract
Establishing a robust dialogue policy with low computation cost is challenging, especially for multi-domain task-oriented dialogue management, due to the high complexity of the state and action spaces. Previous works, mostly using deterministic policy optimization, attain only moderate performance. Meanwhile, the state-of-the-art end-to-end approach is computationally demanding since it utilizes a large-scale language model based on the generative pre-trained transformer-2 (GPT-2). In this study, a new learning procedure consisting of three stages is presented to improve multi-domain dialogue management with corrective guidance. First, behavior cloning with an auxiliary task is developed to build a robust pre-trained model by mitigating the causal confusion problem in imitation learning. Next, the pre-trained model is rectified by using reinforcement learning via proximal policy optimization. Lastly, a human-in-the-loop learning strategy is fulfilled to enhance agent performance by directly providing corrective feedback from a rule-based agent, so that the agent is prevented from being trapped in confounded states. Experiments on end-to-end evaluation show that the proposed learning method achieves state-of-the-art results by performing nearly identically to the rule-based agent. This method outperforms the second-place system of the ninth dialog system technology challenge (DSTC9) track 2, which uses GPT-2 as the core model in dialogue management.
- Published
- 2021
- Full Text
- View/download PDF
21. Causal Confusion Reduction for Robust Multi-Domain Dialogue Policy
- Author
-
Mahdin Rohmatillah and Jen-Tzung Chien
- Subjects
Multi-domain, Computer science, Artificial intelligence, Machine learning, Confusion - Published
- 2021
- Full Text
- View/download PDF
22. Online Compressive Transformer for End-to-End Speech Recognition
- Author
-
Chi-Hang Leong, Jen-Tzung Chien, and Yu-Han Huang
- Subjects
End-to-end principle, Computer science, Speech recognition, Transformer (machine learning model) - Published
- 2021
- Full Text
- View/download PDF
23. Stochastic Temporal Difference Learning for Sequence Data
- Author
-
Yi-Chung Chiu and Jen-Tzung Chien
- Subjects
Flexibility (engineering), State-space representation, Computer science, Machine learning, Upper and lower bounds, Recurrent neural network, Reinforcement learning, Sequence learning, Artificial intelligence, Language model, Temporal difference learning - Abstract
Planning is crucial for training an agent via model-based reinforcement learning so that it can predict distant observations that reflect its past experience. Such a planning method is theoretically and computationally attractive in comparison with traditional learning, which relies on step-by-step prediction. However, it is more challenging to build a learning machine which can predict and plan randomly across multiple time steps rather than act step by step. To reflect this flexibility in the learning process, we need to predict future states directly without going through all intermediate states. Accordingly, this paper develops stochastic temporal difference learning, where the sequence data are represented with multiple jumpy states while the stochastic state space model is learned by maximizing the evidence lower bound of the log likelihood of the training data. A general solution with a varying number of jumpy states is developed and formulated. Experiments demonstrate the merit of the proposed sequential machine in finding predictive states to roll forward with jumps as well as to predict words.
- Published
- 2021
- Full Text
- View/download PDF
24. Collaborative Regularization for Bidirectional Domain Mapping
- Author
-
Wei-Hsiang Chang and Jen-Tzung Chien
- Subjects
Artificial neural network, Computer science, Collaborative learning, Overfitting, Machine learning, Regularization (mathematics), Domain knowledge, Artificial intelligence, Sequence learning, Representation (mathematics), Transformer (machine learning model) - Abstract
Learning both domain mapping and domain knowledge is crucial for different sequence-to-sequence (seq2seq) tasks. Traditionally, a seq2seq model only characterized the domain mapping, while the knowledge in the source and target domains was ignored. To strengthen the seq2seq representation, this study presents a unified transformer for bidirectional domain mapping on which a collaborative regularization is imposed. This regularization enforces the bidirectional mapping constraint and prevents the model from overfitting for better generalization. Importantly, the unified learning objective is optimized for collaborative learning among different modules in two domains with two learning directions. Experiments on machine translation demonstrate the merit of the unified transformer in comparison with existing methods under different tasks and settings.
- Published
- 2021
- Full Text
- View/download PDF
25. Neural adversarial learning for speaker recognition
- Author
-
Jen-Tzung Chien and Kang Ting Peng
- Subjects
Discriminator, Artificial neural network, Computer science, Speech recognition, Cosine similarity, Probabilistic logic, Nonlinear dimensionality reduction, Speaker recognition, Linear discriminant analysis, Theoretical Computer Science, Human-Computer Interaction, Software, Subspace topology - Abstract
This paper presents adversarial learning approaches to deal with various tasks in speaker recognition based on probabilistic linear discriminant analysis (PLDA), which is seen as a latent variable model for the reconstruction of i-vectors. The first task aims to reduce the dimension of i-vectors based on adversarial manifold learning, where the adversarial neural networks of generator and discriminator are merged to preserve neighbor embedding of i-vectors in a low-dimensional space. The generator is trained to fool the discriminator with the generated samples in latent space. A PLDA subspace model is constructed by jointly minimizing a PLDA reconstruction error, a manifold loss for neighbor embedding, and an adversarial loss caused by the generator and discriminator. The second task of adversarial learning is developed to tackle the imbalanced data problem. A PLDA-based generative adversarial network is trained to generate new i-vectors to balance the size of training utterances across different speakers. An adversarial augmentation learning is proposed for robust speaker recognition. In particular, minimax optimization is performed to estimate a generator and a discriminator, where the class-conditional i-vectors produced by the generator cannot be distinguished from real i-vectors by the discriminator. A multiobjective learning is realized for a specialized neural model with the cosine similarity between real and fake i-vectors as well as a regularization for Gaussianity. Experiments are conducted to show the merit of adversarial learning in subspace construction and data augmentation for PLDA-based speaker recognition.
- Published
- 2019
- Full Text
- View/download PDF
26. Image-text dual neural network with decision strategy for small-sample image classification
- Author
-
Jen-Tzung Chien, Jing-Hao Xue, Zhanyu Ma, Fangyi Zhu, Jun Guo, Guang Chen, and Xiaoxu Li
- Subjects
Contextual image classification, Artificial neural network, Computer science, Cognitive Neuroscience, LabelMe, Machine learning, Ensemble learning, Computer Science Applications, Image (mathematics), Artificial Intelligence, Decision strategy - Abstract
Small-sample classification is a challenging problem in computer vision. In this work, we show how to efficiently and effectively utilize the semantic information of annotations to improve the performance of small-sample classification. First, we propose an image-text dual neural network to improve classification performance on small-sample datasets. The proposed model consists of two sub-models, an image classification model and a text classification model. After training the sub-models separately, we design a novel method to fuse the two sub-models rather than simply combining their results. Our image-text dual neural network aims to utilize the text information to overcome the training problem of deep models on small-sample datasets. Then, we propose to incorporate a decision strategy into the image-text dual neural network to further improve the performance of our original model on few-shot datasets. To demonstrate the effectiveness of the proposed models, we conduct experiments on the LabelMe and UIUC-Sports datasets. Experimental results show that our method is superior to the other models.
- Published
- 2019
- Full Text
- View/download PDF
27. Sugariness prediction of Syzygium samarangense using convolutional learning of hyperspectral images
- Author
-
Chih-Jung Chen, Yung-Jhe Yan, Chi-Cho Huang, Jen-Tzung Chien, Chang-Ting Chu, Je-Wei Jang, Tzung-Cheng Chen, Shiou-Gwo Lin, Ruei-Siang Shih, and Mang Ou-Yang
- Subjects
Multidisciplinary - Abstract
Sugariness is one of the most important indicators for measuring the quality of Syzygium samarangense, which is also known as the wax apple. In general, farmers measure sugariness by testing the extracted juice of the wax apple products. Such a destructive way to measure sugariness is not only labor-consuming but also wastes products. Therefore, non-destructive and quick techniques for measuring sugariness would be significant for wax apple supply chains. Traditionally, non-destructive methods to predict the sugariness or other indicators of fruits were based on reflectance spectra or hyperspectral images (HSIs) using linear regression such as multi-linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLSR). However, these regression methods are usually too simple to precisely estimate the complicated mapping between the reflectance spectra or HSIs and the sugariness. This study presents deep learning methods for sugariness prediction using the reflectance spectra or HSIs from the bottom of the wax apple. A non-destructive imaging system fabricated with two spectrum sensors and light sources is implemented to acquire visible and infrared light over a range of wavelengths. In particular, a specialized convolutional neural network (CNN) with hyperspectral imaging is proposed by investigating the effect of different wavelength bands on sugariness prediction. Rather than extracting spatial features, the proposed CNN model is designed to extract spectral features of HSIs. In the experiments, the ground-truth value of sugariness is obtained from a commercial refractometer. The experimental results show that using the whole band range between 400 and 1700 nm achieves the best performance in terms of °Brix error. CNN models attain a °Brix error of ±0.552, smaller than the ±0.597 of a feedforward neural network (FNN). Significantly, the CNN's test errors in the intervals 0 to 10 °Brix and 10 to 11 °Brix are ±0.551 and ±0.408, respectively, indicating that the model can predict whether sugariness is below 10 °Brix, much as the human tongue can. These results are much better than the ±1.441 and ±1.379 obtained by PCR and PLSR, respectively. Moreover, this study reports the test error in each one-°Brix interval, and the results show that the test error varies considerably across different °Brix intervals, especially for PCR and PLSR. On the other hand, FNN and CNN obtain robust results in terms of test error.
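An illustrative spectral-feature CNN for °Brix regression, assuming PyTorch; the channel widths and kernel sizes are assumptions, not the paper's exact architecture. The point is that the 1-D convolutions slide along the wavelength axis, so the model learns spectral rather than spatial features.

```python
import torch
import torch.nn as nn

class SpectralCNN(nn.Module):
    """Regresses sugariness (°Brix) from a per-pixel reflectance spectrum."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),      # any band count works
        )
        self.head = nn.Linear(32, 1)      # predicted sugariness in °Brix

    def forward(self, spectra):           # spectra: (batch, bands)
        h = self.features(spectra.unsqueeze(1)).squeeze(-1)
        return self.head(h)

pred = SpectralCNN()(torch.randn(4, 512))   # e.g. 512 bands -> (4, 1)
```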
- Published
- 2021
28. Synthesis of Chest-Lead ECG Using Temporal Convolutional Networks
- Author
-
Yu-Chieh Huang, Jen-Tzung Chien, Yu-Hung Chuang, Wen-Whei Chang, and Chein-Fang Chiu
- Subjects
Heart disease, Computer science, Wearable computer, Pattern recognition, Variational mode decomposition, Artificial intelligence, ECG signal, Lead (electronics), Electrocardiography - Abstract
Cardiovascular diseases (CVDs) are a leading cause of mortality globally, and therefore timely and accurate diagnosis is crucial to patient safety. The standard 12-lead electrocardiography (ECG) is routinely used to diagnose heart disease. Most wearable monitoring devices provide insufficient ECG information because of limitations in the number of leads and measurement positions. This study presents a patient-specific chest-lead synthesis method based on a temporal convolutional network (TCN) to exploit both intra- and inter-lead correlations of ECG signals. Performance can be further enhanced by using variational mode decomposition (VMD), which reduces the non-stationary characteristics of ECG signals and helps to improve synthesis accuracy. Experiments on the PTB diagnostic database demonstrate that the proposed method is effective and performs well in synthesizing chest-lead ECG signals from a single limb lead.
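A sketch of one causal TCN branch from the pipeline above, assuming PyTorch; the VMD step is left out, and the kernel size, dilation schedule, and channel count are illustrative. In the described method, each VMD mode of the limb-lead signal would be mapped by such a network and the synthesized modes summed to reconstruct the chest lead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalBlock(nn.Module):
    """Dilated 1-D convolution with left-only padding, so no future leakage."""
    def __init__(self, ch, dilation, k=3):
        super().__init__()
        self.pad = (k - 1) * dilation
        self.conv = nn.Conv1d(ch, ch, kernel_size=k, dilation=dilation)

    def forward(self, x):
        return torch.relu(self.conv(F.pad(x, (self.pad, 0))))

class ModeTCN(nn.Module):
    """Maps one VMD mode of a limb lead to the matching chest-lead mode."""
    def __init__(self, ch=1, levels=4):
        super().__init__()
        self.net = nn.Sequential(*[CausalBlock(ch, 2 ** i) for i in range(levels)])

    def forward(self, limb_mode):          # limb_mode: (batch, 1, samples)
        return self.net(limb_mode)

chest_mode = ModeTCN()(torch.randn(2, 1, 500))   # -> (2, 1, 500)
```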
- Published
- 2021
- Full Text
- View/download PDF
29. Attribute Decomposition for Flow-Based Domain Mapping
- Author
-
Sheng-Jhe Huang and Jen-Tzung Chien
- Subjects
Signal processing, Generative model, Consistency (database systems), Sequence, Computer science, Feature extraction, Decomposition (computer science), Pattern recognition, Artificial intelligence, Latent variable, Representation (mathematics) - Abstract
Domain mapping aims to estimate a sophisticated mapping between source and target domains. Finding the specialized attribute in the latent representation plays a key role in attaining desirable performance. However, the entangled features usually contain a mixed attribute which cannot be easily decomposed in an unsupervised manner. To handle the mixed features for better generation, this paper presents an attribute decomposition based on sequence data and carries out flow-based image domain mapping. The latent variables, characterized by a flow model, are decomposed into attribute-relevant and attribute-irrelevant components. The decomposition is guided by multiple objectives including a structural-perceptual loss, a cycle consistency loss, a sequential random-pair reconstruction loss and a sequential classification loss, where paired training data for domain mapping are not required. Importantly, the sequential random-pair reconstruction loss is formulated by exchanging the attribute-relevant components within a sequence of images. As a result, source images with the attributes of reference images can be smoothly transferred to the corresponding target images. Experiments on talking face synthesis show the merit of attribute decomposition in domain mapping.
- Published
- 2021
- Full Text
- View/download PDF
30. Variational Dialogue Generation with Normalizing Flows
- Author
-
Jen-Tzung Chien and Tien-Ching Luo
- Subjects
Kullback-Leibler divergence, Recurrent neural network, Autoregressive model, Flow (mathematics), Computer science, Gaussian, Applied mathematics, Autoencoder - Abstract
The conditional variational autoencoder (cVAE) has shown promising performance in dialogue generation. However, there still exist two issues in the dialogue cVAE model. The first issue is the Kullback-Leibler (KL) vanishing problem, which degenerates the cVAE into a simple recurrent neural network. The second issue is the assumption of an isotropic Gaussian prior for the latent variable, which is too simple to assure diversity of the generated responses. To handle these issues, a simple distribution should be transformed into a complex distribution while the value of the KL divergence is preserved. This paper presents the dialogue flow VAE (DF-VAE) for variational dialogue generation. In particular, KL vanishing is tackled by a new normalizing flow. An inverse autoregressive flow is proposed to transform the isotropic Gaussian prior into a rich distribution. In the experiments, the proposed DF-VAE is significantly better than the other methods in terms of different evaluation metrics, and the diversity of generated dialogue responses is enhanced. An ablation study is conducted to illustrate the merit of the proposed flow models.
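A minimal single step of an inverse autoregressive flow, assuming PyTorch; the masked-linear parameterization and the gating constant are illustrative choices, not the DF-VAE's exact flow. Stacking such steps turns an isotropic Gaussian into a richer distribution while keeping the log-density tractable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    """Strictly lower-triangular weight mask: output i sees only inputs < i."""
    def __init__(self, dim):
        super().__init__(dim, dim)
        self.register_buffer("mask", torch.tril(torch.ones(dim, dim), -1))

    def forward(self, z):
        return F.linear(z, self.weight * self.mask, self.bias)

class IAFStep(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.m_net, self.s_net = MaskedLinear(dim), MaskedLinear(dim)

    def forward(self, z):
        m, s = self.m_net(z), self.s_net(z)
        sigma = torch.sigmoid(s + 1.5)             # bias toward identity map
        z_new = sigma * z + (1 - sigma) * m        # invertible gated update
        log_det = torch.log(sigma).sum(-1)         # change-of-variables term
        return z_new, log_det

z, log_det = IAFStep(8)(torch.randn(4, 8))         # stack steps for richness
```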
- Published
- 2021
- Full Text
- View/download PDF
31. Continuous-Time Self-Attention in Neural Differential Equation
- Author
-
Jen-Tzung Chien and Yi-Hsiang Chen
- Subjects
Memory management, Recurrent neural network, Finite-state machine, Computer science, Differential equation, Ordinary differential equation, ODE, Sequence learning, Solver, Algorithm - Abstract
The neural differential equation (NDE) has recently been developed as a continuous-time state machine which can faithfully represent irregularly-sampled sequence data. The NDE is seen as a substantial extension of the recurrent neural network (RNN), which conducts discrete-time state representation for regularly-sampled data. This study presents a new continuous-time attention to improve sequential learning, where the region of interest in the continuous-time state trajectory over observed as well as missing samples is sufficiently attended. However, the attention score, calculated by relating a query to a sequence, is memory-demanding because self-attention has to treat all time observations as query vectors and feed them into the ordinary differential equation (ODE) solver. To deal with this issue, we develop a new form of dynamics for continuous-time attention in which the causality property is adopted, such that the query vector is fed into the ODE solver only up to the current time. The experiments on irregularly-sampled human activities and medical features show that this method obtains desirable performance with efficient memory consumption.
- Published
- 2021
- Full Text
- View/download PDF
32. Dualformer: A Unified Bidirectional Sequence-to-Sequence Learning
- Author
-
Jen-Tzung Chien and Wei-Hsiang Chang
- Subjects
Sequence, Machine translation, Computer science, Supervised learning, Domain (software engineering), Feature (machine learning), Domain knowledge, Sequence learning, Artificial intelligence - Abstract
This paper presents a new dual domain mapping based on unified bidirectional sequence-to-sequence (seq2seq) learning. Traditionally, dual learning in domain mapping was constructed with an intrinsic connection where the conditional generative models in two directions were mutually leveraged and combined. The additional feedback from the other generation direction was used to regularize sequential learning in the original direction of domain mapping. Domain matching between source sequence and target sequence was accordingly improved. However, the reconstructions of knowledge in the two domains were ignored, and the dual information based on separate models in the two training directions was not sufficiently discovered. To cope with this weakness, this study proposes a closed-loop seq2seq learning where domain mapping and domain knowledge are jointly learned. In particular, a new feature-level dual learning is incorporated to build a dualformer where feature integration and feature reconstruction are further performed to bridge the dual tasks. Experiments demonstrate the merit of the proposed dualformer for machine translation based on multi-objective seq2seq learning.
- Published
- 2021
- Full Text
- View/download PDF
33. Hierarchical and Self-Attended Sequence Autoencoder
- Author
-
Jen-Tzung Chien and Chun Wei Wang
- Subjects
Sequence, Computer science, Applied Mathematics, Normal Distribution, Inference, Latent variable, Machine learning, Automatic summarization, Autoencoder, Semantics, Recurrent neural network, Computational Theory and Mathematics, Artificial Intelligence, Computer Vision and Pattern Recognition, Sequence learning, Language model, Neural Networks (Computer), Software, Algorithms - Abstract
It is important and challenging to infer stochastic latent semantics for natural language applications. The difficulty in stochastic sequential learning is caused by posterior collapse in variational inference, where the input sequence is disregarded in the estimated latent variables. This paper proposes three components to tackle this difficulty and build the variational sequence autoencoder (VSAE), where sufficient latent information is learned for sophisticated sequence representation. First, complementary encoders based on a long short-term memory (LSTM) and a pyramid bidirectional LSTM are merged to characterize the global and structural dependencies of an input sequence, respectively. Second, a stochastic self-attention mechanism is incorporated in a recurrent decoder. The latent information is attended to encourage the interaction between inference and generation in an encoder-decoder training procedure. Third, an autoregressive Gaussian prior of the latent variable is used to preserve the information bound. Different variants of the VSAE are proposed to mitigate the posterior collapse in sequence modeling. A series of experiments is conducted to demonstrate that the proposed individual and hybrid sequence autoencoders substantially improve the performance of variational sequential learning in language modeling and in semantic understanding for document classification and summarization.
- Published
- 2021
34. Sequential Learning and Regularization in Variational Recurrent Autoencoder
- Author
-
Chih-Jung Tsai and Jen-Tzung Chien
- Subjects
Computer science, Gaussian, Inference, Latent variable, Bayesian inference, Regularization (mathematics), Autoencoder, Recurrent neural network, Sequence learning, Latent variable model, Encoder, Algorithm - Abstract
The latent variable model based on the variational autoencoder (VAE) is influential in machine learning for signal processing. The VAE basically suffers from the issue of posterior collapse in the sequential learning procedure, where the variational posterior easily collapses to a prior given by a standard Gaussian. Latent semantics are then neglected in the optimization process, and the recurrent decoder generates noninformative or repeated sequence data. To capture sufficient latent semantics from sequence data, this study simultaneously fulfills an amortized regularization for the encoder, extends a Gaussian mixture prior for the latent variable, and runs a skip connection for the decoder. The noise-robust prior, learned from the amortized encoder, is likely aware of temporal features. A variational prior based on the amortized mixture density is formulated in the implementation of a variational recurrent autoencoder for sequence reconstruction and representation. Owing to the skip connection, the sequence samples are continuously predicted in the decoder with contextual precision at each time step. Experiments on language modeling and sentiment classification show that the proposed method mitigates the issue of posterior collapse and learns meaningful latent features to improve the inference and generation for semantic representation.
- Published
- 2021
- Full Text
- View/download PDF
35. Exploring State Transition Uncertainty in Variational Reinforcement Learning
- Author
-
Issam El Naqa, Wei-Lin Liao, and Jen-Tzung Chien
- Subjects
Mathematical optimization, Computer science, Entropy (information theory), State (functional analysis), Latent variable, Measure (mathematics), Reinforcement learning, Latent variable model, Representation (mathematics), Sparse matrix - Abstract
A model-free agent in reinforcement learning (RL) generally performs well but is inefficient to train with sparse data. A practical solution is to incorporate a model-based module in the model-free agent. State transition can be learned to make a desirable prediction of the next state based on the current state and action at each time step. This paper presents a new learning representation for variational RL by introducing the so-called transition uncertainty critic, based on a variational encoder-decoder network, where the uncertainty of structured state transitions is encoded in a model-based agent. In particular, an action-gating mechanism is carried out to learn and decode the trajectory of actions and state transitions in the latent variable space. The transition uncertainty maximizing exploration (TUME) is performed according to an entropy search by using the intrinsic reward based on the uncertainty measure corresponding to different states and actions. A dedicated latent variable model with a penalty using the bias of the state-action value is developed. Experiments on Cart Pole and a dialogue system show that the proposed TUME performs considerably better than the other exploration methods for reinforcement learning.
- Published
- 2021
- Full Text
- View/download PDF
36. Stochastic Convolutional Recurrent Networks for Language Modeling
- Author
-
Jen-Tzung Chien and Yu-Min Huang
- Subjects
Recurrent neural network, Computer science, Language model, Artificial intelligence, Latent variable model, Convolutional neural network - Published
- 2020
- Full Text
- View/download PDF
37. Stochastic Curiosity Exploration for Dialogue Systems
- Author
-
Jen-Tzung Chien and Po-Chien Hsu
- Subjects
Computer science, Curiosity, Dialogue management - Published
- 2020
- Full Text
- View/download PDF
38. Strategies for End-to-End Text-Independent Speaker Verification
- Author
-
Jen-Tzung Chien, Man-Wai Mak, and Weiwei Lin
- Subjects
Speaker verification, End-to-end principle, Computer science, Speech recognition, Text independent - Published
- 2020
- Full Text
- View/download PDF
39. Neural Bayesian Information Processing
- Author
-
Jen-Tzung Chien
- Subjects
Artificial neural network, Computer science, Deep learning, Bayesian probability, Latent variable, Machine learning, Bayesian inference, Bayes' theorem, Recurrent neural network, Artificial intelligence, Latent variable model - Abstract
Deep learning is developed as a learning process from source inputs to target outputs where the inference or optimization is performed over an assumed deterministic model with deep structure. A wide range of temporal and spatial data in language and vision are treated as the inputs or outputs to build such a complicated mapping in different information systems. A systematic and elaborate transfer is required to meet the mapping between source and target domains. Also, the semantic structure in natural language and computer vision may not be well represented or trained in mathematical logic or computer programs. The distribution function in discrete or continuous latent variable model for words, sentences, images or videos may not be properly decomposed or estimated. The system robustness to heterogeneous environments may not be assured. This tutorial addresses the fundamentals and advances in statistical models and neural networks, and presents a series of deep Bayesian solutions including variational Bayes, sampling method, Bayesian neural network, variational auto-encoder (VAE), stochastic recurrent neural network, sequence-to-sequence model, attention mechanism, end-to-end network, stochastic temporal convolutional network, temporal difference VAE, normalizing flow and neural ordinary differential equation. Enhancing the prior/posterior representation is addressed in different latent variable models. We illustrate how these models are connected and why they work for a variety of applications on complex patterns in language and vision. The word, sentence and image embeddings are merged with semantic constraint or structural information. Bayesian learning is formulated in the optimization procedure where the posterior collapse is tackled. An informative latent space is trained to incorporate deep Bayesian learning in various information systems.
- Published
- 2020
- Full Text
- View/download PDF
40. Deep Bayesian Multimedia Learning
- Author
-
Jen-Tzung Chien
- Subjects
Multimedia, Artificial neural network, Computer science, Deep learning, Bayesian probability, Latent variable, Bayesian inference, Recurrent neural network, Sequence learning, Artificial intelligence, Latent variable model - Abstract
Deep learning has been successfully developed as a complicated learning process from source inputs to target outputs in presence of multimedia environments. The inference or optimization is performed over an assumed deterministic model with deep structure. A wide range of temporal and spatial data in language and vision are treated as the inputs or outputs to build such a domain mapping for multimedia applications. A systematic and elaborate transfer is required to meet the mapping between source and target domains. Also, the semantic structure in natural language and computer vision may not be well represented or trained in mathematical logic or computer programs. The distribution function in discrete or continuous latent variable model for words, sentences, images or videos may not be properly decomposed or estimated. The system robustness to heterogeneous environments may not be assured. This tutorial addresses the fundamentals and advances in statistical models and neural networks for domain mapping, and presents a series of deep Bayesian solutions including variational Bayes, sampling method, Bayesian neural network, variational auto-encoder (VAE), stochastic recurrent neural network, sequence-to-sequence model, attention mechanism, end-to-end network, stochastic temporal convolutional network, temporal difference VAE, normalizing flow and neural ordinary differential equation. Enhancing the prior/posterior representation is addressed in different latent variable models. We illustrate how these models are connected and why they work for a variety of applications on complex patterns in language and vision. The word, sentence and image embeddings are merged with semantic constraint or structural information. Bayesian learning is formulated in the optimization procedure where the posterior collapse is tackled. An informative latent space is trained to incorporate deep Bayesian learning in various information systems.
- Published
- 2020
- Full Text
- View/download PDF
41. Stochastic Adversarial Learning for Domain Adaptation
- Author
-
Ching-Wei Huang and Jen-Tzung Chien
- Subjects
Machine learning, Artificial neural network, Computer science, Stochastic process, Feature extraction, Inference, Domain (software engineering), Data modeling, Discriminative model, Graphical model, Artificial intelligence, Invariant (mathematics) - Abstract
Learning across domains is challenging, especially when test data in the target domain are sparse, heterogeneous and unlabeled. This challenge is even more severe when building a deep stochastic neural model. This paper presents a stochastic semi-supervised learning for domain adaptation by using labeled data from the source domain and unlabeled data from the target domain. The proposed method has two novelties. First, a graphical model is constructed to identify the random latent features for classes as well as domains, which are learned by variational inference. Second, we learn class features which are discriminative among classes and simultaneously invariant to both domains. An adversarial neural model is introduced to pursue domain invariance. The domain features are explicitly learned to purify the extraction of class features for improved classification. The experiments on sentiment classification illustrate the merits of the proposed stochastic adversarial domain adaptation.
- Published
- 2020
- Full Text
- View/download PDF
42. Stochastic Convolutional Recurrent Networks
- Author
-
Yu-Min Huang and Jen-Tzung Chien
- Subjects
Computer science, Inference, Convolutional neural network, Generative model, Recurrent neural network, Convolutional code, Robustness (computer science), Sequence learning, Artificial intelligence, Encoder - Abstract
The recurrent neural network (RNN) has been widely used for sequential learning and has achieved great success in different tasks. The temporal convolutional network (TCN), a variant of the one-dimensional convolutional neural network (CNN), was also developed for sequential learning over sequence data. RNN and TCN typically capture long-term and short-term features in the temporal or spatial domain, respectively. This paper presents a new sequential learning model, called the convolutional recurrent network (CRN), which employs a TCN as the encoder and an RNN as the decoder so that the global semantics as well as the local dependencies are simultaneously characterized from sequence data. To facilitate interpretation and robustness in neural models, we further develop a stochastic formulation of the CRN based on variational inference. The merits of the CNN and RNN are then incorporated in the inference of a latent space which sufficiently produces a generative model for sequential prediction. Experiments on language modeling show the effectiveness of the stochastic CRN when compared with other sequential machines.
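A deterministic skeleton of the encoder-decoder pairing described above, assuming PyTorch; the sizes are illustrative, and the variational (stochastic) latent layer of the paper is omitted for brevity.

```python
import torch
import torch.nn as nn

class CRN(nn.Module):
    """Convolutional encoder (short-term features) + recurrent decoder."""
    def __init__(self, vocab=1000, emb=64, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.Conv1d(emb, hid, kernel_size=3, padding=1)
        self.decoder = nn.GRU(hid, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tokens):                     # tokens: (batch, time)
        e = self.embed(tokens).transpose(1, 2)     # (batch, emb, time)
        h = torch.relu(self.encoder(e)).transpose(1, 2)
        d, _ = self.decoder(h)                     # long-term dependencies
        return self.out(d)                         # per-step token logits

logits = CRN()(torch.randint(0, 1000, (2, 10)))    # -> (2, 10, 1000)
```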
- Published
- 2020
- Full Text
- View/download PDF
43. Bayesian Learning for Neural Network Compression
- Author
-
Su-Ting Chang and Jen-Tzung Chien
- Subjects
Artificial neural network, Computer science, Quantization (signal processing), Cognitive neuroscience of visual object recognition, Parameter space, Bayesian inference, Upper and lower bounds, Robustness (computer science) - Abstract
Quantization of weight parameters in neural network training plays a key role in model compression for mobile devices. This paper presents a general M-ary adaptive quantization for the construction of Bayesian neural networks. The trade-off between model capacity and memory cost is adjustable, and the stochastic weight parameters are faithfully reflected. A compact model is trained to achieve robustness to model uncertainty due to heterogeneous data collection. To minimize the performance loss, the representation levels in the quantized neural network are estimated by maximizing the variational lower bound of the log likelihood conditioned on M-ary quantization. Bayesian learning is formulated by using a multi-spike-and-slab prior for the quantization levels. An adaptive quantization is derived to implement a flexible parameter space for representation learning, which is applied to object recognition. Experiments on image recognition show the merit of this Bayesian model compression for M-ary quantized neural networks.
- Published
- 2020
- Full Text
- View/download PDF
44. Amortized Mixture Prior for Variational Sequence Generation
- Author
-
Chih-Jung Tsai and Jen-Tzung Chien
- Subjects
Sequence, Computer science, Gaussian, Latent variable, Autoencoder, Regularization (mathematics), Data modeling, Recurrent neural network, Language model, Latent variable model, Encoder - Abstract
The variational autoencoder (VAE) is a popular latent variable model for data generation. However, in natural language applications, the VAE suffers from posterior collapse in the optimization procedure, where the model posterior likely collapses to a standard Gaussian prior which disregards latent semantics from sequence data. The recurrent decoder accordingly generates duplicate or noninformative sequence data. To tackle this issue, this paper adopts a Gaussian mixture prior for the latent variable, and simultaneously fulfills an amortized regularization in the encoder and a skip connection in the decoder. The noise-robust prior, learned from the amortized encoder, becomes semantically meaningful. The prediction of sequence samples, owing to the skip connection, becomes contextually precise at each time step. The amortized mixture prior (AMP) is then formulated in the construction of a variational recurrent autoencoder (VRAE) for sequence generation. Experiments on different tasks show that AMP-VRAE can avoid posterior collapse, learn meaningful latent features and improve the inference and generation for semantic representation.
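A small sketch of scoring posterior samples under a Gaussian mixture prior, assuming PyTorch distributions; the component count, the random mixture parameters, and the idea of plugging log_prior into the ELBO in place of log N(z; 0, I) are illustrative, and the paper's amortization of the mixture from the encoder is not shown.

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

K, dim = 4, 16
prior = MixtureSameFamily(
    Categorical(logits=torch.zeros(K)),                        # uniform weights
    Independent(Normal(torch.randn(K, dim), torch.ones(K, dim)), 1),
)
z = torch.randn(8, dim)               # samples from the encoder posterior
log_prior = prior.log_prob(z)         # used in the ELBO instead of N(0, I)
```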
- Published
- 2020
- Full Text
- View/download PDF
45. Stochastic Curiosity Maximizing Exploration
- Author
-
Jen-Tzung Chien and Po-Chien Hsu
- Subjects
Computer science, Mean squared prediction error, Feature extraction, Mutual information, Machine learning, Entropy (information theory), Task analysis, Curiosity, Reinforcement learning, Artificial intelligence - Abstract
Deep reinforcement learning (RL) is an emerging research trend in machine learning for autonomous systems. In real-world scenarios, the extrinsic rewards, acquired from the environment for learning an agent, are usually missing or extremely sparse. Such sparse rewards constrain the learning capability of the agent, because the agent only updates the policy when the goal state is successfully attained. It is always challenging to implement efficient exploration in RL algorithms. To tackle the sparse rewards and inefficient exploration, the agent needs other helpful information to update its policy even when there is no interaction with the environment. This paper proposes stochastic curiosity maximizing exploration (SCME), a learning strategy that allows the agent to explore as a human would. We cope with the sparse reward problem by encouraging the agent to explore future diversity. To do so, a latent dynamic system is developed to acquire latent states and latent actions to predict the variations in future conditions. The mutual information and the prediction error in the predicted states and actions are calculated as intrinsic rewards. The agent based on SCME is therefore learned by maximizing these rewards to improve sample efficiency for exploration. The experiments on PyDial and Super Mario Bros show the benefits of the proposed SCME in a dialogue system and a computer game, respectively.
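A toy intrinsic reward built from latent prediction error, assuming PyTorch; SCME's full latent dynamic system and mutual-information term are reduced here to a one-step forward model, so the names and sizes are placeholders.

```python
import torch
import torch.nn as nn

forward_model = nn.Linear(16 + 4, 16)      # predicts the next latent state

def intrinsic_reward(z, a, z_next):
    """Higher prediction error => more surprise => larger exploration bonus."""
    pred = forward_model(torch.cat([z, a], dim=-1))
    return ((pred - z_next) ** 2).mean(dim=-1)

r = intrinsic_reward(torch.randn(5, 16), torch.randn(5, 4), torch.randn(5, 16))
```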
- Published
- 2020
- Full Text
- View/download PDF
46. M-ARY Quantized Neural Networks
- Author
-
Jen-Tzung Chien and Su-Ting Chang
- Subjects
Artificial neural network, Contextual image classification, Computer science, Quantized neural networks, Quantization (signal processing), Adaptive learning - Abstract
Parameter quantization is crucial for model compression. This paper generalizes binary and ternary quantization to M-ary quantization for adaptive learning of quantized neural networks. To compensate for the performance loss, the representation values and the quantization partitions of model parameters are jointly trained to optimize the resolution of gradients for parameter updating, where the non-differentiable function in the back-propagation algorithm is tackled. An asymmetric quantization is implemented, and the restriction in parameter quantization is sufficiently relaxed. The resulting M-ary quantization scheme is general and adaptive with different M. Training of the M-ary quantized neural network (MQNN) can be tuned to balance the tradeoff between system performance and memory storage. Experimental results show that MQNN is able to achieve image classification performance comparable to a full-precision neural network (FPNN), while requiring far less memory storage than the FPNN.
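A minimal sketch of M-ary weight quantization with a straight-through estimator, assuming PyTorch; fixed uniform levels stand in for the paper's jointly trained representation values and asymmetric partitions.

```python
import torch

def quantize_mary(w, levels):
    """Snap each weight to the nearest of M representation values."""
    idx = torch.argmin((w.unsqueeze(-1) - levels).abs(), dim=-1)
    w_q = levels[idx]
    # Straight-through estimator: the forward pass uses w_q, while gradients
    # flow to the full-precision w as if quantization were the identity.
    return w + (w_q - w).detach()

levels = torch.linspace(-1.0, 1.0, steps=4)       # M = 4 (2-bit) example
w_q = quantize_mary(torch.randn(3, 3), levels)
```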
- Published
- 2020
- Full Text
- View/download PDF
47. Deep Learning Models
- Author
-
Jen-Tzung Chien and Man-Wai Mak
- Subjects
Computer science, Deep learning, Artificial intelligence, Machine learning - Published
- 2020
- Full Text
- View/download PDF
48. Machine Learning for Speaker Recognition
- Author
-
Jen-Tzung Chien and Man-Wai Mak
- Subjects
Domain adaptation, Computer science, Robustness (computer science), Deep learning, Probabilistic logic, Statistical model, Artificial intelligence, Speaker recognition, Machine learning - Abstract
This book will help readers understand fundamental and advanced statistical models and deep learning models for robust speaker recognition and domain adaptation. This useful toolkit enables readers to apply machine learning techniques to address practical issues, such as robustness under adverse acoustic environments and domain mismatch, when deploying speaker recognition systems. Presenting state-of-the-art machine learning techniques for speaker recognition and featuring a range of probabilistic models, learning algorithms, case studies, and new trends and directions for speaker recognition based on modern machine learning and deep learning, this is the perfect resource for graduates, researchers, practitioners and engineers in electrical engineering, computer science and applied mathematics.
- Published
- 2020
- Full Text
- View/download PDF
49. Information Maximized Variational Domain Adversarial Learning for Speaker Verification
- Author
-
Jen-Tzung Chien, Man-Wai Mak, and Youzhi Tu
- Subjects
Speaker verification, Artificial neural network, Computer science, Speech recognition, Gaussian, Mutual information, Domain (software engineering), Adversarial system, Discriminative model - Abstract
Domain mismatch is a common problem in speaker verification. This paper proposes an information-maximized variational domain adversarial neural network (InfoVDANN) to reduce domain mismatch by incorporating an InfoVAE into domain adversarial training (DAT). DAT aims to produce speaker discriminative and domain-invariant features. The InfoVAE has two roles. First, it performs variational regularization on the learned features so that they follow a Gaussian distribution, which is essential for the standard PLDA backend. Second, it preserves mutual information between the features and the training set to extract extra speaker discriminative information. Experiments on both SRE16 and SRE18-CMN2 show that the InfoVDANN outperforms the recent VDANN, which suggests that increasing the mutual information between the latent features and input features enables the InfoVDANN to extract extra speaker information that is otherwise not possible.
- Published
- 2020
- Full Text
- View/download PDF
50. Deep Bayesian Data Mining
- Author
-
Jen-Tzung Chien
- Subjects
Hierarchical Dirichlet process, Artificial neural network, Computer science, Deep learning, Text segmentation, Bayesian inference, Information extraction, Recurrent neural network, Data mining, Artificial intelligence, Natural language - Abstract
This tutorial addresses the fundamentals and advances in deep Bayesian mining and learning for natural language, with ubiquitous applications ranging from speech recognition to document summarization, text classification, text segmentation, information extraction, image caption generation, sentence generation, dialogue control, sentiment classification, recommendation systems, question answering and machine translation, to name a few. Traditionally, "deep learning" is taken to be a learning process where the inference or optimization is based on a real-valued deterministic model. The "semantic structure" in words, sentences, entities, actions and documents drawn from a large vocabulary may not be well expressed or correctly optimized in mathematical logic or computer programs. The "distribution function" in a discrete or continuous latent variable model for natural language may not be properly decomposed or estimated. This tutorial addresses the fundamentals of statistical models and neural networks, and focuses on a series of advanced Bayesian models and deep models including the hierarchical Dirichlet process, Chinese restaurant process, hierarchical Pitman-Yor process, Indian buffet process, recurrent neural network (RNN), long short-term memory, sequence-to-sequence model, variational auto-encoder (VAE), generative adversarial network (GAN), attention mechanism, memory-augmented neural network, skip neural network, temporal difference VAE, stochastic neural network, stochastic temporal convolutional network, predictive state neural network, and policy neural network. Enhancing the prior/posterior representation is addressed. We present how these models are connected and why they work for a variety of applications on symbolic and complex patterns in natural language. Variational inference and sampling methods are formulated to tackle the optimization of complicated models. The word and sentence embeddings, clustering and co-clustering are merged with linguistic and semantic constraints. A series of case studies, tasks and applications are presented to tackle different issues in deep Bayesian mining, searching, learning and understanding. Finally, we point out a number of directions and outlooks for future studies. This tutorial serves three objectives: to introduce novices to major topics within deep Bayesian learning, to motivate and explain a topic of emerging importance for data mining and natural language understanding, and to present a novel synthesis combining distinct lines of machine learning work.
- Published
- 2020
- Full Text
- View/download PDF