36 results for "Dayiheng Liu"
Search Results
2. CoupGAN: Chinese couplet generation via encoder–decoder model and adversarial training under global control
- Authors
- Qian Qu, Jiancheng Lv, Dayiheng Liu, and Kexin Yang
- Subjects
- Geometry and Topology, Software, Theoretical Computer Science
- Published
- 2022
3. Effective Approaches to Neural Query Language Identification
- Authors
- Xingzhang Ren, Baosong Yang, Dayiheng Liu, Haibo Zhang, Xiaoyu Lv, Liang Yao, and Jun Xie
- Subjects
- Linguistics and Language, Artificial Intelligence, Language and Linguistics, Computer Science Applications
- Abstract
Query language identification (Q-LID) plays a crucial role in a cross-lingual search engine. There exist two main challenges in Q-LID: (1) insufficient contextual information in queries for disambiguation; and (2) the lack of query-style training examples for low-resource languages. In this article, we propose a neural Q-LID model by alleviating the above problems from both model architecture and data augmentation perspectives. Concretely, we build our model upon the advanced Transformer model. In order to enhance the discrimination of queries, a variety of external features (e.g., character, word, as well as script) are fed into the model and fused by a multi-scale attention mechanism. Moreover, to remedy the low resource challenge in this task, a novel machine translation–based strategy is proposed to automatically generate synthetic query-style data for low-resource languages. We contribute the first Q-LID test set called QID-21, which consists of search queries in 21 languages. Experimental results reveal that our model yields better classification accuracy than strong baselines and existing LID systems on both query and traditional LID tasks.
- Published
- 2022
4. An automatic evaluation metric for Ancient-Modern Chinese translation
- Authors
- Kexin Yang, Yongsheng Sang, Jiancheng Lv, Dayiheng Liu, and Qian Qu
- Subjects
- Vocabulary, Similarity (geometry), Computer science, Text segmentation, Semantics, Translation (geometry), Artificial Intelligence, Metric (mathematics), Proper noun, Polysemy, Software, Natural language processing
- Abstract
As a written language used for thousands of years, Ancient Chinese has some special characteristics, such as complex semantics with polysemy and a one-to-many alignment with Modern Chinese. Thus it may be translated in many fully different but equally correct ways. In the absence of multiple references, reference-dependent evaluations like Bilingual Evaluation Understudy (BLEU) cannot identify potentially correct translation results. Research on the automatic evaluation of Ancient-Modern Chinese translation is completely lacking. In this paper, we propose an automatic evaluation metric for Ancient-Modern Chinese translation called DTE (Dual-based Translation Evaluation), which can be used to evaluate one-to-many alignment in the absence of multiple references. When using DTE to evaluate, we found that proper nouns often could not be correctly translated. Hence, we designed a new word segmentation method to improve the translation of proper nouns without increasing the size of the model vocabulary. Experiments show that DTE outperforms several general evaluations in terms of similarity to the evaluation of human experts. Meanwhile, the new word segmentation method helps Ancient-Modern Chinese translation models perform better on proper noun translation and achieve higher scores on both BLEU and DTE.
- Published
- 2020
5. Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning
- Authors
- Chris Pal, Jie Fu, Dayiheng Liu, Jiancheng Lv, and Yidan Zhang
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Computer science, Inference, General Medicine, Space (commercial competition), Style (sociolinguistics), Adversarial system, Key (cryptography), Artificial intelligence, Representation (mathematics), Sentence, Natural language processing
- Abstract
Typical methods for unsupervised text style transfer often rely on two key ingredients: 1) seeking the explicit disentanglement of the content and the attributes, and 2) troublesome adversarial learning. In this paper, we show that neither of these components is indispensable. We propose a new framework that utilizes the gradients to revise the sentence in a continuous space during inference to achieve text style transfer. Our method consists of three key components: a variational auto-encoder (VAE), some attribute predictors (one for each attribute), and a content predictor. The VAE and the two types of predictors enable us to perform gradient-based optimization in the continuous space, which is mapped from sentences in a discrete space, to find the representation of a target sentence with the desired attributes and preserved content. Moreover, the proposed method naturally has the ability to simultaneously manipulate multiple fine-grained attributes, such as sentence length and the presence of specific words, when performing text style transfer tasks. Compared with previous adversarial learning based methods, the proposed method is more interpretable, controllable and easier to train. Extensive experimental studies on three popular text style transfer tasks show that the proposed method significantly outperforms five state-of-the-art methods. (AAAI 2020)
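The gradient-based revision loop described in this abstract is straightforward to sketch. The snippet below is a minimal illustration, not the authors' code: the attribute predictor and all names are hypothetical stand-ins, the VAE encoder/decoder is assumed to exist, and the content-preservation term is omitted for brevity.

```python
import torch
import torch.nn as nn

# Toy stand-in: in the actual system this predictor (and the VAE) would be trained.
latent_dim = 16
attr_predictor = nn.Sequential(nn.Linear(latent_dim, 2), nn.Softmax(dim=-1))

def revise(z, predictor, target_label=1, steps=30, lr=0.1):
    """Nudge latent codes toward the desired attribute by gradient descent."""
    z = z.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Minimize the negative log-probability of the target attribute.
        loss = -torch.log(predictor(z)[:, target_label] + 1e-9).mean()
        loss.backward()
        opt.step()
    return z.detach()

z0 = torch.randn(4, latent_dim)   # latent codes produced by the VAE encoder
z1 = revise(z0, attr_predictor)   # revised codes; the VAE decoder maps them to text
```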
- Published
- 2020
6. Self-supervised Product Title Rewrite for Product Listing Ads
- Authors
- Xue Zhao, Dayiheng Liu, Junwei Ding, Liang Yao, Mahone Yan, Huibo Wang, and Wenqing Yao
- Published
- 2022
7. UniTE: Unified Translation Evaluation
- Authors
- Yu Wan, Dayiheng Liu, Baosong Yang, Haibo Zhang, Boxing Chen, Derek Wong, and Lidia Chao
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL)
- Abstract
Translation quality evaluation plays a crucial role in machine translation. According to the input format, it is mainly separated into three tasks, i.e., reference-only, source-only and source-reference-combined. Recent methods, despite their promising results, are specifically designed and optimized on one of them. This limits the convenience of these methods, and overlooks the commonalities among tasks. In this paper, we propose UniTE, which is the first unified framework engaged with abilities to handle all three evaluation tasks. Concretely, we propose monotonic regional attention to control the interaction among input segments, and unified pretraining to better adapt multi-task learning. We test our framework on the WMT 2019 Metrics and WMT 2020 Quality Estimation benchmarks. Extensive analyses show that our single model can universally surpass various state-of-the-art or winner methods across tasks. Both source code and associated models are available at https://github.com/NLP2CT/UniTE. (ACL 2022)
- Published
- 2022
8. Bridging the Gap between Training and Inference: Multi-Candidate Optimization for Diverse Neural Machine Translation
- Authors
- Huan Lin, Baosong Yang, Liang Yao, Dayiheng Liu, Haibo Zhang, Jun Xie, Min Zhang, and Jinsong Su
- Published
- 2022
9. Unsupervised Preference-Aware Language Identification
- Authors
- Xingzhang Ren, Baosong Yang, Dayiheng Liu, Haibo Zhang, Xiaoyu Lv, Liang Yao, and Jun Xie
- Published
- 2022
10. GCPG: A General Framework for Controllable Paraphrase Generation
- Authors
- Kexin Yang, Dayiheng Liu, Wenqiang Lei, Baosong Yang, Haibo Zhang, Xue Zhao, Wenqing Yao, and Boxing Chen
- Published
- 2022
11. Frequency-Aware Contrastive Learning for Neural Machine Translation
- Authors
- Tong Zhang, Wei Ye, Baosong Yang, Long Zhang, Xingzhang Ren, Dayiheng Liu, Jinan Sun, Shikun Zhang, Haibo Zhang, and Wen Zhao
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), General Medicine
- Abstract
Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems. Recent adaptive training methods promote the output of infrequent words by emphasizing their weights in the overall training objectives. Despite the improved recall of low-frequency words, their prediction precision is unexpectedly hindered by the adaptive objectives. Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective. Specifically, we propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words, in a soft contrastive way based on the corresponding word frequencies. We conduct experiments on the widely used NIST Chinese-English and WMT14 English-German translation tasks. Empirical results show that our proposed methods can not only significantly improve the translation quality but also enhance lexical diversity and optimize the word representation space. Further investigation reveals that, compared with related adaptive training strategies, the superiority of our method on low-frequency word prediction lies in the robustness of token-level recall across different frequencies without sacrificing precision. (AAAI 2022)
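One way to picture the frequency-weighted contrastive term is the rough sketch below. It is illustrative only: the paper's exact weighting and loss form may differ, and the choice of repelling rarer words harder is an assumption.

```python
import torch
import torch.nn.functional as F

def freq_aware_contrastive(hidden, token_ids, token_freq, temp=0.1):
    """hidden: (N, d) decoder states; token_ids: (N,) target-word ids;
    token_freq: (V,) corpus frequency of each vocabulary id."""
    h = F.normalize(hidden, dim=-1)
    sim = (h @ h.t()) / temp                                  # pairwise similarities
    negatives = token_ids.unsqueeze(0) != token_ids.unsqueeze(1)
    # Soft frequency weights: pairs involving rarer words are repelled harder.
    w = 1.0 / token_freq[token_ids].float().clamp(min=1.0)
    pair_w = (w.unsqueeze(0) + w.unsqueeze(1)) / 2
    return (pair_w * negatives * sim.exp()).sum() / negatives.sum().clamp(min=1)

# Toy usage with random tensors:
loss = freq_aware_contrastive(torch.randn(8, 32),
                              torch.randint(0, 100, (8,)),
                              torch.randint(1, 1000, (100,)))
```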
- Published
- 2021
12. AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation
- Authors
- Dayiheng Liu, Jiancheng Lv, Huishuang Tian, and Kexin Yang
- Subjects
- Vocabulary, Poetry, Computer science, Chinese culture, Data modeling, Task analysis, Artificial intelligence, Couplet, Language model, Architecture, Natural language processing
- Abstract
Ancient Chinese is the essence of Chinese culture. There are several natural language processing tasks in the ancient Chinese domain, such as ancient-modern Chinese translation, poem generation, and couplet generation. Previous studies usually use supervised models that deeply rely on parallel data. However, it is difficult to obtain large-scale parallel data of ancient Chinese. In order to make full use of the more easily available monolingual ancient Chinese corpora, we release AnchiBERT, a pre-trained language model based on the architecture of BERT, which is trained on large-scale ancient Chinese corpora. We evaluate AnchiBERT on both language understanding and generation tasks, including poem classification, ancient-modern Chinese translation, poem generation, and couplet generation. The experimental results show that AnchiBERT outperforms BERT as well as the non-pretrained models and achieves state-of-the-art results in all cases.
- Published
- 2021
13. Evolving transformer architecture for neural machine translation
- Authors
- Dayiheng Liu, Yanan Sun, and Ben Feng
- Subjects
- Computer engineering, Machine translation, Computer science, Genetic algorithm, Benchmark (computing), Architecture design, Layer (object-oriented design), Architecture, Transformer (machine learning model)
- Abstract
Transformer models have achieved great success on neural machine translation tasks in recent years. However, the hyper-parameters of the transformer are often designed manually by experts, and its layers are regularly stacked together without exploring potentially promising ordering patterns. In this paper, we propose a transformer architecture design algorithm based on a genetic algorithm, which can automatically find proper layer ordering patterns and hyper-parameters for the task at hand. The experimental results show that the models designed by the proposed algorithm outperform the vanilla transformer on a widely used machine translation benchmark, which reveals that the performance of the transformer architecture can be improved by adjusting the layer ordering pattern and hyper-parameters with the proposed algorithm.
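A toy version of such a search loop is sketched below, assuming a genome is simply a sequence of layer types. The layer vocabulary, the operators, and the `evaluate` fitness function (e.g., dev BLEU after a short training run) are hypothetical stand-ins, not the paper's actual setup.

```python
import random

LAYERS = ["self_attn", "ffn", "conv"]            # assumed candidate layer types

def random_genome(length=6):
    return [random.choice(LAYERS) for _ in range(length)]

def mutate(genome, rate=0.2):
    return [random.choice(LAYERS) if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randint(1, len(a) - 1)          # single-point crossover
    return a[:cut] + b[cut:]

def evolve(evaluate, pop_size=20, generations=10):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=evaluate, reverse=True)[: pop_size // 2]
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=evaluate)

# Toy fitness for demonstration (a real fitness would train and score a model):
best = evolve(lambda g: sum(a != b for a, b in zip(g, g[1:])))
```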
- Published
- 2021
14. BFGAN: Backward and Forward Generative Adversarial Networks for Lexically Constrained Sentence Generation
- Authors
- Jiancheng Lv, Qian Qu, Dayiheng Liu, and Jie Fu
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Closed captioning, Acoustics and Ultrasonics, Machine translation, Computer science, Process (engineering), Computational Mathematics, Computer Science (miscellaneous), Task analysis, Beam search, Artificial intelligence, Electrical and Electronic Engineering, Natural language, Generative grammar, Natural language processing, Generator (mathematics)
- Abstract
Incorporating prior knowledge like lexical constraints into the model's output to generate meaningful and coherent sentences has many applications in dialogue systems, machine translation, image captioning, etc. However, existing auto-regressive models incrementally generate sentences from left to right via beam search, which makes it difficult to directly introduce lexical constraints into the generated sentences. In this paper, we propose a new algorithmic framework, dubbed BFGAN, to address this challenge. Specifically, we employ a backward generator and a forward generator to generate lexically constrained sentences together, and use a discriminator to guide the joint training of the two generators by assigning them reward signals. Due to the difficulty of BFGAN training, we propose several training techniques to make the training process more stable and efficient. Our extensive experiments on three large-scale datasets with human evaluation demonstrate that BFGAN has significant improvements over previous methods.
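The decoding pattern the two generators follow can be sketched in a few lines. The toy version below is a greedy illustration only: `backward_next` and `forward_next` are hypothetical callables standing in for the trained backward and forward generators.

```python
def bf_generate(backward_next, forward_next, constraint, max_len=20):
    """Grow a sentence around a constraint word: left context first (in reverse),
    then the right continuation. Each callable returns a token or None to stop."""
    rev = [constraint]                     # sequence as seen by the backward model
    while len(rev) < max_len // 2:
        tok = backward_next(rev)
        if tok is None:
            break
        rev.append(tok)
    sent = rev[::-1]                       # reads left-to-right, ends at the constraint
    while len(sent) < max_len:
        tok = forward_next(sent)
        if tok is None:
            break
        sent.append(tok)
    return sent

# Toy usage with generators that stop immediately:
print(bf_generate(lambda rev: None, lambda sent: None, "love"))   # ['love']
```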
- Published
- 2019
15. Generating Style-Specific Chinese Tang Poetry With a Simple Actor-Critic Model
- Authors
- Jiancheng Lv, Dayiheng Liu, and Yunxia Li
- Subjects
- Structure (mathematical logic), Control and Optimization, Poetry, Computer science, Computer Science Applications, Style (sociolinguistics), Computational Mathematics, Consistency (database systems), Recurrent neural network, Value network, Artificial Intelligence, Function (engineering), Encoder
- Abstract
Recent studies in sequence-to-sequence learning demonstrate that the recurrent neural network (RNN) encoder-decoder structure can do well in Chinese classical poetry generation. With topic words or a first line as the encoder input, an entire poem is then incrementally generated by the decoder from left to right with the highest probability. However, the locally incremental nature of this decoding model can lead to incongruity of style between the front and the back of the generated poem. Inspired by how people tend to plan and associate the following parts in advance when writing poems, this paper employs a simple actor-critic method to generate style-specific Chinese poems. We design a style matching reward function and employ a value network with Monte Carlo search as the critic to estimate the future rewards of the desired style for poem generation. This approach makes the generation process more flexible and controllable. The experimental results demonstrate that our approach can generate three specific styles of high-quality poetry, and enhance the consistency of style of the generated poems.
- Published
- 2019
16. μ-Forcing
- Authors
- Jiancheng Lv, Feng He, Yuanyuan Chen, Yang Xue, and Dayiheng Liu
- Subjects
- FOS: Computer and information sciences, Machine Learning (cs.LG), Computation and Language (cs.CL), Forcing (recursion theory), General Computer Science, Computer science, Training (meteorology), Process (computing), Latent variable, Machine learning, Term (time), Text generation, Artificial intelligence, Control (linguistics)
- Abstract
It has been previously observed that training Variational Recurrent Autoencoders (VRAE) for text generation suffers from a serious uninformative latent variables problem: the model collapses into a plain language model that totally ignores the latent variables and can only generate repeating and dull samples. In this paper, we explore the reason behind this issue and propose an effective regularizer-based approach to address it. The proposed method directly injects extra constraints on the posteriors of latent variables into the learning process of VRAE, which can flexibly and stably control the trade-off between the KL term and the reconstruction term, making the model learn dense and meaningful latent representations. The experimental results show that the proposed method outperforms several strong baselines and can make the model learn interpretable latent variables and generate diverse meaningful sentences. Furthermore, the proposed method can perform well without using other strategies, such as KL annealing. (To appear in the ACM Transactions on Asian and Low-Resource Language Information Processing, TALLIP.)
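The constraint idea can be pictured with a one-line loss: keep the KL term near a positive budget instead of letting it collapse toward zero. The form below is an assumption for illustration, not necessarily the paper's exact regularizer.

```python
import torch

def constrained_vae_loss(recon_loss, kl, mu=5.0):
    """Penalize deviation of the KL term from a positive target mu, which
    discourages posterior collapse (kl -> 0). Illustrative form only."""
    return recon_loss + torch.abs(kl - mu)

loss = constrained_vae_loss(torch.tensor(42.0), torch.tensor(0.3))
```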
- Published
- 2019
17. Deep learning-based automatic downbeat tracking: a brief review
- Authors
- Dayiheng Liu, Jiancheng Lv, and Bijue Jia
- Subjects
- FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, Sound (cs.SD), Information Retrieval (cs.IR), Audio and Speech Processing (eess.AS), Feature engineering, Computer Networks and Communications, Computer science, Feature extraction, Machine learning, Media Technology, Music information retrieval, Point (typography), Artificial neural network, Deep learning, Hardware and Architecture, Systems architecture, Artificial intelligence, Feature learning, Software, Information Systems
- Abstract
As an important format of multimedia, music fills almost everyone's life. Automatically analyzing music is a significant step toward satisfying people's needs for music retrieval and music recommendation in an effortless way. Within this area, downbeat tracking has been a fundamental and continuing problem in Music Information Retrieval (MIR). Despite significant research efforts, downbeat tracking still remains a challenge. Previous research either focuses on feature engineering (extracting certain features by signal processing, which are semi-automatic solutions) or has some limitations: it can only model music audio recordings within limited time signatures and tempo ranges. Recently, deep learning has surpassed traditional machine learning methods and has become the primary algorithm in feature learning; the combination of traditional and deep learning methods has also achieved better performance. In this paper, we begin with a background introduction to the downbeat tracking problem. Then, we give detailed discussions of the following topics: system architecture, feature extraction, deep neural network algorithms, datasets, and evaluation strategy. In addition, we look at the results from the annual benchmark evaluation, the Music Information Retrieval Evaluation eXchange (MIREX), as well as developments in software implementations. Although much has been achieved in the area of automatic downbeat tracking, some problems still remain. We point out these problems and conclude with possible directions and challenges for future research.
- Published
- 2019
18. POS-Constrained Parallel Decoding for Non-autoregressive Generation
- Authors
- Kexin Yang, Jiancheng Lv, Dayiheng Liu, Weizhen Qi, and Wenqiang Lei
- Subjects
- Structure (mathematical logic), Sequence, Computer science, Inference, Machine learning, Automatic summarization, Multimodality, Autoregressive model, Text generation, Artificial intelligence, Decoding methods
- Abstract
The multimodality problem has become a major challenge for existing non-autoregressive generation (NAG) systems. A common solution resorts to sequence-level knowledge distillation by rebuilding the training dataset through autoregressive generation (hereinafter known as "teacher AG"). The success of such methods may largely depend on a latent assumption, i.e., that the teacher AG is superior to the NAG model. However, in this work, we experimentally reveal that this assumption does not always hold for text generation tasks like text summarization and story ending generation. To provide a feasible solution to the multimodality problem of NAG, we propose incorporating linguistic structure (the Part-of-Speech sequence in particular) into NAG inference instead of relying on teacher AG. More specifically, the proposed POS-constrained Parallel Decoding (POSPD) method provides a specific POS sequence to constrain the NAG model during decoding. Our experiments demonstrate that POSPD consistently improves NAG models on four text generation tasks to a greater extent than knowledge distillation. This observation validates the necessity of exploring alternatives to sequence-level knowledge distillation.
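A simplified picture of POS-constrained decoding: at every position of the parallel pass, mask the vocabulary logits so that only tokens whose POS tag matches the given sequence survive. The `pos_of_token` lookup below is a hypothetical stand-in for a real tagger-derived table.

```python
import torch

def pos_constrained_decode(logits, pos_seq, pos_of_token):
    """logits: (T, V) from one parallel NAG pass; pos_seq: length-T POS tags;
    pos_of_token: list mapping vocabulary id -> POS tag (assumed lookup)."""
    T, V = logits.shape
    tokens = []
    for t in range(T):
        allowed = torch.tensor([pos_of_token[v] == pos_seq[t] for v in range(V)])
        masked = logits[t].masked_fill(~allowed, float("-inf"))
        tokens.append(int(masked.argmax()))
    return tokens

# Toy usage: 3 positions, 5-word vocabulary.
out = pos_constrained_decode(torch.randn(3, 5),
                             ["NOUN", "VERB", "NOUN"],
                             ["NOUN", "VERB", "NOUN", "VERB", "ADJ"])
```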
- Published
- 2021
19. KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation
- Authors
- Xin Liu, Dayiheng Liu, Baosong Yang, Haibo Zhang, Junwei Ding, Wenqing Yao, Weihua Luo, Haiying Zhang, and Jinsong Su
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), General Medicine
- Abstract
Generative commonsense reasoning requires machines to generate sentences describing an everyday scenario given several concepts, which has attracted much attention recently. However, existing models cannot perform as well as humans, since the sentences they produce are often implausible and grammatically incorrect. In this paper, inspired by the process of humans creating sentences, we propose a novel Knowledge-enhanced Commonsense Generation framework, termed KGR4, consisting of four stages: Retrieval, Retrospect, Refine, Rethink. Under this framework, we first perform retrieval to search for relevant sentences from an external corpus as prototypes. Then, we train the generator that either edits or copies these prototypes to generate candidate sentences, whose potential errors are fixed by an autoencoder-based refiner. Finally, we select the output sentence from the candidate sentences produced by generators with different hyper-parameters. Experimental results and in-depth analysis on the CommonGen benchmark strongly demonstrate the effectiveness of our framework. Particularly, KGR4 obtains 33.56 SPICE on the official leaderboard, outperforming the previously-reported best result by 2.49 SPICE and achieving state-of-the-art performance. We release the code at https://github.com/DeepLearnXMU/KGR-4.
- Published
- 2021
20. Mask Attention Networks: Rethinking and Strengthen Transformer
- Authors
- Zhihao Fan, Zhongyu Wei, Nan Duan, Ruofei Zhang, Siyuan Wang, Xuanjing Huang, Dayiheng Liu, Jian Jiao, and Yeyun Gong
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Artificial neural network, Machine translation, Computer science, Automatic summarization, Matrix (mathematics), Artificial intelligence, Layer (object-oriented design), Representation (mathematics), Feature learning, Transformer (machine learning model)
- Abstract
Transformer is an attention-based neural network which consists of two sublayers, namely, the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research explores enhancing the two sublayers separately to improve the capability of the Transformer for text representation. In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices. However, their static mask matrices limit the capability for localness modeling in text representation learning. We therefore introduce a new layer named dynamic mask attention network (DMAN) with a learnable mask matrix which is able to model localness adaptively. To incorporate the advantages of DMAN, SAN, and FFN, we propose a sequential layered structure to combine the three types of layers. Extensive experiments on various tasks, including neural machine translation and text summarization, demonstrate that our model outperforms the original Transformer. (Accepted as a long paper at NAACL 2021.)
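The learnable-mask idea can be sketched compactly. The toy layer below is a rough illustration under an assumed parameterization (a free (T, T) logit matrix passed through a sigmoid); the paper's actual DMAN is parameterized differently.

```python
import torch
import torch.nn as nn

class DynamicMaskAttention(nn.Module):
    """Toy self-attention whose mask is learned rather than static."""
    def __init__(self, d_model, max_len=128):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.mask_logits = nn.Parameter(torch.zeros(max_len, max_len))

    def forward(self, x):                                 # x: (T, d_model)
        T, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = (q @ k.t()) / d ** 0.5
        gate = torch.sigmoid(self.mask_logits[:T, :T])    # learnable soft mask
        attn = torch.softmax(scores, dim=-1) * gate
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-9)
        return attn @ v

y = DynamicMaskAttention(32)(torch.randn(10, 32))         # toy forward pass
```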
- Published
- 2021
21. Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation
- Authors
- Min Zhang, Haibo Zhang, Xin Liu, Baosong Yang, Jinsong Su, Dayiheng Liu, Haiying Zhang, and Weihua Luo
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Vocabulary, Theoretical computer science, Computer science, Natural language generation, Security token, Pipeline (software), Bridging (programming), Embedding, Representation (mathematics), Generator (mathematics)
- Abstract
A well-known limitation of the pretrain-finetune paradigm lies in its inflexibility caused by the one-size-fits-all vocabulary. This potentially weakens the effect when applying pretrained models to natural language generation (NLG) tasks, especially when the subword distributions of upstream and downstream tasks differ significantly. To approach this problem, we extend the vanilla pretrain-finetune pipeline with an extra embedding transfer step. Specifically, a plug-and-play embedding generator is introduced to produce the representation of any input token, according to the pre-trained embeddings of its morphologically similar ones. Thus, embeddings of mismatched tokens in downstream tasks can also be efficiently initialized. We conduct experiments on a variety of NLG tasks under the pretrain-finetune fashion. Experimental results and extensive analyses show that the proposed strategy allows the vocabulary to be transferred freely, leading to more efficient and better-performing downstream NLG models. (Accepted at ACL 2021.)
- Published
- 2021
22. Towards User-Driven Neural Machine Translation
- Authors
- Haibo Zhang, Baosong Yang, Liang Yao, Jinsong Su, Degen Huang, Weihua Luo, Huan Lin, and Dayiheng Liu
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Machine translation, Computer science, Translation (geometry), Preference, Expression (mathematics), User driven, Learning methods, Cache, Artificial intelligence, Natural language processing
- Abstract
A good translation should not only translate the original content semantically, but also embody the personal traits of the original text. For a real-world neural machine translation (NMT) system, these user traits (e.g., topic preference, stylistic characteristics and expression habits) can be preserved in user behavior (e.g., historical inputs). However, current NMT systems marginally consider user behavior due to: 1) the difficulty of modeling user portraits in zero-shot scenarios, and 2) the lack of a user-behavior-annotated parallel dataset. To fill this gap, we introduce a novel framework called user-driven NMT. Specifically, a cache-based module and a user-driven contrastive learning method are proposed to offer NMT the ability to capture potential user traits from their historical inputs in a zero-shot learning fashion. Furthermore, we contribute the first Chinese-English parallel corpus annotated with user behavior, called UDT-Corpus. Experimental results confirm that the proposed user-driven NMT can generate user-specific translations.
- Published
- 2021
23. Herb-Know: Knowledge Enhanced Prescription Generation for Traditional Chinese Medicine
- Authors
- Jiancheng Lv, Xiaoming Huang, Dayiheng Liu, Kexin Yang, and Chanjuan Li
- Subjects
- Vocabulary, Medical treatment, Computer science, Knowledge economy, Traditional Chinese medicine, Knowledge-based systems, Herb, Task analysis, Artificial intelligence, Medical prescription, Natural language processing
- Abstract
Prescription generation for traditional Chinese medicine (TCM) is a meaningful and challenging problem. Previous research mainly models the relationship between symptoms and herbal prescriptions directly. However, TCM practitioners often take herb effects into consideration when prescribing. Few works focus on fusing external knowledge of herbs. In this paper, we explore how to generate a prescription with knowledge of herb effects under the given symptoms. We propose Herb-Know, a sequence-to-sequence (seq2seq) model with a pointer network, where the prescription is conditioned on two inputs (symptoms and pre-selected herb candidates). To the best of our knowledge, this is the first attempt to generate a prescription with a knowledge-enhanced seq2seq model. The experimental results demonstrate that our method can make use of knowledge to generate informative and reasonable herbs, outperforming other baseline models.
- Published
- 2020
24. Generating Chinese Poetry from Images via Concrete and Abstract Information
- Authors
- Yongsheng Sang, Jiancheng Lv, Dayiheng Liu, and Yusen Liu
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Computer Vision and Pattern Recognition (cs.CV), Information retrieval, Poetry, Computer science, Construct (python library), Consistency (database systems), Classical Chinese poetry, Task analysis, Chinese poetry, Quality (business)
- Abstract
In recent years, the automatic generation of classical Chinese poetry has made great progress. Besides the focus on improving the quality of the generated poetry, there is a new topic of generating poetry from an image. However, the existing methods for this topic still suffer from topic drift and semantic inconsistency, and an image-poem pair dataset is hard to build when training these models. In this paper, we extract and integrate Concrete and Abstract information from images to address those issues. We propose an infilling-based Chinese poetry generation model which can infill the Concrete keywords into each line of a poem in an explicit way, and an abstract information embedding to integrate the Abstract information into generated poems. In addition, we use non-parallel data during training and construct separate image datasets and poem datasets to train the different components of our framework. Both automatic and human evaluation results show that our approach can generate poems which have better consistency with images without losing quality. (Accepted at the 2020 International Joint Conference on Neural Networks, IJCNN 2020.)
- Published
- 2020
25. Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space
- Authors
- Yeyun Gong, Jiusheng Chen, Jie Fu, Ming Zhou, Dayiheng Liu, Nan Duan, Yu Yan, and Jiancheng Lv
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Theoretical computer science, Computer science, Continuous embedding, Autoencoder, Ask price, Question generation, Rewriting, Transformer (machine learning model)
- Abstract
In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA utilizes a Transformer autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space, where they serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of CRQDA. (Accepted at EMNLP 2020.)
- Published
- 2020
26. Exploration on the Generation of Chinese Palindrome Poetry
- Authors
- Liao Chen, Yongsheng Sang, Jiancheng Lv, Zhichen Lai, and Dayiheng Liu
- Subjects
- Poetry, Computer science, Deep learning, Palindrome, Inference, Human judgment, Beam search, Chinese poetry, Language model, Artificial intelligence, Natural language processing
- Abstract
Recently, Chinese poetry generation has made significant progress with the development of deep learning. However, existing methods cannot generate Chinese palindrome poetry, and there is no public dataset of Chinese palindrome poetry. In this paper, we propose a novel Chinese palindrome poetry generation model, named Chinese Palindrome Poetry Generation Model (CPPGM), based on the universal seq2seq model and a language model with specific beam search algorithms. The proposed model is the first to generate Chinese palindrome poetry automatically, and it is applicable to other palindromes, such as palindrome couplets. Compared with several alternative methods we propose, the experimental results demonstrate the superiority of CPPGM under machine evaluation as well as human judgment.
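One simple way to realize a palindrome constraint is to grow the text outward from the middle while a language model scores fluency. The greedy toy below is only an illustration of that idea, not the beam-search procedure of CPPGM; `score` and `vocab` are assumed inputs.

```python
def grow_palindrome(score, vocab, length=7):
    """score(s): fluency score of string s (e.g., from a language model);
    vocab: candidate characters. Grows a palindrome outward from the middle."""
    assert length % 2 == 1
    best = max(vocab, key=score)                       # choose the middle character
    while len(best) < length:
        best = max((c + best + c for c in vocab), key=score)
    return best

# Toy score preferring alphabetically early characters, for demonstration only:
print(grow_palindrome(lambda s: -sum(map(ord, s)), "abcde", 5))   # 'aaaaa'
```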
- Published
- 2020
27. RikiNet: Reading Wikipedia Pages for Natural Question Answering
- Authors
- Nan Duan, Jiusheng Chen, Yu Yan, Yeyun Gong, Jie Fu, Daxin Jiang, Jiancheng Lv, and Dayiheng Liu
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Computer science, Natural language understanding, Set (abstract data type), Reading (process), Question answering, Natural (music), Artificial intelligence, Paragraph, Natural language processing
- Abstract
Reading long documents to answer open-domain questions remains challenging in natural language understanding. In this paper, we introduce a new model, called RikiNet, which reads Wikipedia pages for natural question answering. RikiNet contains a dynamic paragraph dual-attention reader and a multi-level cascaded answer predictor. The reader dynamically represents the document and question by utilizing a set of complementary attention mechanisms. The representations are then fed into the predictor to obtain the span of the short answer, the paragraph of the long answer, and the answer type in a cascaded manner. On the Natural Questions (NQ) dataset, a single RikiNet achieves 74.3 F1 and 57.9 F1 on the long-answer and short-answer tasks. To the best of our knowledge, it is the first single model that outperforms the single human performance. Furthermore, an ensemble RikiNet obtains 76.1 F1 and 61.3 F1 on the long-answer and short-answer tasks, achieving the best performance on the official NQ leaderboard. (Accepted at ACL 2020.)
- Published
- 2020
28. GLGE: A New General Language Generation Evaluation Benchmark
- Authors
- Ming Gong, Ming Zhou, Winnie Wu, Jiusheng Chen, Jiancheng Lv, Nan Duan, Weizhu Chen, Daxin Jiang, Yu Yan, Yeyun Gong, Jian Jiao, Pengcheng Wang, Ruofei Zhang, Hang Zhang, Dayiheng Liu, Jie Fu, Weizhen Qi, and Linjun Shou
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Source code, Computer science, Generalization, Natural language understanding, Natural language generation, Task (project management), Range (mathematics), Benchmark (computing), Artificial intelligence, Transfer of learning, Natural language processing
- Abstract
Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress in pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard), introducing 24 subtasks to comprehensively compare model performance. To encourage research on pretraining and transfer learning for NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet. The source code and dataset are publicly available at https://github.com/microsoft/glge. (Findings of ACL 2021.)
- Published
- 2020
29. ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
- Authors
- Jiusheng Chen, Ming Zhou, Weizhen Qi, Nan Duan, Yu Yan, Ruofei Zhang, Dayiheng Liu, and Yeyun Gong
- Subjects
- Sequence, Scale (ratio), Computer science, Context (language use), Overfitting, Machine learning, Base (topology), Automatic summarization, n-gram, Artificial intelligence
- Abstract
This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of optimizing one-step-ahead prediction as in the traditional sequence-to-sequence model, ProphetNet is optimized by n-step-ahead prediction that predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for the future tokens and prevents overfitting on strong local correlations. We pre-train ProphetNet using a base-scale dataset (16GB) and a large-scale dataset (160GB), respectively. Then we conduct experiments on the CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new state-of-the-art results on all these datasets compared to the models using the same scale of pre-training corpus.
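The future n-gram objective amounts to adding weighted next-token losses at several offsets. The schematic below is a simplified sketch: `streams` stands in for the per-offset logits that the paper's n-stream self-attention would produce.

```python
import torch
import torch.nn.functional as F

def future_ngram_loss(streams, targets, alphas):
    """streams[i]: (T, V) logits where position t predicts the token at t+i;
    targets: (T,) gold next-token ids; alphas: per-stream weights."""
    loss = torch.tensor(0.0)
    for i, (logits, a) in enumerate(zip(streams, alphas)):
        gold = targets[i:]                           # gold tokens at offset i
        loss = loss + a * F.cross_entropy(logits[: len(gold)], gold)
    return loss

T, V = 10, 50
streams = [torch.randn(T, V), torch.randn(T, V)]     # main stream + one future stream
loss = future_ngram_loss(streams, torch.randint(0, V, (T,)), [1.0, 0.5])
```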
- Published
- 2020
30. Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation
- Authors
- Nan Duan, Yu Yan, Daxin Jiang, Bo Shao, Jiancheng Lv, Jie Fu, Yeyun Gong, and Dayiheng Liu
- Subjects
- Information retrieval, Computer science, Headline, Sentence
- Abstract
News headline generation aims to produce a short sentence that attracts readers to read the news. One news article often contains multiple keyphrases that are of interest to different users, so it can naturally have multiple reasonable headlines. However, most existing methods focus on single-headline generation. In this paper, we propose generating multiple headlines with keyphrases of user interest: the main idea is to first generate multiple keyphrases of interest to users for the news, and then generate multiple keyphrase-relevant headlines. We propose a multi-source Transformer decoder which takes three sources as inputs: (a) the keyphrase, (b) the keyphrase-filtered article, and (c) the original article, to generate keyphrase-relevant, high-quality, and diverse headlines. Furthermore, we propose a simple and effective method to mine the keyphrases of interest in the news article and build the first large-scale keyphrase-aware news headline corpus, which contains over 180K aligned article-keyphrase-headline triples. Extensive experimental comparisons on the real-world dataset show that the proposed method achieves state-of-the-art results in terms of quality and diversity.
- Published
- 2020
31. Deep Poetry: A Chinese Classical Poetry Generation System
- Authors
- Yusen Liu, Jiancheng Lv, and Dayiheng Liu
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), World Wide Web, Poetry, Computer science, Process (engineering), General Medicine
- Abstract
In this work, we demonstrate a Chinese classical poetry generation system called Deep Poetry. Existing systems for Chinese classical poetry generation are mostly template-based, and very few of them can accept multi-modal input. Unlike previous systems, Deep Poetry uses neural networks trained on over 200 thousand poems and 3 million pieces of ancient Chinese prose. Our system can accept plain text, images, or artistic conceptions as inputs to generate Chinese classical poetry. More importantly, users are allowed to participate in the process of writing poetry with our system. For the user's convenience, we deploy the system on the WeChat applet platform, so users can use the system on mobile devices whenever and wherever possible. The demo video of this paper is available at https://youtu.be/jD1R_u9TA3M. (AAAI 2020, Demonstrations Program.)
- Published
- 2020
32. TIGS: An Inference Algorithm for Text Infilling with Gradient Search
- Authors
- Pengfei Liu, Dayiheng Liu, Jie Fu, and Jiancheng Lv
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Sequence, Computer science, Inference, Natural language generation, Generative model, Paragraph, Algorithm, Sentence, Generative grammar
- Abstract
Text infilling is defined as a task for filling in the missing part of a sentence or paragraph, which is suitable for many real-world natural language generation scenarios. However, given a well-trained sequential generative model, generating missing symbols conditioned on the context is challenging for existing greedy approximate inference algorithms. In this paper, we propose an iterative inference algorithm based on gradient search, which is the first inference algorithm that can be broadly applied to any neural sequence generative models for text infilling tasks. We compare the proposed method with strong baselines on three text infilling tasks with various mask ratios and different mask strategies. The results show that our proposed method is effective and efficient for fill-in-the-blank tasks, consistently outperforming all baselines. (ACL 2019)
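The core move of gradient-search infilling can be sketched in one step: optimize the blank's word embedding in continuous space, then project back to the nearest vocabulary item. The snippet is a bare-bones illustration under assumed inputs, not the full TIGS algorithm.

```python
import torch

def infill_step(emb_matrix, blank_emb, loss_fn, lr=0.5):
    """One step: update the blank's embedding by gradient descent, then project
    to the nearest real word. emb_matrix: (V, d) vocabulary embeddings."""
    blank = blank_emb.clone().detach().requires_grad_(True)
    loss = loss_fn(blank)               # e.g., negative log-likelihood under the LM
    loss.backward()
    with torch.no_grad():
        blank -= lr * blank.grad        # continuous gradient update
    dists = ((emb_matrix - blank.detach()) ** 2).sum(dim=-1)
    return int(dists.argmin())          # projection: id of the closest word

emb = torch.randn(100, 16)              # toy vocabulary embeddings
word_id = infill_step(emb, torch.randn(16), lambda e: (e ** 2).sum())
```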
- Published
- 2019
33. Ancient-Modern Chinese Translation with a Large Training Dataset
- Authors
- Kexin Yang, Qian Qu, Dayiheng Liu, and Jiancheng Lv
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), General Computer Science, Machine translation, Computer science, Automatic translation, Translation (geometry), Task (project management), Manual annotation, Test set, Artificial intelligence, Baseline (configuration management), Natural language processing
- Abstract
Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation. Automatic translation from ancient Chinese to modern Chinese helps to inherit and carry forward the quintessence of the ancients. However, the lack of a large-scale parallel corpus limits the study of machine translation in Ancient-Modern Chinese. In this paper, we propose an Ancient-Modern Chinese clause alignment approach based on the characteristics of these two languages. This method combines both lexical-based information and statistical-based information, and achieves a 94.2 F1-score on our manually annotated test set. We use this method to create a new large-scale Ancient-Modern Chinese parallel corpus which contains 1.24M bilingual pairs. To the best of our knowledge, this is the first large high-quality Ancient-Modern Chinese dataset. Furthermore, we analyze and compare the performance of SMT and various NMT models on this dataset and provide a strong baseline for this task. (To appear in the ACM Transactions on Asian and Low-Resource Language Information Processing, TALLIP.)
- Published
- 2018
34. A Multi-Modal Chinese Poetry Generation Model
- Authors
- Quan Guo, Wubo Li, Jiancheng Lv, and Dayiheng Liu
- Subjects
- FOS: Computer and information sciences, Computation and Language (cs.CL), Phrase, Computer science, Latent Dirichlet allocation, Relevance (information retrieval), Structure (mathematical logic), Poetry, Modal, Chinese poetry, Artificial intelligence, Sentence, Natural language processing
- Abstract
Recent studies in sequence-to-sequence learning demonstrate that the RNN encoder-decoder structure can successfully generate Chinese poetry. However, existing methods can only generate poetry with a given first line or a user's intended theme. In this paper, we propose a three-stage multi-modal Chinese poetry generation approach. Given a picture, the first line, the title, and the other lines of the poem are successively generated in three stages. According to the characteristics of Chinese poems, we propose a hierarchy-attention seq2seq model which can effectively capture character, phrase, and sentence information between contexts and improve the symmetry delivered in poems. In addition, a Latent Dirichlet Allocation (LDA) model is utilized for title generation and improves the relevance of the whole poem to the title. Compared with a strong baseline, the experimental results demonstrate the effectiveness of our approach, using machine evaluations as well as human judgments. (Accepted at the International Joint Conference on Neural Networks, IJCNN 2018.)
- Published
- 2018
35. Method to Improve the Performance of Restricted Boltzmann Machines
- Authors
- Yong Xu, Jiancheng Lv, Qingyu Mao, Dayiheng Liu, and Jing Yin
- Subjects
- Restricted Boltzmann machine, Distribution (number theory), Computer science, Feature extraction, Boltzmann machine, Function (mathematics), Marginal distribution, Feature learning, Algorithm
- Abstract
Restricted Boltzmann machines (RBMs) are widely applied to solve many machine learning problems. Usually, the cost function of an RBM is the log-likelihood of the marginal distribution of the input data, and training maximizes this cost function so that the distribution of the trained RBM matches that of the input data. However, reconstruction error persists even when the two distributions are almost identical. In this paper, a method to train RBMs by adding the reconstruction error to the cost function is put forward. Two categories of experiments are performed to validate the proposed method: feature extraction and classification. The experimental results show that the proposed method is effective.
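For concreteness, the sketch below shows a CD-1 update that folds the reconstruction error into training by scaling the update with (1 + lam * recon_err); this particular scaling is an illustrative assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.05, lam=0.1):
    """One contrastive-divergence (CD-1) update on a visible batch v0 that also
    uses the reconstruction error of the batch."""
    h0 = sigmoid(v0 @ W + c)
    v1 = sigmoid(h0 @ W.T + b)                     # reconstruction of the visibles
    h1 = sigmoid(v1 @ W + c)
    recon_err = float(np.mean((v0 - v1) ** 2))
    grad_W = (v0.T @ h0 - v1.T @ h1) / len(v0)
    W += lr * (1 + lam * recon_err) * grad_W       # error-aware update scaling
    b += lr * np.mean(v0 - v1, axis=0)
    c += lr * np.mean(h0 - h1, axis=0)
    return recon_err

W = 0.01 * rng.standard_normal((20, 8))            # 20 visible units, 8 hidden units
b, c = np.zeros(20), np.zeros(8)
err = cd1_step(rng.random((32, 20)), W, b, c)
```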
- Published
- 2018
36. A neural words encoding model
- Authors
- Jiancheng Lv, Jiangshu Wei, Dayiheng Liu, and Xiaofeng Qi
- Subjects
- Incremental encoding, Artificial neural network, Computer science, Speech recognition, Deep learning, Boltzmann machine, Encryption, Deep belief network, Code (cryptography), Artificial intelligence, Decoding methods, Data compression
- Abstract
This paper proposes a neural network model, and a learning algorithm, that can be applied to encode words. The model realizes word encoding and decoding, which can be applied to text encryption/decryption and word-based compression. The model is based on Deep Belief Networks (DBNs) and differs from traditional DBNs in that it is asymmetrically structured and its output is a binary vector. With pre-training of multi-layer Restricted Boltzmann Machines (RBMs) and fine-tuning to reconstruct the word set, the output of the code layer can be used as a representation code for words. The number of neurons in the code layer can be changed to control the length of the representation code for different applications. This paper reports on experiments using English words from the American National Corpus to train a neural words encoding model which can encode/decode English words, realizing text encryption and data compression.
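As a rough picture of the encode/decode idea, the toy module below uses an asymmetric autoencoder with a hard-binarized code layer in place of the RBM-pretrained DBN described above; the sizes and structure are assumptions for illustration.

```python
import torch
import torch.nn as nn

class WordCoder(nn.Module):
    """Toy asymmetric autoencoder with a binary code layer (illustrative only)."""
    def __init__(self, in_dim=128, code_bits=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.Sigmoid(),
                                 nn.Linear(64, code_bits), nn.Sigmoid())
        self.dec = nn.Sequential(nn.Linear(code_bits, 64), nn.Sigmoid(),
                                 nn.Linear(64, in_dim))

    def forward(self, x):
        # Hard binarization; a straight-through estimator would be needed to
        # train end-to-end (the paper instead pre-trains RBMs and fine-tunes).
        code = (self.enc(x) > 0.5).float()
        return self.dec(code), code

x = torch.rand(4, 128)                 # e.g., feature vectors for a batch of words
recon, code = WordCoder()(x)           # `code` is the binary representation code
```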
- Published
- 2016