Text encoding and decoding from global perspectives
- Publication Year:
- 2022
- Publisher:
- University of Liverpool, 2022.
Abstract
- As an important application area of deep learning, Natural Language Processing (NLP) is attracting increasing attention and developing rapidly. Learning representations for words or documents via neural networks is gradually replacing feature engineering in almost all text-related applications. On the other hand, how to decode these representations or encodings is equally vital for sequence-to-sequence text generation tasks such as Neural Abstractive Summarization (NAS) and Neural Machine Translation (NMT). Towards a more comprehensive representation and decoding strategy, this dissertation explores several global perspectives that previous studies have ignored. We treat "global" as a relative concept denoting higher-level knowledge conducive to enriching representations or improving decoding; its specific definition varies across tasks. In text representation or encoding, "global" refers to relatively higher-level context information. There are usually three natural contextual relationships for mapping words or documents into latent space, namely (1) co-occurrence relationships between words, (2) coherence relationships between sentences, and (3) subordinate relationships between documents/sentences and their words. Beyond these naturally occurring contexts, there may also be hidden contextual relationships between dependent documents when viewed from the perspective of the whole corpus (i.e., the global perspective). Although documents in a corpus are often assumed to be independent of each other, this assumption may not hold for corpora such as news corpora, since the events reported by news documents interact in the real world. To capture this global contextual information, we construct a news network over the whole corpus to model the latent relationships between news documents. A network embedding algorithm is then designed to produce news representations based on the above-mentioned subordinate relationships and news dependencies. Moreover, such cross-document relationships play a vital role in tasks that need to represent or encode a cluster of multiple documents, e.g., Multi-document Summarization (MDS). Some studies concatenate all documents into a flat sequence, which hinders the modelling of cross-document and long-range dependencies. To alleviate these two problems, we design a Parallel Hierarchical Transformer (PHT), whose local and global attention mechanisms simultaneously capture cross-token and cross-document relationships. In text decoding, on the other hand, "global" refers to a higher-level optimum, i.e., the global optimum as opposed to a local optimum. Since a neural text generator can hardly generate a whole sentence in a single step, the heuristic beam search algorithm has been the natural choice for text decoding. Inevitably, beam search often gets stuck in local optima as it decodes word by word. Although the global optimum is hard to reach directly, it is feasible to predict in one shot how the globally optimal hypothesis attends to the source tokens. A global scoring mechanism is then proposed to evaluate generated sentences at each step based on this predicted global attention distribution, thereby calibrating beam search stepwise to return a hypothesis whose attention over the source is closer to the globally optimal distribution. Decoding with such global awareness mitigates the local-optimum problem, significantly improves generation quality, and can be applied in various text generation fields.
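- The parallel local/global attention idea behind PHT can be illustrated with a small sketch. The PyTorch code below is a minimal, hypothetical rendering, assuming a token-level branch masked to individual documents and a document-level branch that attends across pooled document vectors; the class name, pooling, and fusion step are illustrative assumptions, not the dissertation's actual architecture.
```python
# Minimal sketch (not the author's released code) of a parallel local/global
# attention layer in the spirit of a Parallel Hierarchical Transformer (PHT).
# Names, shapes, pooling, and fusion are illustrative assumptions.
import torch
import torch.nn as nn

class ParallelHierarchicalAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        # Local attention: tokens attend only within their own document.
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Global attention: document-level vectors attend across documents.
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, tokens, doc_ids):
        # tokens: (batch, seq_len, d_model); doc_ids: (batch, seq_len) document index per token.
        # Local branch: block attention between tokens belonging to different documents.
        same_doc = doc_ids.unsqueeze(2).eq(doc_ids.unsqueeze(1))   # (batch, seq, seq)
        local_mask = (~same_doc).repeat_interleave(self.local_attn.num_heads, dim=0)
        local_out, _ = self.local_attn(tokens, tokens, tokens, attn_mask=local_mask)

        # Global branch: sum-pool each document into one vector, attend across
        # documents, then broadcast the cross-document context back to the tokens.
        n_docs = int(doc_ids.max().item()) + 1
        doc_vecs = torch.stack(
            [tokens.masked_fill(~doc_ids.eq(d).unsqueeze(-1), 0).sum(1) for d in range(n_docs)],
            dim=1)                                                  # (batch, n_docs, d_model)
        doc_ctx, _ = self.global_attn(doc_vecs, doc_vecs, doc_vecs)
        global_out = doc_ctx.gather(1, doc_ids.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))

        # Combine the two parallel views of context.
        return self.fuse(torch.cat([local_out, global_out], dim=-1))
```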
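- The step-wise calibration of beam search against a one-shot prediction of the globally optimal attention can likewise be sketched as a re-scoring rule. The helper below is a hypothetical illustration: the inputs (per-step decoder attention, a predicted global attention over the source) and the negative L1 closeness term are assumptions standing in for the components described in the abstract, not the scoring function actually proposed in the thesis.
```python
# Hypothetical sketch of global-attention-calibrated scoring during beam search.
import numpy as np

def global_score(logprob_sum, step_attentions, predicted_global_attention, alpha=0.5):
    """Combine the cumulative log-probability of a partial hypothesis with a
    term rewarding hypotheses whose accumulated attention over the source
    stays close to the predicted globally optimal attention distribution."""
    accumulated = np.mean(step_attentions, axis=0)                 # (src_len,)
    accumulated = accumulated / (accumulated.sum() + 1e-9)
    # Negative L1 distance as a simple closeness measure (an assumption).
    closeness = -np.abs(accumulated - predicted_global_attention).sum()
    return logprob_sum + alpha * closeness

# At each beam-search step, candidates would be ranked by the calibrated score
# instead of log-probability alone, for example:
#   scores = [global_score(c.logprob, c.attn_history, g_attn) for c in candidates]
#   beam = [candidates[i] for i in np.argsort(scores)[-beam_size:]]
```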
Details
- Language:
- English
- Database:
- British Library EThOS
- Publication Type:
- Dissertation/Thesis
- Accession number:
- edsble.868756
- Document Type:
- Electronic Thesis or Dissertation
- Full Text:
- https://doi.org/10.17638/03164793