1. Open Information Extraction for Knowledge Representation: Triple Extraction and Information Retrieval From Unstructured Text
- Author
- Sarhan, Ingy
- Abstract
The field of Natural Language Processing (NLP) focuses on developing computational techniques to analyze and extract information from human language. With the exponential growth of unstructured textual data, NLP-based techniques have become essential for extracting valuable insights from this data. However, existing information extraction systems are limited in their ability to extract valuable information without predefined relations or ontologies and to store the extracted knowledge effectively. This Ph.D. thesis aims to enhance open information extraction methods to represent unstructured textual data efficiently and effectively. The first part of the research focuses on Open Information Extraction (OIE) systems and their challenges. Existing OIE methods, including pattern-based, machine learning-based, and neural approaches, are analyzed to understand their limitations. Chapter 3 proposes a Bidirectional Gated Recurrent Unit (Bi-GRU) OIE model that uses contextualized word embeddings to extract relevant triples from unstructured text. Experimental results demonstrate the effectiveness of this model in generating high-quality relation triples. Chapter 4 addresses the lack of labeled data, a common problem in NLP tasks. The research extends the OIE model from Chapter 3 by using learned features to generate relation triples and explores the transferability of these features across different OIE domains and to the related task of Relation Extraction (RE). The results show performance comparable to traditional training, indicating the potential of OIE to achieve NLP performance without labeled data. In Chapter 5, the focus shifts to enhancing pre-trained language models for taxonomy classification. Pre-trained language models often struggle with unseen patterns during inference, and the limited size of annotated data poses a challenge. A two-stage fine-tuning procedure incorporating data augmentation techniques is proposed to improve classification performance. (An illustrative sketch of Bi-GRU-based triple extraction appears after this record.)
- Published
- 2023
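
As a companion to the abstract above, the following is a minimal sketch of how a Bi-GRU tagger over contextualized word embeddings can label tokens for (subject, relation, object) triple extraction. The tag set, dimensions, and toy inputs are illustrative assumptions only and do not reproduce the thesis's actual architecture or training setup.

```python
# Minimal sketch: a Bi-GRU tagger that assigns each token a BIO-style tag
# (ARG1 / REL / ARG2 / O), from which (subject, relation, object) triples
# can be read off. Embedding size, hidden size, and tag set are assumptions
# for illustration, not the thesis's configuration.
import torch
import torch.nn as nn

TAGS = ["O", "B-ARG1", "I-ARG1", "B-REL", "I-REL", "B-ARG2", "I-ARG2"]

class BiGRUTagger(nn.Module):
    def __init__(self, embed_dim=768, hidden_dim=256, num_tags=len(TAGS)):
        super().__init__()
        # Bidirectional GRU over pre-computed contextualized embeddings
        # (e.g. produced by a pre-trained language model).
        self.gru = nn.GRU(embed_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, embed_dim)
        hidden, _ = self.gru(embeddings)
        return self.classifier(hidden)  # per-token tag logits

# Toy usage: one "sentence" of six tokens with random stand-in embeddings.
model = BiGRUTagger()
tokens = ["Rome", "is", "the", "capital", "of", "Italy"]
embeddings = torch.randn(1, len(tokens), 768)
logits = model(embeddings)
pred_tags = [TAGS[i] for i in logits.argmax(-1)[0].tolist()]
print(list(zip(tokens, pred_tags)))
```

In a complete OIE system the predicted BIO spans would be decoded into relation triples and the tagger would be trained on labeled or bootstrapped OIE data; here random tensors merely stand in for a pre-trained language model's contextualized output.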