1. A Semantic Approach for Tweet Categorization
- Author
Ben Ltaifa Ibtihel, Hlaoua Lobna, and Ben Jemaa Maher
- Subjects
Microblogging; Computer science; business.industry; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; 02 engineering and technology; computer.software_genre; eXtended WordNet; Feature (linguistics); Categorization; Bag-of-words model; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; General Earth and Planetary Sciences; Graph (abstract data type); 020201 artificial intelligence & image processing; Social media; Artificial intelligence; business; Classifier (UML); computer; Natural language processing; General Environmental Science
- Abstract
The explosion of social media and microblogging services has steadily increased the volume of microblogging data, particularly tweet data. In microblogging services such as Twitter, users may become overwhelmed by this growth. Although Twitter allows people to micro-blog about a broad range of topics in real time, it is often hard to understand what these tweets are about. In this work, we study the problem of Tweet Categorization (TC), which aims to automatically classify tweets based on their topic. Accurate TC, however, is a challenging task within the 140-character limit imposed by Twitter. The majority of TC approaches use lexical features such as Bag of Words (BoW) and Bag of Entities (BoE) extracted from a tweet's content. In this paper, we propose a semantic approach to improving the accuracy of TC based on feature expansion from external Knowledge Bases (KBs) and the use of eXtended WordNet Domain as a classifier. In particular, we propose a deep enrichment strategy that extends tweets with additional features by exploiting the concepts present in the semantic graph structures of the KBs. Our supervised categorization then relies only on the ontological knowledge, so classifier training is not required. Empirical results indicate that this enriched representation of text items can substantially improve TC performance.
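The two-stage idea in the abstract (expand a tweet's features from a knowledge base, then assign a topic from a domain lexicon without training a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionaries `TOY_KB` and `TOY_DOMAINS`, their contents and weights, and the functions `tokenize`, `expand`, and `categorize` are all hypothetical stand-ins for the external KBs and the eXtended WordNet Domain resource.

```python
from collections import Counter

# Hypothetical toy knowledge base: surface term -> related concepts.
# A real system would query an external KB's semantic graph instead.
TOY_KB = {
    "messi": ["football", "athlete"],
    "goal": ["football", "score"],
    "election": ["politics", "vote"],
    "senate": ["politics", "government"],
}

# Hypothetical toy domain lexicon: concept -> {domain label: weight}.
# It stands in for a WordNet-domain style resource; weights are invented.
TOY_DOMAINS = {
    "football": {"sport": 1.0},
    "athlete": {"sport": 0.8},
    "score": {"sport": 0.5, "music": 0.3},
    "politics": {"politics": 1.0},
    "vote": {"politics": 0.9},
    "government": {"politics": 0.8},
}


def tokenize(tweet):
    """Lowercase the tweet and strip simple punctuation from each token."""
    tokens = (tok.strip(".,!?#@").lower() for tok in tweet.split())
    return [tok for tok in tokens if tok]


def expand(tokens, kb=TOY_KB):
    """Return the original tokens plus any KB concepts linked to them."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(kb.get(tok, []))
    return expanded


def categorize(tweet, kb=TOY_KB, domains=TOY_DOMAINS):
    """Pick the domain with the highest summed weight; None if no match."""
    scores = Counter()
    for feature in expand(tokenize(tweet), kb):
        for domain, weight in domains.get(feature, {}).items():
            scores[domain] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


print(categorize("Messi scores a late goal!"))
print(categorize("Senate election tonight"))
```

In this sketch the words of the first tweet match no domain entry directly; only the concepts added by the expansion step do, which is the motivation for enriching short texts before categorizing them.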
- Published
- 2018