VFX: A VISION-BASED APPROACH TO FORUM DATA EXTRACTION.
- Source :
- International Conference on ICT, Society & Human Beings; 2019, p317-324, 8p
- Publication Year :
- 2019
Abstract
- Rapid development of the Internet has dramatically increased the information available on the World Wide Web. Among these vast sources of information, discussion forums can help businesses and organizations gauge customer opinions or extract product information. Little existing work in the literature has systematically investigated the problem of extracting user posts from forum sites. Extracting forum posts accurately raises several challenges. First, forums come in a variety of templates, which makes it hard to formalize general rules for extracting posts. Second, post records may look quite different from one another, which introduces inconsistency into the Document Object Model (DOM) structures being compared. Third, each post can consist of complicated subtrees rather than a single node in the DOM tree. To tackle these challenges, a vision-based approach is introduced that automatically extracts posts from a web forum page based on its visual cues. In this paper, we propose a visual-based forum extraction (VFX) algorithm that can extract user posts from any type of forum without the need to inspect its template structure in advance. [ABSTRACT FROM AUTHOR]
- Subjects :
- DATA extraction
INTERNET forums
WORLD Wide Web
INTERNET
WEBSITES
Details
- Language :
- English
- Database :
- Complementary Index
- Journal :
- International Conference on ICT, Society & Human Beings
- Publication Type :
- Conference
- Accession number :
- 138554269