
VFX: A VISION-BASED APPROACH TO FORUM DATA EXTRACTION.

Authors :
Chen Hui Ng
Choon Jin Ng
Tong Ming Lim
Source :
International Conference on ICT, Society & Human Beings; 2019, p317-324, 8p
Publication Year :
2019

Abstract

The rapid development of the Internet has dramatically increased the amount of information available on the World Wide Web. Among these vast sources of information, discussion forums can help businesses and organizations gauge customer opinions or extract product information. Little existing work in the literature has systematically investigated the problem of extracting user posts from forum sites. Extracting forum posts accurately raises several challenges. First, forums come in a variety of templates, which makes it hard to formulate general rules for extracting posts. Second, individual post records may look quite different from one another, which introduces inconsistencies in the Document Object Model (DOM) that complicate comparisons. Third, each post can consist of complicated subtrees rather than a single node in the DOM tree. To tackle these challenges, a vision-based approach is introduced that automatically extracts posts from a web forum page based on its visual cues. In this paper, we propose a visual-based forum extraction (VFX) algorithm that can extract user posts from any type of forum without the need to inspect its template structure in advance. [ABSTRACT FROM AUTHOR]
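The abstract does not describe the VFX algorithm itself, so the sketch below only illustrates the general idea of relying on visual cues rather than template structure. It is an assumption-laden toy, not the authors' method. It supposes that each rendered page region is already available as a bounding box with its text (obtaining these from a rendering engine is assumed and not shown), and it treats the largest group of regions sharing the same left edge and width as the list of posts. `Block`, `group_by_alignment`, and `extract_posts` are hypothetical names, and the `tol` and `min_repeats` parameters are illustrative choices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical rendered page region; coordinates are layout pixels."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int
    text: str


def group_by_alignment(blocks, tol=10):
    """Bucket blocks whose left edge and width fall into the same tol-pixel bins."""
    groups = defaultdict(list)
    for b in blocks:
        key = (b.x // tol, b.width // tol)
        groups[key].append(b)
    return groups


def extract_posts(blocks, tol=10, min_repeats=3):
    """Return the texts of the largest visually aligned group, top to bottom.

    Heuristic assumption (not from the paper): posts are stacked vertically
    and share a left edge and width, so the biggest aligned group is the
    post list. Groups smaller than min_repeats are rejected.
    """
    groups = group_by_alignment(blocks, tol)
    best = max(groups.values(), key=len, default=[])
    if len(best) < min_repeats:
        return []
    return [b.text for b in sorted(best, key=lambda b: b.y)]


if __name__ == "__main__":
    # Hypothetical page layout: header, three posts, a sidebar, and a footer.
    page = [
        Block(0, 0, 1000, 80, "Site header"),
        Block(20, 100, 700, 150, "Post 1: Has anyone tried the new model?"),
        Block(760, 100, 220, 400, "Sidebar ads"),
        Block(20, 270, 700, 90, "Post 2: Yes, battery life is great."),
        Block(20, 380, 700, 200, "Post 3: Mine overheats sometimes."),
        Block(0, 600, 1000, 60, "Footer"),
    ]
    print(extract_posts(page))
```

Note that the three posts have different heights and would likely differ in their DOM subtrees, yet they still group together because only their horizontal alignment is compared. A real vision-based extractor would need richer cues (fonts, spacing, nesting of regions) to handle the template variety the abstract describes.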

Details

Language :
English
Database :
Complementary Index
Journal :
International Conference on ICT, Society & Human Beings
Publication Type :
Conference
Accession number :
138554269