AN UNSUPERVISED HEADER INDEPENDENT APPROACH TOWARDS SUBJECT COLUMN DETECTION IN TABLES

Authors :
K. Karpaga Priyaa
A. Meena Kabilan
C. Saranya
Source :
ICTACT Journal on Soft Computing, Vol 8, Iss 4, Pp 1714-1719 (2018)
Publication Year :
2018
Publisher :
ICT Academy of Tamil Nadu, 2018.

Abstract

Subject columns are the columns that help infer the correct subject matter of a table. The main challenge is detecting the appropriate subject columns in tables that have more than one. Existing approaches are restricted to identifying only one subject column, even in tables with more than one subject column, which makes it impossible to infer the table’s correct subject matter. Existing approaches to subject column detection also require table information such as table headers, additional evidence about the table from web pages, and prior training on a labeled set of tables. To solve these issues, in this paper we propose a simple, header-independent, semantic-based Concept-Voting Subject Column Detection (CVSCD) algorithm. The proposed algorithm identifies the possible subject columns in a table with more than one subject column, which provides a way to infer the table’s correct subject matter. Moreover, CVSCD is unsupervised and works for tables without any table information such as table captions or table headers. Experimental results show that our approach achieves better accuracy than existing approaches on a corpus of tables extracted from the web.
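
The abstract does not spell out how the voting in CVSCD works, so the following is only a toy illustration of the general idea of header-independent concept voting, not the authors' algorithm. It assumes a hypothetical lookup table `TOY_CONCEPTS` that maps cell text to concepts, and a hypothetical threshold `min_share`; each cell votes for its concepts, and a column is kept as a candidate subject column when one concept wins a large enough share of its cells. More than one column can qualify, and no header row is used.

```python
from collections import Counter

# Hypothetical toy knowledge base (an assumption for illustration only;
# the abstract does not name the semantic resource CVSCD relies on).
TOY_CONCEPTS = {
    "paris": ["city"],
    "berlin": ["city"],
    "madrid": ["city"],
    "france": ["country"],
    "germany": ["country"],
    "spain": ["country"],
}


def column_votes(cells):
    """Count the concept votes cast by the cells of one column."""
    votes = Counter()
    for cell in cells:
        for concept in TOY_CONCEPTS.get(cell.strip().lower(), []):
            votes[concept] += 1
    return votes


def candidate_subject_columns(rows, min_share=0.6):
    """Return (column index, winning concept) pairs for every column whose
    top concept is backed by at least min_share of that column's cells.

    min_share is an assumed threshold, not a value from the paper.
    """
    if not rows:
        return []
    candidates = []
    # zip(*rows) transposes the header-less table into columns.
    for idx, cells in enumerate(zip(*rows)):
        votes = column_votes(cells)
        if not votes:
            continue  # no cell mapped to any concept, e.g. a numeric column
        concept, count = votes.most_common(1)[0]
        if count / len(cells) >= min_share:
            candidates.append((idx, concept))
    return candidates


# A table with no headers and two plausible subject columns.
rows = [
    ["Paris", "France", "2.1"],
    ["Berlin", "Germany", "3.6"],
    ["Madrid", "Spain", "3.3"],
]
print(candidate_subject_columns(rows))  # [(0, 'city'), (1, 'country')]
```

In this sketch the numeric third column casts no votes and is skipped, while both text columns are returned, which mirrors the abstract's point that a table may have more than one subject column.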

Details

Language :
English
ISSN :
0976-6561 and 2229-6956
Volume :
8
Issue :
4
Database :
Directory of Open Access Journals
Journal :
ICTACT Journal on Soft Computing
Publication Type :
Academic Journal
Accession number :
edsdoj.66f2970e16e540d89be184c7b96246be
Document Type :
article
Full Text :
https://doi.org/10.21917/ijsc.2018.0240