LWCS: A large-scale web page classification system based on anchor graph hashing

Authors :
Chengzhang Zhu
Xiang Fu
Xv Lan
Yi Zheng
Weihong Han
Chengcheng Sun
Source :
2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS).
Publication Year :
2015
Publisher :
IEEE, 2015.

Abstract

The web offers a huge repository of online information, but finding the pages relevant to a particular query can be difficult. It is therefore essential to classify web documents to facilitate the search and retrieval of pages. Existing algorithms work well on small collections of web pages, but they become slow and even ineffective when dealing with web pages at large scale. Recently, some of these algorithms were adapted to distributed platforms, which boosted their classification speed effectively. However, because web page features are high-dimensional, the parallel classifiers were still trained on training sets of limited size. In addition, these methods did not improve the classification itself; they merely benefited from the high computing performance of distributed platforms. Targeting large-scale web page classification, we propose to integrate anchor graph hashing with a K-Nearest Neighbour (KNN) classifier to reduce the dimensionality of the pages' original features. The hash value of each page is used for training and classification instead of the original vectors. Experimental comparison with the original KNN on a large dataset demonstrates the efficacy of our proposed method.
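
The idea of classifying pages by their hash values rather than their original feature vectors can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each page has already been mapped to a short binary code (for example by anchor graph hashing, which is not shown here) and that the codes are compared by Hamming distance, which the abstract does not specify. The function names, the 8-bit toy codes, and the labels are all hypothetical.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def knn_predict(train_codes, train_labels, query_code, k=3):
    """Majority vote among the k training codes closest in Hamming distance."""
    # Rank training pages by distance between their code and the query code.
    ranked = sorted(
        range(len(train_codes)),
        key=lambda i: hamming(train_codes[i], query_code),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 8-bit hash codes standing in for hashed web pages.
train_codes = [0b11110000, 0b11100000, 0b11110001,
               0b00001111, 0b00000111, 0b10001111]
train_labels = ["sports", "sports", "sports", "news", "news", "news"]

if __name__ == "__main__":
    print(knn_predict(train_codes, train_labels, 0b11010000))
    print(knn_predict(train_codes, train_labels, 0b00001011))
```

Because each page is reduced to a few bits, a distance computation becomes an XOR followed by a bit count, instead of an operation over a high-dimensional feature vector.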

Details

Database :
OpenAIRE
Journal :
2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS)
Accession number :
edsair.doi...........8b1744c5674886f2de6d76b11a926d53