Author: "M. V. Dos Santos" / Publisher: acm - Searchworks@Jio Institute Digital Library Search Results

1. An Efficient and Scalable MetaFeature-based Document Classification Approach based on Massively Parallel Computing

Author: Wisllay M. V. dos Santos, Wellington Santos Martins, Sergio Canuto, Marcos André Gonçalves, and Thierson Couto Rosa
Subjects: Speedup, Language identification, Computer science, Document classification, Sentiment analysis, Recommender system, Machine learning, k-nearest neighbors algorithm, Scalability, Artificial intelligence, Data mining, Massively parallel
Abstract: The unprecedented growth of available data has stimulated the development of new methods for organizing and extracting useful knowledge from it. Automatic Document Classification (ADC) is one such method: it uses machine learning techniques to build models capable of automatically associating documents with well-defined semantic classes. ADC is the basis of many important applications, such as language identification, sentiment analysis, recommender systems, and spam filtering. Recently, the use of meta-features has been shown to substantially improve the effectiveness of ADC algorithms. In particular, meta-features that combine local information (through kNN-based features) and global information (through category centroids) have produced promising results. However, generating these meta-features is very costly in terms of both memory consumption and runtime, since the kNN algorithm must be called constantly. We take advantage of current manycore GPU architectures and present a massively parallel version of the kNN algorithm for high-dimensional, sparse datasets (which is the case for ADC). Our experimental results show speedups of up to 15x and a reduction in memory consumption of more than 5000x compared with a state-of-the-art parallel baseline. This opens up the possibility of applying meta-feature-based classification to large collections of documents that would otherwise take too much time or require an expensive computational platform.
Published: 2015