Research paper classification systems based on TF-IDF and LDA schemes
- Source :
- Human-Centric Computing and Information Sciences, Vol 9, Iss 1, Pp 1-21 (2019)
- Publication Year :
- 2019
- Publisher :
- Korea Information Processing Society-Computer Software Research Group, 2019.
Abstract
- With the continuing advance of computer and information technologies, numerous research papers have been published both online and offline, and as new research fields are continually created, users have considerable trouble finding and categorizing research papers of interest. To overcome these difficulties, this paper proposes a research paper classification system that clusters research papers into meaningful classes in which papers are very likely to have similar subjects. The proposed system extracts representative keywords from the abstract of each paper and extracts topics using the Latent Dirichlet allocation (LDA) scheme. Then, based on the Term frequency-inverse document frequency (TF-IDF) values of each paper, the K-means clustering algorithm is applied to group all the papers into clusters of papers with similar subjects.
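The classification pipeline described in the abstract (keywords, TF-IDF weighting, then K-means) can be sketched as below. This is a minimal Python sketch under stated assumptions, not the authors' implementation. A plain lowercase tokenizer stands in for the paper's keyword extraction, the LDA topic step is omitted, TF-IDF uses the relative-frequency times log(N/df) variant, and K-means uses squared Euclidean distance with deterministic farthest-first seeding. The helper names (`tokenize`, `tfidf_vectors`, `kmeans`, `classify_papers`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for the paper's keyword extraction:
    # lowercase, split on whitespace, strip simple punctuation.
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """Return (vocab, vectors) with vectors[i][j] = tf(term_j, doc_i) * log(N / df(term_j))."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for an empty abstract
        vectors.append([(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centroids):
    # Index of the centroid closest to `vector`.
    return min(range(len(centroids)), key=lambda c: squared_distance(vector, centroids[c]))


def farthest_first_init(vectors, k):
    # Assumption: deterministic farthest-first seeding instead of random seeding.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd-style K-means; returns one cluster label per vector."""
    centroids = farthest_first_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def classify_papers(abstracts, k):
    _, vectors = tfidf_vectors(abstracts)
    return kmeans(vectors, k)


if __name__ == "__main__":
    abstracts = [
        "LDA topic model for research paper abstracts",
        "topic model LDA abstracts of papers",
        "wireless network routing protocol energy",
        "energy efficient routing in wireless sensor network",
    ]
    print(classify_papers(abstracts, 2))  # prints [0, 0, 1, 1]
```

Euclidean distance is used here only for simplicity; cosine distance is a common alternative for sparse TF-IDF vectors, and the abstract does not state which distance the authors used.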
- Subjects :
- General Computer Science
Computer science
LDA
K-means clustering
02 engineering and technology
Latent Dirichlet allocation
lcsh:QA75.5-76.95
0202 electrical engineering, electronic engineering, information engineering
lcsh:Information theory
Cluster analysis
tf–idf
Information retrieval
Paper classification
TF-IDF
Information technology
020206 networking & telecommunications
lcsh:Q350-390
ComputingMethodologies_PATTERNRECOGNITION
020201 artificial intelligence & image processing
lcsh:Electronic computers. Computer science
Details
- Language :
- English
- ISSN :
- 2192-1962
- Volume :
- 9
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Human-Centric Computing and Information Sciences
- Accession number :
- edsair.doi.dedup.....8734360f27253181fd4c94af21ed075a