Comparison of classification model and annotation method for Undiksha’s official documents
- Source :
- Journal of Physics: Conference Series. 1516:012026
- Publication Year :
- 2020
- Publisher :
- IOP Publishing, 2020.
Abstract
- Shakuntala is a system that manages official documents and letters at Universitas Pendidikan Ganesha. The system stores documents in PDF format, categorized by document type. However, Shakuntala can only receive scanned documents, and document categorization was done manually by the operator. Documents uploaded to Shakuntala also generally contain information about people, who were tagged manually by the operator. This causes inefficiencies in work that could be carried out automatically by machine. This study aimed to find the best classification model for determining document categories. In addition, it intended to identify the best method for tagging the people listed in a document. The results showed that the Decision Tree classification model was the best, with an accuracy of 83.06%, outperforming KNN and Naive Bayes. For annotating people’s names, the Levenshtein distance method with a similarity threshold of 95% obtained an accuracy of 68.20%.
- Subjects :
- History
Computer science
business.industry
Operator (linguistics)
Decision tree
computer.software_genre
Levenshtein distance
Computer Science Applications
Education
Annotation
Upload
Naive Bayes classifier
Categorization
Similarity (network science)
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Artificial intelligence
business
computer
Natural language processing
Details
- ISSN :
- 1742-6596 and 1742-6588
- Volume :
- 1516
- Database :
- OpenAIRE
- Journal :
- Journal of Physics: Conference Series
- Accession number :
- edsair.doi...........d4fa6a96a6ce75e759784fe65e4b24a5
- Full Text :
- https://doi.org/10.1088/1742-6596/1516/1/012026
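The name-annotation step described in the abstract (Levenshtein distance with a 95% similarity threshold) can be illustrated with a minimal sketch. The abstract does not say how distance is converted into a similarity score or how candidate names are drawn from the document text, so the normalization by the longer string's length, the sliding word window, and the function names `similarity` and `tag_people` are all assumptions made for illustration, not the authors' implementation. The personal names used in any call are likewise hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def tag_people(text: str, known_names: list[str], threshold: float = 0.95) -> list[str]:
    """Return the known names that appear in the text at or above the threshold.

    Assumption: each name is compared, case-insensitively, against every
    window of consecutive words with the same word count as the name.
    """
    words = text.lower().split()
    tagged = []
    for name in known_names:
        target = name.lower()
        width = len(target.split())
        if width == 0:
            continue  # skip empty names
        for start in range(len(words) - width + 1):
            candidate = " ".join(words[start:start + width])
            if similarity(candidate, target) >= threshold:
                tagged.append(name)
                break
    return tagged
```

Under this normalization, a 95% threshold tolerates a single character error only in names of 20 characters or more, so short names must match exactly. Punctuation attached to a word counts as an extra edit, so text extracted from scans would likely need cleaning before matching.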