PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification
- Source :
- Applied Sciences; Volume 12; Issue 9; Pages: 4554
- Publication Year :
- 2022
- Publisher :
- MDPI AG, 2022.
Abstract
- Document classification is an important area of Natural Language Processing (NLP). Because scientific papers are being published at an accelerating rate, intelligent paper classification, especially fine-grained classification, is valuable to researchers. However, a public scientific paper dataset for fine-grained classification is still lacking, so existing document classification methods have not been tested on this task. To fill this gap, we designed and collected the PaperNet-Dataset, which consists of multi-modal data (texts and figures). Version 1.0 of PaperNet contains hierarchical categories of papers in the fields of computer vision (CV) and NLP: 2 coarse-grained categories and 20 fine-grained categories (7 in CV and 13 in NLP). We ran current mainstream models on the PaperNet-Dataset, along with a multi-modal method that we propose. Interestingly, none of these methods reaches an accuracy of 80% in fine-grained classification, which leaves plenty of room for improvement. We hope that the PaperNet-Dataset will inspire more work in this challenging area.
- Subjects :
- Fluid Flow and Transfer Processes
ComputingMethodologies_PATTERNRECOGNITION
artificial intelligence application
dataset
multi-modal information processing
machine learning
paper classification
Process Chemistry and Technology
General Engineering
General Materials Science
Instrumentation
Computer Science Applications
Details
- ISSN :
- 2076-3417
- Volume :
- 12
- Database :
- OpenAIRE
- Journal :
- Applied Sciences
- Accession number :
- edsair.doi.dedup.....0b2dadc4043dc83d308962048c253ec5