
Building a Multimodal Dataset of Academic Paper for Keyword Extraction

Authors:
Zhang, Jingyu
Yan, Xinyi
Xiang, Yi
Zhang, Yingyi
Zhang, Chengzhi
Source:
Proceedings of the Association for Information Science & Technology; Oct2024, Vol. 61 Issue 1, p435-446, 12p
Publication Year:
2024

Abstract

To date, the keyword extraction task has typically relied solely on textual data. Neglecting visual details and audio features from the image and audio modalities reduces information richness and overlooks potential cross-modal correlations, constraining both the model's ability to learn representations of the data and the accuracy of its predictions. Moreover, multimodal datasets for keyword extraction are particularly scarce, further hindering progress in research on multimodal keyword extraction. This study therefore constructs a multimodal dataset of academic papers consisting of 1,000 samples, each containing paper text, images, audio, and keywords. Using both unsupervised and supervised keyword extraction methods, experiments are conducted on the textual data of the papers as well as on text extracted from the images and audio, in order to investigate how keyword extraction performance differs across individual modalities and with the fusion of multimodal information. The experimental results indicate that text from different modalities exhibits distinct characteristics in the model, and that concatenating paper text, image text, and audio text effectively enhances the keyword extraction performance on academic papers. [ABSTRACT FROM AUTHOR]
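The fusion strategy the abstract describes — concatenating paper text with text recovered from the image and audio modalities before extracting keywords — can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: the sample fields and stopword list are hypothetical, and simple unigram frequency is used in place of the unsupervised extractors (such as TF-IDF or TextRank) actually evaluated in the paper.

```python
from collections import Counter
import re

# Hypothetical sample mirroring the dataset structure described in the
# abstract: each record holds the paper text, OCR text from its images,
# and an ASR transcript of its audio. The content is illustrative only.
sample = {
    "paper_text": "Keyword extraction identifies salient terms in academic papers.",
    "image_text": "Figure: keyword extraction pipeline for academic papers.",
    "audio_text": "We study keyword extraction using multimodal academic data.",
}

# A tiny hand-picked stopword list for the illustration.
STOPWORDS = {"the", "in", "for", "of", "we", "using", "a", "an", "and"}

def extract_keywords(sample, top_k=3):
    """Concatenate text from all modalities, then rank unigrams by raw
    frequency -- a simple stand-in for the unsupervised keyword
    extraction methods compared in the paper."""
    fused = " ".join(sample.values())  # modality fusion by concatenation
    tokens = re.findall(r"[a-z]+", fused.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]
```

With the toy sample above, `extract_keywords(sample)` surfaces terms that recur across modalities (e.g., "keyword", "extraction"), which is the intuition behind the reported gain from concatenating modality texts.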

Details

Language:
English
ISSN:
2373-9231
Volume:
61
Issue:
1
Database:
Complementary Index
Journal:
Proceedings of the Association for Information Science & Technology
Publication Type:
Conference
Accession number:
180279940
Full Text:
https://doi.org/10.1002/pra2.1040