1. Information Extraction Based on Line Chart for Research Paper in Chemical Science
- Author
Hairong Yan and Shaohan Yang
- Subjects
Line chart, information extraction, neural network, YOLOv8, paper fast reading, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In chemical science research, reading other researchers' papers in the same area speeds up the development of one's own research methods. Researchers want to automate the extraction of information from the charts in these papers, because the charts record in detail which elements are involved and how the experimental conditions were varied. This paper designs and implements a line chart information extraction algorithm using neural networks and the Hough transform. First, a large dataset of line charts was collected and annotated to provide a foundation for neural network training. Second, the line charts in the literature were detected and saved as separate images. Then, the Hough transform line detection algorithm was used to detect the axes, and the line charts were segmented. For each segmented part, a dedicated recognition algorithm was designed to identify the elements of the line chart, including axes, line regions, and legends. To validate the effectiveness of the algorithm, experimental tests were conducted in the field of inorganic catalysis, automatically extracting information from line charts in the literature. The experimental results show that the designed algorithm can accurately recognize various elements in line charts and effectively extract experimental data. Compared with traditional manual methods, automated extraction not only saves a considerable amount of time but also improves the accuracy and consistency of data extraction for fast reading of papers. In summary, this method provides researchers with an efficient tool that accelerates the acquisition and comparison of experimental data, thereby advancing the progress of related research.
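The axis-detection and segmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the chart has already been cropped and binarized into a grid of 0/1 pixels, it restricts the Hough accumulator to the two angles that matter for upright axes (0 and 90 degrees), and the function names `hough_votes`, `find_axes`, and `split_regions`, as well as the region labels, are hypothetical.

```python
import math


def hough_votes(points, thetas_deg):
    """Accumulate Hough votes: each foreground pixel (x, y) votes for every
    line rho = x*cos(theta) + y*sin(theta) passing through it."""
    acc = {}
    for x, y in points:
        for t in thetas_deg:
            theta = math.radians(t)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(t, rho)] = acc.get((t, rho), 0) + 1
    return acc


def find_axes(image):
    """Return (x_axis_row, y_axis_col) for a binary image given as a list of rows.

    Assumption: axes are exactly horizontal and vertical, so only
    theta = 0 (rho equals the column) and theta = 90 (rho equals the row)
    are searched.
    """
    points = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    acc = hough_votes(points, thetas_deg=(0, 90))
    vertical = max((k for k in acc if k[0] == 0), key=lambda k: acc[k])
    horizontal = max((k for k in acc if k[0] == 90), key=lambda k: acc[k])
    return horizontal[1], vertical[1]


def split_regions(image, x_axis_row, y_axis_col):
    """Cut the chart into hypothetical regions around the detected axes."""
    return {
        "plot": [row[y_axis_col + 1:] for row in image[:x_axis_row]],
        "y_labels": [row[:y_axis_col] for row in image[:x_axis_row]],
        "x_labels": [row[y_axis_col + 1:] for row in image[x_axis_row + 1:]],
    }


def make_chart(size=20):
    """Synthetic test chart: a y-axis, an x-axis, and one diagonal data line."""
    img = [[0] * size for _ in range(size)]
    for y in range(2, 17):
        img[y][3] = 1            # y-axis in column 3
    for x in range(3, 19):
        img[16][x] = 1           # x-axis in row 16
    for i in range(8):
        img[14 - i][5 + i] = 1   # data line inside the plot area
    return img


if __name__ == "__main__":
    chart = make_chart()
    row, col = find_axes(chart)
    regions = split_regions(chart, row, col)
    print("x-axis row:", row, "y-axis column:", col)
    print("plot region size:", len(regions["plot"]), "x", len(regions["plot"][0]))
```

On real page images a full-angle Hough transform from an image-processing library would replace the two-angle accumulator, since scanned axes are rarely perfectly aligned with the pixel grid.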
- Published
- 2025