Back to Search Start Over

Unstructured Document Information Extraction Method with Multi-Faceted Domain Knowledge Graph Assistance for M2M Customs Risk Prevention and Screening Application.

Authors :
Tian, Fengchun
Wang, Haochen
Wan, Zhenlong
Liu, Ran
Liu, Ruilong
Lv, Di
Lin, Yingcheng
Source :
Electronics (2079-9292); May2024, Vol. 13 Issue 10, p1941, 19p
Publication Year :
2024

Abstract

As a crucial national security defense line, the existing risk prevention and screening system of customs falls short in terms of intelligence and diversity for risk identification factors. Hence, the urgent issues to be addressed in the risk identification system include intelligent extraction technology for key information from Customs Unstructured Accompanying Documents (CUADs) and the reliability of the extraction results. In the customs scenario, OCR is employed for M2M interactions, but current models have difficulty adapting to diverse image qualities and complex customs document content. We propose a hybrid mutual learning knowledge distillation (HMLKD) method for optimizing a pre-trained OCR model's performance against such challenges. Additionally, current models lack effective incorporation of domain-specific knowledge, resulting in insufficient text recognition accuracy for practical customs risk identification. We propose a customs domain knowledge graph (CDKG) developed using CUAD knowledge and propose an integrated CDKG post-OCR correction method (iCDKG-PostOCR) based on CDKG. The results on real data demonstrate that the accuracies improve for code text fields to 97.70%, for character type fields to 96.55%, and for numerical type fields to 96.00%, with a confidence rate exceeding 99% for each. Furthermore, the Customs Health Certificate Extraction System (CHCES) developed using the proposed method has been implemented and verified at Tianjin Customs in China, where it has showcased outstanding operational performance. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
20799292
Volume :
13
Issue :
10
Database :
Complementary Index
Journal :
Electronics (2079-9292)
Publication Type :
Academic Journal
Accession number :
177488050
Full Text :
https://doi.org/10.3390/electronics13101941