DUBLIN -- Document Understanding By Language-Image Network

Authors :: Aggarwal, Kriti
Khandelwal, Aditi
Tanmay, Kumar
Khan, Owais Mohammed
Liu, Qiang
Choudhury, Monojit
Chauhan, Hardik Hansrajbhai
Som, Subhojit
Chaudhary, Vishrav
Tiwary, Saurabh
Publication Year :: 2023
Abstract :: Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images. Existing models often rely on manual feature engineering or domain-specific pipelines, which limit their generalization ability across different document types and languages. In this paper, we propose DUBLIN, which is pretrained on web pages using three novel objectives: Masked Document Text Generation Task, Bounding Box Task, and Rendered Question Answering Task, that leverage both the spatial and semantic information in the document images. Our model achieves competitive or state-of-the-art results on several benchmarks, such as Web-Based Structural Reading Comprehension, Document Visual Question Answering, Key Information Extraction, Diagram Understanding, and Table Question Answering. In particular, we show that DUBLIN is the first pixel-based model to achieve an EM of 77.75 and F1 of 84.25 on the WebSRC dataset. We also show that our model outperforms the current pixel-based SOTA models on DocVQA, InfographicsVQA, OCR-VQA and AI2D datasets by 4.6%, 6.5%, 2.6% and 21%, respectively. We also achieve competitive performance on RVL-CDIP document classification. Moreover, we create new baselines for text-based datasets by rendering them as document images to promote research in this direction.

Details

Database :: OAIster
Publication Type :: Electronic Resource
Accession number :: edsoai.on1381628757
Document Type :: Electronic Resource
