ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems

Authors :: Zhang, Yi
Deriu, Jan
Katsogiannis-Meimarakis, George
Kosten, Catherine
Koutrika, Georgia
Stockinger, Kurt
Source :: PVLDB Volume 17, 2023-2024
Publication Year :: 2023
Abstract: Natural Language to SQL systems (NL-to-SQL) have recently shown a significant increase in accuracy for natural language to SQL query translation. This improvement is due to the emergence of transformer-based language models, and the popularity of the Spider benchmark - the de-facto standard for evaluating NL-to-SQL systems. The top NL-to-SQL systems reach accuracies of up to 85%. However, Spider mainly contains simple databases with few tables, columns, and entries, which does not reflect a realistic setting. Moreover, complex real-world databases with domain-specific content have little to no training data available in the form of NL/SQL-pairs leading to poor performance of existing NL-to-SQL systems. In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. For this new benchmark, SQL experts and domain experts created high-quality NL/SQL-pairs for each domain. To garner more data, we extended the small amount of human-generated data with synthetic data generated using GPT-3. We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark. Thus, the challenge is many-fold: creating NL-to-SQL systems for highly complex domains with a small amount of hand-made training data augmented with synthetic data. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark designed with complex real-world scientific databases, containing challenging training and test data carefully validated by domain experts.
Comment: 12 pages, 2 figures, 5 tables

Subjects :: Computer Science - Databases
Computer Science - Artificial Intelligence
Computer Science - Computation and Language
H.2.4
I.2.7

Details

Database :: arXiv
Journal :: PVLDB Volume 17, 2023-2024
Publication Type :: Report
Accession number :: edsarx.2306.04743
Document Type :: Working Paper

