DIR: A Large-Scale Dialogue Rewrite Dataset for Cross-Domain Conversational Text-to-SQL
- Source :
- Applied Sciences, Vol 13, Iss 4, p 2262 (2023)
- Publication Year :
- 2023
- Publisher :
- MDPI AG, 2023.
Abstract
- Semantic co-reference and ellipsis often cause information loss when natural language utterances in a multi-turn dialogue are parsed into SQL (i.e., the conversational text-to-SQL task). Splitting dialogue understanding into dialogue utterance rewriting followed by language understanding is a feasible way to tackle this problem. To this end, we present a two-stage framework for conversational text-to-SQL. To build an effective rewriting model for the first stage, we provide a large-scale dialogue rewrite dataset (DIR), which extends two cross-domain conversational text-to-SQL datasets, SParC and CoSQL. The dataset contains 5908 dialogues spanning 160 domains. It is therefore not only useful for conversational text-to-SQL tasks but also a valuable corpus for dialogue rewrite research. In experiments, we validate the effectiveness of our annotations with a popular text-to-SQL parser, RAT-SQL. The results show QEM accuracy improvements of 11.81 and 27.17 on SParC and CoSQL, respectively, when the problem of semantically incomplete utterances is eliminated by directly parsing the golden rewritten utterances. Evaluating two-stage frameworks with different rewrite models shows that the quality of the rewrite model matters and still needs improvement. Additionally, because DIR is a new benchmark for the dialogue rewrite task, we also report the performance of different baselines to support related studies. Our dataset will be publicly available once this paper is accepted.
Details
- Language :
- English
- ISSN :
- 13042262 and 20763417
- Volume :
- 13
- Issue :
- 4
- Database :
- Directory of Open Access Journals
- Journal :
- Applied Sciences
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.6f3abe3127fa4839b386fe764fb98081
- Document Type :
- article
- Full Text :
- https://doi.org/10.3390/app13042262