Transform-data-by-example (TDE)

Authors :
He, Yeye
Chu, Xu
Ganjam, Kris
Zheng, Yudian
Narasayya, Vivek
Chaudhuri, Surajit
Source :
Proceedings of the VLDB Endowment; June 2018, Vol. 11 Issue: 10 p1165-1177, 13p
Publication Year :
2018

Abstract

Today, business analysts and data scientists increasingly need to clean, standardize and transform diverse data sets, such as name, address, date time, and phone number, before they can perform analysis. This process of data transformation is an important part of data preparation, and is known to be difficult and time-consuming for end-users. Traditionally, developers have dealt with these longstanding transformation problems using custom code libraries. They have built vast varieties of custom logic for name parsing, address standardization, etc., and shared their source code in places like GitHub. Data transformation would be a lot easier for end-users if they could discover and reuse such existing transformation logic. We developed Transform-Data-by-Example (TDE), which works like a search engine for data transformations. TDE "indexes" vast varieties of transformation logic in source code, DLLs, web services and mapping tables, so that users only need to provide a few input/output examples to demonstrate a desired transformation, and TDE can interactively find relevant functions to synthesize new programs consistent with all examples. Using an index of 50K functions crawled from GitHub and Stack Overflow, TDE can already handle many common transformations not currently supported by existing systems. On a benchmark with over 200 transformation tasks, TDE generates correct transformations for 72% of the tasks, which is considerably better than the other systems evaluated. A beta version of TDE for Microsoft Excel is available via the Office store. Part of the TDE technology also ships in Microsoft Power BI.
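
The search-by-example idea in the abstract can be illustrated with a minimal sketch. This is not the TDE implementation: the three-function index, the function names (normalize_phone, iso_date, last_first), and the brute-force consistency check are all assumptions made for illustration, and the real system indexes tens of thousands of functions and also composes them into new programs.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# An example is an (input, desired output) pair supplied by the user.
Example = Tuple[str, str]


# Hypothetical stand-ins for transformation logic that might be found
# in public code; none of these come from the paper.
def normalize_phone(s: str) -> str:
    digits = "".join(ch for ch in s if ch.isdigit())
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def iso_date(s: str) -> str:
    return datetime.strptime(s, "%m/%d/%Y").strftime("%Y-%m-%d")


def last_first(s: str) -> str:
    first, last = s.split(" ", 1)
    return f"{last}, {first}"


# Toy "index" of candidate functions (assumed; the paper reports about 50K).
INDEX: Dict[str, Callable[[str], str]] = {
    "normalize_phone": normalize_phone,
    "iso_date": iso_date,
    "last_first": last_first,
}


def is_consistent(fn: Callable[[str], str], examples: List[Example]) -> bool:
    """Return True only if fn maps every example input to its output."""
    for inp, out in examples:
        try:
            if fn(inp) != out:
                return False
        except Exception:
            # A candidate that crashes on an input cannot be the answer.
            return False
    return True


def search(examples: List[Example]) -> List[str]:
    """Return the names of indexed functions consistent with all examples."""
    return [name for name, fn in INDEX.items() if is_consistent(fn, examples)]


if __name__ == "__main__":
    print(search([("01/02/2018", "2018-01-02"), ("12/31/2017", "2017-12-31")]))
    print(search([("John Smith", "Smith, John")]))
```

In this sketch a candidate is kept only if it reproduces every example exactly, which mirrors the abstract's requirement that synthesized programs be consistent with all examples; ranking candidates and combining several functions are left out.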

Details

Language :
English
ISSN :
21508097
Volume :
11
Issue :
10
Database :
Supplemental Index
Journal :
Proceedings of the VLDB Endowment
Publication Type :
Periodical
Accession number :
ejs51532268
Full Text :
https://doi.org/10.14778/3231751.3231766