Towards Interactively Improving ML Data Preparation Code via 'Shadow Pipelines'

Authors :
Grafberger, Stefan
Groth, Paul
Schelter, Sebastian
Publication Year :
2024

Abstract

Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. Therefore, we propose to support data scientists during this development cycle with automatically derived interactive suggestions for pipeline improvements. We discuss our vision to generate these suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. We envision applying incremental view maintenance-based optimisations to ensure low-latency computation and maintenance of the shadow pipelines. We conduct preliminary experiments to showcase the feasibility of our envisioned approach and the potential benefits of our proposed optimisations.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2404.19591
Document Type :
Working Paper
Full Text :
https://doi.org/10.1145/3650203.3663327