Generating data to train convolutional neural networks for classical music source separation
- Author
Miron, M., Janer, J., Gómez, E., Lokki, T., Pätynen, J., and Välimäki, V.
- Abstract
Paper presented at the 14th Sound and Music Computing Conference, held in Finland from 5 to 8 July 2017.

Deep learning approaches have become increasingly popular for estimating time-frequency masks in audio source separation. However, training neural networks usually requires a considerable amount of data. Music data is scarce, particularly for classical music source separation, which requires multi-track recordings with isolated instruments. In this work, we start from the assumption that all the renditions of a piece are based on the same musical score, so we can generate multiple renditions by synthesizing the score with different performance properties, e.g. tempo, dynamics, timbre and local timing variations. We then use this data to train a convolutional neural network (CNN) that can separate, with low latency, all the renditions of a score or a set of scores. The trained model is tested on real-life recordings and effectively separates the corresponding sources. Following the principle of research reproducibility, this work provides the related data and code, and it can be extended to separate other pieces.

The TITAN X GPU used for this research was donated by the NVIDIA Corporation. This work is partially supported by the Spanish Ministry of Economy and Competitiveness under the CASAS project (TIN2015-70816-R) and under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502). We thank Agustin Martorell for his help with Sibelius and Pritish Chandna for his useful feedback.
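The rendition-generation idea in the abstract (one score, many synthetic performances that differ in tempo, dynamics, timbre and local timing) can be illustrated with the minimal Python sketch below. It is not the authors' released code: the `Note` record, the functions `render_variation` and `generate_renditions`, the parameter ranges, and the treatment of timbre as an index into one of several hypothetical sample sets are all assumptions made for illustration. In a full pipeline, each perturbed note list would then be synthesized per instrument to obtain the isolated tracks needed for training; that synthesis step is not shown.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Note:
    onset: float     # seconds at the score's nominal tempo
    duration: float  # seconds at the score's nominal tempo
    pitch: int       # MIDI note number
    velocity: int    # MIDI velocity, 1-127 (stands in for dynamics)
    instrument: str  # part name, e.g. "violin"


# One rendition: the perturbed notes plus a timbre (sample-set) index per instrument.
Rendition = Tuple[List[Note], Dict[str, int]]


def render_variation(notes: List[Note], tempo_factor: float, dynamics_gain: float,
                     timing_jitter: float, n_timbres: int,
                     rng: random.Random) -> Rendition:
    """Return one perturbed copy of the score and a timbre index per instrument."""
    # Timbre (hypothetical): pick one of n_timbres sample sets per instrument.
    instruments = sorted({n.instrument for n in notes})
    timbre = {inst: rng.randrange(n_timbres) for inst in instruments}
    varied = []
    for n in notes:
        # Tempo: a factor > 1 plays faster, compressing onsets and durations.
        onset = n.onset / tempo_factor
        duration = n.duration / tempo_factor
        # Local timing: shift each onset by a small random offset (seconds).
        onset = max(0.0, onset + rng.uniform(-timing_jitter, timing_jitter))
        # Dynamics: scale the velocity and clamp it to the valid MIDI range.
        velocity = min(127, max(1, round(n.velocity * dynamics_gain)))
        varied.append(replace(n, onset=onset, duration=duration, velocity=velocity))
    return varied, timbre


def generate_renditions(notes: List[Note], count: int, seed: int = 0) -> List[Rendition]:
    """Draw `count` renditions; the parameter ranges are illustrative guesses."""
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    return [
        render_variation(notes,
                         tempo_factor=rng.uniform(0.8, 1.2),
                         dynamics_gain=rng.uniform(0.7, 1.3),
                         timing_jitter=0.03,
                         n_timbres=3,
                         rng=rng)
        for _ in range(count)
    ]


# Toy two-instrument score: two violin notes over one sustained cello note.
score = [
    Note(0.0, 0.5, 60, 80, "violin"),
    Note(0.5, 0.5, 64, 90, "violin"),
    Note(0.0, 1.0, 48, 70, "cello"),
]
renditions = generate_renditions(score, count=4, seed=42)
```

Seeding a single `random.Random` instance keeps the whole set of renditions reproducible from one integer, which fits the reproducibility goal stated in the abstract; pitches and instrument assignments are left untouched, so every rendition still follows the same score.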