
Polyphonic pitch perception in rooms using deep learning networks with data rendered in auditory virtual environments

Authors :
Matthew Goodheart
Jeremy Stewart
Jonas Braasch
Michael P. Perrone
Curtis Bahn
David A. Dahlbom
Nathan Keil
Mary Simoni
Source :
The Journal of the Acoustical Society of America. 145:1784-1784
Publication Year :
2019
Publisher :
Acoustical Society of America (ASA), 2019.

Abstract

This paper proposes methods for generation and implementation of uniform, large-scale data from auralized MIDI music files for use with deep learning networks for polyphonic pitch perception and impulse response recognition. This includes synthesis and sound source separation of large batches of multitrack MIDI files in non-real time, convolution with artificial binaural room impulse responses, and techniques for neural network training. Using ChucK, individual tracks for each MIDI file, containing the ground truth for pitch and other parameters, are processed concurrently with variable Synthesis ToolKit (STK) instruments, and the audio output is written to separate wave files in order to create multiple incoherent sound sources. Then, each track is convolved with a measured or synthetic impulse response that corresponds to the virtual position of the instrument in the room before all tracks are digitally summed. The database now contains the symbolic description in the form of MIDI commands and the auralized music performances. A polyphonic pitch model based on an array of autocorrelation functions for individual frequency bands is used to train a neural network and analyze the data. [Work supported by IBM AIRC grant and NSF BCS-1539276.]
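The auralization step described above (convolving each dry instrument track with a two-channel binaural room impulse response for its virtual position, then digitally summing all tracks) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the paper's pipeline runs in ChucK, whereas this sketch uses numpy, and the function name `auralize_and_mix` and the list-of-pairs BRIR layout are assumptions for the example.

```python
import numpy as np

def auralize_and_mix(tracks, brirs):
    """Convolve each dry track with its binaural room impulse
    response (one left/right IR pair per virtual source position),
    then digitally sum all wet tracks into one binaural mixture.

    tracks: list of 1-D float arrays (dry instrument signals)
    brirs:  list of (ir_left, ir_right) array pairs, one per track
    returns: array of shape (2, n_samples), channels = (L, R)
    """
    # Full-convolution output length for the longest track/IR pair.
    n_out = max(len(t) + max(len(ir_l), len(ir_r)) - 1
                for t, (ir_l, ir_r) in zip(tracks, brirs))
    mix = np.zeros((2, n_out))
    for track, (ir_l, ir_r) in zip(tracks, brirs):
        wet_l = np.convolve(track, ir_l)  # room acoustics, left ear
        wet_r = np.convolve(track, ir_r)  # room acoustics, right ear
        mix[0, :len(wet_l)] += wet_l      # digital summation of sources
        mix[1, :len(wet_r)] += wet_r
    return mix
```

Because each source is convolved with a different impulse response before summation, the mixture preserves per-instrument spatial cues while the separate dry tracks retain the MIDI-derived ground truth for training.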
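The pitch front end named in the abstract, an array of autocorrelation functions computed on individual frequency bands, can be illustrated in simplified form. The sketch below is an assumption-laden stand-in: it isolates bands with a crude FFT brick-wall mask where an auditory model would use gammatone-like filters, and the function name, band-edge format, and normalization are choices made for this example only.

```python
import numpy as np

def band_autocorrelations(signal, sr, band_edges, max_lag):
    """Compute a normalized autocorrelation function per frequency band.

    signal:     1-D float array
    sr:         sample rate in Hz
    band_edges: list of (lo_hz, hi_hz) band limits
    max_lag:    number of autocorrelation lags to keep
    returns:    array of shape (n_bands, max_lag)
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    acfs = []
    for lo, hi in band_edges:
        # Brick-wall band isolation (a stand-in for an auditory filterbank).
        masked = np.where((freqs >= lo) & (freqs < hi), spectrum, 0.0)
        band = np.fft.irfft(masked, n=len(signal))
        # Autocorrelation via the Wiener-Khinchin theorem,
        # zero-padded to avoid circular wrap-around.
        power = np.abs(np.fft.rfft(band, n=2 * len(band))) ** 2
        acf = np.fft.irfft(power)[:max_lag]
        acfs.append(acf / (acf[0] + 1e-12))  # normalize to unit lag-0 value
    return np.stack(acfs)
```

Peaks in each band's autocorrelation occur at lags matching the periods of pitched components in that band; stacking the per-band functions yields the kind of 2-D lag-by-frequency representation a neural network can be trained on.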

Details

ISSN :
0001-4966
Volume :
145
Database :
OpenAIRE
Journal :
The Journal of the Acoustical Society of America
Accession number :
edsair.doi...........e0eaa35d18b8c7d6c01e531dfa29cf4d
Full Text :
https://doi.org/10.1121/1.5101527