On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks

My paper, “On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks”, has been published in the IEEE Journal of Selected Topics in Signal Processing. The supplementary material is presented at the bottom of this page.

The paper is available on IEEE Xplore and on arXiv. The source code is available on GitHub. Our paper can be cited as:

@article{9257456,
  title={On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks}, 
  author={Sulun, Serkan and Davies, Matthew E.P.},
  journal={IEEE Journal of Selected Topics in Signal Processing}, 
  year={2021},
  volume={15},
  number={1},
  pages={132-142},
  doi={10.1109/JSTSP.2020.3037485}}

Abstract

In this paper, we address a sub-topic of the broad domain of audio enhancement, namely musical audio bandwidth extension. We formulate the bandwidth extension problem using deep neural networks, where a band-limited signal is provided as input to the network, with the goal of reconstructing a full-bandwidth output. Our main contribution centers on the impact of the choice of low pass filter when training and subsequently testing the network. For two different state of the art deep architectures, ResNet and U-Net, we demonstrate that when the training and testing filters are matched, improvements in signal-to-noise ratio (SNR) of up to 7 dB can be obtained. However, when these filters differ, the improvement falls considerably and under some training conditions results in a lower SNR than the band-limited input. To circumvent this apparent overfitting to filter shape, we propose a data augmentation strategy which utilizes multiple low pass filters during training and leads to improved generalization to unseen filtering conditions at test time.
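The augmentation idea from the abstract can be illustrated with a minimal sketch: each training example is band-limited with a low pass filter drawn at random from a pool, instead of always using the same one. This is not the code from our repository. The sample rate, cutoff frequency, ripple values, the filter pool and the function names are all assumptions chosen for illustration, and the filters are designed with SciPy.

```python
import numpy as np
from scipy import signal

FS = 44100         # assumed sample rate in Hz
CUTOFF_HZ = 11025  # assumed cutoff frequency, chosen only for illustration

# Hypothetical pool of training filters as (family, order) pairs.
# Butterworth is left out so that it stays an unseen filter at test time.
TRAIN_FILTERS = [("cheby1", 6), ("cheby1", 8), ("ellip", 6), ("ellip", 8)]


def design_lowpass(family, order, fs=FS, cutoff=CUTOFF_HZ):
    """Design one low pass filter and return it as second-order sections."""
    if family == "butter":
        return signal.butter(order, cutoff, btype="low", fs=fs, output="sos")
    if family == "cheby1":
        # 1 dB passband ripple is an illustrative value
        return signal.cheby1(order, 1.0, cutoff, btype="low", fs=fs, output="sos")
    if family == "ellip":
        # 1 dB passband ripple and 40 dB stopband attenuation are illustrative
        return signal.ellip(order, 1.0, 40.0, cutoff, btype="low", fs=fs, output="sos")
    raise ValueError(f"unknown filter family: {family}")


def band_limit(x, family, order):
    """Apply the chosen low pass filter to a mono signal (causal filtering)."""
    return signal.sosfilt(design_lowpass(family, order), x)


def augmented_pair(x, rng):
    """Return (band-limited input, full-bandwidth target) for one example."""
    family, order = TRAIN_FILTERS[rng.integers(len(TRAIN_FILTERS))]
    return band_limit(x, family, order), x
```

In this sketch, a call such as `augmented_pair(x, np.random.default_rng(0))` would run once per training example, while `band_limit(x, "butter", 6)` produces the input for an unseen-filter test condition.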

SUPPLEMENTARY MATERIAL

Here are the qualitative results for music bandwidth extension using deep neural networks. All samples come from our test set, the DSD100 test split. The quantitative results are also available: the signal-to-noise ratio (SNR) and the mean absolute distance between VGG embeddings (VGG), both measured against the ground truth over the full length of every test song.
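For readers who want to compute a comparable number on their own audio, the following is a minimal sketch of the standard SNR definition in decibels, together with the improvement over the band-limited input. The function names are hypothetical, and the sketch assumes time-aligned signals of equal length; it is not the evaluation script from our repository.

```python
import numpy as np


def snr_db(reference, estimate):
    """SNR in dB of an estimate, measured against the ground-truth reference."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    # Ratio of reference energy to error energy; an exact match gives infinity
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))


def snr_improvement_db(reference, band_limited, output):
    """How many dB the network output gains over the band-limited input."""
    return snr_db(reference, output) - snr_db(reference, band_limited)
```

A negative value from `snr_improvement_db` corresponds to the case where the output has a lower SNR than the band-limited input.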

Secretariat - Over The Top

Ground-truth

Inputs and outputs

	Chebyshev1 - 6th order (seen filter)	Butterworth - 6th order (unseen filter)
Input
U-Net
U-Net w/ data augmentation
U-Net w/ batch normalization
U-Net w/ dropout
ResNet
ResNet w/ data augmentation
ResNet w/ batch normalization
ResNet w/ dropout

Skelpolu - Resurrection

Ground-truth

Inputs and outputs

	Chebyshev1 - 6th order (seen filter)	Butterworth - 6th order (unseen filter)
Input
U-Net
U-Net w/ data augmentation
U-Net w/ batch normalization
U-Net w/ dropout
ResNet
ResNet w/ data augmentation
ResNet w/ batch normalization
ResNet w/ dropout

M.E.R.C. - Music Knockout

Ground-truth

Inputs and outputs

	Chebyshev1 - 6th order (seen filter)	Butterworth - 6th order (unseen filter)
Input
U-Net
U-Net w/ data augmentation
U-Net w/ batch normalization
U-Net w/ dropout
ResNet
ResNet w/ data augmentation
ResNet w/ batch normalization
ResNet w/ dropout

Written on August 20, 2020