Title:
End-to-end learning for music audio tagging at scale
|
Author(s):
Pons Puig, Jordi; Nieto, Oriol; Prockup, Matthew; Schmidt, Erik M.; Ehmann, Andreas F.; Serra, Xavier
|
Note:
Paper presented at the Workshop on Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17), held December 4–9, 2017, in Long Beach, California.
Abstract:
The lack of data tends to limit the outcomes of deep learning research, especially when dealing with end-to-end learning stacks that process raw data such as waveforms. In this study we make use of musical labels annotated for 1.2 million tracks. This large amount of data allows us to unrestrictedly explore different front-end paradigms: from assumption-free models, which use waveforms as input with very small convolutional filters, to models that rely on domain knowledge, namely log-mel spectrograms processed by a convolutional neural network designed to learn temporal and timbral features. Results suggest that while spectrogram-based models surpass their waveform-based counterparts, the difference in performance shrinks as more data are employed.
Acknowledgements:
This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502).
Rights:
© Sound & Music Computing
|
Document type:
Conference object / Article – Published version