To access the full-text documents, please follow this link: http://hdl.handle.net/10230/37152

End-to-end learning for music audio tagging at scale
Pons Puig, Jordi; Nieto Caballero, Oriol; Prockup, Matthew; Schmidt, Erik M.; Ehmann, Andreas F.; Serra, Xavier
Paper presented at: 19th International Society for Music Information Retrieval Conference (ISMIR 2018), held 23-27 September 2018 in Paris, France.
The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study, 1.2M tracks annotated with musical labels are available to train our end-to-end models. This large amount of data allows us to explore, without restriction, two different design paradigms for music auto-tagging: assumption-free models, which use waveforms as input with very small convolutional filters; and models that rely on domain knowledge, which use log-mel spectrograms with a convolutional neural network designed to learn timbral and temporal features. Our work focuses on studying how these two types of deep architectures perform when datasets of variable size are available for training: the MagnaTagATune (25k songs), the Million Song Dataset (240k songs), and a private dataset of 1.2M songs. Our experiments suggest that music domain assumptions are relevant when not enough training data are available, while waveform-based models outperform spectrogram-based ones in large-scale data scenarios.
This work was partially supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502), and we are grateful for the GPUs donated by NVIDIA.
© Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra. “END-TO-END LEARNING FOR MUSIC AUDIO TAGGING AT SCALE”, 19th International Society for Music Information Retrieval Conference, Paris, France, 2018.
https://creativecommons.org/licenses/by/4.0/
Conference object
Article - Published version
International Society for Music Information Retrieval (ISMIR)
         
