Title:
Correspondence between audio and visual deep models for musical instrument detection in video recordings

Author(s):
Slizovskaia, Olga; Gómez Gutiérrez, Emilia, 1975-; Haro Ortega, Gloria

Note:
Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), held 23-27 October 2017 in Suzhou, China.

Abstract:
This work investigates cross-modal connections between audio and video sources in the task of musical instrument recognition. We also address the understanding of the representations learned by convolutional neural networks (CNNs), and we study feature correspondences between the audio and visual components of a multimodal CNN architecture. For each instrument category, we select the most activated neurons and investigate cross-correlations between neurons from the audio and video CNNs that activate for the same instrument category. We analyse two training schemes for multimodal applications and perform a comparative analysis and visualisation of model predictions.

Acknowledgements:
This work is partly supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502). We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan X GPU, and the WiMIR society for covering the registration expenses.

Rights:
© Olga Slizovskaia, Emilia Gomez, Gloria Haro. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Olga Slizovskaia, Emilia Gomez, Gloria Haro. “Correspondence between audio and visual deep models for musical instrument detection in video recordings”, 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.
https://creativecommons.org/licenses/by/4.0/

Document type:
Conference object; Article - Published version

Publisher:
International Society for Music Information Retrieval (ISMIR)