Title:
|
Studying the impact of the Full-Network embedding on multimodal pipelines
|
Author:
|
Vilalta, Armand; Garcia-Gasulla, Dario; Pares, Ferran; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Moya-Sánchez, Ulises; Cortés García, Claudio Ulises
|
Other authors:
|
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
Abstract:
|
The current state of the art for image annotation and image retrieval tasks is obtained through deep neural network multimodal pipelines, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding (FNE) in this setting, replacing the original image representation in four competitive multimodal embedding generation schemes. Unlike the one-layer image embeddings typically used by most approaches, the Full-Network embedding provides a multi-scale discrete representation of images, which results in richer characterisations. Extensive testing is performed on three different datasets comparing the performance of the studied variants and the impact of the FNE on a levelled playground, i.e., under equality of data used, source CNN models and hyper-parameter tuning. The results obtained indicate that the Full-Network embedding is consistently superior to the one-layer embedding. Furthermore, its impact on performance is superior to the improvement stemming from the other variants studied. These results motivate the integration of the Full-Network embedding on any multimodal embedding generation scheme. |
Abstract:
|
This work is partially supported by the Joint Study Agreement no. W156463 under the IBM/BSC Deep Learning Center agreement, by the Spanish Government through Programa Severo Ochoa (SEV-2015-
0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contracts 2014-SGR-1051), and by the Core Research for Evolutional Science and
Technology (CREST) program of Japan Science and Technology Agency (JST). |
Abstract:
|
Peer Reviewed |
Subject(s):
|
-Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Telemàtica i xarxes d'ordinadors -Neural networks (Computer science) -Multimodal embedding -Full-network embedding -Caption retrieval -Image retrieval -Deep neural network -Xarxes neuronals (Informàtica) |
Rights:
|
|
Document type:
|
Article - Submitted version Article |
Published by:
|
IOS Press
|
Share:
|
|