Title:
|
Scaling a convolutional neural network for classification of adjective noun pairs with TensorFlow on GPU clusters
|
Author(s):
|
Campos, Víctor; Sastre, Francesc; Yagües, Maurici; Torres Viñals, Jordi; Giró Nieto, Xavier
|
Other authors:
|
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions; Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo |
Abstract:
|
Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide range of applications, such as computer vision, in both academia and multiple industry areas. The progress made in recent years cannot be understood without taking into account the technological advancements seen in key domains such as High Performance Computing, more specifically in the Graphics Processing Unit (GPU) domain. These kinds of deep neural networks need massive amounts of data to effectively train the millions of parameters they contain, and this training can take days or even weeks depending on the hardware used. In this work, we present how the training of a deep neural network can be parallelized on a distributed GPU cluster. The effect of distributing the training process is addressed from two different points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the training times and final accuracy of the models is studied. We used TensorFlow on top of the GPU cluster at the Barcelona Supercomputing Center (BSC), whose servers each have 2 K80 GPU cards. The results show an improvement in both areas of focus. On the one hand, the experiments show promising results for training a neural network faster: the training time decreases from 106 hours to 16 hours in our experiments. On the other hand, we observe that increasing the number of GPUs in one node raises the throughput (images per second) in a near-linear way. Moreover, an additional distributed speedup of 10.3 is achieved with 16 nodes, taking the single-node performance as the baseline. |
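The headline figures above reduce to two standard ratios: speedup (old run time divided by new run time) and parallel efficiency (speedup divided by the number of workers). The following is a minimal sketch, not taken from the paper, that computes both from the quoted numbers. It assumes the 106 h and 16 h timings are directly comparable runs and that the 10.3 speedup is measured against one node, as stated; the function and variable names are hypothetical.

```python
# Figures quoted in the abstract (assumed comparable, see lead-in).
baseline_hours = 106.0      # training time before distributing
distributed_hours = 16.0    # training time after distributing
nodes = 16                  # nodes used for the distributed speedup figure
distributed_speedup = 10.3  # speedup relative to a single node


def speedup(t_before: float, t_after: float) -> float:
    """Ratio of the original run time to the new run time."""
    return t_before / t_after


def efficiency(s: float, workers: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return s / workers


time_speedup = speedup(baseline_hours, distributed_hours)
node_efficiency = efficiency(distributed_speedup, nodes)

print(f"wall-clock speedup: {time_speedup:.2f}x")
print(f"{nodes}-node parallel efficiency: {node_efficiency:.1%}")
```

Under these assumptions, the reduction from 106 h to 16 h is roughly a 6.6x wall-clock speedup, and a 10.3x speedup on 16 nodes corresponds to about 64% parallel efficiency, i.e. sublinear scaling across nodes, in contrast with the near-linear scaling reported within a node.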
Acknowledgments:
|
This work is partially supported by the Spanish Ministry of Economy and Competitiveness under contract TIN2012-34557, by the BSC-CNS Severo Ochoa program (SEV-2011-00067), by the SGR programmes (2014-SGR-1051 and 2014-SGR-1421) of the Catalan Government, and within the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economia y Competitividad and the European Regional Development Fund (ERDF). We would also like to thank the technical support team at the Barcelona Supercomputing Center (BSC), especially Carlos Tripiana. |
Abstract:
|
Peer Reviewed |
Subject(s):
|
-UPC subject areas::Computer science::Computer architecture
-UPC subject areas::Computer science::Artificial intelligence::Machine learning
-Computer vision
-Graphics processing units
-Machine learning
-Distributed computing
-Parallel systems
-Deep learning
-Convolutional neural networks
-TensorFlow |
Rights:
|
|
Document type:
|
Article - Submitted version; Conference object |
Publisher:
|
Institute of Electrical and Electronics Engineers (IEEE)
|