Para acceder a los documentos con el texto completo, por favor, siga el siguiente enlace: http://hdl.handle.net/2117/118042

Asynchronous and exact forward recovery for detected errors in iterative solvers
Jaulmes, Luc Etienne; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Current trends and projections show that faults in computer systems become increasingly common. Such errors may be detected, and possibly corrected transparently, e.g. by Error Correcting Codes (ECC). For a program to be fault-tolerant, it needs to also handle the Errors that are Detected and Uncorrected (DUE), such as an ECC encountering too many bit flips in a codeword. While correcting an error has an overhead in itself, it can also affect the progress of a program. The most generic technique, rolling back the program state to a previously taken checkpoint, sets back any progress done since then. Alternately, application specific techniques exist, such as restarting an iterative program with its latest iteration's values as initial guess.
This manuscript is the journal extension of a previously published conference paper [25]. This work has been partially supported by the European Research Council under the European Union’s 7th FP, ERC Advanced Grant 321253, and by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P. L. Jaulmes has been partially supported by the Spanish Ministry of Education, Culture and Sports under grant FPU2013/06982. M. Moretó has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas has been partially supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the European Union’s 7th FP (contract 2013 BP B 00243). We would like to thank Nicolas Vidal for his contribution on using huge pages natively.
Peer Reviewed
-Àrees temàtiques de la UPC::Informàtica
-High performance computing
-Error correction codes
-Hardware
-Redundancy
-Programming
-Program processors
-Registers
-Càlcul intensiu (Informàtica)
Artículo - Versión presentada
Artículo
         

Mostrar el registro completo del ítem

Documentos relacionados

Otros documentos del mismo autor/a

Jaulmes, Luc Etienne; Casas Guix, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo
Casas Guix, Marc; Moreto Planas, Miquel; Álvarez Martí, Lluc; Castillo Villar, Emilio; Chasapis, Dimitrios; Hayes, Timothy; Jaulmes, Luc Etienne; Palomar Pérez, Óscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo
Jaulmes, Luc Etienne; Casas Guix, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo
Sánchez Barrera, Isaac; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo
Álvarez Martí, Lluc; Casas, Marc; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard; Valero Cortés, Mateo; Moreto Planas, Miquel