dc.contributor |
Barcelona Supercomputing Center |
dc.contributor.author |
de la Cruz, Raúl |
dc.contributor.author |
Folch, Arnau |
dc.contributor.author |
Farré, Pau |
dc.contributor.author |
Cabezas, Javier |
dc.contributor.author |
Navarro, Nacho |
dc.contributor.author |
Cela, José M. |
dc.date |
2016-12 |
dc.identifier.citation |
de la Cruz, Raúl [et al.]. Optimization of atmospheric transport models on HPC platforms. "Computers & Geosciences", December 2016, vol. 97, p. 30-39. |
dc.identifier.citation |
0098-3004 |
dc.identifier.citation |
10.1016/j.cageo.2016.08.019 |
dc.identifier.uri |
http://hdl.handle.net/2117/90142 |
dc.language.iso |
eng |
dc.publisher |
Elsevier |
dc.relation |
http://www.sciencedirect.com/science/article/pii/S0098300416303077 |
dc.relation |
info:eu-repo/grantAgreement/ES/1PE/TIN2012-34557 |
dc.rights |
Attribution-NonCommercial-NoDerivs 4.0 International License |
dc.rights |
https://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.rights |
info:eu-repo/semantics/openAccess |
dc.subject |
UPC subject areas::Biomedical engineering::Environmental impact |
dc.subject |
Atmosphere--Measurement |
dc.subject |
Weather Prediction Research Programmes |
dc.subject |
Parallel programming (Computer science) |
dc.subject |
HPC platforms |
dc.subject |
Atmospheric transport models |
dc.subject |
FALL3D model |
dc.subject |
Climate--Observations |
dc.title |
Optimization of atmospheric transport models on HPC platforms |
dc.type |
info:eu-repo/semantics/submittedVersion |
dc.type |
info:eu-repo/semantics/article |
dc.description.abstract |
The performance and scalability of atmospheric transport models on high-performance computing environments are often far from optimal for multiple reasons, including sequential input and output, synchronous communications, work imbalance, memory access latency, and lack of task overlapping. We investigate how different software optimizations and porting to non-general-purpose hardware architectures improve code scalability and execution times, taking the FALL3D volcanic ash transport model as an example. To this end, we implement the FALL3D model equations in the WARIS framework, software designed from scratch to solve different geoscience problems efficiently and in parallel on a wide variety of architectures. In addition, we consider further improvements in WARIS such as hybrid MPI-OMP parallelization, spatial blocking, auto-tuning and thread affinity. Taken together, these optimizations reduce the FALL3D execution times for a realistic test case running on general-purpose cluster architectures (Intel Sandy Bridge) by a factor of between 7 and 40, depending on the grid resolution. Finally, we port the application to Intel Xeon Phi (MIC) and NVIDIA GPU (CUDA) accelerator-based architectures and compare performance, cost and power consumption across all the architectures. Implications for time-constrained operational model configurations are discussed. |
dc.description.abstract |
We thank M.S. Osores from the Argentinean National Scientific and Technical Research Council (CONICET) for providing hourly column heights for the Cordón Caulle eruption simulation, and two anonymous reviewers for their constructive comments. This work was supported by NVIDIA through the UPC/BSC GPU Center of Excellence, and by the Spanish Ministry of Science and Technology through the TIN2012-34557 project. Finally, we dedicate this work to our colleague and co-author Nacho Navarro, who sadly passed away during the reviewing process. |
dc.description.abstract |
Peer Reviewed |