Evolution of the maintainability of HPC facilities at CIEMAT headquarters
Published 2020-03-05
Keywords
- resilience
- management practices
- history of computing
Copyright (c) 2020 Revista UIS Ingenierías
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Abstract
Since its establishment in 1951, CIEMAT has continuously promoted the use of computation as a research method and has deployed innovative computing facilities. As a result, vector, MPP, NUMA, and distributed architectures have been managed at CIEMAT. This has built extensive expertise in HPC maintainability and in the computational needs of the communities involved in international projects. Nowadays, HPC hardware and software evolve ever faster. This poses a continuous challenge: increasing the availability of these systems for as many of the supported initiatives as possible. To address this task, the ICT team has been moving toward a flexible management model, with an eye to future acquisitions.