Speeding up Scientific Codes in HPC Architectures by Code Modernization: Lessons Learned
The computational resources required by scientific research in key areas such as medicine, physics, bioinformatics and climate modelling increase every year. To meet this demand, high-performance computing (HPC) systems keep growing in scale and complexity. However, the race for higher processor frequency is (temporarily) over. The most profitable way to use the available silicon is offered by new-generation parallel multi-core and many-core architectures, such as NVIDIA GPUs, Intel Xeon Scalable processors and Intel Xeon Phi. These processors open new opportunities to accelerate scientific computations and to increase performance per watt, but they require a shift to a different programming model that exploits task-, data- and thread-level parallelism. Both developers and users, however, are usually reluctant to modify working codes to adapt them to new systems. While the benefits of migrating codes to new systems are clear, it is important to evaluate how to do so in a simple and general way. "Code modernization" is a new paradigm that aims to provide both code and performance portability. In this talk, we identify the key issues that determine performance in modernized code: (a) the ability to scale with core count, (b) proper use of the vectorization capabilities of the system and (c) the exploitation of data locality.
We show three use cases from key applications in computational science: 1) 3-D stencil-based codes, which are the basis for solving partial differential equations (PDEs), widely used as mathematical models in many fields of science and engineering; 2) Ant Colony Optimization (ACO), a bio-inspired, population-based metaheuristic modelled on the foraging process of ants, for solving NP-hard optimization problems such as the Traveling Salesman Problem; and 3) the Semantic Web Integration Tool (SWIT), which transforms and integrates heterogeneous biomedical data to generate open semantic repositories, based on mapping rules defined between an input schema and an OWL ontology. The talk will shed light on modernizing these use cases for new Intel architectures.
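As a rough illustration of how issues (a) to (c) can appear in the first use case, the following is a minimal sketch of one sweep of a 7-point 3-D stencil in C with OpenMP directives. It is not the code discussed in the talk: the function names `idx` and `stencil7_sweep`, the cubic grid of side `n`, the averaging coefficients and the choice of OpenMP are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: linear index into an n*n*n grid in which x is the
 * fastest-varying axis, so that x-neighbours are contiguous in memory. */
static inline size_t idx(size_t n, size_t z, size_t y, size_t x) {
    return (z * n + y) * n + x;
}

/* Hypothetical kernel: one Jacobi-style sweep of a 7-point stencil.
 * Each interior point of `out` becomes the average of the point itself
 * and its six face neighbours in `in`. Boundary points are left untouched.
 * `in` and `out` must not overlap (hence `restrict`). */
void stencil7_sweep(size_t n, const double *restrict in, double *restrict out) {
    assert(n >= 3); /* at least one interior point is required */

    /* (a) Scaling with core count: the two outer loops are collapsed and
     *     split statically across threads. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t z = 1; z < n - 1; ++z) {
        for (size_t y = 1; y < n - 1; ++y) {
            /* (b) Vectorization: the innermost loop is unit-stride and has
             *     no loop-carried dependence, so it can be vectorized. */
            #pragma omp simd
            for (size_t x = 1; x < n - 1; ++x) {
                size_t c = idx(n, z, y, x);
                /* (c) Data locality: c-1 and c+1 are adjacent in memory;
                 *     the y and z neighbours are n and n*n elements away. */
                out[c] = (in[c]
                          + in[c - 1]     + in[c + 1]
                          + in[c - n]     + in[c + n]
                          + in[c - n * n] + in[c + n * n]) / 7.0;
            }
        }
    }
}
```

With GCC or Clang the directives take effect when compiling with `-fopenmp`; without that flag they are ignored and the sweep runs serially with the same result. Locality is addressed here only through the memory layout and loop order; cache blocking (tiling), which is commonly applied to stencils on large grids, is left out to keep the sketch short.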