The High-Performance Computing Architectures and Systems (HPCAS) Research Group at INESC-ID works on state-of-the-art topics in High-Performance Computing (HPC), performance modeling, and bioinformatics. This includes epistasis detection, which involves identifying combinations of specific gene mutations that may increase the likelihood of developing a disease and have adverse effects on health (e.g., Alzheimer’s disease and breast cancer). Epistasis analysis is proving crucial to a better understanding of complex illnesses.

In this INESC-ID interview, HPCAS researchers Aleksandar Ilic and Ricardo Nobre explain the core of their research work and the progress made in epistasis detection using Intel software. This research was supported by numerous researchers and students from INESC-ID and Técnico, including Leonel Sousa, Frederico Pratas, Diogo Marques, Rafael Campos, Sergio Jimenez-Santander, and Miguel Graça, among others.

Briefly, what is the core of the INESC-ID High-Performance Computing Architectures and Systems (HPCAS) research work?

While it tackles many different aspects of high-performance computing, the core of HPCAS research addresses the performance and efficiency of computational systems and applications at both the software and hardware levels. At the software level, the key aspects we focus on include parallel algorithms, scheduling, and load-balancing methods for systems with state-of-the-art devices such as multi-core CPUs, GPUs, deep-learning accelerators, and FPGAs. At the hardware level, the focus is often on the design of accelerators and application-specific processors and systems.

Epistasis detection has been at the center of your recent research. Can you briefly explain what epistasis is and the main goals and achievements of your research in this field?

Epistasis detection is a bioinformatics application concerned with identifying which combinations of single nucleotide polymorphisms (SNPs) are most strongly correlated with a given condition. A SNP is a variation of a single nucleotide at a given position in the DNA. Several associations between combinations of SNPs and complex human diseases (e.g., Alzheimer’s and breast cancer) have been found, but many more are likely to be uncovered.
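To make the search concrete, here is a minimal Python sketch of an exhaustive second-order search. It assumes each SNP genotype is encoded as 0, 1, or 2 and each sample is labeled 1 (case) or 0 (control). For every SNP pair it counts cases and controls per genotype combination and scores the resulting table with a K2-style Bayesian score (lower means a stronger association), a scoring function commonly used in the epistasis literature. The function names and the choice of score are illustrative assumptions, not the group's implementation.

```python
from itertools import combinations
from math import lgamma


def k2_score(genotypes, labels, snps):
    """K2-style Bayesian score for one SNP combination (lower = stronger association)."""
    # counts[genotype combination] = [controls, cases]
    counts = {}
    for sample, label in zip(genotypes, labels):
        key = tuple(sample[s] for s in snps)
        cell = counts.setdefault(key, [0, 0])
        cell[label] += 1
    score = 0.0
    for controls, cases in counts.values():
        n = controls + cases
        # log((n + 1)!) - log(controls!) - log(cases!), using lgamma(x + 1) == log(x!).
        # Unobserved combinations would contribute log(1!) = 0, so skipping them is safe.
        score += lgamma(n + 2) - lgamma(controls + 1) - lgamma(cases + 1)
    return score


def exhaustive_search(genotypes, labels, order=2):
    """Score every SNP combination of the given order and return the best one."""
    num_snps = len(genotypes[0])
    return min(
        combinations(range(num_snps), order),
        key=lambda snps: k2_score(genotypes, labels, snps),
    )
```

Each sample is a tuple of genotypes, one per SNP; for example, `exhaustive_search(genotypes, labels)` returns the index pair with the lowest score. The cost of this loop grows with the number of combinations, which is exactly what the hardware-specific optimizations discussed in this interview target.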

The main goals and achievements of our research in this field have been the development of fast and efficient methods for performing epistasis detection searches.

You proposed “using heterogeneous computer architectures composed of multicore CPUs and GPUs to achieve performant and energy-efficient epistasis analysis.” Briefly, what did you suggest?

Heterogeneous computer architectures are at the core of many of today’s computing systems, from embedded devices to supercomputers. Our work involves developing efficient parallel algorithms, identifying which hardware is best suited to each stage of the epistasis computation, and orchestrating these stages for efficient execution. Our main contributions in this field include some of the first studies to propose methods for efficiently exploiting unconventional hardware for high-performance epistasis detection, for example NVIDIA Tensor Cores and the AVX-512 POPCNT instructions in Intel CPUs. These works currently represent the fastest approaches to epistasis detection in the literature.

What are the main challenges that the team had to face?

Using hardware with novel instructions (e.g., vectorized POPCNT) and new programming models and tools (e.g., Data Parallel C++ / SYCL) requires us to be at the forefront of technology in both hardware and software methodologies. This means investing a significant portion of our time in research, especially when the hardware or software has not been explored before, even for applications from other domains. Exploiting novel hardware, software, and tools was often not straightforward and required reformulating the epistasis detection problem so that its core operations map onto efficiently implemented instructions and programming primitives.
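One way to picture such a reformulation for matrix hardware like Tensor Cores: if each genotype value is encoded as a binary samples x SNPs indicator matrix, the counts for every SNP pair at once become a single matrix product A^T B. The pure-Python sketch below illustrates this under that assumption, with one product per genotype pair (and, in practice, per class, by running it separately on cases and controls). A real implementation would call an optimized, possibly low-precision matrix-multiply routine instead of the naive loop; the function names are hypothetical.

```python
def indicator_matrix(genotypes, g):
    """Binary samples x SNPs matrix: 1 where a sample has genotype g at that SNP."""
    return [[1 if value == g else 0 for value in sample] for sample in genotypes]


def transpose_product(A, B):
    """Naive A^T @ B: entry (i, j) sums A[k][i] * B[k][j] over all samples k."""
    return [
        [sum(a_row[i] * b_row[j] for a_row, b_row in zip(A, B)) for j in range(len(B[0]))]
        for i in range(len(A[0]))
    ]


def all_pair_counts(genotypes, a, b):
    """Entry (i, j): number of samples with genotype a at SNP i and genotype b at SNP j."""
    return transpose_product(indicator_matrix(genotypes, a), indicator_matrix(genotypes, b))
```

The appeal of this form is that the inner operation is a dense product of 0/1 matrices, a workload that matrix engines execute far faster than per-sample counting loops.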

What are the potential implications and usefulness of these results?

Faster epistasis detection searches mean that a larger portion of today’s case-control datasets can be processed while accounting for high-order SNP interactions. As a result, this might enable uncovering previously unknown relations between SNPs and complex conditions or diseases, which can help in better understanding their causal mechanisms and have an impact on their treatment.
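A quick calculation shows why high-order interactions are so costly. An exhaustive search of order k over N SNPs must score C(N, k) combinations, each with a table of 2 x 3^k cells (three genotypes per SNP, for cases and controls). For a hypothetical dataset of 10,000 SNPs, that is about 50 million pairs but roughly 167 billion triplets:

```python
from math import comb


def combinations_to_score(num_snps, order):
    """SNP combinations an exhaustive search of the given order must evaluate."""
    return comb(num_snps, order)


def table_cells(order, num_classes=2):
    """Cells per contingency table: 3 genotypes per SNP, one table per class."""
    return num_classes * 3 ** order


NUM_SNPS = 10_000  # hypothetical dataset size, for illustration only
for k in (2, 3, 4):
    print(f"order {k}: {combinations_to_score(NUM_SNPS, k):,} combinations, "
          f"{table_cells(k)} cells each")
```

Each step up in order multiplies the work by roughly N/k, so speedups in the core scoring kernel translate directly into how far up in interaction order a given dataset can be explored.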

You have been involved in several national and international (EU-funded) research projects. Can you tell us more about the international projects you are involved in? What is their common thread related to optimizing this type of bioinformatics application?

Our work on epistasis detection started as part of the national FCT-funded HiPerBio project and continued in the context of other EU-funded research projects, such as SPARCITY and SYCLOPS, as well as multiple advanced computing projects. The common thread in optimizing this type of bioinformatics application has been the efficient evaluation of the huge combination space. This computationally intensive task relates to many projects running in the HPCAS group, which focus on fully exploiting the capabilities of state-of-the-art hardware.

How does Intel come in, and why? What are the main advantages of this collaboration?

Intel is one of the key players in the field and our long-term partner for our developments using both Intel hardware and software. Our recent studies show that some of the novel features introduced in recent Intel architectures are very useful for accelerating epistasis detection, e.g., AVX-512 POPCNT in Intel CPUs (Sapphire Rapids) and the data-parallel engines in Intel Data Center GPU Max Series devices (Ponte Vecchio, PVC).

Furthermore, one of our research contributions, the Cache-aware Roofline Model (CARM), has been integrated as a fully supported feature in the Intel Advisor tool since 2017. CARM allows for efficient characterization of the performance upper bounds of different Intel CPU and GPU devices, while also providing intuitive guidance for application optimization. We also relied on this feature when optimizing epistasis detection codes, achieving speedups of up to 9x compared to a baseline implementation.
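The bound that a roofline plot draws can be sketched in a few lines. In the Cache-aware Roofline Model, arithmetic intensity is measured as operations per byte transferred between the core and the whole memory hierarchy, and each memory level (L1, L2, L3, DRAM) contributes its own bandwidth roof. Attainable performance at a given intensity is the minimum of the compute peak and intensity x bandwidth. The peak and bandwidth values below are made-up placeholders, not measurements of any Intel device, and the function names are hypothetical.

```python
def carm_attainable(intensity, peak_gops, bandwidths_gbs):
    """Attainable GOP/s per memory level: min(compute peak, intensity * level bandwidth)."""
    return {level: min(peak_gops, intensity * bw) for level, bw in bandwidths_gbs.items()}


def ridge_point(peak_gops, bandwidth_gbs):
    """Intensity (ops/byte) at which a bandwidth roof meets the compute roof."""
    return peak_gops / bandwidth_gbs


# Hypothetical device: placeholder numbers only.
PEAK_GOPS = 1000.0
BANDWIDTHS_GBS = {"L1": 4000.0, "L2": 1500.0, "L3": 600.0, "DRAM": 100.0}

print(carm_attainable(0.5, PEAK_GOPS, BANDWIDTHS_GBS))
print(ridge_point(PEAK_GOPS, BANDWIDTHS_GBS["DRAM"]))
```

A kernel whose measured point sits close to, say, the DRAM roof is limited by that level's bandwidth, so improving data reuse in caches (moving it under a higher roof) is the natural next optimization step.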

Ideally, where do you imagine your research going in the future?

So far, we have mostly focused on CPUs and GPUs and, to some extent, FPGAs. However, we envision extending our research to systems with even higher heterogeneity, including novel domain-specific accelerators such as Tensor Processing Units (TPUs) and Intelligence Processing Units (IPUs). We are also exploring portable cross-device software solutions based on open-standard programming languages such as DPC++ / SYCL, as well as the use of machine learning frameworks to achieve high performance and portability.

Bios

Aleksandar Ilic (PhD’14) is an Associate Professor at the Department of Electrical and Computer Engineering (DEEC), Instituto Superior Técnico (Técnico), Universidade de Lisboa, and a Researcher at INESC-ID, Lisbon, Portugal. He has contributed to more than 60 international journal and conference publications and has participated in many tutorials at different international venues. The integration of his scientific contribution (the Cache-aware Roofline Model) into Intel Advisor received the HiPEAC Tech Transfer award. His research interests include high-performance and energy-efficient computing and modeling of parallel heterogeneous systems.

Ricardo Nobre is a researcher at INESC-ID and part of the High-Performance Computing Architectures and Systems (HPCAS) research area. His interests include high-performance computing, parallel programming, compilers, and machine learning. He has contributed close to 30 papers in international journals and conferences. Ricardo Nobre received a PhD in Informatics Engineering from the Faculty of Engineering of the University of Porto (FEUP).

More Info:

– INESC-ID Achieves 9x Acceleration for Epistasis Disease Detection using oneAPI Tools and Intel Hardware (Aleksandar Ilic and Ricardo Nobre)
– Improving the Efficacy of Patient-Centered Drug Development (Aleksandar Ilic)
– Podcast: Accelerating Epistasis Detection – How oneAPI Supports Genetics Researchers (Aleksandar Ilic and Ricardo Nobre)