Managing Application Resilience: A Programming Language Approach
Pedro Diniz,
USC Information Sciences Institute –
Abstract:
System resilience is an important challenge that needs to be addressed in the era of extreme scale computing. High-performance computing systems will be architected using millions of processor cores and memory modules. As process technology scales, the reliability of such systems will be challenged by the inherent unreliability of individual components due to extremely small transistor geometries, variability in silicon manufacturing processes, device aging, etc. Therefore, errors and failures in extreme scale systems will increasingly be the norm rather than the exception. Not all the errors detected warrant catastrophic system failure, but there are presently no mechanisms for the programmer to communicate their knowledge of algorithmic fault tolerance to the system.
In this talk we present a programming model approach for system resilience that allows programmers to explicitly express their fault tolerance knowledge. We propose novel resilience oriented programming model extensions and programming directives, and illustrate their effectiveness. An inference engine leverages this information and combines it with runtime gathered context to increase the dependability of HPC systems. The preliminary experimental results presented here, for a limited set of kernel codes from both scientific and graph-based computing domains reveal that with a very modest programming effort, the described approach incurs fairly low execution time overhead while allowing computations to survive a large number of faults that would otherwise always result in the termination of the computation.
As transient faults become the norm, rather than the exception, it will be come increasingly important to provide the user with high-level programming mechanisms with which he/she can convey important application acceptability criteria. For best performance (either in terms of time, power, energy) the underlying systems need to leverage this information to better navigate the very complex system-level trade-offs to still deliver a reliable and productive computing environment. The work presented here is a simple first step towards this vision.
Bio
Dr. Pedro Diniz received his M.S. in Electrical and Computer Engineering from the Technical University in Lisbon, Portugal and his Ph.D. from the University of California, Santa Barbara in Computer Science in 1997. From 1997 until 2007 he was a researcher with the University of Southern California’s Information Sciences Institute (USC/ISI) as a Researcher and became an Assistant Professor of Computer Science at the University of Southern California in Los Angeles, California. At USC/ISI was the technical lead of DARPA-funded and DoE-funded research projects, in particular in the DEFACTO project. The DEFACTO project combined the strengths of traditional compilation approaches with commercially available EDA synthesis tools and lead to the development of a prototype compiler for the automatically mapping of image processing algorithms written in programming languages such as C to Field-Programmable-Gate-Array-based computing architectures. Recently, he has also been involved in several research projects focusing on programming technology and execution models addressing productivity-related issues as well as fault-tolerance for large scale high-performance architectures. Dr. Diniz has graduated 3 PhD students while at USC and authored or co-authored 12 internationally recognized scientific journal papers and over 50 international conference papers. He has participated in many scientific proposal review boards at the National Science Foundation in the US and European Commission in Europe and is heavily involved in the scientific community having participated as part of the technical program committee of over 15 international conferences in the area of high-performance computing, reconfigurable and field-programmable computing. He is also Co-Founder and Vice-President of Engineering of Quantum Semiconductor, LLC, a start-up company that focus on Si-Ge-C image sensors and as a consultant has been recently involved in high-performance cryptocurrency ASIC chip designs already in production.
Date: 2017-Dec-11 Time: 15:00:00 Room: 336
For more information:
Upcoming Events
INESC Brussels HUB Winter Meeting 2023

This edition of the HUB Winter Meeting will be co-organised with Science Business and will take place on the 30 and 31 January, in Lisbon, at Instituto Superior Técnico, Department of Computer Science and Engineering.
Please see below a summary of the agenda, this will be updated on the INESC Brussels HUB website regularly (confirmed speakers and other relevant info). Places for onsite participation are limited so registration is mandatory. Online participants will be sent a ZOOM link for each specific session on the 27th January.
INESC Brussels HUB website: https://hub.inesc.pt/
Monday, 30 January
a) Digital Europe Programme & Chips Act: state of play and possibilities for INESC.
9h to 10h30 GMT
(Exclusive for INESC researchers and administrators).
b) Science Business: how can INESC tap into Science Business network, activities and communications tools.
(Exclusive for INESC researchers and administrators).
c) Networking Lunch (for all onsite participants).
d) Roundtable: From rhetoric to reality – Embedding international strategy in the DNA of research organisations.
(Closed-door, roundtable workshop, Chatham House rules, open to INESC researchers and administrators, external participants by invitation only).
e) Networking Dinner
(By invitation only – INESC researchers participating onsite in the event are elegible to join).
Tuesday, 31 January
f) Workshop: How they did it? Strategic positioning for structural success in Horizon Europe: a discussion of best practices.
(Exclusive for INESC researchers, administrators and international invited speakers).
g) The public consultation on European R&I Programmes: Towards FP10.
(Closed-door, roundtable workshop, Chatham House rules, open to INESC researchers and administrators, external participants by invitation only).
h) Networking Lunch (for all onsite participants).
i) Management Committee meeting (Directors and POB members)
The HUB Winter Meeting aims at bringing together researchers and administrators from the 5 INESC institutes, affiliated higher education institutions in Portugal and abroad, with key European and global players, to:
– Discuss key research and innovation issues at EU level.
– Inform institutional policy and strategy.
– Exchange best-practices about R&I management, career development and policy positioning.
– Promote, discuss and deliver vision, visibility, networking and impactful communication.
– Create, identify and deepen partnerships and collaboration opportunities for collaborative R&I.