Managing Application Resilience: A Programming Language Approach
Pedro Diniz,
USC Information Sciences Institute
Abstract:
System resilience is an important challenge that needs to be addressed in the era of extreme scale computing. High-performance computing systems will be architected using millions of processor cores and memory modules. As process technology scales, the reliability of such systems will be challenged by the inherent unreliability of individual components due to extremely small transistor geometries, variability in silicon manufacturing processes, device aging, etc. Therefore, errors and failures in extreme scale systems will increasingly be the norm rather than the exception. Not all the errors detected warrant catastrophic system failure, but there are presently no mechanisms for the programmer to communicate their knowledge of algorithmic fault tolerance to the system.
In this talk we present a programming model approach for system resilience that allows programmers to explicitly express their fault tolerance knowledge. We propose novel resilience-oriented programming model extensions and programming directives, and illustrate their effectiveness. An inference engine leverages this information and combines it with runtime-gathered context to increase the dependability of HPC systems. The preliminary experimental results presented here, for a limited set of kernel codes from both scientific and graph-based computing domains, reveal that with a very modest programming effort, the described approach incurs fairly low execution time overhead while allowing computations to survive a large number of faults that would otherwise always result in the termination of the computation.
As transient faults become the norm rather than the exception, it will become increasingly important to provide users with high-level programming mechanisms with which they can convey important application acceptability criteria. For best performance (in terms of time, power, or energy), the underlying systems need to leverage this information to better navigate the very complex system-level trade-offs and still deliver a reliable and productive computing environment. The work presented here is a simple first step towards this vision.
Bio
Dr. Pedro Diniz received his M.S. in Electrical and Computer Engineering from the Technical University in Lisbon, Portugal, and his Ph.D. in Computer Science from the University of California, Santa Barbara, in 1997. From 1997 until 2007 he was a researcher with the University of Southern California’s Information Sciences Institute (USC/ISI) and became an Assistant Professor of Computer Science at the University of Southern California in Los Angeles, California. At USC/ISI he was the technical lead of DARPA-funded and DoE-funded research projects, in particular the DEFACTO project. The DEFACTO project combined the strengths of traditional compilation approaches with commercially available EDA synthesis tools and led to the development of a prototype compiler for automatically mapping image processing algorithms written in programming languages such as C to Field-Programmable-Gate-Array-based computing architectures. Recently, he has also been involved in several research projects focusing on programming technology and execution models addressing productivity-related issues as well as fault tolerance for large-scale high-performance architectures. Dr. Diniz graduated 3 PhD students while at USC and has authored or co-authored 12 internationally recognized scientific journal papers and over 50 international conference papers. He has participated in many scientific proposal review boards at the National Science Foundation in the US and the European Commission in Europe, and is heavily involved in the scientific community, having served on the technical program committees of over 15 international conferences in the areas of high-performance computing and reconfigurable and field-programmable computing.
He is also Co-Founder and Vice-President of Engineering of Quantum Semiconductor, LLC, a start-up company that focuses on Si-Ge-C image sensors. As a consultant, he has recently been involved in high-performance cryptocurrency ASIC chip designs that are already in production.
Date: 2017-Dec-11 Time: 15:00:00 Room: 336
Upcoming Events
Seminar: Combining Reasoning and Learning for Discovery

07 June, 1.30pm, at Sala José Tribolet in Pavilhão Informática II at IST.
Artificial Intelligence (AI) is a rapidly advancing field inspired by human intelligence. AI systems are now performing at human and even superhuman levels on various tasks, such as image identification, face and speech recognition, and chatbots such as ChatGPT. The tremendous AI progress that we have witnessed in the last decade has been largely driven by deep learning advances and heavily hinges on the availability of large, annotated datasets to supervise model training. However, often we only have access to small datasets and incomplete data. We amplify a few data examples with human intuitions and detailed reasoning from first principles and prior knowledge for discovery. I will talk about our work on AI for accelerating the discovery of new solar fuels materials, which has been featured in Nature Machine Intelligence in a cover article entitled “Automating crystal-structure phase mapping by combining deep learning with constraint reasoning” [1]. In this work, we propose an approach called Deep Reasoning Networks (DRNets), which seamlessly integrates deep learning and reasoning via an interpretable latent space for incorporating prior knowledge and tackling challenging problems. DRNets require only modest amounts of (unlabeled) data, in sharp contrast to standard deep learning approaches. DRNets reach super-human performance for crystal-structure phase mapping, a core, long-standing challenge in materials science, enabling the discovery of solar-fuels materials. DRNets provide a general framework for integrating deep learning and reasoning for tackling challenging problems. For an intuitive demonstration of our approach, using a simpler domain, we also solve variants of the Sudoku problem. The article “DRNets can solve Sudoku, speed scientific discovery” [2] provides a perspective for a general audience about DRNets. DRNets is part of SARA, the Scientific Reasoning Agent for materials discovery [3].
Finally, I will also talk about the effectiveness of a novel curriculum-learning-with-restarts strategy to boost a reinforcement learning framework [4]. We show how such a strategy is characterized by left heavy tails and can outperform specialized solvers for Sokoban, a prototypical AI planning problem.
Professor Carla P. Gomes: Department of Computer Science, Cornell University
Carla Gomes is the Ronald C. and Antonia V. Nielsen Professor of Computing and Information Science, the director of the Institute for Computational Sustainability at Cornell University, and co-director of the Cornell University AI for Science Institute. Gomes received a Ph.D. in computer science in artificial intelligence from the University of Edinburgh. Her research area is Artificial Intelligence with a focus on large-scale constraint reasoning, optimization, and machine learning. Recently, Gomes has become deeply immersed in research on scientific discovery for a sustainable future and, more generally, in research in the new field of Computational Sustainability. Computational Sustainability aims to develop computational methods to help solve some of the key environmental, economic, and societal challenges to help put us on a path toward a sustainable future. Gomes was the lead PI of two NSF Expeditions in Computing awards. Gomes has (co-)authored over 200 publications, which have appeared in venues spanning Nature, Science, and a variety of conferences and journals in AI and Computer Science, including five best paper awards. Gomes was named the “most influential Cornell professor” by a Merrill Presidential Scholar (2020). Gomes was also the recipient of the Association for the Advancement of Artificial Intelligence (AAAI) Feigenbaum Prize (2021) for “high-impact contributions to the field of artificial intelligence, through innovations in constraint reasoning, optimization, the integration of reasoning and learning, and through founding the field of Computational Sustainability, with impactful applications in ecology, species conservation, environmental sustainability, and materials discovery for energy” and of the 2022 ACM/AAAI Allen Newell Award, for contributions bridging computer science and other disciplines. 
Gomes is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Fellow of the Association for Computing Machinery (ACM), and a Fellow of the American Association for the Advancement of Science (AAAS).
INESC-ID ESR Talks – June 2023

If you are a masters/PhD student or a postdoctoral fellow, come and present your work in an informal and friendly environment – and savour some tasty snacks!
Individual talks will be 10-15 minutes plus time for feedback. Enroll on your selected date by emailing pedro.ferreira[at]inesc-id.pt.
Happening on the second Wednesday of every month (4pm-5pm):
- 14 June (Alves Redol, Room 9)
- 12 July (Alves Redol, Room 9)
We hope to see you there!
OLISSIPO Summer School in Lisbon | Computational phylogenetics to analyse the evolution of cells and communities

We are happy to announce the OLISSIPO Summer School on Computational phylogenetics to analyse the evolution of cells and communities, which will be held in Lisbon, Portugal, at INESC-ID, between July 2-7, 2023.
Keynote speakers:
David Posada, University of Vigo (class)
João Alves, University of Vigo (hands-on)
Nadia El-Mabrouk, Université de Montréal (class)
Mattéo Delabre, Université de Montréal (hands-on)
Ran Libeskind-Hadas, Claremont McKenna College (class and hands-on)
Russell Schwartz, Carnegie Mellon University (class and hands-on)
See the preliminary agenda at: https://olissipo.inesc-id.pt/tree-tango-school
Registration is mandatory. You can register at: https://forms.gle/VsASFHW5E7MJvaCc9
The registration fee is 250€ for students and OLISSIPO members and 350€ for postdocs or other researchers (meals indicated in the school schedule are included; accommodation and flights are not). All details will be made available upon registration.
We will have slots for flash talks (3-10 min depending on the number of submissions) to present yourself and the work you have been developing in your research.
The 13th Lisbon Machine Learning School | LxMLS 2023

The Lisbon Machine Learning Summer School (LxMLS) takes place yearly at Instituto Superior Técnico (IST). LxMLS 2023 will be a 6-day event (14-20 July, 2023), scheduled to take place as an in-person event.
The school covers a range of machine learning topics, from theory to practice, that are important in solving natural language processing problems arising in different application areas. It is organized jointly by Instituto Superior Técnico (IST), a leading Engineering and Science school in Portugal, the Instituto de Telecomunicações, the Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa (INESC-ID), the Lisbon ELLIS Unit for Learning and Intelligent Systems (LUMLIS), Unbabel, Zendesk, and IBM Research.
Check online for information about past editions: LxMLS 2011, LxMLS 2012, LxMLS 2013, LxMLS 2014, LxMLS 2015, LxMLS 2016, LxMLS 2017, LxMLS 2018, LxMLS 2019, LxMLS 2020, LxMLS 2021, LxMLS 2022 (you can also watch the videos of the lectures for 2016, 2017, 2018, and 2020).
31st International Conference on Information Systems Development (ISD 2023)

The 31st International Conference on Information Systems Development (ISD 2023) provides a forum for research and developments in the field of information systems. The theme of ISD 2023 is “Information systems development, organizational aspects and societal trends”. New trends in developing information systems emphasize the continuous collaboration between developers and operators in order to optimize the software delivery time. The conference promotes research on methodological and technological issues and how IS developers and operators are transforming organizations and society through information systems.
ISD 2023 also provides an opportunity for researchers and practitioners to promote their research and practical experience, and to discuss issues related to Information Systems through papers, posters, and journal-first paper presentations.
ISD 2023 will be hosted by Instituto Superior Técnico, in Lisbon, Portugal, on August 30–September 1, 2023.