The OLISSIPO 2023 Summer School on Computational phylogenetics to analyse the evolution of cells and communities, held at INESC-ID in Lisbon, Portugal, closes today after three days of brilliant discussions and cutting-edge computational and biological science.

Alongside flash talks of 3 to 10 minutes, in which attendees presented their ongoing research, the Summer School comprised four comprehensive, interdisciplinary workshops: Designing phylogenetic models for clonal lineage reconstruction [1], by Russell Schwartz, on 03 July; Inferring the Evolution of Genes and Syntenies from Tree Reconciliation. An application to CRISPR-Cas systems [2], by Nadia El-Mabrouk and Mattéo Delabre, on 04 July; Phylogenetic analysis of single-cell DNA sequencing data [3], by David Posada and João Alves, on 05 and 06 July; and Adventures in Cophylogenetic Tree Reconciliation [4], by Ran Libeskind-Hadas, on 06 July.

Computational phylogenetics is a powerful tool used to investigate and understand the evolutionary relationships between cells and communities. By analyzing genetic sequences and other molecular data, computational phylogenetics allows researchers to reconstruct the evolutionary history of organisms, infer their common ancestors, and explore the patterns and processes that have shaped the diversity of life. This approach not only helps us unravel the evolutionary relationships between different species or populations, but also provides insights into the dynamics of community assembly, the spread of infectious diseases, and the impact of environmental factors on the evolution of cells and communities. Ultimately, computational phylogenetics serves as a fundamental framework for studying the evolutionary biology of organisms and the intricate interplay between genes, organisms, and their environments.

The speakers whom the Summer School brought to Lisbon, and to the INESC-ID and Instituto Superior Técnico (IST) communities, presented some of the most innovative applications of computational biology to phylogenetics. They came from a diverse set of research centres: from the University of Vigo and the Université de Montréal to Claremont McKenna College and Carnegie Mellon University.

OLISSIPO is funded by Horizon 2020 and coordinated at INESC-ID by Susana Vinga, Information and Decision Support Systems (IDSS) researcher and Associate Professor at IST. Bringing together four reference European institutions (the European Molecular Biology Laboratory at Heidelberg, ETH Zürich, Inria and INESC-ID), OLISSIPO aims to strengthen computational biology expertise at INESC-ID, with the ultimate goal of creating an international pole of excellence in multi-disciplinary science in Portugal.

The OLISSIPO 2023 Summer School was organized by Susana Vinga, Marie-France Sagot (Inria), Niko Beerenwinkel (ETH Zürich), Wolfgang Huber (EMBL), Blerina Sinaimeri (Inria and LUISS University), Alexandre Francisco (INESC-ID/IST), João Carriço (BioMérieux) and Sara Tanqueiro (INESC-ID).

1 These sessions explored how we can design our own phylogenetic models and adapt them to different scenarios, data types, and mutation models, with application to problems in clonal phylogenetics and deconvolution in cancers. The lecture material provided background information on clonal deconvolution and phylogenetics and used these topics to illustrate the principles of formulating an inference task as a computational optimization problem. Ways of solving such formulations were then explored through integer linear programming. Hands-on sessions used Jupyter Notebooks and R to implement some simple phylogenetic models and to explore how they can be generalized and adapted to new versions of the inference problem.

2 In this course, algorithmic results for inferring the evolution of a gene family, or of a set of co-localized gene families in a genome, were introduced. After a brief review of species tree/gene tree reconstruction methods and challenges, the session introduced the reconciliation approach which, given a rooted (resolved or partially resolved) gene tree and a rooted binary species tree, predicts an embedding of the gene tree into the species tree inducing a most parsimonious (or most likely) duplication/loss/gain/transfer scenario for the gene family. In a second part, the problem of inferring the evolution of “syntenies”, i.e., groups of co-localized genes evolving together from an ancestral genomic segment, was addressed, either considering or ignoring the order of genes. It was then shown how the reconciliation method can be generalized to a synteny tree. The application part of the presentation addressed the case of the CRISPR–Cas module, an adaptive system used by microbes to defend against invading viruses and plasmids, which has led to the most reliable and accurate “molecular scissors”, with important biotechnology and biomedical applications. The study and analysis of CRISPR-Cas systems have revealed a remarkably wide diversity of Cas protein sequences, as well as of composition and architecture. The evolution of Class I CRISPR-Cas systems, composed of multi-subunit effector proteins, was considered. This example was taken as a case study, and the hands-on sessions addressed the practical questions of isolating Cas genes, dealing with duplicates, constructing gene trees, inferring synteny trees and running the reconciliation tool.
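For readers unfamiliar with reconciliation, the idea can be illustrated with a minimal sketch of the classic LCA (lowest common ancestor) mapping, which counts only duplications and losses. This is not the tool used in the workshop, and it does not cover transfers, gains, partially resolved gene trees or syntenies. The nested-tuple tree encoding, the function names and the `gene_to_species` dictionary are all assumptions made for this illustration.

```python
def build_species_index(species_tree):
    """Record the parent and depth of every node in a nested-tuple species tree."""
    parent, depth = {}, {}

    def visit(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):  # internal node: recurse into the children
            for child in node:
                visit(child, node, d + 1)

    visit(species_tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree, species_tree, gene_to_species):
    """Return (duplications, losses) under the LCA mapping.

    Assumes a rooted binary gene tree whose leaves are gene names, and a
    rooted binary species tree whose leaves are unique species names.
    """
    parent, depth = build_species_index(species_tree)
    dups = 0
    losses = 0

    def visit(g):
        nonlocal dups, losses
        if not isinstance(g, tuple):  # gene leaf: map to its species
            return gene_to_species[g]
        left, right = (visit(child) for child in g)
        m = lca(left, right, parent, depth)
        # A gene node is a duplication if it maps to the same species
        # node as at least one of its children; otherwise a speciation.
        is_dup = m == left or m == right
        if is_dup:
            dups += 1
        for child in (left, right):
            gap = depth[child] - depth[m]
            # Species-tree nodes skipped along the way imply losses.
            losses += gap if is_dup else gap - 1
        return m

    visit(gene_tree)
    return dups, losses


species = (("A", "B"), "C")
genes = (("a1", "b1"), ("a2", "c1"))
mapping = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
print(reconcile(genes, species, mapping))  # (1, 2)
```

In this toy example the gene tree disagrees with the species tree, and the mapping explains the disagreement with one duplication at the root followed by two losses (species C in one copy, species B in the other).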

3 Concepts and techniques from organismal evolutionary biology can provide a detailed, quantitative picture of the complex dynamics of somatic cell populations over time and space. In the theoretical lecture, basic concepts of cancer evolution, the complexity of single-cell DNA sequencing data (scDNA-seq), different phylogenetic techniques for analyzing these types of data, together with empirical examples of their application, were reviewed. During the hands-on practical session, participants acquired basic skills to perform scDNA-seq phylogenetic analysis with CellPhy, including variant calling, setting up of genotype models, tree search, bootstrap support estimation, and mutation mapping. Participants then learned to prepare their data, explore its characteristics, and present their results using tree plotting packages in R.

4 The talks and hands-on sessions explored current methods for reconciling pairs of phylogenetic trees to provide insights into the co-evolutionary histories of pairs of taxa. Different approaches to tree reconciliation and their relative merits were discussed. In particular, the Jane and eMPRess software tools were described in some detail, including their underlying algorithms, strengths, and limitations. In the hands-on sessions, participants used these tools on a number of datasets, including their own if they wished.