VersionClimber: an algorithm and system for package evolution in data science
Prof. Dennis Shasha,
Courant Institute of New York University
Imagine you are a data scientist (as many of us are/have become).
Systems you build typically require many data sources and many packages
(machine learning/data mining, data management, and visualization) to run.
Your working configuration consists of a set of packages, each at a particular version. You want to update some packages (software or data) to the most recent versions possible, but you also want your system to still run after the upgrades,
which may entail changing the versions of other packages.
One approach is to hope the latest versions of all packages work. If that fails, the fallback is manual trial and error, but that quickly ends in frustration.
We advocate a provenance-style approach in which tools like ptrace
enable us to identify version combinations of different packages.
Then packaging and version control tools such as pip, GitHub, and virtualenv enable us to fetch particular versions of packages and try them in a sandbox-like environment.
Because the space of versions to explore grows exponentially with the number of packages, we have developed a memoizing algorithm that avoids exponential search while still finding an optimum version combination.
Heuristics combined with certain empirical facts about packages (e.g. local upward compatibility) improve performance further still.
We present experimental results on well-known packages used in data science to illustrate the effectiveness of our approach.
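To make the idea of memoized version search concrete, here is a minimal Python sketch. It is not the VersionClimber algorithm itself, whose details the abstract does not give. It is a simple greedy climb that upgrades one package at a time and caches every test result, so no configuration is ever tested twice. Unlike the algorithm described above, it is not guaranteed to find an optimum. The function `climb`, the callback `works`, and the packages `a`, `b`, and `c` are hypothetical names used only for illustration; in practice `works` would install the configuration in a sandbox and run the system's tests.

```python
def climb(versions, start, works):
    """Greedily upgrade packages one at a time, caching every test result.

    versions: dict mapping package name -> list of versions, oldest first
    start:    dict mapping package name -> version (a known working config)
    works:    hypothetical callable(config) -> bool; assumed to be expensive
    Returns the final configuration and the number of distinct configs tested.
    """
    cache = {}

    def test(config):
        # Memoize: each distinct configuration is tested at most once.
        key = tuple(sorted(config.items()))
        if key not in cache:
            cache[key] = works(config)
        return cache[key]

    current = dict(start)
    changed = True
    while changed:  # repeat passes until no package can be upgraded
        changed = False
        for pkg, candidates in versions.items():
            current_idx = candidates.index(current[pkg])
            # Try versions newer than the current one, newest first.
            for idx in range(len(candidates) - 1, current_idx, -1):
                trial = {**current, pkg: candidates[idx]}
                if test(trial):
                    current = trial
                    changed = True
                    break
    return current, len(cache)


# Toy compatibility rules (assumed): version 2 of c never works,
# and version 3 of a requires version 2 of b.
def works(config):
    if config["c"] == "2":
        return False
    if config["a"] == "3" and config["b"] == "1":
        return False
    return True


versions = {"a": ["1", "2", "3"], "b": ["1", "2"], "c": ["1", "2"]}
start = {"a": "1", "b": "1", "c": "1"}
best, tested = climb(versions, start, works)
print(best, tested)
```

In this toy run, package `a` cannot reach version 3 until `b` has been upgraded, which is why a second pass is needed. The cache keeps the final pass from re-testing a configuration that is already known to fail.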
Dennis Shasha is a professor of computer science at the Courant Institute of New York University and an Associate Director of NYU Wireless.
He works with biologists on pattern discovery for network inference; with computational chemists on algorithms for protein design; with physicists and financial people on algorithms for time series; on clocked computation for DNA computing; and on computational reproducibility.
Other areas of interest include database tuning as well as tree and graph matching.
Because he likes to type, he has written six books of puzzles about a mathematical detective named Dr. Ecco, a biography about great computer scientists, and a book about the future of computing.
He has also written five technical books about database tuning, biological pattern recognition, time series, DNA computing, resampling statistics, and causal inference in molecular networks.
He has co-authored over eighty journal papers, seventy conference papers, and twenty-five patents. He has written the puzzle column for various publications including Scientific American, Dr. Dobb’s Journal, and the Communications of the ACM.
He is a fellow of the ACM and an INRIA International Chair.
Helena Isabel de Jesus Galhardas
IST – anfiteatro VA1
Workshop “Metabolism and mathematical models: Two for a tango” – 2nd Edition
Dates: October 25-26, 2022
Location: The workshop will be held virtually
The topic of this workshop is metabolism in general, with a special focus, although not exclusive, on parasitology. Besides an exploration of the biological, biochemical and biomedical aspects, the workshop will also aim at presenting some of the mathematical modelling, algorithmic theory and software development that have become crucial to explore such aspects.
This workshop is being organised in the context of two projects, both with the Inria European Team Erable. The first, the Inria Associated Team Capoeira, is a partnership with the University of São Paulo (USP) in São Paulo, Brazil, more specifically its Institute of Mathematics and Statistics (IME) and Institute of Biomedical Sciences. The second, the H2020 Twinning Project Olissipo, involves INESC-ID/IST in Portugal, ETH in Zürich and EMBL in Heidelberg.
The workshop is open to all members of these two projects but also, importantly, to the community in general.
The program and more details are available here.