Next-generation data-parallel dataflow systems
Prof. Frank McSherry,
ETH Zurich
The Naiad project at Microsoft Research introduced a new model of dataflow computation, timely dataflow, which was designed to support low-latency computation in data-parallel dataflow graphs containing structured cycles. This model substantially enlarged the space of data-parallel computations that can be reasonably expressed, compared to other modern “big data” systems. Naiad achieved excellent performance in its intended application domains, largely by providing the dataflow operators with meaningful, low-overhead coordination primitives and otherwise staying out of their way.
In this talk we will discuss performance issues with existing systems, review timely dataflow, and present a new data-parallel design that coordinates less frequently yet more accurately. The design is largely implemented, written in 100% safe Rust, and available at https://github.com/frankmcsherry/timely-dataflow. It currently outperforms several popular distributed systems even when run on the speaker’s laptop.
This talk reflects work done jointly with Derek Murray, Rebecca Isaacs, Michael Isard, Paul Barham, and Martin Abadi. The photo credit is due to Mihai Budiu.
Frank McSherry is currently visiting ETH Zurich, and was formerly affiliated with Microsoft Research, Silicon Valley. While there he led the Naiad project, which introduced both differential and timely dataflow, and which remains one of the top-performing big data platforms. He also works on differential privacy, due in part to its interesting relationship to data-parallel computation. Frank currently enjoys spending his time in places other than Silicon Valley.
Rodrigo Seromenho Miragaia Rodrigues
IST Alameda, anfiteatro EA3