INRIA & CNRS, University of Montpellier
In recent years, big simulation data has commonly been generated from specific models in different application domains (astronomy, bioinformatics, social networks, etc.). In general, the simulation data corresponds to meshes that represent, for instance, a seismic soil area. It is very important to analyze the uncertainty of the simulation data in order to safely identify geological or seismic phenomena, e.g. seismic faults. To analyze the uncertainty, a Probability Density Function (PDF) is computed for each point in the mesh. However, this may be very time consuming (from several hours to even months) using a baseline approach based on parallel processing frameworks such as Spark. In this paper, we propose new solutions to efficiently compute and analyze the uncertainty of very big simulation data using Spark. Our solutions use an original distributed architecture design. We propose three general approaches: data aggregation, machine learning prediction and fast processing. We validate our approaches through extensive experiments using big data ranging from hundreds of GB to several TB. The experimental results show that our approach scales up very well and reduces the execution time by a factor of 33 (to the order of seconds or minutes) compared with a baseline approach.
This work is part of the HPC4E European project and is joint work with LNCC.
Esther Pacitti is a full professor of Computer Science at the University of Montpellier in the south of France. She is co-head of the Zenith team (Inria & CNRS), pursuing her research in distributed data management and scientific data management. She teaches in an engineering school (Polytech’ Montpellier), where she is responsible for international relations and for welcoming foreign students. Previously, she was an assistant professor at the University of Nantes (2002-2009). Her teaching and research interests include data replication, recommendation systems, query processing in large-scale distributed systems (cluster, P2P, cloud) and scientific workflow management. She has published more than 90 technical papers. She has served or is serving as a program committee member of major international conferences including SIGMOD, ICDE, CIKM, VLDB, EDBT, etc.
For more information: