Paulo Jorge Fernandes Carreira,

Faculdade de Ciências da Universidade de Lisboa

Abstract:

Application scenarios such as legacy data migration, Extract-Transform-Load (ETL) processes, and data cleaning require the transformation of input tuples into output tuples. Traditional approaches implement these data transformations either as Persistent Stored Modules (PSM) executed by an RDBMS or as transformation code written with a commercial ETL tool. Neither solution is easy to maintain or optimize. A third approach combines SQL queries with external code written in a programming language. However, this approach is not expressive enough to specify an important class of data transformations: those that produce several output tuples for a single input tuple. In my PhD thesis, I propose the data mapper operator as an extension to the relational algebra to address this class of data transformations. The thesis also discusses a set of algebraic rewriting rules for optimizing expressions that combine standard relational operators with mappers. Experimental results confirm the benefits of some of the proposed semantic optimizations.
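To make the idea of a one-to-many transformation concrete, the following is a minimal Python sketch, not the formal definition of the mapper operator given in the thesis. The function names (`apply_mapper`, `split_loan`), the loan and payment schema, and the cap of 100 per payment are all hypothetical and chosen only for illustration. Each input tuple may yield several output tuples, which is the behavior described above.

```python
from typing import Callable, Iterable, Iterator, Tuple

Loan = Tuple[str, int]          # (account, amount) -- hypothetical input schema
Payment = Tuple[str, int, int]  # (account, amount, sequence number) -- hypothetical output schema


def apply_mapper(
    rows: Iterable[Loan],
    fn: Callable[[Loan], Iterable[Payment]],
) -> Iterator[Payment]:
    """Apply a one-to-many function to every input tuple and
    concatenate the output tuples it produces."""
    for row in rows:
        yield from fn(row)


def split_loan(row: Loan, max_payment: int = 100) -> Iterator[Payment]:
    """Split one loan into payments of at most max_payment each
    (the cap is an assumption made for this example)."""
    account, amount = row
    seq = 1
    while amount > 0:
        payment = min(amount, max_payment)
        yield (account, payment, seq)
        amount -= payment
        seq += 1


loans = [("A1", 250), ("B2", 80)]
payments = list(apply_mapper(loans, split_loan))
# payments == [("A1", 100, 1), ("A1", 100, 2), ("A1", 50, 3), ("B2", 80, 1)]
```

The number of output tuples depends on the value in each input tuple, so it cannot be fixed in advance when the transformation is written.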

 

Date: 2008-Apr-16     Time: 15:00:00     Room: N7.1

