A Linear Time Biclustering Algorithm for Time Series Genomic Expression Data
Sara C. Madeira,
Universidade da Beira Interior –
Recent developments in DNA chips now enable the simultaneous measure of the expression level of a large number of genes (sometimes all the genes of an organism) for a given experimental condition. Most commonly, gene expression data is arranged in a data matrix, where each gene corresponds to one row and each condition to one column. The conditions may correspond to different time points, different environmental conditions, different organs or different individuals. Simply visualizing this kind of data is challenging. Using it to extract biologically relevant knowledge is even harder.
Several non-supervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, a non-supervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated behaviors. In the most common settings, biclustering is an NP-complete problem, and heuristic approaches are used to obtain sub-optimal solutions using reasonable computational resources.
In this talk, we describe a particular setting of the problem, where we are concerned with finding biclusters in time series expression data, and present a linear time biclustering algorithm to achieve this goal.
When analyzing time series expression data, with the goal of isolating coherent activity between genes in a subset of conditions, it is reasonable to restrict the attention to biclusters with contiguous columns. We support this view by assuming that the activation of a set of genes under specific conditions corresponds to the activation of a particular biological process. As time goes on, biological processes start and finish, leading to increased (or decreased) activity of sets of genes that can be identified because they form biclusters with contiguous columns. In this setting, we are interested in finding biclusters where the columns are consecutive in time. For this particular version of the problem, we propose an algorithm that finds and reports all relevant biclusters in time linear on the size of the data matrix. This impressive reduction in complexity is obtained by manipulating a discretized version of the data matrix and by using advanced string manipulation techniques based on suffix trees.
The talk will give a short introduction to biclustering and suffix trees, present the biclustering algorithm and show results in synthetic data and preliminary results on a real biological data set from Yeast, that show the effectiveness of the approach.
Date: 2005-Feb-24 Time: 16:30:00 Room: 336
For more information:
Mathematics, Physics & Machine Learning Seminar Series (Online)
The Mathematics, Physics & Machine Learning seminar series has started on October 2020 and runs until March 2021.
The seminars aim to bring together mathematicians and physicists interested in machine learning (ML) with ML and AI experts interested in mathematics and physics, with the goal of introducing innovative Mathematics and Physics-inspired techniques in Machine Learning and, reciprocally, applying Machine Learning to problems in Mathematics and Physics.
Attendance is free but registration is required.
More information is available here.
International European Conference on Parallel and Distributed Computing
The 27th International European Conference on Parallel and Distributed Computing (Euro-Par 2021) will take from August 30 to September 3 2021 in Lisbon.
Euro-Par is the prime European conference covering all aspects of parallel and distributed processing, ranging from theory to practice, from small to the largest parallel and distributed systems and infrastructures, from fundamental computational problems to full-fledged applications, from architecture, compiler, language and interface design and implementation, to tools, support infrastructures, and application performance aspects.
The 2021 edition of Euro-Par will be organized as a collaboration between INESC-ID and Instituto Superior Técnico (IST).
– Abstract Submission: February 5, 2021
– Paper Submission Deadline: February 12, 2021
– Author Notification: April 30, 2021
– Camera-Ready Papers: June 6, 2021
More information is available here.