
Session S10 - Modeling Large Scale Molecular Biological Data.
INVITED session, Wednesday afternoon, March 05
Room 5ABC, Austin Convention Center

[S10.002] Modeling Genome-Scale mRNA Expression Datasets: From Matrix Algebra to Genetic Networks

Orly Alter (Stanford University School of Medicine, Department of Genetics)

DNA microarray genome-wide expression data promise to enhance fundamental understanding of life on the molecular level, and may prove useful in medical diagnosis, treatment and drug design. Analysis of these new data requires mathematical tools that use large quantities of data and reduce the complexity of the data to make them comprehensible. These tools should provide predictive models, i.e., mathematical frameworks for the description of the data, in which the mathematical variables and operations may be assigned biological meaning. Such models will facilitate the unraveling of the cellular machineries that generate, sense and react to the expression signal.

I will start with a description of the use of singular value decomposition (SVD) to construct the first model for genome-wide expression data. SVD is a unique data-driven linear transformation of the expression data from the genes × arrays space to the reduced "eigengenes" × "eigenarrays" space, where the eigengenes (eigenarrays) are unique orthonormal superpositions of the genes (arrays). Normalizing the data, by detecting and filtering out additive and multiplicative experimental artifacts and irrelevant biological processes, enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to a chosen subset of eigengenes (and eigenarrays), rather than by overall expression, gives a global picture of gene expression in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. In some experiments, the significant eigengenes and eigenarrays can be associated with genome-wide effects of regulators, or with measured samples in which these regulators are overactive or underactive, respectively.
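As a rough illustration of the SVD steps described above, the following Python/NumPy sketch decomposes a genes × arrays matrix, reports each eigengene's fraction of the overall expression, filters out a chosen eigengene, and sorts genes by their angle in the plane of two eigengenes. The function names, the random data, and the choice of which eigengene to treat as an artifact are assumptions made for illustration; this is not the author's published pipeline.

```python
import numpy as np


def svd_model(expr):
    """SVD of a genes x arrays expression matrix.

    Returns (u, s, vt) with expr == u @ np.diag(s) @ vt.
    Columns of u are the "eigenarrays" (patterns across genes) and
    rows of vt are the "eigengenes" (patterns across arrays).
    """
    u, s, vt = np.linalg.svd(expr, full_matrices=False)
    return u, s, vt


def expression_fractions(s):
    """Fraction of the overall expression captured by each eigengene."""
    energy = s ** 2
    return energy / energy.sum()


def filter_out(u, s, vt, drop):
    """Rebuild the data with the eigengenes whose indices are in `drop` removed.

    Deciding which eigengenes are artifacts is a modeling judgment;
    here the caller simply supplies their indices (hypothetical parameter).
    """
    kept = s.copy()
    kept[np.asarray(drop, dtype=int)] = 0.0
    return u @ np.diag(kept) @ vt


def sort_genes_by_angle(expr, vt, k1=0, k2=1):
    """Order genes by their phase in the plane spanned by eigengenes k1 and k2."""
    x = expr @ vt[k1]  # projection of each gene's profile onto eigengene k1
    y = expr @ vt[k2]  # projection of each gene's profile onto eigengene k2
    return np.argsort(np.arctan2(y, x))


# Synthetic example: random numbers, for illustration only.
rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 6))           # 40 genes x 6 arrays
u, s, vt = svd_model(expr)
fractions = expression_fractions(s)
cleaned = filter_out(u, s, vt, drop=[0])  # e.g. treat eigengene 0 as an artifact
order = sort_genes_by_angle(cleaned, vt, 1, 2)
```

Filtering zeroes singular values instead of editing the data directly. This removes exactly the chosen eigengene's contribution and leaves the remaining eigengenes untouched.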

I will then describe the use of generalized singular value decomposition (GSVD) to construct the first comparative model for two genome-scale datasets. GSVD is a unique data-driven linear transformation of the two datasets from the two genes × arrays spaces to two reduced and diagonal "genelets" × "arraylets" spaces. Some of the genelets can be associated with independent regulatory programs that are common to both datasets. Other genelets can be associated with independent biological processes or experimental artifacts that are almost exclusive to one of the datasets or the other.
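NumPy has no built-in GSVD, so the sketch below assembles one from a QR decomposition of the two stacked datasets, followed by an SVD of the top block (a CS-style decomposition). It assumes that both matrices share their column dimension, that each has at least as many rows as columns, and that together they have full column rank. The function names and the angular-distance summary of relative significance are illustrative assumptions, not the exact formulation used in the talk.

```python
import numpy as np


def gsvd(a, b):
    """Generalized SVD of two matrices that share their column dimension.

    Returns (u1, u2, c, s, xt) such that
        a == u1 @ diag(c) @ xt,  b == u2 @ diag(s) @ xt,  c**2 + s**2 == 1.
    The rows of xt are shared by both factorizations.
    """
    m1, n = a.shape
    if b.shape[1] != n:
        raise ValueError("a and b must have the same number of columns")
    if m1 < n or b.shape[0] < n:
        raise ValueError("this sketch needs at least as many rows as columns")
    # Stack the datasets: [a; b] = q @ r, with q having orthonormal columns.
    q, r = np.linalg.qr(np.vstack([a, b]))
    q1, q2 = q[:m1], q[m1:]
    # SVD of the top block: q1 = u1 @ diag(c) @ wt, with 0 <= c <= 1.
    u1, c, wt = np.linalg.svd(q1, full_matrices=False)
    # Columns of q2 @ wt.T are mutually orthogonal with norms sqrt(1 - c**2).
    q2w = q2 @ wt.T
    s = np.linalg.norm(q2w, axis=0)
    safe = np.where(s > 1e-12, s, 1.0)  # avoid dividing by ~0 norms
    u2 = q2w / safe
    xt = wt @ r  # shared factor common to both datasets
    return u1, u2, c, s, xt


def angular_distance(c, s):
    """+pi/4: exclusive to a; 0: equally significant; -pi/4: exclusive to b."""
    return np.arctan2(c, s) - np.pi / 4


# Synthetic example: random numbers, for illustration only.
rng = np.random.default_rng(1)
a = rng.normal(size=(7, 4))
b = rng.normal(size=(6, 4))
u1, u2, c, s, xt = gsvd(a, b)
theta = angular_distance(c, s)
```

A pattern whose generalized singular value pair has `s` near zero contributes almost only to the first dataset. This is the kind of "almost exclusive" component the paragraph above describes.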

I will conclude with a discussion of the insights that these models may offer into the biology, chemistry and physics of gene expression.
