
Session E23 - Bioinformatics.
FOCUS session, Monday afternoon, March 12
Room 606, Washington State Convention Center
DNA microarray genome-wide expression data promise to enhance fundamental understanding of biological processes on the molecular level. Analyzing these data requires handling large quantities of data while reducing their complexity to make them comprehensible.
We describe the use of singular value decomposition in transforming genome-wide expression data from genes \times arrays space to ``eigengenes'' \times ``eigenarrays'' space, where the eigengenes (eigenarrays) are unique orthonormal superpositions of the genes (arrays). In this space the data are diagonalized: each eigengene is expressed only in the corresponding eigenarray, and the corresponding ``eigenexpression'' level indicates their relative significance.
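The transformation described above can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: the matrix `E` is random stand-in data, its dimensions are arbitrary, and the variable names `eigenarrays`, `eigengenes`, and `eigenexpression` are assumptions chosen to mirror the terminology of the abstract. The `fraction` measure of relative significance (squared singular value over the sum of squares) is one common convention and is likewise an assumption here.

```python
import numpy as np

# Hypothetical expression matrix: 6 genes (rows) x 4 arrays (columns).
rng = np.random.default_rng(0)
E = rng.normal(size=(6, 4))

# Thin SVD: E = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(E, full_matrices=False)

eigenarrays = U        # columns: orthonormal patterns across genes
eigengenes = Vt        # rows: orthonormal patterns across arrays
eigenexpression = s    # singular values, sorted in decreasing order

# In the rotated space the data are diagonal: each eigengene is
# expressed only in the corresponding eigenarray.
D = eigenarrays.T @ E @ eigengenes.T

# Assumed convention for relative significance of each component.
fraction = eigenexpression**2 / np.sum(eigenexpression**2)
```

Each row of `eigengenes` is a linear combination of the rows (genes) of `E`, and each column of `eigenarrays` is a linear combination of the columns (arrays) of `E`, which matches the "superpositions" wording of the abstract.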
We show that normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. In some experiments, the significant eigengenes and eigenarrays can be associated with genome-wide effects of regulators, or with measured samples in which these regulators are overactive or underactive, respectively.
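The filtering and sorting steps can be sketched in the same way. The example below is an illustration under stated assumptions: the helper `filter_components` is hypothetical, the data are random, and dropping component 0 is an arbitrary choice, since the abstract does not say which eigengenes are inferred to be noise or artifacts. Sorting along a single retained component is a simplification used only for illustration.

```python
import numpy as np

def filter_components(E, drop):
    """Return E with the listed singular components set to zero.

    Hypothetical helper: `drop` holds the indices of the eigengene /
    eigenarray pairs assumed to represent noise or artifacts.
    """
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    s_kept = s.copy()
    s_kept[list(drop)] = 0.0
    # Scale each column of U by its retained singular value, then rotate back.
    return (U * s_kept) @ Vt

# Hypothetical expression matrix: 8 genes (rows) x 5 arrays (columns).
rng = np.random.default_rng(1)
E = rng.normal(size=(8, 5))

# Assumption for illustration only: component 0 is treated as an artifact.
E_filtered = filter_components(E, drop=[0])

# The removed eigengene no longer contributes to the filtered data.
U, s, Vt = np.linalg.svd(E, full_matrices=False)
residual = E_filtered @ Vt[0]

# Sort genes and arrays by their coefficients in a retained component.
gene_order = np.argsort(U[:, 1])
array_order = np.argsort(Vt[1])
E_sorted = E_filtered[np.ix_(gene_order, array_order)]
```

Here `residual` is numerically zero, confirming that the dropped pattern has been removed, while `E_sorted` holds the same filtered values with rows and columns reordered along the chosen component.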