This web site is a supplement to the paper "Statistical analysis of strain and regional variation in gene expression in mouse brain" by Paul Pavlidis and William Stafford Noble, published in Genome Biology. The PDF version of the paper is available from the publisher.
The primary purpose of this site is to provide the complete data from this work in an interactive format, along with raw results files for use by other researchers. Some additional data are also provided here.
A working version of the ANOVA and template-matching software is now available here.
The tools are Perl scripts that run under UNIX. Potential users should be aware that the ANOVA scripts do not handle many of the more complex situations that can arise. Currently, only t-tests, one-way ANOVA, and two-way ANOVA (with and without replication) are supported.
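To show what the two-way case computes, here is a minimal Python sketch of a two-way ANOVA without replication for a single gene, assuming one expression value per strain × region cell. It is an illustration of the standard calculation, not a port of the Perl scripts; the function name `two_way_anova_no_rep` and the example values are hypothetical.

```python
"""Two-way ANOVA without replication for one gene (illustrative sketch only).

Rows are strains, columns are brain regions, one expression value per cell.
"""


def two_way_anova_no_rep(table):
    """Return (F_rows, F_cols) for a list of equal-length rows of numbers."""
    a = len(table)       # number of row levels (e.g. strains)
    b = len(table[0])    # number of column levels (e.g. regions)
    grand = sum(sum(row) for row in table) / (a * b)
    row_means = [sum(row) / b for row in table]
    col_means = [sum(table[i][j] for i in range(a)) / a for j in range(b)]

    # Sums of squares for the two main effects.
    ss_rows = b * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * sum((m - grand) ** 2 for m in col_means)
    # With one observation per cell, the interaction residual serves as error.
    ss_err = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )

    df_rows, df_cols = a - 1, b - 1
    df_err = df_rows * df_cols
    # Raises ZeroDivisionError if the data fit the additive model exactly.
    ms_err = ss_err / df_err
    return (ss_rows / df_rows) / ms_err, (ss_cols / df_cols) / ms_err


if __name__ == "__main__":
    # Hypothetical values: 2 strains (rows) x 3 regions (columns).
    f_strain, f_region = two_way_anova_no_rep([[1, 2, 3], [3, 5, 4]])
    print(f_strain, f_region)  # → 12.0 3.0
```

A large F for a factor suggests an effect of that factor on the gene's expression; turning an F statistic into a p-value requires the F distribution with (df_effect, df_err) degrees of freedom, for example via `scipy.stats.f.sf`.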
In addition, the software we used to make the figures, matrix2png, is also available for download.
We performed a statistical analysis of a previously published set of gene expression microarray data from six different brain regions in two mouse strains. In the previously published analysis of these data, 24 genes showing expression differences between the strains were identified, while about 240 genes were found to show regional differences in expression. Like many gene expression studies, the previous analysis relied primarily on ad hoc "fold change" and "absent/present" criteria to select genes. To determine whether statistically motivated methods would permit a more sensitive and selective analysis of gene expression patterns in the brain, we used analysis of variance (ANOVA) and feature selection methods designed to select genes showing strain- or region-dependent patterns of expression.
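The template-matching idea can be illustrated with a short Python sketch. It assumes, as one common choice, that a gene's match to an idealized expression pattern is scored by Pearson correlation and that genes are then ranked by that score; the site's scripts may score matches or assess significance differently. The template (high in the first of six regions, low elsewhere), the gene names, and the values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Correlation is undefined for a flat profile; return 0.0 by convention here.
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_by_template(profiles, template):
    """Score each gene against the template; return (gene, score) best first."""
    scores = {gene: pearson(values, template) for gene, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical template: high in region 1, low in the other five regions.
    template = [1, 0, 0, 0, 0, 0]
    profiles = {
        "geneA": [9, 2, 1, 2, 1, 2],  # follows the template
        "geneB": [1, 2, 1, 2, 1, 2],  # no region-1 pattern
        "geneC": [1, 5, 5, 5, 5, 5],  # opposite of the template
    }
    # geneA ranks first and geneC last.
    for gene, score in rank_by_template(profiles, template):
        print(f"{gene}\t{score:.3f}")
```

Ranking by the signed score favors genes that follow the template; ranking by absolute value would also select genes with the opposite pattern, such as `geneC` here.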
Our analysis reveals a large number of new candidate genes for involvement in behavioral differences between the two mouse strains and functional differences among the six brain regions. Using conservative statistical criteria, we identified at least 63 genes showing strain variation and approximately 600 genes showing regional variation. Unlike the ad hoc methods, our methods have the additional benefit of ranking the genes by statistical score, permitting further analysis to focus on the most significant genes. A comparison of our results to the previous studies and to published reports on individual genes shows that we achieved high sensitivity while preserving selectivity.
Our results indicate that the molecular differences between the strains and regions studied are larger than originally indicated by the previous studies. We also conclude that for large, complex data sets, ANOVA and feature selection, alone or in combination, are more powerful tools than methods based on "fold change" thresholds and other ad hoc gene selection criteria.
Latest updates (4/2006)