Tmm: Analysis of multiple microarray data sets
Tmm is being phased out in favor of our new resource, Gemma. Go directly to the Tmm web interface.

Introduction
Tmm is an experimental system for exploring the coexpression of genes based on microarray data. This page contains links to supplementary data for our paper "Coexpression analysis of human genes across many microarray data sets". You can read the abstract or download a PDF. A description of the data sets we used is also available. Our thanks to those who have made their data available. Documentation for the web interface is available as well.

News
May 9, 2004: The database has been updated and now contains many more mouse data sets. More updates will follow soon, in particular to the metadata, which is incomplete for many of the sets. Because of this update, and because we are still experimenting with the methods used to select coexpression links, the online version of the database is no longer identical to the one used in our paper and might yield slightly different results for the same queries. In the coming months we will be adding more data sets and tools.

Useful links
Figure 4 in the paper makes use of matrix2png, a simple program we use to make matrix visualizations. Clustering was performed using Gavin Sherlock's "Xcluster". We also used our own implementation of Bader and Hogue's "MCODE" (those interested in our MCODE implementation should contact us). Figure 5 used Pajek, a network visualization and analysis package. Our gene annotations, which were used to determine which genes were on which microarrays, are all derived from our Ermine database, which has its own simple web interface. Much of the microarray data was obtained from the Stanford or NCBI databases. We also made heavy use of the Gene Ontology, LocusLink, UniGene and Swiss-Prot databases.

Supplemental data and figures for the paper
Full version of Table 1
Supplementary data on intra-dataset reproducibility of links (see also Supplementary figures A-D)
Supplementary figure A: Effect of gene over-representation on link counts.
Supplementary figure B: Distribution of intra-dataset reproducibility of links.
Supplementary figure C: Intra-dataset reproducibility correlates with inter-dataset confirmation.
Supplementary figure D: Semantic similarity analysis for genes with good and bad intra-dataset reproducibility.
Supplementary figure E: Negative and positive correlation links analyzed separately for semantic similarity.

Supplementary tables for the paper
The following files are available for download. All are plain ASCII tab-delimited text. Most of the files have been compressed (tarred and gzipped) and must be unpacked before viewing. Many of them can be opened in Excel, but some are quite large and are intended for use in automated analyses. Each file has a header that explains the columns. Note that in several files positive and negative correlations are counted separately, typically denoted by "+" and "-" respectively.

List of genes used in our study, with database cross-references. In the remaining files, genes are referred to by name.
Full list of all links in a simple format (very large). This file can easily be parsed to select links confirmed in any desired number of data sets, or to gather other statistics.
Per-data-set details of link counts, used to make Table 1; it also contains some additional columns of statistics.
Summary table of the number of links per gene at various confirmation levels (large, but can be viewed in Excel). This file was used to construct Figures 2B and 2C.
Another gene-by-gene summary of links, breaking down the number of links by array type. It also gives the number of Gene Ontology terms per gene.
Links confirmed in at least 3 data sets ("3+ confirmed"). This file is the primary starting point for our analysis (a very long list).
The same table of 3+ links, in Pajek ".net" format.
Links that are 3+ confirmed with both positive and negative correlations (a short list).
Links confirmed in at least 7 data sets. This file was used to construct Figure 4 in the paper.
Tar file containing the clusters we identified using MCODE, some of which are shown in Figure 5. The files are in Pajek ".net" format. As the file names indicate, these complexes were identified from the links.3ormore network after removing a small number of heavily linked genes.
Gene Ontology annotation matrix used for Figure 4.

Raw data files used for analysis
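The notion of a link "confirmed in N data sets" can be illustrated with a minimal sketch. It assumes each data set has already been reduced to a set of signed gene pairs; how pairs are selected within a single data set is not described on this page, so that step is left out. Positive and negative links are counted separately, matching the "+" and "-" convention of the files. The names `normalize`, `confirmation_counts` and `confirmed_in_at_least` are hypothetical and not part of Tmm.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# (gene_a, gene_b, sign): genes in sorted order, sign is "+" or "-".
Link = Tuple[str, str, str]


def normalize(a: str, b: str, sign: str) -> Link:
    """Put the pair in a fixed order so that A-B and B-A are the same link."""
    return (a, b, sign) if a <= b else (b, a, sign)


def confirmation_counts(per_dataset: Iterable[Set[Link]]) -> Counter:
    """For each signed link, count how many data sets contain it."""
    counts: Counter = Counter()
    for links in per_dataset:
        counts.update(set(links))  # a data set can confirm a link only once
    return counts


def confirmed_in_at_least(counts: Counter, k: int) -> List[Link]:
    """Links confirmed in k or more data sets, sorted for stable output."""
    return sorted(link for link, n in counts.items() if n >= k)


if __name__ == "__main__":
    datasets = [
        {normalize("A", "B", "+"), normalize("C", "A", "-")},
        {normalize("B", "A", "+")},
        {normalize("A", "B", "+"), normalize("A", "C", "-")},
    ]
    counts = confirmation_counts(datasets)
    print(confirmed_in_at_least(counts, 3))  # [('A', 'B', '+')]
    print(confirmed_in_at_least(counts, 2))  # [('A', 'B', '+'), ('A', 'C', '-')]
```

With k = 3 this corresponds to the "3+ confirmed" idea, and with k = 7 to the stricter list; a positive and a negative link between the same two genes are tallied independently.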
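Once an archive has been unpacked (for example with `tar -xzf`), the full list of links can be filtered by confirmation level with the standard `csv` module. This is a sketch only: the column names `gene1`, `gene2`, `sign` and `datasets` are hypothetical placeholders, because the real names are given in each file's header, which should be checked first. The second function mirrors the idea behind the short list of pairs that are confirmed with both positive and negative correlations.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Optional, Set, TextIO, Tuple

# HYPOTHETICAL column names: read the header of the downloaded file and adjust these.
GENE1, GENE2, SIGN, COUNT = "gene1", "gene2", "sign", "datasets"

Row = Tuple[str, str, str, int]


def confirmed_links(handle: TextIO, min_datasets: int, sign: Optional[str] = None) -> Iterator[Row]:
    """Yield (gene1, gene2, sign, count) for links seen in >= min_datasets data sets.

    If sign is "+" or "-", only links with that correlation sign are kept.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        count = int(row[COUNT])
        if count >= min_datasets and (sign is None or row[SIGN] == sign):
            yield row[GENE1], row[GENE2], row[SIGN], count


def pairs_with_both_signs(rows: Iterable[Row]) -> List[Tuple[str, ...]]:
    """Gene pairs that appear with both a "+" and a "-" link."""
    signs: Dict[Tuple[str, ...], Set[str]] = {}
    for g1, g2, s, _ in rows:
        signs.setdefault(tuple(sorted((g1, g2))), set()).add(s)
    return sorted(pair for pair, seen in signs.items() if seen == {"+", "-"})


if __name__ == "__main__":
    sample = (
        "gene1\tgene2\tsign\tdatasets\n"
        "A\tB\t+\t5\n"
        "A\tB\t-\t3\n"
        "A\tC\t+\t2\n"
    )
    rows = list(confirmed_links(io.StringIO(sample), min_datasets=3))
    print(rows)                         # [('A', 'B', '+', 5), ('A', 'B', '-', 3)]
    print(pairs_with_both_signs(rows))  # [('A', 'B')]
```

For a real file, `io.StringIO(sample)` would be replaced by `open(path, newline="")` on the unpacked text file; the generator reads one row at a time, so even the very large link list does not need to fit in memory.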
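The step of removing heavily linked genes before writing a network in Pajek ".net" format can be sketched as follows. The cutoff `max_degree` is a hypothetical parameter: this page does not say how the heavily linked genes were chosen, and this sketch drops them in a single pass without recomputing degrees afterwards. The output uses the basic Pajek layout, a `*Vertices` section with 1-based, quoted labels followed by an `*Edges` section for undirected links.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def remove_hubs(edges: Iterable[Edge], max_degree: int) -> List[Edge]:
    """Drop every gene whose degree exceeds max_degree, together with all its edges."""
    edges = list(edges)
    degree = Counter(g for edge in edges for g in edge)
    hubs = {g for g, d in degree.items() if d > max_degree}
    return [(a, b) for a, b in edges if a not in hubs and b not in hubs]


def to_pajek(edges: Iterable[Edge]) -> str:
    """Render an undirected network as Pajek .net text (vertices numbered from 1)."""
    index: Dict[str, int] = {}
    numbered = []
    for a, b in edges:
        for g in (a, b):
            index.setdefault(g, len(index) + 1)  # assign numbers in first-seen order
        numbered.append((index[a], index[b]))
    lines = [f"*Vertices {len(index)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Edges")
    lines += [f"{i} {j} 1" for i, j in numbered]  # every link gets weight 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    links = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
    print(to_pajek(remove_hubs(links, max_degree=2)), end="")
```

Here gene B has three links and exceeds the cutoff, so only the C-D link survives and the printed network has two vertices and one edge. A weighted variant could write the confirmation count in place of the constant weight.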

	Paul Pavlidis. Last modified: Tue Feb 22 11:38:22 EST 2005