patternmatch

Synopsis

This is a simple program for identifying genes that show an expression pattern of interest in a data file. In many cases it is probably best used as an adjunct to other statistical methods such as ANOVA. In the simplest cases, the program effectively performs a t-test between two groups in your data. Because the method is so simple, it is easy to find situations where it cannot identify genes that you would be interested in. The page discussing templates explains the approach in much more detail.
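
The t-test remark above can be checked numerically. The sketch below assumes the correlation is a Pearson correlation and the template is a two-level (0/1) pattern; neither detail is confirmed by this page, and the function names and sample values are made up for illustration. Under those assumptions, converting the correlation r to a t statistic with n - 2 degrees of freedom gives the same value as a pooled-variance two-sample t-test.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for a correlation r."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def pooled_t(group1, group0):
    """Two-sample t statistic with pooled variance (group1 minus group0)."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss1 = sum((v - m1) ** 2 for v in group1)
    ss0 = sum((v - m0) ** 2 for v in group0)
    sp2 = (ss1 + ss0) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n0))

# Hypothetical template (3 "low" samples, 3 "high" samples) and one gene.
template = [0, 0, 0, 1, 1, 1]
expression = [1.0, 1.2, 0.9, 2.1, 2.4, 1.9]

r = pearson(template, expression)
t_corr = t_from_r(r, len(template))
t_groups = pooled_t(expression[3:], expression[:3])
print(r, t_corr, t_groups)  # the two t values agree
```

With templates that have more than two levels the correlation no longer reduces to a t-test, which is one reason the method is best treated as an adjunct to other statistics.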

patternmatch 
	[-a: use absolute value of correlation;
	 -r: rdb format line is present;
	 -d: p-values of negative correlations not set to 1.0]
	<pattern> <data>

Inputs

  • A pattern file: a space-delimited file whose first line contains a template definition. See the page discussing templates for much more detail.
  • A data file in RDB-like format. Watch out for the format line: use the -r option if the file has one.
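
As an illustration of why the format line matters, here is a minimal sketch of reading an RDB-like file. It assumes tab-delimited columns, a header on the first line, gene identifiers in the first column, and (when present) a format line on the second line. The function name `read_rdb_like` and the sample rows are hypothetical; this is not the script's actual parser.

```python
def read_rdb_like(lines, has_format_line=False):
    """Parse tab-delimited rows into (column names, {gene: values}).

    has_format_line plays the role of the -r switch: when True, the
    second line is treated as an RDB format line and skipped.
    """
    rows = [ln.rstrip("\n").split("\t") for ln in lines if ln.strip()]
    header = rows[0]
    body = rows[2:] if has_format_line else rows[1:]
    data = {row[0]: [float(v) for v in row[1:]] for row in body}
    return header[1:], data

# Made-up example file with an RDB format line on line 2.
sample = [
    "gene\ts1\ts2\ts3\n",
    "10S\t8N\t8N\t8N\n",
    "geneA\t1.0\t2.0\t3.0\n",
    "geneB\t0.5\t0.4\t0.6\n",
]
columns, data = read_rdb_like(sample, has_format_line=True)
print(columns, data)
```

Without the flag, the format line would be read as a data row and the numeric conversion would fail, which is the situation the -r option is meant to avoid.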

Outputs

  • A file with three columns: the gene, its correlation with the pattern, and the corresponding p-value. The output file name is derived from the input file names, in the form "datafile-pattern-correlpvals.txt".
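
A typical next step is to rank genes by p-value. The sketch below assumes the output columns are separated by whitespace and that any line whose second and third fields are not numeric (for example a header) can be skipped; the delimiter and the presence of a header are not stated on this page. The function name `top_hits` and the sample rows are invented for illustration.

```python
def top_hits(lines, max_p=0.05):
    """Return (gene, correlation, p-value) tuples with p <= max_p, best first."""
    hits = []
    for ln in lines:
        fields = ln.split()
        if len(fields) != 3:
            continue
        try:
            r, p = float(fields[1]), float(fields[2])
        except ValueError:
            continue  # non-numeric line, e.g. a header
        if p <= max_p:
            hits.append((fields[0], r, p))
    return sorted(hits, key=lambda h: h[2])

# Made-up rows in the three-column layout described above.
sample_output = [
    "gene correlation pvalue",
    "geneA 0.91 0.012",
    "geneB -0.40 1.0",
    "geneC 0.97 0.001",
]
hits = top_hits(sample_output)
print(hits)
```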

Options

  • -a: Use the absolute value of the correlation
  • -r: The input file is in RDB format, so the second line (the format line) is ignored.
  • -d: Do not set the p-value of negative correlations to 1.0. Without this option, the software assumes that negative correlations are not of interest and gives them a p-value of 1.0, which avoids giving high scores to expression patterns that match the opposite of the template. If you use the -a (absolute value) switch, all correlations are positive and this option has no effect.
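
The interaction between -a and -d can be summarised in a few lines. This is a sketch of the behaviour described above, not the script's code: the function name `apply_options` and its parameters are hypothetical, and the p-value is taken as an input because this page does not say how it is computed.

```python
def apply_options(r, p, use_abs=False, keep_negative=False):
    """Return the (correlation, p-value) pair that would be reported.

    use_abs       -- stands in for -a: report |r|, so nothing is negative
    keep_negative -- stands in for -d: leave p-values of negative r alone
    """
    if use_abs:
        return abs(r), p          # -d is irrelevant once r is non-negative
    if r < 0 and not keep_negative:
        return r, 1.0             # default: anti-correlated genes are masked
    return r, p

print(apply_options(-0.9, 0.002))                      # masked by default
print(apply_options(-0.9, 0.002, keep_negative=True))  # kept with -d
print(apply_options(-0.9, 0.002, use_abs=True))        # folded by -a
```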

Dependencies

Problems/bugs

  • None known. Note that the p-value calculation was combined into this script only recently.

Version history

Script

References
