ttest

Synopsis

Performs two-sided statistical tests, including Student's t-test, on data that is divided into two groups. Each row (gene) in the data file is assigned a p value.

 ttest
	[-r: format line needs to be removed
	 -w: use Welch approximate t
	 -m: do Mann-Whitney U (a.k.a. Wilcoxon) test instead (this is experimental)
	 -rank: use rank transformation of the data for ttest
	 -l: log transform the data]
	<data> <layout>

ttest -r -rank affydatafile.txt affydatafile-layout | /usr/local/bin/sort -gk 3 >! test.rank.out

This produces tab-delimited output, sorted by p value:

label	stat	p	fold
160901_at	7.38548945875996	7.50313817077242e-07	0.354838709677419
94821_at	7.38548945875996	7.50313817077242e-07	0.354838709677419
104139_at	6.3733037727251	5.29481385225239e-06	0.372549019607843
93974_at	6.3733037727251	5.29481385225239e-06	0.372549019607843
98579_at	6.3733037727251	5.29481385225239e-06	0.372549019607843
104312_at	5.61862983682549	2.48391733865816e-05	0.390728476821192
102362_i_at	5.61281765999638	2.514365670514e-05	0.390728476821192
94490_at	5.61281765999638	2.514365670514e-05	0.390728476821192

(The -rank option typically yields many equal p values, because rows whose values fall in the same order across the samples receive identical ranks.)
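Why the -rank option produces tied results can be seen in a small sketch. This is not the script's own code: the function names are hypothetical, average ranks for ties are an assumption, and the split of columns into groups by position (first `n_first` columns versus the rest) stands in for the layout file, whose format is not described here.

```python
from statistics import mean, variance

def rank_transform(values):
    """Return 1-based ranks, giving tied values their average rank (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def student_t(a, b):
    """Pooled-variance (Student) t statistic for two groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

def rank_t(row, n_first):
    """t statistic on ranks; group membership by position is an assumption."""
    ranks = rank_transform(row)
    return student_t(ranks[:n_first], ranks[n_first:])

# Two rows with very different raw values but the same ordering.
row_1 = [5.1, 7.3, 6.0, 12.4, 15.9, 11.2]
row_2 = [101.0, 250.0, 180.0, 900.0, 4000.0, 650.0]
t1 = rank_t(row_1, 3)
t2 = rank_t(row_2, 3)
print(t1 == t2)
```

Because both rows rank their samples identically, the statistic, and therefore the p value, is the same for both.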

Inputs

data: A tab-delimited data file, where each row represents a set of measurements to be analyzed. A p value is generated for each row in the file. See the detailed instructions for the format.
layout: A simple file describing the experimental design. See the documentation of the format.

Outputs

The output contains the following columns, in order:

label: The gene identifier.
stat: The statistic ('t', or 'u' if the Mann-Whitney test was used).
p: The two-sided p value.
fold: The 'fold change' between the two groups. This is provided as a convenience and is not directly used in the analysis.
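For downstream processing without GNU sort, the output can be read with a few lines of Python. The following is a minimal sketch that assumes the output starts with the header line shown in the example above (label, stat, p, fold); the function name `read_ttest_output` is hypothetical, and the two sample rows are copied from that example.

```python
import csv
import io

def read_ttest_output(handle):
    """Parse tab-delimited ttest output into a list of dicts.

    Assumes the first line is the header: label, stat, p, fold.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "label": rec["label"],
            "stat": float(rec["stat"]),
            "p": float(rec["p"]),
            "fold": float(rec["fold"]),
        })
    return rows

# Two rows taken from the example output, deliberately out of order.
sample = (
    "label\tstat\tp\tfold\n"
    "104312_at\t5.61862983682549\t2.48391733865816e-05\t0.390728476821192\n"
    "160901_at\t7.38548945875996\t7.50313817077242e-07\t0.354838709677419\n"
)

# Sort by ascending p value.
rows = sorted(read_ttest_output(io.StringIO(sample)), key=lambda r: r["p"])
for row in rows:
    print(row["label"], row["p"])
```

To read a real output file, pass an open file handle instead of the `io.StringIO` object.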

Options

-r: The data file includes an extra line after the first line. (See the data format page for an explanation)
-rank: Use the rank transformation of the data. The ranks are used instead of the raw values, which gives a nonparametric version of the t-test.
-m: EXPERIMENTAL: Do the Mann-Whitney (Wilcoxon) test, a nonparametric test, instead of the t-test. Note that in the current implementation the p values this yields are not very accurate for small numbers of samples.
-l: Use the log transformation of the data. Do not use this if your data includes non-positive values.
-w: Use the Welch 'approximate t' test, which applies when the variances of the two groups are not equal.
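As an illustration of what the -w option refers to, here is a sketch of the textbook Welch statistic and its Welch-Satterthwaite degrees of freedom. It is not taken from the script or from Stats.pm; the function name `welch_t` and the sample data are made up for the example.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch 'approximate t' and its approximate degrees of freedom.

    Unlike Student's t, the two group variances are not pooled.
    """
    na, nb = len(a), len(b)
    va = variance(a) / na  # squared standard error of group a's mean
    vb = variance(b) / nb  # squared standard error of group b's mean
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical measurements for one row, already split into two groups.
group_a = [1.0, 2.0, 3.0, 4.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(group_a, group_b)
print(t, df)
```

The degrees of freedom are generally not an integer and lie between the smaller of the two group sizes minus one and the total sample size minus two.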

Dependencies

Stats.pm
GNU sort is not a dependency, but it is useful for processing the output.

Problems/bugs

The U test is approximate and is not very accurate for small numbers of samples (fewer than 20 or so).
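The size of the error for small samples can be illustrated with a sketch that compares a normal approximation for U against the exact permutation p value. Whether the script uses exactly this approximation is an assumption; the version below is the textbook form without continuity or tie correction, and all function names are hypothetical.

```python
import math
from itertools import combinations

def u_statistic(a, b):
    """Count pairs (x in a, y in b) with x > y; ties count one half."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)

def normal_approx_p(a, b):
    """Two-sided p from the normal approximation to U (no corrections)."""
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_statistic(a, b) - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))

def exact_p(a, b):
    """Exact two-sided p by enumerating every split of the pooled data."""
    pooled = a + b
    n1 = len(a)
    mu = n1 * len(b) / 2
    observed = abs(u_statistic(a, b) - mu)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        ga = [pooled[i] for i in chosen]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        hits += abs(u_statistic(ga, gb) - mu) >= observed
    return hits / total

# Three samples per group, completely separated.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
approx = normal_approx_p(a, b)
exact = exact_p(a, b)
print(approx, exact)
```

With three samples per group the exact two-sided p value cannot be smaller than 2/20 = 0.1, while the approximation reports a value of roughly half that, which is why small-sample results from -m should be treated with caution.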

Version history

Script

References

Many of the methods were implemented with help from Zar, Biostatistical Analysis.
Numerical Recipes in C is an invaluable book for programming statistical distributions, even when programming in another language.