Synopsis
Performs two-sided statistical tests, including Student's t-test, on data that are divided into two groups.
Each row (gene) in the data file is assigned a p value.
ttest
[-r: format line needs to be removed
-w: use Welch approximate t
-m: do Mann-Whitney U (a.k.a. Wilcoxon rank-sum) test instead (this is experimental)
-rank: use rank transformation of the data for ttest
-l: log transform the data]
<data> <layout>
ttest -r -rank affydatafile.txt affydatafile-layout | /usr/local/bin/sort -gk 3 >! test.rank.out
Produces tab-delimited output:
label | stat | p | fold |
160901_at | 7.38548945875996 | 7.50313817077242e-07 | 0.354838709677419 |
94821_at | 7.38548945875996 | 7.50313817077242e-07 | 0.354838709677419 |
104139_at | 6.3733037727251 | 5.29481385225239e-06 | 0.372549019607843 |
93974_at | 6.3733037727251 | 5.29481385225239e-06 | 0.372549019607843 |
98579_at | 6.3733037727251 | 5.29481385225239e-06 | 0.372549019607843 |
104312_at | 5.61862983682549 | 2.48391733865816e-05 | 0.390728476821192 |
102362_i_at | 5.61281765999638 | 2.514365670514e-05 | 0.390728476821192 |
94490_at | 5.61281765999638 | 2.514365670514e-05 | 0.390728476821192 |
(The -rank option typically yields many equal p values, because the statistic then depends only on how the ranks are split between the two groups.)
Inputs
- data: A tab-delimited data file, where each row represents a set of measurements to be analyzed. A p value is generated for each
row in the file. See the detailed instructions for the format.
- layout: A simple file describing the experimental design. See the documentation of the format.
Outputs
The output contains the following columns:
- The gene identifier
- The statistic ('t', or 'u' if the M-W test was used)
- The two-sided p value.
- The 'fold change' between the two groups. This is provided as a convenience and is not directly used in the analysis.
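The column layout above can be consumed by a short script when GNU sort is not at hand. The following is a minimal Python sketch, not part of ttest itself. It assumes four tab-separated fields per line in the order listed above, and it skips any line whose numeric fields do not parse (for example a header line, if one is present). The function names `parse_results` and `sort_by_p` are illustrative only.

```python
def parse_results(lines):
    """Parse tab-delimited ttest output into (label, stat, p, fold) tuples.

    Assumption: each data line has at least four tab-separated fields in the
    order label, statistic, p value, fold change.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # blank or malformed line
        try:
            stat, p, fold = (float(x) for x in fields[1:4])
        except ValueError:
            continue  # non-numeric fields, e.g. a header line
        rows.append((fields[0], stat, p, fold))
    return rows


def sort_by_p(rows):
    """Order rows by ascending p value (similar in effect to `sort -gk 3`)."""
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    sample = [
        "104312_at\t5.61862983682549\t2.48391733865816e-05\t0.390728476821192\n",
        "160901_at\t7.38548945875996\t7.50313817077242e-07\t0.354838709677419\n",
    ]
    for label, stat, p, fold in sort_by_p(parse_results(sample)):
        print(label, stat, p, fold, sep="\t")
```

Parsing the p value as a float matters for the same reason the example command uses `sort -g`: values such as 7.5e-07 are written in exponent notation and do not sort correctly as plain text.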
Options
- -r: The data file includes an extra line after the first line. (See the data format page for an explanation)
- -rank: Use the rank transformation of the data: the ranks are used in place of the raw values, giving a nonparametric version of the t-test.
- -m: EXPERIMENTAL: Do the Mann-Whitney (Wilcoxon) test, a nonparametric test, instead. Note that in the current implementation the p values this yields are not very accurate for small numbers of samples.
- -l: Use the log transformation of the data. Do not use this if your data includes non-positive values.
- -w: Use the Welch 'approximate t' (intended for cases where the variances of the two groups are not equal).
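To make the difference between these options concrete, the sketch below computes the textbook pooled-variance (Student) t statistic, the Welch statistic with its Welch-Satterthwaite degrees of freedom, and the rank and log transformations for a single row. It is an illustration in Python, not the code used by ttest or Stats.pm. Several details are assumptions: ranking all values of the row together, giving tied values their average rank, and using the natural logarithm. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def welch_t(a, b):
    """Welch 'approximate t' and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def rank_transform(values):
    """Replace values by 1-based ranks; tied values share their average rank
    (the tie-handling rule is an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def log_transform(values):
    """Natural log of each value; rejects non-positive input."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


if __name__ == "__main__":
    group_a, group_b = [1, 2, 3, 4], [2, 4, 6, 8]
    print(student_t(group_a, group_b))
    print(welch_t(group_a, group_b))
    print(rank_transform(group_a + group_b))
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, so the Welch option changes the p value rather than the statistic in that case. The statistics differ as well when the group sizes are unequal.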
Dependencies
- Stats.pm
- This isn't a dependency, but GNU sort is useful for processing the output.
Problems/bugs
- The U test is approximate and is not very accurate for small numbers of samples (fewer than 20 or so).
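One common way to approximate the U test is to compare U with a normal distribution. Whether ttest does exactly this is an assumption, since the implementation is not shown here. The Python sketch below uses that approach, without tie or continuity corrections, to show why such an approximation is unreliable for small groups. The function names are illustrative.

```python
import math


def mann_whitney_u(a, b):
    """U statistic for group a: the number of pairs (x in a, y in b) with
    x > y, counting each tied pair as one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def normal_approx_p(u, na, nb):
    """Two-sided p value from the normal approximation to U
    (no tie correction, no continuity correction)."""
    mu = na * nb / 2
    sigma = math.sqrt(na * nb * (na + nb + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    u = mann_whitney_u(a, b)
    print(u, normal_approx_p(u, len(a), len(b)))
```

For two fully separated groups of three, the approximation gives a p value slightly below 0.05. The exact two-sided p value is 0.1, because 2 of the 20 possible assignments of ranks to the groups are this extreme.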