Class ExpressionDataSVD
- java.lang.Object
-
- ubic.gemma.core.analysis.preprocess.svd.ExpressionDataSVD
-
public class ExpressionDataSVD extends Object
Performs SVD on an expression data matrix, E = U S V'. The rows of the input matrix are probes (genes), following the convention of Alter et al. 2000 (PNAS). Thus the columns of U are the eigensamples (eigenarrays) and the columns of V are the eigengenes. See also http://genome-www.stanford.edu/SVD/. Because SVD cannot be performed on a matrix with missing values, missing values are imputed. Rows with no variance are removed, as are rows with too many missing values (see MIN_PRESENT_FRACTION_FOR_ROW).
- Author:
- paul
-
-
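The row filtering mentioned in the class description can be illustrated with a minimal, self-contained sketch. It is not Gemma's implementation: the class and method names are hypothetical, NaN stands in for a missing value, and the threshold is passed as a parameter because the value of MIN_PRESENT_FRACTION_FOR_ROW is not given on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names belong to the Gemma API.
class RowFilterSketch {

    /** Fraction of entries in the row that are not NaN (NaN marks a missing value here). */
    static double presentFraction(double[] row) {
        if (row.length == 0) return 0.0;
        int present = 0;
        for (double v : row) {
            if (!Double.isNaN(v)) present++;
        }
        return (double) present / row.length;
    }

    /** Sample variance over the non-missing entries; 0 if fewer than two are present. */
    static double variance(double[] row) {
        double sum = 0.0;
        int n = 0;
        for (double v : row) {
            if (!Double.isNaN(v)) { sum += v; n++; }
        }
        if (n < 2) return 0.0;
        double mean = sum / n;
        double ss = 0.0;
        for (double v : row) {
            if (!Double.isNaN(v)) ss += (v - mean) * (v - mean);
        }
        return ss / (n - 1);
    }

    /** Indices of rows that have enough present values and non-zero variance. */
    static List<Integer> rowsToKeep(double[][] data, double minPresentFraction) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (presentFraction(data[i]) >= minPresentFraction && variance(data[i]) > 0.0) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1, 2, 3, 4},                             // kept
            {5, 5, 5, 5},                             // no variance
            {1, Double.NaN, Double.NaN, Double.NaN},  // too many missing values
            {1, Double.NaN, 3, 4}                     // kept
        };
        System.out.println(rowsToKeep(data, 0.5));
    }
}
```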
Constructor Summary
Constructors
Constructor Description
ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData)
Does normalization.
ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData, boolean normalizeMatrix)
-
Method Summary
Modifier and Type Method Description
ExpressionDataDoubleMatrix
equalize()
Implements the method described in the SPELL paper, alternative interpretation as related by Q. Morris.
Double[]
getEigenGene(int i)
Double[]
getEigenSample(int i)
double[]
getEigenvalues()
int
getNumVariables()
DoubleMatrix<Integer,Integer>
getS()
double[]
getSingularValues()
DoubleMatrix<CompositeSequence,Integer>
getU()
DoubleMatrix<Integer,BioMaterial>
getV()
Double[]
getVarianceFractions()
ExpressionDataDoubleMatrix
removeHighestComponents(int numComponentsToRemove)
Provides a reconstructed matrix removing the first N components (the most significant ones).
ExpressionDataDoubleMatrix
uMatrixAsExpressionData()
ExpressionDataDoubleMatrix
winnow(double thresholdQuantile)
Implements method described in Skillicorn et al., "Strategies for winnowing microarray data" (also section 3.5.5 of his book)
-
-
-
Constructor Detail
-
ExpressionDataSVD
public ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData) throws SVDException
Normalizes the data matrix before computing the SVD.
- Parameters:
expressionData
- expression data
- Throws:
SVDException
-
ExpressionDataSVD
public ExpressionDataSVD(ExpressionDataDoubleMatrix expressionData, boolean normalizeMatrix) throws SVDException
- Parameters:
expressionData
- Note that this may be modified!
normalizeMatrix
- If true, the data matrix will be rescaled and centred to mean zero, variance one, for both rows and columns ("double-standardized")
- Throws:
SVDException
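To make "double-standardized" concrete, here is a minimal sketch that alternates row and column standardization for a fixed number of passes. This page does not say how Gemma performs the rescaling, so the alternating scheme, the use of the sample variance, and all names below are assumptions made for illustration. Unlike the constructor, which may modify its input, the sketch works on a copy.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the procedure used by ExpressionDataSVD.
class DoubleStandardizeSketch {

    /** Centres and scales a vector in place to mean 0 and sample variance 1. */
    static void standardize(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double sd = n > 1 ? Math.sqrt(ss / (n - 1)) : 0.0;
        for (int i = 0; i < n; i++) {
            // A constant vector cannot be scaled, so it is only centred.
            x[i] = sd > 0 ? (x[i] - mean) / sd : x[i] - mean;
        }
    }

    /** Standardizes rows, then columns, repeating for the given number of passes. */
    static double[][] doubleStandardize(double[][] m, int passes) {
        int rows = m.length;
        int cols = m[0].length;
        double[][] out = new double[rows][];
        for (int i = 0; i < rows; i++) out[i] = m[i].clone();
        double[] col = new double[rows];
        for (int p = 0; p < passes; p++) {
            for (double[] row : out) standardize(row);
            for (int j = 0; j < cols; j++) {
                for (int i = 0; i < rows; i++) col[i] = out[i][j];
                standardize(col);
                for (int i = 0; i < rows; i++) out[i][j] = col[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3, 4}, {2, 1, 4, 3}, {4, 3, 1, 2}};
        System.out.println(Arrays.deepToString(doubleStandardize(m, 10)));
    }
}
```

After the last pass the columns are exactly standardized, while the rows are only approximately so; more passes generally bring both closer to mean zero and variance one.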
-
-
Method Detail
-
equalize
public ExpressionDataDoubleMatrix equalize()
Implements the method described in the SPELL paper, alternative interpretation as related by Q. Morris. Sets all components to have equal weight (sets all singular values to 1).
- Returns:
- the reconstructed matrix; values that were missing before are re-masked.
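Setting every singular value to 1 means the reconstruction E = U S V' reduces to U V'. The sketch below shows that arithmetic on plain arrays. It is an illustration under the assumption that U and V are stored with one singular vector per column; the names are hypothetical and the re-masking of missing values is omitted.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Gemma implementation of equalize().
class EqualizeSketch {

    /** Computes U * diag(s) * V' for U (n x k), s (length k) and V (m x k). */
    static double[][] reconstruct(double[][] u, double[] s, double[][] v) {
        int n = u.length;
        int m = v.length;
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                for (int c = 0; c < s.length; c++) {
                    out[i][j] += u[i][c] * s[c] * v[j][c];
                }
            }
        }
        return out;
    }

    /** Reconstructs with all singular values set to 1, i.e. U * V'. */
    static double[][] equalize(double[][] u, double[][] v) {
        double[] ones = new double[u[0].length];
        Arrays.fill(ones, 1.0);
        return reconstruct(u, ones, v);
    }

    public static void main(String[] args) {
        double[][] u = {{1, 0}, {0, 1}};
        double[][] v = {{0, 1}, {1, 0}};
        System.out.println(Arrays.deepToString(equalize(u, v)));
    }
}
```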
-
getEigenGene
public Double[] getEigenGene(int i)
- Parameters:
i
- which eigengene
- Returns:
- the ith eigengene (column of V)
-
getEigenSample
public Double[] getEigenSample(int i)
- Parameters:
i
- which eigensample
- Returns:
- the ith eigensample (column of U)
-
getEigenvalues
public double[] getEigenvalues()
- Returns:
- the eigenvalues, i.e. the squares of the singular values (equivalently, the singular values are the square roots of the eigenvalues).
-
getNumVariables
public int getNumVariables()
- Returns:
- how many rows the U matrix has.
-
getS
public DoubleMatrix<Integer,Integer> getS()
- Returns:
- the matrix of singular values, indexed by the eigenarray (row) and eigengene (column) numbers (starting from 0).
-
getSingularValues
public double[] getSingularValues()
-
getU
public DoubleMatrix<CompositeSequence,Integer> getU()
- Returns:
- the left singular vectors. The rows are indexed by probe (CompositeSequence); the columns are indexed by eigenarray number (starting from 0).
-
getV
public DoubleMatrix<Integer,BioMaterial> getV()
- Returns:
- the right singular vectors. The column indices are of the eigengenes (starting from 0). The row indices are of the original samples in the given ExpressionDataDoubleMatrix.
-
getVarianceFractions
public Double[] getVarianceFractions()
- Returns:
- the fraction of the total variance accounted for by each singular vector.
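As a worked illustration, variance fractions are conventionally derived from the squared singular values: each eigenvalue is a singular value squared, and its fraction is that eigenvalue divided by the sum of all eigenvalues. The sketch below assumes this convention, which this page does not confirm for getVarianceFractions(), and uses hypothetical names throughout.

```java
import java.util.Arrays;

// Hypothetical illustration only; assumes the usual SVD/PCA convention.
class VarianceFractionSketch {

    /** Eigenvalues as the squares of the singular values. */
    static double[] eigenvalues(double[] singularValues) {
        double[] ev = new double[singularValues.length];
        for (int i = 0; i < ev.length; i++) {
            ev[i] = singularValues[i] * singularValues[i];
        }
        return ev;
    }

    /** Each eigenvalue divided by the sum of all eigenvalues. */
    static double[] varianceFractions(double[] singularValues) {
        double[] ev = eigenvalues(singularValues);
        double total = 0.0;
        for (double e : ev) total += e;
        double[] fractions = new double[ev.length];
        for (int i = 0; i < ev.length; i++) {
            fractions[i] = total > 0 ? ev[i] / total : 0.0;
        }
        return fractions;
    }

    public static void main(String[] args) {
        // Singular values 4 and 3 give eigenvalues 16 and 9, hence fractions 16/25 and 9/25.
        System.out.println(Arrays.toString(varianceFractions(new double[] {4.0, 3.0})));
    }
}
```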
-
removeHighestComponents
public ExpressionDataDoubleMatrix removeHighestComponents(int numComponentsToRemove)
Provides a reconstructed matrix removing the first N components (the most significant ones). If the matrix was normalized first, removing the first component replicates the normalization approach taken by Nielsen et al. (Lancet 359, 2002) and Alter et al. (PNAS 2000). Correction by ANOVA would yield similar results if the nuisance variable is known.
- Parameters:
numComponentsToRemove
- The number of components to remove, starting from the largest eigenvalue.
- Returns:
- the reconstructed matrix; values that were missing before are re-masked.
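One way to picture this operation: zero the first N singular values, recompute U S V', then restore the missing-value mask of the original data. The sketch below does this on plain arrays, assuming the singular values are sorted from largest to smallest and that NaN marks a missing value. The names are hypothetical and the code is not taken from Gemma.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Gemma implementation.
class RemoveComponentsSketch {

    /**
     * Rebuilds U * diag(s) * V' with the first numToRemove singular values set to 0,
     * then re-masks entries that were missing (NaN) in the original matrix.
     */
    static double[][] removeHighest(double[][] u, double[] s, double[][] v,
                                    double[][] original, int numToRemove) {
        double[] kept = s.clone();
        for (int c = 0; c < Math.min(numToRemove, kept.length); c++) {
            kept[c] = 0.0; // drop the most significant components
        }
        int n = u.length;
        int m = v.length;
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                if (Double.isNaN(original[i][j])) {
                    out[i][j] = Double.NaN; // value was missing before: mask it again
                    continue;
                }
                for (int c = 0; c < kept.length; c++) {
                    out[i][j] += u[i][c] * kept[c] * v[j][c];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] identity = {{1, 0}, {0, 1}};
        double[] s = {5.0, 2.0};
        double[][] original = {{5.0, Double.NaN}, {0.0, 2.0}};
        System.out.println(Arrays.deepToString(removeHighest(identity, s, identity, original, 1)));
    }
}
```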
-
uMatrixAsExpressionData
public ExpressionDataDoubleMatrix uMatrixAsExpressionData()
Implements the method described in the SPELL paper. Note that this alters the U matrix of this object.
We make two assumptions about the method that are not described in the paper: 1) the data are rescaled and centered; 2) the absolute value of the U matrix is used.
- Returns:
- the transformed data. Note that unlike the original data, the transformed data will have no missing values.
-
winnow
public ExpressionDataDoubleMatrix winnow(double thresholdQuantile)
Implements the method described in Skillicorn et al., "Strategies for winnowing microarray data" (also section 3.5.5 of his book).
- Parameters:
thresholdQuantile
- Use 0.5 for the median. The value must be > 0 and < 1.
- Returns:
- a filtered matrix
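The following sketch shows one plausible reading of quantile-based winnowing: score each row by the 1-norm of its row in U (its distance from the origin in the transformed space), then discard the rows whose score falls in the lowest thresholdQuantile fraction. The scoring rule, the direction of the cut, and all names are assumptions made for illustration; this page does not specify them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; the scoring rule and cut direction are assumptions.
class WinnowSketch {

    /** Returns the indices of the rows of U that survive the quantile cut, in original order. */
    static List<Integer> winnow(double[][] u, double thresholdQuantile) {
        if (!(thresholdQuantile > 0 && thresholdQuantile < 1)) {
            throw new IllegalArgumentException("thresholdQuantile must be > 0 and < 1");
        }
        double[] norms = new double[u.length];
        Integer[] order = new Integer[u.length];
        for (int i = 0; i < u.length; i++) {
            for (double x : u[i]) norms[i] += Math.abs(x); // 1-norm of the row
            order[i] = i;
        }
        // Sort row indices by ascending norm.
        Arrays.sort(order, Comparator.comparingDouble(i -> norms[i]));
        int drop = (int) Math.floor(u.length * thresholdQuantile);
        List<Integer> keep = new ArrayList<>();
        for (int r = drop; r < order.length; r++) keep.add(order[r]);
        Collections.sort(keep);
        return keep;
    }

    public static void main(String[] args) {
        double[][] u = {{0.1, 0.0}, {0.5, -0.4}, {0.2, 0.3}, {-0.3, 0.0}};
        System.out.println(winnow(u, 0.5));
    }
}
```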
-
-