Package ubic.basecode.math.linearmodels
Class LeastSquaresFit
- java.lang.Object
-
- ubic.basecode.math.linearmodels.LeastSquaresFit
-
public class LeastSquaresFit extends Object
Performs "bulk" linear model fits, and also offers simple methods for univariate and multivariate regression on a single vector of dependent variables (data). Supports eBayes-like shrinkage of variance. Data with missing values are handled, but less memory-efficiently and somewhat more slowly. The main cost is losing the shortcut available when there are no missing values, where a single QR decomposition can be performed.
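To illustrate why a single QR decomposition helps with bulk fits, here is a minimal self-contained sketch that factors a design matrix once and reuses the factors for every data row. It is not this class's implementation: the class name SharedQrSketch, its methods, and the use of modified Gram-Schmidt are all assumptions made for illustration, and the design is assumed to have full column rank.

```java
/** Hypothetical sketch: factor the design once, then reuse it for every data row. */
final class SharedQrSketch {

    /** Thin QR of an n x p design by modified Gram-Schmidt; returns {Q (n x p), R (p x p)}. */
    static double[][][] qr(double[][] a) {
        int n = a.length, p = a[0].length;
        double[][] q = new double[n][];
        double[][] r = new double[p][p];
        for (int i = 0; i < n; i++) q[i] = a[i].clone();
        for (int j = 0; j < p; j++) {
            double norm = 0;
            for (int i = 0; i < n; i++) norm += q[i][j] * q[i][j];
            norm = Math.sqrt(norm);            // assumes full column rank (norm > 0)
            r[j][j] = norm;
            for (int i = 0; i < n; i++) q[i][j] /= norm;
            for (int k = j + 1; k < p; k++) {  // remove column j's direction from later columns
                double dot = 0;
                for (int i = 0; i < n; i++) dot += q[i][j] * q[i][k];
                r[j][k] = dot;
                for (int i = 0; i < n; i++) q[i][k] -= dot * q[i][j];
            }
        }
        return new double[][][] { q, r };
    }

    /** Solves R x = Q^T y by back-substitution for one data row y. */
    static double[] solve(double[][] q, double[][] r, double[] y) {
        int n = q.length, p = r.length;
        double[] x = new double[p];
        for (int j = p - 1; j >= 0; j--) {
            double s = 0;
            for (int i = 0; i < n; i++) s += q[i][j] * y[i];
            for (int k = j + 1; k < p; k++) s -= r[j][k] * x[k];
            x[j] = s / r[j][j];
        }
        return x;
    }

    /** Fits every row of data against the same design; result has one column per row of data. */
    static double[][] fitAll(double[][] design, double[][] data) {
        double[][][] f = qr(design);                       // factor once
        int p = design[0].length;
        double[][] coef = new double[p][data.length];
        for (int row = 0; row < data.length; row++) {
            double[] x = solve(f[0], f[1], data[row]);     // reuse Q and R for each row
            for (int j = 0; j < p; j++) coef[j][row] = x[j];
        }
        return coef;
    }
}
```

When a data row has missing values, the rows of the design that correspond to them must be dropped, so the shared factors no longer apply to that row. That is presumably where the extra time and memory go.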
- Author:
- paul
-
-
Constructor Summary
Constructors
Constructor / Description
LeastSquaresFit(cern.colt.matrix.DoubleMatrix1D vectorA, cern.colt.matrix.DoubleMatrix1D vectorB)
    Least squares fit between two vectors.
LeastSquaresFit(cern.colt.matrix.DoubleMatrix1D vectorA, cern.colt.matrix.DoubleMatrix1D vectorB, cern.colt.matrix.DoubleMatrix1D weights)
    Stripped-down interface for simple use.
LeastSquaresFit(cern.colt.matrix.DoubleMatrix2D A, cern.colt.matrix.DoubleMatrix2D b)
    ANOVA not possible (use the other constructors).
LeastSquaresFit(cern.colt.matrix.DoubleMatrix2D A, cern.colt.matrix.DoubleMatrix2D b, cern.colt.matrix.DoubleMatrix2D weights)
    Weighted least squares fit between two matrices.
LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, cern.colt.matrix.impl.DenseDoubleMatrix2D data)
LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, cern.colt.matrix.impl.DenseDoubleMatrix2D data, boolean interactions)
LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> b)
    NamedMatrix allows easier handling of the results.
LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> data, boolean interactions)
    NamedMatrix allows easier handling of the results.
LeastSquaresFit(DesignMatrix designMatrix, cern.colt.matrix.DoubleMatrix2D b, cern.colt.matrix.DoubleMatrix2D weights)
    Preferred interface for weighted least squares fit between two matrices.
LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data)
    Preferred interface if you want control over how the design is set up.
LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data, cern.colt.matrix.DoubleMatrix2D weights)
    Weighted least squares fit between two matrices.
-
Method Summary
Modifier and Type, Method / Description
protected List<GenericAnovaResult> anova()
    Compute ANOVA based on the model fit (Type I SSQ, sequential).
protected void ebayesUpdate(double d, double v, cern.colt.matrix.DoubleMatrix1D vp)
    Provide results of limma eBayes algorithm.
cern.colt.matrix.DoubleMatrix2D getCoefficients()
    The matrix of coefficients x for Ax = b (parameter estimates).
double getDfPrior()
cern.colt.matrix.DoubleMatrix2D getFitted()
int getResidualDof()
List<Integer> getResidualDofs()
cern.colt.matrix.DoubleMatrix2D getResiduals()
cern.colt.matrix.DoubleMatrix2D getStudentizedResiduals()
cern.colt.matrix.DoubleMatrix1D getVarPost()
double getVarPrior()
cern.colt.matrix.DoubleMatrix2D getWeights()
boolean isHasBeenShrunken()
boolean isHasMissing()
List<LinearModelSummary> summarize()
List<LinearModelSummary> summarize(boolean anova)
protected LinearModelSummary summarize(int i)
    Compute and organize the various summary statistics for a fit.
Map<String,LinearModelSummary> summarizeByKeys(boolean anova)
-
-
-
Constructor Detail
-
LeastSquaresFit
public LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data)
Preferred interface if you want control over how the design is set up.
- Parameters:
designMatrix -
data -
-
LeastSquaresFit
public LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data, cern.colt.matrix.DoubleMatrix2D weights)
Weighted least squares fit between two matrices.
- Parameters:
designMatrix -
data -
weights - to be used in modifying the influence of the observations in data.
-
LeastSquaresFit
public LeastSquaresFit(DesignMatrix designMatrix, cern.colt.matrix.DoubleMatrix2D b, cern.colt.matrix.DoubleMatrix2D weights)
Preferred interface for weighted least squares fit between two matrices.
- Parameters:
designMatrix -
b - the data
weights - to be used in modifying the influence of the observations in b.
-
LeastSquaresFit
public LeastSquaresFit(cern.colt.matrix.DoubleMatrix1D vectorA, cern.colt.matrix.DoubleMatrix1D vectorB)
Least squares fit between two vectors. Always adds an intercept!
- Parameters:
vectorA - Design
vectorB - Data
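As a rough illustration of what a least squares fit between two vectors with an added intercept computes, the following self-contained sketch uses the closed-form solution for a single predictor. The class SimpleFitSketch and its method are hypothetical and do not use this library or Colt; how the constructor computes the fit internally is not shown here.

```java
/** Hypothetical sketch: ordinary least squares of y on x with an intercept. */
final class SimpleFitSketch {

    /** Returns {intercept, slope} minimizing the sum of (y[i] - intercept - slope * x[i])^2. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;               // assumes x is not constant (sxx > 0)
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }
}
```

In the class itself the equivalent estimates would be read from getCoefficients(), which has one row per parameter.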
-
LeastSquaresFit
public LeastSquaresFit(cern.colt.matrix.DoubleMatrix1D vectorA, cern.colt.matrix.DoubleMatrix1D vectorB, cern.colt.matrix.DoubleMatrix1D weights)
Stripped-down interface for simple use. Least squares fit between two vectors. Always adds an intercept!
- Parameters:
vectorA - Design
vectorB - Data
weights - to be used in modifying the influence of the observations in vectorB.
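The documentation says only that the weights modify the influence of the observations. The sketch below assumes the conventional reading, in which the fit minimizes the weighted sum of squared residuals; the class WeightedFitSketch is hypothetical and independent of this library.

```java
/** Hypothetical sketch: weighted least squares of y on x with an intercept. */
final class WeightedFitSketch {

    /** Returns {intercept, slope} minimizing the sum of w[i] * (y[i] - intercept - slope * x[i])^2. */
    static double[] fit(double[] x, double[] y, double[] w) {
        double sumW = 0, meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            sumW += w[i];
            meanX += w[i] * x[i];
            meanY += w[i] * y[i];
        }
        meanX /= sumW;                    // weighted means
        meanY /= sumW;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxy += w[i] * dx * (y[i] - meanY);
            sxx += w[i] * dx * dx;
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }
}
```

Under this assumption a weight of zero removes an observation from the fit entirely, and equal weights reproduce the unweighted result.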
-
LeastSquaresFit
public LeastSquaresFit(cern.colt.matrix.DoubleMatrix2D A, cern.colt.matrix.DoubleMatrix2D b)
ANOVA not possible (use the other constructors).
- Parameters:
A - Design matrix, which will be used directly in least squares regression
b - Data matrix, containing data in rows.
-
LeastSquaresFit
public LeastSquaresFit(cern.colt.matrix.DoubleMatrix2D A, cern.colt.matrix.DoubleMatrix2D b, cern.colt.matrix.DoubleMatrix2D weights)
Weighted least squares fit between two matrices.
- Parameters:
A - Design
b - Data
weights - to be used in modifying the influence of the observations in b. If null, will be ignored.
-
LeastSquaresFit
public LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, cern.colt.matrix.impl.DenseDoubleMatrix2D data)
- Parameters:
sampleInfo - information that will be converted to a design matrix; intercept term is added.
data - Data matrix
-
LeastSquaresFit
public LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, cern.colt.matrix.impl.DenseDoubleMatrix2D data, boolean interactions)
- Parameters:
sampleInfo -
data -
interactions - add interaction terms (only two-way interactions are supported)
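To make the effect of the interactions flag concrete, the sketch below builds a design matrix from two two-level factors. It assumes 0/1 (treatment-style) coding with an intercept column and an interaction column formed as the product of the two factor columns. The actual coding used when sample information is converted to a design matrix is not documented on this page, so the class DesignSketch and its layout are hypothetical.

```java
/** Hypothetical sketch: design rows for two 0/1-coded factors, optionally with their interaction. */
final class DesignSketch {

    /**
     * Builds rows {1, a, b} or, with interactions, {1, a, b, a*b}.
     * The 0/1 coding and column order are assumptions for illustration only.
     */
    static double[][] build(int[] a, int[] b, boolean interactions) {
        int n = a.length;
        double[][] design = new double[n][interactions ? 4 : 3];
        for (int i = 0; i < n; i++) {
            design[i][0] = 1.0;            // intercept term
            design[i][1] = a[i];           // main effect of factor a
            design[i][2] = b[i];           // main effect of factor b
            if (interactions) {
                design[i][3] = a[i] * b[i];   // two-way interaction
            }
        }
        return design;
    }
}
```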
-
LeastSquaresFit
public LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> b)
NamedMatrix allows easier handling of the results.
- Parameters:
design - information that will be converted to a design matrix; intercept term is added.
b - Data matrix
-
LeastSquaresFit
public LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> data, boolean interactions)
NamedMatrix allows easier handling of the results.
- Parameters:
design - information that will be converted to a design matrix; intercept term is added.
data - Data matrix
-
-
Method Detail
-
getCoefficients
public cern.colt.matrix.DoubleMatrix2D getCoefficients()
The matrix of coefficients x for Ax = b (parameter estimates). Each column represents one fitted model (e.g., one gene); there is a row for each parameter.
- Returns:
-
getDfPrior
public double getDfPrior()
-
getFitted
public cern.colt.matrix.DoubleMatrix2D getFitted()
-
getResidualDof
public int getResidualDof()
-
getResiduals
public cern.colt.matrix.DoubleMatrix2D getResiduals()
-
getStudentizedResiduals
public cern.colt.matrix.DoubleMatrix2D getStudentizedResiduals()
- Returns:
- externally studentized residuals (assumes we have only one QR)
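For reference, the sketch below computes externally studentized residuals for a single-predictor model with an intercept, using the standard definition: each residual is scaled by a variance estimate that excludes the observation itself. Whether this class uses exactly this formula is an assumption; the class StudentizedSketch is hypothetical and self-contained.

```java
/** Hypothetical sketch: externally studentized residuals for y = intercept + slope * x. */
final class StudentizedSketch {

    static double[] externallyStudentized(double[] x, double[] y) {
        int n = x.length;
        int p = 2;                                    // parameters: intercept and slope
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;

        double[] residual = new double[n];
        double[] leverage = new double[n];
        double rss = 0;
        for (int i = 0; i < n; i++) {
            residual[i] = y[i] - (intercept + slope * x[i]);
            leverage[i] = 1.0 / n + (x[i] - meanX) * (x[i] - meanX) / sxx;  // hat-matrix diagonal
            rss += residual[i] * residual[i];
        }

        double[] t = new double[n];
        for (int i = 0; i < n; i++) {
            // Residual variance with observation i left out; needs n > p + 1.
            double varDeleted = (rss - residual[i] * residual[i] / (1 - leverage[i])) / (n - p - 1);
            t[i] = residual[i] / Math.sqrt(varDeleted * (1 - leverage[i]));
        }
        return t;
    }
}
```

The leverages depend only on the design, which is consistent with the note that the method assumes a single QR decomposition shared by all fits.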
-
getVarPost
public cern.colt.matrix.DoubleMatrix1D getVarPost()
-
getVarPrior
public double getVarPrior()
-
getWeights
public cern.colt.matrix.DoubleMatrix2D getWeights()
-
isHasBeenShrunken
public boolean isHasBeenShrunken()
-
isHasMissing
public boolean isHasMissing()
-
summarize
public List<LinearModelSummary> summarize()
- Returns:
- summaries. ANOVA will not be computed. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
-
summarize
public List<LinearModelSummary> summarize(boolean anova)
- Parameters:
anova - if true, ANOVA will be computed
- Returns:
-
summarizeByKeys
public Map<String,LinearModelSummary> summarizeByKeys(boolean anova)
- Parameters:
anova - perform ANOVA, otherwise only basic summarization will be done. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
- Returns:
-
anova
protected List<GenericAnovaResult> anova()
Compute ANOVA based on the model fit (Type I SSQ, sequential). The idea is to add up the sums of squares (and dof) for all parameters associated with a particular factor.
This code is more or less ported from R summary.aov.
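The idea of adding up sums of squares and degrees of freedom per factor can be sketched as follows. Each design column is orthogonalized against the columns before it, the squared projection of the data onto the result is that column's sequential sum of squares, and the column totals are accumulated by term. This is an independent illustration, not the ported summary.aov code: the class SequentialSsSketch, the term-index array, and the use of Gram-Schmidt are assumptions, and the design is assumed to have full column rank.

```java
/** Hypothetical sketch: Type I (sequential) sums of squares grouped by model term. */
final class SequentialSsSketch {

    private static double dot(double[] u, double[] v) {
        double s = 0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    /**
     * design: n x p matrix with columns in model order.
     * term: for each column, the index of the term (factor) it belongs to.
     * Returns out[t] = {sum of squares for term t, degrees of freedom for term t}.
     */
    static double[][] typeOne(double[][] design, int[] term, int termCount, double[] y) {
        int n = design.length, p = design[0].length;
        double[][] q = new double[p][n];              // orthonormalized columns, stored as rows
        double[][] out = new double[termCount][2];
        for (int j = 0; j < p; j++) {
            for (int i = 0; i < n; i++) q[j][i] = design[i][j];
            for (int k = 0; k < j; k++) {             // remove what earlier columns already explain
                double d = dot(q[k], q[j]);
                for (int i = 0; i < n; i++) q[j][i] -= d * q[k][i];
            }
            double norm = Math.sqrt(dot(q[j], q[j])); // assumes full column rank (norm > 0)
            for (int i = 0; i < n; i++) q[j][i] /= norm;
            double projection = dot(q[j], y);
            out[term[j]][0] += projection * projection;   // sequential sum of squares
            out[term[j]][1] += 1;                         // one degree of freedom per column
        }
        return out;
    }
}
```

Because the sums are sequential, the value attributed to a factor depends on the order in which the terms enter the model.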
- Returns:
-
ebayesUpdate
protected void ebayesUpdate(double d, double v, cern.colt.matrix.DoubleMatrix1D vp)
Provide results of limma eBayes algorithm. These will be used next time summarize is called on this.
- Parameters:
d - dfPrior
v - varPrior
vp - varPost
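For orientation, the three values passed here are related in limma as follows: the posterior variance is a degrees-of-freedom-weighted average of the prior variance and the per-fit residual variance, and the moderated t statistic uses it in place of the residual variance. The sketch below shows those two formulas only. The class EbayesSketch and its parameter names are hypothetical, and how this class applies the values inside summarize is not shown on this page.

```java
/** Hypothetical sketch of the limma-style variance shrinkage formulas. */
final class EbayesSketch {

    /** Posterior variance: (dfPrior * varPrior + dfResidual * sampleVar) / (dfPrior + dfResidual). */
    static double posteriorVariance(double dfPrior, double varPrior, double dfResidual, double sampleVar) {
        if (Double.isInfinite(dfPrior)) {
            return varPrior;              // an infinitely strong prior fully determines the variance
        }
        return (dfPrior * varPrior + dfResidual * sampleVar) / (dfPrior + dfResidual);
    }

    /** Moderated t statistic: coefficient / (unscaled standard deviation * sqrt(posterior variance)). */
    static double moderatedT(double coefficient, double unscaledStdev, double varPost) {
        return coefficient / (unscaledStdev * Math.sqrt(varPost));
    }
}
```

The moderated statistic is referred to a t distribution with dfPrior plus the residual degrees of freedom, which matches the statement that shrunken variance and degrees of freedom are used once ebayesUpdate has been run.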
-
summarize
protected LinearModelSummary summarize(int i)
Compute and organize the various summary statistics for a fit. If ebayes has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
Does not populate the ANOVA.
- Parameters:
i - index of the fit to summarize
- Returns:
-
-