Package ubic.basecode.math.linearmodels
Class LeastSquaresFit
java.lang.Object
    ubic.basecode.math.linearmodels.LeastSquaresFit
public class LeastSquaresFit extends Object
Performs "bulk" linear model fits, and also offers simple methods for univariate and multivariate regression on a single vector of dependent variables (data). Supports eBayes-like shrinkage of variance. Data with missing values are handled, but at a cost in memory and speed: only when there are no missing values can a single QR decomposition be performed for the whole data set.
- Author:
- paul
-
-
Constructor Summary
Constructor, Description
- LeastSquaresFit(cern.colt.matrix.DoubleMatrix1D vectorA, cern.colt.matrix.DoubleMatrix1D vectorB)
  Least squares fit between two vectors.
- LeastSquaresFit(cern.colt.matrix.DoubleMatrix1D vectorA, cern.colt.matrix.DoubleMatrix1D vectorB, cern.colt.matrix.DoubleMatrix1D weights)
  Stripped-down interface for simple use.
- LeastSquaresFit(cern.colt.matrix.DoubleMatrix2D A, cern.colt.matrix.DoubleMatrix2D b)
  ANOVA not possible (use the other constructors).
- LeastSquaresFit(cern.colt.matrix.DoubleMatrix2D A, cern.colt.matrix.DoubleMatrix2D b, cern.colt.matrix.DoubleMatrix2D weights)
  Weighted least squares fit between two matrices.
- LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, cern.colt.matrix.impl.DenseDoubleMatrix2D data)
- LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, cern.colt.matrix.impl.DenseDoubleMatrix2D data, boolean interactions)
- LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> b)
  NamedMatrix allows easier handling of the results.
- LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> data, boolean interactions)
  NamedMatrix allows easier handling of the results.
- LeastSquaresFit(DesignMatrix designMatrix, cern.colt.matrix.DoubleMatrix2D b, cern.colt.matrix.DoubleMatrix2D weights)
  Preferred interface for weighted least squares fit between two matrices.
- LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data)
  Preferred interface if you want control over how the design is set up.
- LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data, cern.colt.matrix.DoubleMatrix2D weights)
  Weighted least squares fit between two matrices.
-
Method Summary
Modifier and Type, Method, Description
- protected List<GenericAnovaResult> anova()
  Compute ANOVA based on the model fit (Type I SSQ, sequential).
- protected void ebayesUpdate(double d, double v, cern.colt.matrix.DoubleMatrix1D vp)
  Provide results of limma eBayes algorithm.
- cern.colt.matrix.DoubleMatrix2D getCoefficients()
  The matrix of coefficients x for Ax = b (parameter estimates).
- double getDfPrior()
- cern.colt.matrix.DoubleMatrix2D getFitted()
- int getResidualDof()
- List<Integer> getResidualDofs()
- cern.colt.matrix.DoubleMatrix2D getResiduals()
- cern.colt.matrix.DoubleMatrix2D getStudentizedResiduals()
- cern.colt.matrix.DoubleMatrix1D getVarPost()
- double getVarPrior()
- cern.colt.matrix.DoubleMatrix2D getWeights()
- boolean isHasBeenShrunken()
- boolean isHasMissing()
- List<LinearModelSummary> summarize()
- List<LinearModelSummary> summarize(boolean anova)
- protected LinearModelSummary summarize(int i)
  Compute and organize the various summary statistics for a fit.
- Map<String,LinearModelSummary> summarizeByKeys(boolean anova)
-
-
-
Constructor Detail
-
LeastSquaresFit
public LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data)
Preferred interface if you want control over how the design is set up.
- Parameters:
designMatrix - the design matrix
data - Data matrix
-
LeastSquaresFit
public LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data, cern.colt.matrix.DoubleMatrix2D weights)
Weighted least squares fit between two matrices.
- Parameters:
designMatrix - the design matrix
data - Data matrix
weights - to be used in modifying the influence of the observations in data.
-
LeastSquaresFit
public LeastSquaresFit(DesignMatrix designMatrix, cern.colt.matrix.DoubleMatrix2D b, cern.colt.matrix.DoubleMatrix2D weights)
Preferred interface for weighted least squares fit between two matrices.
- Parameters:
designMatrix - the design matrix
b - the data
weights - to be used in modifying the influence of the observations in b.
-
LeastSquaresFit
public LeastSquaresFit(cern.colt.matrix.DoubleMatrix1D vectorA, cern.colt.matrix.DoubleMatrix1D vectorB)
Least squares fit between two vectors. Always adds an intercept!
- Parameters:
vectorA - Design
vectorB - Data
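To make the "always adds an intercept" behaviour concrete, here is a minimal plain-Java sketch of an ordinary least squares fit between two vectors. It is an illustration of the underlying arithmetic only and is not taken from LeastSquaresFit; the class name SimpleLeastSquaresSketch and the method fit are hypothetical, and plain double[] arrays stand in for DoubleMatrix1D.

```java
// Illustrative sketch only; NOT the LeastSquaresFit implementation.
// SimpleLeastSquaresSketch and fit(...) are hypothetical names.
final class SimpleLeastSquaresSketch {

    /**
     * Fits y = intercept + slope * x by ordinary least squares.
     *
     * @param x design values (plays the role of vectorA)
     * @param y data values (plays the role of vectorB)
     * @return a two-element array {intercept, slope}
     */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length vectors with at least 2 values");
        }
        int n = x.length;

        // Means of the design and the data.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Centered sums of squares and cross-products.
        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (y[i] - meanY);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("design vector has no variance");
        }

        double slope = sxy / sxx;
        // The intercept is the extra parameter added on top of the single design column.
        double intercept = meanY - slope * meanX;
        return new double[] { intercept, slope };
    }
}
```

Because an intercept is always part of the model, a single design vector yields two parameter estimates, not one.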
-
LeastSquaresFit
public LeastSquaresFit(cern.colt.matrix.DoubleMatrix1D vectorA, cern.colt.matrix.DoubleMatrix1D vectorB, cern.colt.matrix.DoubleMatrix1D weights)
Stripped-down interface for simple use. Least squares fit between two vectors. Always adds an intercept!
- Parameters:
vectorA - Design
vectorB - Data
weights - to be used in modifying the influence of the observations in vectorB.
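The following plain-Java sketch shows one common way weights can modify the influence of observations: each observation's contribution to the means and sums of squares is multiplied by its weight. This is the textbook weighted least squares formula, assumed here for illustration; the source does not state how LeastSquaresFit applies its weights internally. WeightedLeastSquaresSketch and fit are hypothetical names.

```java
// Illustrative sketch only; NOT the LeastSquaresFit implementation.
// WeightedLeastSquaresSketch and fit(...) are hypothetical names.
final class WeightedLeastSquaresSketch {

    /**
     * Fits y = intercept + slope * x, minimizing sum_i w[i] * residual_i^2.
     *
     * @return a two-element array {intercept, slope}
     */
    static double[] fit(double[] x, double[] y, double[] w) {
        int n = x.length;
        if (y.length != n || w.length != n || n < 2) {
            throw new IllegalArgumentException("x, y and w must have the same length (at least 2)");
        }

        // Weighted means.
        double sumW = 0.0;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            if (w[i] < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            sumW += w[i];
            meanX += w[i] * x[i];
            meanY += w[i] * y[i];
        }
        if (sumW == 0.0) {
            throw new IllegalArgumentException("all weights are zero");
        }
        meanX /= sumW;
        meanY /= sumW;

        // Weighted centered sums of squares and cross-products.
        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxx += w[i] * dx * dx;
            sxy += w[i] * dx * (y[i] - meanY);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("weighted design has no variance");
        }

        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }
}
```

In this sketch a weight of zero removes an observation from the fit entirely, and equal weights reproduce the unweighted result.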
-
LeastSquaresFit
public LeastSquaresFit(cern.colt.matrix.DoubleMatrix2D A, cern.colt.matrix.DoubleMatrix2D b)
ANOVA not possible (use the other constructors).
- Parameters:
A - Design matrix, which will be used directly in least squares regression
b - Data matrix, containing data in rows.
-
LeastSquaresFit
public LeastSquaresFit(cern.colt.matrix.DoubleMatrix2D A, cern.colt.matrix.DoubleMatrix2D b, cern.colt.matrix.DoubleMatrix2D weights)
Weighted least squares fit between two matrices.
- Parameters:
A - Design
b - Data
weights - to be used in modifying the influence of the observations in b. If null, will be ignored.
-
LeastSquaresFit
public LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, cern.colt.matrix.impl.DenseDoubleMatrix2D data)
- Parameters:
sampleInfo - information that will be converted to a design matrix; intercept term is added.
data - Data matrix
-
LeastSquaresFit
public LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, cern.colt.matrix.impl.DenseDoubleMatrix2D data, boolean interactions)
- Parameters:
sampleInfo - information that will be converted to a design matrix; intercept term is added.
data - Data matrix
interactions - add interaction term (two-way only is supported)
-
LeastSquaresFit
public LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> b)
NamedMatrix allows easier handling of the results.
- Parameters:
design - information that will be converted to a design matrix; intercept term is added.
b - Data matrix
-
LeastSquaresFit
public LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> data, boolean interactions)
NamedMatrix allows easier handling of the results.
- Parameters:
design - information that will be converted to a design matrix; intercept term is added.
data - Data matrix
interactions - add interaction term (two-way only is supported)
-
-
Method Detail
-
getCoefficients
public cern.colt.matrix.DoubleMatrix2D getCoefficients()
The matrix of coefficients x for Ax = b (parameter estimates). Each column represents one fitted model (e.g., one gene); there is a row for each parameter.
-
getDfPrior
public double getDfPrior()
-
getFitted
public cern.colt.matrix.DoubleMatrix2D getFitted()
-
getResidualDof
public int getResidualDof()
-
getResiduals
public cern.colt.matrix.DoubleMatrix2D getResiduals()
-
getStudentizedResiduals
public cern.colt.matrix.DoubleMatrix2D getStudentizedResiduals()
- Returns:
- externally studentized residuals (assumes we have only one QR)
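As a reminder of what "externally studentized" means, the sketch below computes such residuals for a simple regression with an intercept, using the standard closed-form leverage and the leave-one-out variance estimate. It is a generic illustration under those textbook definitions, not the code behind getStudentizedResiduals; the class and method names are hypothetical.

```java
// Illustrative sketch only; NOT the LeastSquaresFit implementation.
// StudentizedResidualsSketch and externallyStudentized(...) are hypothetical names.
final class StudentizedResidualsSketch {

    /** Externally studentized residuals for the model y = a + b * x. */
    static double[] externallyStudentized(double[] x, double[] y) {
        int n = x.length;
        int p = 2; // number of parameters: intercept and slope
        if (y.length != n || n <= p + 1) {
            throw new IllegalArgumentException("need equal-length vectors with more than 3 values");
        }

        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("design vector has no variance");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;

        // Raw residuals, leverages (diagonal of the hat matrix) and residual sum of squares.
        double[] e = new double[n];
        double[] h = new double[n];
        double sse = 0.0;
        for (int i = 0; i < n; i++) {
            e[i] = y[i] - (intercept + slope * x[i]);
            h[i] = 1.0 / n + (x[i] - meanX) * (x[i] - meanX) / sxx;
            sse += e[i] * e[i];
        }

        // t_i = e_i / sqrt(s2_(i) * (1 - h_i)), where s2_(i) is the residual
        // variance estimated with observation i left out.
        // Note: a perfect leave-one-out fit gives s2_(i) = 0 and a non-finite result.
        double[] t = new double[n];
        for (int i = 0; i < n; i++) {
            double oneMinusH = 1.0 - h[i];
            double s2LeaveOneOut = (sse - e[i] * e[i] / oneMinusH) / (n - p - 1);
            t[i] = e[i] / Math.sqrt(s2LeaveOneOut * oneMinusH);
        }
        return t;
    }
}
```

The leverages depend only on the design, which is consistent with the note that the method assumes a single QR decomposition shared by all fits.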
-
getVarPost
public cern.colt.matrix.DoubleMatrix1D getVarPost()
-
getVarPrior
public double getVarPrior()
-
getWeights
public cern.colt.matrix.DoubleMatrix2D getWeights()
-
isHasBeenShrunken
public boolean isHasBeenShrunken()
-
isHasMissing
public boolean isHasMissing()
-
summarize
public List<LinearModelSummary> summarize()
- Returns:
- summaries. ANOVA will not be computed. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
-
summarize
public List<LinearModelSummary> summarize(boolean anova)
- Parameters:
anova - if true, ANOVA will be computed
-
summarizeByKeys
public Map<String,LinearModelSummary> summarizeByKeys(boolean anova)
- Parameters:
anova - perform ANOVA, otherwise only basic summarization will be done. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
-
anova
protected List<GenericAnovaResult> anova()
Compute ANOVA based on the model fit (Type I SSQ, sequential). The idea is to add up the sums of squares (and degrees of freedom) for all parameters associated with a particular factor.
This code is more or less ported from R's summary.aov.
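The grouping step described above can be sketched in plain Java as follows. The sketch assumes two inputs in the style of R's summary.aov: a vector of per-parameter effects (the components of the response projected onto the orthogonalized design columns) and an assign vector mapping each parameter to its term, with 0 conventionally denoting the intercept. Both inputs, and the names SequentialSsqSketch and sumByTerm, are assumptions made for illustration and are not part of the documented API.

```java
// Illustrative sketch only; NOT the LeastSquaresFit implementation.
// SequentialSsqSketch and sumByTerm(...) are hypothetical names.
final class SequentialSsqSketch {

    /**
     * Adds up squared effects and degrees of freedom per model term.
     *
     * @param effects  one effect per parameter, in model order (assumed input)
     * @param assign   term index for each parameter (assumed input; 0 = intercept)
     * @param numTerms number of distinct terms, including the intercept
     * @return result[0][t] is the sum of squares for term t,
     *         result[1][t] is its degrees of freedom
     */
    static double[][] sumByTerm(double[] effects, int[] assign, int numTerms) {
        if (effects.length != assign.length) {
            throw new IllegalArgumentException("effects and assign must have the same length");
        }
        double[][] result = new double[2][numTerms];
        for (int j = 0; j < effects.length; j++) {
            int term = assign[j];
            if (term < 0 || term >= numTerms) {
                throw new IllegalArgumentException("term index out of range: " + term);
            }
            result[0][term] += effects[j] * effects[j]; // sequential sum of squares
            result[1][term] += 1.0;                     // one degree of freedom per parameter
        }
        return result;
    }
}
```

Because the sums are sequential (Type I), the value attributed to each factor depends on the order in which factors enter the model.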
-
ebayesUpdate
protected void ebayesUpdate(double d, double v, cern.colt.matrix.DoubleMatrix1D vp)
Provide results of limma eBayes algorithm. These will be used next time summarize is called on this.
- Parameters:
d - dfPrior
v - varPrior
vp - varPost
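For orientation, the sketch below shows how the three quantities relate in limma's variance shrinkage: the posterior variance of each fit is a degrees-of-freedom-weighted average of the prior variance and that fit's sample variance. The method shown is a hypothetical helper for illustration; ebayesUpdate itself only receives already-computed values, and how dfPrior and varPrior are estimated is outside this sketch.

```java
// Illustrative sketch only; NOT the LeastSquaresFit implementation.
// EbayesShrinkageSketch and posteriorVariance(...) are hypothetical names.
final class EbayesShrinkageSketch {

    /**
     * Shrinks per-fit sample variances toward a common prior variance.
     *
     * @param dfPrior    prior degrees of freedom (d)
     * @param varPrior   prior variance (v)
     * @param sampleVar  residual variance of each fit
     * @param dfResidual residual degrees of freedom of the fits
     * @return posterior variance of each fit (vp)
     */
    static double[] posteriorVariance(double dfPrior, double varPrior,
                                      double[] sampleVar, double dfResidual) {
        double[] varPost = new double[sampleVar.length];
        for (int i = 0; i < sampleVar.length; i++) {
            if (Double.isInfinite(dfPrior)) {
                // Infinite prior degrees of freedom: the prior dominates completely.
                varPost[i] = varPrior;
            } else {
                varPost[i] = (dfPrior * varPrior + dfResidual * sampleVar[i])
                        / (dfPrior + dfResidual);
            }
        }
        return varPost;
    }
}
```

With dfPrior = 4, varPrior = 1 and 6 residual degrees of freedom, a sample variance of 2.0 shrinks to 1.6 and a sample variance of 0.5 is pulled up to 0.7.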
-
summarize
protected LinearModelSummary summarize(int i)
Compute and organize the various summary statistics for a fit. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
Does not populate the ANOVA.
- Parameters:
i - index of the fit to summarize
-
-