Class LeastSquaresFit

java.lang.Object
ubic.basecode.math.linearmodels.LeastSquaresFit

public class LeastSquaresFit extends Object
Performs "bulk" linear model fits, and also offers simple methods for univariate and multivariate regression on a single vector of dependent variables (data). Supports eBayes-like shrinkage of variance.

Data with missing values is handled, but the fit is less memory efficient and somewhat slower. The reason is that when there are no missing values, a single QR decomposition can be performed and reused; with missing values that shortcut is not available.
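The cost difference described above can be sketched in plain Java. This is an illustration, not the class's implementation: it uses the closed-form solution for a one-predictor model with an intercept in place of a QR decomposition, and the names BulkFitSketch, fitRows and fitRowSkippingMissing are hypothetical. With complete data the design summaries are computed once and shared by every row; a row containing NaN needs its own summaries.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is not the LeastSquaresFit implementation. */
class BulkFitSketch {

    /**
     * Complete data: fits y = b0 + b1 * x to every row, reusing the design
     * summaries (mean of x, sum of squares of x) that are computed once.
     * Returns a 2 x (number of rows) array: row 0 holds intercepts and
     * row 1 holds slopes, so each column is one fitted model.
     */
    static double[][] fitRows(double[] x, double[][] data) {
        double meanX = Arrays.stream(x).average().orElse(Double.NaN);
        double sxx = 0.0;
        for (double xi : x) {
            sxx += (xi - meanX) * (xi - meanX);
        }
        double[][] coef = new double[2][data.length];
        for (int r = 0; r < data.length; r++) {
            double meanY = Arrays.stream(data[r]).average().orElse(Double.NaN);
            double sxy = 0.0;
            for (int i = 0; i < x.length; i++) {
                sxy += (x[i] - meanX) * (data[r][i] - meanY);
            }
            coef[1][r] = sxy / sxx;
            coef[0][r] = meanY - coef[1][r] * meanX;
        }
        return coef;
    }

    /** Missing data (NaN in y): the design summaries are recomputed for this row. Returns {intercept, slope}. */
    static double[] fitRowSkippingMissing(double[] x, double[] y) {
        int m = 0;
        double sumX = 0.0, sumY = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(y[i])) continue;
            m++;
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / m, meanY = sumY / m;
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(y[i])) continue;
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 4 };
        double[][] data = { { 3, 5, 7, 9 }, { 4, 3, 2, 1 } };
        System.out.println(Arrays.deepToString(fitRows(x, data)));
        System.out.println(Arrays.toString(fitRowSkippingMissing(x, new double[] { 3, Double.NaN, 7, 9 })));
    }
}
```

The array returned by fitRows follows the layout documented for getCoefficients: one column per fitted model and one row per parameter.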

Author:
paul
  • Constructor Details

    • LeastSquaresFit

      public LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data)
      Preferred interface if you want control over how the design is set up.
      Parameters:
      designMatrix -
      data -
    • LeastSquaresFit

      public LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix<String,String> data, DoubleMatrix2D weights)
      Weighted least squares fit between two matrices
      Parameters:
      designMatrix -
      data -
      weights - to be used in modifying the influence of the observations in data.
    • LeastSquaresFit

      public LeastSquaresFit(DesignMatrix designMatrix, DoubleMatrix2D b, DoubleMatrix2D weights)
      Preferred interface for weighted least squares fit between two matrices
      Parameters:
      designMatrix -
      b - the data
      weights - to be used in modifying the influence of the observations in b.
    • LeastSquaresFit

      public LeastSquaresFit(DoubleMatrix1D vectorA, DoubleMatrix1D vectorB)
      Least squares fit between two vectors. Always adds an intercept!
      Parameters:
      vectorA - Design
      vectorB - Data
    • LeastSquaresFit

      public LeastSquaresFit(DoubleMatrix1D vectorA, DoubleMatrix1D vectorB, DoubleMatrix1D weights)
      Stripped-down interface for simple use. Least squares fit between two vectors. Always adds an intercept!
      Parameters:
      vectorA - Design
      vectorB - Data
      weights - to be used in modifying the influence of the observations in vectorB.
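As a rough illustration of what a weighted fit between two vectors with an automatically added intercept computes, the following sketch minimizes the weighted sum of squared residuals. It assumes the conventional meaning of weights (each weight multiplies the squared residual of its observation); whether this class uses exactly that convention is not stated here. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is not the LeastSquaresFit implementation. */
class WeightedLineSketch {

    /**
     * Minimizes sum_i w[i] * (y[i] - b0 - b1 * x[i])^2 and returns {b0, b1}.
     * The intercept b0 is always part of the model.
     */
    static double[] fit(double[] x, double[] y, double[] w) {
        double sw = 0.0, swx = 0.0, swy = 0.0;
        for (int i = 0; i < x.length; i++) {
            sw += w[i];
            swx += w[i] * x[i];
            swy += w[i] * y[i];
        }
        double meanX = swx / sw; // weighted mean of the design values
        double meanY = swy / sw; // weighted mean of the data values
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxx += w[i] * (x[i] - meanX) * (x[i] - meanX);
            sxy += w[i] * (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }

    public static void main(String[] args) {
        double[] x = { 0, 1, 2 };
        double[] y = { 0, 1, 5 };
        // Equal weights reproduce the ordinary least squares line.
        System.out.println(Arrays.toString(fit(x, y, new double[] { 1, 1, 1 })));
        // A zero weight removes the third observation's influence entirely.
        System.out.println(Arrays.toString(fit(x, y, new double[] { 1, 1, 0 })));
    }
}
```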
    • LeastSquaresFit

      public LeastSquaresFit(DoubleMatrix2D A, DoubleMatrix2D b)
      Least squares fit between two matrices. ANOVA is not possible with this constructor (use one of the other constructors).
      Parameters:
      A - Design matrix, which will be used directly in least squares regression
      b - Data matrix, containing data in rows.
    • LeastSquaresFit

      public LeastSquaresFit(DoubleMatrix2D A, DoubleMatrix2D b, DoubleMatrix2D weights)
      Weighted least squares fit between two matrices
      Parameters:
      A - Design
      b - Data
      weights - to be used in modifying the influence of the observations in b. If null, will be ignored.
    • LeastSquaresFit

      public LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, DenseDoubleMatrix2D data)
      Parameters:
      sampleInfo - information that will be converted to a design matrix; intercept term is added.
      data - Data matrix
    • LeastSquaresFit

      public LeastSquaresFit(ObjectMatrix<String,String,Object> sampleInfo, DenseDoubleMatrix2D data, boolean interactions)
      Parameters:
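      sampleInfo - information that will be converted to a design matrix; intercept term is added.
      data - Data matrix
      interactions - if true, add interaction terms (only two-way interactions are supported)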
    • LeastSquaresFit

      public LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> b)
      NamedMatrix allows easier handling of the results.
      Parameters:
      design - information that will be converted to a design matrix; intercept term is added.
      b - Data matrix
    • LeastSquaresFit

      public LeastSquaresFit(ObjectMatrix<String,String,Object> design, DoubleMatrix<String,String> data, boolean interactions)
      NamedMatrix allows easier handling of the results.
      Parameters:
      design - information that will be converted to a design matrix; intercept term is added.
      data - Data matrix
      interactions - if true, add interaction terms (only two-way interactions are supported)
  • Method Details

    • getCoefficients

      public DoubleMatrix2D getCoefficients()
      The matrix of coefficients x for Ax = b (parameter estimates). Each column represents one fitted model (e.g., one gene); there is a row for each parameter.
      Returns:
    • getDfPrior

      public double getDfPrior()
    • getFitted

      public DoubleMatrix2D getFitted()
    • getResidualDof

      public int getResidualDof()
    • getResidualDofs

      public List<Integer> getResidualDofs()
    • getResiduals

      public DoubleMatrix2D getResiduals()
    • getStudentizedResiduals

      public DoubleMatrix2D getStudentizedResiduals()
      Returns:
      externally studentized residuals (assumes we have only one QR)
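For readers unfamiliar with the term, the sketch below computes externally studentized residuals for a one-predictor model with an intercept, using the standard textbook definition: each residual is scaled by a variance estimate that leaves that observation out. It is not taken from this class, and the names are hypothetical. The leverages depend only on the design, which is consistent with the single-QR assumption noted above, though that connection is an inference.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is not the LeastSquaresFit implementation. */
class StudentizedResidualSketch {

    /** Externally studentized residuals for y = b0 + b1 * x (requires more than 3 observations). */
    static double[] externallyStudentized(double[] x, double[] y) {
        int n = x.length;
        int p = 2; // number of parameters: intercept and slope
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;

        double[] e = new double[n]; // ordinary residuals
        double[] h = new double[n]; // leverages (diagonal of the hat matrix)
        double rss = 0.0;
        for (int i = 0; i < n; i++) {
            e[i] = y[i] - (intercept + slope * x[i]);
            h[i] = 1.0 / n + (x[i] - meanX) * (x[i] - meanX) / sxx;
            rss += e[i] * e[i];
        }

        double[] t = new double[n];
        for (int i = 0; i < n; i++) {
            // Residual variance estimated with observation i left out.
            double s2 = (rss - e[i] * e[i] / (1.0 - h[i])) / (n - p - 1);
            t[i] = e[i] / Math.sqrt(s2 * (1.0 - h[i]));
        }
        return t;
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 4, 5 };
        double[] y = { 2, 1, 4, 3, 5 };
        System.out.println(Arrays.toString(externallyStudentized(x, y)));
    }
}
```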
    • getVarPost

      public DoubleMatrix1D getVarPost()
    • getVarPrior

      public double getVarPrior()
    • getWeights

      public DoubleMatrix2D getWeights()
    • isHasBeenShrunken

      public boolean isHasBeenShrunken()
    • isHasMissing

      public boolean isHasMissing()
    • summarize

      public List<LinearModelSummary> summarize()
      Returns:
      summaries. ANOVA will not be computed. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
    • summarize

      public List<LinearModelSummary> summarize(boolean anova)
      Parameters:
      anova - if true, ANOVA will be computed
      Returns:
    • summarizeByKeys

      public Map<String,LinearModelSummary> summarizeByKeys(boolean anova)
      Parameters:
      anova - perform ANOVA, otherwise only basic summarization will be done. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
      Returns:
    • anova

      protected List<GenericAnovaResult> anova()
      Compute ANOVA based on the model fit (Type I SSQ, sequential)

      The idea is to add up the sums of squares (and dof) for all parameters associated with a particular factor.

      This code is more or less ported from R summary.aov.

      Returns:
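The sequential (Type I) idea can be sketched as follows: terms are added to the model one at a time, and each term is credited with the drop in residual sum of squares it produces given the terms before it. This is a simplified illustration, not the ported summary.aov logic. It solves the normal equations directly instead of using a QR decomposition, treats every design column as its own term, and uses hypothetical names. For a factor coded by several columns, the contributions of those columns would be added together, as the description above says.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is not the LeastSquaresFit implementation. */
class SequentialAnovaSketch {

    /** Residual sum of squares after regressing y on the first k columns of X. */
    static double rss(double[][] X, double[] y, int k) {
        int n = y.length;
        // Augmented normal equations [X'X | X'y] restricted to the first k columns.
        double[][] a = new double[k][k + 1];
        for (int r = 0; r < k; r++) {
            for (int c = 0; c < k; c++) {
                for (int i = 0; i < n; i++) a[r][c] += X[i][r] * X[i][c];
            }
            for (int i = 0; i < n; i++) a[r][k] += X[i][r] * y[i];
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int p = 0; p < k; p++) {
            int best = p;
            for (int r = p + 1; r < k; r++) {
                if (Math.abs(a[r][p]) > Math.abs(a[best][p])) best = r;
            }
            double[] tmp = a[p];
            a[p] = a[best];
            a[best] = tmp;
            for (int r = 0; r < k; r++) {
                if (r == p) continue;
                double f = a[r][p] / a[p][p];
                for (int c = p; c <= k; c++) a[r][c] -= f * a[p][c];
            }
        }
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            double fitted = 0.0;
            for (int c = 0; c < k; c++) fitted += X[i][c] * a[c][k] / a[c][c];
            total += (y[i] - fitted) * (y[i] - fitted);
        }
        return total;
    }

    /** Sequential sums of squares for columns 1..p-1 of X; column 0 is assumed to be the intercept. */
    static double[] sequentialSumsOfSquares(double[][] X, double[] y) {
        int p = X[0].length;
        double[] ss = new double[p - 1];
        double previous = rss(X, y, 1); // intercept-only model
        for (int k = 2; k <= p; k++) {
            double current = rss(X, y, k);
            ss[k - 2] = previous - current; // reduction due to adding column k-1
            previous = current;
        }
        return ss;
    }

    public static void main(String[] args) {
        double[][] X = { { 1, -1, -1 }, { 1, -1, 1 }, { 1, 1, -1 }, { 1, 1, 1 } };
        double[] y = { -4, 2, 0, 6 };
        System.out.println(Arrays.toString(sequentialSumsOfSquares(X, y)));
    }
}
```

Because the sums of squares are sequential, reordering the columns of a non-orthogonal design generally changes how the total is divided among the terms.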
    • ebayesUpdate

      protected void ebayesUpdate(double d, double v, DoubleMatrix1D vp)
      Provide the results of the limma eBayes algorithm. These will be used the next time summarize is called on this object.
      Parameters:
      d - dfPrior
      v - varPrior
      vp - varPost
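To show how the three values relate, the sketch below applies the posterior variance formula used by limma's variance shrinkage: each sample variance is pulled toward the prior variance, weighted by the respective degrees of freedom. Whether the caller of ebayesUpdate computes vp exactly this way is an assumption, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is not the LeastSquaresFit implementation. */
class VarianceShrinkageSketch {

    /**
     * Posterior variance per fitted model:
     * (dfPrior * varPrior + residualDof * sampleVar) / (dfPrior + residualDof).
     * An infinite dfPrior means complete shrinkage to varPrior.
     */
    static double[] posteriorVariances(double[] sampleVar, double residualDof, double dfPrior, double varPrior) {
        double[] post = new double[sampleVar.length];
        for (int i = 0; i < sampleVar.length; i++) {
            if (Double.isInfinite(dfPrior)) {
                post[i] = varPrior;
            } else {
                post[i] = (dfPrior * varPrior + residualDof * sampleVar[i]) / (dfPrior + residualDof);
            }
        }
        return post;
    }

    public static void main(String[] args) {
        double[] sampleVar = { 1.0, 4.0 };
        System.out.println(Arrays.toString(posteriorVariances(sampleVar, 6, 4, 2.0)));
    }
}
```

Under this formula the moderated statistics use dfPrior plus the residual degrees of freedom as their total degrees of freedom, which is why both d and vp are needed when summarizing.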