Class LeastSquaresFit


  • public class LeastSquaresFit
    extends Object
    Performs "bulk" linear model fits, and also offers simple methods for univariate and multivariate regression on a single vector of dependent variables (data). Supports eBayes-like shrinkage of variance.

    Data with missing values are handled, but less memory-efficiently and somewhat more slowly. The main reason is that a single QR decomposition can be performed for all of the data only when there are no missing values.
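
The saving from a shared decomposition can be sketched in plain Java. This is an illustration under stated assumptions, not the class's implementation: `BulkFitSketch` and its methods are hypothetical names, the design is limited to an intercept plus one covariate, and the closed-form normal equations stand in for the QR decomposition. The part that depends only on the design is computed once and reused for every row of data.

```java
/** Hypothetical illustration only; not part of LeastSquaresFit. */
final class BulkFitSketch {

    /** Computes P = (X'X)^-1 X' once for the design X = [1, x]; P is 2 x n. */
    static double[][] projector(double[] x) {
        int n = x.length;
        double sx = 0.0;
        double sxx = 0.0;
        for (double v : x) {
            sx += v;
            sxx += v * v;
        }
        double det = n * sxx - sx * sx; // determinant of X'X
        if (det == 0.0) {
            throw new IllegalArgumentException("design is rank deficient");
        }
        double[][] p = new double[2][n];
        for (int i = 0; i < n; i++) {
            p[0][i] = (sxx - sx * x[i]) / det; // row producing the intercept
            p[1][i] = (n * x[i] - sx) / det;   // row producing the slope
        }
        return p;
    }

    /**
     * Fits every row of data (assumed free of missing values) against the same design.
     * The result has one row per parameter and one column per fitted model.
     */
    static double[][] fitAll(double[] x, double[][] data) {
        double[][] p = projector(x); // shared by all rows of data
        double[][] coef = new double[2][data.length];
        for (int r = 0; r < data.length; r++) {
            if (data[r].length != x.length) {
                throw new IllegalArgumentException("row " + r + " has the wrong length");
            }
            for (int i = 0; i < x.length; i++) {
                coef[0][r] += p[0][i] * data[r][i];
                coef[1][r] += p[1][i] * data[r][i];
            }
        }
        return coef;
    }

    public static void main(String[] args) {
        double[][] coef = fitAll(new double[] {0, 1, 2},
                new double[][] {{1, 3, 5}, {4, 4, 4}});
        System.out.println("model 0: intercept=" + coef[0][0] + " slope=" + coef[1][0]);
        System.out.println("model 1: intercept=" + coef[0][1] + " slope=" + coef[1][1]);
    }
}
```

If a row contained missing values, its fit would have to be restricted to the non-missing observations, so the shared quantity could not be reused for that row.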

    Author:
    paul
    • Constructor Detail

      • LeastSquaresFit

        public LeastSquaresFit​(DesignMatrix designMatrix,
                               DoubleMatrix<String,​String> data)
        Preferred interface if you want control over how the design is set up.
        Parameters:
        designMatrix - the design matrix
        data - the data matrix
      • LeastSquaresFit

        public LeastSquaresFit​(DesignMatrix designMatrix,
                               DoubleMatrix<String,​String> data,
                               cern.colt.matrix.DoubleMatrix2D weights)
        Weighted least squares fit between two matrices
        Parameters:
        designMatrix - the design matrix
        data - the data matrix
        weights - to be used in modifying the influence of the observations in data.
      • LeastSquaresFit

        public LeastSquaresFit​(DesignMatrix designMatrix,
                               cern.colt.matrix.DoubleMatrix2D b,
                               cern.colt.matrix.DoubleMatrix2D weights)
        Preferred interface for weighted least squares fit between two matrices
        Parameters:
        designMatrix - the design matrix
        b - the data
        weights - to be used in modifying the influence of the observations in b.
      • LeastSquaresFit

        public LeastSquaresFit​(cern.colt.matrix.DoubleMatrix1D vectorA,
                               cern.colt.matrix.DoubleMatrix1D vectorB)
        Least squares fit between two vectors. Always adds an intercept!
        Parameters:
        vectorA - Design
        vectorB - Data
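
As a rough picture of what a two-vector fit with an automatically added intercept computes, the following sketch solves y = b0 + b1 * x by ordinary least squares. `SimpleFitSketch` is a hypothetical stand-alone helper written for this illustration; it does not call or reproduce the class's code.

```java
/** Hypothetical illustration only; not part of LeastSquaresFit. */
final class SimpleFitSketch {

    /** Returns {intercept, slope} of the least squares line through (x, y). */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two vectors of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0; // centered cross-product
        double sxx = 0.0; // centered sum of squares of x
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x is constant; slope is not identifiable");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX; // the intercept term added to the design
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        double[] b = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("intercept=" + b[0] + " slope=" + b[1]);
    }
}
```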
      • LeastSquaresFit

        public LeastSquaresFit​(cern.colt.matrix.DoubleMatrix1D vectorA,
                               cern.colt.matrix.DoubleMatrix1D vectorB,
                               cern.colt.matrix.DoubleMatrix1D weights)
        Stripped-down interface for simple use. Least squares fit between two vectors. Always adds an intercept!
        Parameters:
        vectorA - Design
        vectorB - Data
        weights - to be used in modifying the influence of the observations in vectorB.
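
How weights change the influence of observations can be sketched as follows. The example assumes the usual weighted least squares convention, in which each squared residual is multiplied by its weight; this page does not state which convention the class uses, and `WeightedFitSketch` is a hypothetical helper.

```java
/** Hypothetical illustration only; not part of LeastSquaresFit. */
final class WeightedFitSketch {

    /** Returns {intercept, slope} minimizing sum_i w[i] * (y[i] - b0 - b1 * x[i])^2. */
    static double[] fit(double[] x, double[] y, double[] w) {
        if (x.length != y.length || x.length != w.length) {
            throw new IllegalArgumentException("x, y and w must have equal length");
        }
        double sw = 0.0;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            sw += w[i];
            meanX += w[i] * x[i];
            meanY += w[i] * y[i];
        }
        if (sw <= 0.0) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        meanX /= sw; // weighted means
        meanY /= sw;
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxy += w[i] * dx * (y[i] - meanY);
            sxx += w[i] * dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("weighted x has no spread");
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2};
        double[] y = {0, 1, 5};
        double[] equal = fit(x, y, new double[] {1, 1, 1});
        double[] dropLast = fit(x, y, new double[] {1, 1, 0});
        System.out.println("equal weights: " + equal[0] + ", " + equal[1]);
        System.out.println("last point weighted 0: " + dropLast[0] + ", " + dropLast[1]);
    }
}
```

Setting a weight to zero removes that observation's influence entirely, which is why the second fit passes through the first two points only.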
      • LeastSquaresFit

        public LeastSquaresFit​(cern.colt.matrix.DoubleMatrix2D A,
                               cern.colt.matrix.DoubleMatrix2D b)
        ANOVA is not possible with this constructor (use the other constructors).
        Parameters:
        A - Design matrix, which will be used directly in least squares regression
        b - Data matrix, containing data in rows.
      • LeastSquaresFit

        public LeastSquaresFit​(cern.colt.matrix.DoubleMatrix2D A,
                               cern.colt.matrix.DoubleMatrix2D b,
                               cern.colt.matrix.DoubleMatrix2D weights)
        Weighted least squares fit between two matrices
        Parameters:
        A - Design
        b - Data
        weights - to be used in modifying the influence of the observations in b. If null, will be ignored.
      • LeastSquaresFit

        public LeastSquaresFit​(ObjectMatrix<String,​String,​Object> sampleInfo,
                               cern.colt.matrix.impl.DenseDoubleMatrix2D data)
        Parameters:
        sampleInfo - information that will be converted to a design matrix; intercept term is added.
        data - Data matrix
      • LeastSquaresFit

        public LeastSquaresFit​(ObjectMatrix<String,​String,​Object> sampleInfo,
                               cern.colt.matrix.impl.DenseDoubleMatrix2D data,
                               boolean interactions)
        Parameters:
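        sampleInfo - information that will be converted to a design matrix; intercept term is added.
        data - Data matrix
        interactions - if true, add interaction terms (only two-way interactions are supported)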
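
A two-way interaction term is conventionally the element-wise product of two main-effect columns. The sketch below assumes two factors that are already coded as 0/1 indicator columns; how the class actually codes sampleInfo is not described on this page, and `InteractionDesignSketch` is a hypothetical helper.

```java
/** Hypothetical illustration only; not part of LeastSquaresFit. */
final class InteractionDesignSketch {

    /** Builds design rows [1, a, b], or [1, a, b, a*b] when interactions is true. */
    static double[][] design(double[] a, double[] b, boolean interactions) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("a and b must have equal length");
        }
        int cols = interactions ? 4 : 3;
        double[][] d = new double[a.length][cols];
        for (int i = 0; i < a.length; i++) {
            d[i][0] = 1.0;  // intercept
            d[i][1] = a[i]; // main effect of the first factor
            d[i][2] = b[i]; // main effect of the second factor
            if (interactions) {
                d[i][3] = a[i] * b[i]; // two-way interaction
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] d = design(new double[] {0, 1, 0, 1}, new double[] {0, 0, 1, 1}, true);
        for (double[] row : d) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```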
      • LeastSquaresFit

        public LeastSquaresFit​(ObjectMatrix<String,​String,​Object> design,
                               DoubleMatrix<String,​String> b)
        NamedMatrix allows easier handling of the results.
        Parameters:
        design - information that will be converted to a design matrix; intercept term is added.
        b - Data matrix
      • LeastSquaresFit

        public LeastSquaresFit​(ObjectMatrix<String,​String,​Object> design,
                               DoubleMatrix<String,​String> data,
                               boolean interactions)
        NamedMatrix allows easier handling of the results.
        Parameters:
        design - information that will be converted to a design matrix; intercept term is added.
        data - Data matrix
        interactions - if true, add interaction terms (only two-way interactions are supported)
    • Method Detail

      • getCoefficients

        public cern.colt.matrix.DoubleMatrix2D getCoefficients()
        The matrix of coefficients x for Ax = b (parameter estimates). Each column represents one fitted model (e.g., one gene); there is a row for each parameter.
        Returns:
        the coefficient matrix
      • getDfPrior

        public double getDfPrior()
      • getFitted

        public cern.colt.matrix.DoubleMatrix2D getFitted()
      • getResidualDof

        public int getResidualDof()
      • getResidualDofs

        public List<Integer> getResidualDofs()
      • getResiduals

        public cern.colt.matrix.DoubleMatrix2D getResiduals()
      • getStudentizedResiduals

        public cern.colt.matrix.DoubleMatrix2D getStudentizedResiduals()
        Returns:
        externally studentized residuals (assumes a single QR decomposition was used for all of the data)
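
For reference, the textbook definition of an externally studentized residual is t_i = e_i / (s_(i) * sqrt(1 - h_ii)), where h_ii is the leverage and s_(i)^2 = (RSS - e_i^2 / (1 - h_ii)) / (n - p - 1) is the residual variance with observation i deleted. The sketch below applies that definition to a simple regression with an intercept (p = 2). `StudentizedResidualSketch` is a hypothetical helper, and whether the class computes the quantity in exactly this way is not shown here.

```java
/** Hypothetical illustration only; not part of LeastSquaresFit. */
final class StudentizedResidualSketch {

    /** Externally studentized residuals for the fit y = b0 + b1 * x. */
    static double[] externallyStudentized(double[] x, double[] y) {
        int n = x.length;
        int p = 2; // intercept and slope
        if (y.length != n || n <= p + 1) {
            throw new IllegalArgumentException("need equal lengths and more than p + 1 points");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x is constant");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        double[] e = new double[n];
        double rss = 0.0;
        for (int i = 0; i < n; i++) {
            e[i] = y[i] - (intercept + slope * x[i]);
            rss += e[i] * e[i];
        }
        double[] t = new double[n];
        for (int i = 0; i < n; i++) {
            double h = 1.0 / n + (x[i] - meanX) * (x[i] - meanX) / sxx; // leverage h_ii
            // Residual variance of the fit with observation i left out.
            double s2Deleted = (rss - e[i] * e[i] / (1.0 - h)) / (n - p - 1);
            // Not finite if the remaining points are fitted exactly (s2Deleted == 0).
            t[i] = e[i] / Math.sqrt(s2Deleted * (1.0 - h));
        }
        return t;
    }

    public static void main(String[] args) {
        double[] t = externallyStudentized(new double[] {0, 1, 2, 3}, new double[] {0, 1, 3, 4});
        System.out.println(java.util.Arrays.toString(t));
    }
}
```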
      • getVarPost

        public cern.colt.matrix.DoubleMatrix1D getVarPost()
      • getVarPrior

        public double getVarPrior()
      • getWeights

        public cern.colt.matrix.DoubleMatrix2D getWeights()
      • isHasBeenShrunken

        public boolean isHasBeenShrunken()
      • isHasMissing

        public boolean isHasMissing()
      • summarize

        public List<LinearModelSummary> summarize()
        Returns:
        summaries. ANOVA will not be computed. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
      • summarize

        public List<LinearModelSummary> summarize​(boolean anova)
        Parameters:
        anova - if true, ANOVA will be computed
        Returns:
        summaries, one per fit
      • summarizeByKeys

        public Map<String,​LinearModelSummary> summarizeByKeys​(boolean anova)
        Parameters:
        anova - if true, perform ANOVA; otherwise only basic summarization will be done. If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.
        Returns:
        summaries, indexed by key
      • anova

        protected List<GenericAnovaResult> anova()
        Compute ANOVA based on the model fit, using Type I (sequential) sums of squares.

        The idea is to add up the sums of squares (and dof) for all parameters associated with a particular factor.

        This code is more or less ported from R summary.aov.

        Returns:
        the ANOVA results
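
The pooling step described above can be sketched as follows. The example assumes that a sequential sum of squares is already available for each parameter (one degree of freedom each) together with a mapping from parameters to factors; the names `SequentialAnovaSketch`, `ssPerParam` and `factorOfParam` are hypothetical, and the F statistic uses the residual mean square as the denominator.

```java
/** Hypothetical illustration only; not part of LeastSquaresFit. */
final class SequentialAnovaSketch {

    /**
     * Pools per-parameter sums of squares by factor.
     * Returns one row per factor: {sum of squares, degrees of freedom, F statistic}.
     */
    static double[][] pool(double[] ssPerParam, int[] factorOfParam, int numFactors,
            double residualSs, int residualDof) {
        if (ssPerParam.length != factorOfParam.length) {
            throw new IllegalArgumentException("one factor index is needed per parameter");
        }
        if (residualDof <= 0) {
            throw new IllegalArgumentException("residual degrees of freedom must be positive");
        }
        double[][] out = new double[numFactors][3];
        for (int j = 0; j < ssPerParam.length; j++) {
            int f = factorOfParam[j];
            out[f][0] += ssPerParam[j]; // add up the sums of squares for this factor
            out[f][1] += 1.0;           // each parameter contributes one degree of freedom
        }
        double residualMs = residualSs / residualDof;
        for (int f = 0; f < numFactors; f++) {
            // NaN if a factor has no parameters assigned to it (0 / 0).
            out[f][2] = (out[f][0] / out[f][1]) / residualMs;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] table = pool(new double[] {4, 1, 3}, new int[] {0, 1, 1}, 2, 8.0, 4);
        for (double[] row : table) {
            System.out.println("SS=" + row[0] + " dof=" + row[1] + " F=" + row[2]);
        }
    }
}
```

Because the sums of squares are sequential, the result depends on the order in which factors enter the model.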
      • ebayesUpdate

        protected void ebayesUpdate​(double d,
                                    double v,
                                    cern.colt.matrix.DoubleMatrix1D vp)
        Provide the results of the limma eBayes algorithm. These will be used the next time summarize is called on this object.
        Parameters:
        d - dfPrior
        v - varPrior
        vp - varPost
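
To show how the three values relate, the sketch below uses limma's moderated variance formula, in which the posterior variance is a degrees-of-freedom-weighted average of the prior variance and the residual variance of each fit. `VarianceShrinkageSketch` and its method names are hypothetical; estimating dfPrior and varPrior from the data is a separate step that is not shown.

```java
/** Hypothetical illustration only; not part of LeastSquaresFit. */
final class VarianceShrinkageSketch {

    /**
     * Posterior variance for one fit:
     * (dfPrior * varPrior + dfResidual * varResidual) / (dfPrior + dfResidual).
     */
    static double posteriorVariance(double dfPrior, double varPrior,
            double dfResidual, double varResidual) {
        if (Double.isInfinite(dfPrior)) {
            return varPrior; // an infinitely strong prior replaces the sample variance
        }
        return (dfPrior * varPrior + dfResidual * varResidual) / (dfPrior + dfResidual);
    }

    /** Degrees of freedom to use with the posterior variance. */
    static double posteriorDof(double dfPrior, double dfResidual) {
        return dfPrior + dfResidual;
    }

    public static void main(String[] args) {
        System.out.println(posteriorVariance(4.0, 1.0, 6.0, 3.0));
        System.out.println(posteriorDof(4.0, 6.0));
    }
}
```

With dfPrior equal to zero the posterior variance reduces to the ordinary residual variance, so no shrinkage takes place.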
      • summarize

        protected LinearModelSummary summarize​(int i)
        Compute and organize the various summary statistics for a fit.

        If ebayesUpdate has been run, variance and degrees of freedom estimated using the limma eBayes algorithm will be used.

        Does not populate the ANOVA.

        Parameters:
        i - index of the fit to summarize
        Returns:
        the summary for fit i