Class DescriptiveWithMissing


  • public class DescriptiveWithMissing
    extends cern.jet.stat.Descriptive
    Mathematical functions for statistics that allow missing values without scotching the calculations.

    Be careful: some methods from cern.jet.stat.Descriptive have not been overridden and will throw an UnsupportedOperationException if used.

    Some functions that come with DoubleArrayLists will not work in an entirely compatible way with missing values. For example, size() reports the total number of elements, including missing values. To get a count of non-missing values, use sizeWithoutMissingValues(data). Which of the two is appropriate depends on the calculation.

    Not all methods need to be overridden. However, all methods that take a "size" parameter should be passed the results of sizeWithoutMissingValues(data), instead of data.size().
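
    As a rough illustration of why the size argument matters, the sketch below uses a plain double[] with NaN as the missing-value marker instead of a DoubleArrayList. The names SizeSketch, countNonMissing and meanWith are hypothetical and are not part of this class.

```java
// Hypothetical sketch: NaN marks a missing value in a plain array.
class SizeSketch {
    // Counts the entries that are not NaN.
    static int countNonMissing(double[] data) {
        int n = 0;
        for (double v : data) {
            if (!Double.isNaN(v)) {
                n++;
            }
        }
        return n;
    }

    // Sums the non-missing entries and divides by the supplied size.
    static double meanWith(double[] data, int size) {
        double sum = 0.0;
        for (double v : data) {
            if (!Double.isNaN(v)) {
                sum += v;
            }
        }
        return sum / size;
    }

    public static void main(String[] args) {
        double[] data = {2.0, Double.NaN, 4.0, Double.NaN};
        // Dividing by the raw length counts the two missing slots as observations.
        System.out.println(meanWith(data, data.length));
        // Dividing by the non-missing count gives the mean of the observed values.
        System.out.println(meanWith(data, countNonMissing(data)));
    }
}
```

    Passing the raw length silently biases the result toward zero, which is the mistake the note above warns against.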

    Based in part on code from the colt package: Copyright © 1999 CERN - European Organization for Nuclear Research.

    Author:
    Paul Pavlidis
    See Also:
    cern.jet.stat.Descriptive
    • Method Summary

      Modifier and Type Method Description
      static double autoCorrelation​(cern.colt.list.DoubleArrayList data, int lag, double mean, double variance)
      Not supported.
      static double correlation​(double[] x, double[] y, double[] selfSquaredX, double[] selfSquaredY, boolean[] nanStatusX, boolean[] nanStatusY)
      Highly optimized version of the correlation computation, where as much information is precomputed as possible.
      static double correlation​(cern.colt.list.DoubleArrayList data1, double standardDev1, cern.colt.list.DoubleArrayList data2, double standardDev2)
      Returns the correlation of two data sequences.
      static double correlation​(cern.colt.list.DoubleArrayList x, cern.colt.list.DoubleArrayList y)
      Calculate the Pearson correlation of two arrays.
      static double covariance​(cern.colt.list.DoubleArrayList data1, cern.colt.list.DoubleArrayList data2)
      Returns the SAMPLE covariance of two data sequences.
      static double durbinWatson​(cern.colt.list.DoubleArrayList data)
      Durbin-Watson computation.
      static double geometricMean​(cern.colt.list.DoubleArrayList data)
      Returns the geometric mean of a data sequence.
      static void incrementalUpdate​(cern.colt.list.DoubleArrayList data, int from, int to, double[] inOut)
      Incrementally maintains and updates minimum, maximum, sum and sum of squares of a data sequence.
      static void incrementalUpdateSumsOfPowers​(cern.colt.list.DoubleArrayList data, int from, int to, int fromSumIndex, int toSumIndex, double[] sumOfPowers)
      Not supported.
      static void incrementalWeightedUpdate​(cern.colt.list.DoubleArrayList data, cern.colt.list.DoubleArrayList weights, int from, int to, double[] inOut)
      Not supported.
      static double kurtosis​(double moment4, double standardDeviation)
      Returns the kurtosis (aka excess) of a data sequence.
      static double kurtosis​(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation)
      Returns the kurtosis (aka excess) of a data sequence, which is -3 + moment(data,4,mean) / standardDeviation^4.
      static double lag1​(cern.colt.list.DoubleArrayList data, double mean)
      Not supported.
      static double mad​(cern.colt.list.DoubleArrayList dat)  
      static double max​(cern.colt.list.DoubleArrayList input)  
      static double mean​(double[] elements, int effectiveSize)
      Special mean calculation where we use the effective size as an input.
      static double mean​(cern.colt.list.DoubleArrayList data)  
      static double mean​(cern.colt.list.DoubleArrayList x, int effectiveSize)
      Special mean calculation where we use the effective size as an input.
      static double meanAboveQuantile​(double quantile, cern.colt.list.DoubleArrayList array)
      Calculate the mean of the values above a particular quantile of an array.
      static double median​(cern.colt.list.DoubleArrayList data)
      Returns the median.
      static double min​(cern.colt.list.DoubleArrayList input)  
      static double moment​(cern.colt.list.DoubleArrayList data, int k, double c)
      Returns the moment of k-th order with constant c of a data sequence, which is Sum( (data[i]-c)^k ) / data.size().
      static double product​(cern.colt.list.DoubleArrayList data)
      Returns the product of a data sequence, which is Prod( data[i] ).
      static double quantile​(cern.colt.list.DoubleArrayList data, double phi)
      Returns the phi-quantile; that is, an element elem such that a fraction phi of the data elements are less than elem.
      static double quantileInverse​(cern.colt.list.DoubleArrayList data, double element)
      Returns the fraction of the elements in data that are <= element.
      static cern.colt.list.DoubleArrayList quantiles​(cern.colt.list.DoubleArrayList sortedData, cern.colt.list.DoubleArrayList percentages)
      Returns the quantiles of the specified percentages.
      static double rankInterpolated​(cern.colt.list.DoubleArrayList sortedList, double element)
      Returns the linearly interpolated number of elements in a list less than or equal to a given element.
      static cern.colt.list.DoubleArrayList removeMissing​(cern.colt.list.DoubleArrayList data)
      Makes a copy of the list that doesn't have the missing values.
      static double sampleKurtosis​(cern.colt.list.DoubleArrayList data, double mean, double sampleVariance)
      Returns the sample kurtosis (aka excess) of a data sequence.
      static double sampleSkew​(cern.colt.list.DoubleArrayList data, double mean, double sampleVariance)
      Returns the sample skew of a data sequence.
      static double sampleStandardDeviation​(int size, double sampleVariance)
      Returns the sample standard deviation.
      static double sampleVariance​(cern.colt.list.DoubleArrayList data, double mean)
      Returns the sample variance of a data sequence.
      static int sizeWithoutMissingValues​(cern.colt.list.DoubleArrayList list)
      Return the size of the list, ignoring missing values.
      static double skew​(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation)
      Returns the skew of a data sequence, which is moment(data,3,mean) / standardDeviation^3.
      static void standardize​(cern.colt.list.DoubleArrayList data)
      Standardize.
      static void standardize​(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation)
      Modifies a data sequence to be standardized.
      static double sum​(cern.colt.list.DoubleArrayList data)
      Returns the sum of a data sequence.
      static double sumOfInversions​(cern.colt.list.DoubleArrayList data, int from, int to)
      Returns the sum of inversions of a data sequence, which is Sum( 1.0 / data[i]).
      static double sumOfLogarithms​(cern.colt.list.DoubleArrayList data, int from, int to)
      Returns the sum of logarithms of a data sequence, which is Sum( Log(data[i]) ).
      static double sumOfPowerDeviations​(cern.colt.list.DoubleArrayList data, int k, double c)
      Returns Sum( (data[i]-c)^k ); optimized for common parameters like c == 0.0 and/or k == -2 .. 4.
      static double sumOfPowerDeviations​(cern.colt.list.DoubleArrayList data, int k, double c, int from, int to)
      Returns Sum( (data[i]-c)^k ) for all i = from .. to.
      static double sumOfPowers​(cern.colt.list.DoubleArrayList data, int k)
      Returns the sum of powers of a data sequence, which is Sum ( data[i]^k ).
      static double sumOfSquaredDeviations​(cern.colt.list.DoubleArrayList data)
      Compute the sum of the squared deviations from the mean of a data sequence.
      static double sumOfSquares​(cern.colt.list.DoubleArrayList data)
      Returns the sum of squares of a data sequence.
      static double trimmedMean​(cern.colt.list.DoubleArrayList sortedData, double mean, int left, int right)
      Returns the trimmed mean of a sorted data sequence.
      static double variance​(int sizeWithoutMissing, double sum, double sumOfSquares)  
      static double variance​(cern.colt.list.DoubleArrayList data)
      Provided for convenience!
      static double weightedMean​(cern.colt.list.DoubleArrayList data, cern.colt.list.DoubleArrayList weights)
      Returns the weighted mean of a data sequence.
      static double winsorizedMean​(cern.colt.list.DoubleArrayList sortedData, double mean, int left, int right)
      Returns the winsorized mean of a sorted data sequence.
      • Methods inherited from class cern.jet.stat.Descriptive

        checkRangeFromTo, frequencies, geometricMean, harmonicMean, meanDeviation, moment, pooledMean, pooledVariance, product, rms, sampleKurtosis, sampleKurtosisStandardError, sampleSkew, sampleSkewStandardError, sampleVariance, sampleWeightedVariance, skew, split, standardDeviation, standardError, sumOfSquaredDeviations, variance, weightedRMS
    • Method Detail

      • autoCorrelation

        public static double autoCorrelation​(cern.colt.list.DoubleArrayList data,
                                             int lag,
                                             double mean,
                                             double variance)
        Not supported.
        Parameters:
        data - DoubleArrayList
        lag - int
        mean - double
        variance - double
        Returns:
        double
      • correlation

        public static double correlation​(double[] x,
                                         double[] y,
                                         double[] selfSquaredX,
                                         double[] selfSquaredY,
                                         boolean[] nanStatusX,
                                         boolean[] nanStatusY)
        Highly optimized version of the correlation computation, where as much information is precomputed as possible. Use of this method only makes sense if many comparisons with the inputs x and y are being performed.

        Implementation note: In correlation(DoubleArrayList x, DoubleArrayList y), profiling shows that calls to Double.isNaN() consume half the CPU time. Precomputing the element-by-element squared values is another obvious optimization. Note that the lengths of the arrays are not checked for a match.

        Parameters:
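        x - the first data array.
        y - the second data array.
        selfSquaredX - double array containing values of x_i^2 for each x.
        selfSquaredY - double array containing values of y_i^2 for each y.
        nanStatusX - boolean array containing value of Double.isNaN() for each X.
        nanStatusY - boolean array containing value of Double.isNaN() for each Y.
        Returns:
        the correlation of x and y.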
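
        The precomputation idea can be sketched as follows. This is an illustrative reimplementation on plain arrays, not the code of this method: the class name PrecomputedCorrelationSketch, the helpers squares and nanStatus, and the choice to return NaN when fewer than two complete pairs exist are all assumptions.

```java
// Hypothetical sketch of a correlation that reuses precomputed squares and NaN flags.
class PrecomputedCorrelationSketch {
    // Precompute v[i]^2 once per vector so repeated comparisons can reuse it.
    static double[] squares(double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * v[i];
        }
        return out;
    }

    // Precompute Double.isNaN(v[i]) once per vector.
    static boolean[] nanStatus(double[] v) {
        boolean[] out = new boolean[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = Double.isNaN(v[i]);
        }
        return out;
    }

    // Pearson correlation over the pairs where neither value is missing.
    static double correlation(double[] x, double[] y, double[] sqX, double[] sqY,
                              boolean[] nanX, boolean[] nanY) {
        double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
        int n = 0;
        for (int i = 0; i < x.length; i++) {
            if (nanX[i] || nanY[i]) {
                continue; // skip the pair if either side is missing
            }
            sx += x[i];
            sy += y[i];
            sxx += sqX[i];
            syy += sqY[i];
            sxy += x[i] * y[i];
            n++;
        }
        if (n < 2) {
            return Double.NaN; // assumption: undefined for fewer than two pairs
        }
        double numerator = sxy - sx * sy / n;
        double denominator = Math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
        return numerator / denominator;
    }
}
```

        The helper arrays are built once per vector and then reused across every pairwise comparison, which is where the saving comes from.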
      • correlation

        public static double correlation​(cern.colt.list.DoubleArrayList data1,
                                         double standardDev1,
                                         cern.colt.list.DoubleArrayList data2,
                                         double standardDev2)
        Returns the correlation of two data sequences. That is covariance(data1,data2)/(standardDev1*standardDev2). Missing values are ignored. This method is overridden to stop users from using the method in the superclass when missing values are present. The problem is that the standard deviation cannot be computed without knowing which values are not missing in both vectors to be compared. Thus the standardDev parameters are thrown away by this method.
        Parameters:
        data1 - DoubleArrayList
        standardDev1 - double - not used
        data2 - DoubleArrayList
        standardDev2 - double - not used
        Returns:
        double
      • correlation

        public static double correlation​(cern.colt.list.DoubleArrayList x,
                                         cern.colt.list.DoubleArrayList y)
        Calculate the Pearson correlation of two arrays. Missing values (NaNs) are ignored.
        Parameters:
        x - DoubleArrayList
        y - DoubleArrayList
        Returns:
        double
      • covariance

        public static double covariance​(cern.colt.list.DoubleArrayList data1,
                                        cern.colt.list.DoubleArrayList data2)
        Returns the SAMPLE covariance of two data sequences. Pairs of values are only considered if both are not NaN. If there are no non-missing pairs, the covariance is zero.
        Parameters:
        data1 - the first vector
        data2 - the second vector
        Returns:
        double
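
        The pairwise-deletion rule described above can be sketched on plain arrays as follows. The class name PairwiseCovarianceSketch is hypothetical, and this is not the implementation used by this class; with exactly one complete pair the sketch returns NaN (0/0), which is a property of the sketch only.

```java
// Hypothetical sketch: sample covariance using only pairs where both values are present.
class PairwiseCovarianceSketch {
    static double covariance(double[] a, double[] b) {
        double sumA = 0.0, sumB = 0.0;
        int n = 0;
        // First pass: means over the complete pairs only.
        for (int i = 0; i < a.length; i++) {
            if (Double.isNaN(a[i]) || Double.isNaN(b[i])) {
                continue;
            }
            sumA += a[i];
            sumB += b[i];
            n++;
        }
        if (n == 0) {
            return 0.0; // no non-missing pairs: covariance is zero, as documented
        }
        double meanA = sumA / n;
        double meanB = sumB / n;
        // Second pass: cross products of deviations over the same pairs.
        double cross = 0.0;
        for (int i = 0; i < a.length; i++) {
            if (Double.isNaN(a[i]) || Double.isNaN(b[i])) {
                continue;
            }
            cross += (a[i] - meanA) * (b[i] - meanB);
        }
        return cross / (n - 1); // sample covariance divides by n - 1
    }
}
```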
      • durbinWatson

        public static double durbinWatson​(cern.colt.list.DoubleArrayList data)
        Durbin-Watson computation. This measures the serial correlation in a data series.
        Parameters:
        data - DoubleArrayList
        Returns:
        double
      • geometricMean

        public static double geometricMean​(cern.colt.list.DoubleArrayList data)
        Returns the geometric mean of a data sequence. Missing values are ignored. Note that for a geometric mean to be meaningful, the minimum of the data sequence must not be less than or equal to zero.
        The geometric mean is given by pow( Product( data[i] ), 1/data.size()). This method tries to avoid overflows at the expense of an equivalent but somewhat slow definition: geo = Math.exp( Sum( Log(data[i]) ) / data.size()).
        Parameters:
        data - DoubleArrayList
        Returns:
        double
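
        The log-based definition can be sketched like this, on a plain array with NaN as the missing marker. GeometricMeanSketch is a hypothetical name, and dividing by the count of non-missing values is an assumption made for the sketch.

```java
// Hypothetical sketch: geometric mean via logarithms, skipping NaN entries.
class GeometricMeanSketch {
    static double geometricMean(double[] data) {
        double logSum = 0.0;
        int n = 0;
        for (double v : data) {
            if (Double.isNaN(v)) {
                continue; // missing values do not contribute
            }
            logSum += Math.log(v);
            n++;
        }
        // exp(mean of logs) equals the n-th root of the product,
        // but never forms the (possibly overflowing) product itself.
        return Math.exp(logSum / n);
    }
}
```

        With two values of 1e200 the direct product overflows to infinity, while the sum of logarithms stays small and the result is still close to 1e200.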
      • incrementalUpdate

        public static void incrementalUpdate​(cern.colt.list.DoubleArrayList data,
                                             int from,
                                             int to,
                                             double[] inOut)
        Incrementally maintains and updates minimum, maximum, sum and sum of squares of a data sequence.

        Assume we have already recorded some data sequence elements and know their minimum, maximum, sum and sum of squares. Assume further, we are to record some more elements and to derive updated values of minimum, maximum, sum and sum of squares. This method computes those updated values without needing to know the already recorded elements. This is interesting for interactive online monitoring and/or applications that cannot keep the entire huge data sequence in memory.

        Parameters:
        data - the additional elements to be incorporated into min, max, etc.
        from - the index of the first element within data to consider.
        to - the index of the last element within data to consider. The method incorporates elements data[from], ..., data[to].
        inOut - the old values in the following format:
        • inOut[0] is the old minimum.
        • inOut[1] is the old maximum.
        • inOut[2] is the old sum.
        • inOut[3] is the old sum of squares.
        If no data sequence elements have so far been recorded set the values as follows
        • inOut[0] = Double.POSITIVE_INFINITY as the old minimum.
        • inOut[1] = Double.NEGATIVE_INFINITY as the old maximum.
        • inOut[2] = 0.0 as the old sum.
        • inOut[3] = 0.0 as the old sum of squares.
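
        The inOut layout above can be exercised with a small sketch on plain arrays. IncrementalUpdateSketch, freshState and update are hypothetical names, and skipping NaN entries is an assumption about how missing values would be treated.

```java
// Hypothetical sketch of the four-slot running state: min, max, sum, sum of squares.
class IncrementalUpdateSketch {
    // State to use before any element has been recorded.
    static double[] freshState() {
        return new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, 0.0, 0.0};
    }

    // Folds chunk[from..to] (inclusive) into the running state.
    static void update(double[] chunk, int from, int to, double[] inOut) {
        for (int i = from; i <= to; i++) {
            double v = chunk[i];
            if (Double.isNaN(v)) {
                continue; // assumption: missing values leave the state unchanged
            }
            inOut[0] = Math.min(inOut[0], v); // minimum
            inOut[1] = Math.max(inOut[1], v); // maximum
            inOut[2] += v;                    // sum
            inOut[3] += v * v;                // sum of squares
        }
    }

    public static void main(String[] args) {
        double[] state = freshState();
        update(new double[] {3.0, Double.NaN, 1.0}, 0, 2, state);
        update(new double[] {5.0}, 0, 0, state); // earlier chunk is no longer needed
        System.out.println(java.util.Arrays.toString(state));
    }
}
```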
      • incrementalUpdateSumsOfPowers

        public static void incrementalUpdateSumsOfPowers​(cern.colt.list.DoubleArrayList data,
                                                         int from,
                                                         int to,
                                                         int fromSumIndex,
                                                         int toSumIndex,
                                                         double[] sumOfPowers)
        Not supported.
        Parameters:
        data - DoubleArrayList
        from - int
        to - int
        fromSumIndex - int
        toSumIndex - int
        sumOfPowers - double[]
      • incrementalWeightedUpdate

        public static void incrementalWeightedUpdate​(cern.colt.list.DoubleArrayList data,
                                                     cern.colt.list.DoubleArrayList weights,
                                                     int from,
                                                     int to,
                                                     double[] inOut)
        Not supported.
        Parameters:
        data - DoubleArrayList
        weights - DoubleArrayList
        from - int
        to - int
        inOut - double[]
      • kurtosis

        public static double kurtosis​(double moment4,
                                      double standardDeviation)
        Returns the kurtosis (aka excess) of a data sequence.
        Parameters:
        moment4 - the fourth central moment, which is moment(data,4,mean).
        standardDeviation - the standardDeviation.
        Returns:
        double
      • kurtosis

        public static double kurtosis​(cern.colt.list.DoubleArrayList data,
                                      double mean,
                                      double standardDeviation)
        Returns the kurtosis (aka excess) of a data sequence, which is -3 + moment(data,4,mean) / standardDeviation^4.
        Parameters:
        data - DoubleArrayList
        mean - double
        standardDeviation - double
        Returns:
        double
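
        The formula can be checked by hand with a small sketch. It works on a plain array, treats NaN as missing, and divides the moment by the non-missing count; those choices and the name KurtosisSketch are assumptions made for illustration.

```java
// Hypothetical sketch: excess kurtosis from the fourth central moment.
class KurtosisSketch {
    // k-th moment about c over the non-missing entries.
    static double moment(double[] data, int k, double c) {
        double sum = 0.0;
        int n = 0;
        for (double v : data) {
            if (Double.isNaN(v)) {
                continue;
            }
            sum += Math.pow(v - c, k);
            n++;
        }
        return sum / n;
    }

    // -3 + moment(data, 4, mean) / standardDeviation^4
    static double kurtosis(double[] data, double mean, double standardDeviation) {
        double sd2 = standardDeviation * standardDeviation;
        return -3.0 + moment(data, 4, mean) / (sd2 * sd2);
    }
}
```

        For the values 1, 2, 3 (mean 2, standard deviation sqrt(2/3)) the fourth moment is 2/3 and standardDeviation^4 is 4/9, so the result is -3 + 1.5 = -1.5.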
      • lag1

        public static double lag1​(cern.colt.list.DoubleArrayList data,
                                  double mean)
        Not supported.
        Parameters:
        data - DoubleArrayList
        mean - double
        Returns:
        double
      • mad

        public static double mad​(cern.colt.list.DoubleArrayList dat)
        Parameters:
        dat - the data sequence.
        Returns:
        the median absolute deviation from the median.
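
        A median absolute deviation can be sketched as the median of the distances from the median. The version below works on a plain array, drops NaN entries before each median, and returns NaN for an empty input; these details and the name MadSketch are assumptions, not a description of this method's internals.

```java
import java.util.Arrays;

// Hypothetical sketch: median absolute deviation from the median, ignoring NaN.
class MadSketch {
    // Median of the non-missing values; averages the two middle values for even counts.
    static double median(double[] data) {
        double[] s = Arrays.stream(data).filter(v -> !Double.isNaN(v)).sorted().toArray();
        if (s.length == 0) {
            return Double.NaN; // assumption: undefined for empty input
        }
        int mid = s.length / 2;
        return (s.length % 2 == 1) ? s[mid] : (s[mid - 1] + s[mid]) / 2.0;
    }

    static double mad(double[] data) {
        double med = median(data);
        double[] deviations = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            // NaN inputs stay NaN here and are filtered out by median().
            deviations[i] = Math.abs(data[i] - med);
        }
        return median(deviations);
    }
}
```

        For 1, 2, 3, 4, 100 the median is 3 and the absolute deviations are 2, 1, 0, 1, 97, whose median is 1; the outlier 100 has no effect on the result.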
      • max

        public static double max​(cern.colt.list.DoubleArrayList input)
      • mean

        public static double mean​(double[] elements,
                                  int effectiveSize)
        Special mean calculation where we use the effective size as an input.
        Parameters:
        elements - The data double array.
        effectiveSize - The effective size used for the mean calculation.
        Returns:
        double
      • mean

        public static double mean​(cern.colt.list.DoubleArrayList data)
        Parameters:
        data - Values to be analyzed.
        Returns:
        Mean of the values in data. Missing values are ignored in the analysis.
      • mean

        public static double mean​(cern.colt.list.DoubleArrayList x,
                                  int effectiveSize)
        Special mean calculation where we use the effective size as an input.
        Parameters:
        x - The data
        effectiveSize - The effective size used for the mean calculation.
        Returns:
        double
      • meanAboveQuantile

        public static double meanAboveQuantile​(double quantile,
                                               cern.colt.list.DoubleArrayList array)
        Calculate the mean of the values above a particular quantile of an array.
        Parameters:
        quantile - A value from 0 to 1
        array - Array for which we want to get the quantile.
        Returns:
        double
      • median

        public static double median​(cern.colt.list.DoubleArrayList data)
        Returns the median. Missing values are ignored entirely.
        Parameters:
        data - the data sequence, does not have to be sorted.
        Returns:
        double
      • min

        public static double min​(cern.colt.list.DoubleArrayList input)
      • moment

        public static double moment​(cern.colt.list.DoubleArrayList data,
                                    int k,
                                    double c)
        Returns the moment of k-th order with constant c of a data sequence, which is Sum( (data[i]-c)^k ) / data.size().
        Parameters:
        data - DoubleArrayList
        k - int
        c - double
        Returns:
        double
      • product

        public static double product​(cern.colt.list.DoubleArrayList data)
        Returns the product of a data sequence, which is Prod( data[i] ). Missing values are ignored. In other words: data[0]*data[1]*...*data[data.size()-1]. Note that you may easily get numeric overflows.
        Parameters:
        data - DoubleArrayList
        Returns:
        double
      • quantile

        public static double quantile​(cern.colt.list.DoubleArrayList data,
                                      double phi)
        Returns the phi-quantile; that is, an element elem such that a fraction phi of the data elements are less than elem. Missing values are ignored. The quantile need not be contained in the data sequence; it can be a linear interpolation.
        Parameters:
        data - the data sequence, does not have to be sorted.
        phi - the percentage; must satisfy 0 <= phi <= 1.
        Returns:
        double
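
        One common way to interpolate a quantile is to place it at position phi * (n - 1) in the sorted non-missing values. The sketch below uses that convention on a plain array; the convention, the NaN result for empty input, and the name QuantileSketch are assumptions and have not been checked against this method's implementation.

```java
import java.util.Arrays;

// Hypothetical sketch: linearly interpolated quantile over non-missing values.
class QuantileSketch {
    static double quantile(double[] data, double phi) {
        // Drop missing values, then sort; the input need not be sorted.
        double[] s = Arrays.stream(data).filter(v -> !Double.isNaN(v)).sorted().toArray();
        int n = s.length;
        if (n == 0) {
            return Double.NaN; // assumption: undefined for empty input
        }
        double index = phi * (n - 1); // fractional position in the sorted array
        int lower = (int) index;
        if (lower == n - 1) {
            return s[lower];          // phi == 1 maps to the largest value
        }
        double delta = index - lower;
        return s[lower] + delta * (s[lower + 1] - s[lower]);
    }
}
```

        With the values 4, 1, 3, 2 and phi = 0.5 the position is 1.5, halfway between the sorted values 2 and 3, so the sketch returns 2.5 even though 2.5 is not in the data.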
      • quantileInverse

        public static double quantileInverse​(cern.colt.list.DoubleArrayList data,
                                             double element)
        Returns the fraction of the elements in data that are <= element. Does linear interpolation if the element is not contained but lies in between two contained elements. Missing values are ignored.
        Parameters:
        data - the list to be searched
        element - the element to search for.
        Returns:
        the percentage phi of elements <= element (0.0 <= phi <= 1.0).
      • quantiles

        public static cern.colt.list.DoubleArrayList quantiles​(cern.colt.list.DoubleArrayList sortedData,
                                                               cern.colt.list.DoubleArrayList percentages)
        Returns the quantiles of the specified percentages. The quantiles need not be contained in the data sequence; they can be linear interpolations.
        Parameters:
        sortedData - the data sequence; must be sorted ascending.
        percentages - the percentages for which quantiles are to be computed. Each percentage must be in the interval [0.0,1.0].
        Returns:
        the quantiles.
      • rankInterpolated

        public static double rankInterpolated​(cern.colt.list.DoubleArrayList sortedList,
                                              double element)
        Returns the linearly interpolated number of elements in a list less than or equal to a given element. Missing values are ignored. The rank is the number of elements <= element. Ranks are of the form {0, 1, 2,..., sortedList.size()}. If no element is <= element, then the rank is zero. If the element lies in between two contained elements, then linear interpolation is used and a non-integer value is returned.
        Parameters:
        sortedList - the list to be searched (must be sorted ascending).
        element - the element to search for.
        Returns:
        the rank of the element.
      • removeMissing

        public static cern.colt.list.DoubleArrayList removeMissing​(cern.colt.list.DoubleArrayList data)
        Makes a copy of the list that doesn't have the missing values.
        Parameters:
        data - DoubleArrayList
        Returns:
        DoubleArrayList
      • sampleKurtosis

        public static double sampleKurtosis​(cern.colt.list.DoubleArrayList data,
                                            double mean,
                                            double sampleVariance)
        Returns the sample kurtosis (aka excess) of a data sequence.
        Parameters:
        data - DoubleArrayList
        mean - double
        sampleVariance - double
        Returns:
        double
      • sampleSkew

        public static double sampleSkew​(cern.colt.list.DoubleArrayList data,
                                        double mean,
                                        double sampleVariance)
        Returns the sample skew of a data sequence.
        Parameters:
        data - DoubleArrayList
        mean - double
        sampleVariance - double
        Returns:
        double
      • sampleStandardDeviation

        public static double sampleStandardDeviation​(int size,
                                                     double sampleVariance)
        Returns the sample standard deviation.

        This is included for compatibility with the superclass, but does not implement the correction used there.

        Parameters:
        size - the number of elements of the data sequence.
        sampleVariance - the sample variance.
        See Also:
        Descriptive.sampleStandardDeviation(int, double)
      • sampleVariance

        public static double sampleVariance​(cern.colt.list.DoubleArrayList data,
                                            double mean)
        Returns the sample variance of a data sequence. That is Sum ( (data[i]-mean)^2 ) / (data.size()-1).
        Parameters:
        data - DoubleArrayList
        mean - double
        Returns:
        double
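
        A minimal sketch of the formula on a plain array follows. It assumes that the denominator uses the number of non-missing values rather than the raw length, in line with the class-level note about sizes; SampleVarianceSketch is a hypothetical name.

```java
// Hypothetical sketch: sample variance about a given mean, skipping NaN entries.
class SampleVarianceSketch {
    static double sampleVariance(double[] data, double mean) {
        double sumSq = 0.0;
        int n = 0;
        for (double v : data) {
            if (Double.isNaN(v)) {
                continue; // missing values are not counted
            }
            double d = v - mean;
            sumSq += d * d;
            n++;
        }
        // Assumption: divide by (non-missing count - 1), not (length - 1).
        return sumSq / (n - 1);
    }
}
```

        For 1, 3, 5 with one extra missing entry and mean 3, the squared deviations sum to 8 and the sketch divides by 2, giving 4.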
      • sizeWithoutMissingValues

        public static int sizeWithoutMissingValues​(cern.colt.list.DoubleArrayList list)
        Return the size of the list, ignoring missing values.
        Parameters:
        list - DoubleArrayList
        Returns:
        int
      • skew

        public static double skew​(cern.colt.list.DoubleArrayList data,
                                  double mean,
                                  double standardDeviation)
        Returns the skew of a data sequence, which is moment(data,3,mean) / standardDeviation^3.
        Parameters:
        data - DoubleArrayList
        mean - double
        standardDeviation - double
        Returns:
        double
      • standardize

        public static void standardize​(cern.colt.list.DoubleArrayList data)
        Standardize. Note that this does something slightly different than standardize in the superclass, because our sampleStandardDeviation does not use the correction of the superclass (which isn't really standard).
        Parameters:
        data - DoubleArrayList
      • standardize

        public static void standardize​(cern.colt.list.DoubleArrayList data,
                                       double mean,
                                       double standardDeviation)
        Modifies a data sequence to be standardized. Missing values are ignored. Changes each element data[i] as follows: data[i] = (data[i]-mean)/standardDeviation.
        Parameters:
        data - DoubleArrayList
        mean - mean of data
        standardDeviation - stdev of data
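
        The in-place update can be sketched on a plain array as follows. StandardizeSketch is a hypothetical name, and leaving NaN entries untouched is an assumption about what "ignored" means here.

```java
// Hypothetical sketch: in-place standardization that leaves missing values as NaN.
class StandardizeSketch {
    static void standardize(double[] data, double mean, double standardDeviation) {
        for (int i = 0; i < data.length; i++) {
            if (Double.isNaN(data[i])) {
                continue; // missing entries stay missing
            }
            data[i] = (data[i] - mean) / standardDeviation;
        }
    }

    public static void main(String[] args) {
        double[] data = {1.0, Double.NaN, 3.0};
        standardize(data, 2.0, 2.0);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```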
      • sum

        public static double sum​(cern.colt.list.DoubleArrayList data)
        Returns the sum of a data sequence. That is Sum( data[i] ).
        Parameters:
        data - DoubleArrayList
        Returns:
        double
      • sumOfInversions

        public static double sumOfInversions​(cern.colt.list.DoubleArrayList data,
                                             int from,
                                             int to)
        Returns the sum of inversions of a data sequence, which is Sum( 1.0 / data[i]).
        Parameters:
        data - the data sequence.
        from - the index of the first data element (inclusive).
        to - the index of the last data element (inclusive).
        Returns:
        double
      • sumOfLogarithms

        public static double sumOfLogarithms​(cern.colt.list.DoubleArrayList data,
                                             int from,
                                             int to)
        Returns the sum of logarithms of a data sequence, which is Sum( Log(data[i]) ). Missing values are ignored.
        Parameters:
        data - the data sequence.
        from - the index of the first data element (inclusive).
        to - the index of the last data element (inclusive).
        Returns:
        double
      • sumOfPowerDeviations

        public static double sumOfPowerDeviations​(cern.colt.list.DoubleArrayList data,
                                                  int k,
                                                  double c)
        Returns Sum( (data[i]-c)^k ); optimized for common parameters like c == 0.0 and/or k == -2 .. 4.
        Parameters:
        data - DoubleArrayList
        k - int
        c - double
        Returns:
        double
      • sumOfPowerDeviations

        public static double sumOfPowerDeviations​(cern.colt.list.DoubleArrayList data,
                                                  int k,
                                                  double c,
                                                  int from,
                                                  int to)
        Returns Sum( (data[i]-c)^k ) for all i = from .. to; optimized for common parameters like c == 0.0 and/or k == -2 .. 5. Missing values are ignored.
        Parameters:
        data - DoubleArrayList
        k - int
        c - double
        from - int
        to - int
        Returns:
        double
      • sumOfPowers

        public static double sumOfPowers​(cern.colt.list.DoubleArrayList data,
                                         int k)
        Returns the sum of powers of a data sequence, which is Sum ( data[i]^k ).
        Parameters:
        data - DoubleArrayList
        k - int
        Returns:
        double
      • sumOfSquaredDeviations

        public static double sumOfSquaredDeviations​(cern.colt.list.DoubleArrayList data)
        Compute the sum of the squared deviations from the mean of a data sequence. Missing values are ignored.
        Parameters:
        data - DoubleArrayList
        Returns:
        double
      • sumOfSquares

        public static double sumOfSquares​(cern.colt.list.DoubleArrayList data)
        Returns the sum of squares of a data sequence. Skips missing values.
        Parameters:
        data - DoubleArrayList
        Returns:
        double
      • trimmedMean

        public static double trimmedMean​(cern.colt.list.DoubleArrayList sortedData,
                                         double mean,
                                         int left,
                                         int right)
        Returns the trimmed mean of a sorted data sequence. Missing values are completely ignored.
        Parameters:
        sortedData - the data sequence; must be sorted ascending.
        mean - the mean of the (full) sorted data sequence.
        left - the number of leading elements to trim.
        right - the number of trailing elements to trim.
        Returns:
        double
      • variance

        public static double variance​(cern.colt.list.DoubleArrayList data)
        Provided for convenience!
        Parameters:
        data - DoubleArrayList
        Returns:
        double
      • variance

        public static double variance​(int sizeWithoutMissing,
                                      double sum,
                                      double sumOfSquares)
      • weightedMean

        public static double weightedMean​(cern.colt.list.DoubleArrayList data,
                                          cern.colt.list.DoubleArrayList weights)
        Returns the weighted mean of a data sequence. That is Sum (data[i] * weights[i]) / Sum ( weights[i] ).
        Parameters:
        data - DoubleArrayList
        weights - DoubleArrayList
        Returns:
        double
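
        The formula can be sketched on plain arrays as below. Skipping a pair when either the value or its weight is NaN is an assumption, since the handling of missing values is not stated for this method; WeightedMeanSketch is a hypothetical name.

```java
// Hypothetical sketch: weighted mean over pairs with no missing member.
class WeightedMeanSketch {
    static double weightedMean(double[] data, double[] weights) {
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < data.length; i++) {
            if (Double.isNaN(data[i]) || Double.isNaN(weights[i])) {
                continue; // assumption: drop the value together with its weight
            }
            weightedSum += data[i] * weights[i];
            weightSum += weights[i];
        }
        return weightedSum / weightSum;
    }
}
```

        Dropping the weight along with the missing value matters: for values 1, NaN, 3 with weights 1, 5, 3 the sketch returns (1 + 9) / 4 = 2.5, whereas keeping the weight 5 in the denominator would give about 1.11.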
      • winsorizedMean

        public static double winsorizedMean​(cern.colt.list.DoubleArrayList sortedData,
                                            double mean,
                                            int left,
                                            int right)
        Returns the winsorized mean of a sorted data sequence.
        Parameters:
        sortedData - DoubleArrayList, must already be sorted ascending
        mean - the mean of the (full) sorted data sequence.
        left - the number of leading elements to trim, excluding any missing values.
        right - the number of trailing elements to trim, excluding any missing values.
        Returns:
        double
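
        To contrast winsorizing with trimming, the sketch below computes both directly on a sorted plain array. Unlike the methods documented here it does not take a precomputed mean, and it drops NaN entries before counting left and right; the class name RobustMeanSketch and its helpers are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: trimmed vs. winsorized mean on sorted data with NaN as missing.
class RobustMeanSketch {
    private static double[] nonMissing(double[] sorted) {
        return Arrays.stream(sorted).filter(v -> !Double.isNaN(v)).toArray();
    }

    private static void check(int n, int left, int right) {
        if (left < 0 || right < 0 || left + right >= n) {
            throw new IllegalArgumentException("left + right must be smaller than the data size");
        }
    }

    // Drops the `left` smallest and `right` largest values, then averages the rest.
    static double trimmedMean(double[] sorted, int left, int right) {
        double[] s = nonMissing(sorted);
        check(s.length, left, right);
        double sum = 0.0;
        for (int i = left; i < s.length - right; i++) {
            sum += s[i];
        }
        return sum / (s.length - left - right);
    }

    // Replaces the extremes with the nearest retained value, then averages everything.
    static double winsorizedMean(double[] sorted, int left, int right) {
        double[] s = nonMissing(sorted);
        check(s.length, left, right);
        double low = s[left];
        double high = s[s.length - 1 - right];
        double sum = 0.0;
        for (int i = 0; i < s.length; i++) {
            if (i < left) {
                sum += low;
            } else if (i >= s.length - right) {
                sum += high;
            } else {
                sum += s[i];
            }
        }
        return sum / s.length;
    }
}
```

        For 1, 2, 3, 4, 100 with left = 0 and right = 1, trimming averages 1, 2, 3, 4 (2.5), while winsorizing averages 1, 2, 3, 4, 4 (2.8).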