public class DescriptiveWithMissing
extends cern.jet.stat.Descriptive
Be careful: some methods from cern.jet.stat.Descriptive have not been overridden and will throw an UnsupportedOperationException if used.
Some functions that come with DoubleArrayList will not work in an entirely compatible way with missing values. For example, size() reports the total number of elements, including missing values. To get a count of non-missing values, use sizeWithoutMissingValues(data). Which of the two is appropriate depends on the calculation.
Not all methods need to be overridden. However, all methods that take a "size" parameter should be passed the result of sizeWithoutMissingValues(data) instead of data.size().
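The difference between the two sizes can be illustrated with a minimal, self-contained sketch. It uses a plain double[] in place of DoubleArrayList and assumes that missing values are encoded as Double.NaN. The class and method names (MissingSizeSketch, countNonMissing, meanIgnoringMissing) are hypothetical and are not part of this API.

```java
/** Hypothetical sketch; not part of DescriptiveWithMissing. */
class MissingSizeSketch {

    /** Counts the entries that are not NaN (NaN is assumed to mark a missing value). */
    static int countNonMissing(double[] data) {
        int n = 0;
        for (double v : data) {
            if (!Double.isNaN(v)) {
                n++;
            }
        }
        return n;
    }

    /** Mean over the non-missing entries; NaN if every entry is missing. */
    static double meanIgnoringMissing(double[] data) {
        double sum = 0.0;
        for (double v : data) {
            if (!Double.isNaN(v)) {
                sum += v;
            }
        }
        int n = countNonMissing(data);
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        double[] data = {1.0, Double.NaN, 3.0, 5.0};
        System.out.println(data.length);               // total size, missing included: 4
        System.out.println(countNonMissing(data));     // non-missing size: 3
        System.out.println(meanIgnoringMissing(data)); // 9.0 / 3 = 3.0
    }
}
```

Dividing the same sum by the total length (4) would give 2.25 instead of 3.0, which is why a "size" argument should normally be the non-missing count.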
Based in part on code from the colt package: Copyright © 1999 CERN - European Organization for Nuclear Research.
Modifier and Type | Method and Description |
---|---|
static double | autoCorrelation(cern.colt.list.DoubleArrayList data, int lag, double mean, double variance): Not supported. |
static double | correlation(double[] x, double[] y, double[] selfSquaredX, double[] selfSquaredY, boolean[] nanStatusX, boolean[] nanStatusY): Highly optimized version of the correlation computation, where as much information is precomputed as possible. |
static double | correlation(cern.colt.list.DoubleArrayList x, cern.colt.list.DoubleArrayList y): Calculate the Pearson correlation of two arrays. |
static double | correlation(cern.colt.list.DoubleArrayList data1, double standardDev1, cern.colt.list.DoubleArrayList data2, double standardDev2): Returns the correlation of two data sequences. |
static double | covariance(cern.colt.list.DoubleArrayList data1, cern.colt.list.DoubleArrayList data2): Returns the SAMPLE covariance of two data sequences. |
static double | durbinWatson(cern.colt.list.DoubleArrayList data): Durbin-Watson computation. |
static double | geometricMean(cern.colt.list.DoubleArrayList data): Returns the geometric mean of a data sequence. |
static void | incrementalUpdate(cern.colt.list.DoubleArrayList data, int from, int to, double[] inOut): Incrementally maintains and updates minimum, maximum, sum and sum of squares of a data sequence. |
static void | incrementalUpdateSumsOfPowers(cern.colt.list.DoubleArrayList data, int from, int to, int fromSumIndex, int toSumIndex, double[] sumOfPowers): Not supported. |
static void | incrementalWeightedUpdate(cern.colt.list.DoubleArrayList data, cern.colt.list.DoubleArrayList weights, int from, int to, double[] inOut): Not supported. |
static double | kurtosis(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation): Returns the kurtosis (aka excess) of a data sequence, which is -3 + moment(data,4,mean) / standardDeviation^4. |
static double | kurtosis(double moment4, double standardDeviation): Returns the kurtosis (aka excess) of a data sequence. |
static double | lag1(cern.colt.list.DoubleArrayList data, double mean): Not supported. |
static double | mad(cern.colt.list.DoubleArrayList dat) |
static double | max(cern.colt.list.DoubleArrayList input) |
static double | mean(double[] elements, int effectiveSize): Special mean calculation where we use the effective size as an input. |
static double | mean(cern.colt.list.DoubleArrayList data) |
static double | mean(cern.colt.list.DoubleArrayList x, int effectiveSize): Special mean calculation where we use the effective size as an input. |
static double | meanAboveQuantile(double quantile, cern.colt.list.DoubleArrayList array): Calculate the mean of the values above a particular quantile of an array. |
static double | median(cern.colt.list.DoubleArrayList data): Returns the median. |
static double | min(cern.colt.list.DoubleArrayList input) |
static double | moment(cern.colt.list.DoubleArrayList data, int k, double c): Returns the moment of k-th order with constant c of a data sequence, which is Sum( (data[i]-c)^k ) / data.size(). |
static double | product(cern.colt.list.DoubleArrayList data): Returns the product of a data sequence, which is Prod( data[i] ). |
static double | quantile(cern.colt.list.DoubleArrayList data, double phi): Returns the phi-quantile; that is, an element elem such that phi percent of the data elements are less than elem. |
static double | quantileInverse(cern.colt.list.DoubleArrayList data, double element): Returns the percentage of elements contained in the receiver that are <= element. |
static cern.colt.list.DoubleArrayList | quantiles(cern.colt.list.DoubleArrayList sortedData, cern.colt.list.DoubleArrayList percentages): Returns the quantiles of the specified percentages. |
static double | rankInterpolated(cern.colt.list.DoubleArrayList sortedList, double element): Returns the linearly interpolated number of elements in a list less than or equal to a given element. |
static cern.colt.list.DoubleArrayList | removeMissing(cern.colt.list.DoubleArrayList data): Makes a copy of the list that doesn't have the missing values. |
static double | sampleKurtosis(cern.colt.list.DoubleArrayList data, double mean, double sampleVariance): Returns the sample kurtosis (aka excess) of a data sequence. |
static double | sampleSkew(cern.colt.list.DoubleArrayList data, double mean, double sampleVariance): Returns the sample skew of a data sequence. |
static double | sampleStandardDeviation(int size, double sampleVariance): Returns the sample standard deviation. |
static double | sampleVariance(cern.colt.list.DoubleArrayList data, double mean): Returns the sample variance of a data sequence. |
static int | sizeWithoutMissingValues(cern.colt.list.DoubleArrayList list): Return the size of the list, ignoring missing values. |
static double | skew(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation): Returns the skew of a data sequence, which is moment(data,3,mean) / standardDeviation^3. |
static void | standardize(cern.colt.list.DoubleArrayList data): Standardize. |
static void | standardize(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation): Modifies a data sequence to be standardized. |
static double | sum(cern.colt.list.DoubleArrayList data): Returns the sum of a data sequence. |
static double | sumOfInversions(cern.colt.list.DoubleArrayList data, int from, int to): Returns the sum of inversions of a data sequence, which is Sum( 1.0 / data[i] ). |
static double | sumOfLogarithms(cern.colt.list.DoubleArrayList data, int from, int to): Returns the sum of logarithms of a data sequence, which is Sum( Log(data[i]) ). |
static double | sumOfPowerDeviations(cern.colt.list.DoubleArrayList data, int k, double c): Returns Sum( (data[i]-c)^k ); optimized for common parameters like c == 0.0 and/or k == -2 .. 4. |
static double | sumOfPowerDeviations(cern.colt.list.DoubleArrayList data, int k, double c, int from, int to): Returns Sum( (data[i]-c)^k ) for all i = from .. |
static double | sumOfPowers(cern.colt.list.DoubleArrayList data, int k): Returns the sum of powers of a data sequence, which is Sum( data[i]^k ). |
static double | sumOfSquaredDeviations(cern.colt.list.DoubleArrayList data): Compute the sum of the squared deviations from the mean of a data sequence. |
static double | sumOfSquares(cern.colt.list.DoubleArrayList data): Returns the sum of squares of a data sequence. |
static double | trimmedMean(cern.colt.list.DoubleArrayList sortedData, double mean, int left, int right): Returns the trimmed mean of a sorted data sequence. |
static double | variance(cern.colt.list.DoubleArrayList data): Provided for convenience! |
static double | variance(int sizeWithoutMissing, double sum, double sumOfSquares) |
static double | weightedMean(cern.colt.list.DoubleArrayList data, cern.colt.list.DoubleArrayList weights): Returns the weighted mean of a data sequence. |
static double | winsorizedMean(cern.colt.list.DoubleArrayList sortedData, double mean, int left, int right): Returns the winsorized mean of a sorted data sequence. |
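The moment-based formulas in the summary above (moment as Sum( (data[i]-c)^k ) divided by the size, skew as moment(data,3,mean) / standardDeviation^3, and kurtosis as -3 + moment(data,4,mean) / standardDeviation^4) can be sketched as follows. This is only an illustration under stated assumptions: it works on a plain double[], treats Double.NaN as the missing-value marker, and divides by the count of non-missing values. The class name MomentSketch is hypothetical.

```java
/** Hypothetical sketch of the moment, skew and kurtosis formulas with NaN treated as missing. */
class MomentSketch {

    /** k-th moment about c, averaged over the non-missing entries only (an assumption). */
    static double moment(double[] data, int k, double c) {
        double sum = 0.0;
        int n = 0;
        for (double v : data) {
            if (Double.isNaN(v)) {
                continue; // skip missing values
            }
            sum += Math.pow(v - c, k);
            n++;
        }
        return sum / n; // NaN when there are no non-missing entries
    }

    /** Skew: third central moment divided by the cubed standard deviation. */
    static double skew(double[] data, double mean, double standardDeviation) {
        return moment(data, 3, mean) / Math.pow(standardDeviation, 3);
    }

    /** Kurtosis (excess): fourth central moment over standardDeviation^4, minus 3. */
    static double kurtosis(double[] data, double mean, double standardDeviation) {
        return -3 + moment(data, 4, mean) / Math.pow(standardDeviation, 4);
    }
}
```

As in the documented signatures, the caller supplies the mean and standard deviation, so both should themselves be computed from the non-missing values.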
Methods inherited from class cern.jet.stat.Descriptive: checkRangeFromTo, frequencies, geometricMean, harmonicMean, meanDeviation, moment, pooledMean, pooledVariance, product, rms, sampleKurtosis, sampleKurtosisStandardError, sampleSkew, sampleSkewStandardError, sampleVariance, sampleWeightedVariance, skew, split, standardDeviation, standardError, sumOfSquaredDeviations, variance, weightedRMS

public static double autoCorrelation(cern.colt.list.DoubleArrayList data, int lag, double mean, double variance)
data - DoubleArrayList
lag - int
mean - double
variance - double

public static double correlation(double[] x, double[] y, double[] selfSquaredX, double[] selfSquaredY, boolean[] nanStatusX, boolean[] nanStatusY)
Implementation note: In correlation(DoubleArrayList x, DoubleArrayList y), profiling shows that calls to Double.isNaN consume half the CPU time, so the NaN status of each element is passed in precomputed. Precomputing the element-by-element squared values is another obvious optimization. The lengths of the arrays are not checked for a match.
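The idea behind the precomputed arguments can be sketched as follows. This is an illustrative stand-in rather than the library's implementation: the class name, the two helper methods, and the decision to skip every pair in which either value is missing are assumptions.

```java
/** Hypothetical sketch of a Pearson correlation driven by precomputed squares and NaN flags. */
class PrecomputedCorrelationSketch {

    /** Precomputes v[i] * v[i] for each element. */
    static double[] squares(double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * v[i];
        }
        return out;
    }

    /** Precomputes Double.isNaN(v[i]) for each element. */
    static boolean[] nanStatus(double[] v) {
        boolean[] out = new boolean[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = Double.isNaN(v[i]);
        }
        return out;
    }

    /** Correlation over the pairs where both values are present; array lengths are not checked. */
    static double correlation(double[] x, double[] y, double[] selfSquaredX, double[] selfSquaredY,
                              boolean[] nanStatusX, boolean[] nanStatusY) {
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        int n = 0;
        for (int i = 0; i < x.length; i++) {
            if (nanStatusX[i] || nanStatusY[i]) {
                continue; // assumption: drop the pair if either value is missing
            }
            sx += x[i];
            sy += y[i];
            sxy += x[i] * y[i];
            sxx += selfSquaredX[i]; // reuse the precomputed square
            syy += selfSquaredY[i];
            n++;
        }
        if (n < 2) {
            return Double.NaN; // not enough complete pairs
        }
        double numerator = sxy - sx * sy / n;
        double denominator = Math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
        return numerator / denominator; // NaN if either vector is constant
    }
}
```

The payoff comes when one vector takes part in many pairwise correlations: its squares and NaN flags are computed once and reused in every call.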
x
y
selfSquaredX - double array containing values of x_i^2 for each x.
selfSquaredY
nanStatusX - boolean array containing value of Double.isNaN() for each X.
nanStatusY

public static double correlation(cern.colt.list.DoubleArrayList data1, double standardDev1, cern.colt.list.DoubleArrayList data2, double standardDev2)
data1 - DoubleArrayList
standardDev1 - double - not used
data2 - DoubleArrayList
standardDev2 - double - not used

public static double correlation(cern.colt.list.DoubleArrayList x, cern.colt.list.DoubleArrayList y)
x - DoubleArrayList
y - DoubleArrayList

public static double covariance(cern.colt.list.DoubleArrayList data1, cern.colt.list.DoubleArrayList data2)
data1 - the first vector
data2 - the second vector

public static double durbinWatson(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static double geometricMean(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static void incrementalUpdate(cern.colt.list.DoubleArrayList data, int from, int to, double[] inOut)
Assume we have already recorded some data sequence elements and know their minimum, maximum, sum and sum of squares. Assume further, we are to record some more elements and to derive updated values of minimum, maximum, sum and sum of squares. This method computes those updated values without needing to know the already recorded elements. This is interesting for interactive online monitoring and/or applications that cannot keep the entire huge data sequence in memory.
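A minimal sketch of this kind of running update is shown below. It works on plain double[] chunks, treats Double.NaN as a missing value, and assumes the layout inOut[0] = minimum, inOut[1] = maximum, inOut[2] = sum, inOut[3] = sum of squares. That layout follows the order in which the quantities are named above but is an assumption here, as are the class name and the emptyState helper.

```java
/** Hypothetical sketch of an incremental min/max/sum/sum-of-squares update that skips NaN. */
class IncrementalUpdateSketch {

    /** State to start from before any element has been recorded. */
    static double[] emptyState() {
        return new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, 0.0, 0.0};
    }

    /**
     * Folds data[from], ..., data[to] (both inclusive) into inOut.
     * Assumed layout: inOut[0] = min, inOut[1] = max, inOut[2] = sum, inOut[3] = sum of squares.
     */
    static void update(double[] data, int from, int to, double[] inOut) {
        for (int i = from; i <= to; i++) {
            double v = data[i];
            if (Double.isNaN(v)) {
                continue; // assumption: NaN marks a missing value and is ignored
            }
            if (v < inOut[0]) {
                inOut[0] = v;
            }
            if (v > inOut[1]) {
                inOut[1] = v;
            }
            inOut[2] += v;
            inOut[3] += v * v;
        }
    }

    public static void main(String[] args) {
        double[] state = emptyState();
        update(new double[] {4.0, Double.NaN, 1.0}, 0, 2, state); // first chunk
        update(new double[] {7.0, 2.0}, 0, 1, state);             // second chunk; the first is no longer needed
        System.out.println(java.util.Arrays.toString(state));     // [1.0, 7.0, 14.0, 70.0]
    }
}
```

Only the four running values are kept between calls, so each chunk can be discarded as soon as it has been folded in.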
data - the additional elements to be incorporated into min, max, etc.
from - the index of the first element within data to consider.
to - the index of the last element within data to consider. The method incorporates elements data[from], ..., data[to].
inOut - the old values in the following format:

public static void incrementalUpdateSumsOfPowers(cern.colt.list.DoubleArrayList data, int from, int to, int fromSumIndex, int toSumIndex, double[] sumOfPowers)
data - DoubleArrayList
from - int
to - int
fromSumIndex - int
toSumIndex - int
sumOfPowers - double[]

public static void incrementalWeightedUpdate(cern.colt.list.DoubleArrayList data, cern.colt.list.DoubleArrayList weights, int from, int to, double[] inOut)
data - DoubleArrayList
weights - DoubleArrayList
from - int
to - int
inOut - double[]

public static double kurtosis(double moment4, double standardDeviation)
moment4 - the fourth central moment, which is moment(data,4,mean).
standardDeviation - the standard deviation.

public static double kurtosis(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation)
data - DoubleArrayList
mean - double
standardDeviation - double

public static double lag1(cern.colt.list.DoubleArrayList data, double mean)
data - DoubleArrayList
mean - double

public static double mad(cern.colt.list.DoubleArrayList dat)
dat

public static double max(cern.colt.list.DoubleArrayList input)

public static double mean(double[] elements, int effectiveSize)
elements - The data double array.
effectiveSize - The effective size used for the mean calculation.

public static double mean(cern.colt.list.DoubleArrayList data)
data - Values to be analyzed.

public static double mean(cern.colt.list.DoubleArrayList x, int effectiveSize)
x - The data
effectiveSize - The effective size used for the mean calculation.

public static double meanAboveQuantile(double quantile, cern.colt.list.DoubleArrayList array)
quantile - A value from 0 to 1
array - Array for which we want to get the quantile.

public static double median(cern.colt.list.DoubleArrayList data)
data - the data sequence, does not have to be sorted.

public static double min(cern.colt.list.DoubleArrayList input)

public static double moment(cern.colt.list.DoubleArrayList data, int k, double c)
data - DoubleArrayList
k - int
c - double

public static double product(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static double quantile(cern.colt.list.DoubleArrayList data, double phi)
data - the data sequence, does not have to be sorted.
phi - the percentage; must satisfy 0 <= phi <= 1.

public static double quantileInverse(cern.colt.list.DoubleArrayList data, double element)
data - the list to be searched
element - the element to search for.

public static cern.colt.list.DoubleArrayList quantiles(cern.colt.list.DoubleArrayList sortedData, cern.colt.list.DoubleArrayList percentages)
sortedData - the data sequence; must be sorted ascending.
percentages - the percentages for which quantiles are to be computed. Each percentage must be in the interval [0.0,1.0].

public static double rankInterpolated(cern.colt.list.DoubleArrayList sortedList, double element)
sortedList - the list to be searched (must be sorted ascending).
element - the element to search for.

public static cern.colt.list.DoubleArrayList removeMissing(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static double sampleKurtosis(cern.colt.list.DoubleArrayList data, double mean, double sampleVariance)
data - DoubleArrayList
mean - double
sampleVariance - double

public static double sampleSkew(cern.colt.list.DoubleArrayList data, double mean, double sampleVariance)
data - DoubleArrayList
mean - double
sampleVariance - double

public static double sampleStandardDeviation(int size, double sampleVariance)
This is included for compatibility with the superclass, but does not implement the correction used there.
size - the number of elements of the data sequence.
sampleVariance - the sample variance.
See also: Descriptive.sampleStandardDeviation(int, double)

public static double sampleVariance(cern.colt.list.DoubleArrayList data, double mean)
data - DoubleArrayList
mean - double

public static int sizeWithoutMissingValues(cern.colt.list.DoubleArrayList list)
list - DoubleArrayList

public static double skew(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation)
data - DoubleArrayList
mean - double
standardDeviation - double

public static void standardize(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static void standardize(cern.colt.list.DoubleArrayList data, double mean, double standardDeviation)
data - DoubleArrayList
mean - mean of data
standardDeviation - stdev of data

public static double sum(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static double sumOfInversions(cern.colt.list.DoubleArrayList data, int from, int to)
data - the data sequence.
from - the index of the first data element (inclusive).
to - the index of the last data element (inclusive).

public static double sumOfLogarithms(cern.colt.list.DoubleArrayList data, int from, int to)
data - the data sequence.
from - the index of the first data element (inclusive).
to - the index of the last data element (inclusive).

public static double sumOfPowerDeviations(cern.colt.list.DoubleArrayList data, int k, double c)
data - DoubleArrayList
k - int
c - double

public static double sumOfPowerDeviations(cern.colt.list.DoubleArrayList data, int k, double c, int from, int to)
data - DoubleArrayList
k - int
c - double
from - int
to - int

public static double sumOfPowers(cern.colt.list.DoubleArrayList data, int k)
data - DoubleArrayList
k - int

public static double sumOfSquaredDeviations(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static double sumOfSquares(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static double trimmedMean(cern.colt.list.DoubleArrayList sortedData, double mean, int left, int right)
sortedData - the data sequence; must be sorted ascending.
mean - the mean of the (full) sorted data sequence.
left - int, the number of leading elements to trim.
right - int, the number of trailing elements to trim.

public static double variance(cern.colt.list.DoubleArrayList data)
data - DoubleArrayList

public static double variance(int sizeWithoutMissing, double sum, double sumOfSquares)

public static double weightedMean(cern.colt.list.DoubleArrayList data, cern.colt.list.DoubleArrayList weights)
data - DoubleArrayList
weights - DoubleArrayList

public static double winsorizedMean(cern.colt.list.DoubleArrayList sortedData, double mean, int left, int right)
sortedData - DoubleArrayList, must already be sorted ascending
mean - the mean of the (full) sorted data sequence.
left - the number of leading elements to trim. Refers to the number of elements to trim excluding any missing values
right - the number of trailing elements to trim excluding any missing values

Copyright © 2003–2022 UBC Michael Smith Laboratories. All rights reserved.