Class Distance


  • public class Distance
    extends Object
    Alternative distance and similarity metrics for vectors.
    Author:
    Paul Pavlidis
    • Constructor Summary

      Constructors 
      Constructor Description
      Distance()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static double correlationOfStandardized​(double[] xe, double[] ye)
      Highly optimized implementation of the Pearson correlation.
      static double correlationOfStandardized​(cern.colt.list.DoubleArrayList x, cern.colt.list.DoubleArrayList y)
      Like correlationOfStandardized(double[], double[]), but takes DoubleArrayLists as inputs, handles missing values, and does more error checking.
      static double euclDistance​(cern.colt.list.DoubleArrayList x, cern.colt.list.DoubleArrayList y)
      Calculate the Euclidean distance between two vectors.
      static double manhattanDistance​(cern.colt.list.DoubleArrayList x, cern.colt.list.DoubleArrayList y)
      Calculate the Manhattan distance between two vectors.
      static double spearmanRankCorrelation​(cern.colt.list.DoubleArrayList x)
      Convenience function to compute the rank correlation when we just want to know if the values are "in order".
      static double spearmanRankCorrelation​(cern.colt.list.DoubleArrayList x, cern.colt.list.DoubleArrayList y)
      Spearman Rank Correlation.
    • Constructor Detail

      • Distance

        public Distance()
    • Method Detail

      • correlationOfStandardized

        public static double correlationOfStandardized​(double[] xe,
                                                       double[] ye)
        Highly optimized implementation of the Pearson correlation. The inputs must be standardized (mean zero, variance one) and must not contain any missing values.
        Parameters:
        xe - A standardized vector
        ye - A standardized vector
        Returns:
        Pearson correlation coefficient.
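
Why standardized inputs allow a fast path can be illustrated with a small sketch. For two vectors that already have mean zero and variance one, the Pearson correlation reduces to the mean of the element-wise products. The class name and the standardize helper below are hypothetical, and the sketch assumes the population variance (dividing by n); it is not the library's source and may differ from it in such details.

```java
// Hypothetical, dependency-free sketch; not the library's implementation.
final class StandardizedCorrelationSketch {

    /** Rescales v to mean zero and variance one (population variance, dividing by n). */
    static double[] standardize(double[] v) {
        int n = v.length;
        double mean = 0.0;
        for (double d : v) {
            mean += d;
        }
        mean /= n;
        double ss = 0.0;
        for (double d : v) {
            ss += (d - mean) * (d - mean);
        }
        double sd = Math.sqrt(ss / n);
        double[] z = new double[n];
        for (int i = 0; i < n; i++) {
            z[i] = (v[i] - mean) / sd;
        }
        return z;
    }

    /** Pearson correlation of two already standardized vectors: the mean of the products. */
    static double correlationOfStandardized(double[] xe, double[] ye) {
        double sxy = 0.0;
        for (int i = 0; i < xe.length; i++) {
            sxy += xe[i] * ye[i];
        }
        return sxy / xe.length;
    }

    public static void main(String[] args) {
        double[] x = standardize(new double[] {1, 2, 3, 4, 5});
        double[] y = standardize(new double[] {2, 4, 6, 8, 10});
        // y is a positive linear function of x, so the result is 1 up to rounding.
        System.out.println(correlationOfStandardized(x, y));
    }
}
```

No means or variances are recomputed inside correlationOfStandardized, which is why the caller must standardize the inputs first.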
      • correlationOfStandardized

        public static double correlationOfStandardized​(cern.colt.list.DoubleArrayList x,
                                                       cern.colt.list.DoubleArrayList y)
        Like correlationOfStandardized(double[], double[]), but takes DoubleArrayLists as inputs, handles missing values, and does more error checking. Assumes the data have already been converted to z-scores.
        Parameters:
        x - A standardized vector
        y - A standardized vector
        Returns:
        The Pearson correlation between x and y.
      • euclDistance

        public static double euclDistance​(cern.colt.list.DoubleArrayList x,
                                          cern.colt.list.DoubleArrayList y)
        Calculate the Euclidean distance between two vectors.
        Parameters:
        x - DoubleArrayList
        y - DoubleArrayList
        Returns:
        Euclidean distance between x and y
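
As an illustration of the quantity returned, the Euclidean distance is the square root of the summed squared differences. The sketch below uses plain double[] arrays instead of DoubleArrayList so that it has no dependencies. The class name is hypothetical, and the length check is an assumption rather than documented behavior of this method.

```java
// Hypothetical, dependency-free sketch; not the library's implementation.
final class EuclideanSketch {

    /** Square root of the sum of squared element-wise differences. */
    static double euclDistance(double[] x, double[] y) {
        if (x.length != y.length) {
            // Assumption: unequal lengths are rejected.
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // A 3-4-5 right triangle: prints 5.0
        System.out.println(euclDistance(new double[] {0, 0}, new double[] {3, 4}));
    }
}
```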
      • manhattanDistance

        public static double manhattanDistance​(cern.colt.list.DoubleArrayList x,
                                               cern.colt.list.DoubleArrayList y)
        Calculate the Manhattan distance between two vectors.
        Parameters:
        x - DoubleArrayList
        y - DoubleArrayList
        Returns:
        Manhattan distance between x and y
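
For comparison, the Manhattan (city-block) distance sums the absolute element-wise differences without squaring them. The sketch below is a minimal illustration on double[] arrays. The class name is hypothetical, and the length check is an assumption rather than documented behavior of this method.

```java
// Hypothetical, dependency-free sketch; not the library's implementation.
final class ManhattanSketch {

    /** Sum of absolute element-wise differences. */
    static double manhattanDistance(double[] x, double[] y) {
        if (x.length != y.length) {
            // Assumption: unequal lengths are rejected.
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += Math.abs(x[i] - y[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // |1-4| + |2-0| + |3-3| = 5, so this prints 5.0
        System.out.println(manhattanDistance(new double[] {1, 2, 3}, new double[] {4, 0, 3}));
    }
}
```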
      • spearmanRankCorrelation

        public static double spearmanRankCorrelation​(cern.colt.list.DoubleArrayList x)
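        Convenience function to compute the rank correlation when we just want to know whether the values are "in order". Values in perfectly ascending order give a correlation of 1; perfectly descending order gives -1.
        Parameters:
        x - DoubleArrayList
        Returns:
        Rank correlation of x: 1 for perfectly ascending values, -1 for perfectly descending values.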
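
One way to read the "in order" check is as a rank correlation between the values and their positions. The sketch below makes that reading concrete, under two loud assumptions: all values are distinct with no NaN, and the classic no-ties formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)) is used, where d is the difference between a value's rank and its 1-based position. The class and method names are hypothetical, and the library may compute this differently.

```java
// Hypothetical, dependency-free sketch; assumes distinct, non-NaN values.
final class InOrderSketch {

    /** Spearman correlation between the values of x and their positions (no-ties formula). */
    static double inOrderCorrelation(double[] x) {
        int n = x.length;
        if (n < 2) {
            return Double.NaN; // Assumption: undefined for fewer than two values.
        }
        double sumD2 = 0.0;
        for (int i = 0; i < n; i++) {
            // Rank of x[i] is one plus the number of smaller values.
            int rank = 1;
            for (int j = 0; j < n; j++) {
                if (x[j] < x[i]) {
                    rank++;
                }
            }
            double d = rank - (i + 1);
            sumD2 += d * d;
        }
        return 1.0 - 6.0 * sumD2 / (n * ((double) n * n - 1.0));
    }

    public static void main(String[] args) {
        System.out.println(inOrderCorrelation(new double[] {1, 2, 3, 4})); // prints 1.0
        System.out.println(inOrderCorrelation(new double[] {4, 3, 2, 1})); // prints -1.0
    }
}
```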
      • spearmanRankCorrelation

        public static double spearmanRankCorrelation​(cern.colt.list.DoubleArrayList x,
                                                     cern.colt.list.DoubleArrayList y)
        Spearman Rank Correlation. This does the rank transformation of the data. Only mutually non-NaN values are used.
        Parameters:
        x - DoubleArrayList
        y - DoubleArrayList
        Returns:
        Spearman's rank correlation between x and y or NaN if it could not be computed.
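
The steps described above (drop pairs where either value is NaN, rank-transform each vector, then correlate the ranks) can be sketched as follows. This is a minimal illustration on double[] arrays, not the library's code. The class and method names are hypothetical. Giving tied values the average of their ranks, and returning NaN for fewer than two usable pairs or for constant ranks, are assumptions.

```java
import java.util.Arrays;

// Hypothetical, dependency-free sketch; not the library's implementation.
final class SpearmanSketch {

    /** 1-based ranks; tied values share the average of their positions (assumption). */
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(v[a], v[b]));
        double[] r = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && v[order[end + 1]] == v[order[start]]) {
                end++;
            }
            double avg = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                r[order[k]] = avg;
            }
            start = end + 1;
        }
        return r;
    }

    /** Pearson correlation of the ranks, using only pairs where both values are non-NaN. */
    static double spearman(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        int n = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                n++;
            }
        }
        if (n < 2) {
            return Double.NaN;
        }
        double[] xs = new double[n];
        double[] ys = new double[n];
        int k = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                xs[k] = x[i];
                ys[k] = y[i];
                k++;
            }
        }
        double[] rx = ranks(xs);
        double[] ry = ranks(ys);
        double mean = (n + 1) / 2.0; // Average ranks always have this mean.
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = rx[i] - mean;
            double dy = ry[i] - mean;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0.0 || syy == 0.0) {
            return Double.NaN; // All values tied: correlation is undefined.
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, Double.NaN, 4};
        double[] y = {10, 20, 30, 40};
        // The pair containing NaN is dropped; the rest is perfectly monotone: prints 1.0
        System.out.println(spearman(x, y));
    }
}
```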