Class StringDistance

java.lang.Object
ubic.basecode.math.StringDistance

public class StringDistance extends Object
Author:
pavlidis
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static int
    The edit distance counts the differences between two strings, where we would count a difference not only when strings have different characters but also when one has a character whereas the other does not.
    static int
    The Hamming distance H is defined only for strings of the same length.
    static double
    Compute the Hamming distance between two strings of equal length (if they are of unequal length the longer one is trimmed), giving higher weight to characters at the start of the strings, so strings that are similar at the starts are given higher scores (shorter distances) than strings that are similar at the ends.
    static double
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • StringDistance

      public StringDistance()
  • Method Details

    • editDistance

      public static int editDistance(String s, String t)
      The edit distance counts the differences between two strings, where we would count a difference not only when strings have different characters but also when one has a character whereas the other does not. The formal definition follows.

      For a string s, let s(i) stand for its ith character. For two characters a and b, define r(a, b) = 0 if a = b. Let r(a, b) = 1, otherwise.

      Assume we are given two strings s and t of length n and m, respectively. We are going to fill an (n+1) x (m+1) array d with integers such that the low right corner element d(n+1, m+1) will furnish the required values of the Levenshtein distance L(s, t).

      The definition of entries of d is recursive. First set d(i, 0) = i, i = 0, 1,..., n, and d(0, j) = j, j = 0, 1, ..., m. For other pairs i, j use (1) d(i, j) = min(d(i-1, j)+1, d(i, j-1)+1, d(i-1, j-1) + r(s(i), t(j)))

      That last step is also described as:

    • Set cell d[i,j] of the matrix equal to the minimum of:
      a. The cell immediately above plus 1: d[i-1,j] + 1.
      b. The cell immediately to the left plus 1: d[i,j-1] + 1.
      c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.

      (Description partly cribbed from http://www.cut-the-knot.org/do_you_know/Strings.shtml and http://www.merriampark.com/ld.htm)

    • Parameters:
      a -
      b -
      Returns:
    • hammingDistance

      public static int hammingDistance(String a, String b)
      The Hamming distance H is defined only for strings of the same length. For two strings s and t, H(s, t) is the number of places in which the two string differ, i.e., have different characters.
      Parameters:
      a -
      b -
      Returns:
    • prefixWeightedHammingDistance

      public static double prefixWeightedHammingDistance(String s, String t, double weight)
      Compute the Hamming distance between two strings of equal length (if they are of unequal length the longer one is trimmed), giving higher weight to characters at the start of the strings, so strings that are similar at the starts are given higher scores (shorter distances) than strings that are similar at the ends.
      Parameters:
      s -
      t -
      weight - Controls how quickly (normalized by the length of the string) the bias towards the front drops off. The penalty for mismatches drops off linearly for the fraction of the String represented by weight. A weight of 1.0 indicates the bias should be linear across the length of the string. Intermediate values (e.g., 0.25) mean that differences beyond the first 1/4 of the string results in no penalty.
      Returns:
    • suffixWeightedHammingDistance

      public static double suffixWeightedHammingDistance(String s, String t, double weight)
      Parameters:
      s -
      t -
      weight -
      Returns:
      See Also: