Class StringDistance
- java.lang.Object
-
- ubic.basecode.math.StringDistance
-
public class StringDistance extends Object
- Author:
- pavlidis
-
-
Constructor Summary
Constructors Constructor Description StringDistance()
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static int
editDistance(String s, String t)
The edit distance counts the differences between two strings, where we would count a difference not only when strings have different characters but also when one has a character whereas the other does not.static int
hammingDistance(String a, String b)
The Hamming distance H is defined only for strings of the same length.static double
prefixWeightedHammingDistance(String s, String t, double weight)
Compute the Hamming distance between two strings of equal length (if they are of unequal length the longer one is trimmed), giving higher weight to characters at the start of the strings, so strings that are similar at the starts are given higher scores (shorter distances) than strings that are similar at the ends.static double
suffixWeightedHammingDistance(String s, String t, double weight)
-
-
-
Method Detail
-
editDistance
public static int editDistance(String s, String t)
The edit distance counts the differences between two strings, where we would count a difference not only when strings have different characters but also when one has a character whereas the other does not. The formal definition follows.For a string s, let s(i) stand for its ith character. For two characters a and b, define r(a, b) = 0 if a = b. Let r(a, b) = 1, otherwise.
Assume we are given two strings s and t of length n and m, respectively. We are going to fill an (n+1) x (m+1) array d with integers such that the low right corner element d(n+1, m+1) will furnish the required values of the Levenshtein distance L(s, t).
The definition of entries of d is recursive. First set d(i, 0) = i, i = 0, 1,..., n, and d(0, j) = j, j = 0, 1, ..., m. For other pairs i, j use (1) d(i, j) = min(d(i-1, j)+1, d(i, j-1)+1, d(i-1, j-1) + r(s(i), t(j)))
That last step is also described as:
- Set cell d[i,j] of the matrix equal to the minimum of:
a. The cell immediately above plus 1: d[i-1,j] + 1.
b. The cell immediately to the left plus 1: d[i,j-1] + 1.
c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.
(Description partly cribbed from http://www.cut-the-knot.org/do_you_know/Strings.shtml and http://www.merriampark.com/ld.htm)
- Parameters:
a
-b
-- Returns:
- Set cell d[i,j] of the matrix equal to the minimum of:
-
hammingDistance
public static int hammingDistance(String a, String b)
The Hamming distance H is defined only for strings of the same length. For two strings s and t, H(s, t) is the number of places in which the two string differ, i.e., have different characters.- Parameters:
a
-b
-- Returns:
-
prefixWeightedHammingDistance
public static double prefixWeightedHammingDistance(String s, String t, double weight)
Compute the Hamming distance between two strings of equal length (if they are of unequal length the longer one is trimmed), giving higher weight to characters at the start of the strings, so strings that are similar at the starts are given higher scores (shorter distances) than strings that are similar at the ends.- Parameters:
s
-t
-weight
- Controls how quickly (normalized by the length of the string) the bias towards the front drops off. The penalty for mismatches drops off linearly for the fraction of the String represented byweight
. A weight of 1.0 indicates the bias should be linear across the length of the string. Intermediate values (e.g., 0.25) mean that differences beyond the first 1/4 of the string results in no penalty.- Returns:
-
suffixWeightedHammingDistance
public static double suffixWeightedHammingDistance(String s, String t, double weight)
- Parameters:
s
-t
-weight
-- Returns:
- See Also:
prefixWeightedHammingDistance(java.lang.String,java.lang.String,double)
-
-