Class JaroWinklerDistance

java.lang.Object
org.apache.commons.text.similarity.JaroWinklerDistance
All Implemented Interfaces:
BiFunction<CharSequence,CharSequence,Double>, EditDistance<Double>, ObjectSimilarityScore<CharSequence,Double>, SimilarityScore<Double>

public class JaroWinklerDistance extends Object implements EditDistance<Double>
Measures the Jaro-Winkler distance between two character sequences. It is the complement of the Jaro-Winkler similarity: distance = 1 - similarity.
Since:
1.0
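The complement relationship can be written out as formulas. This is a sketch of how the Jaro-Winkler similarity appears to be computed in Commons Text. It assumes a Winkler scaling factor of 0.1, a common-prefix cap of 4 characters, a boost threshold of 0.7, and a match window of max(|s1|, |s2|)/2 - 1. None of these constants are stated on this page. Here m is the number of matching characters, h is the number of half transpositions, and l is the common prefix length. Identical inputs are treated as having similarity 1.

```latex
\begin{aligned}
d_{jw}(s_1, s_2) &= 1 - \mathrm{sim}_{jw}(s_1, s_2) \\
\mathrm{sim}_{j} &= \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - h/2}{m}\right) \qquad (\mathrm{sim}_j = 0 \text{ if } m = 0) \\
\mathrm{sim}_{jw} &= \begin{cases}
  \mathrm{sim}_j & \text{if } \mathrm{sim}_j < 0.7 \\
  \mathrm{sim}_j + 0.1\,\ell\,(1 - \mathrm{sim}_j) & \text{otherwise}
\end{cases} \\[6pt]
&\text{Example: "foo" vs. "foo " (one trailing space): } m = 3,\ h = 0,\ \ell = 3 \\
\mathrm{sim}_{j} &= \tfrac{1}{3}\left(1 + \tfrac{3}{4} + 1\right) = \tfrac{11}{12}, \qquad
\mathrm{sim}_{jw} = \tfrac{11}{12} + 0.3 \cdot \tfrac{1}{12} = \tfrac{113}{120} \\
d_{jw} &= \tfrac{7}{120} \approx 0.058
\end{aligned}
```

Rounded to two decimals, this is the 0.06 listed for `distance.apply("foo", "foo ")`.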
  • Field Details

  • Constructor Details

  • Method Details

    • matches

      @Deprecated protected static int[] matches(CharSequence first, CharSequence second)
      Deprecated.
      Deprecated as of 1.7. This method will be removed in 2.0 and moved to a Jaro-Winkler similarity class. TODO: see TEXT-104.
      Computes the Jaro-Winkler string matches, half transpositions, and common prefix length, returned as an array.
      Parameters:
      first - the first string to be matched.
      second - the second string to be matched.
      Returns:
      an array containing the number of matches, the number of half transpositions, and the common prefix length
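The triple returned by matches can be illustrated with a standalone sketch. It does not call the library's protected method. It assumes the usual Jaro rules: characters match only within a window of max(0, longer/2 - 1) positions, each character of the longer sequence is matched at most once, and the prefix is capped at 4 characters. The class name `JaroMatchesSketch` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of the {matches, halfTranspositions, prefix} computation.
// Assumptions: match window = max(0, longer/2 - 1); prefix capped at 4 characters.
final class JaroMatchesSketch {

    static int[] matches(CharSequence first, CharSequence second) {
        // Scan the shorter sequence ("min") against the longer one ("max").
        CharSequence max;
        CharSequence min;
        if (first.length() > second.length()) {
            max = first;
            min = second;
        } else {
            max = second;
            min = first;
        }
        int range = Math.max(max.length() / 2 - 1, 0);
        int[] matchIndexes = new int[min.length()];
        Arrays.fill(matchIndexes, -1);
        boolean[] matchFlags = new boolean[max.length()];
        int matches = 0;
        for (int mi = 0; mi < min.length(); mi++) {
            char c = min.charAt(mi);
            int lo = Math.max(mi - range, 0);
            int hi = Math.min(mi + range + 1, max.length());
            for (int xi = lo; xi < hi; xi++) {
                // Take the first unused equal character inside the window.
                if (!matchFlags[xi] && c == max.charAt(xi)) {
                    matchIndexes[mi] = xi;
                    matchFlags[xi] = true;
                    matches++;
                    break;
                }
            }
        }
        // Matched characters, in the order they occur in each sequence.
        char[] ms1 = new char[matches];
        char[] ms2 = new char[matches];
        for (int i = 0, si = 0; i < min.length(); i++) {
            if (matchIndexes[i] != -1) {
                ms1[si++] = min.charAt(i);
            }
        }
        for (int i = 0, si = 0; i < max.length(); i++) {
            if (matchFlags[i]) {
                ms2[si++] = max.charAt(i);
            }
        }
        // Each position where the two orders disagree is one half transposition.
        int halfTranspositions = 0;
        for (int i = 0; i < matches; i++) {
            if (ms1[i] != ms2[i]) {
                halfTranspositions++;
            }
        }
        // Length of the common prefix, capped at 4.
        int prefix = 0;
        for (int i = 0; i < Math.min(4, min.length()); i++) {
            if (first.charAt(i) != second.charAt(i)) {
                break;
            }
            prefix++;
        }
        return new int[] {matches, halfTranspositions, prefix};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(matches("MARTHA", "MARHTA"))); // [6, 2, 3]
        System.out.println(Arrays.toString(matches("frog", "fog")));      // [3, 0, 1]
    }
}
```

In "MARTHA" vs. "MARHTA", the swapped "TH"/"HT" shows up as two half transpositions, which is one full transposition.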
    • apply

      public Double apply(CharSequence left, CharSequence right)
      Computes the Jaro-Winkler distance between two character sequences.
       distance.apply(null, null)          = IllegalArgumentException
       distance.apply("foo", null)         = IllegalArgumentException
       distance.apply(null, "foo")         = IllegalArgumentException
       distance.apply("", "")              = 0.0
       distance.apply("foo", "foo")        = 0.0
       distance.apply("foo", "foo ")       = 0.06
       distance.apply("foo", "foo  ")      = 0.09
       distance.apply("foo", " foo ")      = 0.13
       distance.apply("foo", "  foo")      = 0.49
       distance.apply("", "a")             = 1.0
       distance.apply("aaapppp", "")       = 1.0
       distance.apply("frog", "fog")       = 0.07
       distance.apply("fly", "ant")        = 1.0
       distance.apply("elephant", "hippo") = 0.56
       distance.apply("hippo", "elephant") = 0.56
       distance.apply("hippo", "zzzzzzzz") = 1.0
       distance.apply("hello", "hallo")    = 0.12
       distance.apply("ABC Corporation", "ABC Corp") = 0.09
       distance.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.05
       distance.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.06
       distance.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.10
       
      Specified by:
      apply in interface BiFunction<CharSequence,CharSequence,Double>
      Specified by:
      apply in interface ObjectSimilarityScore<CharSequence,Double>
      Specified by:
      apply in interface SimilarityScore<Double>
      Parameters:
      left - the first input, must not be null.
      right - the second input, must not be null.
      Returns:
      the distance, from 0.0 (identical inputs) to 1.0 (no matching characters).
      Throws:
      IllegalArgumentException - if either CharSequence input is null.
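The contract above (reject null, return 1 - similarity) can be reproduced in a self-contained sketch that does not depend on Commons Text. The class `JaroWinklerDistanceSketch` is hypothetical. It assumes the same constants as the library appears to use: scaling factor 0.1, a prefix cap of 4, and the Winkler boost applied only when the Jaro similarity is at least 0.7.

```java
// Hypothetical sketch: Jaro-Winkler distance = 1 - similarity, with the documented null contract.
// Assumed constants (not stated on this page): scaling 0.1, prefix cap 4, boost threshold 0.7.
final class JaroWinklerDistanceSketch {

    static double apply(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("CharSequences must not be null");
        }
        return 1.0 - similarity(left, right);
    }

    static double similarity(CharSequence a, CharSequence b) {
        if (a.toString().equals(b.toString())) {
            return 1.0; // identical inputs, including two empty sequences
        }
        boolean aShorter = a.length() <= b.length();
        CharSequence s = aShorter ? a : b; // shorter input
        CharSequence l = aShorter ? b : a; // longer input
        int range = Math.max(l.length() / 2 - 1, 0);
        boolean[] sMatched = new boolean[s.length()];
        boolean[] lMatched = new boolean[l.length()];
        int m = 0;
        for (int i = 0; i < s.length(); i++) {
            int hi = Math.min(i + range + 1, l.length());
            for (int j = Math.max(i - range, 0); j < hi; j++) {
                if (!lMatched[j] && s.charAt(i) == l.charAt(j)) {
                    sMatched[i] = true;
                    lMatched[j] = true;
                    m++;
                    break;
                }
            }
        }
        if (m == 0) {
            return 0.0; // no matching characters at all
        }
        // Half transpositions: walk both sets of matched characters in order and count mismatches.
        int half = 0;
        for (int i = 0, j = 0; i < s.length(); i++) {
            if (!sMatched[i]) {
                continue;
            }
            while (!lMatched[j]) {
                j++;
            }
            if (s.charAt(i) != l.charAt(j)) {
                half++;
            }
            j++;
        }
        double jaro = (m / (double) a.length() + m / (double) b.length() + (m - half / 2.0) / m) / 3.0;
        if (jaro < 0.7) {
            return jaro;
        }
        // Winkler boost for a common prefix of up to 4 characters.
        int prefix = 0;
        while (prefix < Math.min(4, s.length()) && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return jaro + 0.1 * prefix * (1.0 - jaro);
    }

    public static void main(String[] args) {
        String[][] pairs = {{"foo", "foo "}, {"hello", "hallo"}, {"elephant", "hippo"}};
        for (String[] p : pairs) {
            System.out.printf("%s | %s -> %.4f%n", p[0], p[1], apply(p[0], p[1]));
        }
    }
}
```

The equality check comes first so that two empty inputs yield a distance of 0.0. Without it, they would fall into the m == 0 branch and be reported as completely dissimilar.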
    • apply

      public <E> Double apply(SimilarityInput<E> left, SimilarityInput<E> right)
      Computes the Jaro-Winkler distance between two SimilarityInput sequences.
       distance.apply(null, null)          = IllegalArgumentException
       distance.apply("foo", null)         = IllegalArgumentException
       distance.apply(null, "foo")         = IllegalArgumentException
       distance.apply("", "")              = 0.0
       distance.apply("foo", "foo")        = 0.0
       distance.apply("foo", "foo ")       = 0.06
       distance.apply("foo", "foo  ")      = 0.09
       distance.apply("foo", " foo ")      = 0.13
       distance.apply("foo", "  foo")      = 0.49
       distance.apply("", "a")             = 1.0
       distance.apply("aaapppp", "")       = 1.0
       distance.apply("frog", "fog")       = 0.07
       distance.apply("fly", "ant")        = 1.0
       distance.apply("elephant", "hippo") = 0.56
       distance.apply("hippo", "elephant") = 0.56
       distance.apply("hippo", "zzzzzzzz") = 1.0
       distance.apply("hello", "hallo")    = 0.12
       distance.apply("ABC Corporation", "ABC Corp") = 0.09
       distance.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.05
       distance.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.06
       distance.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.10
       
      Type Parameters:
      E - the type of the elements being compared.
      Parameters:
      left - the first input, must not be null.
      right - the second input, must not be null.
      Returns:
      the distance, from 0.0 (identical inputs) to 1.0 (no matching elements).
      Throws:
      IllegalArgumentException - if either input is null.
      Since:
      1.13.0
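The SimilarityInput overload generalizes the comparison from characters to elements of any type E. The sketch below illustrates that idea with `java.util.List<E>` standing in for SimilarityInput and `Objects.equals` as the element comparison. Both choices are assumptions, and this is not the library's implementation. The class `GenericJaroWinklerSketch` is hypothetical, and it uses the same assumed constants (0.1, 4, 0.7) as the character-based version.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: Jaro-Winkler distance over List<E> (a stand-in for SimilarityInput<E>).
// Elements are compared with Objects.equals; constants 0.1, 4 and 0.7 are assumptions.
final class GenericJaroWinklerSketch {

    static <E> double distance(List<E> left, List<E> right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("inputs must not be null");
        }
        if (left.equals(right)) {
            return 0.0; // identical sequences, including two empty lists
        }
        boolean leftShorter = left.size() <= right.size();
        List<E> s = leftShorter ? left : right; // shorter input
        List<E> l = leftShorter ? right : left; // longer input
        int range = Math.max(l.size() / 2 - 1, 0);
        boolean[] sMatched = new boolean[s.size()];
        boolean[] lMatched = new boolean[l.size()];
        int m = 0;
        for (int i = 0; i < s.size(); i++) {
            int hi = Math.min(i + range + 1, l.size());
            for (int j = Math.max(i - range, 0); j < hi; j++) {
                if (!lMatched[j] && Objects.equals(s.get(i), l.get(j))) {
                    sMatched[i] = true;
                    lMatched[j] = true;
                    m++;
                    break;
                }
            }
        }
        if (m == 0) {
            return 1.0; // no matching elements
        }
        // Count matched elements that appear in a different order (half transpositions).
        int half = 0;
        for (int i = 0, j = 0; i < s.size(); i++) {
            if (!sMatched[i]) {
                continue;
            }
            while (!lMatched[j]) {
                j++;
            }
            if (!Objects.equals(s.get(i), l.get(j))) {
                half++;
            }
            j++;
        }
        double jaro = (m / (double) s.size() + m / (double) l.size() + (m - half / 2.0) / m) / 3.0;
        if (jaro < 0.7) {
            return 1.0 - jaro;
        }
        int prefix = 0;
        while (prefix < Math.min(4, s.size()) && Objects.equals(left.get(prefix), right.get(prefix))) {
            prefix++;
        }
        return 1.0 - (jaro + 0.1 * prefix * (1.0 - jaro));
    }

    public static void main(String[] args) {
        // Word-level comparison: elements are whole tokens rather than characters.
        System.out.println(distance(List.of("ABC", "Corp"), List.of("ABC", "Corporation")));
        // Character-level comparison via boxed Character elements.
        System.out.println(distance(List.of('f', 'r', 'o', 'g'), List.of('f', 'o', 'g')));
    }
}
```

With token elements, "Corp" and "Corporation" are simply unequal. The first pair therefore gets a distance of about 0.33, while the character-level pair reproduces the 0.075 of the string version.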