Class LevenshteinDistance
java.lang.Object
org.apache.commons.text.similarity.LevenshteinDistance
- All Implemented Interfaces:
BiFunction<CharSequence,
,CharSequence, Integer> EditDistance<Integer>
,ObjectSimilarityScore<CharSequence,
,Integer> SimilarityScore<Integer>
An algorithm for measuring the difference between two character sequences.
This is the number of changes needed to change one sequence into another, where each change is a single character modification (deletion, insertion or substitution).
This code has been adapted from Apache Commons Lang 3.3.
- Since:
- 1.0
-
Constructor Summary
ConstructorDescriptionDeprecated.LevenshteinDistance
(Integer threshold) If the threshold is not null, distance calculations will be limited to a maximum length. -
Method Summary
Modifier and TypeMethodDescriptionapply
(CharSequence left, CharSequence right) Computes the Levenshtein distance between two Strings.<E> Integer
apply
(SimilarityInput<E> left, SimilarityInput<E> right) Computes the Levenshtein distance between two inputs.static LevenshteinDistance
Gets the default instance.Gets the distance threshold.Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.function.BiFunction
andThen
-
Constructor Details
-
LevenshteinDistance
Deprecated.UsegetDefaultInstance()
.This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.- See Also:
-
LevenshteinDistance
If the threshold is not null, distance calculations will be limited to a maximum length. If the threshold is null, the unlimited version of the algorithm will be used.- Parameters:
threshold
- If this is null then distances calculations will not be limited. This may not be negative.
-
-
Method Details
-
getDefaultInstance
Gets the default instance.- Returns:
- The default instance
-
apply
Computes the Levenshtein distance between two Strings.A higher score indicates a greater distance.
The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm
Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htmdistance.apply(null, *) = IllegalArgumentException distance.apply(*, null) = IllegalArgumentException distance.apply("","") = 0 distance.apply("","a") = 1 distance.apply("aaapppp", "") = 7 distance.apply("frog", "fog") = 1 distance.apply("fly", "ant") = 3 distance.apply("elephant", "hippo") = 7 distance.apply("hippo", "elephant") = 7 distance.apply("hippo", "zzzzzzzz") = 8 distance.apply("hello", "hallo") = 1
- Specified by:
apply
in interfaceBiFunction<CharSequence,
CharSequence, Integer> - Specified by:
apply
in interfaceObjectSimilarityScore<CharSequence,
Integer> - Specified by:
apply
in interfaceSimilarityScore<Integer>
- Parameters:
left
- the first input, must not be nullright
- the second input, must not be null- Returns:
- result distance, or -1
- Throws:
IllegalArgumentException
- if either String inputnull
-
apply
Computes the Levenshtein distance between two inputs.A higher score indicates a greater distance.
distance.apply(null, *) = IllegalArgumentException distance.apply(*, null) = IllegalArgumentException distance.apply("","") = 0 distance.apply("","a") = 1 distance.apply("aaapppp", "") = 7 distance.apply("frog", "fog") = 1 distance.apply("fly", "ant") = 3 distance.apply("elephant", "hippo") = 7 distance.apply("hippo", "elephant") = 7 distance.apply("hippo", "zzzzzzzz") = 8 distance.apply("hello", "hallo") = 1
- Type Parameters:
E
- The type of similarity score unit.- Parameters:
left
- the first input, must not be null.right
- the second input, must not be null.- Returns:
- result distance, or -1.
- Throws:
IllegalArgumentException
- if either String inputnull
.- Since:
- 1.13.0
-
getThreshold
Gets the distance threshold.- Returns:
- The distance threshold
-
getDefaultInstance()
.