Issues with specific algorithms

description

Thank you so much for sharing this library with the community. I wanted to report the following issues with the current code base.
Background: I created a test project that calls ApproximatelyEquals with a single FuzzyStringComparison option at a time, to isolate the behavior of each algorithm, and found the following issues.
  1. Jaro Distance always returns true
    (e.g. "Larry" vs "Larry" returns true, and "Larry" vs "xxxxx" also returns true).
  2. Levenshtein Distance always returns false
    (e.g. "Larry" vs "Larry" returns false).
  3. Tanimoto Coefficient always returns false
    (e.g. "Larry" vs "Larry" returns false).
  4. Jaro Winkler appears to have a bug.
    It appears that strings with greater similarity return values closer to 1, so the function behaves as a similarity rather than a distance.
    The current code in JaroWinklerDistance.cs is
         return jaroDistance + (commonPrefixLength * 0.1 * (1 - jaroDistance));
    
    When I modified Line 16 in JaroWinklerDistance.cs to the following, the results were better:
    
    return 1 - (jaroDistance + (commonPrefixLength * 0.1 * (1 - jaroDistance)));
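To make item 4 concrete, here is a minimal, self-contained sketch of the similarity-versus-distance distinction. It is not the library's code: the class name JaroSketch and its method names are hypothetical, the Jaro computation is a textbook version written from scratch, and the prefix cap of 4 and scaling factor of 0.1 are the usual Jaro-Winkler defaults (the 0.1 matches the line quoted above). It assumes non-null inputs and assumes that the caller treats smaller values as closer matches.

```csharp
using System;

// Hypothetical illustration only; not taken from the library under discussion.
public static class JaroSketch
{
    // Jaro similarity in [0, 1]; 1 means the strings are identical.
    public static double JaroSimilarity(string s, string t)
    {
        if (s.Length == 0 && t.Length == 0) return 1.0;
        if (s.Length == 0 || t.Length == 0) return 0.0;

        // Characters match only if they are equal and within this window.
        int window = Math.Max(0, Math.Max(s.Length, t.Length) / 2 - 1);
        bool[] sMatched = new bool[s.Length];
        bool[] tMatched = new bool[t.Length];
        int matches = 0;

        for (int i = 0; i < s.Length; i++)
        {
            int lo = Math.Max(0, i - window);
            int hi = Math.Min(t.Length - 1, i + window);
            for (int j = lo; j <= hi; j++)
            {
                if (tMatched[j] || s[i] != t[j]) continue;
                sMatched[i] = true;
                tMatched[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0.0;

        // Matched characters that appear in a different order in s and t;
        // the number of transpositions is half of this count.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (!sMatched[i]) continue;
            while (!tMatched[k]) k++;
            if (s[i] != t[k]) outOfOrder++;
            k++;
        }

        double m = matches;
        return (m / s.Length + m / t.Length + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    // Jaro-Winkler similarity: boosts the Jaro score for a shared prefix
    // of up to 4 characters. Higher means more alike.
    public static double JaroWinklerSimilarity(string s, string t)
    {
        double jaro = JaroSimilarity(s, t);
        int maxPrefix = Math.Min(4, Math.Min(s.Length, t.Length));
        int prefix = 0;
        while (prefix < maxPrefix && s[prefix] == t[prefix]) prefix++;
        return jaro + prefix * 0.1 * (1 - jaro);
    }

    // Distance form: 0 means identical, 1 means nothing in common.
    public static double JaroWinklerDistance(string s, string t)
    {
        return 1.0 - JaroWinklerSimilarity(s, t);
    }
}
```

With this sketch, JaroSketch.JaroWinklerDistance("Larry", "Larry") is 0 and JaroSketch.JaroWinklerDistance("Larry", "xxxxx") is 1, while the similarity form gives the opposite ordering. If a tolerance check compares the result against an upper bound, feeding it the similarity instead of the distance would invert which pairs pass, which is consistent with the change to Line 16 improving the results.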
---- Results of my program


Hamming Distance: Larry Larry True
Jaccard Distance: Larry Larry True
Jaro Distance: Larry Larry True
Jaro Winkler Distance: Larry Larry True
Levenshtein Distance: Larry Larry False
LongestCommonSubsequence Distance: Larry Larry True
LongestCommonSubstringDistance: Larry Larry True
Overlap Coefficient Distance: Larry Larry True
RatcliffObershelpSimilarity: Larry Larry True
SorensenDiceDistance: Larry Larry True

TanimotoCoefficient: Larry Larry False

Hamming Distance: Larry Larre True
Jaccard Distance: Larry Larre True
Jaro Distance: Larry Larre True
Jaro Winkler Distance: Larry Larre True
Levenshtein Distance: Larry Larre False
LongestCommonSubsequence Distance: Larry Larre True
LongestCommonSubstringDistance: Larry Larre True
Overlap Coefficient Distance: Larry Larre True
RatcliffObershelpSimilarity: Larry Larre True
SorensenDiceDistance: Larry Larre True

TanimotoCoefficient: Larry Larre False

Hamming Distance: Larry Larey True
Jaccard Distance: Larry Larey True
Jaro Distance: Larry Larey True
Jaro Winkler Distance: Larry Larey True
Levenshtein Distance: Larry Larey False
LongestCommonSubsequence Distance: Larry Larey True
LongestCommonSubstringDistance: Larry Larey True
Overlap Coefficient Distance: Larry Larey True
RatcliffObershelpSimilarity: Larry Larey True
SorensenDiceDistance: Larry Larey True

TanimotoCoefficient: Larry Larey False

Hamming Distance: Larry Laree True
Jaccard Distance: Larry Laree True
Jaro Distance: Larry Laree True
Jaro Winkler Distance: Larry Laree True
Levenshtein Distance: Larry Laree False
LongestCommonSubsequence Distance: Larry Laree True
LongestCommonSubstringDistance: Larry Laree True
Overlap Coefficient Distance: Larry Laree True
RatcliffObershelpSimilarity: Larry Laree True
SorensenDiceDistance: Larry Laree True

TanimotoCoefficient: Larry Laree False

Hamming Distance: Larry Lavee True
Jaccard Distance: Larry Lavee False
Jaro Distance: Larry Lavee True
Jaro Winkler Distance: Larry Lavee False
Levenshtein Distance: Larry Lavee False
LongestCommonSubsequence Distance: Larry Lavee False
LongestCommonSubstringDistance: Larry Lavee False
Overlap Coefficient Distance: Larry Lavee False
RatcliffObershelpSimilarity: Larry Lavee False
SorensenDiceDistance: Larry Lavee False

TanimotoCoefficient: Larry Lavee False

Hamming Distance: Larry Levee True
Jaccard Distance: Larry Levee False
Jaro Distance: Larry Levee True
Jaro Winkler Distance: Larry Levee False
Levenshtein Distance: Larry Levee False
LongestCommonSubsequence Distance: Larry Levee False
LongestCommonSubstringDistance: Larry Levee False
Overlap Coefficient Distance: Larry Levee False
RatcliffObershelpSimilarity: Larry Levee False
SorensenDiceDistance: Larry Levee False

TanimotoCoefficient: Larry Levee False

Hamming Distance: Barry Levee False
Jaccard Distance: Barry Levee False
Jaro Distance: Barry Levee True
Jaro Winkler Distance: Barry Levee False
Levenshtein Distance: Barry Levee False
LongestCommonSubsequence Distance: Barry Levee False
LongestCommonSubstringDistance: Barry Levee False
Overlap Coefficient Distance: Barry Levee False
RatcliffObershelpSimilarity: Barry Levee False
SorensenDiceDistance: Barry Levee False

TanimotoCoefficient: Barry Levee False

Hamming Distance: Barry Lavee True
Jaccard Distance: Barry Lavee False
Jaro Distance: Barry Lavee True
Jaro Winkler Distance: Barry Lavee False
Levenshtein Distance: Barry Lavee False
LongestCommonSubsequence Distance: Barry Lavee False
LongestCommonSubstringDistance: Barry Lavee False
Overlap Coefficient Distance: Barry Lavee False
RatcliffObershelpSimilarity: Barry Lavee False
SorensenDiceDistance: Barry Lavee False

TanimotoCoefficient: Barry Lavee False

Hamming Distance: Barry Bave False
Jaccard Distance: Barry Bave False
Jaro Distance: Barry Bave True
Jaro Winkler Distance: Barry Bave False
Levenshtein Distance: Barry Bave False
LongestCommonSubsequence Distance: Barry Bave False
LongestCommonSubstringDistance: Barry Bave False
Overlap Coefficient Distance: Barry Bave False
RatcliffObershelpSimilarity: Barry Bave False
SorensenDiceDistance: Barry Bave False

TanimotoCoefficient: Barry Bave False

Hamming Distance: Larry xxxxx False
Jaccard Distance: Larry xxxxx False
Jaro Distance: Larry xxxxx True
Jaro Winkler Distance: Larry xxxxx False
Levenshtein Distance: Larry xxxxx False
LongestCommonSubsequence Distance: Larry xxxxx False
LongestCommonSubstringDistance: Larry xxxxx False
Overlap Coefficient Distance: Larry xxxxx False
RatcliffObershelpSimilarity: Larry xxxxx False
SorensenDiceDistance: Larry xxxxx False

TanimotoCoefficient: Larry xxxxx False
