Thank you so much for sharing this library with the community. I wanted to report the following issues with the current code base.

Background: I created a test project that calls ApproximatelyEquals with a single FuzzyStringComparison option at a time, to isolate the behavior of each algorithm, and found the following issues. Each output line below lists the algorithm, the two strings compared, and the result.

Hamming Distance: Larry Larry True

Jaccard Distance: Larry Larry True

Jaro Distance: Larry Larry True

Jaro Winkler Distance: Larry Larry True

Levenshtein Distance: Larry Larry False

LongestCommonSubsequence Distance: Larry Larry True

LongestCommonSubstringDistance: Larry Larry True

Overlap Coefficient Distance: Larry Larry True

RatcliffObershelpSimilarity: Larry Larry True

SorensenDiceDistance: Larry Larry True

TanimotoCoefficient: Larry Larry False

Hamming Distance: Larry Larre True

Jaccard Distance: Larry Larre True

Jaro Distance: Larry Larre True

Jaro Winkler Distance: Larry Larre True

Levenshtein Distance: Larry Larre False

LongestCommonSubsequence Distance: Larry Larre True

LongestCommonSubstringDistance: Larry Larre True

Overlap Coefficient Distance: Larry Larre True

RatcliffObershelpSimilarity: Larry Larre True

SorensenDiceDistance: Larry Larre True

TanimotoCoefficient: Larry Larre False

Hamming Distance: Larry Larey True

Jaccard Distance: Larry Larey True

Jaro Distance: Larry Larey True

Jaro Winkler Distance: Larry Larey True

Levenshtein Distance: Larry Larey False

LongestCommonSubsequence Distance: Larry Larey True

LongestCommonSubstringDistance: Larry Larey True

Overlap Coefficient Distance: Larry Larey True

RatcliffObershelpSimilarity: Larry Larey True

SorensenDiceDistance: Larry Larey True

TanimotoCoefficient: Larry Larey False

Hamming Distance: Larry Laree True

Jaccard Distance: Larry Laree True

Jaro Distance: Larry Laree True

Jaro Winkler Distance: Larry Laree True

Levenshtein Distance: Larry Laree False

LongestCommonSubsequence Distance: Larry Laree True

LongestCommonSubstringDistance: Larry Laree True

Overlap Coefficient Distance: Larry Laree True

RatcliffObershelpSimilarity: Larry Laree True

SorensenDiceDistance: Larry Laree True

TanimotoCoefficient: Larry Laree False

Hamming Distance: Larry Lavee True

Jaccard Distance: Larry Lavee False

Jaro Distance: Larry Lavee True

Jaro Winkler Distance: Larry Lavee False

Levenshtein Distance: Larry Lavee False

LongestCommonSubsequence Distance: Larry Lavee False

LongestCommonSubstringDistance: Larry Lavee False

Overlap Coefficient Distance: Larry Lavee False

RatcliffObershelpSimilarity: Larry Lavee False

SorensenDiceDistance: Larry Lavee False

TanimotoCoefficient: Larry Lavee False

Hamming Distance: Larry Levee True

Jaccard Distance: Larry Levee False

Jaro Distance: Larry Levee True

Jaro Winkler Distance: Larry Levee False

Levenshtein Distance: Larry Levee False

LongestCommonSubsequence Distance: Larry Levee False

LongestCommonSubstringDistance: Larry Levee False

Overlap Coefficient Distance: Larry Levee False

RatcliffObershelpSimilarity: Larry Levee False

SorensenDiceDistance: Larry Levee False

TanimotoCoefficient: Larry Levee False

Hamming Distance: Barry Levee False

Jaccard Distance: Barry Levee False

Jaro Distance: Barry Levee True

Jaro Winkler Distance: Barry Levee False

Levenshtein Distance: Barry Levee False

LongestCommonSubsequence Distance: Barry Levee False

LongestCommonSubstringDistance: Barry Levee False

Overlap Coefficient Distance: Barry Levee False

RatcliffObershelpSimilarity: Barry Levee False

SorensenDiceDistance: Barry Levee False

TanimotoCoefficient: Barry Levee False

Hamming Distance: Barry Lavee True

Jaccard Distance: Barry Lavee False

Jaro Distance: Barry Lavee True

Jaro Winkler Distance: Barry Lavee False

Levenshtein Distance: Barry Lavee False

LongestCommonSubsequence Distance: Barry Lavee False

LongestCommonSubstringDistance: Barry Lavee False

Overlap Coefficient Distance: Barry Lavee False

RatcliffObershelpSimilarity: Barry Lavee False

SorensenDiceDistance: Barry Lavee False

TanimotoCoefficient: Barry Lavee False

Hamming Distance: Barry Bave False

Jaccard Distance: Barry Bave False

Jaro Distance: Barry Bave True

Jaro Winkler Distance: Barry Bave False

Levenshtein Distance: Barry Bave False

LongestCommonSubsequence Distance: Barry Bave False

LongestCommonSubstringDistance: Barry Bave False

Overlap Coefficient Distance: Barry Bave False

RatcliffObershelpSimilarity: Barry Bave False

SorensenDiceDistance: Barry Bave False

TanimotoCoefficient: Barry Bave False

Hamming Distance: Larry xxxxx False

Jaccard Distance: Larry xxxxx False

Jaro Distance: Larry xxxxx True

Jaro Winkler Distance: Larry xxxxx False

Levenshtein Distance: Larry xxxxx False

LongestCommonSubsequence Distance: Larry xxxxx False

LongestCommonSubstringDistance: Larry xxxxx False

Overlap Coefficient Distance: Larry xxxxx False

RatcliffObershelpSimilarity: Larry xxxxx False

SorensenDiceDistance: Larry xxxxx False

TanimotoCoefficient: Larry xxxxx False

Summary of the issues found:

- The Jaro Distance algorithm always returns true

(e.g. "Larry" vs. "Larry" returns true, and "Larry" vs. "xxxxx" also returns true).

- Levenshtein Distance always returns false

(e.g. "Larry" vs. "Larry" returns false).

- Tanimoto Coefficient always returns false

(e.g. "Larry" vs. "Larry" returns false).

- Jaro Winkler appears to have a bug.

It appears that strings with greater similarity return values closer to 1, which suggests the method returns a similarity rather than a distance.

The current code base in JaroWinklerDistance is `return jaroDistance + (commonPrefixLength * 0.1 * (1 - jaroDistance));` When I modified line 16 in JaroWinklerDistance.cs to the following, the results were better: `return 1 - (jaroDistance + (commonPrefixLength * 0.1 * (1 - jaroDistance)));`

Hamming Distance: Larry Larry True

Jaccard Distance: Larry Larry True

Jaro Distance: Larry Larry True

Jaro Winkler Distance: Larry Larry True

Levenshtein Distance: Larry Larry False

LongestCommonSubsequence Distance: Larry Larry True

LongestCommonSubstringDistance: Larry Larry True

Overlap Coefficient Distance: Larry Larry True

RatcliffObershelpSimilarity: Larry Larry True

SorensenDiceDistance: Larry Larry True

Jaccard Distance: Larry Larre True

Jaro Distance: Larry Larre True

Jaro Winkler Distance: Larry Larre True

Levenshtein Distance: Larry Larre False

LongestCommonSubsequence Distance: Larry Larre True

LongestCommonSubstringDistance: Larry Larre True

Overlap Coefficient Distance: Larry Larre True

RatcliffObershelpSimilarity: Larry Larre True

SorensenDiceDistance: Larry Larre True

Jaccard Distance: Larry Larey True

Jaro Distance: Larry Larey True

Jaro Winkler Distance: Larry Larey True

Levenshtein Distance: Larry Larey False

LongestCommonSubsequence Distance: Larry Larey True

LongestCommonSubstringDistance: Larry Larey True

Overlap Coefficient Distance: Larry Larey True

RatcliffObershelpSimilarity: Larry Larey True

SorensenDiceDistance: Larry Larey True

Jaccard Distance: Larry Laree True

Jaro Distance: Larry Laree True

Jaro Winkler Distance: Larry Laree True

Levenshtein Distance: Larry Laree False

LongestCommonSubsequence Distance: Larry Laree True

LongestCommonSubstringDistance: Larry Laree True

Overlap Coefficient Distance: Larry Laree True

RatcliffObershelpSimilarity: Larry Laree True

SorensenDiceDistance: Larry Laree True

Jaccard Distance: Larry Lavee False

Jaro Distance: Larry Lavee True

Jaro Winkler Distance: Larry Lavee False

Levenshtein Distance: Larry Lavee False

LongestCommonSubsequence Distance: Larry Lavee False

LongestCommonSubstringDistance: Larry Lavee False

Overlap Coefficient Distance: Larry Lavee False

RatcliffObershelpSimilarity: Larry Lavee False

SorensenDiceDistance: Larry Lavee False

Jaccard Distance: Larry Levee False

Jaro Distance: Larry Levee True

Jaro Winkler Distance: Larry Levee False

Levenshtein Distance: Larry Levee False

LongestCommonSubsequence Distance: Larry Levee False

LongestCommonSubstringDistance: Larry Levee False

Overlap Coefficient Distance: Larry Levee False

RatcliffObershelpSimilarity: Larry Levee False

SorensenDiceDistance: Larry Levee False

Jaccard Distance: Barry Levee False

Jaro Distance: Barry Levee True

Jaro Winkler Distance: Barry Levee False

Levenshtein Distance: Barry Levee False

LongestCommonSubsequence Distance: Barry Levee False

LongestCommonSubstringDistance: Barry Levee False

Overlap Coefficient Distance: Barry Levee False

RatcliffObershelpSimilarity: Barry Levee False

SorensenDiceDistance: Barry Levee False

Jaccard Distance: Barry Lavee False

Jaro Distance: Barry Lavee True

Jaro Winkler Distance: Barry Lavee False

Levenshtein Distance: Barry Lavee False

LongestCommonSubsequence Distance: Barry Lavee False

LongestCommonSubstringDistance: Barry Lavee False

Overlap Coefficient Distance: Barry Lavee False

RatcliffObershelpSimilarity: Barry Lavee False

SorensenDiceDistance: Barry Lavee False

Jaccard Distance: Barry Bave False

Jaro Distance: Barry Bave True

Jaro Winkler Distance: Barry Bave False

Levenshtein Distance: Barry Bave False

LongestCommonSubsequence Distance: Barry Bave False

LongestCommonSubstringDistance: Barry Bave False

Overlap Coefficient Distance: Barry Bave False

RatcliffObershelpSimilarity: Barry Bave False

SorensenDiceDistance: Barry Bave False

Jaccard Distance: Larry xxxxx False

Jaro Distance: Larry xxxxx True

Jaro Winkler Distance: Larry xxxxx False

Levenshtein Distance: Larry xxxxx False

LongestCommonSubsequence Distance: Larry xxxxx False

LongestCommonSubstringDistance: Larry xxxxx False

Overlap Coefficient Distance: Larry xxxxx False

RatcliffObershelpSimilarity: Larry xxxxx False

SorensenDiceDistance: Larry xxxxx False
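To illustrate the Jaro Winkler point above, here is a minimal, self-contained C# sketch of the textbook Jaro and Jaro-Winkler formulas. It is not the library's code: the class and method names (`JaroWinklerSketch`, `JaroSimilarity`, `JaroWinklerSimilarity`, `JaroWinklerDistance`) are hypothetical, and capping the common prefix at 4 characters is an assumption taken from the usual definition rather than from the quoted line. The 0.1 scaling factor matches the quoted snippet.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the library under discussion.
public static class JaroWinklerSketch
{
    // Jaro similarity in [0, 1]: 1 for identical strings, 0 when no characters match.
    public static double JaroSimilarity(string s, string t)
    {
        if (s.Length == 0 && t.Length == 0) return 1.0;
        if (s.Length == 0 || t.Length == 0) return 0.0;

        // Characters only count as matching if they are within this many positions.
        int window = Math.Max(0, Math.Max(s.Length, t.Length) / 2 - 1);
        bool[] sMatched = new bool[s.Length];
        bool[] tMatched = new bool[t.Length];
        int matches = 0;

        for (int i = 0; i < s.Length; i++)
        {
            int lo = Math.Max(0, i - window);
            int hi = Math.Min(t.Length - 1, i + window);
            for (int j = lo; j <= hi; j++)
            {
                if (tMatched[j] || s[i] != t[j]) continue;
                sMatched[i] = true;
                tMatched[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0.0;

        // Walk the matched characters of both strings in order and count
        // the positions where they disagree.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (!sMatched[i]) continue;
            while (!tMatched[k]) k++;
            if (s[i] != t[k]) outOfOrder++;
            k++;
        }

        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / s.Length + m / t.Length + (m - transpositions) / m) / 3.0;
    }

    // Jaro-Winkler similarity: boosts the Jaro score for a shared prefix.
    // Assumption: prefix capped at 4 characters, scaling factor 0.1.
    public static double JaroWinklerSimilarity(string s, string t)
    {
        double jaro = JaroSimilarity(s, t);
        int prefix = 0;
        int limit = Math.Min(4, Math.Min(s.Length, t.Length));
        while (prefix < limit && s[prefix] == t[prefix]) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    // Distance is the complement of similarity: 0 for identical strings.
    public static double JaroWinklerDistance(string s, string t)
    {
        return 1.0 - JaroWinklerSimilarity(s, t);
    }
}
```

With this sketch, "Larry" vs. "Larry" gives a similarity of 1 and a distance of 0, "Larry" vs. "xxxxx" gives a similarity of 0 and a distance of 1, and "Larry" vs. "Larre" gives a similarity of about 0.92 (distance about 0.08). A threshold check written for distances (smaller means closer) would therefore need the `1 - ...` form, which is consistent with the change to line 16 described above.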
