fuzzystring Discussions RSS Feed: https://fuzzystring.codeplex.com/discussions

New Post: Comparing Names (https://fuzzystring.codeplex.com/discussions/662719)
<div style="line-height: normal;">So I glanced through each of the links for the matching algorithms, and they were all so complicated. Can someone tell me which would be best for comparing names that may have been misspelled? Each name would consist of a first name, a space, and a last name, e.g. "Mike Jones". I would like to return the top (maybe 10) matching names, possibly giving preference to names that match on the first letter; I saw one of the algorithms had that option.<br />
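One common starting point for this kind of question is plain Levenshtein distance over the whole "First Last" string, with the first-letter preference applied only as a tie-breaker. The sketch below is an illustration, not FuzzyString's API: NameMatchSketch, Levenshtein and TopMatches are hypothetical names, and comparing names case-insensitively is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameMatchSketch
{
    // Two-row Levenshtein distance, compared case-insensitively (an assumption for names).
    public static int Levenshtein(string a, string b)
    {
        a = a.ToUpperInvariant();
        b = b.ToUpperInvariant();
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j; // distance from the empty prefix of a

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(Math.Min(previous[j] + 1,     // deletion
                                               current[j - 1] + 1), // insertion
                                      previous[j - 1] + cost);      // substitution or match
            }
            var swap = previous; // the row just filled becomes the previous row
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }

    // Hypothetical helper: the `count` closest candidates by edit distance; on equal distance,
    // candidates whose first letter matches the query's first letter are ranked first.
    public static List<string> TopMatches(string query, IEnumerable<string> candidates, int count = 10)
    {
        char first = query.Length > 0 ? char.ToUpperInvariant(query[0]) : '\0';
        return candidates
            .OrderBy(c => Levenshtein(query, c))
            .ThenBy(c => c.Length > 0 && char.ToUpperInvariant(c[0]) == first ? 0 : 1)
            .Take(count)
            .ToList();
    }
}
```

For example, TopMatches("Mike Jones", new[] { "Like Jones", "Mike Jonas" }) puts "Mike Jonas" first: both are one edit away, but only it starts with M. LINQ's OrderBy and ThenBy are stable, so candidates that tie on both keys keep their input order.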
</div>
dmikester1, Wed, 26 Apr 2017 21:09:22 GMT

New Post: Minor Mod To ApproximatelyEquals (https://fuzzystring.codeplex.com/discussions/662609)
<div style="line-height: normal;">In the course of things, I ended up using LongestCommonSubstring with an empty string as one of the arguments.
<br />
<br />
The routine returned null, which promptly blew up when the result was added to the comparison results in ApproximatelyEquals (line 66 in my version).
<br />
<br />
I have changed my version to handle this more gracefully:<br />
<pre><code>comparisonResults.Add(1 - Convert.ToDouble((source.LongestCommonSubstring(target)?.Length ?? 1) / Convert.ToDouble(Math.Min(source.Length, target.Length))));
</code></pre>
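One thing to watch with the line above: when the null comes from an empty argument, Math.Min(source.Length, target.Length) is 0, so the ?? 1 fallback evaluates 1 / 0.0 (positive infinity) and the value added is negative infinity rather than an exception. Below is a minimal sketch of a guarded version, assuming the term should stay in the range 0 to 1. LcsGuardSketch and LcsDistance are hypothetical names; the local LongestCommonSubstring is a stand-in that returns null for empty input as described in this post, not the library's own code; and the values chosen for empty strings (0 when both are empty, 1 when only one is) are an assumption.

```csharp
using System;

public static class LcsGuardSketch
{
    // Hypothetical stand-in for the library's LongestCommonSubstring: returns null when
    // either argument is empty, otherwise the longest common substring ("" if none).
    public static string LongestCommonSubstring(string source, string target)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(target)) return null;
        // lengths[i, j] = length of the common suffix of the first i chars of source
        // and the first j chars of target
        var lengths = new int[source.Length + 1, target.Length + 1];
        int bestLength = 0, bestEnd = 0;
        for (int i = 1; i <= source.Length; i++)
        {
            for (int j = 1; j <= target.Length; j++)
            {
                if (source[i - 1] != target[j - 1]) continue; // lengths[i, j] stays 0
                lengths[i, j] = lengths[i - 1, j - 1] + 1;
                if (lengths[i, j] > bestLength)
                {
                    bestLength = lengths[i, j];
                    bestEnd = i;
                }
            }
        }
        return source.Substring(bestEnd - bestLength, bestLength);
    }

    // Guarded version of the ApproximatelyEquals term: 1 - (common length / shorter length).
    public static double LcsDistance(string source, string target)
    {
        int minLength = Math.Min(source.Length, target.Length);
        // Assumption: two empty strings are identical (0); one empty string is maximally different (1).
        if (minLength == 0) return source.Length == target.Length ? 0.0 : 1.0;
        int common = LongestCommonSubstring(source, target)?.Length ?? 0;
        return 1.0 - (double)common / minLength;
    }
}
```

With the guard in place, a null result is treated as "no common substring" (length 0), and the division is only reached when both strings are non-empty.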
</div>
RMCX, Wed, 19 Apr 2017 15:51:24 GMT

New Post: The LevenshteinDistance is not working (neither is Jaro, which uses the same algorithm) (https://fuzzystring.codeplex.com/discussions/649568)
<div style="line-height: normal;"><pre><code> return Math.Min(Math.Min(LevenshteinDistance(source.Substring(0, source.Length - 1), target) + 1,
LevenshteinDistance(source, target.Substring(0, target.Length - 1))) + 1,
LevenshteinDistance(source.Substring(0, source.Length - 1), target.Substring(0, target.Length - 1)) + distance);
</code></pre>
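...will overflow the stack. It also makes three recursive calls per step without caching any results, so its running time grows exponentially with the string lengths. On top of that, the parentheses are misplaced: the second <code>+ 1</code> is applied to the result of the inner <code>Math.Min</code>, so dropping the last character of <code>source</code> is charged 2 instead of 1.
<br />
<br />
The update posted on Oct 21, 2015 (in the "Levenshtein Distance speedup" thread) fixes it.<br />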
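For comparison, here is a minimal sketch of the same recurrence still written recursively, but with each + 1 attached to its own Math.Min argument and with results cached per (i, j) prefix pair, which brings the work down to O(n*m). It is an illustration only: RecursiveLevenshteinSketch, Distance and Solve are hypothetical names, and it still recurses up to source.Length + target.Length levels deep, so the iterative two-row version remains the better choice for long strings.

```csharp
using System;

public static class RecursiveLevenshteinSketch
{
    public static int Distance(string source, string target)
    {
        // memo[i, j] caches the distance between the first i chars of source and the
        // first j chars of target; -1 means "not computed yet".
        var memo = new int[source.Length + 1, target.Length + 1];
        for (int i = 0; i <= source.Length; i++)
            for (int j = 0; j <= target.Length; j++)
                memo[i, j] = -1;
        return Solve(source, target, source.Length, target.Length, memo);
    }

    private static int Solve(string s, string t, int i, int j, int[,] memo)
    {
        if (i == 0) return j; // insert the remaining j chars
        if (j == 0) return i; // delete the remaining i chars
        if (memo[i, j] >= 0) return memo[i, j];

        int cost = s[i - 1] == t[j - 1] ? 0 : 1;
        // Each "+ 1" belongs to its own argument: deletion, insertion, then substitution/match.
        int result = Math.Min(
            Math.Min(Solve(s, t, i - 1, j, memo) + 1,
                     Solve(s, t, i, j - 1, memo) + 1),
            Solve(s, t, i - 1, j - 1, memo) + cost);
        memo[i, j] = result;
        return result;
    }
}
```

Working with prefix lengths instead of Substring also avoids allocating a new string on every call.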
</div>
koolinoor, Mon, 04 Jan 2016 14:38:41 GMT

New Post: Levenshtein Distance speedup (https://fuzzystring.codeplex.com/discussions/646462)
<div style="line-height: normal;">I've used this library to sort and filter search results; however, searches for more than 5 characters took several seconds.
<br />
Since most of the algorithms here rely on the Levenshtein distance, I replaced the existing implementation with the reduced-memory version from here: <a href="https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C.23" rel="nofollow">https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C.23</a>
<br />
<br />
(Just replace the body of the existing function; everything else then works fine.)
<br />
<br />
With this implementation the search returns instantly.<br />
<div style="color:Black;background-color:White;"><pre>
<span style="color:Blue;">public</span> <span style="color:Blue;">int</span> LevenshteinDistance(<span style="color:Blue;">string</span> source, <span style="color:Blue;">string</span> target){
    <span style="color:Green;">// An empty string is as far from the other string as that string's length</span>
    <span style="color:Blue;">if</span>(String.IsNullOrEmpty(source)){
        <span style="color:Blue;">if</span>(String.IsNullOrEmpty(target)) <span style="color:Blue;">return</span> 0;
        <span style="color:Blue;">return</span> target.Length;
    }
    <span style="color:Blue;">if</span>(String.IsNullOrEmpty(target)) <span style="color:Blue;">return</span> source.Length;
    <span style="color:Green;">// Swap so that source is the shorter string (it drives the outer loop)</span>
    <span style="color:Blue;">if</span>(source.Length > target.Length){
        <span style="color:Blue;">var</span> temp = target;
        target = source;
        source = temp;
    }
    <span style="color:Blue;">var</span> m = target.Length;
    <span style="color:Blue;">var</span> n = source.Length;
    <span style="color:Green;">// Keep only two rows of the (n + 1) x (m + 1) matrix</span>
    <span style="color:Blue;">var</span> distance = <span style="color:Blue;">new</span> <span style="color:Blue;">int</span>[2, m + 1];
    <span style="color:Green;">// Initialize the distance 'matrix'</span>
    <span style="color:Blue;">for</span>(<span style="color:Blue;">var</span> j = 1; j <= m; j++) distance[0, j] = j;
    <span style="color:Blue;">var</span> currentRow = 0;
    <span style="color:Blue;">for</span>(<span style="color:Blue;">var</span> i = 1; i <= n; ++i){
        <span style="color:Green;">// Rows alternate: odd i writes row 1, even i writes row 0</span>
        currentRow = i & 1;
        distance[currentRow, 0] = i;
        <span style="color:Blue;">var</span> previousRow = currentRow ^ 1;
        <span style="color:Blue;">for</span>(<span style="color:Blue;">var</span> j = 1; j <= m; j++){
            <span style="color:Blue;">var</span> cost = (target[j - 1] == source[i - 1] ? 0 : 1);
            <span style="color:Green;">// Cheapest of deletion, insertion and substitution (or match)</span>
            distance[currentRow, j] = Math.Min(Math.Min(
                distance[previousRow, j] + 1,
                distance[currentRow, j - 1] + 1),
                distance[previousRow, j - 1] + cost);
        }
    }
    <span style="color:Blue;">return</span> distance[currentRow, m];
}
</pre></div></div>
MarvinPohl, Wed, 21 Oct 2015 10:41:16 GMT

New Post: fuzzy numbers (https://fuzzystring.codeplex.com/discussions/566956)
<div style="line-height: normal;">Without knowing what you are trying to achieve, here is some general advice:
<br />
<br />
<a href="http://en.wikipedia.org/wiki/String_metric" rel="nofollow">http://en.wikipedia.org/wiki/String_metric</a>
<br />
<br />
<a href="http://en.wikipedia.org/wiki/Edit_distance" rel="nofollow">http://en.wikipedia.org/wiki/Edit_distance</a>
<br />
<br />
<a href="http://en.wikipedia.org/wiki/Levenshtein_distance" rel="nofollow">http://en.wikipedia.org/wiki/Levenshtein_distance</a> (see section 3 in particular)
<br />
<br />
Taking your second example, {123456789, 123465789}: the strings differ by one transposition (the 5 and 6 are swapped), so if you compared them using the Damerau-Levenshtein distance, the edit distance would be 1. With Levenshtein, the edit distance would be 2 (two substitutions). Damerau-Levenshtein is better at finding numbers that have digits swapped around, while Levenshtein is good at finding numbers that differ in only n digits. Longest Common Substring would help find numbers with the largest matching sections (which may be useful for phone numbers, for example).<br />
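These figures can be checked with a small C# sketch. It computes the restricted Damerau-Levenshtein distance (also called optimal string alignment), which counts a swap of two adjacent characters as one edit. It is not the unrestricted Damerau-Levenshtein algorithm, but the two agree on a single adjacent swap such as 56 vs 65. Passing allowTransposition: false turns it into plain Levenshtein. DigitDistanceSketch and EditDistance are hypothetical names, not part of FuzzyString.

```csharp
using System;

public static class DigitDistanceSketch
{
    // Restricted Damerau-Levenshtein (optimal string alignment) distance.
    // With allowTransposition == false this is plain Levenshtein distance.
    public static int EditDistance(string a, string b, bool allowTransposition = true)
    {
        // d[i, j] = edits needed to turn the first i chars of a into the first j chars of b
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,  // deletion
                                            d[i, j - 1] + 1), // insertion
                                   d[i - 1, j - 1] + cost);   // substitution or match
                // Two adjacent characters swapped (e.g. "56" vs "65") count as one edit.
                if (allowTransposition && i > 1 && j > 1 &&
                    a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
                }
            }
        }
        return d[a.Length, b.Length];
    }
}
```

For the pairs in the question, 123456789 vs 123465789 gives 1 with transpositions and 2 without, while the other three pairs (one substituted digit, or one missing digit) give 1 either way.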
</div>
jumpjack96, Tue, 24 Feb 2015 14:47:56 GMT

New Post: fuzzy numbers (https://fuzzystring.codeplex.com/discussions/566956)
<div style="line-height: normal;">Which algorithm is best suited for comparing numbers like these?<br />
<pre><code>e.g. 123456789
123456189
or
123456789
123465789
or
123456789
128456789
or
123456789
12345678
etc.</code></pre>
</div>
paultait, Wed, 10 Sep 2014 06:39:55 GMT

New Post: Overlap Coefficient (http://fuzzystring.codeplex.com/discussions/546851)
<div style="line-height: normal;">An update to this.<br />
<br />
Looking at your code:<br />
<br />
Intersect returns only the DISTINCT characters in a string. For example, given the following code:<br />
<br />
string sStringA = "CHRIS KNOWLES";<br />
string sStringB = "CHRIS KNOWLES";<br />
<br />
sStringA.Intersect(sStringB) yields<br />
'C', 'H', 'R', 'I', 'S', ' ', 'K', 'N', 'O', 'W', 'L', 'E'<br />
so<br />
sStringA.Intersect(sStringB).Count() returns 12, while the MIN of the two source string lengths is 13.<br />
<br />
So the code never reaches the subset-containment rule: the intersection count can only equal the shorter string's length if every character in that string is distinct (and also appears in the other string), which is highly unlikely.<br />
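Here is a minimal sketch of one way to get the subset rule back, assuming the coefficient is meant to be computed over sets of characters: take distinct characters on both sides, so the denominator is the smaller set size rather than the smaller raw string length. OverlapSketch and OverlapCoefficient are hypothetical names, this is not the library's code, and returning 0 for an empty string is an assumption (the formula is undefined there). A multiset version that counts repeated characters would be another valid reading.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapSketch
{
    // Overlap coefficient over the SETS of characters: |A intersect B| / min(|A|, |B|).
    // Numerator and denominator both count distinct characters, so when one string's
    // characters are all contained in the other's, the result is exactly 1.
    public static double OverlapCoefficient(string source, string target)
    {
        var a = new HashSet<char>(source);
        var b = new HashSet<char>(target);
        int smaller = Math.Min(a.Count, b.Count);
        if (smaller == 0) return 0.0; // assumption: undefined for empty input, reported as no overlap
        a.IntersectWith(b);           // a now holds only the shared distinct characters
        return (double)a.Count / smaller;
    }
}
```

With this version, "CHRIS KNOWLES" compared with itself gives 12 / 12 = 1, and "CHRIS" compared with "CHRIS KNOWLES" gives 5 / 5 = 1, as the subset rule requires.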
</div>
ChrisKnowles, Wed, 28 May 2014 22:17:18 GMT

New Post: Overlap Coefficient (http://fuzzystring.codeplex.com/discussions/546851)
<div style="line-height: normal;">According to the Wikipedia link you gave for the Overlap Coefficient, if either set is a complete subset of the other, the coefficient should be 1. That does not appear to be reflected in this code.
<br />
<br />
Thanks.<br />
</div>
ChrisKnowles, Wed, 28 May 2014 19:01:28 GMT