This project is read-only.
3
Vote

LevenshteinDistance Goes to Infinite loop

description

Simple call as following goes to infinite loop
ComparisonMetrics.LevenshteinDistance("ROBERTMELANSON", "ROBERTOMELANSON")

comments

andremilack wrote Nov 26, 2014 at 4:12 PM

I guess it's not an infinite loop actually. The Levenshtein distance is being calculated by its definition, which is extremely inefficient due to recursion and recalculation of same result several times, resulting in an exponential order of complexity. In the Wikipedia page, there's another algorithm to calculate the Levenshtein distance pretty much faster (see "Iterative with two matrix rows"), whose order of complexity is n². I would implement it in C# like this:
public static int LevenshteinDistance(this string source, string target)
{
    int[,] rows = new int[2, source.Length + 1];    // 2D array containing two rows of the matrix
    int current = 0;    // Index of the row that is currently being calculated
    int previous;       // Index of the row that was calculated in the previous iteration
    int i, j = 0;

    // Fills the first row with 0, 1, 2, 3...
    while (j <= source.Length) rows[0, j] = j++;

    // For each subsequent row in the matrix:
    for (i = 1; i <= target.Length; i++)
    {
        current = i % 2;        // Finds the index of the row to be calculated in the current iteration
        previous = current ^ 1; // The row calculated in the previous iteration is the other one
        rows[current, 0] = i;   // First column = 0, 1, 2, 3...

        // For each column of the matrix:
        for (j = 1; j <= source.Length; j++)
        {
            int cost = source[j - 1] == target[i - 1] ? 0 : 1;

            rows[current, j] = Math.Min(Math.Min(
                    rows[previous, j] + 1,
                    rows[current, j - 1] + 1),
                    rows[previous, j - 1] + cost);
        }
    }

    return rows[current, source.Length];    // Returns the element at the last column of the last row
}

lihbea wrote Nov 26, 2014 at 5:08 PM

thanks, andremilack.
actually, i did find a similar one. comparing same two strings takes less than 1 second!

lihbea wrote Dec 15, 2014 at 3:27 PM

BTW, regarding andremilack's comment "I guess it's not an infinite loop actually": It may not be "an infinite loop actually", but it runs 20 minutes without return! I definitely consider that as an infinite loop no matter what.

wrote May 10, 2015 at 5:36 PM

wrote Jul 24, 2015 at 10:19 PM

fdelgadob09 wrote Feb 26, 2016 at 6:41 PM

I agree with lihbea, I ran the library from 8pm to 8am and simply I had no results. But you can correct this problem changing the recursive function to this:
        public static int LevenshteinDistance(this string source, string target)
        {
            if (source.Length == 0) { return target.Length; }
            if (target.Length == 0) { return source.Length; }

            int distance = 0;

            if (source[source.Length - 1] == target[target.Length - 1]) { distance = 0; }
            else { distance = 1; }

            return Math.Min(
                    Math.Min(LevenshteinDistance(source.Substring(0, source.Length - 1), target) + 1,
                             LevenshteinDistance(source, target.Substring(0, target.Length - 1)) + 1),
                    LevenshteinDistance(source.Substring(0, source.Length - 1), target.Substring(0, target.Length - 1)) + distance);
        }
It really works! ;)