Looking over the code, I noticed a bug in operations.cs. ListBigrams (which should be ListBiGrams to be consistent with the others), ListTriGrams, and ListNGrams all seem to be written on the assumption that the parameters to string.Substring() are startIndex and endIndex. However, they are actually startIndex and length. So instead of
           bigrams.Add(source.Substring(i, i + 1));
it should be
           bigrams.Add(source.Substring(i, 2));
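To make the difference concrete, here is a minimal sketch comparing the two calls. The class and method names (SubstringDemo, BuggyBigramAt, FixedBigramAt) are hypothetical and only for illustration; they are not in operations.cs.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper mirroring the buggy call: the second argument is
    // treated as an end index, but Substring interprets it as a length.
    public static string BuggyBigramAt(string source, int i)
    {
        return source.Substring(i, i + 1);
    }

    // Hypothetical helper mirroring the fix: always take exactly 2 characters.
    public static string FixedBigramAt(string source, int i)
    {
        return source.Substring(i, 2);
    }
}
```

With the source "abcd", the buggy version returns "a" at index 0, happens to return the right answer "bc" at index 1, and throws an ArgumentOutOfRangeException at index 2, because startIndex + length (2 + 3) runs past the end of the string. The fixed version returns "ab", "bc", and "cd".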
Actually, if we start by implementing the fix in ListNGrams:
                ngrams.Add(source.Substring(i, n));
Then we can reduce the first two to calls to the third:
    public static List<string> ListBiGrams(this string source)
    { return ListNGrams(source, 2); }

    public static List<string> ListTriGrams(this string source)
    { return ListNGrams(source, 3); }
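Putting it together, the whole thing might look roughly like the sketch below. I am guessing at the parts I have not quoted: the class name NGramSketch, the argument checks, and the loop bound are my assumptions, not the existing code in operations.cs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container class; the real extension methods live in operations.cs.
public static class NGramSketch
{
    public static List<string> ListNGrams(this string source, int n)
    {
        // Assumed guards; the original may handle bad input differently.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));

        var ngrams = new List<string>();
        // Stop once fewer than n characters remain, so Substring never overruns.
        for (int i = 0; i + n <= source.Length; i++)
        {
            ngrams.Add(source.Substring(i, n)); // (startIndex, length)
        }
        return ngrams;
    }

    public static List<string> ListBiGrams(this string source)
    { return ListNGrams(source, 2); }

    public static List<string> ListTriGrams(this string source)
    { return ListNGrams(source, 3); }
}
```

With this sketch, "abcd".ListBiGrams() yields "ab", "bc", "cd", and "abcd".ListTriGrams() yields "abc", "bcd". A source shorter than n simply produces an empty list.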
Closed May 10, 2015 at 3:30 PM by kevinjones


wrote May 10, 2015 at 3:30 PM

Resolved with changeset 95787: Added proposed changes from Work Item 11051.