Under the hood of Skeyer…

Here comes a little insight (and bragging!) into the internals of Skeyer – at least the way it works as of now…

If I give you a grid of keys – a fairly general assumption about most keyboards – and ask you to swipe the word “hey” on it, what would you do? You start at the H key, swipe until you reach the E key, and then swipe until you reach the Y key. If you were to do that on your QWERTY keyboard, the keys you would most probably have pressed are: “HGFTRERTY”. Let’s call this the swipeHint of a word…

If you were to write a program to compute this, it would probably look like:


function swipeHint( word )
{
    if( word.length < 2 )
        return word
    var result = ''
    for( var i = 0; i < word.length - 1; i++ ){
        // this character, plus every key crossed on the way to the next one
        result += word[i] + keys_between( word[i], word[i+1] )
    }
    result += word[word.length - 1]
    return result
}
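
The keys_between helper above is where the keyboard layout comes in. Here is a minimal sketch of one possible implementation, assuming a plain 3-row QWERTY grid with no stagger – it steps one row/column at a time towards the target, so it won’t reproduce the exact “HGFTRERTY” path above, which follows the physical key offsets:

```javascript
var ROWS = [ "qwertyuiop", "asdfghjkl", "zxcvbnm" ]

// Map every key to its (row, column) on the grid
var POS = {}
ROWS.forEach( function( row, r ){
    row.split('').forEach( function( key, c ){ POS[key] = [ r, c ] } )
} )

// The keys crossed when sliding from key a to key b (both exclusive),
// stepping one row/column at a time towards the target
function keys_between( a, b )
{
    var r = POS[a][0], c = POS[a][1]
    var tr = POS[b][0], tc = POS[b][1]
    var result = ''
    while( r !== tr || c !== tc ){
        r += Math.sign( tr - r )
        c += Math.sign( tc - c )
        c = Math.min( c, ROWS[r].length - 1 )  // shorter bottom rows: stay on the last key
        if( r === tr && c === tc )
            break                              // don't include the destination key itself
        result += ROWS[r][c]
    }
    return result
}
```

With this idealized grid, swiping from “h” to “e” crosses only “t” and “r”, so swipeHint(“hey”) would come out as “htrerty” rather than “hgftrerty”.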

Pretty straightforward, isn’t it? Now, the problem Skeyer is trying to solve boils down to a dictionary search for the input pattern. Instead of searching among the words themselves, we search among swipeHint(word) for every word, and then return the words corresponding to the matching swipeHints.

I tried using an existing spelling corrector (hunspell) for this task… But it was of no use. Its main problems were:

  1. It was VERY slow… (Notice the VERY in caps!)
  2. It had no clue about which words are more “probable” than others. Someone told me that I use the word “like” a lot. So if I mistyped that as “loke”, it suggested “luke” instead of “like”.
  3. It was very noisy… It had no clue about the layout of the keyboard, so it ended up suggesting many words totally unrelated to the word you swiped/intended to type. E.g. when I meant “meat” but typed “meay”, it suggested “mean” instead of “meat”.

So I dug around the internet a little and found Norvig’s Spell Checker, which was basically unusable if you want to find words which differ by more than 3 characters. So I wrote a very straightforward matcher myself, something along the lines of:


function findSimilarWords( input, dictionary, count )
{
    var results = []

    for( var i = 0; i < dictionary.length; i++ ){
        var word = dictionary[i]
        // Bayes' theorem: our dictionary can also provide
        // probability(word), based on the user's usage...
        var score = similarity( input, swipeHint(word) )*probability(word)
        if( score > threshold )
            results.push( { word: word, score: score } )
    }
    return results
}

Now the question is: How do we compute similarity( string1, string2 )?

Once again, the internet to the rescue: I found edit distance. Nikita’s blog explains it a lot better.

I implemented an editDistance method whose cost function for substituting two characters returns the distance between those two characters on the keyboard. The only problem was that it was too… slow.
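A sketch of that idea, assuming a plain 3-row QWERTY grid for key positions (the normalization by the keyboard diagonal is my own illustrative choice, not necessarily Skeyer’s): substituting two physically close keys costs less than substituting distant ones.

```javascript
var ROWS = [ "qwertyuiop", "asdfghjkl", "zxcvbnm" ]
var POS = {}
ROWS.forEach( function( row, r ){
    row.split('').forEach( function( key, c ){ POS[key] = [ r, c ] } )
} )

// Euclidean distance between two keys on the grid
function keyDistance( a, b )
{
    var p = POS[a], q = POS[b]
    return Math.hypot( p[0] - q[0], p[1] - q[1] )
}

// Normalize substitution costs to [0, 1] using (roughly) the keyboard diagonal
var MAX_KEY_DISTANCE = keyDistance( 'q', 'm' )

// Classic O(n*m) dynamic programming edit distance, except that the
// substitution cost is the physical distance between the two keys
function editDistance( s, t )
{
    var n = s.length, m = t.length
    var d = []
    for( var i = 0; i <= n; i++ ){
        d[i] = [ i ]                       // deleting i characters costs i
        for( var j = 1; j <= m; j++ )
            d[i][j] = ( i === 0 ) ? j : 0  // inserting j characters costs j
    }
    for( var i = 1; i <= n; i++ ){
        for( var j = 1; j <= m; j++ ){
            var sub = ( s[i-1] === t[j-1] ) ? 0
                    : keyDistance( s[i-1], t[j-1] )/MAX_KEY_DISTANCE
            d[i][j] = Math.min( d[i-1][j] + 1,       // deletion
                                d[i][j-1] + 1,       // insertion
                                d[i-1][j-1] + sub )  // substitution
        }
    }
    return d[n][m]
}
```

With this cost function, “meay” ends up much closer to “meat” (T and Y are adjacent) than to “mean” – exactly the hunspell failure mode described above.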

The dynamic programming approach to it is an O(n*m) algorithm, where n and m are the lengths of the patterns we are trying to compare.
To put things into perspective, my dictionary has 150,000 words, with an average swipeHint length of 30. Even if I were to search only 10% of it (you know… words starting with the same character as the first character of the input), that would be: 15,000*30*30 = 13,500,000 operations, which is definitely waaay too much for your little phone.

Now a series of neat little tricks to make this work:

  1. Since we are checking against a threshold, as per Nikita’s blog, I stopped computing the similarity after finding the threshold number of differences. So similarity(string1, string2) now became O(k*min(m,n)). If we are looking for words differing by no more than 50%, then for an average length of 30, k would be 15, and the total number of operations would be 6,750,000. That’s still waaay too much.
  2. We are interested in finding no more than 5 words (even that’s too much if you ask me, but let’s keep it at 5 for now). So if you have already found 5 words whose score is 0.8, there is no point computing the similarity of the input with a word whose probability of occurrence is only 50%, as score = probability*similarity can never beat 0.8. So skip computing the edit distance for a word if its probability < minScore(results). For the word “beautiful”, whose swipeHint is “bvgfdrewsasdfghyuyttyuiuhgfgyuiol”, this brings the cost down to around 2,700,000.
  3. As we already know the minimum score a word needs to enter the results, why still keep k at 50% of the input length? Why not less? We are computing the edit distance for no more than k differences anyway:

    // we know that
    similarity = 1.0 - editDistance( string1, string2, k )/maxLength( string1, string2 )

    // and that
    score = similarity( input, word, k )*probability( word )

    // as we want score > the results' current minScore
    k = min( k, maxLength( input, word )*( 1.0 - results.minScore()/probability( word ) ) )

    This brings the cost of “beautiful” down to ~1,700,000 operations (900 msec)…

  4. Okay, so even that’s too expensive. On my 8-year-old laptop with a Core 2 Duo processor, that gives me results in about 900 msec… and I assume it would take about the same on your Jolla/Ubuntu Touch devices. You don’t want to swipe 1 word per second, do you? If the lengths of two words differ by more than k, their edit distance is obviously more than k. So skip all the words whose lengths differ from the input’s by more than k. As it turns out this helps, but not by much. “beautiful” now costs ~1,500,000 operations (700 msec).
  5. I tried looking at other distance metrics for quickly estimating the similarity of two strings. The first nice one was Hamming distance ( http://en.wikipedia.org/wiki/Hamming_distance ), but sadly it only applies to strings of equal length. So I tried computing the Hamming distance of the “signatures” of two strings. I.e. if a string A has the letters “C”, “A”, “T”, its
    signature = 1 << index(C) | 1 << index(A) | 1 << index(T)
    As it turns out, this absolutely sucks as a measure of similarity for two normal strings. But the insight it gave me was that if the counts of any letter in the two strings differ by more than k, then their editDistance is obviously more than k, so don’t bother computing the edit distance for them… and this check can be done in O(n).
    “beautiful” now costs about 132,370 operations or ~101 msec, with a lazy implementation of this, which isn’t too bad for a brute-force matcher, don’t you think? From 5 secs to 101 msec without too much effort 😀
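
Tricks 1, 4 and 5 above can be sketched roughly as follows (with plain unit costs for brevity, and function names of my own – not Skeyer’s actual code):

```javascript
// Trick 5: an O(n) lower bound on the edit distance from per-letter counts.
// Each edit operation changes at most two letters' counts by one each,
// so editDistance >= (sum of absolute count differences)/2.
function countLowerBound( s, t )
{
    var counts = {}
    for( var i = 0; i < s.length; i++ )
        counts[s[i]] = ( counts[s[i]] || 0 ) + 1
    for( var j = 0; j < t.length; j++ )
        counts[t[j]] = ( counts[t[j]] || 0 ) - 1
    var total = 0
    for( var key in counts )
        total += Math.abs( counts[key] )
    return Math.ceil( total/2 )
}

// Tricks 1 and 4: give up as soon as the distance is known to exceed k,
// and only fill the DP cells within k of the diagonal - O(k*min(m,n)).
function boundedEditDistance( s, t, k )
{
    if( Math.abs( s.length - t.length ) > k )  // trick 4: length filter
        return Infinity
    if( countLowerBound( s, t ) > k )          // trick 5: letter-count filter
        return Infinity
    var m = t.length
    var prev = []
    for( var j = 0; j <= m; j++ )
        prev[j] = j
    for( var i = 1; i <= s.length; i++ ){
        var curr = []
        for( var j = 0; j <= m; j++ )
            curr[j] = Infinity
        curr[0] = i
        var lo = Math.max( 1, i - k ), hi = Math.min( m, i + k )
        for( var j = lo; j <= hi; j++ ){
            var sub = ( s[i-1] === t[j-1] ) ? 0 : 1
            curr[j] = Math.min( prev[j] + 1, curr[j-1] + 1, prev[j-1] + sub )
        }
        if( Math.min.apply( null, curr ) > k )  // trick 1: already over budget
            return Infinity
        prev = curr
    }
    return ( prev[m] <= k ) ? prev[m] : Infinity
}
```

Plugging the keyboard-distance substitution cost described earlier in place of the unit cost gives roughly the shape of the final matcher: most words are rejected by the two O(n) filters before any dynamic programming runs at all.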


Now, I’m getting back to some more… less interesting tasks from my Todo list for Skeyer… Trello

Btw, I am still looking for a job. So if you know anyone who is hiring for a role that you think suits me ( My Resume ), please let me know/let them know 🙂


7 thoughts on “Under the hood of Skeyer…”

    1. Hey,

      That sounds roughly similar to one of the very first edit distance methods I wrote (before all these optimizations), where the edit distance itself assigned costs to deletions and substitutions using rules like
      if( keys_between( destination[j], destination[j+1] ).contains( source[i] ) ){ substitutionCost = 0; deletionCost = 0; }
      It was very slow because of the string.contains( character ) operation – which was O(n). I thought I could get away with using hunspell or at least something based on tries, so I didn’t investigate it further. Next week I am planning on experimenting with various curvy swipeHints; maybe I should try this again with bloom filters or something. Thanks for the tip 🙂

      Another thing I did try was computing the similarity between the curve a user draws and the curve one needs to draw for a given word ( as described in http://stackoverflow.com/questions/6418042/how-do-i-calculate-the-difference-between-two-sequences-of-points ). This was a lot faster – O(n) – but the precision was abysmal and it was even harder to debug. I think I mainly gave up on it because:
      1) The noise in the curves I’ve traced was a lot harder to eliminate – sharp edges – what do they mean? There are too many ways to draw a curve…
      2) It was also harder to deal with repeating characters, and with users deliberately drawing weirder curves to skip a key and get different suggestions: think of swiping “poiuyt” for “pot” and “poijhyt” for “pit” instead of “put”. So dealing with strings seemed more intuitive, because it gives us feedback on all the characters we’ve inputted.

  1. Hello.

    Thanks for your hard work on Skeyer. One question about FOSS – would you consider making Skeyer available to Android users via F-Droid?

    Would be much appreciated if you consider it.
    Regards.

    1. Hey, I am afraid Android is a bit beyond the current scope for me, at least at the moment.

      On the bright side, I just started refactoring Skeyer into a separate libskeyer ( which the Ubuntu Touch folks wanted to try out for their spelling correction ), so if you want to develop the Android port of it, I would be more than willing to help 🙂

      Skeyer already reads from Android’s dictionary files, and here is a *very* unoptimized demo of the Skeyer engine I was testing on Android phones, if you are interested:

      https://www.dropbox.com/s/t8mw09rrxoqx9l1/skeyer-demo.apk?dl=0

  2. Have you considered scanning the other way around?

    – Currently, you use *every single letter* on the keys under the gesture, and start with the logic:
    “we have all the letters that form a word here, plus some extra”
    (i.e.: you need to *remove* some combination of letters from the hint to find an existing word)

    – You could do the opposite: use every *stop position* (i.e.: every position on the curve where the direction changes a lot). The hint of “hello” becomes “helo”.
    The logic is now reversed: “the word we’re looking for has these stop letters, plus some thrown in the middle” (those that were on the trajectory).
    The advantage is that you can represent the hint as a regular expression (^h.*e.*l.*o$), which can be pretty fast on the dictionary (especially if you use some non-backtracking trie-based approach, or even more so if you make your own simplified search).
    Then you’ll only need to compute scores/assess probability of the few filtered candidates.

    You won’t get as many words as with the current approach, but because of the speed it might be possible to run this as a first pass (does this return words frequent enough?) and fall back to the slower method in case more/more frequent words are needed. (i.e.: “^m.*e.*a.*y$” is only going to return either highly unlikely words or words which don’t even match the full path. Time to do a more exhaustive search of the dictionary.)

    You can see it as a more advanced form of your naive approach. Instead of filtering down to all words which begin with the same letter (like your 10% explanation above), you use an even narrower pre-filter (first letter and last letter the same, and the middle should also contain the letters where the swiping made a turn, defined as an abrupt change in the direction of the curve, in the order in which the turns happened).
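
    For instance, the “helo” stop-letter hint could be compiled into such a pre-filter like this (a minimal illustrative sketch – the names are mine, and regex-metacharacter escaping and scoring are omitted):

```javascript
// Build a regular expression from the stop letters: "helo" -> /^h.*e.*l.*o$/
function stopHintRegex( hint )
{
    return new RegExp( '^' + hint.split('').join('.*') + '$' )
}

// First pass: keep only the dictionary words matching the stop letters
function prefilter( hint, dictionary )
{
    var re = stopHintRegex( hint )
    return dictionary.filter( function( word ){ return re.test( word ) } )
}
```

    On “^h.*e.*l.*o$”, “hello” survives while “hypo” is filtered out, so only the survivors need a full similarity score.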

  3. Hey DrYak,

    Sorry for the delayed reply. Have been very busy ( and my nice and long earlier reply to you got lost when the browser crashed ).

    First and foremost, thanks a ton for “Instead of filtering down to all words which begin with the same letter (like your 10% explanation above), you use an even narrower pre-filter (first letter and last letter the same…)”. Now I am also checking whether the last character of the input is a neighbour of the last character of the word, and it brought the worst-case time for the top 1100 words down from 75 ms to 25 ms ( by not having to compute swipeHints, editDistance etc. ). The average lookup time didn’t change much though, but I think this can be sped up a little further. I am surprised I missed this one!

    Now the details of why I had to go down this path:
    I first tried matching the curve drawn by the user against the curve needed to draw a word. I tried various interpolations and whatnot, but it absolutely sucked. Possible reasons: the input was too “jittery” ( I tried smoothing things with averaging, but that didn’t work well either ), and debugging was a nightmare, because visually we look at characters while swiping and don’t care much about the path.

    The reason I wanted to try it without analyzing the curve for “stop points” is that doing so simply added unnecessary complexity. Many things affect the “amount of change” at a key, and keeping track of them all seemed complex: the curve changes more at some keys than at others (middle vs. corner keys, for instance), the speed of swiping, etc. And things like “draw a loop to repeat a character” became harder to implement. Also, http://git.tuxfamily.org/okboard/ was already implementing it that way, by counting a lot of parameters, so I chose to explore it without analyzing the curve much. The code is a lot simpler, and seems to work quite well too.

    I also tried implementing tries – But the problem is that there were simply too many nodes and the thing hogged memory like anything. ( WordTrie = 404644 nodes, SwipeHintTrie = 1125085 nodes => 100+ MB for just the engine).

    I do want to come back to tries after releasing some basic version though. Maybe using PATRICIA trie/succinct data structures. But all that only after I figure out the issue I am currently stuck at – how to add new words, and how to update the dictionary based on what words user types, so that it goes well with the android dictionary I am using.

    Btw, “we have all the letters that form a word here, plus some extra” is not necessarily true. The user might miss some needed characters, might swipe over a character’s neighbour, and so on. Hence I had to use all 3 operations of the Levenshtein distance ( insert, delete, substitute ) to compute the distance.

    Feel free to check out the code and send me a pull request,
    Thanks a ton 🙂

    1. DrYak says:

      > I tried various interpolations and what not, but it absolutely sucked.

      One very simple rule:

      – find all the keys where the path entered and exited on the same side. Chances are the finger made a turn while in the key.
      – use this key as a 3rd criterion, in addition to the first and last letters, to weed out possibilities.

      (i.e.: depending on how the user traced the path, when swiping “Hello”, the sequence “RER” might appear in the middle of the path “HgtRERtzhj…”. You know that the user wanted to trace something that not only has an “H” at the beginning and an “O” at the end, but probably also has an “E” somewhere in the middle. Thus you can exclude “hypo” from the list – or only consider it on a second pass, when none of the “H*E*O” words match).

      > Feel free to check out the code and send me a pull request,

      Actually, I *am* interested in playing with gesture-based inputs. But I’m more interested in trying to find a way to resuscitate old-school PalmOS’s Graffiti 1 handwriting:

      – It’s a tiny bit more costly in terms of gestures:
      — it’s an alphabet, so basically 1 hand gesture per letter. “HELLO” is five signs.
      — Whereas Swype links letters together, so in the worst case it’s “letters – 1”. “HI” is one gesture. “HAD” is 2 gestures.
      — And the best case can be fewer gestures if several letters are on the same path: “HELLO” is only three gestures, as both “L”s are on the same trajectory.

      – But it requires zero aiming. It’s all relative motions. It could be done without even looking at the screen. Fitts’s law rules!
