Skeyer gets a Jenkins server

Since my primary laptop was under repair, here is a quick update for the last couple of weeks:

I wrote a bunch of tests and benchmarks for Skeyer and set up a Jenkins server to run them for me on every git push. Since I couldn’t get the BitBucket’s Jenkins hook to work, I manually added a POST hook like this:

http://<username>:<user authentication key>@<jenkins server address>/job/<project name>/build?token=<build token>

I even picked up a few WebDev skills to write a shiny UI for the Test Reports, generated from QTestlib’s xml files (I tried looking for the existing tools, but none of them seem to be working):


Yep. I actually manually created the test cases for the top 1000 words in the english Dictionary.
I probably should clean this code up and release a QTestlib Reports tool. Later on….



By the end of this week, I hope to stop working on the “Engine” part. So next week I can actually fix the Maliit plugin to make it ready for an alpha build.


Cheers! :)

Skeyer gets a Jenkins server

Under the hood of Skeyer…

Here comes a little insight (and bragging!) into the internals of Skeyer – at least the way it works as of now…

If i give you a grid of keys – a fairly general assumption about most keyboards, and ask you to swipe the word “hey” on it, what would you do? You start at the H key then swipe till you reach the E key and then swipe till you reach the Y key. If you were to do that on your QWERTY keyboard, the keys you would have most probably pressed are : “HGFTRERTY”. Lets call this swipeHint of a word…

If you were to write a program to compute this, it would probably look like:

function swipeHint( word )
    if(word.length < 2)
        return word
    var result = ''
    for( var i = 0; i < word.length - 1; i++ ){
        result += word[i] + keys_between( word[i], word[i+1] )
    result += word[word.length-1]
    return result

Pretty straight forward isn’t it? Now, the problem Skeyer was trying to solve boils down to that of a dictionary search for the input pattern. Instead of searching for words, we search for swipeHint(words), and then return the words related to the swipeHints.

I tried using an existing spelling corrector (hunspell) for this task… But it was of no use. Its main problems were:

  1. It was VERY slow… (Notice the VERY in caps!)
  2. It had no clue about what words are more “probable” than others.. Some one told me that I use the word “like” a lot. So if i mistype that as “loke”, it was suggesting me “luke” instead of “like”.
  3. It was very noisy… It had no clue about the layout of the keyboard, so it ended up suggesting many words which are totally unrelated to the word you swiped/intended to type. for eg. if i mistyped “meat” and typed “meay”, it was suggesting me “mean” instead of “meat”.

So I dug around on the internet a little and found Norvig’s Spell Checker, which was basically unusuable if you want to find words which differ by more than 3 characters. So, I wrote a very straightforward implementation of such a matcher myself, something along the lines of:

function findSimilarWords( input, dictionary, count )
    var results = []

    for(var word in dictionary){
        // Bayes Theorem
        // Our dictionary can also provide probabilit(word), based on the user's usage...
        var score = similarity( input, swipeHint(word) )*probability(word)
        if( score > threshold )
            results.add(word, score)
    return results

Now the question is: How do we compute similarity( string1, string2 )?

Once again, the internet to the rescue and I found: Edit distanceNikita’s blog explains it a lot better.

I implemented an editDistance method whose cost function for substitution of two characters returns the distance between the characters you are trying to substitute on the keyboard. The only problem was that it was too.. slow.

The dynamic programming approach for it was an O(n*m) algorithm, where n, m are the lengths of patterns we are trying to compare.
To put things into perspective, my dictionary has 150,000 words. And an average word length of 30. Even if i were to search 10% of it, (you know… words starting wtih the same character as the first character of the input), that would be: 15000*30*30 = 13,500,000 operations, which is definitely waaay too much for your little phone.

Now a series of neat little tricks to make this work:

  1. Since we are checking for a threshold, as per Nikita’s blog, I stopped computing similarity after finding the threshold number of differences. So the similarity(string1, string2), now became O(k*min(m,n)). If we are looking for words not differing by more than 50%, for an average word length of 30, k would be 15, and the total number of operations would be 6750000. That’s still waaay too much.
  2. We are interested in finding no more than 5 words (Even that’s too much if you ask me, but lets keep that at 5 for now). So If you already found 5 words whose score is 0.8, there is no point computing the similarity of the input with words whose probability of occurance is 50% as score = probability*similarity.So skip computing the edit distance for a word, if it’s probability < minScore(result). For the word “beautiful”, whose swipeHint is “bvgfdrewsasdfghyuyttyuiuhgfgyuiol”, this brings down the cost to around 2700000.
  3. As we already know what the minimum score our word needs to enter the results, why still keep k at 50% of the input length? why not less? we are computing edit distance for not more than k differences anyway

    // we know that
    similarity = 1.0 - editDistance( string1, string2, k )/maxLength(string1,string2)

    // and that
    score = similarity(input, word, k)*probability(word)

    // As we want a score > results' minScore
    k = min( k, ( maxLength(input, word)*( 1.0 - results.minScore()/probability(word) ) ) )

    This brings down the cost of “beautiful” to ~1700000 operations (900msec) …

  4. Okay, so even that’s too expensive. On my 8 year old laptop with Core 2 Duo processor, that gives me results in about 900msec… and I assume it would take the same amount of time on your Jolla/Ubuntu touch devices…You don’t want to swipe 1 word per second do you? If the word lengths differ by more than k, it is obvious their edit distance would be more than k. So skip all the words whose lengths differ by more than k. As it turns out this helps, but not so much.“beautiful” now costs ~1500000 operations (700msec).
  5. I tried looking at other distance metrics for quickly estimating the similarity of two strings. The first nice one was Hamming Distance ( ), but sadly it only applies to strings of equal length. So I tried computing Hamming Distance for the “signature” of two strings. i.e, if the a string A has the alphabets “C”, “A”, “T”, it’s
    signature = 1 << index(C) | 1 << index(A) | 1 << index(T)
    As it turns out this absolutely sucks as a measure of similarity for two normal strings. But the insight this gave me was that if the number of characters of each alphabet in the two strings differs by more than k, then their editDistance would obviously be more than k, so don’t bother computing edit distance for them… this can be done in O(n).
    “beautiful” now costs about about 132370 operations or ~101msec, with a lazy implementation of this, which isn’t too bad for a brute force matcher don’t you think? From 5 secs to 101 msec without too much effort :D


Now, I m getting back to some more… less interesting tasks from my Todo list for Skeyer.. Trello

Btw. I am still looking for a job, So if you know anyone who is hiring for a role that you think suits me ( My Resume ), please let me know/let them know :)

Under the hood of Skeyer…

Skeyer gets a maliit plugin – Part 1


Hopefully, this image shall be a video for the next blogpost
Skeyer gets a maliit plugin – Part 1

Last week I started refactoring Skeyer into libskeyer and the demo.The main reason to do this is to:

1) Easily write automated tests to benchmark the performance and precision of Skeyer’s algorithm(s): I’ve realized I need a more objective way to measure the performance and precision of Skeyer than to manually swipe a few words and look at the suggestions and make a wild guess if it is good or not.

2) I have also started writing a Maliit plugin for Skeyer(based on the examples from Maliit-framework): I’ve realized it is important to build the maliit plugin first. That way, I’ll have better idea of constraints on Skeyer, based on my usage of it on a real device. And of course to easily show off Skeyer to everyone. A real world application seems to be much more…. interesting… than a tech demo.

However, this plugin is still a work in progress and needs a lot more work to be usable. Right now I seem to be facing weird flickering issues with the plugin window and can’t even interact with it. I have no clue where to even look for clues. So my immediate plan of action(for this weekend/next weekend) is to first finish off the Maliit plugin. (I’m looking at the Ubuntu Touch/Open WebOS keyboards for inspiration, help, code. Both of them use Maliit.)


I have also started looking at implementing a very interesting new feature for Skeyer, based on the way I’ve been using my Nokia N9 lately (Yup. No other phone seems to beat the N9 experience for me so far). I think this feature will make Skeyer a lot more useful. But I need to really build this feature first to test it’s worth. So I will talk more about this once I have something to show(off? :) ).

To recap, my immediate priorities now are:

1) Finish off the maliit plugin (and dig more into the Ubuntu touch/Open WebOS virtual keyboard).

2) Write the tests/automated benchmarks.

3) Start looking into implementing the interesting feature for Skeyer. I’m thinking of releasing(AND open sourcing) Skeyer, once I have a set of tests and benchmarks. Hmm.. looks like it’s going to be busy weekends for a couple of months :)



Update: grrrrr… My hard disk crashed and I lost the work I did on the Refactoring and the Maliit plugin I was working on. So I am having to redo it again. So as usual “expect the delays”…

Skeyer gets a maliit plugin – Part 1

Skeyer gets a facelift

Work In Progress

1) Implemented a basic User Dictionary functionality

2) Cleaned up the Keyboard file format. I think I need to do this once more.

3) Experimented with a few more custom dictionaries generated from stardict dictionaries. The prediction performance is now satisfactory.

4) Slight improvements to the word prediction algorithm based on the perplexity. The prediction precision in general has improved quite a lot, except in the case of short words for which you have to swipe a long track (like “tfdsasdfghjijnbhgt” -> “taint”) where it is now abysmal.

5) The facelift for the UI has started. Something tells me this will be a lot trickier to get right though.

Next Tasks: ——————–

1) Implement the long press and extended keypress popup functionality for the keys.

2) Annoy Maliit folks to help me compile the Maliit plugin. (Update: DONE. Thanks to bfederau, I got to compile and run Maliit on my desktop.) – Now time to develop the Maliit plugin.

3) Fix the annoyances with the Prediction functionality.

4) Implement the state machine to take care of actual text input.

5) Finish off the UI facelift

Skeyer gets a facelift