Sikuli – for all those hard to reach places!

Ever had the scenario in automated testing where you had something to automate that really didn’t fit any of your tools? Something as bristly as an echidna?

Normally I try to steer away from anything that doesn’t use standard protocols and/or interfaces, such as Flash, Silverlight and others. It’s not that there aren’t test tools that handle these things; it’s just that it’s messy, to say the least. For open standards like HTML or SOAP there are gazillions of ways to automate.

So I was surprised to find myself having to test an application on (or should I rather say through) Citrix.

In order to understand what the issue is you have to know a little more about Citrix. Citrix works much like Microsoft’s Remote Desktop Protocol (RDP), although it uses its own protocol, ICA. Instead of running an application on your desktop, it gets executed on a server (farm) and is then just visually transported back to your desktop. The difference to RDP is that it actually acts like a local application. You cannot really see that it is not running locally, and the integration into your desktop is seamless. RDP, in comparison, would show you the whole desktop of the remote server.

From a test tool perspective (on the client) you only see pixels plus keyboard and mouse interactions. There is really nothing for a test tool to grab on to. Some tools leverage the underlying ICA protocol, but that is also difficult to stabilise.

So, whichever way you turn, you’re in a bind. In comes Sikuli, or rather Sikuli X. This is a beta-class project from MIT that tests through the actual front-end interface. Usually I am fiercely critical of such test automation, but Sikuli brings some interesting features to the task that mitigate a lot of the inherent risks.

The main issue with such automation is its dependence on where objects are on screen. Sikuli works by actually finding things on the display, but instead of doing a 1:1 search there is some fuzzy-logic thingamajiggy going on, where certain differences are allowed. That means the hit rate is much improved over many other tools.
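
To make the fuzzy matching idea a bit more tangible, here is a toy sketch in plain Python. It is not Sikuli’s actual algorithm, just an illustration of the principle: score every candidate position and accept the best one if it clears a similarity threshold. The names `similarity` and `find_fuzzy`, the tiny greyscale grids and the 0.7 threshold are all assumptions made up for this example.

```python
def similarity(screen, pattern, top, left):
    """Return a score in [0, 1]; 1.0 means the pattern matches pixel for pixel."""
    total_diff = 0
    count = 0
    for r, row in enumerate(pattern):
        for c, value in enumerate(row):
            total_diff += abs(screen[top + r][left + c] - value)
            count += 1
    return 1.0 - total_diff / (255.0 * count)


def find_fuzzy(screen, pattern, min_similarity=0.7):
    """Return (row, col, score) of the best match at or above the threshold, else None."""
    rows, cols = len(pattern), len(pattern[0])
    best = None
    for top in range(len(screen) - rows + 1):
        for left in range(len(screen[0]) - cols + 1):
            score = similarity(screen, pattern, top, left)
            if best is None or score > best[2]:
                best = (top, left, score)
    if best is not None and best[2] >= min_similarity:
        return best
    return None


# A 4x4 greyscale "screen" holding a slightly darker copy of the
# all-white 2x2 pattern at row 1, column 2.
screen = [
    [0, 0, 0, 0],
    [0, 0, 240, 250],
    [0, 0, 245, 235],
    [0, 0, 0, 0],
]
pattern = [
    [255, 255],
    [255, 255],
]

match = find_fuzzy(screen, pattern)  # found despite the pixel differences
strict = find_fuzzy(screen, pattern, min_similarity=0.99)  # too strict: None
```

With the looser threshold the slightly-off copy is still found; demanding a near-perfect match makes the same search come back empty, which is exactly the brittleness a 1:1 search suffers from.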

The editor! The scripting editor seamlessly incorporates screenshots, so the images can be seen right in the code you write. See the example below.

This, in my mind, is a novel and easy way of developing. It makes things a lot simpler to visualise. Sikuli is developed in Java and uses Jython as its scripting language. This opens up all the powerful language features of Python to you in the scripts. Python is currently a popular language, so you might be able to build on previous know-how.

But let me say this, so that there is no misunderstanding:

  1. Although Sikuli is cool and really the best implementation of this testing concept, I would still not use it for large-scale test automation. I would never use it for any web testing. There, tools like Selenium and Watir are better suited.
  2. The kind of test automation Sikuli does is still fragile, no matter how clever it is. All your tests should always be treated as fragile and maintenance intensive.
  3. If you’re in a real bind for a tool and nothing seems to fit, then this is it.

So now we can get back to the cool stuff! Sikuli can react to things on screen or in screen regions. So you can say things like “if this picture turns up on the screen, do something”. You could actually have a program running in the background waiting to detect a login screen. Once it has detected one, it would automagically log you in. A good example of using this tech in a way not really intended is Sikuli playing Angry Birds.

Go to the Sikuli Blog for more cool stuff.

Now for something more concrete. I’ll do a small MS Word automation. The script is as follows:

I won’t insult anyone’s intelligence by explaining the above script. Just LOOK at it and you will understand. The only thing I need to add is that the path in the first line is split for display purposes; for the program to work, the first two lines must be one. I don’t think there is a quicker way to automate Word without any know-how in VB and the like. So I find it quite useful and easy to do, even if you sometimes get issues with rendering times and steps overtaking each other prematurely.
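
The timing issues mentioned above are usually tamed by waiting for the screen to be ready before acting. Sikuli has its own `wait(image, timeout)` for this; the sketch below only shows the general polling idea in plain Python so it can run anywhere. The helper `wait_until` and the stand-in check `dialog_rendered` are hypothetical names invented for this example.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)


# Stand-in for "has the dialog finished rendering?": becomes true on the third poll.
polls = {"count": 0}


def dialog_rendered():
    polls["count"] += 1
    return polls["count"] >= 3


# Only carry on with the next step once the check succeeds.
wait_until(dialog_rendered, timeout=2.0, interval=0.01)
```

In a real script the condition would be something like an image check, and the step that kept getting “overtaken” would only run after the wait returns.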

Because you’re in Jython and Sikuli is so cool, you can also use things like for and while loops. If statements can get really nice with things like `if exists(<image to find>): click(<image to click>)`. You can also define asynchronous events that track things while you’re executing a script (onAppear, onVanish). So you can catch an error pop-up should one happen.
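
As a rough sketch of the exists-then-click pattern, here is a small helper written so that it runs without Sikuli installed. It assumes a screen object offering `exists(image, timeout)` and `click(image)`, which is how I understand Sikuli’s screen and region objects to behave; `FakeScreen`, `dismiss_popup_if_present` and the image file names are hypothetical stand-ins for illustration only.

```python
def dismiss_popup_if_present(screen, popup_image, button_image):
    """Click `button_image` if `popup_image` is currently visible; report whether it was."""
    # A timeout of 0 asks for a single immediate check rather than a wait.
    if screen.exists(popup_image, 0):
        screen.click(button_image)
        return True
    return False


class FakeScreen(object):
    """Hypothetical stand-in for a Sikuli screen so the sketch runs without Sikuli."""

    def __init__(self, visible_images):
        self.visible_images = set(visible_images)
        self.clicked = []

    def exists(self, image, timeout=0):
        # Mimics "a match or None" by returning the image name when it is visible.
        return image if image in self.visible_images else None

    def click(self, image):
        self.clicked.append(image)
        return 1


# File names are placeholders for screenshots you would capture yourself.
screen_with_error = FakeScreen(["error_popup.png", "ok_button.png"])
handled = dismiss_popup_if_present(screen_with_error, "error_popup.png", "ok_button.png")

clean_screen = FakeScreen([])
not_handled = dismiss_popup_if_present(clean_screen, "error_popup.png", "ok_button.png")
```

Inside Sikuli you would hand over a real screen object instead of the fake one, and you could call the helper between steps as a poor man’s alternative to the asynchronous handlers.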

So my advice is, get Sikuli and have some fun. Add it to your tester-tools-baggie because the day will come when it will help you out of a tight spot. If you have your own Sikuli experiences then please post a comment on what you use it for and how.

Author: Oliver Erlewein


One thought on “Sikuli – for all those hard to reach places!”

  1. Just to clarify, Citrix is pretty much SSH with X forwarding, but for Windows?

    Sikuli certainly sounds like an interesting piece of kit, though. It sounds like it has a fair bit of potential outside of the testing world as well. As a gamer, it sounds like the perfect tool for automating some of the more menial tasks: topping up health, reloading, etc. If you wanted to get incredibly advanced, you could probably have it snipe for you, though that would require a fairly insane amount of screenshot preparation to allow for targeting at different distances etc.

    Of course, this is if it will interact with DirectX frames, and can react to an object fast enough while the CPU is already taking a pounding.
