Tuesday, November 27, 2012

List of Selenium/WebDriver blogs.

I was recently pointed to a nice list of Selenium/WebDriver blogs. If you are looking for more information about Selenium/WebDriver you should check out

I haven't checked out all the links but there appears to be a good start to the list of Selenium/WebDriver blogs available out there.

Monday, October 29, 2012

Generating a file of a specific size

Every once in a while someone is looking for a file of a specific size. Occasionally, it must be real data. If you are transmitting the file and the data will be compressed then the type of data will make a difference.

However, if you just need a file to fill some space or there will be no compression then Windows has a neat little utility called FSUTIL.

The FSUTIL utility can be used for a number of things but the nicest feature is creating a new file filled with zero bytes. First, you need to know how many bytes. If you want a file which is 38 gigabytes then you need to figure out how many bytes that is. Technically, it is 1024*1024*1024*38. If you want a rough idea you can just use 38,000,000,000 but a 38 gigabyte file is really 40,802,189,312 bytes. Next is which file you want to hold the data. Let's say you want to create C:\DELETEME.TXT then the full FSUTIL command is:

fsutil file createnew C:\DELETEME.TXT 40802189312

This will create a 38G file in a matter of seconds.
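If you would rather do this from code, the same arithmetic and a file of that length can be sketched in Java. This is only a sketch under assumptions: the names FileSizer, gigabytesToBytes and createFile are mine, and the Java documentation leaves the contents of the extended part of the file undefined, so do not rely on it being zero bytes on every platform.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper for illustration; not part of FSUTIL or any library.
class FileSizer {
    // 1 gigabyte = 1024 * 1024 * 1024 bytes. Long arithmetic avoids int overflow.
    static long gigabytesToBytes(long gigabytes) {
        return gigabytes * 1024L * 1024L * 1024L;
    }

    // Creates the file if needed and sets its length to exactly sizeInBytes.
    static void createFile(String path, long sizeInBytes) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            file.setLength(sizeInBytes);
        }
    }
}
```

Calling FileSizer.createFile("C:\\DELETEME.TXT", FileSizer.gigabytesToBytes(38)) would ask for the same 40,802,189,312 byte file as the FSUTIL command.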

For UNIX, Mac OS X or Linux you can use DD. The DD command is for converting and copying files. If we wanted to create the 38G file with DD the command would be:

dd of=deleteme.txt oseek=79691776

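The oseek is the magic. It tells DD to seek n blocks into the output file before writing, and the standard size of a block is 512 bytes. So we take 79691776*512, which is 38G. (If your DD does not accept oseek, the POSIX operand seek does the same job.) Even easier would be: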

dd of=deleteme.txt oseek=38m obs=1024

This will generate a file of 38M * 1024 bytes, or 38G. Values like these are much easier to work with. That is, no need for a calculator.

IMPORTANT: after you press enter on the DD command it will take the standard input as its input. So you need to enter CONTROL-D.

The other option for UNIX/Linux/Mac OS X is to copy a file of a set size. The nice thing about this is you can take a file with real data and copy it enough times to make a single file of the correct size. For example, if I have a text file with 512 bytes of real data I can set the if= to that file and make multiple copies of that one file into the output file.
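The same copy-until-full idea can be sketched in Java for cases where a shell is not handy. This is a minimal sketch, not a DD replacement: FileRepeater and fill are names I made up, and the whole seed file is read into memory, so it assumes the seed is small (like the 512 byte example).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
class FileRepeater {
    // Writes the bytes of seed over and over until target holds exactly targetSize bytes.
    static void fill(Path seed, Path target, long targetSize) throws IOException {
        byte[] data = Files.readAllBytes(seed);
        if (data.length == 0) {
            throw new IllegalArgumentException("seed file is empty");
        }
        try (OutputStream out = Files.newOutputStream(target)) {
            long written = 0;
            while (written < targetSize) {
                // The last copy may be cut short so the final size is exact.
                int chunk = (int) Math.min(data.length, targetSize - written);
                out.write(data, 0, chunk);
                written += chunk;
            }
        }
    }
}
```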

Thursday, July 26, 2012

Removing logout button

When I set up a Selenium Server it only works when someone is logged in. In a previous article I wrote how to make a user automatically log in. For this article I'll write how to prevent that user from logging out.

  1. Run regedit.exe
  2. Go to HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies
  3. Create a new Key called Explorer
  4. Go to Explorer folder
  5. Create the DWORD key StartMenuLogoff.
  6. Set it to 1.
  7. Go to the Start Menu and use Shutdown or Reboot (so Logoff is not the last option used).
When you log in as that user now, they will not have a Logoff button available to them. This has been tested on Windows 7.

automatic log in, automatic run

When writing test scripts for testing an installer, on Windows, one of the challenges is when the installer requires a reboot. To handle this I use automatic log in and the run once features of Windows.

To create automatic log in:
  1. Run regedit.exe
  2. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon
  3. Set DefaultUserName to the name of the person you want to automatically log in. If the key does not exist add it as REG_SZ.
  4. Set DefaultPassword to the user's password. If the key does not exist add it as REG_SZ. Note: anyone who can read the registry can see the user's name and password. Create a local user with no permission on your network.
  5. Set AutoAdminLogon to 1. If the key does not exist add it as REG_SZ.
  6. Set ForceAutoLogon to 1. If the key does not exist add it as REG_SZ.
And that is all you need to have the computer automatically log in after a reboot.

The second part is to create a script which will start up the automation at the right place. What usually happens is the installer will put something in place so that after the reboot it will continue with the install the moment the user logs in. If this is the case for your installer, write a script which continues the automation and add it to the RunOnce key. Assuming you have a batch file which starts part 2 of your automation:
  1. Run regedit.exe
  2. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce
  3. Add a string value
  4. The Name of the value should be something like zzz_Automation
  5. The Data value will be the full path to the batch file. It is important to note that the current directory will not be the location of the batch file. Do not assume it will be in your script. Either use full paths for everything in the script or change to the directory you think you should be in then continue with the script.

Tuesday, July 24, 2012

Generating a screen capture when using RemoteWebDriver

Recently I was asked if all the different implementations of WebDriver allow for screen captures. I had a look at the source code and saw that almost all of them "implement TakesScreenshot". If the Java class for the driver implements TakesScreenshot then it allows for screen captures. This means that, for those drivers, there isn't really any need for using Robot to generate the screen capture, as I did in a previous posting.

The nicest thing about using the built-in screen capture is how much it captures. Robot will only capture the visible screen. The built-in screen capture will capture the entire window in the browser, including the parts not visible, i.e. things you have to scroll to see.

Shortly after this someone asked if RemoteWebDriver supported screen capture. This was very important because Robot would not capture even the visible part of a REMOTE screen.

A quick check of the RemoteWebDriver source showed that it did NOT implement TakesScreenshot.

Recently however I found a solution. Assuming you have the code:
DesiredCapabilities dc = DesiredCapabilities.firefox();
URL url = new URL("http://localhost:4444/wd/hub");
WebDriver driver = new RemoteWebDriver(url,dc);
You can change the driver to:
WebDriver driver = new Augmenter().augment(new RemoteWebDriver(url,dc));
Now it has the capability to take a screen shot. To make the call, here are the three different outputs:
File f = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
String s = ((TakesScreenshot)driver).getScreenshotAs(OutputType.BASE64);
byte[] b = ((TakesScreenshot)driver).getScreenshotAs(OutputType.BYTES);

And that is all there is to taking a screenshot with RemoteWebDriver. I have tried this with Firefox and InternetExplorer. I assume it works with the other browsers but I leave that for you to explore.
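The FILE output is a temporary file and the BYTES output can be written straight to disk, but the BASE64 string needs decoding first. Here is a minimal sketch of that step using only the JDK; ScreenshotSaver and saveBase64 are hypothetical names, and I use the MIME decoder on the assumption that the string might contain line breaks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper for illustration; not part of Selenium.
class ScreenshotSaver {
    // Decodes a Base64 string (such as the one returned for OutputType.BASE64)
    // and writes the resulting image bytes to target.
    static void saveBase64(String base64Image, Path target) throws IOException {
        // The MIME decoder ignores line separators inside the encoded text.
        byte[] imageBytes = Base64.getMimeDecoder().decode(base64Image);
        Files.write(target, imageBytes);
    }
}
```

With the string s from above, ScreenshotSaver.saveBase64(s, Paths.get("capture.png")) would write the image to a file of your choosing.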

Wednesday, July 4, 2012

Creating a screen capture on every action

Someone recently commented on an article I wrote about generating a screen capture when an exception is thrown (see Generating a screen capture on exception thrown with Selenium 2).

Performing an action when an exception is thrown is built into the Selenium framework. You just need to create the action to generate a screen capture and hook it into the framework.

The Selenium framework comes with a WebDriverEventListener interface. In my article above I created an implementation of the WebDriverEventListener interface. To do this properly, you need to implement the following methods:
beforeNavigateTo(String url, WebDriver driver);
afterNavigateTo(String url, WebDriver driver);
beforeNavigateBack(WebDriver driver);
afterNavigateBack(WebDriver driver);
beforeNavigateForward(WebDriver driver);
afterNavigateForward(WebDriver driver);
beforeFindBy(By by, WebElement element, WebDriver driver);
afterFindBy(By by, WebElement element, WebDriver driver);
beforeClickOn(WebElement element, WebDriver driver);
afterClickOn(WebElement element, WebDriver driver);
beforeChangeValueOf(WebElement element, WebDriver driver);
afterChangeValueOf(WebElement element, WebDriver driver);
beforeScript(String script, WebDriver driver);
afterScript(String script, WebDriver driver);
onException(Throwable throwable, WebDriver driver);
In my screen capture when an exception is thrown, I put the necessary code to generate a screen capture in the onException method. You can see from the list above that you can have Selenium perform an action when other events occur. For example, if you wanted to do a screen capture after each click() action, you could write:

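public void afterClickOn(WebElement element, WebDriver driver) {
    String filename = generateRandomFilenameFromWebElementAndDriver(element, driver);
    File screenshot = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
    // copy the screenshot file to filename here
}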

I'll leave it to the reader to figure out how to create the generateRandomFilenameFromWebElementAndDriver. You might do something like use getCurrentUrl(), convert things like colon, slash, etc. into valid filename characters then append the web element getTagName() to the end of the filename.
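One possible shape for that helper, sketched without Selenium so it runs on its own: it takes the two strings you would get from getCurrentUrl() and getTagName(). The class and method names are hypothetical, and the result is not actually random or unique; you would probably append a timestamp or counter in real use.

```java
// Hypothetical helper for illustration; not part of Selenium.
class ScreenshotNames {
    // Builds a file name from the page URL and the tag name of the element.
    static String fromUrlAndTag(String currentUrl, String tagName) {
        // Replace each run of characters that is not a letter, digit, dot
        // or hyphen (colon, slash, question mark, etc.) with one underscore.
        String safeUrl = currentUrl.replaceAll("[^A-Za-z0-9.-]+", "_");
        return safeUrl + "_" + tagName + ".png";
    }
}
```

For example, the URL http://example.com/login?x=1 and the tag name button give http_example.com_login_x_1_button.png.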

Or you could do something tricky like have the beforeClickOn generate the filename and the afterClickOn use that filename to generate the screen capture.

If the event you want to trigger a screen capture is not listed above, there is no easy answer for how you would do it.

Initially, you might consider changing the WebDriverEventListener interface. But this would require a change to how Selenium works. Every time there is a new release of Selenium, you would have to merge your changes back in.

I would recommend either submitting the changes as a new feature to the Selenium project to see if they'll incorporate it, or wrapping the Selenium methods with your own methods. For example, you would call your own sendKeys(), which could call the element's sendKeys() and then do a screen capture.

Wednesday, June 20, 2012


Most automation tools depend on the concept that the page has finished loading. With AJAX and Web 2.0 this has become a grey area. META tags can refresh the page and Javascript can update the DOM at regular intervals.

For Selenium this means that StaleElementException (StaleElementReferenceException in the Java bindings) can occur. It occurs if I find an element, the DOM gets updated and then I try to interact with the element.

Actions like:
driver.findElement(By.id("foo")).click();
are not atomic. Just because it was all entered on one line, the code generated is no different than:
By fooID = By.id("foo");
WebElement foo = driver.findElement(fooID);
foo.click();
If Javascript updates the page between the findElement call and the click call then I'll get a StaleElementException. It is not uncommon for this to occur on modern web pages. It will not happen consistently however. The timing has to be just right for this bug to occur.

Generally speaking, if you know the page has Javascript which automatically updates the DOM, you should assume a StaleElementException will occur. It might not occur when you are writing the test or running it on your local machine but it will happen. Often it will happen after you have 5000 test cases and haven't touched this code for over a year. Like most developers, if it worked yesterday and stopped working today you'll look at what you changed recently and never find this bug.

So how do I handle it? I use the following click method:
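public boolean retryingFindClick(By by) {
        boolean result = false;
        int attempts = 0;
        while(attempts < 2) {
            try {
                // find and click in one go; either step can hit a stale element
                driver.findElement(by).click();
                result = true;
                break;
            } catch(StaleElementReferenceException e) {
                // the DOM changed between the find and the click; try again
            }
            attempts++;
        }
        return result;
}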
This will attempt to find and click the element. If the DOM changes between the find and click, it will try again. The idea is that if it failed and I try again immediately the second attempt will succeed. If the DOM changes are very rapid then this will not work. At that point you need to get development to slow down the DOM change so this works or you need to make a custom solution for that particular project.

The method takes as input a locator for the element you want to click. If it makes it past the click call, it returns true. If both attempts hit a stale element, it returns false. Any other exception is not caught here and will propagate to the caller.
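The same retry pattern can be pulled out into a reusable helper. The sketch below is deliberately free of Selenium so it can run standalone: StaleException is a stand-in I invented for the real stale element exception, and Retry and retrying are hypothetical names.

```java
// Hypothetical helper for illustration; not part of Selenium.
class Retry {
    // Stand-in for the Selenium stale element exception.
    static class StaleException extends RuntimeException {}

    // Runs action up to maxAttempts times and returns true as soon as one
    // attempt completes without a StaleException. Other exceptions propagate.
    static boolean retrying(Runnable action, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                action.run();
                return true;
            } catch (StaleException e) {
                // The DOM changed between the find and the click; try again.
            }
        }
        return false;
    }
}
```

In a real test the action would be something like () -> driver.findElement(by).click() and the catch would name the Selenium exception instead.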

Personally, I would argue this should always work. If the developers are refreshing the page too quickly then it will be overloading the browser on the client machine.

Tuesday, June 19, 2012

When trial and error is a bad thing

I'm a hacker. Not the breaking into systems and doing damage hacker. To me a hacker is someone who learns things by trial and error. I will systematically poke away at something which for all intents and purposes is a black box. I hack to learn. I hack things I own. I'll create an instance of something and hack away at it. People I work with will create development, test or staging environments which I will hack.

I do not hack sites I don't own or have permission to hack. This is what differentiates good hackers and bad hackers.

What I do is poke at something. Maybe I'll try changing an input or altering the environment slightly and see how that changes things. I'll keep doing this until a pattern emerges. From trying different things I start forming a hypothesis of what is happening inside the black box. If I try something and the result does not fit in my hypothesis, I form a new hypothesis. At some point I usually get a clear understanding of what is happening in the black box without ever seeing what is in the black box.

I essentially look at the symptoms and narrow down what the cause would be.

This is a good use of trial and error. The goal is not to find the input which gives me the desired output. If I stopped the moment I got the desired output, I might think I have the solution but I don't. Case in point, I input 2 and 2 and get 4. My hypothesis is that the black box does addition. At this point my hypothesis fits everything I have seen. However, if I poke further I might find that inputting 2 and 3 gives me 6. Now I see that it is not addition. New hypothesis is that it is multiplication.

Hacking is really empirical. Unless I try every possible input, I cannot be certain my hypothesis is correct. For example, I might input 1 to a function and it returns 43, I input 2 and get 47, I input 3 and get 53. After inputting the numbers from 1 to 20 I notice all the numbers are prime! My hypothesis is that the function is a prime number generator. However, if I input 41 I get 1763. This is not prime (43 * 41 = 1763). Turns out the function is Euler's formula for finding prime numbers, i.e. n^2 + n + 41. This has been proven to only produce prime numbers when n is less than 40.
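The numbers in this example are easy to check for yourself. Here is a small Java sketch that evaluates n^2 + n + 41 and looks for the first non-negative n where the result is not prime; the class and method names are mine, and the primality test is plain trial division, which is fine for values this small.

```java
// Illustrative sketch; names are hypothetical.
class EulerPolynomial {
    // The polynomial from the text: n^2 + n + 41.
    static long value(long n) {
        return n * n + n + 41;
    }

    // Trial division primality test, adequate for small candidates.
    static boolean isPrime(long candidate) {
        if (candidate < 2) {
            return false;
        }
        for (long divisor = 2; divisor * divisor <= candidate; divisor++) {
            if (candidate % divisor == 0) {
                return false;
            }
        }
        return true;
    }

    // Returns the smallest n >= 0 for which the polynomial is not prime.
    static long firstCompositeInput() {
        long n = 0;
        while (isPrime(value(n))) {
            n++;
        }
        return n;
    }
}
```

firstCompositeInput() returns 40, where the value is 1681 (41 * 41), which agrees with the "less than 40" limit above.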

Still, hacking can be a good thing. Trial and error to find THE answer is never a good thing.

I see a lot of people solving problems as follows:

  1. Program or computer not functioning correctly.
  2. Change something.
  3. If program or computer not functioning correctly go to step 2.
  4. Problem solved.
Now maybe they did find the right solution but most often they don't. Later the problem will come back with different symptoms. If I purchased a program from you and you used this method to solve the problem, here is how I see this as a consumer of software:

My car is running slower than normal. I bring it to my mechanic and he does the following:
  1. He changes the spark plugs and charges me for that.
  2. Car is still running slow.
  3. He adjusts the valve on the carburetor and charges me for that.
  4. Car is still running slow.
  5. He rotates the tires and charges me for that.
  6. Car is still running slow.
  7. He changes all the fluids and charges me for that.
  8. Car is still running slow.
  9. Cars today have a lot of electronics, so he disconnects the battery for a week.
  10. All my programming, bluetooth, radio stations, clock, GPS, etc. are gone.
  11. The car is no longer running slow.
  12. Three months later the car is running slow again.
  13. My mechanic disconnects the battery for a week.
  14. All my programming, bluetooth, radio stations, clock, GPS, etc. are gone.
  15. My car is still running slow.
Would you pay for all the work the mechanic did? I think it is safe to say that NO ONE would put up with this. Some people might put up with it until step 12 then find a new mechanic. Others would put up with this until just step 2 or 4. Most of us would not pay for anything after step 2.

I've worked in industries where EVERYONE programs like this. There might be 4 or 5 different vendors and you really don't have any other choice. However, it just takes one guy to write quality software and everyone switches to that other guy. Trying to win back those customers means you have to make up for all the poor software issues PLUS give them some incentive to switch away from the guy who has always given them good software.

Thursday, June 14, 2012

So you want to do unit testing

What is a unit test? Wikipedia describes unit testing as testing individual units of code in isolation. If the code has external dependencies, you simulate the dependencies using mock objects.

For example, if I am testing code which gets data from a database, hopefully access to the database is via something like ODBC or JDBC. In which case, it is possible to use a fake database (file system or memory based) rather than say an Oracle or SQL Server driver.

If my database connection is hard coded to a particular machine or assumes the machine is localhost then my first step is to refactor the code to remove this dependency.

Part of the purpose of having unit test cases is so that we can safely change the code and know we didn't break any existing functionality. So if we need to modify the code to be able to add unit tests we have a bit of a Catch-22 situation. The truth of the matter is, if we have been changing the code without unit tests, changing it one more time in order to add unit tests is actually a step in the right direction and no worse than previous development.

Another important feature of unit tests is speed. If I am adding a new feature and I want to be sure it hasn't broken anything, I want to know as soon as possible. I don't want to write the feature, run the tests and check the results tomorrow. Ideally, I want to know in seconds. Realistically, I might have to live with minutes at first.

Test runs should be automated. If I have to make a change and figure out what tests to run, run them and check the results there is a strong chance I will stop running them. Especially if I'm on a tight timeline.

Ideally, I would check in my code. This will fire a trigger which builds my code (not the entire product, just my code) and runs the unit tests against it. Putting such a build system in place is a great deal of work but worth the effort. Every minute it takes to create this build system should be weighed against how much time developers spend testing their code before they check in, how many minutes testers spend finding bugs, and how much time developers take understanding the bug and fixing it. Numerous studies have shown fixing bugs is much more expensive than never introducing them in the first place.

So what do we need so far?

First, we need a unit test framework. You wouldn't create your own replacement for JDBC/ODBC. So why create your own unit test framework? There are plenty of them out there.

Second, we need mocking frameworks for the technologies we are utilizing. Which mock object frameworks you require depends on what you are using in your application. If it is a web application, you might need to mock out the web server. If it accesses a database, you will need to mock out the database.

Third, we need a build system to automate the running and reporting of the unit tests. Reporting the results is important too. Most systems will either report back to the source control client or send you an email. If the tests run in, literally, seconds, you can afford to reject the checkin if a unit test fails. If it takes more than say 5 seconds, you might want to send an email when a checkin fails.

Fourth, we need commitment from management and the team. If you don't believe there is benefit to unit testing there will be no benefit to unit testing. Training people on how to create good unit tests and maintain them is critical. If I'm starting a new project and writing tests from the beginning it is easy but the majority of you will be adding unit tests to existing code.

The first three things are relatively easy to obtain. There are plenty of technologies and examples of people using them. The fourth requirement is the biggest reason adopting unit testing fails. If you don’t get buy in from everyone involved it just won’t work. The developers need to understand this will benefit them in the long run. The testers need to understand that less testing will be required and they need to focus on things unit testing will not catch. There will always be plenty of things to test. So there should be no fear unit testing will replace  integration or system testing. Management has to understand if they cut timelines for a project, they will not give developers time to write the unit tests. If you reward the Project Manager for getting the project out on time, he will get the project out on time even if it means giving developers no time for unit test creation. As a Project Manager, if reducing the number of issues AFTER the project has shipped is not a metric I’m evaluated on, I’m happy to ship a product which will make the next project difficult to get out on time.

So, you have the tools and you have buy in from everyone. Now what? If you have 100,000+ lines of code, where do you start writing unit tests? The answer is actually really simple. For every piece of code a developer touches, they should add unit tests. Bug fixing is the best place to start. I would FIRST write a unit test which would have caught the bug. Then I’d fix the bug and see the unit test pass.

Focusing on unit tests for bug fixes reduces the need for regression testing, concentrates on the features customers are using, and the developers are in that code anyway. If we need to refactor the code to support unit testing, it might as well happen while we are changing the code. The code was broken when we started the bug fix, so we’ll have to manually test the fix without unit tests. Hopefully, with a unit test in place, it will be the last time we manually test changes to this code.

If we are modifying the code for feature creation, not bug fixing, we want to write unit tests to confirm the current behaviour. Once we have a test which passes with the current code, we can add the feature and the tests should continue to pass.

At this point we know what we need and where to start. So let’s cover how to write a unit test.

First, a unit test is going to be a function/method which calls our code. We want the name of the unit test to reflect what it is testing. When results are published they will go out to the developer but they will also be seen by the backup developer, project management and various other people as well. If I got an email telling me test17() failed I’m going to have to open the code and read what test17() is testing. You added comments and kept them up to date, right? Of course you didn’t. The comments shouldn’t be necessary. The test name should tell me what it is doing. If the test method was called callingForgotPasswordWhenNoEmailInUserPreferences() then we all know what is being tested.

Second, what failed? Most unit test frameworks have assert statements. There is the basic fail() call but there are also things like AssertTrue, AssertEquals, AssertNotNull, etc. They can be called with just what you are checking or with a message and what you are checking. You don’t want to code any more than you have to but enough that someone receiving the results will know what failed. If the requirement for my software is “When a user clicks the Forgot Password button but they have not set an email address in their preferences, they should be presented with a message telling them to contact the system administrator.” then the result message from my example here might be something like, “callingForgotPasswordWhenNoEmailInUserPreferences() failed. Was expecting: ‘No email address was set for your account. Please contact the System Administrator.’ but received: ‘No email address.’”. From this it is pretty clear what was expected and what we received instead. Failing to tell the user how to proceed should be considered a show stopper for the customer. On the other hand, if the result was: “callingForgotPasswordWhenNoEmailInUserPreferences() failed. Was expecting: ‘No email address was set for your account. Please contact the System Administrator.’ but received: ‘No email address was set for your account. Please contact the system administrator.’” the customer might consider this acceptable. We might even update the unit test case to ignore case so the test becomes a pass.
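To make the shape of that failure message concrete, here is a framework-free Java sketch of the comparison, including the ignore-case variant. In practice the assert call of your unit test framework would do this work; ForgotPasswordCheck and check are hypothetical names used only for illustration.

```java
// Illustrative sketch; a real test would use the framework's assert calls.
class ForgotPasswordCheck {
    static final String EXPECTED =
        "No email address was set for your account. Please contact the System Administrator.";

    // Returns null on a pass. On a failure, returns a message naming the test,
    // the expected text and the text actually received.
    static String check(String testName, String expected, String actual, boolean ignoreCase) {
        boolean same = ignoreCase ? expected.equalsIgnoreCase(actual) : expected.equals(actual);
        if (same) {
            return null;
        }
        return testName + "() failed. Was expecting: '" + expected
            + "' but received: '" + actual + "'";
    }
}
```

Called with the actual text "No email address." it produces the first message above. Called with the lower-case "system administrator" variant it fails when ignoreCase is false and passes when it is true.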

Unit test frameworks are pretty well established now. The general structure of a unit test is:

  • set up for the test
  • run the test
  • assert the test passed
  • clean up so the next test starts at the same point

The set up would be things like creating mock objects, initializing the inputs for the test, etc. The running of the test would be a call to the method being tested. Next would be an assert statement confirming that we received the expected results or side effect. Finally, clean up (often called tear down) the environment so it is at the exact same condition it was before the set up occurred.
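The four steps can be sketched in plain Java. This is only to show the shape: a real unit test framework calls the set up and tear down for you and supplies the assert. Everything here is hypothetical, including the memory-based map standing in for a user preferences table and the hasEmail method playing the code under test.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of set up, run, assert, clean up without a framework.
class FourPhaseExample {
    // Fake, memory-based stand-in for a user preferences table.
    static Map<String, String> fakeDatabase;

    // Set up: create the fake and the inputs the test needs.
    static void setUp() {
        fakeDatabase = new HashMap<>();
        fakeDatabase.put("alice", ""); // a user with no email address set
    }

    // Clean up: leave nothing behind for the next test.
    static void tearDown() {
        fakeDatabase = null;
    }

    // Code under test (hypothetical): does the user have an email on file?
    static boolean hasEmail(String user) {
        String email = fakeDatabase.get(user);
        return email != null && !email.isEmpty();
    }

    // One test. Returns true when the test passes.
    static boolean userWithoutEmailIsReportedAsHavingNoEmail() {
        setUp();
        try {
            boolean actual = hasEmail("alice"); // run the test
            return !actual;                     // assert the expected result
        } finally {
            tearDown();                         // runs even if the test throws
        }
    }
}
```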

Often you will group similar tests in one test suite. If I have 12 tests and they all require the same set up I will put them all in one suite. The code will then have one setUp() method that creates the environment for each test, one method for each test (12 methods in total for this example) and one tearDown(). The setUp() method will create any mock objects, initialize global variables, etc. The test method will create anything particular to that test, call the method being tested then make an assert call. The tearDown() method will then clean up the environment so it is just like it was before the setUp() method was called. This is important because most unit test frameworks do not guarantee the order the tests will be run. Assuming one test starts where a previous test left off is just bad practice.

I have worked on a project with 45,000 unit tests. All tests are run as part of the nightly build. Rather than running all the tests on one machine, they are distributed to 238 different machines. If they all ran on one machine they would take 378 hours (over 2 weeks) to run. By distributing them over 238 computers they run in approximately 3 hours. However, if test1932 depends on test1931 and the two tests get sent to different machines, test1932 will not run correctly. Each test must be independent of all other tests. This will not seem important at first but 1 year later you might find yourself needing weeks (possibly months) to refactor all your unit tests. Moments like these often cause management to abandon unit testing.

This is unit testing in a nutshell. I will warn you, ‘the devil is in the details.’ Hiring someone who has gone through the pains of setting up a unit test framework is always a good idea. Either find a good consultant or hire someone full time to work on the framework for you. Some unit test frameworks are jUnit for Java, cppUnit for C++, nUnit for .NET, etc. Gerard Meszaros has written an excellent book called “xUnit Test Patterns: Refactoring Test Code”. In it he talks about “Test Smells”. Essentially, you can sometimes look at a piece of code and say, “This code stinks.” A code or test ‘smell’ is an indication that the code has problems, i.e. it stinks. I have found that by reading Gerard Meszaros’s book I know what to look for before I do it. Originally the book was designed for people who created unit tests, found the tests have issues, i.e. they ‘smell’, and are looking to fix them, i.e. refactor. By reading the book, I avoid creating the bad unit tests in the first place.

Good luck and have fun!

Friday, April 20, 2012

How to find a popup window if it does not have a name

When using Selenium 2.0 (WebDriver) you will sometimes find clicking a link opens a popup window. Before you can interact with the elements on the new window you need to switch to the new window. If the new window has a name you can use the name to switch to the popup window:

driver.switchTo().window("windowName");

However, if the window does not have a name you need to use the window handle. The problem is that every time you run the code the window handle will change. So you need to find the window handle for the popup window. To do this I use getWindowHandles() to find all the open windows, I click the link to open the new window. Next I use getWindowHandles() to get a second set of window handles. If I remove all the window handles of the first set from the second set I should end up with a set with only one element. That element will be the handle of the popup window.

Here is the code:
String getPopupWindowHandle(WebDriver driver, WebElement link) {

    // get all the window handles before the popup window appears
    Set<String> beforePopup = driver.getWindowHandles();

    // click the link which creates the popup window
    link.click();

    // get all the window handles after the popup window appears
    Set<String> afterPopup = driver.getWindowHandles();

    // remove all the handles from before the popup window appears
    afterPopup.removeAll(beforePopup);

    // there should be only one window handle left
    if(afterPopup.size() == 1) {
        return (String)afterPopup.toArray()[0];
    }
    return null;
}
I simply call it with the WebElement which, when clicked, opens the new window. You will want to add some error handling to this but the basic idea of finding the popup window is here. To use the method I would use:
String currentWindowHandle = driver.getWindowHandle();
String popupWindowHandle = getPopupWindowHandle(driver, link);
driver.switchTo().window(popupWindowHandle);
// do stuff on the pop window
// close the popup window
driver.switchTo().window(currentWindowHandle);
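
The set arithmetic at the heart of this can be tried without a browser. The sketch below uses plain strings in place of real window handles; WindowHandles and findNewHandle are hypothetical names. It copies the second set before removing anything, on the assumption that you may not want to (or may not be able to) modify the set you were handed.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch; not part of Selenium.
class WindowHandles {
    // Returns the single handle present in after but not in before,
    // or null when there is not exactly one new handle.
    static String findNewHandle(Set<String> before, Set<String> after) {
        Set<String> added = new HashSet<>(after); // copy, leave the input untouched
        added.removeAll(before);
        return added.size() == 1 ? added.iterator().next() : null;
    }
}
```

Returning null when zero or several new handles appear is one simple form of the error handling mentioned above; the caller decides what to do about it.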

Wednesday, April 4, 2012

Regular Expression

I've been processing files and data a lot lately which means I've been using Regular Expressions.

Regular Expressions is a very powerful pattern matching tool. If you have used MSDOS or Bourne shell you are familiar with wildcards, where "*.txt" will match all files ending with .txt. Regular expressions take this to a whole new level.

First thing to note is there are different implementations of Regular Expression. The basic concepts are the same and most of the syntax is the same but there are subtle differences. I'll talk more about this as I give examples of the language.

The second thing to note is, some of the special symbols from MSDOS or Bourne shell are used by Regular Expression but they have a completely different meaning. The most notable is the asterisk (*).

The example above, "*.txt", would be a bad Regular Expression. Why? The asterisk means the previous character zero or more times. There is no character preceding the asterisk so it is a syntax error.

For simple things like "*.txt", Regular Expression can be overly complex. The dot (.) means any character. So if I want to emulate the "*" of MSDOS, I would use ".*" in Regular Expression. If I wanted an actual dot I would use "\." in Regular Expression. So the whole "*.txt" in MSDOS becomes ".*\.txt" in Regular Expression. In most languages, the "\." would get processed as an escape sequence by the String implementation. The backslash (\) would never make it to the Regular Expression parser. So if you want "\." to reach the Regular Expression parser, you need to use "\\." because the String implementation will parse this, resulting in "\.", then pass it to the Regular Expression parser.
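A short Java sketch of the two points above: the doubled backslash in the string literal, and the syntax error from a leading asterisk. The class and method names are mine; Pattern and PatternSyntaxException are the standard java.util.regex classes.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative sketch; names are hypothetical.
class WildcardDemo {
    // True when name ends with ".txt". The Java literal ".*\\.txt"
    // reaches the Regular Expression parser as .*\.txt
    static boolean isTextFile(String name) {
        return Pattern.matches(".*\\.txt", name);
    }

    // True when Pattern accepts the expression, false on a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }
}
```

isTextFile("notes.txt") is true while isTextFile("notes_txt") is false, because the escaped dot only matches a real dot. compiles("*.txt") is false, which is the syntax error described earlier.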

The language I use most right now is Java. If you look at the Java API documentation for the Pattern class you will see this is the Regular Expression parser.

Some of the basic stuff:

  • Anything not a special character is matched verbatim. So in my example above "txt" only matches "txt".
  • If you want to match a special character you need to escape it. From my example, the dot is a special character. To match a dot and nothing else you use "\\.". I use a double backslash because Java will parse the "\\" before sending it to Pattern.
  • Special characters from things like println() or printf() work the same in Regular Expression. These are "\t" for tab, "\n" for newline, "\r" for return, "\f" for form-feed, "\a" for a bell. A bell in ASCII is control-g or "\x07" but "\a" is better because you shouldn't assume ASCII.
  • You can define sets using square brackets. If I have "[abc]" this will match "a", "b" or "c".
  • You can use the square brackets for negation. If the first character in the set is caret (^) it means 'not'. For example, "[^abc]" would match anything not "a", not "b" and not "c".
  • If you want all digits you could use "[0123456789]" but there is a shorthand for this. A range can be specified using a minus (-) symbol. This example would be "[0-9]". You can also do things like "[a-z]" but alphabetic strings can be problematic if you allow different character sets.
  • If you want upper and lower case letters you might think "[a-Z]" would work but this is an error. The letter 'Z' in ASCII has a value of 90 and 'a' has a value of 97. Second attempt might be "[A-z]". This is closer but in ASCII the symbols '[', '\', ']', '^', '_' and '`' are between 'Z' and 'a'. So you have too many characters in this set. The solution is a union (like in Set Theory). You want "[a-z]" union "[A-Z]". In Regular Expression this is written as "[a-zA-Z]".
  • You can also write a union as "[a-z[A-Z]]". This might seem like extra typing and in some cases it is. Nested sets become more useful with intersection, which is written "&&". What if you wanted all consonants? That would be 21 letters uppercase and 21 letters lowercase. A string with 42 letters (you cannot really use a single range). You could use "[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]" but even that is a little ugly. How about: "[a-zA-Z&&[^aeiouAEIOU]]". When I look at that it is pretty obvious what I'm trying to match. It reads as "all letters but not vowels". Without the "&&" the nested set would be a union, which would match far more than consonants.
  • There is 'syntactic sugar' for some things:
    • Rather than "[0-9]" I can use "\d" (the d is for digit)
    • Rather than "[^0-9]" I can use "\D" (uppercase implies NOT)
    • Rather than "[ \t\n\x0b\f\r]" I can use "\s" (the s is for space or whiteSpace)
    • Rather than "[^ \t\n\x0b\f\r]" I can use "\S" (uppercase implies NOT)
  • A 'word' is a String made of letters, digits or underscore. A character of a 'word' therefore would be: "[a-zA-Z\d_]". Syntactic sugar for this is "\w".
  • Conversely, "\W" matches any character which is not a 'word' character.
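A few of the sets above can be tried out in Java with a sketch like the one below. The class and method names are made up for the example. It assumes the default Pattern behaviour, where "\w" only covers ASCII letters, digits and underscore.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Union of two ranges: any upper or lower case ASCII letter.
    static final Pattern LETTER = Pattern.compile("[a-zA-Z]");
    // Intersection (&&) of "all letters" with "not a vowel": consonants.
    static final Pattern CONSONANT = Pattern.compile("[a-zA-Z&&[^aeiouAEIOU]]");
    // One or more 'word' characters.
    static final Pattern WORD = Pattern.compile("\\w+");

    static boolean isLetter(String s)    { return LETTER.matcher(s).matches(); }
    static boolean isConsonant(String s) { return CONSONANT.matcher(s).matches(); }
    static boolean isWord(String s)      { return WORD.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isLetter("q"));         // true
        System.out.println(isLetter("_"));         // false, '_' sits between 'Z' and 'a'
        System.out.println(isConsonant("b"));      // true
        System.out.println(isConsonant("e"));      // false, vowel
        System.out.println(isWord("user_42"));     // true
        System.out.println(isWord("user name"));   // false, space is not a word character
    }
}
```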

Some slightly more advanced stuff would be boundary matchers and capture groups:

  • The caret (^) not in a set means beginning of line. So if I have the string "^a" it matches if 'a' is the first character in the string. With wildcards or substring matching this can be very helpful. For example, "^def" will not match a substring check with "abcdefghi" but "def" will match.
  • The dollar ($) is for end of line. For example, "def$" will not match "abcdefghi" but "def" will match.
  • Capture groups are used for substitution. For example, if I have a string with my full name, "Darrell Grainger" and I want to change it to "Grainger, Darrell" I would do the following:
String name = "Darrell Grainger";
String flip = name.replaceFirst("(\\w*) (\\w*)", "$2, $1");
  • The "\\w*" means get the first word. It will match "Darrell". By wrapping it with parentheses it becomes a 'capture group'. So the first "(\\w*)" gets saved into "$1" and the second "(\\w*)" gets saved into "$2". In other implementations of Regular Expression, capture groups are saved into things like "\1" rather than "$1".
  • Capture groups are great if you are processing a number of strings in an array. This example will flip the first and second word for any set of strings.
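As a sketch of processing a number of strings, the same replaceFirst() call can be applied to each entry of a list. The class name, the method names and the second sample name are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NameFlipDemo {
    // Swap the first two words: "First Last" becomes "Last, First".
    static String flip(String fullName) {
        return fullName.replaceFirst("(\\w*) (\\w*)", "$2, $1");
    }

    // Apply the same substitution to every string in the list.
    static List<String> flipAll(List<String> names) {
        List<String> flipped = new ArrayList<>();
        for (String name : names) {
            flipped.add(flip(name));
        }
        return flipped;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Darrell Grainger", "Ada Lovelace");
        for (String flipped : flipAll(names)) {
            System.out.println(flipped);
        }
    }
}
```

A string with no space in it does not match the pattern at all, so replaceFirst() returns it unchanged.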
More advanced stuff would be Greedy quantifiers versus Reluctant quantifiers. Let's look at this with capture groups.
String s = "aaabbbaaa";
String s1 = s.replaceFirst("(a*)(.*)", "$2 $1");
String s2 = s.replaceFirst("(a*?)(.*)", "$2 $1");
The string s1 will contain "bbbaaa aaa".
The string s2 will contain "aaabbbaaa ".

For s1, what happened is "(a*)" matched "aaa" and "(.*)" matched "bbbaaa".
For s2, what happened is "(a*?)" was a Reluctant quantifier, so it matched as little as it could, which was nothing. Because the "(.*)" which follows is a Greedy quantifier, it captured everything. This left nothing for "(a*?)" to capture.

What happens under the hood is that the Regular Expression parser works through the pattern from left to right. When it reaches a Greedy quantifier it reads in as much of the string as it can, then checks if the rest of the pattern matches. If it does not, it pushes one character back out, checks for a match, pushes a character back out, checks for a match. It keeps doing this until it finds a match or runs out of characters to push back.

A Reluctant quantifier works the other way around. It starts by reading in as little as it is allowed (nothing at all for "*?") and checks if the rest of the pattern matches. If it does not, it reads in one more character, checks for a match, reads another character, checks for a match. The moment the rest of the pattern matches it stops.

So the s1 string processed "(a*)" first. Because it is a Greedy quantifier it captured "aaa" into "$1". Then it processed "(.*)" which matched the rest of the string. This captured "bbbaaa" into "$2".

With the string s2 it processed "(a*?)" first but, because it is a Reluctant quantifier, it started with the empty string "". The "(.*)" is a Greedy quantifier, so it grabbed the entire string and put it into "$2". The whole pattern now matched, so "(a*?)" never had to read in any characters and "" gets captured into "$1".
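To see what each group captured without doing a substitution, you can ask a Matcher directly. This is a small sketch; the class name and the helper firstGroup are made up for the example. The third pattern is my own addition: it shows a Reluctant quantifier being forced to read in characters because the rest of the pattern insists on starting with a "b".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Text captured by group 1 when the regex matches the whole input,
    // or null when there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "aaabbbaaa";
        // Greedy: group 1 takes every leading 'a'.
        System.out.println("[" + firstGroup("(a*)(.*)", s) + "]");   // [aaa]
        // Reluctant: group 1 takes nothing because (.*) can match the rest.
        System.out.println("[" + firstGroup("(a*?)(.*)", s) + "]");  // []
        // Reluctant, but group 2 must start with 'b', so group 1 has to grow.
        System.out.println("[" + firstGroup("(a*?)(b.*)", s) + "]"); // [aaa]
    }
}
```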

Here is a table of the Greedy versus Reluctant quantifiers:

Greedy    Reluctant    Meaning
X?        X??          X, once or not at all
X*        X*?          X, zero or more times
X+        X+?          X, one or more times
X{n}      X{n}?        X, exactly n times
X{n,}     X{n,}?       X, at least n times
X{n,m}    X{n,m}?      X, at least n but not more than m times
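A quick way to get a feel for the table is to run each pair against the same input and print the first thing it finds. This is only a sketch; the class name and the helper firstMatch are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedQuantifierDemo {
    // First substring of input which the regex matches, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaaa";
        System.out.println(firstMatch("a?", input));      // a
        System.out.println(firstMatch("a??", input));     // (empty string)
        System.out.println(firstMatch("a+", input));      // aaaa
        System.out.println(firstMatch("a+?", input));     // a
        System.out.println(firstMatch("a{2,3}", input));  // aaa
        System.out.println(firstMatch("a{2,3}?", input)); // aa
    }
}
```

The Greedy form takes the upper limit when it can and the Reluctant form settles for the lower limit.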

There is more to Regular Expressions but this information is what you need for most situations.

Tuesday, April 3, 2012

Frames and WebDriver

When dealing with iframes and WebDriver things can quickly get confusing, especially if you add popup windows to the mix.

When you have an iframe, it is a separate DOM. You can look at it as a separate web page inside the current web page. Let's take an example diagram:

If we look at the source code for this it might be something like:
    <iframe src="frame1.html" style="border: red;">
    <iframe id="2" src="frame2.html" style="border: green;">
        <iframe id="2-1" src="frame2-1.html">...</iframe>
        <iframe id="2-2" src="frame2-2.html">...</iframe>
        <iframe id="2-3" src="frame2-3.html">...</iframe>
        <iframe id="2-4" src="frame2-4.html">....</iframe>
        <iframe id="2-5" src="frame2-5.html">....</iframe>
        <iframe id="2-6" src="frame2-6.html">...</iframe>
        <iframe id="2-7" src="frame2-7.html">...</iframe>
        <iframe id="2-8" src="frame2-8.html">...</iframe>
        <iframe id="2-9" src="frame2-9.html">...</iframe>
    <iframe id="3" src="frame3.html" style="border: brown;">
        <iframe id="3-1" src="frame3-1.html">...</iframe>
        <iframe id="3-2" src="frame3-2.html">...</iframe>
    <iframe id="4" src="frame4.html" style="border: blue;">
        <iframe id="4-1" src="frame4-1.html">...</iframe>
        <iframe id="4-2" src="frame4-2.html">...</iframe>
        <iframe id="4-3" src="frame4-3.html">...</iframe>
        <iframe id="4-4" src="frame4-4.html">...</iframe>
        <iframe id="4-5" src="frame4-5.html">...</iframe>
        <iframe id="4-6" src="frame4-6.html">...</iframe>
        <iframe id="4-7" src="frame4-7.html">...</iframe>
        <iframe id="4-8" src="frame4-8.html">...</iframe>
        <iframe id="4-9" src="frame4-9.html">...</iframe>
All the rectangles in the diagram are iframes. The iframe with the red border would be the first iframe in the HTML code. Inside each iframe will be a full HTML page. It will have the <html></html> and everything which goes inside an HTML page.

So in WebDriver you have a switchTo method. The switchTo method returns a WebDriver.TargetLocator interface. If we look at the WebDriver.TargetLocator interface we see the following methods:

  • frame(int index)
  • frame(String nameOrId)
  • frame(WebElement frameElement)
  • defaultContent()
We can use the index to find the frame. If you have one main page and it contains a set of iframes, this is fine. However, if you have frames within frames it can get a little difficult to follow. Even if you can figure it out today, you will have to go through the whole exercise again if they change the layout by moving, adding or deleting a frame.

The best way to find a frame is with the id attribute or find the frame element using findElement() then use the third version listed above.

Now here is the most important thing to remember: you cannot jump down two or more frames in one step. So if you are at the main content page and want an element in frame3-1.html you have to switch to frame3.html then to frame3-1.html. Assuming we are at the main page, the code for this might look like driver.switchTo().frame("3") followed by driver.switchTo().frame("3-1").


Additionally, if I have switched to frame4-7.html and I want to go to frame2-1.html, I have to go back to the top, then to frame2.html, then to frame2-1.html.

So how do you get to the top? If you are dealing with iframes then the defaultContent() method will take you to the main page, above all the iframes. If you are dealing with frames, defaultContent() will take you to the first frame.

So you have two choices. You can leave focus wherever you last left it; then every action in a frame assumes you are at some unknown focus, calls driver.switchTo().defaultContent() to get to the top, then goes down to the frame you want. OR you can start at the top, go to the frame you want, then go back to the top when you are done. The first way you go to the top as needed. The second way you ensure you are always at the top. Both ways work. It is just a matter of convention.

Sometimes drawing the layout of the page is a little harder than the diagram above. Additionally, if the layout changes, it can be difficult to alter the picture, depending on how you drew it. What I like to do is draw the relationships like a tree. For example, the diagram above might be drawn as:

From this tree the rules are that you can go down a branch (switchTo().frame() from the parent) or you can get to the root of the tree (defaultContent() for iframes). You cannot jump across nodes or up levels.
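The tree rules can be sketched in plain Java without WebDriver at all. The FrameTree class below is hypothetical, not part of Selenium. It records which frame is the parent of which, then works out the frames you have to switch through, in order, to reach a target from the main page. The stepsTo() method only builds the text of the calls you would make; it does not call WebDriver.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrameTree {
    // Name used for the root of the tree (the page above all the iframes).
    static final String ROOT = "(main page)";

    // Maps each frame id to the id of its parent frame.
    private final Map<String, String> parentOf = new HashMap<>();

    void add(String parent, String child) {
        parentOf.put(child, parent);
    }

    // Frame ids to switch through, top down, to reach frameId from the root.
    List<String> pathTo(String frameId) {
        Deque<String> path = new ArrayDeque<>();
        String current = frameId;
        while (!ROOT.equals(current)) {
            if (!parentOf.containsKey(current)) {
                throw new IllegalArgumentException("Unknown frame: " + current);
            }
            path.addFirst(current);
            current = parentOf.get(current);
        }
        return new ArrayList<>(path);
    }

    // The calls as text: always back to the root first, then down one level at a time.
    List<String> stepsTo(String frameId) {
        List<String> steps = new ArrayList<>();
        steps.add("driver.switchTo().defaultContent()");
        for (String id : pathTo(frameId)) {
            steps.add("driver.switchTo().frame(\"" + id + "\")");
        }
        return steps;
    }

    // A few of the frames from the example page.
    static FrameTree sample() {
        FrameTree tree = new FrameTree();
        tree.add(ROOT, "2");
        tree.add("2", "2-1");
        tree.add(ROOT, "3");
        tree.add("3", "3-1");
        tree.add(ROOT, "4");
        tree.add("4", "4-7");
        return tree;
    }

    public static void main(String[] args) {
        for (String step : sample().stepsTo("3-1")) {
            System.out.println(step);
        }
    }
}
```

If the layout changes you only have to change the add() calls. Everything which navigates to a frame keeps working.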

Friday, March 9, 2012

Help for selenium-server-standalone.jar

One thing you might not realize if you have been trying to get command line help from the Selenium Server jar file is that there are two different help outputs.

If you run the server with no inputs it just runs with no help at all.

As an old UNIX/Linux guy I have gotten used to the standard of a single dash and a single letter (e.g. -h for help) or two dashes and a word (e.g. --help), so I was a little thrown by the Selenium Server jar file.

The reason is that -h will give you help for the Standalone Selenium Server but --help will launch the Standalone Selenium Server. The Grid Selenium Server is in there but getting help for it is not obvious.

The help switch for Grid Selenium Server is actually -help (not the same as -h but not quite the Linux convention). Actually, this will give you the help message for the Standalone Server and the Grid Server.

For running the Grid, you need to specify hub or node. If you want to run it as a hub you would use:
java -jar selenium-server-standalone.jar -role hub
Looking at the rest of the help you will see some switches have (hub), some have (node) and some have (hub & node). If the switch has a (hub) then that switch applies only when using -role hub. Conversely, if it has (node) then it only applies to -role node.

Generally, you will set up one hub and multiple nodes. For example, the application I am currently testing needs to be tested on:

  • Windows 7 with IE9
  • Windows XP with IE6
  • Mac OS X 10.8 with Safari 10
So I would run Grid Selenium Server with (omit the text after the # symbol):
  • -role hub # (defaults to localhost and 4444)
  • -role node -hubHost hubmachine -hubPort 4444 -browser browserName=iexplore,version=9,platform=VISTA # (Windows 7, IE9)
  • -role node -hubHost  hubmachine -hubPort 4444 -browser browserName=iexplore,version=6,platform=XP # (Windows XP, IE6)
  • -role node -hubHost  hubmachine -hubPort 4444 -browser browserName=safariproxy,version=10,platform=MAC # (Mac OS X, Safari 10)
The -hubHost and -hubPort just need to match the host and port for the hub. If you specify a -port for the hub then you need to change the settings for the nodes as well. In this example, I have the hub running on the computer with name hubmachine.

The -browser option is a little trickier. It mirrors the Selenium code.

If you look at the Selenium Client source code for src/org/openqa/selenium/remote/ you will find a list of the strings which can be used with the browserName key. At this time the following values are permitted:
  • firefox
  • firefoxproxy
  • safari
  • opera
  • iexplore
  • iexploreproxy
  • safariproxy
  • chrome
  • mock
  • iehta
The platform key comes from src/org/openqa/selenium/ and valid values at this time are:
  • XP
  • MAC
  • UNIX
  • ANY
When you look at the file you will see that Windows 7 'feels like' Windows Vista. So we use the VISTA platform when we want Windows 7.
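If you have a lot of nodes to start, it can help to generate the switches rather than type them. The sketch below is a hypothetical helper, not part of Selenium. It only glues together the switches shown above (-role, -hubHost, -hubPort and -browser with its browserName, version and platform keys).

```java
class GridNodeArgs {
    // Build the switches for one node which registers with the given hub.
    static String nodeArgs(String hubHost, int hubPort,
                           String browserName, String version, String platform) {
        return "-role node"
                + " -hubHost " + hubHost
                + " -hubPort " + hubPort
                + " -browser browserName=" + browserName
                + ",version=" + version
                + ",platform=" + platform;
    }

    public static void main(String[] args) {
        String jar = "java -jar selenium-server-standalone.jar ";
        // Windows 7 is started with the VISTA platform, as explained above.
        System.out.println(jar + nodeArgs("hubmachine", 4444, "iexplore", "9", "VISTA"));
        System.out.println(jar + nodeArgs("hubmachine", 4444, "iexplore", "6", "XP"));
        System.out.println(jar + nodeArgs("hubmachine", 4444, "safariproxy", "10", "MAC"));
    }
}
```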

At this point, if you go to http://localhost:4444/ you'll have a link to the Grid2 Wiki and a link to the console. If you go to the console you should see an entry for each of the nodes you are running. All the nodes will be running on port 5555 but different machines.

You can also go to the Grid2 Wiki from the hub home page. You might want to take the time to have a look at that as well. There are additional parameters for the node. For example, I can add maxInstances=3 to the browser string and it will let me run up to 3 instances of that browser on the computer. By default the maxInstances will be 5.


Wednesday, February 22, 2012

Technical Debt

What is technical debt? 

For the past ten years I have been hearing the expression "Technical Debt" in reference to programming. But what is technical debt?

At the same time I noticed the best project managers would have a summary chart for upper management. It broke things down into red light, yellow light or green light. The idea was that upper management have a lot of information to sift through so we needed to keep it simple.

I think the same idea applies to technical debt. You don't need a precise measurement of technical debt but you need to know if it is growing (red light), stable but still there (yellow light) or reducing (green light).

This still leaves the question, what is technical debt? Say I'm running a business and my business needs a piece of machinery to finish a project. I cannot get money from the customer until I deliver the product but I need money to buy the machine to finish the project. So I get a small business loan, buy the machine, finish the project and sell it to the customer. At this point I have debt. Is the interest on the debt acceptable? If the interest is equal to the profit I made on the sale then I'll never get ahead (yellow light). If the interest is greater than the profit I made then I'll slowly go deeper and deeper into debt (red light). But if the interest is less than the profit I made, I can pay down or pay off my debt. BEFORE I get the loan, I need to know the profit will be greater than the interest on the loan. I need to know I will still turn a profit and can get out of debt.

Technical debt is similar. It MUST be a conscious decision to do something which is not good from a development point of view but which I'll be able to correct after the product is out the door. For example, I'm going to pick an inexpensive or easy to implement technology which I know will not scale. I know it will handle the current load. I can predict, if successful, this choice will be a bottleneck in 18 months. In other words, the 'interest' on this 'loan' will eat away all my profit in 18 months. The longer I wait to change to a more scalable technology the less profitable my project will be.

Worse would be to not only fail to pay off or at least pay down this technical debt but to incur more technical debt. Most people would like to have a nice home, a good car, annual trips to Europe or the Caribbean, send their kids to the best schools, etc. but would you use a credit card to pay for it all? When the bill collector comes knocking would you apply for another credit card to pay off the first credit card? Hopefully, you answered no. Even if you didn't, credit bureaus like Equifax or TransUnion would quickly make it impossible for you to get more credit.

Unfortunately, in software development there is no Equifax or TransUnion. Creating technical debt is a LOT easier than creating financial debt. This means you need to monitor your technical debt and make sure you are paying it down.

For short periods of time your technical debt might rise but the overall trend should be a reduction.

What is NOT technical debt? 

It is MORE important to understand what is not technical debt. A lot of people will chalk things up to technical debt when it is really lack of planning.

It isn't even poor planning but more a total lack of planning. Many times I have worked on projects where the project manager (or above) has made it very clear that the project must ship by a specific date. No excuses. What are you telling the programmers? You are telling them you don't care what the cost is, it has to ship by a specific date. The truth is that no one REALLY means 'money is no object'.

If you don't look at the cost of a technical compromise and just implement it, it is the equivalent of going to the local loan shark and signing a piece of paper without reading it. No successful business person would do this but I have seen Fortune 500 companies do the technical equivalent.

As a programmer, if my management tells me something has to be completed by a specific date, no matter what then they are telling me technical debt is no object. Take on whatever technical debt is necessary to get the project done on time. As a senior programmer I know they don't REALLY mean this. However, I have worked in companies where I tried to point out the technical debt is too high only to be told they don't want to hear excuses. Another programmer will swoop in and tell them he can get it done. I get demoted, the other programmer gets promoted.

Inevitably, the project ships on time and under budget. The programmer who swooped in and saved the day might even get a bonus. He might even do this again on the next release of the product. But then I noticed a pattern. Sooner or later, the programmer does a lateral shift to another department or worse, leaves the company to go work for the competition. Shortly after his departure from the project, the new developer starts trying to point out how much technical debt the previous programmer created. For example, a 50,000+ line function which uses lots of GOTOs and stack manipulation to make things work. Adding a new feature which should take 2 days to implement takes months and introduces dozens of new bugs in other features.

Reducing technical debt.

So what if you already have technical debt, planned or otherwise? You need to pay it down. I worked at one company where they went to their customers and let them know they had incurred a lot of unplanned technical debt. They needed to pay it down. This meant the product would not be growing new features but it would be more stable and scalable. Most of the customers looked at the market and believed we could still stay ahead of the competition. They also figured that re-training staff on a new product, creating a business relationship with a new company, etc. would be more costly than giving us a chance. Some major clients walked away and we took a big hit. The clients who did give us a second chance wanted to see that we had a plan. They wanted to see how we would make sure it didn't happen again. They wanted to see a road map of how we'd get back to adding new features. We had to regain their trust.

Hopefully you never get to this point. You want to start addressing unplanned technical debt before it gets this bad. But how do you do it?

"Try and leave this world a little better than you found it." (Robert Baden-Powell)
This is a quote from the father of scouting. The idea was, when you went camping to leave the camp site in better condition than you found it. If you show up at a camp site and someone has left some garbage, pick up their garbage as well as any garbage you create. If they trampled some saplings, plant new saplings.

I used to be a tradesman. Many homes in North America have aluminium wiring. Aluminium wiring was used because copper was in high demand and electricians could save money using aluminium wiring. However, the aluminium cannot handle as high a load. Homes were 40 to 60 Amp service. Today we are finding homes with 100 to 200 Amp service. The aluminium wiring cannot handle the higher amperage and fires occur.

New home owners buy a used home only to find out that it has aluminium wiring. At first, insurance companies were telling home owners you had to replace all the wiring or no insurance. Mortgage companies would then say, no insurance == no mortgage. So the home owner needed to come up with $50,000 to $70,000 to re-wire their new home.

This would be the equivalent of saying, I have too much technical debt so for the next year I'm going to add no new features and just refactor the code to remove technical debt. After a year the product will look the same from a customer's point of view but you will have spent hundreds of thousands of dollars. No customer is going to pay extra for a product that looks the same from their perspective. So you are out the money. Just like a new home owner saddled with $50,000 of extra debt on day 1 of their new home, this sort of debt can ruin a company. There is no way most companies can afford to take a year to remove technical debt.

The insurance companies soon realized they were asking the impossible and losing a LOT of potential customers. So they told the new home owners that whenever they renovated a room they needed to make removing the aluminium wiring part of the project.

Software companies need to do the same thing. Rather than give the customer nothing and spend 100% of your time reducing technical debt, whenever you add a feature you should remove some technical debt as part of adding the feature. In other words, try and leave the software a little better than you found it.

Rather than removing all technical debt in a year it might take you 3 to 5 years but you are still adding new features and therefore getting revenue from customers.

How much time do you spend removing technical debt and how much time adding new features? This is a judgement call. You have to add enough new features to make the customer feel they are getting their money's worth but leave enough time to reduce the technical debt.

Remember, it might have taken you 5 to 10 years to create the technical debt. It is not unreasonable to take a few years to remove it.

Tuesday, February 14, 2012

My environment for Selenium 2.0 (WebDriver)

One thing I believe to be very important for any sort of software development (and test automation with Selenium is software development) is having a good environment. So here is how I set up my environment to start with Selenium 2.0 and Java.

What you need:

  1. Java (go to
    1. I typically go with Java SE 1.6.0. If you don't need the latest, go with something more mature.
    2. Download the full SDK and docs.
    3. Install them in C:\jdk1.6.0_30 (assuming you downloaded build 30)
    4. I like to put it in the root of the C drive with no spaces in case other things don't like spaces.
    5. Unpack the docs into the same directory so they are easy to find. 
  2. Eclipse (go to
    1. I typically go with Eclipse IDE for Java Developers because it has everything for Selenium and it is small.
  3. SVN (go to
    1. You need some sort of source control.
    2. I usually go with Git but more people are still using Subversion.
    3. There are more tutorials on SVN and better support.
  4. Selenium (go to
    1. You'll need to download:
      1. (or whatever the latest version is)
      2. selenium-server-standalone-2.19.0.jar (or whatever the latest version is)
    2. Unpack the zip file to some location
      1. I'll typically unzip them into the workspace for the project
      2. I'll also put the jar file in the same location
      3. This way I can add the jar files to subversion
  5. Eclipse support for SVN (go to )
    1. Normally I would suggest using Marketplace from the Help menu but not for the latest SVN client.
    2. I like subclipse for an SVN client.
    3. Once on the subclipse site go to download and install.
    4. Copy the link for Eclipse Update Site URL for 1.8.x
    5. Go to Eclipse Help->Install New Software...
    6. Click Add...
      1. Name: subclipse 1.8.x
      2. Location: the update site URL you copied above.
    7. Select all
    8. Unselect Mylyn 3.x if you don't want it.
    9. Follow the install wizard
  6. Adding the JDK to Eclipse
    1. Go to Window->Preferences in Eclipse
    2. Expand Java
    3. Select Installed JREs
    4. Click Add...
    5. Select Standard VM
    6. Set JRE home to C:\jdk1.6.0_30\jre
    7. Set JRE name to JDK 1.6.0
    8. Everything else can be left as default.
  7. Creating a project
    1. Select File->New->Java Project
      1. Project name: something relating to the application you are testing
      2. I usually go with the defaults.
      3. Maybe change the JRE.
      4. Click Next >
      5. On the Libraries tab
      6. Add JARs... (add External JARs... if you didn't put them in the project workspace)
      7. Go to the location where you unzipped the Selenium files.
      8. Select the selenium-java-2.19.0.jar and selenium-java-2.19.0-src.jar files and add them.
      9. Add JARs... (add External JARs... if you didn't put them in the project workspace)
      10. Go to the location where you unzipped the Selenium files.
      11. Go to the libs folder and add all the jar files in there.
      12. Add JARs... (add External JARs... if you didn't put them in the project workspace)
      13. Go to the location you downloaded the selenium-server-standalone-2.19.0.jar file
      14. Add this file to the libraries.
    2. Finish the rest of the project creation using defaults.
    3. You can now start adding JUnit Test Cases to the project
      1. I select JUnit 4
      2. I select a package which follows Java conventions
      3. For example, com.mycompany.selenium.application where application is the application we are testing.
      4. Add a setUp() and tearDown()
      5. Finish
    4. Have a look at for what to put in the setUp(), tearDown() and test cases.

The most important thing about using Selenium with Java is that you really need to know Java, Source Control (Subversion), an IDE (Eclipse) and a test framework (JUnit) before you even start using Selenium. So search the web for basic Java tutorials, play with Eclipse, check out and search for tutorials on JUnit 4 then look at the docs on Selenium HQ.

If you need help with Selenium and WebDriver, post a message to I usually respond to people there on a regular basis.

Update to using WebDriver / Selenium 2.0 with Safari

Around this time last year I wrote an article about using Safari with WebDriver. You can find this article at

It has been a year and there is still no SafariDriver which extends WebDriver. Unfortunately, Safari 5.0 and 5.1 have hit the market and the Selenium 1.0 "*safari" driver is no longer working. As the Selenium team looks at dropping support for older web browsers they are also looking into supporting things like Safari 5.x.

Ideally, I would like to see:

    WebDriver driver = new SafariDriver();

or

    URL hub = new URL("http://localhost:4444/wd/hub");
    DesiredCapabilities dc = DesiredCapabilities.safari();
    WebDriver driver = new RemoteWebDriver(hub, dc);

Until this happens, we'll have to rely on using Selenium 1.0 to create an instance of Safari. But if "*safari" is no longer working and there are no plans to update it, what do we do? The solution appears to be use "*safariproxy" instead.

Looking at the code for SeleneseCommandExecutor() there are two constructors. The first takes as input a Selenium instance. The second takes as input a CommandProcessor. The second constructor actually calls the first constructor. So either method will work. So here it is:

    String baseURL = "";
    Selenium sel = new DefaultSelenium("localhost", 4444, "*safariproxy", baseURL);
    CommandExecutor executor = new SeleneseCommandExecutor(sel);
    DesiredCapabilities dc = new DesiredCapabilities();
    WebDriver browser = new RemoteWebDriver(executor, dc);

or

    String baseURL = "";
    CommandProcessor cp = new HttpCommandProcessor("localhost", 4444, "*safariproxy", baseURL);
    CommandExecutor executor = new SeleneseCommandExecutor(cp);
    DesiredCapabilities dc = new DesiredCapabilities();
    WebDriver browser = new RemoteWebDriver(executor, dc);

Additionally, if the browser cannot be found, i.e. it is not in the default locations, then you can specify the path to the Safari executable using:

    "*safariproxy /Application/"

This code is untested but hopefully better than the previous example.

Happy automating.


Thursday, February 9, 2012

Automated GUI Testing

From time to time I will see an automation tool with image compare. The idea is to get a screen capture, save it as a baseline then when the automation runs it gets a screen capture and compares it to the baseline.

Originally this always failed because there would be small differences. For example, the screen displayed the time and date, this would always be different and the compare would always fail.

The solution was to mask out parts of the screen capture we know will not be the same or to capture a portion of the screen and see if any part of the baseline matches the portion of screen capture.

This was a better solution but the video drivers from one machine to the next would create slightly different screen captures. So if you developed the baseline on one computer and ran it on another there was a high probability the compare would fail. Even comparing the screen capture to a baseline created on the same machine has a high probability of failing.

So what is the alternative?

We could go back to manual testing but that would be a step in the wrong direction.

We could automate the tests but hire someone to sit and watch them. If they see something wrong, do a screen capture and file a bug. Problems with this are, you still have one person tied up for the day watching the automation. Additionally, getting a screen shot and filing a bug while the automation is running could be a problem. Ideally, you'd want to be able to pause the automation, get the screen capture, file the bug and resume the automation.

However, there is a third alternative. When I create test cases they will usually have points in the testing when I want to confirm the display looks okay. In some cases we just assume the user will be constantly looking. With automation, you want to explicitly decide when you need to confirm the display is okay. At those points in the automation, do a screen capture. The name of the file and the directory you store it in will have enough information for you to know which test case created the screen capture.

Once you have a directory of screen captures, you can quickly scan them and see that everything looks okay.

How does this save time? Most of the time, GUI testing is 80% getting to the place where you want to confirm the display is good and 20% actually confirming the display. If you are just reviewing the screen captures, getting to the point where the screen capture is taken does not involve a tester, and you save, on average, 80% of the time.

A classic example where this works even better is confirming different configurations with the same test cases. For example, I tested a Java library for creating graphics. I would run the tests on 5 different operating systems with 3 different versions of Java. So the test runner would essentially run:

for os in (5 different operating systems):
    for jvm in (3 different versions of Java):
        run all tests

This would result in 15 different test runs. I would then display all 15 screen captures for test case #1. If they all looked the same it would be a pass. If any one of the screen captures looked wrong I'd know which operating system, which JVM and which test case. So I could file a bug against that configuration.
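To make the review step concrete, here is a rough sketch that lists the capture files to put side by side for a single test case. The operating system names, Java version names and path layout are placeholders I made up for the example, not the configurations I actually tested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one capture path per (os, jvm) pair for one test case.
final class ConfigMatrix {

    static List<String> capturesFor(String testCase, List<String> oses, List<String> jvms) {
        List<String> paths = new ArrayList<>();
        for (String os : oses) {
            for (String jvm : jvms) {
                // Assumed layout: <os>/<jvm>/<testCase>.png
                paths.add(os + "/" + jvm + "/" + testCase + ".png");
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> oses = Arrays.asList("os1", "os2", "os3", "os4", "os5"); // placeholders
        List<String> jvms = Arrays.asList("java-a", "java-b", "java-c");      // placeholders
        // 5 x 3 = 15 captures to view together for this test case
        for (String path : capturesFor("testcase-001", oses, jvms)) {
            System.out.println(path);
        }
    }
}
```

Because each path still names the operating system and the JVM, a capture that looks wrong points straight at the configuration to file the bug against.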

You could do a similar thing for web testing. Run the same tests against the different operating systems and different web browsers. Then compare the screen captures to ensure all the configurations displayed correctly.

If you wanted to, you could video capture the display as the tests ran. With good video playback you could fast forward through the parts you know won't change.

Tuesday, January 31, 2012

CSS for WebDriver revisited

I always liked using XPath for locators in Selenium and WebDriver. It was more procedural than using Cascading Style Sheets (CSS) and felt more like programming. However, using CSS for locators is much faster than using XPath. I haven't done any tests to see just how much faster but it is significant enough that I can see the difference.

I run a test suite using XPath locators. I can see significant delays where there appears to be nothing happening on the screen. I refactor the code to use CSS and the delays disappear. I would estimate that 3 to 5 second delays (XPath) become too small to measure with the naked eye (CSS).

The only thing I have been unable to do in CSS is write locators which use the XPath text() function. CSS seems to be all about the tag and its attributes but not the text contained within the tag.

The one thing it took me a little while to get used to when switching from XPath to CSS was compound statements. For example, I might have an XPath like:


So how do I convert this to CSS?


Shorter and pretty much the same attributes. The ~= isn't quite the same as the contains() function. If I have:

    class='prefspanel foo bar'

the ~= will find prefspanel because it is a 'word' separated by whitespace. The contains() function will find 'prefs', 'prefspan', 'prefspanel' and even 'panel' because these are all substrings of the original. In CSS3 they introduced *= which is just like contains().

I also notice that many attributes will change dynamically. However, the start of the attribute or the end of the attribute will be consistent. In XPath I have to use the contains() function to find the attributes which end with the unique identifier. With CSS I can use ^= to see if the attribute begins with a substring or $= to see if the attribute ends with a substring.
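The difference between these operators is easy to get wrong, so here is a small model of them in plain Java. It is only a sketch of the matching rules applied to an attribute value; it is not how a browser or WebDriver implements selectors, and the class and method names are made up for the example.

```java
import java.util.Arrays;

// Hypothetical model of four CSS attribute operators applied to an attribute value.
final class AttributeMatch {

    // [attr~='word']: the value, split on whitespace, contains exactly this word.
    static boolean hasWord(String value, String word) {
        return !word.isEmpty() && Arrays.asList(value.trim().split("\\s+")).contains(word);
    }

    // [attr*='sub']: substring match, like XPath contains(@attr, 'sub') for a non-empty sub.
    static boolean containsSub(String value, String sub) {
        return !sub.isEmpty() && value.contains(sub);
    }

    // [attr^='prefix']: the value begins with the prefix.
    static boolean beginsWith(String value, String prefix) {
        return !prefix.isEmpty() && value.startsWith(prefix);
    }

    // [attr$='suffix']: the value ends with the suffix.
    static boolean endsWith(String value, String suffix) {
        return !suffix.isEmpty() && value.endsWith(suffix);
    }

    public static void main(String[] args) {
        String cls = "prefspanel foo bar";
        System.out.println(hasWord(cls, "prefspanel"));          // whole word
        System.out.println(hasWord(cls, "prefs"));               // only part of a word
        System.out.println(containsSub(cls, "panel"));           // substring
        System.out.println(beginsWith("user_12345", "user_"));   // stable start, dynamic end
        System.out.println(endsWith("12345_save", "_save"));     // dynamic start, stable end
    }
}
```

The ids in the last two calls are invented to stand in for dynamically generated attributes. In a locator the same checks would be written as attribute selectors such as div[class~='prefspanel'], input[id^='user_'] or input[id$='_save'].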

One other place that XPath seems to work better is when you are reversing up the DOM. For example, if you have the following:

    <div id='pair'>
            <input size='25' type='text' value=''/>

I would typically find the input using:


There is no easy way to do the same thing with CSS. With CSS I would find the input from the div tag. However, I usually find the p and input are paired together and the XPath always works but the input relative to the div changes occasionally.
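To illustrate what reversing up the DOM looks like, here is a sketch that uses the XPath engine shipped with the JDK rather than WebDriver. The markup (a p holding the text "Phone" next to the input) and the XPath expression are my assumptions for the example; they are not the exact page or locator discussed above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: find an input by going from its paired p up to the parent.
final class ParentAxisDemo {

    // Assumed markup: a label paragraph paired with an input inside a div.
    static final String MARKUP =
            "<div id='pair'>"
          + "<p>Phone</p>"
          + "<input size='25' type='text' value=''/>"
          + "</div>";

    // Returns the input paired with the p whose text equals labelText, or null if none.
    // labelText is pasted into the expression unescaped, which is fine only for a demo.
    static Element inputForLabel(String labelText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MARKUP)));
            // Locate the p by its text, step up to the parent (..), then down to the input.
            String xpath = "//p[text()='" + labelText + "']/../input";
            return (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        Element input = inputForLabel("Phone");
        System.out.println(input.getTagName() + " size=" + input.getAttribute("size"));
    }
}
```

The expression never mentions the div, so it keeps working as long as the p and the input share a parent, whatever that parent happens to be.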

Tuesday, January 17, 2012

Keeping data and code in one shell script file

In the past I have written scripts which required a lot of data. I could have the script read a second file with all the data in it but occasionally I'd lose the data file and still have the script. Or I'd have dozens of scripts and data files but no idea which data went with which script.

Solution: put the script and the data into one file. The script could then read the data from itself. Here is an example script:


#!/bin/sh

for entry in $(sed -n -e '/^exit/,$ p' "$0" | grep -v exit | sed -e '/^$/d' | sed -e '/^#.*/d' | sed -e 's/ /_/g'); do
        entry=$(echo $entry | sed -e 's/_/ /g')
        firstname=$(echo $entry | awk -F, '{print $1}')
        lastname=$(echo $entry | awk -F, '{print $2}')
        number=$(echo $entry | awk -F, '{print $3}')
        echo "$firstname $lastname's phone number is $number"
done
# The shell stops here; everything after the exit line is data, not code.
exit

Darrell,Grainger,(416) 555-1212
John,Doe,(323) 555-1234
Jessica,Alba,(909) 555-9999

In this example, the data is a list of comma separated fields. Let's examine the list in the for statement. The $0 is the file currently executing, i.e. the file with this script and the data.

The sed command prints everything from the line which starts with exit to the end of file. The grep command gets rid of the line which starts with exit. The next sed command discards all blank lines. The third sed command discards all lines which start with #. This allows us to start a line with # if we want to add comments or comment out a line of data.

The final sed command on the for statement replaces all spaces with underscores. The reason for this is because if I have a line with a space, the for statement will process it as two separate records. I don't want that. I want to read one line as one record.

Inside the body of the for loop, the first line converts all the underscores back to spaces. If you want to have underscore in your data, this will not work. The solution is to pick a character which is not part of your data set. You can pick anything so long as the character you pick is the same in the for statement and the first line of the for loop body. The g in the sed statement is important in case there is more than one space.

The next three lines show how to break the line apart by commas. If you need to use commas in your data then pick a different character to separate the fields. The -F switch in the awk statement sets the field separator. So if you use exclamation mark as a field separator, you need to change the awk statement to -F!.

The echo statement is just an example of using the data.

Getting string from the clipboard in Java

I recently needed Java code to get the contents of the clipboard. For your reading pleasure:

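import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.IOException;

public static String getStringFromClipboard() {
    String s = null;
    Clipboard c = Toolkit.getDefaultToolkit().getSystemClipboard();
    Transferable contents = c.getContents(null);
    if (contents != null && contents.isDataFlavorSupported(DataFlavor.stringFlavor)) {
        try {
            Object o = contents.getTransferData(DataFlavor.stringFlavor);
            if (o instanceof String)
                s = (String) o;
        } catch (UnsupportedFlavorException ufe) {
            // the contents are no longer available as a string; return null
        } catch (IOException ioe) {
            // the clipboard data could not be read; return null
        }
    }
    return s;
}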

This method assumes you are getting a string from the clipboard.

Tuesday, January 10, 2012


If you have Ajax calls or JavaScript which updates the page after the page has finished loading, you may have to wait for that code to finish before you can interact with the page. Essentially, waiting for page load is not sufficient.

In principle, the best way to deal with this is to watch for something in the DOM which signals the script has finished executing. However, this is not always the easiest solution. If you are not the developer or are not familiar with JavaScript, it could take you days to figure out what event or change you need to wait for.

A more practical solution is to wait for whatever you want to interact with to be ready. If you are waiting for a SELECT tag to be loaded with OPTION tags via Ajax there is no need to wait for the entire page to finish loading. You can just wait for the OPTION you want to appear on the SELECT.

The best way to do this is:

  1. wait for a short period of time (250ms)
  2. check for element
  3. if element does not exist go to 1
There is one problem with this algorithm. If the element never appears, this becomes an infinite loop. So we add a time out:
  1. maxTime = 5 seconds
  2. timeSlice = 250 ms
  3. elapsedTime = 0
  4. wait for timeSlice
  5. elapsedTime += timeSlice
  6. check for element
  7. if element does not exist AND elapsedTime < maxTime go to 4
Finally, this method must return true or false, where false means we did not find the element:
  1. maxTime = 5 seconds
  2. timeSlice = 250 ms
  3. elapsedTime = 0
  4. wait for timeSlice
  5. elapsedTime += timeSlice
  6. check for element
  7. if element does not exist AND elapsedTime < maxTime go to 4
  8. if element exists return true // found it
  9. else return false // timed out
In Selenium and Java this might look like:

public WebElement waitForElement(By by) {
    WebElement result = null;
    long maxTime = 5 * 1000; // time in milliseconds
    long timeSlice = 250;
    long elapsedTime = 0;

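    do {
        try {
            Thread.sleep(timeSlice); // wait before each check
            elapsedTime += timeSlice;
            result = driver.findElement(by);
        } catch (Exception e) {
            // element not present yet; keep polling until the time out
        }
    } while (result == null && elapsedTime < maxTime);

    return result; // null means the wait timed out
}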
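The numbered steps can also be written without Selenium as a method that returns true or false. The sketch below is one way to do it under my own assumptions: the class name, the method name and the use of a BooleanSupplier for the "check for element" step are made up for the example.

```java
import java.util.function.BooleanSupplier;

// Hypothetical generic version of the polling loop described in the steps.
final class Poller {

    // Checks the condition every timeSliceMs until it is true or maxTimeMs has elapsed.
    // Returns true if the condition became true, false if the wait timed out.
    static boolean waitFor(BooleanSupplier condition, long maxTimeMs, long timeSliceMs) {
        long elapsedTime = 0;
        while (elapsedTime < maxTimeMs) {
            try {
                Thread.sleep(timeSliceMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
            elapsedTime += timeSliceMs;
            if (condition.getAsBoolean()) {
                return true; // found it
            }
        }
        return false; // timed out
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in for "the element exists": becomes true after roughly one second.
        boolean found = waitFor(() -> System.currentTimeMillis() - start >= 1000, 5000, 250);
        System.out.println(found ? "found it" : "timed out");
    }
}
```

Returning as soon as the check succeeds means the result does not depend on comparing the elapsed time to the maximum afterwards, so an element that appears on the very last check still counts as found.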