


Wednesday, December 28, 2011

The 9 Oddest Job Interview Questions Asked at Tech Companies in 2011

I recently read an article about the 9 oddest job interview questions asked at tech companies in 2011. Here they are:

  1. How many people are using Facebook in San Francisco at 2:30 p.m. on a Friday?
  2. If Germans were the tallest people in the world, how would you prove it?
  3. Given 20 ‘destructible’ light bulbs (which break at a certain height), and a building with 100 floors, how do you determine the height that the light bulbs break?
  4. How would you cure world hunger?
  5. You’re in a row boat, which is in a large tank filled with water. You have an anchor on board, which you throw overboard (the chain is long enough so the anchor rests completely on the bottom of the tank). Does the water level in the tank rise or fall?
  6. Please spell ‘diverticulitis’.
  7. You have a bouquet of flowers. All but two are roses, all but two are daisies, and all but two are tulips. How many flowers do you have?
  8. How do you feel about those jokers at Congress?
  9. If you were a Microsoft Office program, which one would you be?

I wondered how would I have answered these. 

The first one was immediately obvious to me. The answer is "all of them." Okay, a little facetious, but that is how I'd answer it at first; sort of an ice-breaker. If they had a problem with that answer and moved on, I'd have a bit of a problem working there. If it got a chuckle, I'd see if they were looking for something else. I think I'd ask for further clarification: What is the motivation for this question? What frame of mind should someone be in to properly answer it? Is there a specific answer you are looking for?

Number 2 seems a little vague to me. As a software tester, ambiguity doesn't work for me. What does "Germans were the tallest people in the world" mean? Is the combined height of all Germans greater than the combined height of the people of every other nation? By Germans do you mean people born in Germany? People who hold German citizenship? What about immigrants to Germany? What about people who hold dual citizenship? What about former Germans who emigrated to another country and no longer consider themselves German? In short, the question needs further clarification before it can be answered.

Number 3 is also a little vague. How accurate does the answer for each bulb have to be? Can we open a window on each floor of the building? Do we know the exact height of each window? Are we to assume all the bulbs break at the same height and there is one answer for all 20 bulbs? If so, and we really want to know the floor from which the bulbs break, drop one from the first floor. If it does not break, go to the second floor and repeat. If no bulb has broken by the time you reach the 20th floor, go down, collect all 20 unbroken bulbs, and start again from the 21st floor. Once you find the floor the bulb breaks on, you will have broken only 1 bulb. If I make it to the 100th floor and no bulb has broken, I'd have to devise some way to go above 100 floors and continue the test.
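As a sanity check, that one-break strategy can be sketched as a small simulation. The breaking floor and building height are hypothetical inputs; in the real puzzle you would be walking the stairs, not calling a function:

```java
// Simulates the strategy above: drop a bulb from each floor in turn,
// collecting unbroken bulbs as you go, and stop at the first break.
// Returns how many bulbs get broken finding the answer: at most 1.
class BulbDrop {
    // trueBreakFloor is the (unknown to the tester) lowest breaking floor
    static int bulbsBroken(int trueBreakFloor, int floors) {
        for (int floor = 1; floor <= floors; floor++) {
            if (floor >= trueBreakFloor) {
                return 1;   // the first break reveals the breaking floor
            }
            // bulb survived this floor; collect it and keep climbing
        }
        return 0;           // never broke within the building
    }
}
```

However high the breaking floor is, only one bulb is ever sacrificed.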

On the other hand, if the bulbs might break at different heights, I'd have to drop all of them from the first floor, then all surviving bulbs from the second floor, and so on.

For number 4, if I had an answer to this I wouldn't be sitting in a job interview at a tech company; I would be implementing my answer.

Number 5 is a question about displacement; a real physics question. While the anchor is in the boat, the boat floats, so it displaces water whose weight equals the combined weight of the boat, the anchor and me. If I throw the anchor overboard, the boat rises in the water (it weighs less, so it needs to displace less water to stay afloat), and the level drops by the volume of water that weighs as much as the anchor. The anchor resting on the bottom displaces water too, but only its own volume. Since an anchor is denser than water, its volume is less than the volume of water that weighs as much as it does. The displacement lost by the boat is therefore greater than the displacement gained by the sunken anchor, so the water level in the tank falls.
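The numbers make the displacement argument concrete. This sketch uses illustrative values (a 20 kg iron anchor; densities approximate): aboard, the anchor is responsible for displacing its weight in water; on the bottom, only its volume.

```java
// Water displaced because of the anchor, before and after it goes
// overboard. Iron is roughly 7.9x denser than water, so the volume
// displaced on the bottom is much smaller and the level falls.
class AnchorDisplacement {
    static final double WATER_DENSITY = 1000.0; // kg per cubic metre
    static final double IRON_DENSITY = 7870.0;  // approximate

    // aboard: the boat must displace water weighing as much as the anchor
    static double aboardM3(double massKg) {
        return massKg / WATER_DENSITY;
    }

    // on the bottom: the anchor displaces only its own volume
    static double onBottomM3(double massKg) {
        return massKg / IRON_DENSITY;
    }
}
```

For 20 kg the displacement shrinks from 0.02 cubic metres to roughly 0.0025, so the level falls.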

Kind of hard to mess up number 6. This is a question which only works verbally; since I can see the spelling written out, it is pointless here. I'm not sure what they are trying to test. Do they like people who are good at spelling bees?

Number 7 would have to be 3 flowers. If I have 1 rose then all but two are roses (3 - 2 = 1 rose), 1 daisy then all but two are daisies (3 - 2 = 1 daisy) and 1 tulip then all but two are tulips (3 - 2 = 1 tulip).

Number 8 seems to assume I care about American politics and have an opinion on Congress. It is a leading question. Personally, I'd answer it with, "I don't believe in mixing politics, religion and work. Since this is a job interview, no politics or religion please."

And for number 9, I'd have to take a moment and think about it. I'd need to know myself, then relay how a Microsoft Office program could be analogous to the traits I like about myself. Outlook is good for communication; Word is probably the most popular program and good at the widest range of jobs; Excel is great for finance, budgeting and invoicing; PowerPoint conveys ideas and is used in presentations which inform and teach. Do we include Messenger? I think the key to this one is knowing yourself. If you can describe a Microsoft Office application as exhibiting the same traits, you're probably giving a good answer. On the other hand, you might want to say you could never limit yourself to one Office program; like the full Office suite, you do it all.

Personally, I try to avoid interview questions like these. Often the interviewer just thinks the answer is creative or smart, and that if you can get it you must be creative or smart. That does not take into account cultural differences, training background, whether you have already heard the question, etc. In other cases the interviewer thinks that by making the question cryptic, it will be harder for the interviewee to know what the interviewer is looking for, so they'll get an honest answer. Realistically, that doesn't work. Look at research surveys: they ask 100 questions, of which perhaps 20 are related. The candidate might think they know what I'm looking for on 2 or 3 of those questions, but the majority of the questions will give me honest answers.

In the end, these are all games and, statistically, employers should find a good candidate. They might not find the best candidate, but they'll never know, because the person they hire will be okay and possibly even great.


Selecting WebDriver locators

When you are using WebDriver for test automation a test case really boils down to:
  1. Find an element
  2. Perform an action
  3. Confirm the expected result
Finding an element and confirming the expected result both require locators. In WebDriver a locator is a way of uniquely identifying an element on the web page, i.e. in the Document Object Model (DOM). The By class is used in WebDriver to locate elements. You have:
  • By.className(className);
  •;
  • By.linkText(linkText);
  •;
  • By.partialLinkText(linkText);
  • By.tagName(name);
  • By.cssSelector(selector);
  • By.xpath(xpathExpression);
The most powerful locators are CSS and XPath. All the other selectors can actually be done using CSS or XPath. For example:

By.className("foo");         By.cssSelector(".foo");          By.xpath("//*[@class='foo']");"bar");                By.cssSelector("#bar");          By.xpath("//*[@id='bar']");
By.linkText("Click Me");     N/A                              By.xpath("//a[text()='Click Me']");"fee");              By.cssSelector("[name='fee']");  By.xpath("//*[@name='fee']");
By.partialLinkText("some");  N/A                              By.xpath("//a[contains(text(),'some')]");
By.tagName("div");           By.cssSelector("div");           By.xpath("//div");

In addition to the simple locators, CSS and XPath can select more complex elements. Rather than saying I want all the DIV tags, I can say I want all the DIV tags whose parent is a SPAN. In CSS this would be "span>div" and in XPath this would be "//span/div".

The combinations are endless. For each tag in the XPath or CSS I can add multiple identifiers, so I could have locators for things as complex as "all DIV tags, with name containing 'foo' and class equal to 'bar', whose parent is a TD, but only in the TABLE with id='summary' and class equal to 'humho'".
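As a sketch, locators for that last description might look like the strings below. They are illustrative, not tested against a real page, and the two syntaxes are not perfectly equivalent (CSS class matching is token based, while XPath's @class='bar' is an exact string match):

```java
// Hypothetical locators for: DIVs with name containing 'foo' and
// class 'bar', whose parent is a TD, inside the TABLE with
// id='summary' and class 'humho'.
class ComplexLocator {
    static final String CSS =
        "table#summary.humho td > div.bar[name*='foo']";
    static final String XPATH =
        "//table[@id='summary' and @class='humho']" +
        "//td/div[contains(@name,'foo') and @class='bar']";
}
```

These strings would then be handed to By.cssSelector(ComplexLocator.CSS) or By.xpath(ComplexLocator.XPATH).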

The first thing to understand is that CSS will be noticeably faster than XPath when testing against Internet Explorer. Your tests could run as much as 10 times slower (something which runs in a day on Firefox could take a week on Internet Explorer) when using XPath.

So the rule of thumb is that CSS is better than XPath. However, some things are easier to express in XPath, so occasionally you will need to use XPath anyway.

If you have a selector like "html>body>table>tbody>tr[2]>td[3]>a" it might work but if the developer finds it does not format nicely on Chrome, they need to throw in a DIV. So the selector changes to "html>body>div>table>tbody>tr[2]>td[3]>a". Later a new version of Internet Explorer comes out and the developer finds they need to add a SPAN to make it look proper on the new Internet Explorer and still look okay on older versions. So the locator becomes "html>body>div>table>tbody>tr[2]>td[3]>span>a".

If we spend all our time maintaining the locators, it could end up that the cost of maintaining the automation is greater than running the tests manually. In which case the automation is deemed a failure.

So you have to start looking for patterns. Is there something I could use on the first version of the application which also works on the second and third version? Can I predict a locator which will work on the fourth and subsequent versions?

Often the underlying technology changes but the page continues to look the same to the user. So is there something visual I can use which will not change? In this example, the text of the anchor probably never changed, so I'd use the By.linkText("whatever"); locator or By.xpath("//a[text()='whatever']");.

What if I find myself changing locators because sometimes the text is "  whatever", sometimes it is "whatever" and other times it is "whatever  "? Then I'm going to use By.partialLinkText("whatever"); or By.xpath("//a[contains(text(), 'whatever')]");.

The danger is that there might be two links which contain the substring "whatever". I need to make sure I am selecting the correct link. So the locator might need to be more complex. It might need to be partial text and parent information. For example, if the text appears in two different tables and I want the text from table 2. Table 2 has the id='foo2' then the locator might be:

  • "table#foo2 a"
  • "//table[@id='foo2']/tbody/tr/td/a[contains(text(),'whatever')]"
The first locator assumes there is only 1 anchor in the second table. This might not be true in all cases. The second locator finds the second table, but it searches all rows (TR) and all columns (TD) for an anchor (A) whose text contains the substring "whatever". This can be extremely slow, especially for large tables.

Finding the balance between locators which are too long and too short can be an art. The trick is to pick something. If it requires maintenance, pick a new locator which works on both the previous versions and the new version. As you continue to maintain the locators you will see a pattern. You will start to see that certain chunks of HTML never change while the HTML outside those chunks does (so keep the locator short enough to stay inside the chunk that does not change). Within the chunk there might be multiple matches if you make the locator too short. So figure out, within that chunk, what makes the element you want different from all the other matches.

So how do I look at the DOM? I need to see what the DOM looks like to be able to see all the possible locators which would work.

If you are using Internet Explorer 8 or higher you can press F12 to open the developer tools. If you are using Firefox you need to install Firebug then F12 will open Firebug. If you are using Chrome then CTRL-SHIFT-I will open the developer tools.

Beyond that, the only tool I use is my brain and the W3 standards.

Reading the W3 standards (or any standards documentation, ISO, ANSI, IEEE, etc.) can be difficult at first. Especially if you have been learning from books like "Web Design in 21 Days" or "Software Testing for Dummies." However, the more you read and understand standards documentation, the easier it gets to read other standards documents. If generating XPath was easy enough for a piece of software then why would they pay you to do the work? There are probably a dozen XPath locators for any given element on a page. Some will work once and need to be revised on the next release of the application. Some will work within the design pattern of the application and might never need updating. There is no way for a piece of software to spot the design pattern and know which locator will work best. This is what they pay you to do.

Excessively long XPath is brittle and will need a great deal of revising from release to release. Extremely short XPath will sometimes find the wrong element between releases. This leads to a test which fails unpredictably and can be difficult to debug; not something you want in an automation suite. Finding the right balance is your job.

The first time you select a locator it might need revising for the next release. Look at why you selected the first locator when selecting the revised locator. The second locator should work for the first release and the second release. When that locator fails, select a new locator which would have worked on the first release and all subsequent releases, including the next one. After a while you should start to see the pattern. The pattern is usually derived from some design pattern the application is being developed with. Learn about design patterns; it will be extremely helpful in generating good test automation.

If the developers change the tools, libraries, design patterns, etc., you should expect the locators to fail. At that point, selecting a locator which works with the next release but not with the previous releases makes sense. A major change in development usually implies a major change in test automation. It would be difficult for a tool to realize when it needs to abandon old locators.

Essentially, automation is all about finding elements (locators), performing actions on them, confirming the expected results (usually involves more locators). Two thirds of the work is about the locators. Learning XPath, CSS and DOM will make your job that much easier.

When possible, use CSS selectors as they are faster. However, some things are easier to locate using XPath and its functions, and it is better to have a test run slowly and be easy to maintain. So if the CSS selector would be complex and unintuitive, use XPath functions instead.

This is essentially how I decide on locators.


Friday, December 23, 2011

Your automation must not dictate your test plan

One of the things I see people new to automation doing is selecting what to automate, or how to test an application, based on what the automation tool will let them do. This is a dangerous approach to automation.

The test cases I create for an application are based on what is important to the customer. I want to make sure that the customer experience is a good experience. If I create a set of test cases or user stories which reflect real customer usage of the application then I am most likely to find issues which will affect the customer.

I remember working on a project for 4 years. After 4 years of testing and improving the application, over 90% of the features important to the customer were tested and bug free. Of the remaining 10%, we knew of most of the defects and had work-arounds. We were at a point where we were testing extreme edge cases. At this point I found a major defect. The developer looked at the fix and realized the defect had been there since the beginning. In 4 years not one single customer reported this defect. The defect was easy to automate but added essentially zero value to the application. This is NOT a test case you want to start with when automating.

On another project someone found a defect in a desktop application. The steps to reproduce were:

  1. Run an application not involved in the test case at all (Outlook for example)
  2. Hand edit a project file using notepad or something other than the application it was intended for
  3. Make a very specific change to the file
  4. Run the application under test
  5. Open the corrupted file
  6. Select a feature which relies on the corrupted file
  7. A modal dialog appears telling you the file is corrupt and asking if you wish to repair it
  8. Ignore the dialog
  9. Use CTRL-TAB to switch to a different application not involved in the test case at all
  10. Click on the application under test in a very specific location on the MDI client window

At this point the modal dialog is hidden behind the window with focus and the window with focus appears to be frozen. It is really waiting for you to respond to the modal dialog. This was a design flaw in the operating system. It was virtually impossible to fix in the application under test without a major re-design. It was highly unlikely a customer would be able to reproduce this defect. When the tester showed me the 'locked' state it only took me a few minutes to figure out what was going on. Our customer was typically a software developer with 7+ years of experience.

This was a useless test case. Both this and the previous test case were bad test cases, regardless of whether they were executed manually or automated. My point is, the test case came first. Even before we attempted to automate it, we decided whether or not it was a good test case.

Test automation NEVER precedes test case creation or test planning.

Once you know what you want to test and the steps to testing it, you automate those steps.

This is the second mistake I see people new to automation making. They decide WHAT they want to test, but when they start automating it, the steps they generate with the automation are not the same steps they would do manually. In that case you have taken the time to create a good set of test cases and thrown them out the door when you started automating. This is not a good idea.

Rather than changing the test case to something which is easy to automate, you need to figure out how to automate the test steps. This is what separates good automation from bad automation.

Many times I have seen a test case automated. It gets run and passes. We ship the application to the customer. He uses the same feature and it fails horribly. Why? Because the steps he used to get to the defect were not the steps we automated. We had a good test case; if an experienced tester had executed it manually, they would have found the defect. The person automating the test case just found it easier to automate something close to, but not equivalent to, the test case.

I am currently using Selenium 2.x with WebDriver. One of the changes from Selenium 1.x to 2.x is that you cannot interact with invisible elements. For example, a common trick on a website is to have an Accept checkbox on a download page. If you accept the terms the Download button becomes visible. In Selenium 1.x I could click on the Download button without clicking the Accept checkbox. The REAL test case was:

  1. Go to download page
  2. Click Accept checkbox
  3. Click Download button
  4. Confirm download

What someone would automate with Selenium 1.x was:

  1. Go to download page
  2. Click Download button
  3. Confirm download

The idea is that it saves a step. One less step means quicker to code, runs quicker, one less thing to maintain. You do this a thousand times and it adds up. HOWEVER, the customer would never click on the invisible Download button.

In Selenium 2.x the shortened test case fails. People occasionally complain that Selenium 2.x has removed an important feature; they want to know how they can click on the invisible Download button. They come up with tricky JavaScript snippets which allow Selenium 2.x to 'see' and click the Download button. Is a customer going to create a JavaScript snippet, inject it into the page and run it just so they can click the Download button? Is a manual tester going to do this? If the answer is no, then why is our automation doing this? If the manual test case calls for clicking the Accept checkbox, then our automation should click it as well. If clicking the Accept checkbox does not enable the Download button, file a bug and move on to something else.
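The behaviour can be modelled with a toy page object (all names hypothetical): the Download button is unusable until Accept is clicked, just as WebDriver refuses to interact with an invisible element.

```java
// Toy model of the download page. Clicking the invisible Download
// button throws, mirroring Selenium 2.x; the only path to a download
// is the same path the customer takes.
class DownloadPage {
    private boolean accepted = false;
    private boolean downloaded = false;

    void clickAccept() {
        accepted = true;            // makes the Download button visible
    }

    void clickDownload() {
        if (!accepted) {
            // Selenium 2.x raises an error for invisible elements
            throw new IllegalStateException("element not visible");
        }
        downloaded = true;
    }

    boolean isDownloaded() {
        return downloaded;
    }
}
```

The shortened Selenium 1.x script corresponds to calling clickDownload() first, which now fails; the real test case calls clickAccept() and then clickDownload().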

Finally, automation is all about finding elements on a page, interacting with them (clicking, right clicking, typing, etc.) and checking what happened. As a manual tester you are going to use your eyes and hands to do everything. The test case might have a step like, "Locate the folder called 'My Documents' and double click it." This is really two steps. The automation should locate the folder called 'My Documents', this is step 1. It should double click the element, this is step 2. As a manual tester I find the element by looking for the text 'My Documents'. If this is a web page and the HTML is:

<div id='lsk499s'><a href="...">My Documents</a></div>

I am not going to use the div id to find the element. I'm going to use the text. As a manual tester I used the text to find the element. There is no reason to do anything differently with the automation tool.

What if the web page is made to look like Windows Explorer? On the left is a tree view with one of the nodes being 'My Documents' and on the right is a thumbnail view with a 'My Documents' folder. Does the manual test case specify which 'My Documents' to double click? If yes, follow the test case. If no, how would you as a tester decide? Do you always go with the thumbnail view? Do you pick randomly? Do you change every test run? If you are a good manual tester, we want that experience captured by the automation. If I would normally change every test run, but I never test the same feature twice in one day, it might be sufficient to say: if the day of the year is even, double click the thumbnail, else double click the tree view. If the automation is run daily, it will pick a different way each day.
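That even/odd rule is trivial to code; a sketch (the view names are hypothetical):

```java
import java.time.LocalDate;

// Alternate between the two 'My Documents' elements based on the
// parity of the day of the year, so a daily run varies its path.
class ViewPicker {
    static String pick(LocalDate runDate) {
        return runDate.getDayOfYear() % 2 == 0 ? "thumbnail" : "tree";
    }
}
```

The test would call ViewPicker.pick( and double click whichever element was chosen.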

The important thing in all this is that I am a good tester. I write good test plans and test cases. When I decide to automate my good test cases, I should not compromise the quality of my testing just because I am testing with an automation tool rather than manually.

Happy testing!


Wednesday, December 14, 2011

Using the right tool for the job

From time to time I see people asking how to use an automation tool to do something the tool was never meant to do. For example: how do I use Selenium to get a web page without loading the JavaScript or CSS?

Selenium is designed to simulate a user browsing a website. When I open a web page with a browser, the website sends me JavaScript and CSS files, and the browser just naturally processes those. If I don't want that, I shouldn't use a browser. And if I am not using a browser, why would I use Selenium to send the HTTP request?

That is all the get() method in Selenium does. It opens a connection to the website and sends an HTTP request using the web browser. The website sends back an HTTP response and the browser processes it.

If all I want to do is send the request and get the response back unprocessed, I don't need a web browser.

So how can I send an HTTP request and get the HTTP response back? There are a number of tools to do this.


The way Fiddler works is that you add a proxy to your web browser (actually, Fiddler does this automatically). Now when you use the web browser with Fiddler running, the browser sends the HTTP request to Fiddler; Fiddler records the request and passes it on to the intended website. The website sends the response back to Fiddler, and Fiddler passes it back to the web browser.

You can save a request/response pair and play it back. Before you play the request back you can edit it: you can edit the website address, you can edit the context root of the URL, and if there is POST data you can edit that as well.


Charles is much like Fiddler, but there are two main differences. The first is that Charles is not free. You can get an evaluation copy of Charles, but ultimately you need to pay for it. So why would you use Charles? With purchase comes support: if something is not working (SSL decrypting, for example) you can get help with it. The second difference is that Fiddler is only available on Windows, while Charles also works on Mac OS X and Linux.


Fiddler and Charles are GUI applications with menus and dialogs, intended for interaction with humans. If you are more of a script writer, or want something you can add to an automated test, you want something you can run from the command line. That would be curl. Because it is lightweight and command line driven, I can run curl commands over and over again. I can even use it for crude load testing.

The most common use I have for curl is checking the contents of a web page, or that a website is up and running. There are many command line options (-d to pass POST data, -k to ignore certificate errors, etc.) but the general use is:

curl -o output.txt

This will send the HTTP request for /some/context/root to the website and save the response to output.txt.

I could then use another command line tool to parse the output.txt file. Or I could use piping to pipe the output to another program.

Another nice command line tool is wget. The wget command, like curl, will let you send an HTTP request. The nice thing about wget is that you can use it to crawl an entire website. One of my favourite wget commands is:

wget -t 1 -nc -S --ignore-case -x -r -l 999 -k -p

The options break down as follows:
  • -t sets the number of tries. I always figure if they don't send it to me on the first try, they probably never will.
  • -nc is for 'no clobber'. If two files are sent with the same name, the first is written using the full name and the second with a .1 on the end. How could the same directory hold the same file name twice? UNIX versus Windows: on a UNIX system there might be index.html and INDEX.html. To UNIX these are different files, but when downloading to Windows I need to treat them as the same file.
  • -S prints the server response headers to stderr. They don't get saved to the files, but they let me see that things are still going and something is being sent back.
  • --ignore-case is because Windows ignores case, so we should as well.
  • -x forces the creation of directories, building a directory structure similar to the original website. This is important because two different directories on the server might contain the same file name and we want to preserve that.
  • -r is for recursive: keep going down into subdirectories.
  • -l sets the number of levels to recurse. If you don't specify it, the default is 5.
  • -k converts links in the downloaded pages. Relative links like src="../../index.html" will be fine, but hard coded absolute links need to be converted to file:// links rather than going back to the original website.
  • -p says to get entire pages. If the HTML page we retrieve needs other things, such as CSS files, JavaScript or images, the -p option retrieves them as well.

These are just some of the tools I use when Selenium is not the right tool for the job.

Wednesday, November 30, 2011

Interface versus implementation

I've seen a few people posting Java code for Selenium automation where they have things like:

FirefoxDriver driver = new FirefoxDriver();

If you look at the code for Selenium you will see they have an interface called WebDriver. FirefoxDriver, InternetExplorerDriver, HtmlUnitDriver, RemoteWebDriver all implement WebDriver.

If I'm writing a test suite, I ultimately want to run the suite on Firefox, Internet Explorer, Chrome, etc., but if I implement my framework using FirefoxDriver, I have to edit the code to make it work with InternetExplorerDriver. The proper solution is to write your code using:

WebDriver driver = new FirefoxDriver();

You might argue that I need to change my code still. Okay, so you would implement it using:

WebDriver driver;
    if (browserType.equals("firefox")) {
        driver = new FirefoxDriver();
    } else if (browserType.equals("ie")) {
        driver = new InternetExplorerDriver();
    } else {
        driver = new RemoteWebDriver();
    }

Now all subsequent code will use driver as if it was merely a WebDriver. However, if you look at RemoteWebDriver you will see that it also implements JavascriptExecutor. So if I wanted to use:

driver.executeScript(script, args);

I'll get a compile error, because the WebDriver interface does not declare executeScript. My driver REALLY is a RemoteWebDriver and does support executeScript. So how do I access that functionality? Quite simply:

((RemoteWebDriver)driver).executeScript(script, args);

Even better would be:

if (driver instanceof RemoteWebDriver) {
        ((RemoteWebDriver) driver).executeScript(script, args);
    } else if (driver instanceof FirefoxDriver) {
        ((FirefoxDriver) driver).executeScript(script, args);
    } else if (driver instanceof InternetExplorerDriver) {
        ((InternetExplorerDriver) driver).executeScript(script, args);
    }

Now if you run the test suite against a browser that does not support a particular feature, it will skip that part of the code automatically.
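Note that Selenium also lets you cast to the capability interface itself, ((JavascriptExecutor) driver).executeScript(script, args);, which covers any driver that implements it. The instanceof-and-cast pattern in miniature, using stand-in interfaces (all names hypothetical):

```java
// Miniature version of the pattern above: cast to a capability
// interface when the concrete driver supports it, skip otherwise.
interface Driver {
    String name();
}

interface ScriptRunner {
    String runScript(String script);
}

class RemoteDriver implements Driver, ScriptRunner {
    public String name() { return "remote"; }
    public String runScript(String script) { return "ran: " + script; }
}

class PlainDriver implements Driver {
    public String name() { return "plain"; }
}

class Capability {
    static String tryScript(Driver d, String script) {
        if (d instanceof ScriptRunner) {
            return ((ScriptRunner) d).runScript(script);
        }
        return "skipped"; // this driver cannot run scripts
    }
}
```

The calling code only ever sees the Driver interface; the capability check happens at the one place that needs it.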

And that is proper use of polymorphism.

Monday, November 28, 2011

Random crashes

The new application under test is a web based application. We have Windows and Mac OS X customers. All our Mac OS X customers use Safari 5.0 or 5.1. Most of our Windows customers use Internet Explorer; we use a meta tag to force Internet Explorer 8 compatibility for anyone on Internet Explorer 9. A handful of our Windows customers use Firefox (whatever is latest).

So I need to create an automated test suite for:
  • Safari 5.0 on Mac OS X 10.6 (Snow Leopard) 32-bit OS
  • Safari 5.1 on Mac OS X 10.7 (Lion) 64-bit OS
  • Internet Explorer 8 on Windows XP 32-bit OS
  • Internet Explorer 9 on Windows 7 64-bit OS
  • Firefox 8 on Windows 7 64-bit OS
With this combination I feel I have adequately covered the combinations our customers will care about. If time allowed and everything was written with one automation technology, there is no reason I couldn't run it on other combinations. For example Internet Explorer 9 on Windows XP 32-bit OS or Firefox 7 on Windows XP 64-bit OS.

The assumption is that the above 5 combinations will find most of the defects.

My problem is that Safari on Mac OS X is just as important as Internet Explorer on Windows.

After poking around the web I found Selenium 2.0 (WebDriver) supports all the platforms, sort of. There is no Safari driver for WebDriver, but I have a snippet for using Selenium 1.0 to create a Safari instance; from that I can create a SeleneseCommandExecutor, and with the executor I can create a RemoteWebDriver. A bit hacky, but it works.

So I started creating my test automation. I got up to 39 test cases when I decided to run them as a single suite. I ran them on Firefox 8 and everything worked perfectly. I ran them on Internet Explorer; around 25 test cases in, Internet Explorer popped up a dialog telling me something went wrong with Internet Explorer. I could debug it (because I have Visual Studio installed) or just close it. Either way, my automation is dead. If my automation dies after the 25th test case, building the suite up to 45,000 test cases would be totally useless. I wouldn't be completely happy with it, but if it at least recovered and continued on to the next test case it would be better. It does not.

So the hunt goes on. I have found a commercial product which claims to support all the above platforms and more. I'll split my time between evaluating the commercial product and debugging Selenium. Hopefully by the end of the week I'll have a decision one way or the other.

Stay tuned...

Monday, November 7, 2011

Using Oracle SQL Developer with SQL Server

If you are looking for a SQL Client for both Oracle DB and Microsoft SQL Server you can use the Oracle SQL Developer with the jTDS plugin.

Get Oracle SQL Developer from the Oracle web site. Just use your favourite search engine to search for 'Oracle SQL Developer'. These instructions have been verified to work with SQL Developer 3.0.

Once you have installed Oracle SQL Developer, run it and go to the Help menu. There you should find the Check For Updates... option. There will be a dialog with Third Party SQL Developer Extensions as an option. Make sure this is selected.

On the next page there will be an option to select the jTDS extension. Select this for SQL Server and Sybase support.

Once you accept the license agreement and restart SQL Developer, you will now have a tab for SQL Server when creating a new connection.

Monday, September 5, 2011

maintenance versus performance

I was recently sent an example xpath for Selenium automation. The xpath, for a DIV, was:

"//*[@class='val1 val2 val3 val4']"

My experience has been, especially with Internet Explorer, that xpath is very slow compared to, say, CSS. However, I find xpath easier to read (and thus easier to maintain, especially in a team environment), and certain things are simply not possible in CSS.

So there is always a balance between maintenance and performance. The golden rule of optimization is to never optimize right away. Write the code for maintainability, and if you need to improve performance, then and only then optimize it.

The first thing I notice about the xpath above is the = operator. If the developer adds another value to the class attribute, or the order of the values changes, the automation will require maintenance. My experience has been that the values of multi-value attributes do change over time. So from a maintenance perspective I would write this as:

"//*[contains(@class,'val1') and contains(@class,'val2') and contains(@class,'val3') and contains(@class,'val4')]"

This way, if the order of the values changes or the developer adds another value, the xpath will still find the element.

The other thing I'd change about the xpath is the use of //*. This will check every element for a match, which is rather inefficient. It does have the nice property that if a developer changes the element from a DIV to something else, the xpath requires no maintenance. However, and this is just from years of experience, I find it rare that a developer changes a DIV to something else. If they do, there is usually a huge amount of refactoring involved and //* will not save you from maintenance.

Since the likelihood of maintenance is low but it is such an obvious performance hit, I would actually use //div for the xpath.
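As an illustration, here is a small Python helper (the function name is mine, and the class values are just the ones from the example) that builds the order-independent form of the locator:

```python
# build the order-independent xpath described above
def class_xpath(tag, classes):
    conds = " and ".join("contains(@class,'%s')" % c for c in classes)
    return "//%s[%s]" % (tag, conds)

locator = class_xpath("div", ["val1", "val2", "val3", "val4"])
```

One caveat: contains(@class,'val1') also matches class values like 'val10'. If that matters, the usual xpath idiom is contains(concat(' ', normalize-space(@class), ' '), ' val1 ').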

Thursday, September 1, 2011

How to do focused testing

After years of software testing I have found one constant... there is never enough time to test everything.

So how do I pick what I'm going to test? Here are a few tricks I use to focus what I'm going to test.

First, test what changed. I like to get a snapshot of the last release (source code) and a snapshot of the current release candidate, then compare them. Often the source code is readable enough for any tester to understand what has changed. If you are not a programmer, are not familiar with the particular language, or the code requires in-depth knowledge only the programmers have, you might want to talk with some of the developers about what you are looking at.

For example, if I am testing a shopping cart for an online web store and I see changes to a class called TaxRules, then I'm going to look at the requirements for tax rules and see if there is anything new. Even if there aren't requirements for tax rules, I can talk to the person who wanted this feature (or improvement) and I can look at the code. Comments are written for humans and I'm human (no really, I am). Good code should also be self-documenting; in other words, the variable and method names should almost read like English.

The next trick is to get someone to add a code coverage tool to the web site (or whatever it is you are testing). As you test the application (automated or manually), the coverage tool records which parts of the code you executed. Look to see if the code which has changed was executed. Set the coverage tool so it records actions within the methods and not just method-level data. This way you can see which branch conditions were exercised. If there is a loop which never gets entered, or a portion of an if/else if/else which does not execute, you know you need another test case. You could even ask the developer to relate the statements missed to user actions. Essentially, get them to tell you what test case you missed. :)

Finally, look at the inputs to the methods. Can you see a way to affect the inputs? Does the method handle situations where the data is bad? Developers think in terms of "make the program do what it is supposed to do." They always make it do the right thing when given the right data. As a tester you need to think in terms of "make sure the program doesn't do what it is not supposed to do." In other words, if you input bad data, it should report a nice, helpful message to the user or just restrict the user's ability to input the bad data.

As you do code coverage, you can keep notes with your test cases: just a comment noting that test case XXX should be re-run when class YYY is altered.

Monday, July 25, 2011

Tools to determine how to use Selenium

Being able to view the source code of a page is helpful when trying to use Selenium. Many of the methods in Selenium require you to locate an element via something in the source code. For example, if I want to click on an element I need to look at the source code. From the user's point of view they might see:

Click Me

But the underlying source code could be:

                <tr><td><strong onclick="javascript:clickme();">Click Me</strong></td></tr>

I might assume there is an anchor and try to find the <a> element with the text 'Click Me'. By looking at the source code I can see that I actually want to find a <strong> element and click on it.

Just viewing the source code isn't always sufficient, however. With AJAX and Web 2.0, the source code gets changed in memory after it is loaded. So what is actually in the DOM (and what Selenium sees) might be different from what you see when viewing the source code.

So how do you see what Selenium sees? A simple solution is to write your Selenium code to get to a point in the program then print out the source code using Selenium. Selenium 1.0 and 2.0 both have methods to return the HTML source code as a string, which you can print.
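In the Selenium 2 Python bindings that string is the driver.page_source property (getPageSource() in the Java bindings, getHtmlSource() in Selenium 1.0). A minimal helper that dumps it to a file for inspection:

```python
# Dump the browser's current view of the DOM to a file.  The driver
# argument is any Selenium 2 WebDriver instance; only its page_source
# attribute is used here.
def dump_source(driver, path):
    with open(path, "w") as f:
        f.write(driver.page_source)
    return path
```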

The problem with this is that you have to write a program to print the source, then, with that source in hand, find the element you want to click. Then run the program so it clicks the element and prints the new source. Then run the program so it clicks the first element, clicks the second element and prints the source. For n steps this takes O(n^2) run time. Not very efficient.

So how can you view the dynamically changing source? The answer is in browser tools.

If you are using IE7, search for "microsoft dev toolbar". Installing it gives you development tools which can view the page in real time. If you are using IE8, the tool is built in; just press F12 to display the development tools.

If you are using Firefox, install the Firebug extension.

If you are using Google Chrome, right click an element and select Inspect to bring up the development tools.

Thursday, May 26, 2011

Defensive programming

I had an interesting conversation with a programmer today. He created a program for updating the pricing on books. It allowed you to update the price of one book at a time, or to load a list of volume ids (sort of like ISBNs) and batch update the prices.

Many of our partners use Excel to manage their list of books. So the programmer was told the batch screen had to load Excel files.

My first thought when testing the application was, "How can users mess up an Excel file?"

The application required a header row with specific text. What if:
  • there were extra columns
  • a column was missing
  • there was whitespace before or after the text
  • there was no header row
  • the text was all lowercase
  • the text was all uppercase
  • the text was camel cased (ThisIsCamelCase)
I tried all these and each one crashed the program. The programmer was using a third party library to access the Excel spreadsheet. I had used similar libraries in the past and knew these to be problems with those libraries. Essentially, you need to guard against bad data because these libraries don't.
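A sketch of the kind of guard the programmer could have added, in Python for illustration. The required column names here are hypothetical, and the normalisation handles the case and whitespace variations from the list above (a camel-cased heading would still be reported as missing, which is at least a clean error rather than a crash):

```python
# Guard the header row yourself, since the spreadsheet library won't.
# REQUIRED is a hypothetical list of expected column headings.
REQUIRED = ["volume id", "price"]

def check_header(cells):
    # normalise stray whitespace and case before comparing
    seen = [str(c).strip().lower() for c in cells]
    missing = [col for col in REQUIRED if col not in seen]
    extra = [c for c in seen if c not in REQUIRED]
    return missing, extra

missing, extra = check_header(["  Volume ID ", "PRICE", "Notes"])
```

Extra columns, re-cased text or a missing heading now produce a reportable (missing, extra) pair instead of a crash deep inside the library.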

When LAMP (Linux/Apache/MySQL/PHP|Perl|Python) programmers access the database, they never assume the data coming in is safe. They check all the data before using it to access the database. This is because attackers will do things like enter "Hacker; DROP TABLE ..." or some other destructive text into a field like First Name.

The lesson learned from this is to be a defensive programmer. LAMP programmers know this. I believe it applies not only to third party libraries but also to code from someone else on your own team: assume the library does not handle edge cases and ensure the data has been checked before it gets passed to the other programmer's code.

If you assume all code (third party libraries, your old code, code from other programmers on your team, etc.) does not handle bad input properly, and check the inputs before passing them along, your applications will be much more solid.

To the testers reading this: assume all programmers will assume someone else is taking care of guarding against bad inputs. So think of all the bad inputs you can and send them in. You'll usually find they make it quite far before anything handles them. Often it will be the operating system that handles them, with a cryptic error dialog and a crash.

Sunday, May 22, 2011

Asking for automation help

I've noticed a trend on help forums and groups. When people ask for help with their automation, they often post only what they tried and that it did not work.

They don't give any information about their environment, versions of the various software involved or the code they are trying to automate.

Asking for help with your automation is similar to filing a defect report.

  • Specify a summary of the problem
  • Give a description of the environment (OS, browser, etc.)
  • Tell them what you did, what you expected to happen and what actually happened.
There is one additional piece of information you normally put in a defect report. You normally specify which build the defect appeared in. This is because the developer fixing the bug can find the appropriate code given a build number.

When asking people outside your organization for help with an automation problem, remember that we don't have access to your source code, so a build number does not help us. What we need instead is a code snippet. If you are testing a web site, give us the relevant HTML you are trying to automate.

Without these bits of information it is nearly impossible to solve the problem. At best, you will get a lot of guessing. If you are lucky, someone will guess correctly or say something which makes you realize what the problem really is. If you want a quicker response to your questions, give us the information we need to reproduce the problem and solve it.

Wednesday, April 27, 2011

Test Web APIs

Testing web APIs essentially breaks down into:
  • Building an HTTP Request
  • Sending the HTTP Request
  • Get the HTTP Response
  • Verify the HTTP Response

If you look at RFC 2616 it will tell you everything you need to know about HTTP/1.1. However, if you are not used to reading RFCs, the document can be quite confusing.

Here are some highlights of an HTTP Request...

If you go to a website that has a FORM, submitting the FORM will send an HTTP Request. The FORM tag will have an action attribute. Inside the FORM will be INPUT, SELECT, BUTTON, etc. tags.

For example, the Google home page is a FORM. The action for the FORM is "/search". The INPUT has a name attribute of "q". If I enter "HTTP Request" into the INPUT field, my web browser builds an HTTP Request and sends it to Google. You can type the following URL into an address bar:

This is the same as entering 'HTTP Request' into the input field and clicking the Search button. The one oddity is the %20. A proper HTTP Request has no spaces in it, so you need to convert each space to a hexadecimal value; hexadecimal 20 is ASCII for a space. Symbols like / : & etc. have to be converted as well, unless the symbol is part of the Request structure rather than a value being passed. The general format is:


If you leave out the port it defaults to 80 (443 for https).
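If you'd rather not do the conversion by hand, Python's standard library percent-encodes and decodes for you; a small illustration:

```python
from urllib.parse import quote, unquote

encoded = quote("HTTP Request")       # spaces become %20
decoded = unquote("HTTP%20Request")   # and back again
```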

At this point you might be wondering what this has to do with API testing. A lot. Many APIs are implemented using HTTP Request/Response, so you need to send them an HTTP Request just like your web browser does. The browser also adds a lot you might not be aware of. The HTTP Request has the URL, a header and sometimes data. Your web browser will quietly add header information like:

Accept-Language: en-us,en;q=0.5

Finally, the FORM can use POST or GET. If it sends data in the URL, like the above examples, it is a GET request. If it sends the data in the request body, where the user doesn't see it, it is a POST request. For example, if I were logging into a website, I would use a POST so the password isn't visible in the address bar.

So now we have talked about all three parts: the URL, the HEADER and the DATA.

Let's say I design an API such that I expect some data to be in the URL, some in the header and some in the data. So let's say the URL will be:

In the header I want them to identify which machine the request is coming from, plus an API Key:

Company-API-Key: foobar7

And finally, the data will be an XML file with the following information:

    <?xml version="1.0" encoding="utf-8"?>

So how do I take this knowledge and use it? Let's say the XML data is saved in the text file foo.txt and I have curl installed on my machine (it comes by default on UNIX, Linux and Mac OS X machines; you can get and install a Windows version for free).

The curl command to use this would be:

curl -k -d @foo.txt -H "Request-Host:" -H "Company-API-Key: foobar7" -o response.txt

The -k tells curl not to worry about SSL certificate validation, -d sends the data, -H adds header information, and -o saves the response to a file.

The great thing about curl is you can try something, make a slight change, and try again. If you want to try a bunch of different data files, you can do this quickly and easily with a shell for loop. You can also play with the header information, or leave out or change some of the URL fields.

You can quickly probe and test a set of APIs using curl. There are more options to curl but this is the basic usage.
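For comparison, here is roughly the same request built with Python's standard library instead of curl. The URL is a placeholder (the post's real host is not shown) and the XML body is just the declaration from the example above:

```python
from urllib.request import Request

# contents of foo.txt from the post (just the declaration shown there)
xml_body = b'<?xml version="1.0" encoding="utf-8"?>'

# placeholder URL; the post's real endpoint is not shown
req = Request("https://example.com/api/prices",
              data=xml_body,
              headers={"Company-API-Key": "foobar7"})
# a Request carrying a data payload is sent as a POST;
# urlopen(req) would actually send it (omitted here: the host is fictional)
```

As with curl, wrapping the last few lines in a loop over a directory of data files lets you probe the API with many payloads quickly.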

Friday, April 1, 2011

Upgrading Selenium 2.0b2 to 2.0b3 - Python

Selenium 2.0b3 was recently released. My current company is using Selenium 2.0 for web testing and the preferred language for the team is python. If you are familiar with python you might know that upgrading a site-package (like Selenium) is as simple as:

pip install -U selenium

One of the staff did this and immediately the test suite broke. The line of code which broke was:

wd = webdriver.Remote(command_executor='http://' + host + ':' + str(port) + '/wd/hub', browser_name=browser, platform='ANY')

One of the nice things about using Eclipse (my IDE of choice) is you can hold down the CTRL key and click a method to jump to the source code. So I held down the CTRL key (actually it was the Command key because I'm on a Mac) and clicked on Remote. This took me to the class for Remote. The __init__ method is the constructor.

When I looked at the constructor for 2.0b3 it was immediately obvious that the constructor had changed. The old constructor (2.0b2) was:

def __init__(self, command_executor='http://localhost:4444/wd/hub', browser_name='', platform='', version='', javascript_enabled=True)

but the new constructor (2.0b3) is:

def __init__(self, command_executor='', desired_capabilities=None, browser_profile=None)

So the first parameter, command_executor, is the same, but all the other parameters have been replaced by desired_capabilities. So what is desired_capabilities? It is a python dictionary. The format for a dictionary is:

dc = { "browser_name" : browser, "platform" : 'ANY' }

This one little change almost fixed things. A little more digging into the source code and I found that the dictionary should use "browserName", and there doesn't seem to be any support for platform anymore. Since we only care about the browser, I changed the dictionary to:

dc = { "browserName" : browser }

and this solved everything.
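Putting the pieces together, the original call rewritten for the 2.0b3 constructor looks like this (host, port and browser are the same variables the broken line used; the actual webdriver.Remote call is commented out so the snippet stands alone):

```python
# same variables the original line of code used
host, port, browser = "localhost", 4444, "firefox"

dc = {"browserName": browser}
url = "http://" + host + ":" + str(port) + "/wd/hub"
# with selenium 2.0b3 installed, the updated constructor call is:
# wd = webdriver.Remote(command_executor=url, desired_capabilities=dc)
```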

Lesson Learned: you have to dig into the source code to figure out what is going on.

P.S. The comments in the source code for the start_session method, which uses desired_capabilities, still talk about the parameters browser_name and platform. It is clear the comments have not been updated to reflect the code.

Knowledge sharing

Joining a new company is always difficult at first. You spend a lot of time learning what you don't know about the company and the culture. Here are some ideas to make this normally unproductive time more productive.

Most companies have a wiki or central web site like SharePoint. Create an employee page with a picture of each employee, their name and contact information (office/location, email, phone, etc.).

Next create a Subject Matter Expert (SME) table. Put in the areas of expertise and who the main and secondary contacts are. You could even work this from the other direction: as a manager, think about the needs of your department and create a table with one row for each subject your department needs an expert in. In the second column put the name of the person who is the SME for that subject. In the third column put your second choice for SME. Any row without a second name means you need to create a backup for that area; otherwise, when your one and only SME goes on vacation you could be in trouble. Or worse, they leave the company. Any row with no SME at all means you need to either develop an SME or you have a case for hiring someone.

Whenever someone new joins the company, take their picture, add an entry to the employee page and link it to the SME table. If they are an SME for a subject where you have no one, make them the SME. If they are an SME for a subject where you already have one, make them the backup.

The next step in the process is for the SME to document what they know. Some people believe documenting what they know will allow the company to get rid of them. The truth is that an SME's practical experience can never really be captured in a document. Most often this knowledge comes into play when the SME has left the company, is away on vacation or is ill. If they don't document their knowledge, they become critical to the company: going on vacation will be discouraged and they will most likely be put under a great deal of stress. Sooner or later they will fall ill or quit.

The ultimate goal is to create a backup for the SME. If the SME is promoted or moved to other activities, the company will still need them to be available to help out the backup or replacement. If someone is replacing the SME, the SME will most likely act as the backup until a suitable backup can be found. By the time the replacement is fully up to speed and a suitable backup has been found, the SME will probably be an SME for something new, or they can at least make sure they are becoming the SME for something new and critical to the company.

Bottom line: if the company were trying to phase you out it would be obvious. So don't worry, and help your company share that knowledge.

Wednesday, March 23, 2011

a new type of source control - Git

At the new company they started with Subversion for source control but are now moving everything to Git.

I've worked with a few different source control servers and clients. The basic idea was to have a central repository which holds all the source code. You check in the source code to the central repository. Subsequent changes to the code are saved in the repository as changes or deltas.

If two or more people are working on the same project they have to set up a policy or convention. In some systems the team has to follow the policy. With other source control systems you can set up rules to enforce a policy.

For example, I check out a copy of the code and you check out a copy of the code. We both make changes to the same file. You check in your code and the repository is updated. When I go to check in my change it tells me there is a conflict. I have to merge my changes with your change. Only after that is done can I check in my changes.

In this model source is handled file by file. If our project has 37 files and only one has a conflict, the other 36 files will check in without issue.

With Git things are different. There is no central repository. I create a Git repository on my machine. I commit my changes to the local repository. When I want to share the changes with others I can push my repository to a central location or I can push it directly to someone on the team.

I pull a copy of the repository, you pull a copy of the repository, and we both make changes. You push your change, then I attempt to push mine. Git will tell me there have been changes and I cannot push my repository to the central location; I need to pull the changes down and merge them with mine. I can also use fetch rather than pull: pull does a fetch then a merge in one step, while fetch just downloads the changes and lets me merge them myself.

The basic commands are:

git init
git clone
git status
git commit
git push
git pull
git rm

To create a repository on your local machine, use git init. If the central repository is located at the URL git:git@git-server:project.git then I can make a local copy using:

git clone git:git@git-server:project.git

Once I have a local copy I can use git status to see what has changed. The output of git status lists the files which have changed and the untracked files. Issuing git commit -a will add the changes to my LOCAL repository. To share this with others I do a git push. If I'm in a directory containing git-controlled files, git looks at the .git/config file, determines where the "origin" repository is, and pushes the files there. You can also git push to another location (someone on your team, or a sub-team working outside the main branch).

If you want to remove files from the repository you can use git rm. If you just rm a file, a git checkout will bring it back. If you git rm then git commit, the file will be removed from your LOCAL repository. A git push will make this permanent for everyone else.

Tuesday, March 15, 2011

floating point numbers and representation error

Floating point numbers can be a great source of error. Pure mathematics includes the concept of infinity; for example, between the numbers 0 and 1 there are infinitely many fractions.

When you store a floating point number in memory it can only take on a limited set of values. This means it is impossible to represent all possible floating point numbers. Different languages have different ways to represent numbers, and the more accurate the floating point representation, the more computationally expensive it becomes to use.

This means that an operation like 1.47 - 1.00 might result in 0.46999999997, because the math library cannot represent the exact result and 0.46999999997 is the closest match.

If the program you are testing is financial and deals only with dollars and cents, this representation error can be problematic.

As a programmer there is a simple solution to this problem: use only integers (long, or long long if available) and treat them as cents. So the above example becomes:

long n1 = 147;           // $1.47
long n2 = n1 - 100;      // $1.47 - $1.00
long dollars = n2 / 100; // 0
long cents = n2 % 100;   // 47
System.out.printf("$%d.%02d", dollars, cents);

This means if you have a program which uses dollars and cents, you want to check for representation error (assuming the programmer used floating point variables) but you also want to check for issues which might be integer related. So you want to consider things like integer overflow and underflow. See the article Are most binary searches broken? for a discussion of overflow.
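The same idea in Python, as a quick runnable illustration. The 0.1 + 0.2 case is the classic demonstration of representation error; the exact error you see for any particular subtraction depends on the values involved. The integer-cents arithmetic, by contrast, is exact:

```python
# binary floating point cannot represent most decimal fractions exactly
float_ok = (0.1 + 0.2 == 0.3)  # False: the classic representation error

# the integer-cents version of the $1.47 - $1.00 example is exact
n2 = 147 - 100
dollars, cents = divmod(n2, 100)
formatted = "$%d.%02d" % (dollars, cents)
```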

Saturday, March 5, 2011

Performance Testing and analogies

I was talking with a tester at my new company about how analogies can often help you look at things differently. Today this got me thinking about performance testing and scuba diving. Are there concepts or practices from scuba diving that can be applied to performance testing?

One of the rules for technical divers is called the Rule of Thirds. The idea is that you should keep one third of your air as an emergency reserve. If I know how much air I need for a given dive, I multiply it by 1.5 to handle unforeseen emergencies. For example, if I calculate my air consumption to be 80 cubic feet, I should take an additional 40 cubic feet. That is 120 cubic feet of air, with one third (40 cubic feet) held in reserve. If nothing goes wrong, I end the dive with 40 cubic feet of air. If an emergency happens, I have the extra air.
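The dive arithmetic above, as a tiny sketch (the function name is mine, not a standard one):

```python
# planned consumption times 1.5 gives total capacity, with one third
# of that total held in reserve
def rule_of_thirds(planned):
    total = planned * 1.5
    reserve = total / 3
    return total, reserve

total, reserve = rule_of_thirds(80)  # the 80 cubic feet example
```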

So how does this relate to performance testing? Usually when you are doing performance testing, you have a set of criteria. For example, one criterion will be that there will be 450 concurrent users. What happens if some unforeseen situation occurs? You do all your testing with 450 concurrent users and it passes with a nice margin. You throw it into production and something no one anticipated occurs: your application gets a huge hit. Apply the Rule of Thirds and test with 675 concurrent users (450 times 1.5). If your tests pass with 675 concurrent users, you know the application will handle 450, and you are prepared if some unforeseen circumstance occurs.

If your application is, say, a web store and some celebrity mentions your product on the red carpet at the Oscars, do you want your web site to go down when 50% more users than you ever anticipated try to buy your product?

Sunday, February 27, 2011

waitForCondition revisited

In March 2010, I wrote a piece on using waitForCondition with Selenium 1.0, Selenium RC [Java] waitForCondition example.

At the bottom of the article is a line of code that will let you wait for an ajax call to complete. This line of code was written when I was working on a project using scriptaculous.js.

Recently I joined a company using jQuery.js. The code for waiting is the same but the variable you wait for has changed. If you use the code from March 2010, it would fail because the variable does not exist.

Instead you want to use the last line below:

WebDriver browser = new FirefoxDriver();
    Selenium selenium = new WebDriverBackedSelenium(browser, baseURL);

    selenium.waitForCondition("selenium.browserbot.getCurrentWindow().jQuery.isReady == true", "30000");

You might wonder how I figured this out. The easiest way is to ask the developer. If that is not an option, Firebug on Firefox or the Inspect feature of Chrome will help you out. On Chrome I would right click on the web page and select Inspect Element. The Developer Tools will appear. Select the Scripts section. On the right will be Scope Variables with a pause button at the top of the frame. Clicking the pause button will display the Scope Variables. You then want to look at the variables and see if something jumps out at you.

The other approach is to look at the select menu at the top of the left frame, which lists the different javascript files. When I looked at this list I saw the jquery.js script. I could then look up the APIs for jquery.js and figure things out.

Friday, February 25, 2011

The Platform class in Selenium 2

There is a class in Selenium 2 called Platform. It is very helpful in those situations where you need to do something different for different platforms. Here is some sample code:

 public void seleniumPlatformExamples() {
  System.err.println("Supported platforms:");
  for (Platform p : Platform.values()) {
   System.err.println("  " + p);
  }
  System.err.print("The current platform is " + Platform.getCurrent());
  System.err.printf(" v%d.%d%n",
    Platform.getCurrent().getMajorVersion(),
    Platform.getCurrent().getMinorVersion());
 }
Platform.getCurrent() returns the Platform value for the machine you are running on. You can then access all the other features through that instance.

There are other methods as well, such as is(), which checks whether the current platform matches a given enum value.

Thursday, February 24, 2011

Using the features of Selenium 1.0 with Selenium 2.0 code

I recently posted about doing a screen capture on exception. This got me thinking about how I could maximize the browser window before doing the screen capture. In Selenium 1.x I would use:

Selenium sel = new DefaultSelenium("localhost", 4444, "*firefox", "");
sel.windowMaximize();

But how do I do this using WebDriver? The solution is to use a WebDriver instance to create a WebDriverBackedSelenium instance. Here is the code:

WebDriver driver = new FirefoxDriver();
    Selenium sel = new WebDriverBackedSelenium(driver, "");
    sel.windowMaximize();

and it is that simple. If I want to use Selenium 2.0 features, I access things via the driver variable. If I want to use Selenium 1.0 features, I access things via the sel variable.

Quite simple.

Generating a screen capture on exception thrown with Selenium 2

Recently, someone asked how to have Selenium 2 (WebDriver) create a screen capture if an exception is thrown. Here is how you do it...

Let's say you have the code:

WebDriver driver;

    public void setUp() {
        driver = new FirefoxDriver();
    }

and you want to change it so the test cases will generate a screen capture file when an exception is thrown. Here is the first change:

WebDriver driver;

    public void setUp() {
        WebDriverEventListener eventListener = new MyEventListener();
        driver = new EventFiringWebDriver(new FirefoxDriver()).register(eventListener);
    }

The second change is to create the MyEventListener class. The MyEventListener class will be:

public class MyEventListener implements WebDriverEventListener {
    // All the methods of the WebDriverEventListener interface need to
    // be implemented here. You can leave most of them empty.
    // For example...
    public void afterChangeValueOf(WebElement arg0, WebDriver arg1) {
        // does nothing
    }

    // ...

    public void onException(Throwable arg0, WebDriver arg1) {
        String filename = generateRandomFilename(arg0);
        createScreenCaptureJPEG(filename);
    }
}
The MyEventListener class will have 15 methods, including the two examples I have given here. The main method that you must implement if you want screen captures whenever an exception is thrown would be the onException method.

The biggest trick for this method is generating a unique filename for each exception. The first thought is that the filename could be in the format "YYYY-MM-DD-HH-MM-SS.jpg". Unless you get two exceptions in the same second this will work okay. Unfortunately, it will be hard to figure out what the exception was unless you keep some sort of log of the code execution. You'll also have to waste time figuring out which exception goes with which date/time.

Personally, I'd use the format "YYYY-MM-DD-HH-MM-SS-message-from-the-throwable-argument.jpg". Selenium tends to throw multi-line exception messages, so you could take just the first line of the message and replace any characters which are illegal on your file system with underscores. You could also have a setting for the location of the screen capture files and prepend that to the filename.
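As an aside, java.text.SimpleDateFormat can build the whole timestamp in one call instead of concatenating Calendar fields one by one (which also sidesteps the zero-based Calendar.MONTH gotcha). Here is a minimal sketch; the sanitize helper is my own illustration of the first-line-plus-underscores idea, not part of any Selenium API:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FilenameSketch {
    // Hypothetical helper: keep the first line of an exception message
    // and strip characters that are awkward in filenames.
    static String sanitize(String message) {
        int nl = message.indexOf('\n');
        String firstLine = (nl == -1) ? message : message.substring(0, nl);
        return firstLine.replaceAll("\\s", "_").replaceAll(":", "");
    }

    public static void main(String[] args) {
        // One pattern string builds the whole YYYY-MM-DD-HH-MM-SS stamp
        String stamp = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss").format(new Date());
        String filename = stamp + "-" + sanitize("Element not found\nat ...") + ".jpg";
        System.out.println(filename.endsWith("-Element_not_found.jpg"));
    }
}
```

The output is true: the first line of the message survives with spaces turned into underscores, prefixed by the timestamp.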

Here is the code I came up with in 2 minutes:

private String generateRandomFilename(Throwable arg0) {
        Calendar c = Calendar.getInstance();
        String filename = arg0.getMessage();
        // Keep only the first line of the (usually multi-line) message
        int i = filename.indexOf('\n');
        if (i != -1) {
            filename = filename.substring(0, i);
        }
        filename = filename.replaceAll("\\s", "_").replaceAll(":", "") + ".jpg";
        filename = "" + c.get(Calendar.YEAR) +
            "-" + (c.get(Calendar.MONTH) + 1) +  // Calendar.MONTH is zero-based
            "-" + c.get(Calendar.DAY_OF_MONTH) +
            "-" + c.get(Calendar.HOUR_OF_DAY) +
            "-" + c.get(Calendar.MINUTE) +
            "-" + c.get(Calendar.SECOND) +
            "-" + filename;
        return filename;
    }

The final part is the code to actually generate the file. This is standard Robot stuff. Here is the code I whipped together a few projects back:

private void createScreenCaptureJPEG(String filename) {
    try {
        BufferedImage img = getScreenAsBufferedImage();
        File output = new File(filename);
        ImageIO.write(img, "jpg", output);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private BufferedImage getScreenAsBufferedImage() {
    BufferedImage img = null;
    try {
        Robot r = new Robot();
        Toolkit t = Toolkit.getDefaultToolkit();
        Rectangle rect = new Rectangle(t.getScreenSize());
        img = r.createScreenCapture(rect);
    } catch (AWTException e) {
        e.printStackTrace();
    }
    return img;
}

And that is it. Whenever an exception is thrown, a screen capture file will be generated.
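If you want to sanity-check the ImageIO half of this without launching a browser (or even having a display for Robot to capture), you can feed it a synthetic BufferedImage instead of a screen grab. A minimal sketch; the filename here is arbitrary:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ImageWriteDemo {
    public static void main(String[] args) throws IOException {
        // Build a small synthetic image in place of a Robot screen capture
        BufferedImage img = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        img.setRGB(10, 10, 0xFF0000); // one red pixel so the file isn't all black
        File output = new File("demo-capture.jpg");
        ImageIO.write(img, "jpg", output);
        // A non-empty file on disk means the JPEG pipeline works
        System.out.println(output.exists() && output.length() > 0);
    }
}
```

This prints true once the JPEG has been written, which confirms the write path independently of the screen-capture path.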

Monday, February 21, 2011

Using Selenium 2.0 with WebDriver and Safari

I've been looking at Selenium 2.0 and writing test cases using WebDriver. Looking at the APIs I see there are classes for Android, iPhone, Internet Explorer, Firefox and Chrome which extend RemoteWebDriver.

So how do I use Safari? The code to set the search string for Google, using WebDriver, would be:

WebDriver browser = new FirefoxDriver();
WebElement input = browser.findElement(By.name("q"));
input.sendKeys("Selenium");

If I wanted to do the same thing using Safari web browser, I could use:

Selenium browser = new DefaultSelenium("localhost", 4444, "*safari", "");

browser.type("name=q", "Selenium");

The problem with this is I need to do things differently for Safari. I have to use Selenium 1.0 commands to test Safari and Selenium 2.0 for everything else. So how can I use a browser which was supported in Selenium 1.0 with all the Selenium 2.0 APIs?

The solution is:

Selenium sel = new DefaultSelenium("localhost", 4444, "*safari", baseURL);
CommandExecutor executor = new SeleneseCommandExecutor(sel);
DesiredCapabilities dc = new DesiredCapabilities();
WebDriver browser = new RemoteWebDriver(executor, dc);

WebElement input = browser.findElement(By.name("q"));
input.sendKeys("Selenium");

If you look at this block of code, the last 3 lines are the same as the last three lines of the first block of code. Essentially, I add the first three lines then change the WebDriver declaration to a new RemoteWebDriver.

One thing I found, however: on my Mac OS X machine it would fail to work if I started the SeleniumServer with setSingleWindow(false). I run my SeleniumServer inside my Java code using:

private static SeleniumServer ss;
private static RemoteControlConfiguration rcc;

@BeforeClass
public static void setUpBeforeClass() throws Exception {
    rcc = new RemoteControlConfiguration();
    ss = new SeleniumServer(rcc);
    ss.start();
}

If you are running the SeleniumServer from the command line, you'll have to look at the command line options to ensure you are running in Single Window mode.
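For reference, the Selenium RC server of that era accepted a -singleWindow switch, so (assuming the standard selenium-server jar name for your install) the command-line equivalent would look something like:

```shell
java -jar selenium-server.jar -singleWindow
```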

Saturday, February 12, 2011

Selenium 2

I've been to the dark side (HP Quick Test Professional) for the past year but I'm back to working with open source tools like Selenium.

When I last looked at Selenium it was just in the process of releasing version 2.0.

One of my complaints about 1.0 was how inconsistent the Java APIs were. Simple things, like iterating through all the OPTIONs on a SELECT list, weren't really there. You could use Selenium to make the application do anything a user would do, but for test automation you also need access to the information a manual tester would normally validate with their eyes.

I recently started looking at version 2.0. There seems to be a great deal of improvement.

  1. It will still run the Selenium 1.0 code with no change at all.
  2. You can mix and match 1.0 and 2.0 code which enables you to slowly refactor your old 1.0 automation to 2.0 over time.
  3. No more effort trying to create a jUnit 3 case to extend all your test cases from.
  4. A much simpler design with far fewer 'extras' to help you, i.e. a K.I.S.S. approach to the APIs.

In the old Selenium 1.0 we would create an instance of Selenium and it would be the gateway to the browser. With Selenium 2.0 we create an instance of WebDriver. WebDriver is the base type for the classes that represent the different browsers. If you want a fast, non-GUI browser you can use the HtmlUnit browser. For example, using jUnit 4 annotations, your @Before method would be:

WebDriver browser;

@Before
public void setUp() {
    browser = new HtmlUnitDriver();
}

Now all your test cases (@Test) can use the browser instance to access the APIs. If you then want to change the test to run on a real web browser you can use:

    browser = new FirefoxDriver();
    browser = new InternetExplorerDriver();
    browser = new IPhoneDriver();
    browser = new AndroidDriver();

As new browsers are added to the WebDriver family this list will grow.

Once you have a browser instance, there is a limited number of options:


The biggest methods are findElement and findElements. The first finds a single element. The input is the By class, which can identify elements by name, id, xpath, etc. If the identifier matches more than one element, it returns the first match. Realistically, I can see which element counts as 'first' changing between browsers, or over time as the system gets upgraded. You really want to make sure the identifier matches a single, unique element.

The second method, findElements, works like the first method but it returns a list. Most elements are WebElements (with some exceptions). So the first method returns a single WebElement and the second returns a List&lt;WebElement&gt;. The best thing is you can find ANY html element.

Once you have a WebElement, you have the following list of methods:


The findElement and findElements will search from the given element and down in the DOM. For example, if you use browser.findElement to find a SELECT tag then you can use the result of that to find all OPTION tags. It would then be searching under the SELECT tag and therefore find all the OPTIONs for that tag.
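That scoping rule can be illustrated without a browser at all. Below is a tiny stub of my own, not the real Selenium classes, just to show the shape of the API: calling findElements on an element searches only beneath that element in the DOM.

```java
import java.util.ArrayList;
import java.util.List;

public class ScopedFindDemo {
    // Stub standing in for Selenium's WebElement, purely for illustration.
    static class Element {
        String tag;
        List<Element> children = new ArrayList<>();
        Element(String tag) { this.tag = tag; }

        // Scoped search: looks only at this element's descendants
        List<Element> findElements(String byTag) {
            List<Element> found = new ArrayList<>();
            for (Element child : children) {
                if (child.tag.equals(byTag)) found.add(child);
                found.addAll(child.findElements(byTag)); // recurse down the tree
            }
            return found;
        }
    }

    public static void main(String[] args) {
        Element select = new Element("select");
        select.children.add(new Element("option"));
        select.children.add(new Element("option"));
        // Searching from the SELECT finds only its own OPTIONs
        System.out.println(select.findElements("option").size());
    }
}
```

This prints 2: the search rooted at the SELECT stub sees its two OPTION children and nothing outside that subtree, which is exactly how the real findElements behaves when called on a WebElement rather than on the driver.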

That is essentially it. K.I.S.S. Rather than using arrays it uses Lists and Iterators. If the elements are uniquely identified and don't move around much, they can be easy to maintain. In many cases a developer will need to change a DIV to a SPAN, or an A to an IMG with an onclick attribute. Rather than searching for a DIV, encourage the use of id attributes (which must be unique according to the HTML standard); you can then find everything By id, and a change to the tag or its location in the DOM requires no maintenance of the automation scripts.

More to follow...