
Search

To search for specific articles, you can use Google's advanced search features. Go to www.google.com and enter "site:darrellgrainger.blogspot.com" before your search terms, e.g.

site:darrellgrainger.blogspot.com CSS selectors

will search for "CSS selectors" but only on my site.


Showing posts with label Watij. Show all posts

Wednesday, June 23, 2010

Understanding how xpath relates to web pages

When using automation tools like Selenium or Watij you often find yourself creating an xpath to find an element. From talking to a few people, there seems to be a lack of understanding of how an xpath relates to a web page.

I think the step which is missing for most people is understanding how to look at a web page.

A web page is merely a group of blocks inside blocks. To illustrate, picture a large rectangle with two smaller rectangles stacked inside it.

Imagine the outer block is the <BODY> of the web page. Inside the outer block, i.e. the BODY, are two rectangles. Let's say they are <TABLE> elements. The top table, i.e. /HTML/BODY/TABLE[1], has one row and three columns. The lower table, i.e. /HTML/BODY/TABLE[2], has three rows and four columns.

Let's say that both tables have one row of cells where the class was 'foo', i.e. <TD class='foo'>. If I wanted to find all cells with class='foo' and the text contained 'bar' I would use:

    //TD[@class='foo' and contains(text(), 'bar')]

But what if I wanted to search only the second table? Then I would use:

    //TABLE[2]/TBODY/TR/TD[@class='foo' and contains(text(), 'bar')]

Essentially, the longer the xpath the smaller the area of the web page I am searching. Using //BODY will search the largest rectangle in my example. Using //BODY/TABLE[2] will search the lower table, or the second level in.

If you look at the third row of the lower table you can see the 'cells' contain another level of elements. Let's say that each cell, i.e. <TD>, contains a <DIV>. Using //TABLE[2]/TBODY/TR[3]/TD/DIV[1] focuses on the first div in the last row of the lower table.
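The narrowing effect can be demonstrated offline with the JDK's built-in javax.xml.xpath package. The markup below is a minimal sketch of the two-table example, invented for illustration (note that a raw XML document has no implicit TBODY the way a browser-parsed HTML page does, so it is omitted here):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XpathScope {
    // Two tables; each has a cell with class='foo' whose text contains 'bar'.
    static final String PAGE =
        "<HTML><BODY>"
      + "<TABLE><TR><TD class='foo'>bar one</TD></TR></TABLE>"
      + "<TABLE><TR><TD class='foo'>bar two</TD></TR></TABLE>"
      + "</BODY></HTML>";

    // Count how many nodes the given xpath matches in the sample page.
    static int count(String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Searching the whole page finds both cells...
        System.out.println(count("//TD[@class='foo' and contains(text(), 'bar')]"));  // 2
        // ...restricting the search to the second table finds only one.
        System.out.println(count("//TABLE[2]/TR/TD[@class='foo' and contains(text(), 'bar')]"));  // 1
    }
}
```

Running main prints 2 for the page-wide search and 1 once the xpath is anchored to the second table, which is exactly the "longer xpath, smaller area" effect.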


Saturday, February 6, 2010

Watij versus Selenium-RC [Java]

If you want to use an open source test automation framework for testing web applications, there are two categories. Some of the frameworks have their own engines (CSS, Javascript, HTML, etc.) and others use a real web browser.

One advantage of the frameworks with their own engines is they have more control and you don't have to deal with the real life implementations of a web browser. You are also not dealing with TCP/IP, network delays, etc.

The disadvantage, on the other hand, is that you are not dealing with real life implementations of a web browser. You are also not dealing with TCP/IP, network delays, etc. Basically, if you use a framework with its own engine and your application passes, it does not guarantee it will work on a real web browser like Firefox or Internet Explorer.

Two frameworks which drive a real web browser are Watij (Web Application Testing In Java) and Selenium (pronounced sa-LEE-nee-am).

Watij is a Java test framework. You will need to know Java and jUnit to use Watij. The documentation is VERY lacking. The people developing Watij are developers. They are not creating a commercial product. If I were programming a calculator, adding comments to the source code would not be important. Instead, I would want the source code to be self-documenting. The method names should tell me what they do. The class names should imply what methods exist in them (i.e. I'd know what to expect in the file even before I opened it). Additionally, I would write unit test cases to show how the code was expected to be used.

As something built on the jUnit framework, I expected better documentation. I quickly realized that it was best to have the source code jars added to my project and occasionally needed to step into the Watij code to figure out how things worked.

Everything was nicely structured however. You have HtmlElement which was the base class for all other elements. You have things like Button, Table, Link, etc. which inherit HtmlElement. These are all interfaces and you have implementations like IEButton. I believe the idea was that some day you might have IEButton, FirefoxButton, ChromeButton, OperaButton, etc.

In Java I might have:

public List<String> doSomething() {
    List<String> text = new ArrayList<String>();
    // more code here
    return text;
}

Similarly, in Watij all the methods return an interface rather than a specific implementation. So the signature for a method returning an IELink is:

public Link getLink(String locator);

Nothing special for a Java test framework. Additionally, a Table extends a TableBody; a TableBody (and consequently a Table) can tell you how many rows and columns it has. It can also return a specific TableRow or TableCell. Back to the fact there is no JavaDoc for the methods: it does not matter as much because the code is self-documenting.

When a method needs to return a collection of elements, e.g.

Links links = ie.links();

The implementation of Links is typical Java. It assumes we have Collections, Lists, Maps, etc. So the Links implementation is Iterable<Link>. So we can have:

for(Link link : links) {
    // do something with each link in the collection
}
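Making a collection class work in a for-each loop like that only requires implementing Iterable. Here is a minimal sketch with stand-in Link and Links classes (invented for illustration; the real Watij types differ):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LinksSketch {
    // Stand-in for a Link element with a text() accessor.
    static class Link {
        private final String text;
        Link(String text) { this.text = text; }
        String text() { return text; }
    }

    // Stand-in for a Links collection: implementing Iterable<Link>
    // is what makes the for-each loop possible.
    static class Links implements Iterable<Link> {
        private final List<Link> links;
        Links(Link... links) { this.links = Arrays.asList(links); }
        public Iterator<Link> iterator() { return links.iterator(); }
    }

    // Collect the text of every link, comma separated.
    static String joinTexts(Links links) {
        StringBuilder sb = new StringBuilder();
        for (Link link : links) { // works because Links is Iterable<Link>
            if (sb.length() > 0) sb.append(",");
            sb.append(link.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Links links = new Links(new Link("home"), new Link("about"));
        System.out.println(joinTexts(links)); // home,about
    }
}
```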

Essentially, writing automation is no harder for a Java programmer than writing a simple application.

Selenium on the other hand is totally different from Watij. First, it is not a Java framework. Selenium comes in multiple parts and one of them allows you to use Java. The starting point for Selenium is Selenium IDE. You get an AddOn for Firefox. When you open the Selenium IDE it turns on recording right away. Everything you do in Firefox is then recorded into an HTML table. The table has three columns. The first column is the command, the second column is the target and the third column is an optional value. For example, if I wanted to click a button the command would be 'click' and the target would be some way of identifying the button. This could be CSS, an xpath, the id of the button, the name of the button, etc. Another example using all three columns would be entering text in a text field. The command would be 'type', the target would be the text field and the third field would be the text you want to put in the text field.

This is great for doing some quick and dirty automation but lacks the ability to build into a test suite. Additionally, if every test case started with logging into the application you would have duplicate code. If the way you log in changed, you would have to edit all the test cases. This is possibly the number one reason test automation fails. It is not maintainable.

The next part of Selenium is Selenium RC (remote control). After you record the test case using the IDE, you can save it in a variety of different languages. If you are a Java programmer, you can save it as Java code.

To run the Java code you'll need the Selenium RC jars and a Selenium Server. The way it works is the Selenium RC jar is a client. Your Java code sends commands to the server via the client. The server then uses Selenium IDE to control the web browser (via Javascript).

The nice thing about this is you can now record a snippet using the Selenium IDE, save it as Java and incorporate it into your Java test suite.

This is the theory. The reality is that how Selenium IDE recognizes the elements in the browser is not always the best choice. For example, I might have a set of SPAN elements inside the TABLE to make the cells format nicely. Along comes Internet Explorer 9 and my table looks horrible. So I have to add a DIV around each SPAN. From a customer's point of view everything looks the same, but from a Selenium locator point of view my test cases all break. It can no longer find the cell. So you will need to look at the locator Selenium IDE has selected and decide if you want to change it.

Additionally, AJAX and timing are a huge issue. For example, you have a form with two select lists. The first list is a list of countries. The second list is empty. You select your country and an AJAX call populates the second list with states, provinces, territories, etc. Selenium IDE will record you waited 1 second before selecting your province because you selected Canada for the country. If you select United States it takes 2 seconds to populate the second list. A bad automator will just add a few seconds to the wait. A good automator will look for something in the DOM which signals the second list is populated. For example, the second list might be hidden. So you want to wait for the second list to become visible. Knowing that you need to code this is one thing when using Selenium IDE; knowing what to code is even harder.
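The "wait for a signal in the DOM" approach generalizes to a small polling helper. A sketch, with the condition expressed as a plain boolean supplier (with the Selenium RC Java client you might pass something like () -> selenium.isVisible(locator), assuming that client's API):

```java
import java.util.function.BooleanSupplier;

public class WaitFor {
    // Poll the condition until it is true or the timeout expires.
    // Returns true if the condition became true in time.
    public static boolean waitFor(BooleanSupplier condition,
                                  long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after roughly 300 milliseconds.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 300, 2000, 50);
        System.out.println(ok); // true
    }
}
```

The point is that the wait is tied to the actual signal (the list becoming visible), not to a guessed sleep that breaks the moment the AJAX call is slower than expected.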

So the record feature is not that good. It will most often encourage less experienced automators to create code duplication and use sleep/wait statements to handle things like AJAX/Web 2.0 features.

Another issue with Selenium is the Java support. When you look at the Java support it is fairly simplistic. There is no hierarchy to the data structures and all the methods return arrays. You'll also find gaps like: I can find out how many elements match an xpath but I have no way of iterating over the matching elements. In Watij you can do the following:

List<String> result = new ArrayList<String>();
String myXPath = "//A[contains(text(), 'Phoebe')]";
Links phoebeLinks = ie.links(xpath(myXPath));
for(Link currentPhoebeLink : phoebeLinks) {
    result.add(currentPhoebeLink.text());
}

To do the same thing in Selenium RC using Java, in theory, you need to do:

String[] allLinks = selenium.getAllLinks();

then iterate over all the strings to figure out which ones are the ones you are looking for. However, when I use this method it actually returns back an array of one element and the one string it does contain is "". In other words, it doesn't seem to work.

My workaround for this problem was to use the getHtml() method. This returns all the source code for the current page. I can then load it into something like jTidy and use the jTidy interface to traverse the page and find all the links. To actually iterate over all the links and do something with them I'd have to create a unique list of the links using jTidy then go back to Selenium and interact with them.
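The same page-source approach can be sketched with nothing but the standard library (jTidy gives you a proper DOM; the regex below is a rough stand-in that only copes with simple anchors and would break on nested or malformed markup):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkScraper {
    // Naive anchor-text extractor; fine for a sketch, not for real HTML.
    static final Pattern ANCHOR =
        Pattern.compile("<a[^>]*>(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Return the text of every anchor found in the page source.
    static List<String> linkTexts(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<body><a href='/a'>Phoebe One</a><p/><a href='/b'>Other</a></body>";
        System.out.println(linkTexts(html)); // [Phoebe One, Other]
    }
}
```

From the resulting list you can build locators (e.g. by link text) and hand them back to Selenium to interact with each link.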

Essentially, I found that writing a test case in pure Java was simple with Watij and difficult if not sometimes impossible with Selenium RC. Ultimately, the maintenance on Selenium will be the potential downfall.

On the other hand, when I actually run the test cases I find that Internet Explorer has problems. It hangs and locks up occasionally. For an automated test suite this is unacceptable. With Watij I am stuck. With Selenium I can run the tests in Firefox, Opera, Safari or Internet Explorer. So automated nightly build and test is possible with Selenium but not really viable with Watij.

As I see it, Watij is a fantastic house built on a bad foundation. Selenium is a good starter home which can easily be moved to a variety of different foundations.

Neither is a great solution. I spent a year trying to work around the bugs in Internet Explorer with Watij but in the end just gave up. For now, I'm looking at Selenium. Maybe it is time to start doing some Selenium RC programming, i.e. give back to the community.

Wednesday, January 27, 2010

xpath

I have been doing a lot of web testing. The general idea behind all UI test automation tools is to locate an element on a page then do something with it, set it, clear it, read it, etc.

For the web automation tools you can use:

(a) the position on the screen (x,y coordinates)
(b) the position in the DOM (e.g. /html/body/table/tbody/tr[2]/td[4])
(c) a unique attribute

The position on the screen never works. Different browsers, fonts, screen resizing, etc. will change the layout and, ultimately, the screen position of elements. I wouldn't use this. Working with development to provide alternative means will be less work than maintaining automation with x,y coordinates.

The position within the DOM is a little brittle. When a browser is patched or a new browser needs to be supported it is not uncommon for the developers to throw in some span or div elements to help with layout. So the element /html/body/table might change to /html/body/div/span/table. The more precise the positioning information the more brittle it will be.

A unique attribute would be something like the id of a tag. For example:
<TD class='username-cell' id='username'>Darrell</TD>
I can find this via the class or id attribute, or both. This is where xpath comes in handy. The automation tools I have been using (Watij, Selenium) can use xpath to locate an element. For my td element I can use:
//TD[@class='username-cell']
or
//TD[@id='username']
or
//TD[@class='username-cell' and @id='username']
The id attribute is required to be unique. So if the element has an id, that is the attribute to use. If you start an xpath with // it tells the tool to start searching anywhere in the DOM. Starting with a single / will start at the root. For a web page that will always be /html.

Xpath can be quite powerful in identifying elements. You have a few 'tricks' you want to use. First, id=' foo' is not the same as id='foo'. The whitespace makes a difference. To get around this I would use:
//TD[contains(@id,'foo')]
Now the whitespace does not matter. You have to be careful with this, however. If there are two matches, it depends on the automation tool as to what will happen. If I have:

<TD id='user'>darrell</TD><TD id='username'>Darrell</TD>
then:
//TD[contains(@id,'user')]
will have unpredictable results. Not something you want in test automation. So how to get around this?
//TD[contains(@id,'user') and not(contains(@id,'username'))]
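The contains/not trick can be checked offline with the JDK's javax.xml.xpath (the two-cell row below mirrors the example above and is invented for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContainsTrick {
    // Two cells whose ids both contain 'user'.
    static final String ROW =
        "<TR><TD id='user'>darrell</TD><TD id='username'>Darrell</TD></TR>";

    // Count how many nodes the given xpath matches in the sample row.
    static int count(String xpath) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
            xpath,
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(ROW))),
            XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Ambiguous: both ids contain 'user'.
        System.out.println(count("//TD[contains(@id,'user')]")); // 2
        // Unambiguous once 'username' is excluded.
        System.out.println(count("//TD[contains(@id,'user') and not(contains(@id,'username'))]")); // 1
    }
}
```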
Any attribute inside the tag can be used via the @ symbol. You can do things like look at the style attribute using @style. Because the order of the things in style does not matter to the browser, the contains() function helps a lot.

Finally, the text between the open and close tag can be found using the text() function. So if I had:
<A href='http://www.google.com'>Google</A>
I can find it using:
//A[contains(text(),'Google')]
What about the difference between Google and google? For matching either you can use:
//A[contains(lower-case(text()),'google')]
This will take the text in the anchor (e.g. Google), change it to lower case (e.g. google), then compare the resulting string to 'google'. One caveat: lower-case() is an XPath 2.0 function. In an XPath 1.0 environment, which is what most browsers and automation tools provide, you have to use translate() instead:
//A[contains(translate(text(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'google')]

In addition to 'and' there is an 'or' keyword as well, but I usually find it better to narrow things down ('and' filters out) rather than build up the matches ('or' combines).

There is a lot more to know about xpath. If you are curious, ask.

Additionally, I find a developer will put an id on something because he uses it from javascript to find it and all the elements underneath it. So if there is:

<DIV id='users'>
<UL>
<LI>Darrell</LI>
<LI>Jerome</LI>
<LI>Mark</LI>
</UL>
</DIV>

I would tend to use:
//DIV[contains(@id,'users')]/UL/LI[contains(text(),'Darrell')]
to find the element which contains 'Darrell'. The developer will tend to not want to change the structure under the id'd tag because it will cause them a lot of maintenance as well.

Saturday, May 3, 2008

Been a while

It has been a while since I posted to my blog. I've been reading less techie books and taking time to myself.

Been using Watij at work to do load testing. I am testing Foglight 5.0. Foglight is an application monitoring tool. You install it on a computer then deploy agents to other computers. The agents collect information and send it back to Foglight. Foglight saves the data in a database. A user can then log into the web console for Foglight and view the data. A Foglight cartridge is a package of agents, configuration files and schema information. You would have an Oracle Database cartridge. This would have an agent which monitors an Oracle database, sends back all the information an Oracle Database Administrator would be interested in, then displays it in a manner the DBA would appreciate. The 'cartridge' has default dashboards (a dashboard is a chart/table/view of the agent data) and rules (a rule does things like email the DBA when the database crashes, if Foglight detects a bottleneck, someone tries to illegally access the database, etc.). There are other things like reports (PDF) and analysis tools.

So, if you load all the cartridges into Foglight (OracleDB, WebLogic, WebSphere, Windows, Solaris, AIX, HP-UX, Vmware, MySQL, DB2, etc.) you will have hundreds of different views. For example, just the Windows cartridge will have agents for DiskIO, FileSystem, CPU, Memory, EventLog, AppMonitor, WebMonitor, ApacheSvr, LogMonitor, etc. and each agent will have dozens of views.

Verifying all these views can be quite time consuming. Each dashboard has an associated URL. As a user of Foglight I would log into the console (username/password) then select a dashboard from a treeview. I could also type the URL into the address bar and go to the dashboard directly.

This is how I use Watij. I created a set of jUnit test cases. The setup() was starting IE and logging into the Foglight Console. Each test case [test*()] was loading a URL, i.e. dashboard. The tearDown() was logging out of the console and quitting IE.

One of the challenges I faced with Foglight's Web Console Framework (WCF) was the use of AJAX and client-side Javascript. A fair amount of the code was in the form of Javascript on the client side. This meant the HTTP response would complete, Watij would see the HTTP request as done, but the client (Internet Explorer) would still be processing the Javascript (many of the views were complex enough that a page would take an additional 1 to 5 seconds to render).

The solution: WCF has a GIF which they set to style="VISIBILITY: visible" when the page is rendering and it gets changed to style="VISIBILITY: hidden" when the rendering completes. So I just wrote a method which gets the CSS for the image as a string, then uses the matches method of Java String to search for "style=\".*VISIBILITY:[\s]*visible.*" and loops until this changes. Basically it is a:

do {
    // sleep 250 milliseconds
    // get the CSS into the string s
} while(s.matches(REGEX));

The moment the style changes from visible to anything else, the loop exits and I know the page is really done. As a double check I do a windowCapture from Watij then manually inspect the images.
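The regex from that workaround can be exercised on its own. The style strings below are invented examples of what the GIF's CSS might look like; note the `\s` in the pattern needs a double backslash inside a Java string literal:

```java
public class VisibilityRegex {
    // Matches a style attribute whose VISIBILITY is 'visible'.
    static final String REGEX = "style=\".*VISIBILITY:[\\s]*visible.*";

    // True while the rendering GIF is still visible, i.e. page still rendering.
    static boolean stillRendering(String css) {
        return css.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(stillRendering("style=\"VISIBILITY: visible\"")); // true
        System.out.println(stillRendering("style=\"VISIBILITY: hidden\""));  // false
    }
}
```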

Darrell