Thursday, March 11, 2010

How to pick identifiers

When doing software automation, pretty much every automation tool requires some way of finding the elements in the application. If it is a desktop application, the tool needs a way to find the menus, buttons, text fields, etc. If it is a web application, the tool needs a way to find the various HTML elements, e.g. input, select, ul, li, table.

The trick to easily maintainable automation is to pick something unique and unlikely to change. Take, for example, a web page with a table where I want to find the contents of a cell. Tools like Selenium use 'locators'. A locator can be: id, name, dom, xpath, css. There are others but these five are the ones common to most web automation tools. I like to use these because if I switch to a different tool (new project, current tool no longer supported, different company, etc.) my knowledge is more transferable. If I use a recognition method unique to a tool I am tying myself to that tool.

If you read the HTML standard, you will find that the ID of an element must be unique within the page. Being unique is paramount to automation. If two elements on a page can have the same value then there is a chance that at some point your automation will fail. Some automation tools will fail with a helpful error, i.e. they will tell you there was more than one element which matched. Many will just select the first match, or the last match, or fail farther down in the test case. The problem with ID is that not all elements have an ID attribute. There is nothing in the standard requiring every element to have an ID. But if an ID exists, it is the best choice.
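Since real pages do not always follow the standard, it can be worth checking that an ID really is unique before relying on it. Here is a minimal sketch of such a check using only the Python standard library. The names IdCollector, duplicate_ids and SAMPLE_PAGE are made up for this illustration and are not part of any automation tool.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects the value of every id attribute seen while parsing."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attribute names are lowercased.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def duplicate_ids(html):
    """Return the sorted list of id values that occur more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    counts = Counter(collector.ids)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical page: 'total' is used twice, so it is not a safe identifier.
SAMPLE_PAGE = """
<table id="bar">
  <tr><td id="foo">1</td><td id="total">2</td></tr>
  <tr><td id="total">3</td></tr>
</table>
"""

print(duplicate_ids(SAMPLE_PAGE))
```

Any ID this reports is one I would not use as a locator until the developers fix the page.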

The NAME attribute does not have to be unique and therefore could cause problems in the future. I will occasionally use the NAME attribute but only if I am working closely with the development team and have a feel for how they select attributes. If the attribute is generated by some automated tool (like Struts) and I know this tool is guaranteed to make all the names unique, I'll use it but only if there is no better choice.

The DOM is going to be something a web developer will understand. However, the structure of the DOM might change. If the UI is fairly simple or the application is very stable then using the DOM might be okay. But if they are adding menus, moving around navigation bars, changing tables to div/span combinations, adding divs or spans to deal with new web browsers, etc. then the DOM will be changing and you will be required to update your automation. So what works today might give you a lot of headaches in the future. The whole idea behind automation is to spend more time up front creating the automation and then reap the benefit of running it again and again. If you spend 4 times more time automating compared to manual testing and have to spend the same amount of time maintaining the automation as you would manual testing, you'll never recoup the initial cost of automation. The more the DOM is changing the less beneficial it will be to use the DOM.
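To put rough numbers on that trade-off, here is a small sketch of the break-even arithmetic. It assumes a deliberately simple cost model that I am making up for illustration: a one-time cost to build the automation, a fixed maintenance cost per run, and a fixed cost for each manual run. The function name break_even_runs and the sample figures are hypothetical.

```python
import math


def break_even_runs(manual_cost, automation_cost, maintenance_cost):
    """Return the number of runs after which automation costs no more than
    manual testing, or None if it never pays off.

    Assumed model (not from any tool): after n runs, manual testing has cost
    n * manual_cost and automation has cost
    automation_cost + n * maintenance_cost.
    """
    saving_per_run = manual_cost - maintenance_cost
    if saving_per_run <= 0:
        # Maintenance eats the whole saving, so the initial cost is never recovered.
        return None
    return math.ceil(automation_cost / saving_per_run)


# One manual run costs 1 unit; building the automation costs 4 units.
print(break_even_runs(1, 4, 1))    # maintenance as expensive as manual testing
print(break_even_runs(1, 4, 0.5))  # maintenance at half the manual cost
print(break_even_runs(1, 4, 0))    # no maintenance at all
```

The first call is the case described above: when maintenance costs as much as manual testing there is no break-even point at all.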

XPATH is my preferred method of identifying elements on a web page. How you use it can make a world of difference, however. If the full xpath to the cell in a table was /html/body/div[2]/span[4]/table[3]/tbody/tr[7]/td[9] I could use that. It would definitely be unique. But if the document is in flux, this xpath will change. Alternative xpaths would be //body/div[2]/span[4]/table[3]/tbody/tr[7]/td[9] or even //table[3]/tbody/tr[7]/td[9]. The shorter I can get the xpath the less likely it will change. Even the last xpath will break if they add another table above the current one. What if they add or remove a column? Then the td becomes td[10] or td[8].

Basically, using magic numbers is a bad thing. This is taught in first year computer programming. If we look closer at xpath we find there are attribute tests and functions we can use instead. If the cell has ID=foo I can use //td[@id='foo']. What if today it is a cell in a table but in the future they change it to a set of DIV/SPAN? I could use //*[@id='foo'].
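Here is a rough sketch of how a positional xpath breaks while an id-based one survives. It uses xml.etree.ElementTree from the Python standard library, which only supports a subset of XPath and needs paths to start with './' or './/', so the expressions are slightly different from what you would type into Selenium. The two pages and the helper cell_text are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, written as well-formed XML so ElementTree can parse it.
BEFORE = """
<html><body>
  <table><tr><td>header</td></tr></table>
  <table><tr><td>label</td><td id="foo">42</td></tr></table>
</body></html>
"""

# The same page after someone adds a new table at the top.
AFTER = """
<html><body>
  <table><tr><td>banner</td></tr></table>
  <table><tr><td>header</td></tr></table>
  <table><tr><td>label</td><td id="foo">42</td></tr></table>
</body></html>
"""

POSITIONAL = "./body/table[2]/tr[1]/td[2]"  # relies on magic numbers
BY_ID = ".//*[@id='foo']"                    # relies only on the id


def cell_text(page, path):
    """Return the text of the first element matching path, or None."""
    root = ET.fromstring(page)
    match = root.find(path)
    return None if match is None else match.text


for name, page in (("before", BEFORE), ("after", AFTER)):
    print(name, cell_text(page, POSITIONAL), cell_text(page, BY_ID))
```

On the changed page the positional path no longer finds the cell, while the id-based path still returns the same value.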

What if the automated tool occasionally adds a space to the attribute strings? (I see this a lot.) I'd have to figure this out and change it to //*[@id='foo ']. But there is a better solution; try using: //*[contains(@id,'foo')]. The danger of this is that there might be an ID='foo' and a second element with ID='foobar', and contains() will match both. At this point you need to use your judgement. Are there attribute values that are substrings of other attribute values? If yes, don't use contains(). Is it common for spaces to get added to attribute values? If yes, do use contains(). What if both conditions occur? This is where it gets hard. There is no one right choice.
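The trade-off is easy to see with a tiny experiment. Python's built-in ElementTree does not implement the xpath contains() function, so this sketch uses plain Python comparisons as a stand-in: ids_equal_to behaves like @id='foo' and ids_containing behaves like contains(@id,'foo'). Both helper names and the sample page are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the first id has a trailing space added by some tool,
# the second id merely starts with the same letters.
PAGE = """
<div>
  <span id="foo ">padded by the tool</span>
  <span id="foobar">a different element</span>
</div>
"""


def ids_equal_to(root, wanted):
    """Stand-in for //*[@id='wanted']: exact match only."""
    return [el.get("id") for el in root.iter() if el.get("id") == wanted]


def ids_containing(root, wanted):
    """Stand-in for //*[contains(@id,'wanted')]: substring match."""
    return [el.get("id") for el in root.iter() if wanted in (el.get("id") or "")]


root = ET.fromstring(PAGE)
print(ids_equal_to(root, "foo"))    # exact match misses the padded id
print(ids_containing(root, "foo"))  # substring match finds it, plus 'foobar'
```

The exact match finds nothing because of the trailing space, and the substring match finds two elements, which is exactly the judgement call described above.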

Sometimes you want to combine an attribute with a bit of structure. For example, if the table has ID='bar' and the structure of the table is fairly solid, I might be okay to use //table[@id='bar']/tbody/tr[7]/td[9].

Another solution is to use the position of things relative to one another. For example, if I have a table where the third column has an element which is unique (the text, the id, whatever) and I want the seventh column on that row, I can use a relative path. Say I want the input field on the same row as the text 'Enter the Quantity:'. Then I could use:

    //td[contains(text(),'Enter the Quantity:')]/../td[7]/input

This starts from the cell with the known text, goes up one level to the row, then down into the 7th column and finally down into the input element.
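Here is a small sketch of the same idea that runs with nothing but the Python standard library. Two things differ from the xpath above, and both are my own simplifications: ElementTree has no contains(), so the sketch matches the full cell text with [.='...'] (this needs Python 3.7 or later), and the made-up table only has two columns, so it uses td[2] instead of td[7].

```python
import xml.etree.ElementTree as ET

# Hypothetical form laid out as a table, written as well-formed XML.
FORM = """
<table>
  <tr><td>Enter the Item:</td><td><input name="item"/></td></tr>
  <tr><td>Enter the Quantity:</td><td><input name="qty"/></td></tr>
</table>
"""

# Find the cell by its text, go up to the row, then down to the
# second cell on that row and into its input element.
RELATIVE_PATH = ".//td[.='Enter the Quantity:']/../td[2]/input"

root = ET.fromstring(FORM)
field = root.find(RELATIVE_PATH)
print(field.get("name"))
```

If a new row is added above or below, the path still lands on the quantity field because it is anchored on the label text rather than on a row number.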

CSS is very much the same as XPATH. The only difference is that understanding how you can match things with CSS requires knowledge of CSS. If you are a developer who uses CSS to identify and target elements for, say, AJAX then obviously you are going to be comfortable using CSS to identify the elements. Just like with XPATH you can do things like td#foo (equivalent to //td[@id='foo']), #foo (equivalent to //*[@id='foo']), *[id*='foo'] (equivalent to //*[contains(@id,'foo')]). NOTE: I'm not a CSS expert so take these examples with a grain of salt.

In summary, pick identifiers which are (1) unique, (2) unlikely to change and (3) something you understand and can maintain.

3 comments:

Matt said...

Very useful post. Thanks for sharing this info.

Darrell said...

Glad you like it. If there is anything else you think I should post, let me know.

Notque said...

Wonderful post. Thanks.