Google Analytics

Search

To search for specific articles you can use advanced Google features. Go to www.google.com and enter "site:darrellgrainger.blogspot.com" before your search terms, e.g.

site:darrellgrainger.blogspot.com CSS selectors

will search for "CSS selectors" but only on my site.


Wednesday, February 20, 2008

Safely probing web sites

The whole recruitmenttech.com / Bernard Haldane scam thing got me using my old telnet trick to examine the contents of a website. A number of bad eggs like to use security flaws in IE or Firefox to infect your computer. Most people try to make sure the security patches are up to date. But, there is always a period of time between when a virus is released and when a security patch is released to deal with it. If you visit the wrong website during that time you could be in for trouble.

What I like to do is avoid the security flaw by using a method the virus writers are not expecting. I like to use telnet. I tend to telnet from different operating systems as well. You might not have the luxury of a dozen different OSes. You could consider VMware or some other virtualization software.

Anyways, here is how it works. I'll use telnet from MS-DOS. The connection for a web browser and for telnet is pretty much the same. Telnet defaults to port 23 and web browsers default to port 80. So if I wanted to use telnet to connect to, say, www.blogger.com I'd use:

telnet www.blogger.com 80

At this point the MS-DOS telnet program will print nothing. If you press CTRL-] you get to the telnet settings. In there enter "set localecho". Press ENTER twice, once to turn on local echo and once to get out of the telnet settings. You are now back at a blank screen again. Enter:

GET / HTTP/1.1
Host: www.blogger.com


NOTE: you have to press enter twice at the end. Once to send the Host: field and once to signal the end of the HTTP header.

If you take too long to type things in, the computer at the other end will timeout and hang up on you. If you type it in quickly enough you'll get something back like:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=0df2c62f96b9ffe7:TM=1203534740:LM=1203534740:S=0L2HwAZgQbyqrbmI; expires=Fri, 19-Feb-2010 19:12:20 GMT; path=/; domain=.google.com
Server: gws
Transfer-Encoding: chunked
Date: Wed, 20 Feb 2008 19:12:20 GMT

...

There should be a lot more as well. What I'm not posting here is all the stuff between <html> and </html>. What I have posted is the HTTP response header. Your web browser eats this and uses the information. For example, the Set-Cookie: field will make the web browser set a cookie.

GET is a standard HTTP method. The next part, /, is the path you want to get, and HTTP/1.1 is the protocol version to use. Some web servers only speak HTTP/1.0, some only speak HTTP/1.1, but most support both.
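The telnet session above can be sketched in a few lines of Python with a raw socket. This is my translation, not something from the telnet session itself, and the host you pass in is whatever site you want to examine:

```python
import socket

def raw_get(host, path="/", port=80, timeout=10):
    """Send a bare-bones HTTP/1.1 GET, like typing it into telnet,
    and return the raw response bytes (headers plus body)."""
    # The request must end with a blank line -- that second ENTER
    # from the telnet session signals the end of the HTTP header.
    request = (
        "GET {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).format(path, host).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server hung up -- we asked it to, via Connection: close
                break
            chunks.append(data)
    return b"".join(chunks)
```

The Connection: close field is there so the server hangs up when it is done sending; that is what ends the receive loop.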

If you wanted the web site, http://en.wikipedia.org/wiki/List_of_HTTP_headers then the sequence would be:

telnet en.wikipedia.org 80
GET /wiki/List_of_HTTP_headers HTTP/1.1
Host: en.wikipedia.org


You'll notice that I always put a Host: field. HTTP/1.1 requires it: it tells the server which site you want, since one machine can host many different sites. Beyond that, many web servers will not respond to what looks like robots or automation. If you leave out the Host: field, they just hang up or send you an HTTP/1.1 400 Bad Request or 403 Forbidden response. Try using telnet to www.google.com and they will refuse you. They not only want the Host: information but they expect a number of other fields as well. If you go to the http://en.wikipedia.org/wiki/List_of_HTTP_headers web page, they talk about some of the common header fields and have a link to the HTTP standard on www.w3.org.

When you get the response back, you'll have to look through the body and see if there are other references, e.g. <SCRIPT> tags, which will create more GET requests. Your web browser is often getting the first page and from there doing multiple GET commands for the contents (each <IMG> tag is a GET, running Javascript will create more GET commands, etc.).
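As a sketch of that hunt for references, Python's built-in HTML parser can pull out the URLs a browser would fetch next. The tag list here only covers the common cases, and the sample HTML is made up:

```python
from html.parser import HTMLParser

class ReferenceFinder(HTMLParser):
    """Collect the URLs a browser would fetch next: <img src>,
    <script src>, <iframe src>, <link href>."""
    FETCH_ATTRS = {"img": "src", "script": "src",
                   "iframe": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        wanted = self.FETCH_ATTRS.get(tag.lower())
        for name, value in attrs:
            if name.lower() == wanted and value:
                self.references.append(value)

finder = ReferenceFinder()
finder.feed('<html><img src="logo.gif"><script src="evil.js"></script></html>')
print(finder.references)  # → ['logo.gif', 'evil.js']
```

Each URL this turns up is another GET you would make (and inspect) by hand; anything Javascript builds at runtime you still have to read for yourself.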

Once you have downloaded everything you can then look at it with a text editor and see if there is anything in it which could harm your computer. If you don't know how to read Javascript, this is obviously not an option for you but as a big nerd this is what I do. :^)

Hope you enjoy this. Let me know if you have any questions.

Happy hunting!

Wednesday, January 16, 2008

Donald E. Knuth

Last week, Thursday, January 10th, was Donald E. Knuth's birthday. I cannot believe I missed it.

I first learned about him in university. Someone mentioned The Art of Computer Programming. They talked about it as if EVERYONE knew about it. Coming from a trade school background I had no idea who Donald E. Knuth was or any of his publications. Since my background was graphic arts and typesetting I was first intrigued by TeX and started using it to typeset my math homework. As time went on I switched from a Mathematics major to a Computer Science major, in part due to Donald E. Knuth.

When I started reading The Art of Computer Programming I noted in the preface the information about questions and how he ranked them. If it had [1] it would be a question that takes a second to answer. Something like [10] might take a few seconds to a minute, [20] might take a day, etc. (I might have these estimates wrong). The one which stuck in my head was questions ranked [50]. Remember, I had no idea who Donald E. Knuth was nor how brilliant he was. He gave as an example the following question:

If an integer n is greater than 2, then the equation a^n + b^n = c^n has no solutions in non-zero integers a, b, and c.


He then proceeded to say, "If anyone finds an answer to these questions I'd appreciate you letting me know."

I figured this question didn't look too hard so I'd give it a try. I spent 3 months on it and figured out that if I could prove it for an integer n as a prime number then I could prove it for any integer n. Try as I might, that was the closest thing I could come up with. I'd figured out a lot of algebra and log/exp theory but I was stumped. After Christmas (I spent from mid September to Christmas working on this), I was defeated. It was the first time I couldn't answer a math puzzle. I went to my professor and asked him for the solution; I felt so stupid for not being able to figure it out myself. My professor laughed out loud when I asked for the solution. I felt so humiliated and a little angry; I was thinking he was laughing at me because of how stupid I must be. He quickly realized I actually expected an answer and that I thought he was laughing at my stupidity. He told me no one knows the answer to this puzzle and that the most brilliant minds have been trying to prove it since Fermat wrote it over 350 years ago (this was 2 years before Andrew Wiles published his proof).

The funniest thing is a year later I was watching an episode of Star Trek TNG and Picard is reading a book. The book is about Fermat's Last Theorem; he says something like, "There are things man was never meant to understand, like Fermat's Last Theorem." The writers of Star Trek TNG assumed no one was ever going to solve this thing.

Donald E. Knuth always seemed to write in a very unassuming way. I have never had the pleasure of meeting the man. I would guess he just truly loves math and computers. Maybe some day.

Wednesday, November 7, 2007

My new Mac OS X

It has been a short while since I posted. For a while there I was without a computer. My Windows box had decided I needed an updated video driver. Last time it did this I was reduced to 320x200 16 colour graphics. This time it just totally puked my machine. I tried booting in safe mode; everything appeared to start okay but after all the drivers loaded, it switched to the Windows XP splash screen (just before the login dialog) and hung there. Booted the thing with a Dr.DOS CD I created (just for such an emergency). The registry had been corrupted.

Decided now was the time to buy a new computer. Had been looking at the Mac for a while. It is really a UNIX box with an Apple GUI. The thing makes as much noise turned on as it does turned off. The Windows box sounds like a jet taking off in the distance. I connected my ethernet cable, powered it up and it immediately found the DSL modem, knew what kind of protocol it used, asked me two questions (username and password for my ISP) and bang I was on the net.

The really cool thing is I don't have to install something like Cygwin to do Bourne shell scripting. The REALLY cool thing is what I can do without much knowledge of UNIX command line. There is a utility on the thing called 'Network Utility'. You go to Spotlight (Command-Spacebar) and enter 'network'. The first thing it finds is the Network Utility.

A lot of the stuff on this is pretty easy to use from the command line, e.g. netstat or ping. But there is a feature labeled 'Port Scan'. This thing is like nmap but a lot easier to use. You give it the name or IP address of a machine and it will probe the machine's ports. It will see if something is at each port and figure out what that something is. You might think it just sees something at port 80 and assumes it is http, but I've put an application server at a different port and it still identified it.
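The core of a connect scan like Network Utility's can be sketched in Python. This is a bare-bones version of the idea, nowhere near as thorough as nmap or the real tool (no service identification, TCP connect only):

```python
import socket

def scan_ports(host, ports, timeout=0.5):
    """Try a plain TCP connect to each port; return the ones that answer.
    A connect scan like this is NOT stealthy -- the host sees every probe,
    just like with the Network Utility scan described above."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)  # something accepted the connection
        except OSError:
            pass  # closed, filtered, or timed out
    return open_ports
```

A real scanner would go one step further and read the service banner from each open port to figure out what is actually listening there.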

I'm scanning a site now and seeing that they have turned off telnet (I think it is a Linux machine so telnetd is off by default) but they have sshd running on the standard port 22. It is a web site so no surprise there is an httpd running on port 80.

It is not clear whether this thing has a passive mode (or if it is always in passive mode). I'm going to guess it is not and the host I'm probing sees me peeking at all the ports. If you want to do some stealth probing, this is probably not the tool for you. You'd have to go for the command line.

Makes me wonder, why would you want this? Your average joe user can find you are running iiop on port 3528. This will mean nothing to them. Most of the people who know about this sort of stuff know command line utilities like nmap. Still, kind of fun to play with.

The typical UNIX or Linux startup does not apply to the Mac. After the initial rc.d everything is started by a launchd application. Additionally, things you run from the GUI are run very differently than on, say, SuSE Linux. A GUI application has an elaborate directory structure and there is a special program that runs the application. For example, if the GUI indicates Foo is an application in /homes/darrell then to run it from the command line I'd have to issue: "/System/Library/Frameworks/Carbon.framework/Versions/A/Support/LaunchCFMApp /homes/darrell/Foo.app/Contents/MacOS/Foo"

The Foo.app is what you see as Foo from the GUI. You don't see the Contents folder or anything else inside the Foo.app directory.

Sunday, October 7, 2007

Good list of books you should read

A group of people/companies got together and created the Software Engineering Body of Knowledge, or SWEBOK. It is basically a reference to software engineering covering everything from requirements to maintenance. There are chapters on all the stages of the software development life cycle.

A lot of that book is based on traditional, proven methodologies. A lot of what you see in Agile or eXtreme Programming environments is not well reflected in the SWEBOK. It is still a good starting point for anyone wanting to understand things like Testing or Software Quality Assurance.

The book is a list of known definitions and descriptions plus, and this is the important part, a reference to where the definition/description came from. It is the list of reference material at the back of each chapter which makes the SWEBOK most valuable. It is a great place to build a good library of books from.

When I started out in programming, I was self taught. I only knew things from trial and error or based on whatever books I could find in stores or the public library. Books like The Art of Computer Programming by Donald Knuth or even Algorithms by Robert Sedgewick were not available in computer stores. Once I found these books I started to realize how little I knew. To paraphrase Socrates, "I am smarter than most because I know I know nothing." Read the SWEBOK and you'll know just how much there is to know and hopefully you'll be able to realize how much more there is for you to learn.

Interview Help Sites

I recently went to Wikipedia and searched for Software Quality Assurance. The page for SQA contained two external links, both of which were to the same site but different pages.

I went to the external links and found something I've found before, while trying to grow my knowledge of SQA. I found a site with Interview Questions.

The general idea behind these types of sites is a mish-mash of interview questions with answers. The site has some semblance of organization but as you go through it you will find the same questions with different answers. If I had to guess, someone set up a site to have people post questions they have been asked in interviews. The person is trying to remember the question and often forgets important details. So the questions are not well formed. On top of that, the answers to the questions are often from the people who failed the interview or various people trying to help the interviewee answer the question.

For example, there is a section on C language (NOT C++). The first question is "1.What is polymorphism?". Obviously not a C language question.

In some cases I wonder if the person who created the original question really knows what they are doing. For example,
10. What will be the output of the following code?

void main() {
    int i = 0, a[3];
    a[i] = i++;
    printf("%d", a[i]);
}

The answer posted notes that a[0] will be assigned the value 0, then i will be incremented to a value of 1. The printf will attempt to reference a[1] but since nothing has been assigned to this, you will get back a random value.

This is mostly true, though strictly speaking a[i] = i++ is undefined behaviour in ANSI C: i is both modified and read to compute the array index with no sequence point in between, so the standard allows any result at all. What should also be noted, if this is a C language question, is that the ANSI C standard requires main to return an int for defined behaviour. Declaring main as "void main()" is accepted by some compilers but is not standard C (or standard C++, for that matter). In pre-ANSI C the keyword void did not exist. When you see something like:
main()
{
    printf("Hello world.\n");
    return 0;
}

The default return type, when not explicitly indicated, is int. So the above snippet is equivalent to:
int main()
{
    printf("Hello world.\n");
    return 0;
}

Many people wrongly assume no explicit return type means it returns void.

The questions on the interview web site have a lot of wrong answers. Oddly enough, I have conducted technical interviews for hundreds of people on various languages and operating systems. I find a fair number of them seem to either have no knowledge of what they claim to know or they frequent these interview web sites and have bad knowledge of things they claim to know.

If you are surfing the web looking for answers to interview questions, think twice about the source of the information. As you surf the site, ask yourself: are there things about the questions which are questionable? Is the same question posted twice but with different answers? Are questions in the wrong section? Are there questions without answers? If the answer is yes to these questions, then the person who is putting up the site probably knows as much as or less than you.

Additionally, when I run a whois on the site, the owner of the site is hidden. If you don't know who owns the site, how do you know you can trust the information? Why don't they want you to know who they are?

Bottom line, if you try using these interview sites to make it through an interview you might get the job but you will not keep it. These sites are good for questions but you want to find out the answers for yourself and not trust the answers posted. I hang out on various forums and newsgroups. If you seem like someone who really wants to learn I'll help you out. If you just want to pass an interview I'll know it.

Tuesday, October 2, 2007

Is Automated testing development?

I'm not talking about unit testing. I am talking about regression testing. There are a number of automation tools out there and for some applications you can just use the record and playback feature. WinRunner, SilkTest, RationalRobot, etc. all have a feature where you can turn on a recorder, manually walk through an application then save the script. Later you can play the script back; if nothing has changed the script should execute without error.

This is the theory. The reality is that most projects change and the scripts fail. You then have to take the time to re-record the script or edit the code so it matches the change in the application. Additionally, the scripts tend to make the application do things but the tester still needs to add code to the script to confirm the right things happen, e.g. assert statements or capture points.

So testers are creating, maintaining, enhancing and debugging source code. This sounds a lot like development work. Yet in most of the places I've seen people doing automation and with most of the people I've interviewed (and some I hired), very few have knowledge of software development.

Yesterday I was talking to someone using an automated script. The script worked fine for the person developing it but did not for the person I was talking to. It turns out that the script assumes relative paths to other things. If you don't run it from the right directory (not the directory the script is in) it fails to work. To fix this flaw the 'developer' added a command line option to the script. The logic was "If there is a $1 parameter, cd $1 else assume you are in the correct directory."

There were no comments in the script, they did not assign the $1 parameter to a more sensibly named variable, and they checked for $1 deep in the script, i.e. not at the top.

The person I spoke with spent an hour trying to figure out what was wrong. She even spoke with the creator of the script and he couldn't figure out what she was doing wrong.

A good development practice is a coding style guideline: use appropriate comments, parse input parameters near the beginning of the script, and consider writing the script as functions. Developers working on a team have learned that a consistent style makes it easier for everyone to take over someone else's code. At first a new developer might want everyone to switch to their standard but once they come around everyone benefits.
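The script in question was a shell script, but the idea translates to any language. Here is a sketch in Python, with a hypothetical --workdir option standing in for the bare $1 buried in the original:

```python
import argparse
import os

def parse_args(argv=None):
    """Parse command-line options up front, in one obvious place,
    instead of digging a bare $1 out of the middle of the script."""
    parser = argparse.ArgumentParser(description="Run the regression suite.")
    parser.add_argument("--workdir", default=".",
                        help="directory to run from (default: current directory)")
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    os.chdir(args.workdir)  # the old script hid this cd deep inside
    # ... run the tests from here ...
```

The pay-off is that the next tester can see every input the script takes, and what happens without one, just by reading the top of the file (or running it with --help).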

Creators of automated regression tests rarely create a coding standard. In many cases they don't use source control. Additionally, they will pick automation tools that have poor or no debugging capabilities. Developers like Visual C++ or Java because the IDEs are so advanced. Once you get familiar with Eclipse or NetBeans, you could never imagine using Java from the command line again.

If developers are using powerful tools like Eclipse to develop their code, how is an automated tester going to keep up? Every time the developer makes a required change/enhancement to the application, the tester will have to maintain their scripts. If the developer can make the change in minutes but the tester takes hours, the cost of maintaining the automation will not be worth it.

I like the idea of pair programming where one person does the development and the other person codes the tests. Agile programmers are thinking more about unit testing when they describe this concept, but why not have an integration or system level tester be a person with development skills?

I'm not saying that developers should start doing automation testing. Developers have a different mindset than testers. I'm suggesting that testers should have some fundamental development training. If you hire QA or testing staff with development experience you will probably get better automation.

Additionally, if you are an automation tester, learn development techniques and apply them to your script development. Become more efficient. In many companies, they get automation tools but end up abandoning them because they become a maintenance nightmare. Maybe you will be the person who saves the tools and keeps the company using them.

Finally, automated testing is not WHAT you want to test. It is HOW you want to test. If I'm creating an application, I'll first come up with a list of requirements and design WHAT I want to create. If I decide I'm going to write the application in Java or C++, the language does not, for the most part, dictate WHAT I'm going to create. The automation tool you use comes at the implementation stage. You still need a test strategy, test plans and a list of test cases. Only then should you be looking at HOW you are going to automate the test cases.

Wednesday, August 8, 2007

AJAX

It has been a while since I have posted. I've worked on a number of things but I think the 'hot' topic right now would be testing AJAX. AJAX stands for Asynchronous JavaScript And XML. See Wikipedia's page on AJAX for more information.

A few years back a web site might be set up so you have two frames (left and right). The left frame has a tree view. When you select something in the tree view the web browser sends an HTTP Request to the web server. The web server responds with an HTTP Response to the web browser. The browser then displays it in the right frame.

Basically, every time you click the appropriate link, the browser sends a synchronous HTTP Request, the user waits for the response, then the browser displays it.

Along comes AJAX. Now the tree view is javascript. When you select something in the tree view, AJAX sends an HTTP Request directly to the web server. The web browser is unaware of the request and therefore does not wait for the response. You can continue using the browser while AJAX, asynchronously, waits for the response.

The problem with testing AJAX is that most test software detects when the web browser does an HTTP Request and waits for the HTTP Response. Because AJAX handles the request and response, the test tools are unaware there is a need to wait. If the selection in the tree view took 3 seconds, the test script will click the tree view and nanoseconds later expect the results to be in the right frame.

Solution #1: I can put a sleep for 3 seconds. Problem is, network conditions change. Next time it might take 5 seconds or 7 seconds. We could brute force it and make the sleep 1 hour. But if it sleeps for 1 hour on each click and there are more than two dozen clicks, it will take over a day to run even a simple test case. NOT A GOOD SOLUTION.

Solution #2: If my test suite is written in javascript and running in the same authentication domain as the website I'm testing (e.g. Selenium) then I could write my AJAX so it sets a flag when it does the HTTP Request and clears the flag when it gets the HTTP Response. Now the test code can wait for the flag to get set then wait again for the flag to be cleared before it assumes the right frame has the results in it. This is a good solution but it requires you to modify the application under test (AUT).
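The wait-for-the-flag idea in Solution #2 boils down to a polling loop with a timeout. Here is a tool-agnostic sketch in Python; in a real Selenium suite the condition would be a call that asks the browser for the flag's current value:

```python
import time

def wait_until(condition, timeout=30.0, poll_interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds pass.
    `condition` is any zero-argument callable -- for the AJAX case it
    would ask the page whether the request-in-flight flag is clear."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)  # don't spin the CPU between checks
    return False  # timed out -- the test should fail loudly here
```

Unlike the fixed sleep of Solution #1, this returns the moment the flag changes, and the timeout is only a worst-case bound instead of a per-click cost.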

Solution #3: Create a man-in-the-middle setup. The man-in-the-middle idea originated in phishing (fraud) schemes. You have probably seen it. You get an email from 'your bank' telling you to log in and fix a problem with your account. The link says http://mybank.com but the actual link is to http://mybank_com.evil_doers.com. The evil_doers.com website will receive the HTTP Request from your browser, look over the information you are sending, then pass it on to the real mybank.com. When mybank.com receives it, it will log you in and send an HTTP Response back to evil_doers.com. The evil_doers.com site will examine the response, log it and send it back to you.

It is like putting a wire tap on your telephone. We can use this for good. I have host1.com running the web browser. I have host2.com running man-in-the-middle software and it will forward things to testsite.com. On host1.com I would normally go to http://testsite.com/my_fabulous_app/index.jsp. Now I go to http://host2.com/my_fabulous_app/index.jsp. The man-in-the-middle software will be my test software.

Realistically, I could actually run the web browser and the test software on the same machine. I'd have fewer security issues if I did that and the URL would become http://localhost/my_fabulous_app/index.jsp.
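Here is a minimal sketch of that man-in-the-middle test software using nothing but the Python standard library. The upstream address is a placeholder, and a real version would also need POST, error handling, and header forwarding:

```python
import http.server
import urllib.request

class LoggingProxyHandler(http.server.BaseHTTPRequestHandler):
    """Forward every GET to the real site, but record what went by.
    Set `upstream` to the site under test before starting the server."""
    upstream = "http://testsite.example"  # hypothetical site under test
    log = []  # the wire tap: every path the browser asked for

    def do_GET(self):
        LoggingProxyHandler.log.append(self.path)
        # Fetch the same path from the real site...
        with urllib.request.urlopen(self.upstream + self.path) as resp:
            status = resp.status
            body = resp.read()
        # ...and hand the response back to the browser unchanged.
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the per-request console noise
```

Point the browser at this server instead of the real site and the log fills up with every request AJAX made behind the browser's back, timestamps and all, which is exactly what the test script needs to know when to check the right frame.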