QA & Testing: April 2007

Friday, April 27, 2007

Sun is not the only vendor of Java

Many people who know about the Java programming language only know about the Sun implementation. But there are actually different vendors. There is:

IBM
HP
Apple
BEA
Blackdown

If you are programming Java on the AIX operating system (IBM's version of UNIX) then you would be using IBM Java. If you are programming Java on the HP-UX operating system (Hewlett-Packard's version of UNIX) then you would be using HP Java. Similarly, if you are programming Java on MacOS X then you are using Apple Java. BEA is not a creator of an operating system. They create a J2EE application server called WebLogic. It typically ships with the Sun version of Java and BEA's version of Java. The BEA version is called JRockit. Finally, Blackdown is an implementation associated with Linux.

The idea behind all these different implementations of Java is that they are better in some way. You should get better performance on your web applications if you use JRockit on the BEA WebLogic application server. If you are running Linux, the Blackdown implementation should give you better performance.

If you don't have access to HP-UX, AIX or MacOS X then you will not have the opportunity to use the OS manufacturers specific version. If you want though, you can download JRockit from BEA. Go to the BEA website, select Products then select JRockit. On the main JRockit page is an option to download JRockit for free. You can get Blackdown for free from the Blackdown website.

Wednesday, April 25, 2007

Identifying UNIX versions

I work in an environment with numerous different versions of UNIX and Linux. Sometimes I'll be accessing multiple machines from my workstation. Occasionally, I need to confirm the OS for the current terminal. The way to determine which version of UNIX you are using is with:

uname -a

For Solaris you would get something like:

SunOS rd-r220-01 5.8 Generic_117350-26 sun4u sparc SUNW,Ultra-60

For HP-UX you would get something like:

HP-UX l2000-cs B.11.11 U 9000/800 158901567 unlimited-user license

HP-UX rdhpux04 B.11.23 U ia64 0216397005 unlimited-user license

For AIX you would get something like:

AIX rd-aix09 2 5 00017F8A4C00

From this it is a little harder to see the version. It is actually AIX 5.2. If you check the man page for uname it will help you decode the hexidecimal number at the end. This will tell you things like 4C is the model ID and the 00 is the submodel ID. Additionally, AIX uses other switches to tell you about things the -a normally gives you on other platforms. For example,

uname -p # the processor architecture
uname -M # the model

For Linux things are a little tricker. The uname -a will tell you it is Linux but it will not tell you if it is SuSE Linux Enterprise Server (SLES) 10.0, Redhat AS 5.0, et cetera. To figure this out, look for a text file in /etc/ which ends in 'release', i.e.

cat /etc/*release

This text file will tell you which distribution of Linux you are using.

Tuesday, April 17, 2007

Going from small programs to large programs

Whenever I'm interviewing someone (or being interviewed) I like to know how many lines of code you have created for one project. I'm not looking for a magic number; people tend to have either programmed a few hundred to a thousand lines of code and others will have worked on something in the tens of thousands.

The reason for asking this is because you can be a junior programmer and still survive programming a few hundreds lines of code.

The trick to programming thousands of lines of code is DON'T. When junior programmers write a program they tend to write the entire program at once. If you are programming 100 lines of code, you can keep the entire concept in your head. Trying to remember 500,000 lines of code would be impossible for all but a few people.

The way you do it is to take the program and break it into sub-programs. You keep breaking it down until you have 5000 small snippets of code. They you write one of those snippets.

For example, I assigned a co-op student to write a small Bourne shell script. Our product builds in parts and has dependencies. The build system puts all the build output in a specific directory (let's call it $BUILD_DIR). The structure is:

$BUILD_DIR/$PRODUCT/$BRANCH/$BUILD/

What I wanted for the script is for the user to specify the product, branch and build. Then the script would scan the build log for references to any other product in $BUILD_DIR.

The co-op student wrote a getops loop to get the inputs from the user. Inside the loop was a case statement for each input (product, branch, build, help). In each case statement was an if/else statement for, if you did or didn't get the needed input. If you did not get the needed input was a loop to list all the possible inputs.

As you can see, I'm writing the code to get input, parse it, deal with it, etc. all in one loop/case/if/else/loop structure.

How could this be written easier?

# Check that $BUILD_DIR is defined and exists

# Get the user input
# Save the product in $PRODUCT
# Save the branch in $BRANCH
# Save the build in $BUILD

# if $BUILD_DIR/$PRODUCT is not defined or does not exist
# list possible inputs for product
# exit

# if $BUILD_DIR/$PRODUCT/$BRANCH is not defined or does not exist
# list possible inputs for branch
# exit

# if $BUILD_DIR/$PRODUCT/$BRANCH/$BUILD is not defined or does not exist
# list possible inputs for build
# exit

# build a list of all other products (omit the current product)

# search $BUILD_DIR/$PRODUCT/$BRANCH/$BUILD for references to anything from
# list of all other products

# print the results

Each break is a separate concept. I would program one at a time. I am going to write the check for $BUILD_DIR. I'm going to think about all the possible problems. The variable could be undefined, check for that. The variable could have the wrong value, check for that. The directory might not be readable by me, check for that. I'd keep thinking of things like this. Once I am positive $BUILD_DIR will hold a good value, I forget about it and focus on getting input from the user. I'm just going to get input from the user. I'm not going to validate it is good input. I'm just going to parse the command line and save all the inputs. Once I have written that, perfectly, I'm move on to validating the $PRODUCT. This will be similar to validating the $BUILD_DIR. Maybe the code to validate $BUILD_DIR should be a subroutine and I can use it to validate $PRODUCT as well.

By breaking it down into small, manageable chunks it is just writing a bunch of small code snippets. If you can write one code snippet then writing a dozen is possible.

It is good to get into this habit with small programs. If you practise this technique on small programs then writing the large ones will come naturally.

Tuesday, April 3, 2007

Planning a multi-threaded application

I have used analogies to teach programming to people for over two decades. A while ago I realized that analogies aren't only good for teaching but sometimes you can apply the concepts from something completely different to programming (and you can apply programming concepts to other aspects of life).

For example, I used to talk about Relational Database Management Systems (RDBMS) using file cabinets, drawers, folders, index cards, etc. and I realized that systems libraries used when I was a kid (before computers were affordable) actually worked and the concepts helped to form RDBMS we see today. In other words, before computers existed, secretaries, clerks, librarians, etc. had to keep track of data (books, client information, etc.). If I wanted to search for a book by author they had a set of index cards sorted by author. If I wanted to search for a book by title they had a set of index cards sorted by title. The actual books were sorted, on the shelves, according to the Dewey Decimal System. If a book was added to the library, they had to insert it on a shelf (leave room for new books so you don't have to shift the entire collection) and an index card for each index was created. Same sort of thing happens for a database. You insert the record for the book in the database and then you update all the indices.

So, how does this relate to multi-threaded applications? Well, we just need to look at a multi-threaded application differently. A professor at U of T used things like Human, Male, Female, Baby, etc. or Animal, Mammal, etc. to teach object oriented design (OOD). I learned to used project management software; people were 'resources'. You planned the activities of resources. You created dependencies between two resources, e.g. Karen might be working on something but she need work from Bob opn day 4. So at day 4 there is a dependency between Bob and Karen. If the work Bob is doing will take 9 days, I want Bob to start 5 days before Karen or I want someone to help Bob so it will get done early or I want work for Karen to do while she is waiting for Bob to finish. Basic techniques to ensure my 'resources' are used efficiently.

What if we look at the threads in an application as 'resources' and use project planning software to make sure all the different threads are working together well. Or what about systems with multiple processors. Each processor could be a resource and I want to make sure they are utilitized efficiently. If the math co-processor will take 9 seconds to complete a computation and the main cpu will need the result in 4 seconds, I have the same options as with Karen and Bob. I can start the math computation 5 seconds before the cpu needs it, I can have 5 seconds of other work for the cpu while it waits for the math computation, I can put more math co-processors. As I type this I also realized, I could put a faster math co-processor. This means for the human scenario, I can put someone who is faster than Bob on the task so it will get completed in 4 days rather than 9 days.

So for everyone who thinks learning how to use project management software is only for managing project, think again.

QA & Testing

Google Analytics

Search