Tuesday, April 17, 2007

Going from small programs to large programs

Whenever I'm interviewing someone (or being interviewed), I like to know how many lines of code the other person has written for a single project. I'm not looking for a magic number. People tend to fall into two groups: those who have written a few hundred to a thousand lines of code, and those who have worked on something in the tens of thousands.

The reason for asking is that you can be a junior programmer and still survive programming a few hundred lines of code.

The trick to programming thousands of lines of code is DON'T. When junior programmers write a program they tend to write the entire program at once. If you are programming 100 lines of code, you can keep the entire concept in your head. Trying to remember 500,000 lines of code would be impossible for all but a few people.

The way you do it is to take the program and break it into sub-programs. You keep breaking it down until you have 5000 small snippets of code. Then you write one of those snippets.

For example, I assigned a co-op student to write a small Bourne shell script. Our product builds in parts and has dependencies. The build system puts all the build output in a specific directory (let's call it $BUILD_DIR). The structure is:

$BUILD_DIR/$PRODUCT/$BRANCH/$BUILD/

What I wanted was a script where the user specifies the product, branch and build. The script would then scan the build log for references to any other product in $BUILD_DIR.

The co-op student wrote a getopts loop to get the inputs from the user. Inside the loop was a case statement with a branch for each input (product, branch, build, help). In each branch was an if/else statement for whether or not the needed input was supplied. If it was not supplied, there was a loop to list all the possible inputs.

As you can see, the code to get input, parse it, deal with it, etc. is all in one loop/case/if/else/loop structure.

How could this be written more simply?

# Check that $BUILD_DIR is defined and exists

# Get the user input
# Save the product in $PRODUCT
# Save the branch in $BRANCH
# Save the build in $BUILD

# if $BUILD_DIR/$PRODUCT is not defined or does not exist
#     list possible inputs for product
#     exit

# if $BUILD_DIR/$PRODUCT/$BRANCH is not defined or does not exist
#     list possible inputs for branch
#     exit

# if $BUILD_DIR/$PRODUCT/$BRANCH/$BUILD is not defined or does not exist
#     list possible inputs for build
#     exit

# build a list of all other products (omit the current product)

# search $BUILD_DIR/$PRODUCT/$BRANCH/$BUILD for references to anything from
# list of all other products

# print the results

Each break is a separate concept. I would program one at a time. First I am going to write the check for $BUILD_DIR, and I'm going to think about all the possible problems. The variable could be undefined; check for that. The variable could have the wrong value; check for that. The directory might not be readable by me; check for that. I'd keep thinking of things like this. Once I am positive $BUILD_DIR will hold a good value, I forget about it and focus on getting input from the user. I'm just going to get input from the user. I'm not going to validate that it is good input. I'm just going to parse the command line and save all the inputs. Once I have written that, perfectly, I'll move on to validating $PRODUCT. This will be similar to validating $BUILD_DIR. Maybe the code to validate $BUILD_DIR should be a subroutine that I can use to validate $PRODUCT as well.
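
To make the first chunk concrete, here is a minimal sketch of a directory check written as a subroutine. It is only one way this could look, not the script from the story: the function name check_dir and its error messages are made up for illustration, and it covers only the three problems mentioned above (empty value, not a directory, not readable).

```shell
#!/bin/sh
# Sketch only: check_dir is a hypothetical helper name, not part of the
# original script.
#
# check_dir LABEL PATH
#   Returns 0 if PATH is a non-empty string naming a directory that can be
#   read and searched. Otherwise prints a reason to stderr and returns 1.
check_dir() {
    label=$1
    path=$2

    # The variable could be undefined or empty.
    if [ -z "$path" ]; then
        echo "$label is not set" >&2
        return 1
    fi

    # The variable could hold the wrong value.
    if [ ! -d "$path" ]; then
        echo "$label: '$path' is not a directory" >&2
        return 1
    fi

    # The directory might not be readable by the current user.
    if [ ! -r "$path" ] || [ ! -x "$path" ]; then
        echo "$label: '$path' is not readable" >&2
        return 1
    fi

    return 0
}
```

Under these assumptions the same helper could be reused for each level, for example `check_dir BUILD_DIR "$BUILD_DIR" || exit 1` and later `check_dir PRODUCT "$BUILD_DIR/$PRODUCT" || exit 1`.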

Broken down into small, manageable chunks, the job is just writing a bunch of small code snippets. If you can write one code snippet then writing a dozen is possible.
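
The "get the user input" chunk from the outline might be sketched like this. It only parses the command line and saves the values; it deliberately does no validation, since that belongs to the later chunks. The function name parse_args and the option letters -p, -b and -n are assumptions made for this example, not the options the real script used.

```shell
#!/bin/sh
# Sketch only: parse_args and the -p/-b/-n/-h option letters are
# hypothetical choices for illustration.
#
# parse_args "$@"
#   Saves the product, branch and build in PRODUCT, BRANCH and BUILD.
#   Returns 0 on success, 1 on an unknown option, 2 if help was requested.
parse_args() {
    PRODUCT=
    BRANCH=
    BUILD=
    OPTIND=1    # reset so the function can be called more than once

    while getopts "p:b:n:h" opt; do
        case $opt in
            p) PRODUCT=$OPTARG ;;
            b) BRANCH=$OPTARG ;;
            n) BUILD=$OPTARG ;;
            h) echo "usage: $0 -p product -b branch -n build" >&2
               return 2 ;;
            *) return 1 ;;
        esac
    done

    return 0
}
```

A script built this way would call `parse_args "$@" || exit 1` once, and every later chunk would work only with $PRODUCT, $BRANCH and $BUILD.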

It is good to get into this habit with small programs. If you practise this technique on small programs then writing the large ones will come naturally.
