Wednesday, March 23, 2011

a new type of source control - Git

At the new company they started with Subversion for source control but are now moving everything to Git.

I've worked with a few different source control servers and clients. The basic idea was to have a central repository which holds all the source code. You check in the source code to the central repository. Subsequent changes to the code are saved in the repository as changes or deltas.

If two or more people are working on the same project they have to set up a policy or convention. In some systems the team simply has to agree to follow the policy. With other source control systems you can set up rules to enforce it.

For example, I check out a copy of the code, you check out a copy of the code. We both make changes to the file Foo.java. You check in your code and the repository is updated. When I go to check in my change it tells me there is a conflict. I have to merge my changes with your change. Only after that is done can I check in my changes.

In this model source is handled file by file. If our project has 37 files and only one has a conflict, the other 36 files will check in without issue.

With Git things are different. There is no required central repository; every copy is a complete repository. I create a Git repository on my machine. I commit my changes to the local repository. When I want to share the changes with others I can push my commits to a central location or I can push them directly to someone on the team.

I clone a copy of the repository, you clone a copy of the repository, we both change Foo.java, you push your change, then I attempt to push my change. Git will tell me there have been changes and I cannot push to the central location. I need to pull the repository down and merge it with my changes. I can also use fetch rather than pull. A fetch only downloads the new commits; a pull does a fetch then a merge in one step.

The basic commands are:

git init
git clone
git status
git commit
git push
git pull
git rm

To create a repository on your local machine, use git init. If the central repository is located at the URL git:git@git-server:project.git then I can make a local copy using:

git clone git:git@git-server:project.git

Once I have a local copy I can use git status to see what has changed. The output of git status lists the files which have changed and the untracked files. Issuing a git commit -a will commit the changes to tracked files to my LOCAL repository; untracked files have to be added with git add first. To share this with others I would do a git push. If I'm in a directory which contains Git-controlled files, it will look at the .git/config file, determine where the "origin" repository is and push the commits there. You can also git push to another location (someone on your team or a sub-team working outside the main branch).

If you want to remove files from the repository you can use git rm. If you just rm a file, Git still tracks it: git status will report it as deleted and a git checkout of the file will bring it back. If you git rm then git commit, the file will be removed from your LOCAL repository from that commit on, although it stays in the history. A git push will make the removal apply for everyone else.

Tuesday, March 15, 2011

floating point numbers and representation error

Floating point numbers can be a great source of error. When we think about pure mathematics it includes the concept of infinity. For example, between the numbers 0 and 1 there are infinitely many fractions.

When you store a floating point number in memory it gets a fixed number of bits, so it can only take a limited number of values. This means that it is impossible to represent every real number exactly. Different languages have different ways to represent numbers. The more precise the floating point numbers are, the more expensive, computationally, using them becomes.

This means that operations like 1.47 - 1.00 might result in something like 0.46999999997 because binary floating point cannot represent 0.47 exactly and the result is the closest value it can store.

If the program you are testing is financial and it only deals with dollars and cents, this representation error can be problematic.
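To see the representation error for yourself, here is a small sketch in Java. The class and method names are my own, made up for illustration. It uses the well-known 0.1 + 0.2 case, plus BigDecimal to print the exact value a double really holds for the literal 0.47.

```java
import java.math.BigDecimal;

public class RepresentationError {
    // Returns true only if the double sum 0.1 + 0.2 equals the double literal 0.3.
    static boolean sumMatchesLiteral() {
        return 0.1 + 0.2 == 0.3;
    }

    // Returns the exact decimal expansion of the binary value stored in d.
    static String exactValue(double d) {
        return new BigDecimal(d).toString();
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);            // prints 0.30000000000000004
        System.out.println(sumMatchesLiteral());  // prints false
        // Prints a long string of digits close to, but not exactly, 0.47.
        System.out.println(exactValue(0.47));
    }
}
```

A test that compares monetary results with == is likely to trip over exactly this kind of difference.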

For the programmer there is a simple solution to this problem. Use only integers (long, or long long in C, if possible) and treat them as cents. So the above example would become:

long n1 = 147;           // $1.47
long n2 = n1 - 100;      // $1.47 - $1.00
long dollars = n2 / 100; // 0
long cents = n2 % 100;   // 47
System.out.printf("$%d.%02d", dollars, cents);

This means if you have a program which uses dollars and cents, you want to check for representation error (assuming the programmer used floating point variables) but you also want to check for issues which might be integer related. So you want to consider things like integer overflow and underflow. See the article Are most binary searches broken? for a discussion of overflow.
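As a rough sketch of the integer side of this, assume the programmer stored cents in an int rather than a long. The class and method names below are hypothetical. Integer.MAX_VALUE is 2147483647 cents, or $21,474,836.47, so one more cent is enough to overflow. Plain addition wraps around silently, while Math.addExact throws an ArithmeticException.

```java
public class CentsOverflow {
    // Plain int addition: wraps around silently on overflow.
    static int addWrapping(int a, int b) {
        return a + b;
    }

    // Checked addition: Math.addExact throws ArithmeticException on overflow.
    static int addChecked(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;              // 2147483647 cents = $21,474,836.47
        System.out.println(addWrapping(max, 1));  // prints -2147483648
        try {
            addChecked(max, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

As a tester, amounts right around that boundary are worth trying even if the user interface only ever shows dollars and cents.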

Saturday, March 5, 2011

Performance Testing and analogies

I was talking with a tester at my new company and mentioned how analogies can often help you look at things differently. Today this got me thinking about performance testing and scuba diving. Are there concepts or practices from scuba diving that can be applied to performance testing?

One of the rules for technical divers is called The Rule of Thirds. The idea behind this rule is that you should keep one third of your air as an emergency reserve. If I understand how much air I use for a given dive I should multiply it by 1.5 to handle unforeseen emergencies. For example, if I calculate my air consumption to require 80 cubic feet of air then I should take an additional 40 cubic feet of air. Thus 120 cubic feet of air and one third (40 cubic feet) held in reserve. If nothing goes wrong, I'll end the dive with 40 cubic feet of air. If an emergency happens, I have the extra air.

So how does this relate to performance testing? Usually when you are doing performance testing, you have a set of criteria. For example, one criterion will be that there will be 450 concurrent users. What happens if some unforeseen situation occurs? You do all your testing with 450 concurrent users and it passes with a nice margin. You throw it into production and something no one anticipated occurs. Your application gets a huge hit. Apply The Rule of Thirds: multiply 450 by 1.5 and use 675 concurrent users. If your tests pass with 675 concurrent users you know it will pass with 450 concurrent users and you are prepared if some unforeseen circumstance occurs.

If your application is, say, a web store and some celebrity mentions your product on the red carpet at the Oscars, do you want your web site to go down when 50% more users than you could have ever anticipated try to buy your product?
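The arithmetic is small enough to sketch in Java. The helper name is hypothetical, and it takes the 1.5 multiplier from the diving example literally: the expected load should end up being two thirds of the load you test with. It sticks to integer math and rounds up.

```java
public class RuleOfThirds {
    // Hypothetical helper: expected load times 1.5, rounded up, using only integers.
    static int testLoad(int expectedUsers) {
        return (expectedUsers * 3 + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(testLoad(80));   // prints 120, the air example
        System.out.println(testLoad(450));  // prints 675
    }
}
```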