
Showing posts with label Bourne. Show all posts

Tuesday, January 17, 2012

Keeping data and code in one shell script file

In the past I have written scripts which required a lot of data. I could have the script read a second file with all the data in it but occasionally I'd lose the data file and still have the script. Or I'd have dozens of scripts and data files but no idea which data went with which script.

Solution: put the script and the data into one file. The script could then read the data from itself. Here is an example script:

#!/bin/sh

# Read the data section of this same file ($0): print everything from the
# "exit" line to the end of file, drop the "exit" line itself, drop blank
# lines and comment lines, then turn spaces into underscores so the for
# loop sees one word per record.
for entry in $(sed -n -e '/^exit/,$ p' $0 | grep -v exit | sed -e '/^$/d' | sed -e '/^#.*/d' | sed -e 's/ /_/g'); do
        entry=$(echo $entry | sed -e 's/_/ /g')        # underscores back to spaces
        firstname=$(echo $entry | awk -F, '{print $1}')
        lastname=$(echo $entry | awk -F, '{print $2}')
        number=$(echo $entry | awk -F, '{print $3}')
        echo "$firstname $lastname's phone number is $number"
done
exit

#FirstName,LastName,Phone
Darrell,Grainger,(416) 555-1212
John,Doe,(323) 555-1234
Jessica,Alba,(909) 555-9999

In this example, the data is a list of comma-separated fields. Let's examine the list in the for statement. The $0 is the file currently executing, i.e. the file containing both this script and the data.

The sed command prints everything from the line which starts with exit to the end of file. The grep command gets rid of the line which starts with exit. The next sed command discards all blank lines. The third sed command discards all lines which start with #. This allows us to start a line with # if we want to add comments or comment out a line of data.

The final sed command on the for statement replaces all spaces with underscores. This is because a line containing a space would otherwise be processed by the for statement as two separate records. I don't want that. I want to read one line as one record.

Inside the body of the for loop, the first line converts all the underscores back to spaces. If you want to have underscores in your data, this will not work. The solution is to pick a character which is not part of your data set. You can pick anything so long as the character you pick is the same in the for statement and in the first line of the loop body. The g in the sed statement is important in case there is more than one space.

The next three lines show how to break the line apart by commas. If you need to use commas in your data then pick a different character to separate the fields. The -F switch in the awk statement sets the field separator. So if you use an exclamation mark as a field separator, you need to change the awk statement to -F'!' (quoting it so an interactive shell does not treat it as history expansion).
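For example, here is one of the records from above rewritten with ! as the field separator (a hypothetical variant, shown just to illustrate the switch):

```shell
# Pull the third field out of an exclamation-mark-separated record.
printf '%s\n' 'Darrell!Grainger!(416) 555-1212' | awk -F'!' '{print $3}'
# (416) 555-1212
```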

The echo statement is just an example of using the data.
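To see the technique end to end, here is the same script-plus-data written out to a file (phonebook.sh is just an illustrative name) and run:

```shell
# Write the combined script and data to one file, then execute it.
cat > phonebook.sh <<'EOF'
#!/bin/sh
for entry in $(sed -n -e '/^exit/,$ p' $0 | grep -v exit | sed -e '/^$/d' | sed -e '/^#.*/d' | sed -e 's/ /_/g'); do
        entry=$(echo $entry | sed -e 's/_/ /g')
        firstname=$(echo $entry | awk -F, '{print $1}')
        lastname=$(echo $entry | awk -F, '{print $2}')
        number=$(echo $entry | awk -F, '{print $3}')
        echo "$firstname $lastname's phone number is $number"
done
exit

#FirstName,LastName,Phone
Darrell,Grainger,(416) 555-1212
John,Doe,(323) 555-1234
Jessica,Alba,(909) 555-9999
EOF
sh phonebook.sh
# Darrell Grainger's phone number is (416) 555-1212
# John Doe's phone number is (323) 555-1234
# Jessica Alba's phone number is (909) 555-9999
```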

Thursday, July 12, 2007

Bourne shell scripting made easy

Someone was having trouble writing a shell script. A common activity for Bourne shell scripting is to take the output from various commands and use it as the input for other commands. Case in point, we have a server that monitors clients. Whenever we get new monitoring software we have to use the server command line tool to install the cartridge on the server, create the agents for each client, deploy the agents, configure them and activate them.

The general steps are:

1) get a list of the cartridges (ls)
2) using a tool, install them (tool.sh)
3) using the same tool, get a list of agents
4) using the tool, get a list of clients
5) using the tool, for each client create an instance of each agent
6) using the tool, for each agent created deploy to the client
7) using the tool, configure the agents
8) using the tool, activate the agents

Just looking at the first two steps, if I were doing this by hand I would use ls to get a list of all the cartridges. I would then cut and paste the cartridge names into a command to install them.

So a Bourne shell script should just cut the same things out of the ls list.

If the cartridge files all end with the extension .cart I can use:
ls -1 *.cart

If the command to install a cartridge was:
./tool.sh --install_cart [cartridge_name]

I could use:
for c in `ls -1 *.cart`; do
./tool.sh --install_cart $c
done

This is pretty easy and straightforward. What if the command was not as clean as ls? What if the list of agents was something like:
./tool.sh --list_agents
OS: Linux, Level: 2.4, Version: 3.8, Name: Disk
OS: Linux, Level: 2.4, Version: 3.8, Name: Kernel
OS: Windows, Level: 5.1, Version: 3.8, Name: System

To install the agent I only need the Name. If I only wanted the Linux agents, how would I get just the Name? First, you want to narrow it down to the lines you want:
./tool.sh --list_agents | grep "OS: Linux"

This will remove all the other agents from the list and give me:
OS: Linux, Level: 2.4, Version: 3.8, Name: Disk
OS: Linux, Level: 2.4, Version: 3.8, Name: Kernel

Now I need to parse each line. Strictly speaking, the for loop splits its input on whitespace rather than on lines, but that will not matter once we narrow each line down to a single field with no spaces in it. If I use the above command in a for loop I can start with:
for a in `./tool.sh --list_agents | grep "OS: Linux"`; do
echo $a
done

Now I can try adding to the backtick command to narrow things down. The two ways I like to parse a line are with awk or cut. For cut I could use:
for a in `./tool.sh --list_agents | grep "OS: Linux" | cut -d: -f5`; do
echo $a
done

This will break each line at the colons. For the first line, cut would give the fields:
  1. OS
  2. Linux, Level
  3. 2.4, Version
  4. 3.8, Name
  5. Disk

The problem is there is a space in front of Disk. I can add a cut -b2-, which will give me from character 2 to the end, i.e. cut off the first character. What if there is more than one space? This is why I like to use awk. For awk it would be:
for a in `./tool.sh --list_agents | grep "OS: Linux" | awk '{print $8}'`; do
echo $a
done

For awk the fields would become:
  1. OS:
  2. Linux,
  3. Level:
  4. 2.4,
  5. Version:
  6. 3.8,
  7. Name:
  8. Disk

The spaces would not be an issue.
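Since ./tool.sh is specific to that monitoring product, here is the same pipeline with a shell function standing in for its output, so it can be tried anywhere:

```shell
# list_agents is a stand-in for ./tool.sh --list_agents.
list_agents() {
printf '%s\n' \
'OS: Linux, Level: 2.4, Version: 3.8, Name: Disk' \
'OS: Linux, Level: 2.4, Version: 3.8, Name: Kernel' \
'OS: Windows, Level: 5.1, Version: 3.8, Name: System'
}

# Keep the Linux agents, print field 8 (the name).
for a in `list_agents | grep "OS: Linux" | awk '{print $8}'`; do
echo $a
done
# Disk
# Kernel
```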

So by using backticks, pipes and grep I can narrow things down to just the lines I want, then pipe the result of grep to cut or awk to break each line apart and keep just the bits I want.

The only other command I like to use for parsing output like this is sed. I can use sed for things like:
cat file | sed -e '/^$/d'

The // is a regex pattern. The ^ means beginning of line. The $ means end of line. So ^$ would be a blank line. The d is for delete. This will delete blank lines.

Actually, let's give an example usage. I want to list all files in a given directory plus all subdirectories. I want the file size for each file. The ls -lR will give me a listing like:
.:
total 4
drwxrwxrwx+ 2 Darrell None   0 Apr 19 14:56 ListCarFiles
drwxr-xr-x+ 2 Darrell None   0 May  7 21:58 bin
-rw-rw-rw-  1 Darrell None 631 Oct 17  2006 cvsroots

./ListCarFiles:
total 8
-rwxrwxrwx 1 Darrell None 2158 Mar 30 22:37 ListCarFiles.class
-rwxrwxrwx 1 Darrell None 1929 Mar 31 09:09 ListCarFiles.java

./bin:
total 4
-rwxr-xr-x 1 Darrell None 823 May  7 21:58 ps-p.sh

To get rid of the blank lines I can use the sed -e '/^$/d'. To get rid of the path information I can use grep -v ":", assuming there are no colons in the filenames. To get rid of the directories I can use sed -e '/^d/d' because all directory lines start with a 'd'. So the whole thing looks like:
ls -lR | sed -e '/^$/d' -e '/^d/d' | grep -v ":"

But there is actually an easier answer. Rather than cutting out what I don't want, I can use sed to keep what I do want. The sed -n command will output nothing, BUT if the script has a 'p' command it will print the matching lines. So I want to use sed -n with the right 'p' commands. Here is the solution:
ls -lR | sed -n -e '/^-/p'

This is because all the files have '-' at the start of the line. This will output:
-rw-rw-rw-  1 Darrell None 631 Oct 17  2006 cvsroots
-rwxrwxrwx 1 Darrell None 2158 Mar 30 22:37 ListCarFiles.class
-rwxrwxrwx 1 Darrell None 1929 Mar 31 09:09 ListCarFiles.java
-rwxr-xr-x 1 Darrell None 823 May  7 21:58 ps-p.sh

I can now use awk to cut the file size out, i.e. awk '{print $5}'. So the whole command becomes:
ls -lR | sed -n -e '/^-/p' | awk '{print $5}'

If I want to add all the file sizes for a total I can use:
TOTAL=0
for fs in `ls -lR | sed -n -e '/^-/p' | awk '{print $5}'`; do
TOTAL=`expr $TOTAL + $fs`
done
echo $TOTAL

The expr will let me do simple integer math with the output.
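On modern POSIX shells (though not the original Bourne shell) there is also the built-in $(( )) arithmetic expansion, which avoids spawning an expr process for every addition. Using the four file sizes from the listing above:

```shell
# Sum a list of sizes with shell arithmetic instead of expr.
TOTAL=0
for fs in 631 2158 1929 823; do
TOTAL=$((TOTAL + fs))
done
echo $TOTAL
# 5541
```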


NOTE: you can use man to learn more about the various commands I've shown here:

  • man grep
  • man cut
  • man awk
  • man sed
  • man regex
  • man expr

The sed and awk commands are actually powerful enough to have entire chapters written on them. But the man page will get you started.

While you are at it, do a man man.

Enjoy!

Friday, March 23, 2007

Named pipes

If you are familiar with UNIX, you are familiar with pipes. For example, I can do:
ps -ef | sort | more

The ps command will output a list of all processes to stdout. Normally, this would be to the console window. The pipe (|) will tell UNIX to take the output of ps and make it the input to sort. Then the output from sort will become the input to more.

Without using pipes I could do:
ps -ef > temp_file1
sort < temp_file1 > temp_file2
rm temp_file1
more temp_file2
rm temp_file2

This is like using the pipe but instead we put the output of ps into temp_file1. Then we use temp_file1 as the input to sort and send the output to temp_file2. Finally, we use temp_file2 as the input to more. You should be able to see how this is a lot like the first example using pipes.

Now here is a third way using Named Pipes. To create a named pipe use:
mkfifo temp_file1

If you list this entry using ls -l you will see something like:
prw-r--r--   1 dgrainge staff          0 Mar 23 08:13 temp_file1

Notice the first letter is not - for a file or even d for a directory. It is p for a named pipe. Also the size of the 'file' is 0. To use the pipes we will need two shells:
# shell 1
mkfifo temp_pipe1
mkfifo temp_pipe2
ps -ef > temp_pipe1            # this will block so switch to shell 2

# shell 2
sort < temp_pipe1 > temp_pipe2 # this will block so switch back to shell 1

# shell 1
more temp_pipe2
rm temp_pipe1
rm temp_pipe2

The interesting thing about this example is that we needed two shells to do this. At first this might seem like a downside but the truth is, this is a positive. I can do something like:
# shell 1
mkfifo stdout
mkfifo stderr

# shell 2
more stdout

# shell 3
more stderr

# shell 1
sh -x some_script.sh 1> stdout 2> stderr

The -x will turn on trace. Debug information will be output to stderr. By using the named pipes, I can redirect the regular output to shell 2 and the debug information to shell 3.

Thursday, March 8, 2007

Extracting part of a log using Bourne shell

Someone recently asked me how to select a range of text from a log file. Because it was a log file, each line started with the date and time for each log entry.

She wanted to extract all the log entries from a start time to an end time. For example, all log entries from 08:07 to 08:16 on March 8th, 2007. The format for the timestamp would be:
2007-03-08 08:07:ss.sss [log message]

where ss.sss was the seconds and [log message] was the actual text message written to the log.

My solution, using Bourne shell, was to determine the first occurrence of "2007-03-08 08:07" using grep. The GNU grep command would be:
START=`grep -n -m1 "2007-03-08 08:07" logfile.log | cut -d: -f1`

The -n will prefix the results with the line number. The -m1 tells it to quit after the first match. The output is going to be something like:
237:2007-03-08 08:07:ss.sss [log message]

where 237 is the line number. So the cut -d: will break the line at the colons and the -f1 will take the first field, i.e. 237.

Next you want to find the last occurrence of 08:16. I would suggest looking for 08:17 using the same grep command, e.g.
END=`grep -n -m1 "2007-03-08 08:17" logfile.log | cut -d: -f1`


The reason you want to look for the value after the real END time is because a log might have many entries for 08:16. By looking for 08:17 we know we have captured all the entries for 08:16 rather than just the first entry.

This will give us the line AFTER the line we want, so we do the following to decrement it by one:
END=`expr $END - 1`

Now we want to extract everything from START to END in the log. We start by extracting everything from 1 to the END using the head command:
head -n $END logfile.log

Now we want to trim off everything before line START. For that we can use the tail command, but the tail command wants to know how many lines are to be kept. We want to keep lines START through END, which is $END - $START + 1 lines. So:
LINES=`expr $END - $START + 1`

Finally we would have:
head -n $END logfile.log | tail -n $LINES

and this will display only the lines from 08:07 to 08:16 on March 8, 2007.
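Putting the whole recipe together against a small made-up log (the file name and log messages are invented for illustration; the -m1 flag requires GNU grep):

```shell
# A five-line stand-in for the real log file.
cat > logfile.log <<'EOF'
2007-03-08 08:06:59.123 starting up
2007-03-08 08:07:01.456 first entry we want
2007-03-08 08:12:33.789 a middle entry
2007-03-08 08:16:59.999 last entry we want
2007-03-08 08:17:02.000 past the end of the range
EOF

START=`grep -n -m1 "2007-03-08 08:07" logfile.log | cut -d: -f1`
END=`grep -n -m1 "2007-03-08 08:17" logfile.log | cut -d: -f1`
END=`expr $END - 1`
LINES=`expr $END - $START + 1`
head -n $END logfile.log | tail -n $LINES
# 2007-03-08 08:07:01.456 first entry we want
# 2007-03-08 08:12:33.789 a middle entry
# 2007-03-08 08:16:59.999 last entry we want
```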

Sunday, December 24, 2006

Debugging Bourne shell scripts

When most people start programming Bourne shell scripts, the scripts are VERY small. But if you take over someone else's work, or you have been programming shell scripts for any significant length of time, you will eventually have to find a bug in a script thousands of lines long.

Originally you might have put a few echo statements to see what was going wrong with a script. With thousands of scripts, scripts calling scripts, aliases set to scripts, system commands written as scripts, etc. it can get quite difficult to find a bug using echo statements.

The solution is to run the script with the -x flag. If you have the script starting with:

#!/bin/sh

then you can edit it to be:

#!/bin/sh -x

Or better yet, if the script is run using:

./myscript.sh

you can run it with:

sh -x ./myscript.sh

Running the script with this option will let you see each line as it is executed. You will also see variables getting expanded. With this output the echo statements should be unnecessary. Try running a small script with this option and see what it outputs.
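As a quick demonstration, here is a made-up two-line script run with trace turned on (the exact formatting of the trace lines varies between shells):

```shell
# A tiny script to trace.
cat > hello.sh <<'EOF'
#!/bin/sh
NAME=world
echo "hello $NAME"
EOF

sh -x hello.sh
# The trace goes to stderr, one line per command, e.g.:
#   + NAME=world
#   + echo hello world
# while stdout still carries the normal output:
#   hello world
```

Note how the trace shows $NAME already expanded in the echo line; this is what makes -x so useful for seeing what a variable actually contained.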