Now, as promised, a closer look at the pipe, list, and redirection
characters and their functionality.
Pipes are a UNIX feature that lets you connect several commands
together in one line and pass data from one to the next much like
a chain of firemen sending buckets of water down a line. The data
in the bucket is processed by each command and then passed on
to the next command without ever coming up for air. This happens
because of two things:
- Most UNIX commands get input from stdin
and pass output to stdout
- The pipe symbol (|) directs UNIX to connect stdout
from the first command to the stdin of the second command.
So that sounds simple. How does it work in practice, and what does
a command pipe look like? The example below shows several pipes
made up of groups of commonly piped commands. You will see examples
of these syntax structures somewhere in most scripts.
line_count=`wc -l $filename | cut -c1-8`
process_id=`ps -ef \
| grep $process \
| grep -v grep \
| cut -f1 -d\ `
upper_case=`echo $lower_case | tr '[a-z]' '[A-Z]'`
In all cases the pipeline has been used to set a variable to the
value returned by the last command in the pipe. In the first example,
the wc -l command counts the number of lines in
the file named by the variable $filename. Its
output is then piped to the cut command,
which keeps only the first 8 characters and passes them on to stdout,
hence setting the variable line_count.
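Cutting by column position relies on wc padding its count to a fixed width, and some versions of wc do not. As a minimal sketch of an alternative, the pipe below picks the count out by field instead. The sample file is made up for the illustration, and mktemp and awk are assumed to be available on your system.

```shell
# Hypothetical sample file so the sketch is repeatable.
filename=`mktemp`
printf 'one\ntwo\nthree\n' > "$filename"

# wc -l prints the count followed by the file name;
# awk keeps only the first blank-separated field, whatever the padding.
line_count=`wc -l "$filename" | awk '{ print $1 }'`
echo "line_count=$line_count"

rm -f "$filename"
```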
In the second example, the pipeline has been folded using the
backslash and we are searching for the process_id
or PID of an existing command running somewhere on the system.
The ps -ef command lists the whole process table
from the machine. Piping this through to the grep command
will filter out everything except any line containing our wanted
process string. This will not return just one line, however,
as the grep command itself also has the process
string on its command line. So by passing the data through a second
grep -v grep command, any lines containing the word
grep are also filtered out. We now have just the one
line we need and the last thing is to get the PID from the line.
The example takes the PID to be the first field on the line,
so piping through a version of cut using the field option,
we finally get the PID we are looking for. Note the field option
delimiter character is an escaped tab character here. Always test
the blank characters that UNIX commands return; they are not always
what you would think they are. Many versions of ps -ef, for example,
pad their columns with spaces and print the UID ahead of the PID,
in which case the field number and delimiter need adjusting.
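To see the filtering steps at work without depending on what happens to be running, here is a small sketch that feeds the same pipe from a hard-coded listing. The sample_ps function and its contents are invented for the illustration; they imitate a ps -ef layout in which the UID comes first and the PID second, so awk is used to pick the second blank-separated field.

```shell
# Hypothetical stand-in for `ps -ef`, hard-coded so the result is repeatable.
sample_ps() {
    printf '%s\n' \
        'UID        PID  PPID  C STIME TTY          TIME CMD' \
        'root         1     0  0 09:00 ?        00:00:01 /sbin/init' \
        'alice     4321     1  0 09:05 ?        00:00:00 /usr/bin/mydaemon --quiet' \
        'alice     4400  4390  0 09:06 pts/0    00:00:00 grep mydaemon'
}

process=mydaemon

# Keep the lines naming the process, drop the grep itself,
# then print the second field, which holds the PID in this layout.
process_id=`sample_ps \
    | grep "$process" \
    | grep -v grep \
    | awk '{ print $2 }'`

echo "process_id=$process_id"
```

On a real system you would replace sample_ps with ps -ef and check the column order first.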
You should be able to work out the last example yourself based
on just the variable names alone. Note the shorthand version of
the complete alphabet we used for the tr in
Example Function Syntax.
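If you want to check your answer, this short sketch runs the third pipe on a made-up value of lower_case.

```shell
# Hypothetical input value for the illustration.
lower_case="shell scripting"

# echo writes the string to stdout; tr maps each lower case letter to upper case.
upper_case=`echo $lower_case | tr '[a-z]' '[A-Z]'`

echo "$upper_case"
```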
Lists look similar to pipes except the pipe symbol
'|' is replaced by one of the following list
symbols between each command in the list: ';',
'&', '&&',
or '||', and optionally terminated by ';'
or '&'. The semi-colon character is treated by the
shell like the end of a line, so a list of commands separated
by semi-colons behaves in much the same way as a list of commands
on separate lines would behave (hence the name). The difference
is that all the list types can be executed in the current shell
or in a sub-shell by utilising a slightly different syntax and
the output from the completed list can be redirected (see below)
or piped (see above). The two syntax forms for shell locations
of lists are shown in Current Shell and
Sub-Shell below. Hint - It's all in the
brackets.
{ command; command; command; }
( command; command; command; )
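The practical difference between the two forms shows up when a list changes a variable. The following sketch, using made-up variable names, sets the same variable inside each kind of bracket and records what the surrounding script sees afterwards. It also pipes the output of a whole list through sort.

```shell
value=outer

# Sub-shell: the assignment is lost when the parentheses close.
( value=changed_in_subshell; echo "inside ( ): $value" )
after_subshell=$value

# Current shell: the assignment is still visible after the closing brace.
{ value=changed_in_braces; echo "inside { }: $value"; }
after_braces=$value

# The output of a complete list can be piped on as a single stream.
sorted=`{ echo pear; echo apple; } | sort`

echo "after ( ): $after_subshell"
echo "after { }: $after_braces"
echo "$sorted"
```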
The other list symbols change the way the list is processed and
they have the following meanings:
- & Asynchronously executes the preceding pipeline
(as a background task)
- && Executes the following command only if the preceding command or pipe
terminated with zero exit status (i.e. it exited okay)
- || Executes the following command only if the preceding command or pipe terminated
with non-zero exit status (i.e. it failed)
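A quick way to get a feel for these symbols is to try them with the true and false commands, which do nothing except exit with a zero and a non-zero status respectively. This is only a sketch, and the variable names are invented for the illustration.

```shell
# && runs the next command only when the previous one succeeded.
true && and_result="ran after success"

# || runs the next command only when the previous one failed.
false || or_result="ran after failure"

# Here the left side fails, so the assignment on the right is skipped.
skipped="not run"
false && skipped="run"

# & starts the command in the background; $! holds its process id
# and wait pauses until that background task has finished.
sleep 1 &
background_pid=$!
wait $background_pid
background_status=$?

echo "$and_result / $or_result / $skipped / $background_status"
```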
The matter of redirecting input and output follows a similar principle
to that of piping. The significant differences are that redirects
work with files, not commands, and there is a limit to how many
you can put on one line, which depends on the file descriptors available.
Whereas the pipe connects one command's output to the next command's
input, a redirect tells a command to put its output into a file
or collect its input from a file. There is also a difference in
the syntax due to the way UNIX executes its commands.
Normally UNIX will try and find an executable file somewhere on
the command path ($PATH variable) which matches the first word
on the current command line. So the first command in a pipe gets
found, then executed, and data is piped on to the next command.
Because a redirect names a file and not a command, the file name
cannot simply take the command's place at the start of the line. Take another
look at the last pipe in the above example. Rewriting this command
as a redirect would give the following:
tr '[a-z]' '[A-Z]' < $in_file > $out_file
Now you can see the difference. The command comes first, the
in_file is directed in by the less_than sign (<)
and the out_file is pointed at by the greater_than
sign (>). Each redirect takes a single file name, so the
in_file cannot be a wild card that matches several files; to
feed several files to one command, join them with cat and pipe the
result in. The out_file must likewise be a single file. Just remember the in_file
points its arrow at the command, while the out_file gets
pointed at.
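Here is the same redirect as a complete sketch you can run. The two files are temporary ones created just for the illustration, and mktemp is assumed to be available.

```shell
# Hypothetical input and output files for the illustration.
in_file=`mktemp`
out_file=`mktemp`
echo "hello redirects" > "$in_file"

# tr reads its stdin from in_file and writes its stdout to out_file.
tr '[a-z]' '[A-Z]' < "$in_file" > "$out_file"

result=`cat "$out_file"`
echo "$result"

rm -f "$in_file" "$out_file"
```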
The redirect arrows can also be doubled up as in the next example.
Here the output from the cat command goes to a file as before. The
double greater_than (>>)
directs the output to be appended to the file, if it already exists.
If the file does not exist, it is created. The single arrow form,
as used above, would always create a new file if there was none
there, or overwrite an existing file.
On the input side the double arrow has a slightly different meaning.
Here, where the single arrow gets the input from a file, the double
arrow gets its input from the shell file that is currently executing.
You may be wondering how the shell can tell where the end of this
input is and where the continuing shell script restarts. Well,
that is the reason for the word following the double less_than
sign. The word is a marker that the shell will look out for when
reading the input stream. When the word shows up, input will stop
and the script will continue.
cat >> $out_file << EOF
first line of data
second line of data
more data
the end of the data
EOF
In this case I have used the flag word EOF to indicate the End
Of File. However, any word will be acceptable as long as it does
not appear on a line by itself within the data. What I generally do is use EOA, EOB,
EOC, etc., if I create several files within a script. Upper case
is not required, but the two occurrences of the flag must match exactly,
and capitals make the flags easy to match up as a pair when reading the script.
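To tie the single arrow, the double arrow and the flag words together, here is a small sketch that builds one file in three steps. The file is a temporary one made up for the illustration, and the flag words EOA and EOB follow the naming habit described above.

```shell
out_file=`mktemp`

echo "old contents" > "$out_file"
# The single arrow overwrites whatever was in the file.
echo "header" > "$out_file"

# The double arrow appends; input is read from the script up to the flag word.
cat >> "$out_file" << EOA
first line of data
second line of data
EOA

cat >> "$out_file" << EOB
more data
the end of the data
EOB

# tr strips any padding that wc may put in front of the count.
line_count=`wc -l < "$out_file" | tr -d ' '`
first_line=`head -n 1 "$out_file"`
echo "$line_count lines, starting with: $first_line"

rm -f "$out_file"
```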
There is one more thing you can do with this redirected input
from a script file. Look at this next example and you will see
a minus sign between the double arrows and the flag name:
cat >> $out_file <<-EOF
This instructs the shell to remove leading tab characters from
all lines in the input stream, including the line holding the matching flag word.
This handy trick allows you to indent the data like code, which makes reading
much easier, as in the following example. It is a copy of the
previous code segment, but in the new, easier to read format.
cat >> $out_file <<-EOF
	first line of data
	second line of data
	more data
	the end of the data
EOF
Now it looks more like a code block and the flag
word stands out too. The section which is indented
is easily understood to be the contents of the created file. This
feature is not available in the C Shell.
In addition, the redirect arrows can redirect input and
output to and from the stdio files, known
by their descriptor numbers (0, 1 and 2). These are the default input
and output files for UNIX, usually connected to the keyboard (0),
the display (1) and the error output (2). Thus the syntax:
<&digit
uses the file associated with file descriptor digit as
the standard input. The same goes for standard output if you reverse
the arrow. You can also associate one file with another as in
this example:
ls -l $directory/*.log > $out_file 2>&1
Here we see an ls command outputting to out_file.
At the end, however, is another redirect which indicates that
stderr (file descriptor 2) should also be
sent to stdout (file descriptor 1), which in this
case is our out_file. To say the same thing in C Shell
the syntax looks simpler, but is harder to read because the descriptor
numbers are missing:
ls -l $directory/*.log >& $out_file
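The sketch below shows the Bourne shell descriptor syntax at work in a form you can run anywhere. Instead of ls it uses a made-up function, emit_both, which writes one line to stdout and one to stderr, so the result does not depend on which files exist. It also shows that the order of the redirects matters, since the shell processes them from left to right, and it uses the <&digit form to read from a file opened on descriptor 3.

```shell
out_file=`mktemp`
wrong_order=`mktemp`

# Hypothetical helper: one line to stdout and one line to stderr.
emit_both() {
    echo "to stdout"
    echo "to stderr" 1>&2
}

# stdout is pointed at the file first, then stderr is attached to stdout,
# so both lines land in out_file.
emit_both > "$out_file" 2>&1
both_count=`wc -l < "$out_file" | tr -d ' '`

# Here stderr is attached to the old stdout before stdout is moved,
# so only the stdout line reaches the file.
emit_both 2>&1 > "$wrong_order"
wrong_count=`wc -l < "$wrong_order" | tr -d ' '`

# <&digit in action: open the file on descriptor 3, read a line from it, close it.
exec 3< "$out_file"
read first_line <&3
exec 3<&-

echo "both_count=$both_count wrong_count=$wrong_count first_line=$first_line"
rm -f "$out_file" "$wrong_order"
```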
Don't forget, you can use this mechanism to create files with
any content you like generated from any other combination of commands.
Here is an example of a menu file listing the files in a directory.
This can then be displayed to the screen and the user's choice
selected quite simply.
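count=0
# Build the menu file, one numbered line per file name
for file in `ls -1 $source_directory`
do
    count=`expr $count + 1`
    echo "$count: $file" >> $menu_file
done
echo "Please select a number from this menu"
cat $menu_file
# read takes a variable name, so there is no $ in front of choice
read choice
echo "Thanks"
# Find the chosen number in the menu and keep the text after the colon
filename=`grep $choice $menu_file | cut -f2 -d:`
echo "You chose [$filename]"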
This example is very simplistic, however, and will not cope with
filenames that contain digits or filename lists longer than 9
lines. Both of these conditions could lead to the grep
returning more than one line, which is an error condition (see
Simple Menu Functions for a
better solution to these problems).
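In the meantime, one possible way around both problems is to anchor the grep pattern so that it only matches a complete menu number at the start of a line. The sketch below is my own illustration of that idea and not necessarily the approach taken in Simple Menu Functions. The helper names build_menu and lookup_choice, the demo directory and its report files are all invented for the example.

```shell
# Hypothetical helper: write a numbered menu of the files in directory $1 to file $2.
build_menu() {
    count=0
    : > "$2"                      # start from an empty menu file
    for file in `ls -1 "$1"`      # like the original, assumes no blanks in file names
    do
        count=`expr $count + 1`
        echo "$count:$file" >> "$2"
    done
}

# Hypothetical helper: print the file name stored against menu number $1 in file $2.
lookup_choice() {
    # ^ and : tie the match to the whole menu number, so 1 no longer
    # matches 10 or a digit inside a file name.
    grep "^$1:" "$2" | cut -f2- -d:
}

# Twelve demo files with digits in their names, to exercise both problem cases.
demo_dir=`mktemp -d`
menu_file=`mktemp`
for n in 1 2 3 4 5 6 7 8 9 10 11 12
do
    : > "$demo_dir/report$n.log"
done

build_menu "$demo_dir" "$menu_file"
menu_lines=`wc -l < "$menu_file" | tr -d ' '`
matches=`lookup_choice 1 "$menu_file" | wc -l | tr -d ' '`
filename=`lookup_choice 1 "$menu_file"`
echo "You chose [$filename]"

rm -rf "$demo_dir" "$menu_file"
```

In a real script the number passed to lookup_choice would come from read choice, as in the example above.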