10 Plumbing

As mentioned in passing several times above (see section 5), Unix is designed as a modular system: many small programs, each doing one job well. On their own these small programs are of limited use; their real power appears once you see the ways Unix lets you glue them together.

10.1 Pipes

The pipe character ‘|’ (a single vertical bar, found on most US keyboards by pressing the backslash key with shift held down) is arguably the single most useful feature of Unix systems. It has already been encountered in passing (see sections 7.4 and 9.4), but its use has never been fully explained.

Simply put, the pipe sends the output of one program into the input of another. Formally, it connects the standard output (stdout) of one process to the standard input (stdin) of the next and carries a “stream of bytes” between them. This idea is best shown through an example:

% ls | grep 'a'

This runs the “ls” command on the current directory, but instead of sending its output to your terminal it sends it through the pipe and into the input of “grep”. The grep command searches streams of text for strings, and here it is searching for the string ‘a’. When its output goes into a pipe rather than a terminal, ls prints one name per line, so the result is every name in the current directory that contains the letter ‘a’.
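
To try this without touching your own files, the following sketch makes a scratch directory with mktemp -d (assumed to be available on your system) and fills it with three sample files whose names are made up for illustration. It is written as a script for a POSIX sh such as /bin/sh, not for the csh-style % prompt used above, since the variable assignments and $(...) are sh syntax.

```shell
#!/bin/sh
# Work in a throwaway directory (mktemp -d is assumed to exist on your system).
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical sample files.
touch apple banana cherry

# ls writes one name per line into the pipe; grep keeps only lines containing 'a'.
matches=$(ls | grep 'a')
echo "$matches"    # apple and banana, one per line

# Tidy up.
cd / && rm -rf "$scratch"
```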

While this may not seem very useful yet, remember that you can also do things like this:

% who | less

Which sends the output of “who” into your pager (less), making long output easier to read.

Virtually any number of pipes can be placed between commands to build fairly complex filters. For example, if you wanted to see a list of a certain user’s logins, you could use a line like this:

% last | grep 'frank' | less

Which would print the last list (see section 2.8), send it through grep to search for the string “frank”, and finally pipe any matching lines into less so they can be viewed with a pager.

Of course, if your $PAGER variable is set to a different pager, or to a pager with several options, you could use a line like this:

% last | grep 'frank' | $PAGER

Which would do exactly the same thing, but would run the contents of $PAGER as a program and send the results of the grep into that.
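
Since the output of last varies from system to system, the following POSIX sh sketch feeds made-up login lines from a hypothetical sample_logins function through the same grep filter. PAGER is set to cat here only so the script runs without waiting for you to quit a pager. $PAGER is deliberately left unquoted, so a value such as “less -R” is split into a command and its options.

```shell
#!/bin/sh
# Hypothetical stand-in for the output of last.
sample_logins() {
    printf '%s\n' \
        'frank    pts/0    host1    Mon Jan  1 09:00' \
        'alice    pts/1    host2    Mon Jan  1 09:05' \
        'frank    pts/2    host3    Tue Jan  2 10:00'
}

# cat instead of a real pager, so nothing waits for a keypress.
PAGER=cat

# Same shape as: last | grep 'frank' | $PAGER
franks=$(sample_logins | grep 'frank' | $PAGER)
echo "$franks"     # only the two frank lines
```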

While piping into grep or less is the most common task you are likely to perform, you can do rather more complex things. For example, if you have a large directory containing, among other things, a series of HTML files each named for its date, and you wanted to find the newest file, you could use:

% ls *.html | sort | tail -n 1

This runs ls on the files matching the pattern “*.html” (i.e. those whose names end in .html); strictly speaking, it is the shell that expands the pattern into the list of matching names before ls runs. The output, which should be all the .html files in the directory, is then sent via a pipe into the sort command, which sorts its input into ascending order. The sorted output is then sent into the tail command, whose options “-n 1” make it output only the last line, and so this prints the newest file. This relies on the file names sorting in date order, as names of the form YYYY-MM-DD.html do.

The same functionality can be done with the line:

% ls *.html | sort -r | head -n 1

Which sorts in reverse order (because of the ‘-r’ flag), and then uses “head” to get the first line (which because of reversal will be the newest).
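
Both forms can be checked side by side. The POSIX sh sketch below uses a scratch directory from mktemp -d, some hypothetical date-named pages and one unrelated file. LC_ALL=C makes sort compare byte by byte, so the result does not depend on your locale.

```shell
#!/bin/sh
export LC_ALL=C      # byte-wise sorting, independent of locale
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical pages named YYYY-MM-DD.html, which sort in date order.
touch 2023-11-30.html 2024-01-15.html 2024-02-01.html notes.txt

# Ascending sort, keep the last line.
newest_tail=$(ls *.html | sort | tail -n 1)
# Reverse sort, keep the first line.
newest_head=$(ls *.html | sort -r | head -n 1)

echo "$newest_tail"    # 2024-02-01.html
echo "$newest_head"    # 2024-02-01.html

cd / && rm -rf "$scratch"
```

If the files were named in another format, such as 15-01-2024.html, alphabetical order would no longer match date order and both pipelines could pick the wrong file.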

The same functionality can also be done with the line:

% find . -name '*.html' | sed -e 's/\.\///' | sort | tail -n 1

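Which uses find to find the files, then sed (see section 11.1.7) to remove the string ‘./’ from the start of each line, then sorts the result and uses tail on it. Note that, unlike the ls version, find also searches subdirectories, so any .html files below the current directory are included too.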
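
The sketch below runs the find version in a scratch directory, then adds a hypothetical archive subdirectory to show that find also picks up files further down the tree. The quotes around '*.html' stop the shell expanding the pattern itself, so find receives it unchanged. As before, this is a POSIX sh script with LC_ALL=C for predictable sorting.

```shell
#!/bin/sh
export LC_ALL=C
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch 2024-01-15.html 2024-02-01.html

# find prints names such as ./2024-02-01.html; sed strips the leading ./
newest=$(find . -name '*.html' | sed -e 's/\.\///' | sort | tail -n 1)
echo "$newest"     # 2024-02-01.html

# A file in a subdirectory is found as well, with its directory in the name.
mkdir archive
touch archive/2023-12-31.html
all=$(find . -name '*.html' | sed -e 's/\.\///' | sort)
echo "$all"

cd / && rm -rf "$scratch"
```

Because sort compares the whole path, archive/2023-12-31.html sorts after both top-level files, so tail -n 1 would now report it as the newest even though its date is older.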

As you can see, once you start combining commands with pipes a wide range of ways of doing things becomes available, and this range will grow as you become more familiar with the various commands.

10.2 Redirects

Another handy feature is the ability to redirect the output of a program. With pipes we’ve already seen how to send it from one program to another, but it’s often useful to capture it in a file for later use. For this you need redirects.

This section covers the basics of using redirects, which are fairly simple. For example, if you wanted a list of all the files in your home directory (and, since find searches recursively, in every directory below it), you could run this from your home directory:

% find . -name '*'

Doing this is all very well, but suppose you wanted to store a copy in a file (say, one called file-list). You could send it through a pipe into your editor, if your editor supports this; vim does, when given - as its file name. This would leave you with a command like this:

% find . -name '*' | vim -

However, you would then have to wait for your editor to start up and save the contents to the file “file-list” yourself. There is a much easier way, as your shell can write the output anywhere you want it:

% find . -name '*' > file-list

The angle bracket ‘>’ is used much like an arrow: it points the output of the command into “file-list”, overwriting the file if it already exists.
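
The overwriting behaviour is easy to see in a scratch directory. This POSIX sh sketch (file names are illustrative) writes to the same file twice with >, then runs the find command above. Because the shell creates file-list before find starts, file-list shows up in its own listing.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo 'first run' > out.txt
echo 'second run' > out.txt     # > truncates the file, so 'first run' is lost
contents=$(cat out.txt)
echo "$contents"                # second run

touch one two
find . -name '*' > file-list    # the shell opens file-list before find runs
listed=$(cat file-list)
echo "$listed"                  # includes ./one, ./two and ./file-list

cd / && rm -rf "$scratch"
```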

10.2.1 Appending with redirects

If you wanted to append the output of one command to that of an earlier one, you could use > to write it to a second file and then join the two files with the cat command. This is clearly a pain, as you have to type two commands instead of one. Instead, you can use the append redirection operator, ‘>>’:

% ls >> foo

This runs ls as normal, then appends its output to the end of the file “foo”; if the file doesn’t exist, it is created. It is often safer to use >> rather than a single >: if you accidentally name a file that already exists, its contents are not destroyed, only added to.
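
A minimal POSIX sh sketch of >>, again in a scratch directory: the first append creates foo, and the second adds to the end rather than replacing what is already there.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo 'first line' >> foo     # foo does not exist yet, so >> creates it
echo 'second line' >> foo    # >> adds to the end instead of truncating
lines=$(cat foo)
echo "$lines"                # first line, then second line

cd / && rm -rf "$scratch"
```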

10.2.2 Redirecting input

There are certain situations in which you want to send the contents of a file into the input of a program. Assuming that you had a large file containing unsorted words that needed to be sorted into the right order you could use the following command:

% cat big-file | sort

This sends the contents of “big-file” into the input of the sort command. However, that requires running a whole separate program (cat in this case); a cleaner way of doing this is:

% sort < big-file

This does exactly the same thing as the command above, with one fewer pipe and therefore one fewer process. There are also situations where a program expects a list of commands on its input, and this syntax is usually the better choice there too.

You can also redirect the output of sort (or whatever command you are running) to another file at the same time. This is done in exactly the way shown above, but an example is worth providing for reference, as it can look daunting at first glance:

% sort < big-file > sorted-big-file

This will run sort, inserting the contents of “big-file” into its standard input. The output of sort (which will be the sorted input) will be written into the file “sorted-big-file”.
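
The sketch below (POSIX sh, scratch directory, with a three-word file standing in for big-file) checks that the cat pipeline and the < redirect give the same result, then combines < and > as in the last example. LC_ALL=C is set so sort uses plain byte order.

```shell
#!/bin/sh
export LC_ALL=C
scratch=$(mktemp -d)
cd "$scratch" || exit 1

printf '%s\n' pear apple fig > big-file    # small stand-in for a large file

piped=$(cat big-file | sort)       # two processes: cat and sort
redirected=$(sort < big-file)      # the shell connects big-file to sort's stdin

sort < big-file > sorted-big-file  # input and output redirected together
sorted=$(cat sorted-big-file)
echo "$sorted"                     # apple, fig, pear on separate lines

cd / && rm -rf "$scratch"
```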

10.3 Embedding commands in other commands

Now that you understand pipes and redirects you can do a good deal more with your shell; combining commands, using pipes as filters for output, should noticeably improve your efficiency. However, there is one more operator that can prove useful in writing Unix command lines: the back-tick operator, which lets you embed the output of one command inside another. It sounds confusing, but the example below should clear things up.

In the description below you will encounter a character called a backtick. In some fonts it looks like an opening single quote (‘); in the font used for examples it looks like this: `. Take care not to confuse it with the ordinary single quote ('). On most US keyboards it is entered by pressing the key above Tab and to the left of the ‘1’ key; you don’t need to press shift or control, just the key.

Now let’s assume that you wanted to see the long ls (“ls -l”) information about the command ftp. Normally you’d need to run “which ftp” and then, remembering the output of that command, run “ls -l” followed by the location. Using back-ticks you can make this easier:

% ls -l `which ftp`

This starts by running the command in the back-ticks (“which ftp”). On a typical system this gives the result “/usr/bin/ftp”, so the shell then runs “ls -l /usr/bin/ftp” and prints its result to your terminal.
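
The two-step and one-step versions can be compared directly. Neither ftp nor which is installed everywhere, so this POSIX sh sketch looks up sh instead, using the shell’s built-in command -v, which prints the path of an external program much as which does. The outer $(...) is the POSIX form of command substitution and is used here only to capture the output; the back-ticks inside it behave as described above.

```shell
#!/bin/sh
# Two steps: run the inner command, remember its output, then use it.
sh_path=`command -v sh`
two_step=$(ls -l "$sh_path")

# One step: the shell runs the back-ticked command first and pastes its
# output (for example /bin/sh) into the ls -l command line.
one_step=$(ls -l `command -v sh`)

echo "$one_step"
```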

Back-ticks are a handy tool for writing Unix commands and can be used inside any command, including aliases. As an example of how many ways there are to do things, consider a single command line to finger the last user who logged onto the system. There are two ways of doing this:

% last | head -n 1 | awk '{print $1}' | xargs finger

Which gets the first line of output from “last”, uses awk to print its first column (the username; see section 11.1.1 for more information), and then passes this to the xargs command (see section 11.1.14), which runs finger with that username as its argument.

With back-ticks you can avoid such logically messy methods and write command lines that are easier to picture, which makes them easier to write and use:

% finger `last | head -n 1 | awk '{print $1}'`

This runs the whole section inside the back-ticks first (last, then head to get the first line, then awk to get its first column), and the shell then runs “finger” with that output as its argument. It does the same thing as the xargs version but is more obvious, since “finger” is at the start of the line rather than the end.
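
Because the output of last differs between systems and finger is missing from many modern installations, this POSIX sh sketch uses a hypothetical sample_last function for the login list, and echo finger in place of finger, so it prints the command that would run instead of running it. The xargs form and the back-tick form should produce the same command line.

```shell
#!/bin/sh
# Hypothetical stand-in for the output of last (most recent login first).
sample_last() {
    printf '%s\n' \
        'frank    pts/2    host3    Tue Jan  2 10:00   still logged in' \
        'alice    pts/1    host2    Mon Jan  1 09:05 - 09:30  (00:25)'
}

# xargs form: the username travels down the pipe and xargs appends it.
via_xargs=$(sample_last | head -n 1 | awk '{print $1}' | xargs echo finger)

# Back-tick form: the inner pipeline runs first and its output becomes the argument.
via_backticks=$(echo finger `sample_last | head -n 1 | awk '{print $1}'`)

echo "$via_xargs"        # finger frank
echo "$via_backticks"    # finger frank
```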