As mentioned in passing many times above (see section 5), Unix is designed as a modular system, with many small programs that each do one job well. On their own these small programs are of limited use; they become powerful once you learn the ways Unix lets you glue them together.
The pipe character ‘|’ (a single vertical bar, normally typed by pressing the backslash key with shift held down) is the single most useful feature of Unix systems. It has already been encountered in passing (see sections 9.4 and 7.4), but its use has never been fully explained.
Simply put, the pipe sends the output of one program into the input of another. Formally, it connects the standard output (stdout) of one process to the standard input (stdin) of the next and transmits a “stream of bytes” between them. This idea is best shown through an example:
% ls | grep 'a'
This runs the “ls” command on the current directory, but instead of sending its output to your terminal it sends it through the pipe and into the input of “grep”. grep searches streams of text for strings, and here it is searching for the string ‘a’. The result is every line of output from “ls” that contains the letter ‘a’ anywhere.
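The same idea can be sketched with fixed input, so the result does not depend on what happens to be in your directory. Here printf stands in for ls, and the three names it prints are made up for illustration.

```shell
# printf writes three made-up names, one per line, in place of ls output.
# grep keeps only the lines that contain the letter 'a'.
printf '%s\n' apple cherry banana | grep 'a'
```

Only the names containing an ‘a’ reach the terminal; the name without one is dropped by grep.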
While this doesn’t seem very useful now, remember that you can also do things like this:
% who | less
This sends the output of who to your pager (less), so you can view it more easily.
Virtually any number of commands can be joined with pipes, forming rather complex filters. For example, if you wanted to see a list of a certain user’s logins you could use a line like this:
% last | grep 'frank' | less
This prints the last list (see section 2.8), sends it through grep to search for the string “frank”, and finally pipes anything grep found into less so it can be viewed with a pager.
Of course, if you have $PAGER set to something else, or are using many options with it, you could use a line like this:
% last | grep 'frank' | $PAGER
This does exactly the same thing, but runs the contents of $PAGER as a program and sends the results of the grep into that.
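A minimal sketch of a three-stage pipeline that still works when $PAGER is not set. It assumes a Bourne-style shell (sh, bash, zsh), where ${PAGER:-less} means “the value of PAGER, or less if PAGER is empty or unset”; this form is not part of the examples above. The sample lines are made up and stand in for the output of last.

```shell
# PAGER is set to cat here only so the example runs without a terminal.
PAGER=cat

# Made-up sample lines stand in for the output of last.
# ${PAGER:-less} is left unquoted on purpose, so that a value such as
# "less -R" is split into a program name and its options.
printf '%s\n' 'frank pts/0' 'alice pts/1' 'frank pts/2' | grep 'frank' | ${PAGER:-less}
```

Only the two lines mentioning frank reach the pager.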
While piping into grep or less is the most common task you are likely to perform, you can do rather more complex things. For example, suppose you have a rather large directory containing, amongst other things, a series of HTML files, each one named for its date (in a form that sorts chronologically, such as year-month-day), and you wanted to find the newest file. You could use:
% ls *.html | sort | tail -n 1
This runs ls on the files matching the pattern “*.html” (i.e. those files whose names end in .html). The output of this command, which should be all the .html files in the directory, is then sent via a pipe into the sort command, which sorts its input in ascending order. The sorted output is then sent into the tail command with the options “-n 1”, which make tail output only the last line (the number of lines to output is set to 1), and so this outputs the newest file.
The same result can be had with the line:
% ls *.html | sort -r | head -n 1
This sorts in reverse order (because of the ‘-r’ flag), and then uses “head” to get the first line (which, because of the reversal, will be the newest).
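Both variants can be tried safely in a throwaway directory. The sketch below assumes file names in year-month-day form, which are made up for illustration; the $(...) form captures a command’s output in a variable and is used here only for setup and display.

```shell
# Work in a throwaway directory so no real files are touched.
scratch=$(mktemp -d)

# Made-up file names in year-month-day form, plus one non-HTML file.
touch "$scratch/2023-11-05.html" "$scratch/2024-01-15.html" \
      "$scratch/2024-03-02.html" "$scratch/notes.txt"

# sort orders the names ascending, so the latest date is the last line.
newest_tail=$(cd "$scratch" && ls *.html | sort | tail -n 1)
# sort -r reverses the order, so the latest date is the first line.
newest_head=$(cd "$scratch" && ls *.html | sort -r | head -n 1)

echo "$newest_tail"
echo "$newest_head"

rm -rf "$scratch"
```

Both lines print the same name, the file with the latest date; notes.txt never appears because it does not match the pattern.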
A similar result can also be had with the line:
% find . -name '*.html' | sed -e 's/\.\///' | sort | tail -n 1
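This uses find to locate the files, then sed (see section 11.1.7) to remove the string ‘./’ from the start of each line, then sorts the result and uses tail on it. Note that find also descends into subdirectories, so the result matches the ls versions only when all the HTML files sit directly in the current directory.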
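The difference between the ls and find versions shows up as soon as a subdirectory holds a matching file. This is a sketch under made-up names: the directory old and the dated files are hypothetical.

```shell
# Throwaway directory with two HTML files at the top and one inside old/.
scratch=$(mktemp -d)
mkdir "$scratch/old"
touch "$scratch/2023-11-05.html" "$scratch/2024-03-02.html" \
      "$scratch/old/2025-01-01.html"

# ls with a pattern sees only the files directly in the directory.
from_ls=$(cd "$scratch" && ls *.html | sort | tail -n 1)
# find also walks into old/, so its list has one extra entry.
from_find=$(cd "$scratch" && find . -name '*.html' | sed -e 's/\.\///' | sort | tail -n 1)

echo "$from_ls"
echo "$from_find"

rm -rf "$scratch"
```

Here the find version reports the file inside old/, because its path sorts after the names that start with a digit, while the ls version reports the newest file at the top level.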
As you can see, once you start combining commands with pipes an amazing range of ways of doing things becomes available, and it will grow as you become more familiar with the various commands.
Another feature that can be quite handy to know about is the ability to redirect the output of a program. With pipes we’ve already seen how to redirect it from one program to another, but it’s often handy to capture it in a file for later use. For this you need redirects.
This section covers the basics of using redirects, and really they’re fairly simple. For example, if you wanted a list of all the files in your home directory you could run this from there:
% find . -name '*'
Doing this is all very well, but assume that you wanted to store a copy of the list in a file (say, one called file-list). You could send it via a pipe into your editor (if your editor supports this; vim can do it when given - as a filename). This would leave you with a command like this:
% find . -name '*' | vim -
However, you’d then have to wait for the system to start up your editor, and you would still need to save the contents to the file “file-list”. There is a much easier way, as your shell can write the output anywhere you want it:
% find . -name '*' > file-list
The angle bracket ‘>’ is used almost like an arrow: it points the output from the command into “file-list”, which is overwritten if it already exists.
If you want to add the output of a command to a file that already holds earlier output, then with only > you’d have to write the new output to another file and join the two together with the cat command. This is obviously a pain, as you have to type two commands instead of just one. However, this can be avoided with the append redirection operator.
% ls >> foo
This will run ls as normal, then append its output to the file “foo”. Of course, if the file doesn’t exist it will be created. It’s often safer to use >> rather than a single >: if you name a file that already exists, its contents won’t be destroyed, and the output is just appended to the end.
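The difference between the two operators can be seen side by side. This is a small sketch in a throwaway directory; echo stands in for any command that produces output, and the text it writes is made up.

```shell
scratch=$(mktemp -d)

echo 'first run'  > "$scratch/foo"     # foo does not exist yet, so it is created
echo 'second run' > "$scratch/foo"     # > replaces whatever was in foo
after_overwrite=$(cat "$scratch/foo")

echo 'third run' >> "$scratch/foo"     # >> adds a line at the end instead
after_append=$(cat "$scratch/foo")

echo "$after_append"

rm -rf "$scratch"
```

After the second command only “second run” is left in foo; after the third, foo holds “second run” followed by “third run”.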
There are certain situations in which you want to send the contents of a file into the input of a program. Assuming that you had a large file of unsorted words that needed to be sorted, you could use the following command:
% cat big-file | sort
This will send the contents of “big-file” into the input of the sort command. However, that requires invoking a whole separate program (cat in this case). A cleaner way of doing this is:
% sort < big-file
This does exactly the same thing as the command above, with one fewer pipe and thus one fewer process. There are also some situations where a program wants a list of commands sent into its input, in which case this syntax is usually better.
When doing this you can also redirect the output of sort (or whatever command you’re running) to another file as usual. This is done in exactly the way shown above, but it’s worth providing an example for reference, as it can look daunting at first glance:
% sort < big-file > sorted-big-file
This will run sort, inserting the contents of “big-file” into its standard input. The output of sort (which will be the sorted input) will be written into the file “sorted-big-file”.
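A short sketch of input and output redirection used together, run in a throwaway directory. The three words written into big-file are made up for illustration.

```shell
scratch=$(mktemp -d)

# Create a small unsorted file to play the part of big-file.
printf '%s\n' pear apple mango > "$scratch/big-file"

# sort reads big-file on its standard input and writes to sorted-big-file.
sort < "$scratch/big-file" > "$scratch/sorted-big-file"

sorted=$(cat "$scratch/sorted-big-file")
echo "$sorted"

rm -rf "$scratch"
```

The input and output names should differ. If the same file is named on both sides, the shell empties it for writing before sort gets a chance to read it.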
Now that you understand pipes and redirects you can do a fair deal more with your shell; combining commands using pipes as filters for output should improve your efficiency a fair degree. However, there is one more operator that can prove fairly useful in writing Unix command lines: the back-tick operator, which allows you to embed the output of one command inside another. It sounds fairly confusing, but an example should clear things up.
In the description below you will encounter a character called a backtick. This character looks like this: `. It is entered by pressing the key which is above tab and to the left of the ‘1’ key. You don’t need to press shift or control, just the key.
Now let’s assume that you wanted to see the long ls (“ls -l”) information about the command ftp. Normally you’d need to run “which ftp” and then, remembering the output of that command, run “ls -l” followed by the location. Using back-ticks you can make this easier:
% ls -l `which ftp`
This starts by running the command in the back-ticks (“which ftp”). This gives the result “/usr/bin/ftp”, and the shell then runs “ls -l /usr/bin/ftp”, printing the result to your console.
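The substitution can be tried on a program that is present on practically every system. This sketch swaps in “command -v sh” for “which ftp”, since ftp and which are not always installed; command -v prints the path of a program in much the same way. The $(...) spelling shown second is the POSIX equivalent of back-ticks in Bourne-style shells and is not used elsewhere in this section.

```shell
# The shell runs the inner command first and pastes its output into the
# outer command line, so ls -l receives the path of sh as its argument.
ls -l `command -v sh`

# The same substitution written with $(...), here stored in a variable.
sh_path=$(command -v sh)
echo "$sh_path"
```

The second form is easier to nest, because the opening and closing marks are different characters.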
Back-ticks are a handy tool in your toolbox for writing Unix commands and can be used inside any command, including aliases. As an example of the many ways to do things, consider a single command line to finger the last user who logged onto the system. There are two ways of doing this. The first is:
% last | head -n 1 | awk '{print $1}' | xargs finger
This gets the first line from the “last” command and uses awk to print out the first column from it (which is the username, see section 11.1.1 for more information), then passes this to the xargs command (see 11.1.14), which runs finger with that username as its argument.
With back-ticks you can avoid such logically messy methods and make your command lines easier to picture, which makes them easier to write and use:
% finger `last | head -n 1 | awk '{print $1}'`
This runs the whole section inside the back-ticks first (which runs last, takes the first line, then extracts its first column with awk). The shell then runs “finger” with that output as its argument. It does the same thing, but is more obvious because “finger” is at the start of the line, not the end.
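That the two forms build the same command can be checked without last or finger being installed. In this sketch the function sample_last is hypothetical and prints made-up lines in the layout of last’s output, and echo stands in for finger so that the command line is printed instead of run.

```shell
# Hypothetical stand-in for last: user name in the first column.
sample_last() {
    printf '%s\n' 'frank pts/0 Mon Jan  1 10:00' 'alice pts/1 Sun Dec 31 09:00'
}

# xargs form: the user name arrives on standard input and xargs appends it.
via_xargs=$(sample_last | head -n 1 | awk '{print $1}' | xargs echo finger)

# Back-tick form: the shell substitutes the user name before running echo.
via_backticks=$(echo finger `sample_last | head -n 1 | awk '{print $1}'`)

echo "$via_xargs"
echo "$via_backticks"
```

Both lines print the same command, finger followed by the user name from the first sample line.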