The trap utility is a special shell built-in. It's defined in POSIX, but bash adds some useful extensions as well.
Examples that are POSIX-compatible start with #!/bin/sh, and examples that start with #!/bin/bash use a bash extension.
The signals can be given as a signal number, a signal name (without the SIG prefix), or the special keyword EXIT.
Those guaranteed by POSIX are:
Number | Name | Notes |
---|---|---|
0 | EXIT | Always run on shell exit, regardless of exit code |
1 | SIGHUP | |
2 | SIGINT | This is what ^C sends |
3 | SIGQUIT | |
6 | SIGABRT | |
9 | SIGKILL | Cannot be trapped or ignored |
14 | SIGALRM | |
15 | SIGTERM | This is what kill sends by default |
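A minimal sketch of the usual cleanup pattern, assuming bash and the widely available (but non-POSIX) mktemp utility; the function name make_and_clean_tmp is hypothetical. The function body is wrapped in ( ) so it runs in a subshell, and the EXIT trap fires when that subshell ends. SIGKILL is deliberately absent because it cannot be trapped.

```bash
#!/bin/bash
# make_and_clean_tmp is a hypothetical name; ( ) makes the body run in a subshell,
# so the traps set here only affect that subshell.
make_and_clean_tmp() (
    tmpfile=$(mktemp)               # mktemp is assumed to be installed
    trap 'rm -f "$tmpfile"' EXIT    # 0/EXIT: runs whenever the subshell exits
    trap 'exit 130' INT             # ^C: exit with 128+2; the EXIT trap still runs
    trap 'exit 143' TERM            # plain kill: exit with 128+15; the EXIT trap runs too
    echo "$tmpfile"                 # report the path so the caller can check it later
)

path=$(make_and_clean_tmp)
if [ -e "$path" ]; then
    result="still there"
else
    result="removed by EXIT trap"
fi
echo "$path: $result"
```

Routing INT and TERM through exit keeps all cleanup in the single EXIT handler instead of repeating it per signal.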
The alias will only be available in the shell where the alias command was issued.
To persist the alias, consider putting it into your .bashrc.
UNIX console programs have an input file and two output files (input and output streams, as well as devices, are treated as files by the OS). These are typically the keyboard and screen, respectively, but any or all of them can be redirected to come from, or go to, a file or another program.
STDIN is standard input, and is how the program receives interactive input. STDIN is usually assigned file descriptor 0.
STDOUT is standard output. Whatever is emitted on STDOUT is considered the "result" of the program. STDOUT is usually assigned file descriptor 1.
STDERR is where error messages are displayed. Typically, when running a program from the console, STDERR is output on the screen and is indistinguishable from STDOUT. STDERR is usually assigned file descriptor 2.
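A small sketch, assuming bash, that makes the three descriptors visible; the helper name both is hypothetical. It writes one line to each output stream, keeps them apart with numbered redirections, and feeds STDIN from a here-string instead of the keyboard.

```bash
#!/bin/bash
# Hypothetical helper: one line on STDOUT (fd 1), one on STDERR (fd 2).
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

errfile=$(mktemp)
out=$(both 2>"$errfile")          # fd 1 is captured, fd 2 goes to the file
err=$(cat "$errfile")
rm -f "$errfile"

read -r line 0<<< "typed input"   # fd 0 comes from a here-string, not the keyboard

printf 'STDOUT: %s\nSTDERR: %s\nSTDIN: %s\n' "$out" "$err" "$line"
```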
The order of redirection is important:
command > file 2>&1
Redirects both STDOUT and STDERR to the file.
command 2>&1 > file
Redirects only STDOUT to the file. Redirections are processed from left to right, so 2>&1 makes file descriptor 2 a copy of whatever file descriptor 1 points to at that moment (usually the terminal), before > file changes file descriptor 1. STDERR therefore still goes to the terminal.
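The two orderings can be compared directly. This sketch assumes bash and a hypothetical helper both that writes one line to each stream; capturing the second command with $( ) shows where STDERR ends up when 2>&1 comes first.

```bash
#!/bin/bash
both() { echo "out"; echo "err" >&2; }   # hypothetical helper
log=$(mktemp)

# fd 1 -> $log, then fd 2 -> copy of fd 1: both lines land in the file.
both > "$log" 2>&1
first_order=$(cat "$log")

# fd 2 -> copy of fd 1 (here: the $( ) capture), then fd 1 -> $log.
# Only "out" reaches the file; "err" goes where fd 1 pointed before.
leaked=$(both 2>&1 > "$log")
second_order=$(cat "$log")

rm -f "$log"
printf '%s\n' "file after '> file 2>&1':" "$first_order" \
              "file after '2>&1 > file':" "$second_order" \
              "sent to the old STDOUT:" "$leaked"
```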
Each command in a pipeline has its own STDERR (and STDOUT) because each is a new process. This can create surprising results if you expect a redirect to affect the entire pipeline. For example, this command (wrapped for legibility):
$ python3 -c 'import sys; print("Python error!", file=sys.stderr)' \
| cut -f1 2>> error.log
will print "Python error!" to the console rather than the log file, because 2>> error.log only applies to cut. Instead, attach the redirection to the command whose STDERR you want to capture:
$ python3 -c 'import sys; print("Python error!", file=sys.stderr)' 2>> error.log \
| cut -f1
There are many comparison operators available in bash. Not all of them are listed here yet.
cat can read from both files and standard input, and concatenates them to standard output.
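A short sketch, assuming bash and throwaway files created with mktemp: the argument - tells cat to read standard input at that position, so piped text can be spliced between two files.

```bash
#!/bin/bash
dir=$(mktemp -d)
printf 'first\n' > "$dir/a.txt"
printf 'last\n'  > "$dir/b.txt"

# "-" means: read standard input here, between the two files.
joined=$(printf 'middle\n' | cat "$dir/a.txt" - "$dir/b.txt")
printf '%s\n' "$joined"

rm -r "$dir"
```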
The [[ … ]] syntax surrounds bash built-in conditional expressions. Note that spaces are required after the opening [[ and before the closing ]].
Conditional expressions can use unary and binary operators to test properties of strings, integers and files. They can also use the logical operators &&, || and !.
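A sketch combining the three kinds of tests, assuming bash; the variable names are purely illustrative. It uses a unary file test, a string pattern match, an integer comparison, and each of the logical operators.

```bash
#!/bin/bash
file=$(mktemp)        # an existing regular file for the -f test
name="report.txt"
count=7

# Unary file test && binary string match (the unquoted right side is a pattern).
if [[ -f $file && $name == *.txt ]]; then result1="text file exists"; fi
# Integer comparison, negated with !.
if [[ ! $count -gt 10 ]]; then result2="count is at most 10"; fi
# || succeeds if either side is true; $missing is unset, so -z is true.
if [[ -z $missing || $count -eq 7 ]]; then result3="or works"; fi

printf '%s\n' "$result1" "$result2" "$result3"
rm -f "$file"
```

Writing [[-f $file]] without the spaces would be parsed as a command named [[-f and fail.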
shift shifts the positional parameters to the left so that $2 becomes $1, $3 becomes $2, and so forth.
"$@" expands to all the positional parameters passed to the script/function, each as a separate word.
"$*" expands to a single string composed of all the positional parameters passed to the script/function, separated by the first character of IFS (a space by default).
Process substitution is a form of redirection where the input or output of a process (some sequence of commands) appears as a temporary file.
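A sketch exercising these notes, assuming bash; count_args and demo are hypothetical helper names. The first part contrasts "$@" with "$*" and shows shift; the second feeds a loop through process substitution, which keeps the loop in the current shell.

```bash
#!/bin/bash
count_args() { echo "$#"; }       # hypothetical helper: how many words did it receive?

demo() {
    n_at=$(count_args "$@")       # "$@": every parameter stays a separate word
    n_star=$(count_args "$*")     # "$*": all parameters joined into one word
    joined="$*"                   # joined with the first character of IFS (a space)
    first=$1
    shift                         # drop $1; the old $2 becomes $1, and so on
    after_shift="$1 ($# left)"
}

demo "one two" three four
printf '%s\n' "$n_at" "$n_star" "$joined" "$first" "$after_shift"

# Process substitution: <( ) is replaced by a file name (often /dev/fd/NN).
# Unlike "printf ... | while read", the loop runs in the current shell,
# so total keeps its value after the loop.
total=0
while read -r n; do
    total=$((total + n))
done < <(printf '%s\n' 1 2 3)
echo "total=$total"
```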
A space (" ") is required between each term (or sign) of the expression. "1+2" won't work, but "1 + 2" will work.
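The note does not name the command it refers to. Assuming it is the external expr utility, where this rule applies because each term must be a separate argument, the difference looks like this; bash's own $(( )) arithmetic is shown for contrast.

```bash
#!/bin/bash
# Assumption: the spacing rule refers to expr.
spaced=$(expr 1 + 2)     # three arguments "1" "+" "2": evaluated
unspaced=$(expr 1+2)     # one argument "1+2": printed back as a plain string
arith=$((1+2))           # bash arithmetic expansion: spaces are optional
printf '%s\n' "$spaced" "$unspaced" "$arith"
```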
optstring: The option characters to be recognized. If a character is followed by a colon, the option is expected to have an argument, which should be separated from it by white space. The colon (:) and question mark (?) cannot be used as option characters.
Each time it is invoked, getopts places the next option in the shell variable name, initializing name if it does not exist, and the index of the next argument to be processed into the variable OPTIND. OPTIND is initialized to 1 each time the shell or a shell script is invoked.
When an option requires an argument, getopts places that argument into the variable OPTARG. The shell does not reset OPTIND automatically; it must be manually reset between multiple calls to getopts within the same shell invocation if a new set of parameters is to be used.
When the end of options is encountered, getopts exits with a return value greater than zero. OPTIND is set to the index of the first non-option argument, and name is set to ?. getopts normally parses the positional parameters, but if more arguments are given in args, getopts parses those instead.
getopts can report errors in two ways. If the first character of optstring is a colon (:), silent error reporting is used. In normal operation, diagnostic messages are printed when invalid options or missing option arguments are encountered.
If the variable OPTERR is set to 0, no error messages will be displayed, even if the first character of optstring is not a colon.
If an invalid option is seen, getopts places ? into name and, if not silent, prints an error message and unsets OPTARG. If getopts is silent, the option character found is placed in OPTARG and no diagnostic message is printed.
If a required argument is not found, and getopts is not silent, a question mark (?) is placed in name, OPTARG is unset, and a diagnostic message is printed. If getopts is silent, then a colon (:) is placed in name and OPTARG is set to the option character.
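A sketch of a typical getopts loop, assuming bash; the function parse and the options -v (flag) and -o (takes an argument) are hypothetical. The leading colon in ':vo:' selects silent error reporting, so the loop handles both error cases itself, and OPTIND is reset because the function is called twice in the same shell.

```bash
#!/bin/bash
# Hypothetical option parser: -v takes no argument, -o requires one.
parse() {
    local opt
    OPTIND=1                      # must be reset manually between calls
    verbose=0 outfile="" errors=""
    while getopts ':vo:' opt; do
        case $opt in
            v)  verbose=1 ;;
            o)  outfile=$OPTARG ;;
            :)  errors+="missing argument for -$OPTARG;" ;;  # silent: name=':', OPTARG=option char
            \?) errors+="unknown option -$OPTARG;" ;;        # silent: name='?', OPTARG=option char
        esac
    done
    shift $((OPTIND - 1))         # drop parsed options; "$@" now holds the operands
    rest="$*"
}

parse -v -o out.txt file1 file2
first_run="verbose=$verbose outfile=$outfile rest=$rest errors=$errors"
echo "$first_run"

parse -x -o
echo "errors=$errors"
```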
A common mistake is trying to execute script files with Windows line endings (\r\n) on UNIX/Linux systems. In this case, the script interpreter named in the shebang is read as:
/bin/bash\r
which obviously does not exist, but the resulting error can be hard to figure out.
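One way to spot and fix the problem, sketched in bash: the script below creates a throwaway file with CRLF endings, makes the hidden carriage return visible with od, detects it with grep, and strips it with tr. Tools such as dos2unix do the same job but may not be installed, so this sketch avoids them. The fixed file is run with bash explicitly in case the temporary directory is mounted noexec.

```bash
#!/bin/bash
dir=$(mktemp -d)
# Simulate a script saved with Windows line endings.
printf '#!/bin/bash\r\necho hello\r\n' > "$dir/dos.sh"

head -n 1 "$dir/dos.sh" | od -c      # the shebang line ends in \r \n

# $'\r' is bash's ANSI-C quoting for a literal carriage return.
if grep -q $'\r' "$dir/dos.sh"; then has_cr=yes; else has_cr=no; fi

# Remove every carriage return.
tr -d '\r' < "$dir/dos.sh" > "$dir/unix.sh"
if grep -q $'\r' "$dir/unix.sh"; then fixed_has_cr=yes; else fixed_has_cr=no; fi

output=$(bash "$dir/unix.sh")
rm -r "$dir"
echo "CR before: $has_cr, CR after: $fixed_has_cr, output: $output"
```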
Character Classes
Valid character classes for the [] glob are:
alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit
All of these except ascii and word are defined by the POSIX standard; those two are bash extensions.
Inside [] more than one character class or range can be used, e.g.,
$ echo a[a-z[:blank:]0-9]*
will match any file that starts with an a and is followed by either a lowercase letter, a blank or a digit.
Keep in mind, though, that a [] glob can only be negated as a whole, not in parts. The negating character must be the first character following the opening [ (POSIX specifies !, and bash also accepts ^). For example, this expression matches all files that do not start with an a:
$ echo [^a]*
The following matches all files that start with either a letter or a ^:
$ echo [[:alpha:]^a]*
It does not match files that start with any letter except a, because a ^ that is not the first character is interpreted as a literal ^.
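A sketch that reproduces the three patterns above in a scratch directory, assuming bash; the file names are made up for illustration. LC_ALL=C is set so that ranges and the sort order of the matches are predictable, and printf '%s|' makes each match's boundaries visible.

```bash
#!/bin/bash
export LC_ALL=C                       # byte-order collation: predictable ranges and ordering
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a1 apple 'a b' banana '^caret'  # sample names for illustration

mixed=$(printf '%s|' a[a-z[:blank:]0-9]*)  # a + lowercase letter, blank or digit
negated=$(printf '%s|' [^a]*)              # bash accepts ^ for negation...
negated_posix=$(printf '%s|' [!a]*)        # ...POSIX spells it !
literal=$(printf '%s|' [[:alpha:]^a]*)     # ^ not first: just another character

printf '%s\n' "$mixed" "$negated" "$negated_posix" "$literal"
cd - > /dev/null && rm -r "$workdir"
```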
Escaping glob characters
A file or folder name may contain a glob character. In this case, the glob character can be escaped with a preceding \ so that it matches literally. Another approach is to quote the name with double ("") or single ('') quotes, because Bash does not expand globs that are enclosed in quotes.
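A sketch of both approaches in a scratch directory, assuming bash; the file names are illustrative. One file literally contains a * in its name, and only the unquoted, unescaped pattern expands to all three files.

```bash
#!/bin/bash
export LC_ALL=C                  # predictable sort order of glob results
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'file*' file1 file2        # the first name contains a literal *

all=$(printf '%s|' file*)        # unquoted: * is a glob and matches all three
escaped=$(printf '%s|' file\*)   # backslash: * only matches a literal *
quoted=$(printf '%s|' "file*")   # quotes: no pathname expansion at all

printf '%s\n' "$all" "$escaped" "$quoted"
cd - > /dev/null && rm -r "$workdir"
```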
Difference to Regular Expressions
The most significant difference between globs and regular expressions is that a regular expression states separately what to match and how often to match it: a qualifier identifies what to match, and a quantifier tells how often to match the qualifier. A glob metacharacter combines both. The RegEx equivalent of the * glob is .*, where . stands for any character and * stands for zero or more matches of the preceding character. The RegEx equivalent of the ? glob is .{1}. As before, the qualifier . matches any character, and {1} indicates that the preceding qualifier must match exactly once. This should not be confused with the ? quantifier, which matches zero times or once in a RegEx.
The [] glob can be used the same way in a RegEx. Like any qualifier without a quantifier, it then matches exactly one character (note that a RegEx negates a bracket expression only with ^, not with !).
Equivalent Regular Expressions
Glob | RegEx |
---|---|
* | .* |
? | . |
[] | [] |
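Bash's [[ ]] can test a string against either form, which makes the table easy to check. This sketch assumes bash 3.2 or later (where the regex is best kept in a variable) and uses made-up strings. Note that a glob must match the whole string, while a regex needs ^ and $ anchors to behave the same way.

```bash
#!/bin/bash
name="backup.tar.gz"

# Glob: unquoted right-hand side of ==; the pattern must match the whole string.
[[ $name == *.gz ]] && glob_star=yes || glob_star=no

# Regex: * becomes .*, the dot is escaped, and ^...$ anchor it to the whole
# string (without anchors a regex may match anywhere in the string).
re_star='^.*\.gz$'
[[ $name =~ $re_star ]] && regex_star=yes || regex_star=no

# ? (exactly one character) versus . (exactly one character)
[[ ab == a? ]] && glob_q=yes || glob_q=no
re_q='^a.$'
[[ abc =~ $re_q ]] && regex_q=yes || regex_q=no   # three characters: no match

printf '%s\n' "$glob_star" "$regex_star" "$glob_q" "$regex_q"
```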
bind -P shows all configured readline shortcuts (function names and the keys bound to them).
newvar=$var
[[ ... ]] construct
A pipeline is a sequence of one or more commands separated by one of the control operators | or |& (source).
| connects the output of command1 to the input of command2.
|& connects standard output and standard error of command1 to the standard input of command2.
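A sketch comparing the two operators, assuming bash 4 or later (|& was added in bash 4.0); both is a hypothetical helper. Counting the lines that arrive at wc -l shows which streams enter the pipe.

```bash
#!/bin/bash
both() { echo "out"; echo "err" >&2; }   # hypothetical helper

# |: only STDOUT enters the pipe; STDERR is discarded here to keep the demo quiet.
plain=$( { both | wc -l; } 2>/dev/null )
# |&: shorthand for 2>&1 |, so both lines enter the pipe.
combined=$(both |& wc -l)

# $(( )) strips the padding some wc implementations add.
echo "lines through |: $((plain))  lines through |&: $((combined))"
```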
Bash configuration file:
This file is sourced whenever a new interactive Bash shell is started.
On GNU/Linux systems it's generally the ~/.bashrc file; on macOS it's ~/.bash_profile or ~/.profile.
Export:
The PATH variable only needs to be exported once, and it is exported by default. Once a variable is exported, it remains exported, and any later change to it is passed on to every command started afterwards.
Apply changes:
To apply changes to a Bash configuration file in an already running shell, reload that file in the terminal (source /path/to/bash_config_file).
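A sketch of both points, assuming bash. To leave the real ~/.bashrc untouched, a temporary file stands in for the configuration file, and the $HOME/bin directory is only an illustrative example. Sourcing it updates PATH in the current shell, and because PATH is already exported, a child bash sees the new value without another export.

```bash
#!/bin/bash
config=$(mktemp)                 # hypothetical stand-in for ~/.bashrc
cat > "$config" <<'EOF'
# Prepend a personal bin directory; PATH is already exported.
PATH="$HOME/bin:$PATH"
EOF

source "$config"                 # same as: . "$config"

case ":$PATH:" in
    *":$HOME/bin:"*) status="PATH updated" ;;
    *)               status="PATH unchanged" ;;
esac
# A child process inherits the exported, already updated PATH.
child_sees=$(bash -c 'case ":$PATH:" in *":$HOME/bin:"*) echo yes ;; *) echo no ;; esac')

rm -f "$config"
echo "$status; child sees it: $child_sees"
```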
Using printf -v foo '%(...)T' (bash 4.2 or later) is equivalent to foo=$(date +'...') and saves a fork for the call to the external date program.
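A sketch of the two forms side by side, assuming bash 4.2 or later. The argument -1 explicitly requests the current time; it is passed because very old versions did not default to it. The year format is used only to keep the comparison stable.

```bash
#!/bin/bash
printf -v year '%(%Y)T' -1     # builtin: no new process
year_date=$(date +'%Y')        # command substitution: forks and runs external date
echo "printf: $year  date: $year_date"
```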
Login Shell
A login shell is one whose first character of argument zero is a -, or one started with the --login option. Its initialization is more comprehensive than that of a normal interactive (sub)shell.
Interactive Shell
An interactive shell is one started without non-option arguments and without the -c option whose standard input and error are both connected to terminals (as determined by isatty(3)), or one started with the -i option. PS1 is set and $- includes i if bash is interactive, allowing a shell script or a startup file to test this state.
Non-interactive Shell
A non-interactive shell is a shell in which the user cannot interact with the shell. For example, a shell running a script is always a non-interactive shell. Nevertheless, the script can still access its tty.
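A script can check which kind of shell it is running in. This sketch assumes bash: the i flag in $- marks an interactive shell, and shopt -q login_shell succeeds only in a login shell. Run as a script, it reports a non-interactive, non-login shell.

```bash
#!/bin/bash
case $- in
    *i*) mode=interactive ;;
    *)   mode=non-interactive ;;
esac

if shopt -q login_shell; then login=login; else login=non-login; fi

echo "$mode, $login"
```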
Configuring a login shell
On logging in:
If '/etc/profile' exists, then source it.
If '~/.bash_profile' exists, then source it,
else if '~/.bash_login' exists, then source it,
else if '~/.profile' exists, then source it.
For non-login interactive shells
On starting up:
If '~/.bashrc' exists, then source it.
For non-interactive shells
On starting up: If the environment variable BASH_ENV is non-null, expand the variable and source the file named by the value. (ENV is consulted instead only by interactive shells running in POSIX mode or invoked as sh.)
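A sketch of BASH_ENV in action, assuming bash; the variable GREETING and the temporary file are made up for illustration. bash -c starts a non-interactive shell, which sources the file named by BASH_ENV before running its command.

```bash
#!/bin/bash
env_file=$(mktemp)
echo 'GREETING="set by BASH_ENV"' > "$env_file"

# Non-interactive child shell: sources $BASH_ENV, then runs the command.
greeting=$(BASH_ENV="$env_file" bash -c 'echo "$GREETING"')

rm -f "$env_file"
echo "$greeting"
```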
tput queries the terminfo database for terminal-dependent information.
From tput on Wikipedia:
In computing, tput is a standard Unix operating system command which makes use of terminal capabilities. Depending on the system, tput uses the terminfo or termcap database, as well as looking into the environment for the terminal type.
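From Bash Prompt HOWTO: Chapter 6. ANSI Escape Sequences: Colours and Cursor Movement:
Command | Effect |
---|---|
tput setab [1-7] | Set a background colour using ANSI escape |
tput setb [1-7] | Set a background colour |
tput setaf [1-7] | Set a foreground colour using ANSI escape |
tput setf [1-7] | Set a foreground colour |
tput bold | Set bold mode |
tput sgr0 | Turn off all attributes |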
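A sketch of colouring output with setaf and sgr0, assuming bash; the function name warn is hypothetical. Colour is only used when STDOUT is a terminal and tput succeeds, so captured or redirected output stays free of escape sequences.

```bash
#!/bin/bash
# Hypothetical helper: red "WARNING:" on a terminal, plain text elsewhere.
warn() {
    local red="" reset=""
    if [ -t 1 ] && red=$(tput setaf 1 2>/dev/null) && reset=$(tput sgr0 2>/dev/null); then
        :                       # escape sequences obtained from terminfo
    else
        red="" reset=""         # not a terminal, or tput failed: no colour
    fi
    printf '%sWARNING:%s %s\n' "$red" "$reset" "$1"
}

warn "disk almost full"
captured=$(warn "disk almost full")   # STDOUT is a pipe here, so no colour codes
```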
The full user manual of sort is available for reading online.
Other files of note are:
/etc/profile, for system-wide (not user-specific) initialization code.
.bash_logout, triggered when logging out (think cleanup stuff).
.inputrc, similar to .bashrc but for readline.
1. Syntax differences
Long options in the table above are only supported by the GNU version.
2. No character gets special treatment
FreeBSD cut (which comes with macOS, for example) doesn’t have the --complement switch, and, in the case of character ranges, one can use the colrm command instead:
$ cut --complement -c3-5 <<<"123456789"
126789
$ colrm 3 5 <<<"123456789"
126789
However, there is a big difference, because colrm treats TAB characters (ASCII 9) as real tabulations up to the next multiple of eight, and backspaces (ASCII 8) as -1 wide; by contrast, cut treats all characters as one column wide.
$ colrm 3 8 <<<$'12\tABCDEF' # Input string has an embedded TAB
12ABCDEF
$ cut --complement -c3-8 <<<$'12\tABCDEF'
12F
3. (Still no) Internationalization
When cut was designed, all characters were one byte long and internationalization was not a problem. When writing systems with wider characters became popular, the solution adopted by POSIX was to distinguish between the old -c switch, which should retain its meaning of selecting characters no matter how many bytes wide they are, and a new switch -b, which should select bytes irrespective of the current character encoding. In most popular implementations, -b was introduced and works, but -c still works exactly like -b and not as it should. For example, with GNU cut:
The outputs in the following examples are described in words rather than shown literally.
# In an encoding where each character in the input string is three bytes wide,
# Selecting bytes 1-6 yields the first two characters (correct)
$ LC_ALL=ja_JP.UTF-8 cut -b1-6 kanji.utf-8.txt
...first two characters of each line...
# Selecting all three characters with the -c switch doesn’t work.
# It behaves like -b, contrary to documentation.
$ LC_ALL=ja_JP.UTF-8 cut -c1-3 kanji.utf-8.txt
...first character of each line...
# In this case, an illegal UTF-8 string is produced.
# The -n switch would prevent this, if implemented.
$ LC_ALL=ja_JP.UTF-8 cut -n -c2 kanji.utf-8.txt
...second byte, which is an illegal UTF-8 sequence...
If your characters are outside the ASCII range and you want to use cut, you should always be aware of character width in your encoding and use -b accordingly. If and when -c starts working as documented, you won’t have to change your scripts.
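A self-contained sketch of the -b approach, assuming bash and any cut that supports -b. The input is three characters that each take three bytes in UTF-8, written as octal escapes so the script itself stays ASCII-only. Knowing the width, characters 1-2 are bytes 1-6.

```bash
#!/bin/bash
# Three 3-byte UTF-8 characters, spelled as octal escapes.
line=$(printf '\346\227\245\346\234\254\350\252\236')

# Width known to be 3 bytes per character: characters 1-2 = bytes 1-6.
first_two=$(printf '%s\n' "$line" | cut -b1-6)
expected=$(printf '\346\227\245\346\234\254')

bytes=$(printf '%s' "$first_two" | wc -c)
echo "kept $((bytes)) bytes"
```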
4. Speed comparisons
cut’s limitations have people doubting its usefulness. In fact, the same functionality can be achieved by more powerful, more popular utilities. However, cut’s advantage is its performance. See below for some speed comparisons. test.txt has three million lines, with five space-separated fields each. For the awk test, mawk was used, because it’s faster than GNU awk. The shell itself (last line) is by far the worst performer. The times given (in seconds) are what the time command reports as real time.
(Just to avoid misunderstandings: all tested commands gave the same output with the given input, but they are of course not equivalent and would give different outputs in different situations, in particular if the fields were delimited by a variable number of spaces.)
Command | Time |
---|---|
cut -d ' ' -f1,2 test.txt | 1.138s |
awk '{print $1, $2}' test.txt | 1.688s |
join -a1 -o1.1,1.2 test.txt /dev/null | 1.767s |
perl -lane 'print "@F[0,1]"' test.txt | 11.390s |
grep -o '^\([^ ]*\) \([^ ]*\)' test.txt | 22.925s |
sed -e 's/^\([^ ]*\) \([^ ]*\).*$/\1 \2/' test.txt | 52.122s |
while read a b _; do echo $a $b; done <test.txt | 55.582s |
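A sketch for checking the "same output" claim on a small, made-up input before timing anything, assuming bash. join is left out because it expects sorted input, and perl is omitted for brevity; the time lines use bash's TIMEFORMAT to print only the real time and will of course give different numbers on other machines.

```bash
#!/bin/bash
# Hypothetical stand-in for test.txt: 1000 lines of five space-separated fields.
input=$(mktemp)
for ((i = 1; i <= 1000; i++)); do
    echo "a$i b$i c$i d$i e$i"
done > "$input"

ref=$(cut -d ' ' -f1,2 "$input")
via_awk=$(awk '{print $1, $2}' "$input")
via_sed=$(sed -e 's/^\([^ ]*\) \([^ ]*\).*$/\1 \2/' "$input")
via_read=$(while read -r a b _; do echo "$a $b"; done < "$input")

# Optional timing: TIMEFORMAT makes bash's time keyword print only real time.
TIMEFORMAT='%3R s'
time cut -d ' ' -f1,2 "$input" > /dev/null
time while read -r a b _; do echo "$a $b"; done < "$input" > /dev/null

rm -f "$input"
```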
5. Referential man pages