Bash

Topics related to Bash:

Getting started with Bash

Using "trap" to react to signals and system events

The trap utility is a special shell built-in. It's defined in POSIX, but bash adds some useful extensions as well.

Examples that are POSIX-compatible start with #!/bin/sh, and examples that start with #!/bin/bash use a bash extension.

The signals can either be a signal number, a signal name (without the SIG prefix), or the special keyword EXIT.

Those guaranteed by POSIX are:

Number  Name     Notes
0       EXIT     Always run on shell exit, regardless of exit code
1       SIGHUP
2       SIGINT   This is what ^C sends
3       SIGQUIT
6       SIGABRT
9       SIGKILL
14      SIGALRM
15      SIGTERM  This is what kill sends by default
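
A minimal sketch of trap in action (the temp file and the cleanup function name are illustrative; the demo runs in a child bash so its EXIT trap fires when it finishes):

```shell
# Run the demo in a child bash so its EXIT trap fires when it returns.
out=$(bash -c '
  tmpfile=$(mktemp)
  cleanup() { rm -f "$tmpfile"; echo "cleaned up"; }
  trap cleanup EXIT          # 0 / EXIT: always runs on shell exit
  trap "exit 130" INT        # 2: what ^C sends
  trap "exit 143" TERM       # 15: what kill sends by default
  echo "working with $tmpfile"
')
echo "$out"
```

Whether the child exits normally or is interrupted, the EXIT trap removes the temp file before the shell goes away.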

Listing Files

Aliasing

The alias will only be available in the shell where the alias command was issued.

To persist the alias, consider putting it into your ~/.bashrc
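
For example (the ll alias is illustrative; expand_aliases is needed here because aliases expand only in interactive shells by default):

```shell
shopt -s expand_aliases        # scripts are non-interactive, so enable expansion
alias ll='ls -l'
ll /tmp > /dev/null            # the alias is now usable in this shell
# To make it permanent, append the definition to ~/.bashrc:
#   echo "alias ll='ls -l'" >> ~/.bashrc
```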

Jobs and Processes

Redirection

UNIX console programs have one input stream and two output streams (input and output streams, as well as devices, are treated as files by the OS). By default, input comes from the keyboard and both outputs go to the screen, but any or all of them can be redirected to come from, or go to, a file or another program.

STDIN is standard input, and is how the program receives interactive input. STDIN is usually assigned file descriptor 0.

STDOUT is standard output. Whatever is emitted on STDOUT is considered the "result" of the program. STDOUT is usually assigned file descriptor 1.

STDERR is where error messages are displayed. Typically, when running a program from the console, STDERR is output on the screen and is indistinguishable from STDOUT. STDERR is usually assigned file descriptor 2.

The order of redirection is important

command > file 2>&1

Redirects both (STDOUT and STDERR) to the file.

command 2>&1 > file

Redirects only STDOUT, because file descriptor 2 is duplicated to whatever file descriptor 1 points to at that moment, and fd 1 does not yet point to file when 2>&1 is evaluated.
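
A minimal demonstration of the difference (the emit function and the mktemp files are illustrative):

```shell
emit() { echo out; echo err >&2; }

both=$(mktemp)
emit > "$both" 2>&1     # fd 1 -> file, then fd 2 -> (file): both lines captured

only=$(mktemp)
emit 2>&1 > "$only"     # fd 2 -> terminal (where fd 1 points NOW), then fd 1 -> file
```

After the second command, the file contains only "out"; "err" went to wherever stdout pointed before the redirection.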

Each command in a pipeline has its own STDERR (and STDOUT) because each is a new process. This can create surprising results if you expect a redirect to affect the entire pipeline. For example this command (wrapped for legibility):

$ python3 -c 'import sys; print("Python error!", file=sys.stderr)' \
| cut -f1 2>> error.log

will print "Python error!" to the console rather than the log file. Instead, attach the error to the command you want to capture:

$ python3 -c 'import sys; print("Python error!", file=sys.stderr)' 2>> error.log \
| cut -f1 

Control Structures

There are many comparator parameters available in bash. Not all are yet listed here.

Using cat

cat can read from both files and standard input, and concatenates them to standard output

Arrays

Functions

Bash Parameter Expansion

Sourcing

Find

Here documents and here strings

Quoting

Conditional Expressions

The [[ … ]] syntax surrounds bash built-in conditional expressions. Note that spaces are required on either side of the brackets.

Conditional expressions can use unary and binary operators to test properties of strings, integers and files. They can also use the logical operators &&, || and !.
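A few illustrative tests (variable names are arbitrary):

```shell
f=$(mktemp)                                    # a regular file to test against
s="hello"; n=5
[[ -f $f ]]            && file_ok=yes          # unary file test
[[ $s == h* ]]         && match_ok=yes         # binary pattern match (unquoted RHS)
[[ $n -gt 3 && $n -lt 10 ]] && range_ok=yes    # integer tests joined with &&
rm -f "$f"
```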

Scripting with Parameters

  • shift shifts the positional parameters to the left so that $2 becomes $1, $3 becomes $2, and so forth.
  • "$@" expands to all the positional parameters passed to the script/function, each as a separate word.
  • "$*" expands to a single string composed of all the positional parameters, joined by the first character of IFS.

Bash history substitutions

Math

Scoping

Process substitution

Process substitution is a form of redirection where the input or output of a process (some sequence of commands) appears as a temporary file.
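
For example, comparing two command outputs without temporary files, and feeding a loop without a subshell (so count survives the loop):

```shell
diff <(printf '1\n2\n3\n') <(printf '1\n2\n4\n') || true  # each <(...) acts as a file name

count=0
while read -r line; do
  count=$((count+1))
done < <(printf 'a\nb\nc\n')    # < <(...) keeps the loop in the current shell
```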

Programmable completion

Customizing PS1

Brace Expansion

Bash Arithmetic

Inside bash's $(( … )) arithmetic expansion, whitespace is optional: $((1+2)) and $((1 + 2)) are equivalent. The external expr utility, by contrast, requires a space between each term (or sign) of the expression: expr 1 + 2 works, but expr 1+2 does not.
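
The spacing requirement of expr, where each term must be a separate argument, can be seen directly:

```shell
echo $((1+2))   # arithmetic expansion: no spaces needed
expr 1 + 2      # prints 3: each term is a separate argument
expr 1+2        # prints the literal string "1+2": no spaces, no arithmetic
```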

getopts : smart positional-parameter parsing

Options

optstring : The option characters to be recognized

If a character is followed by a colon, the option is expected to have an argument, which should be separated from it by whitespace. The colon (:) and question mark (?) cannot be used as option characters.

Each time it is invoked, getopts places the next option in the shell variable name, initializing name if it does not exist, and the index of the next argument to be processed into the variable OPTIND. OPTIND is initialized to 1 each time the shell or a shell script is invoked.

When an option requires an argument, getopts places that argument into the variable OPTARG. The shell does not reset OPTIND automatically; it must be manually reset between multiple calls to getopts within the same shell invocation if a new set of parameters is to be used.

When the end of options is encountered, getopts exits with a return value greater than zero.

OPTIND is set to the index of the first non-option argument, and name is set to ?. getopts normally parses the positional parameters, but if more arguments are given in args, getopts parses those instead.

getopts can report errors in two ways. If the first character of optstring is a colon (:), silent error reporting is used. In normal operation diagnostic messages are printed when invalid options or missing option arguments are encountered.

If the variable OPTERR is set to 0, no error messages will be displayed, even if the first character of optstring is not a colon.

If an invalid option is seen, getopts places ? into name and, if not silent, prints an error message and unsets OPTARG. If getopts is silent, the option character found is placed in OPTARG and no diagnostic message is printed.

If a required argument is not found, and getopts is not silent, a question mark (?) is placed in name, OPTARG is unset, and a diagnostic message is printed. If getopts is silent, then a colon (:) is placed in name and OPTARG is set to the option character.
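
Putting the pieces together, a silent-mode (leading-colon) sketch; the option letters and variable names are arbitrary:

```shell
parse() {
  OPTIND=1                      # reset between calls in the same shell
  verbose=0 outfile=''
  while getopts ':vo:' opt "$@"; do
    case $opt in
      v)  verbose=1 ;;
      o)  outfile=$OPTARG ;;                             # -o takes an argument
      :)  echo "missing argument for -$OPTARG" >&2; return 1 ;;
      \?) echo "invalid option -$OPTARG" >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))         # drop the parsed options
  operands=("$@")               # what remains are the non-option arguments
}
parse -v -o out.txt file1 file2
```

Because the optstring starts with :, errors are reported through the : and ? cases rather than by printed diagnostics.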

Debugging

Pitfalls

Script shebang

A common mistake is to try to execute a script with Windows line endings (\r\n) on a UNIX/Linux system. In that case the interpreter requested by the shebang is:

/bin/bash\r

which obviously does not exist; the resulting error can be hard to diagnose.
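
One way to detect and fix the problem (this sketch assumes GNU sed; on BSD/macOS sed the in-place flag is -i ''):

```shell
script=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$script"   # a CRLF-damaged script
sed -i 's/\r$//' "$script"                           # strip the carriage returns
```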

Pattern matching and regular expressions

Character Classes

Valid character classes for the [] glob are defined by the POSIX standard:

alnum alpha blank cntrl digit graph lower print punct space upper xdigit

(bash additionally accepts the non-POSIX classes ascii and word.)

Inside [] more than one character class or range can be used, e.g.,

$ echo a[a-z[:blank:]0-9]*

will match any file that starts with an a and is followed by either a lowercase letter or a blank or a digit.

It should be kept in mind, though, that a [] glob can only be wholly negated and not only parts of it. The negating character must be the first character following the opening [, e.g., this expression matches all files that do not start with an a

$ echo [^a]*

The following matches all files that start with a letter, an a, or a literal ^

$ echo [[:alpha:]^a]*

It does not mean "any letter except a": because the ^ is not the first character after the opening [, it is interpreted as a literal ^ rather than as negation.

Escaping glob characters

It is possible that a file or folder contains a glob character as part of its name. In this case the glob character can be escaped with a preceding \ to match it literally. Another approach is to use double ("") or single ('') quotes to address the file. Bash does not process globs that are enclosed within "" or ''.
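
For example (performed in a throwaway directory; the file names are illustrative):

```shell
dir=$(mktemp -d) && cd "$dir"
touch 'file*name' filexname

escaped=$(ls file\*name)   # backslash-escaped: only the literal name matches
quoted=$(ls 'file*name')   # quoted: globbing is suppressed as well
files=(file*name)          # unquoted: the * globs and matches both files
```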

Difference to Regular Expressions

The most significant difference between globs and Regular Expressions is that a valid Regular Expression requires a qualifier as well as a quantifier. A qualifier identifies what to match and a quantifier tells how often to match the qualifier. The equivalent RegEx to the * glob is .* where . stands for any character and * stands for zero or more matches of the previous character. The equivalent RegEx for the ? glob is .{1}. As before, the qualifier . matches any character and the {1} indicates to match the preceding qualifier exactly once. This should not be confused with the ? quantifier, which matches zero or one time in a RegEx. The [] glob can be used just the same in a RegEx, as long as it is followed by a mandatory quantifier.

Equivalent Regular Expressions

Glob  RegEx
*     .*
?     .
[]    []

Keyboard shortcuts

bind -P shows all configured shortcuts.

Change shell

Copying (cp)

Internal variables

Job Control

Case statement

Word splitting

  • Word splitting is not performed during assignments, e.g. newvar=$var
  • Word splitting is not performed in the [[ ... ]] construct
  • Use double quotes on variables to prevent word splitting
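
Each rule in one small sketch:

```shell
var='one two'
newvar=$var                           # assignment: no splitting, keeps "one two"
[[ $var == 'one two' ]] && same=yes   # no splitting inside [[ ]]
set -- $var;   n_unquoted=$#          # unquoted expansion splits into 2 words
set -- "$var"; n_quoted=$#            # double quotes keep it as 1 word
```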

Read a file (data stream, variable) line-by-line (and/or field-by-field)?

File Transfer using scp

Pipelines

A pipeline is a sequence of simple commands separated by one of the control operators | or |&.

| connects the output of command1 to the input of command2.

|& connects standard output and standard error of command1 to the standard input of command2.
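
The difference in one sketch (|& requires bash 4+; the emit function is illustrative):

```shell
emit() { echo out; echo err >&2; }

only_stdout=$( (emit | grep -c .) 2>/dev/null )  # stderr bypasses |, so grep counts 1 line
both=$(emit |& grep -c .)                        # |& merges the streams: grep counts 2 lines
```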

Managing PATH environment variable

Bash configuration file:

This file is sourced whenever a new interactive Bash shell is started.

On GNU/Linux systems it's generally the ~/.bashrc file; on macOS it's ~/.bash_profile or ~/.profile

Export:

The PATH variable must be exported once, which is normally already done by the system. Once exported, it stays exported, and any changes made to it are applied immediately.

Apply changes:

To apply changes to a Bash configuration file, you must reload that file in a terminal (source /path/to/bash_config_file)

Avoiding date using printf

Using printf -v foo '%(...)T' is equivalent to foo=$(date +'...') and saves the fork for the external date program.
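
For example (requires bash 4.2+; the -1 argument explicitly means "the current time"):

```shell
printf -v today '%(%Y-%m-%d)T' -1   # same result as today=$(date +%Y-%m-%d), no fork
echo "$today"
```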

Chain of commands and operations

Type of Shells

Login Shell

A login shell is one whose first character of argument zero is a -, or one started with the --login option. The initialization is more comprehensive than in a normal interactive (sub)shell.

Interactive Shell

An interactive shell is one started without non-option arguments and without the -c option whose standard input and error are both connected to terminals (as determined by isatty(3)), or one started with the -i option. PS1 is set and $- includes i if bash is interactive, allowing a shell script or a startup file to test this state.
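
The $- test looks like this (run here non-interactively, e.g. from a script, so the second branch is taken):

```shell
case $- in
  *i*) shell_mode=interactive ;;      # $- contains "i" only in interactive shells
  *)   shell_mode=non-interactive ;;
esac
echo "$shell_mode"
```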

non-interactive Shell

A non-interactive shell is a shell with which the user cannot interact. As an example, a shell running a script is always a non-interactive shell. All the same, the script can still access its tty.

Configuring a login shell

On logging in:

If '/etc/profile' exists, then source it. 
If '~/.bash_profile' exists, then source it, 
else if '~/.bash_login' exists, then source it, 
else if '~/.profile' exists, then source it. 

For non-login interactive shells

On starting up:

If `~/.bashrc' exists, then source it.

For non-interactive shells

On starting up: if the environment variable BASH_ENV is set and non-null, its value is expanded and the named file is sourced. (When started in POSIX mode, Bash uses ENV instead.)

true, false and : commands

Color script output (cross-platform)

tput queries the terminfo database for terminal-dependent information.

From tput on Wikipedia:

In computing, tput is a standard Unix operating system command which makes use of terminal capabilities.

Depending on the system, tput uses the terminfo or termcap database, as well as looking into the environment for the terminal type.

from Bash Prompt HOWTO: Chapter 6. ANSI Escape Sequences: Colours and Cursor Movement:

  • tput setab [1-7]

    • Set a background colour using ANSI escape
  • tput setb [1-7]

    • Set a background colour
  • tput setaf [1-7]

    • Set a foreground colour using ANSI escape
  • tput setf [1-7]

    • Set a foreground colour
  • tput bold

    • Set bold mode
  • tput sgr0

    • Turn off all attributes (doesn't work quite as expected)
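
A common pattern is to query tput once and guard against non-terminal output, so redirected logs stay free of escape codes:

```shell
if [ -t 1 ]; then                # only emit colours when stdout is a terminal
  bold=$(tput bold) red=$(tput setaf 1) reset=$(tput sgr0)
else
  bold='' red='' reset=''
fi
printf '%s%sError:%s something went wrong\n' "$bold" "$red" "$reset"
```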

Navigating directories

Using sort

Namespace

co-processes

Typing variables

Jobs at specific times

Associative arrays

Handling the system prompt

Creating directories

File execution sequence

Other files of note are:

  • /etc/profile, for system-wide (not user specific) initialization code.

  • .bash_logout, triggered when logging out (think cleanup stuff)

  • .inputrc, similar to .bashrc but for readline.

The cut command

1. Syntax differences

Long options in the table above are only supported by the GNU version.

2. No character gets special treatment

FreeBSD cut (which comes with macOS, for example) doesn't have the --complement switch; in the case of character ranges, one can use the colrm command instead:

  $ cut --complement -c3-5 <<<"123456789"
  126789

  $ colrm 3 5 <<<"123456789"
  126789

However, there is a big difference, because colrm treats TAB characters (ASCII 9) as real tabulations up to the next multiple of eight, and backspaces (ASCII 8) as -1 wide; on the contrary, cut treats all characters as one column wide.

  $ colrm  3 8 <<<$'12\tABCDEF' # Input string has an embedded TAB
  12ABCDEF

  $ cut --complement -c3-8 <<<$'12\tABCDEF'
  12F

3. (Still no) Internationalization

When cut was designed, all characters were one byte long and internationalization was not a problem. When writing systems with wider characters became popular, the solution adopted by POSIX was to distinguish between the old -c switch, which should retain its meaning of selecting characters, no matter how many bytes wide, and a new switch -b, which should select bytes, irrespective of the current character encoding. In most popular implementations, -b was introduced and works, but -c still behaves exactly like -b rather than as documented. For example, with GNU cut:

The following examples describe their output rather than showing literal multi-byte text.

  # In an encoding where each character in the input string is three bytes wide,
  # Selecting bytes 1-6 yields the first two characters (correct)
  $ LC_ALL=ja_JP.UTF-8 cut -b1-6 kanji.utf-8.txt
  ...first two characters of each line...
  

  # Selecting all three characters with the -c switch doesn’t work.
  # It behaves like -b, contrary to documentation.
  $ LC_ALL=ja_JP.UTF-8 cut -c1-3 kanji.utf-8.txt
  ...first character of each line...

  # In this case, an illegal UTF-8 string is produced.
  # The -n switch would prevent this, if implemented.
  $ LC_ALL=ja_JP.UTF-8 cut -n -c2 kanji.utf-8.txt
  ...second byte, which is an illegal UTF-8 sequence...

If your characters are outside the ASCII range and you want to use cut, you should always be aware of character width in your encoding and use -b accordingly. If and when -c starts working as documented, you won’t have to change your scripts.

4. Speed comparisons

cut’s limitations have people doubting its usefulness. In fact, the same functionality can be achieved by more powerful, more popular utilities. However, cut’s advantage is its performance. See below for some speed comparisons. test.txt has three million lines, with five space-separated fields each. For the awk test, mawk was used, because it’s faster than GNU awk. The shell itself (last line) is by far the worst performer. The times given (in seconds) are what the time command gives as real time.

(Just to avoid misunderstandings: all tested commands gave the same output with the given input, but they are of course not equivalent and would give different outputs in different situations, in particular if the fields were delimited by a variable number of spaces)

Command                                             Time
cut -d ' ' -f1,2 test.txt                           1.138s
awk '{print $1 $2}' test.txt                        1.688s
join -a1 -o1.1,1.2 test.txt /dev/null               1.767s
perl -lane 'print "@F[1,2]"' test.txt               11.390s
grep -o '^\([^ ]*\) \([^ ]*\)' test.txt             22.925s
sed -e 's/^\([^ ]*\) \([^ ]*\).*$/\1 \2/' test.txt  52.122s
while read a b _; do echo $a $b; done <test.txt     55.582s

5. Reference man pages

Bash on Windows 10

Cut Command

Splitting Files

global and local variables

Design Patterns

CGI Scripts

Select keyword

When to use eval

Networking With Bash

Parallel

Grep

strace

Sleep utility

Decoding URL