Processing tabular data with awk is very easy, provided that the input is correctly formatted. Most software producing tabular data use specific features of this family of formats, and awk programs processing tabular data are often specific to a data produced by a specific software. If a more generic or robust solution is required, most popular languages provide libraries accommodating with a lot of features found in tabular data:
While it definitely possible to handle all these features cleanly and generically with awk this is probably not worth the effort.
Given a file using ;
as a column delimiter. Permuting the first and the second column is accomplished by
awk -F';' -v 'OFS=;' '{ swap = $2; $2 = $1; $1 = swap; print }'
Given a file using ;
as a column delimiter. We compute the mean of the values in the second column with the following program, the provided input is the list of grades of a student group:
awk -F';' '{ sum += $2 } END { print(sum / NR) }' <<EOF
Alice;2
Victor;1
Barbara;1
Casper;4
Deborah;0
Ernest;1
Fabiola;4
Giuseppe;4
EOF
The output of this program is 2.125
.
Remember that NR
holds the number of the line being processed, in the END
block it therefore hold the total number of lines in the file.
Remember that in many applications (monitoring, statistics), the median is a much more useful information. See the corresponding example.
We assume a file using ; as a column delimiter. Selecting a specific set of columns only requires a print statement. For instance, the following program selects the columns 3, 4 and 7 from its input:
awk -F';' -v 'OFS=;' '{ print $3, $4, $7 }'
It is as usual possible to more carefully choose lines to print. The following program selects the columns 3, 4 and 7 from its input when the first field is Alice
or Bob
:
awk -F';' -v 'OFS=;' '($1 == "Alice") || ($1 == "Bob") { print $3, $4, $7 }'
Given a file using ;
as a column delimiter. We compute the median of the values in the second column with the following program, written for GNU awk. The provided input is the list of grades of a student group:
gawk -F';' '{ sample[NR] = $2 }
END {
asort(sample);
if(NR % 2 == 1) {
print(sample[int(NR/2) + 1])
} else {
print(sample[NR/2])
}
}' <<EOF
Alice;2
Victor;1
Barbara;1
Casper;4
Deborah;0
Ernest;1
Fabiola;4
Giuseppe;4
EOF
The output of this program is 1
.
Remember that NR
holds the number of the line being processed, in the END
block it therefore hold the total number of lines in the file.
Many implementations of awk do not have a function to sort arrays, which therefore need to be defined before the code above could be used.
Pattern matching can be used effectively with awk
as it controls the actions that follows it i.e. { pattern } { action }
. One cool use of the pattern-matching is to select multiple between two patterns in a file say patternA
and patternB
$ awk '/patternA/,/patternB/' file
Assume my file contents are as follows, and I want to extract the lines only between the above pattern:-
$ cat file
This is line - 1
This is line - 2
patternA
This is line - 3
This is line - 4
This is line - 5
patternB
This is line - 6
$ awk '/patternA/,/patternB/' file
patternA
This is line - 3
This is line - 4
This is line - 5
patternB
The above command doesn't do any specific { action }
other than to print the lines matching, but any specific actions within the subset of lines can be applied with an action block ({}
).