This article presents a few small AWK scripts. Although they are very simple, they help to illustrate the logic behind the language, which differs somewhat from that of traditional programming languages. For further information, see the article about AWK on Wikipedia.
AWK is an interpreted language for processing text files. To the interpreter, a text file is a big table. Each line of text is a table row, called a record. Each word delimited by spaces is a table column, called a field. When a text file is given as input to an AWK script, the following happens:
- The AWK interpreter automatically opens the file.
- The AWK interpreter scans all the records (rows) of the text file and applies the rules given by the AWK commands. Note that the file is scanned automatically: the programmer does not have to write any for, while or foreach loop.
- The interpreter closes the file. The input file itself is not modified: whatever the script prints goes to standard output.
Because of this, AWK is well suited to processing large amounts of data with simple rules, and it spares the programmer routine operations such as opening, reading and closing the file.
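The automatic scan can be seen in a minimal sketch like the following, run from a POSIX shell. The two input lines are passed through a here-document instead of a file, and the variable name result is only an illustrative choice. NR and NF are AWK's built-in counters for the current record number and the number of fields in it.

```shell
# The action has no loop around it: awk runs it once for every record.
result=$(awk '{ print NR, NF, $1 }' <<'EOF'
USA 3615 219
China 3692 866
EOF
)
echo "$result"
# Prints:
# 1 3 USA
# 2 3 China
```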
Another great simplification is the absence of typed variables: any variable can hold either text or a number.
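As a small illustration of this, the sketch below stores a field in a variable and then uses it once as a number and once as text. The variable names x and result are arbitrary, and the single input line is supplied through a here-document.

```shell
result=$(awk '{
    x = $2          # x receives the second field, read as text from the record
    print x + 1     # arithmetic treats x as a number
    print x "km"    # placing two values side by side concatenates them as text
}' <<'EOF'
USA 3615 219
EOF
)
echo "$result"
# Prints:
# 3616
# 3615km
```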
The structure of an AWK command to process a record is:
(condition) {action}
- Condition selects the records on which the action will be performed. The parentheses around it are optional.
- Action is what is performed on each selected record.
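Either part of a command may be left out. The sketch below, which assumes a POSIX awk and uses made-up output text ("seen:"), shows both cases: a condition without an action prints the selected record, and an action without a condition runs for every record.

```shell
result=$(awk '
    $2 > 3000               # condition only: the selected record is printed as is
    { print "seen:", $1 }   # action only: performed for every record
' <<'EOF'
USA 3615 219
Italy 920 60
China 3692 866
EOF
)
echo "$result"
# Prints:
# USA 3615 219
# seen: USA
# seen: Italy
# China 3692 866
# seen: China
```

Note that a condition and its action must start on the same line. Here they are on separate lines, so AWK reads them as two independent commands.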
The following examples all refer to a text file containing a list of nations with their area (in thousands of square km), population (in millions) and continent:
USSR 86250 262 Asia
USA 3615 219 North America
China 3692 866 Asia
Canada 3852 24 North America
Brazil 3286 116 South America
Australia 2968 14 Oceania
India 1269 637 Asia
Argentina 1072 26 South America
Sudan 968 19 Africa
Italy 920 60 Europe
Angola 1246 12 Africa
Austria 83 8 Europe
Spain 504 45 Europe
Select a field
This example shows how the fields of each record are selected: by number, prefixed with $ (similar to the positional parameters of a Bash script). The special field $0 stands for the whole record.
#!/usr/bin/awk -f
# This script prints only nations and continents
{print $1, $4}
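Because fields are split on spaces, a continent made of two words occupies two fields. The following sketch, run on two lines of the sample data through a here-document, shows what $1 and $4 select in that case. It also uses the built-in NF (the number of fields) and $NF (the last field); the variable name result is only illustrative.

```shell
result=$(awk '{ print $1, $4; print NF, $NF }' <<'EOF'
China 3692 866 Asia
Brazil 3286 116 South America
EOF
)
echo "$result"
# Prints:
# China Asia
# 4 Asia
# Brazil South
# 5 America
```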
Conditions and variables
These examples show the use of numerical and more complex conditions. Note that AWK has no typing: any field can be used as a number without any conversion.
#!/usr/bin/awk -f
# This script prints all the information relative only to
# nations with more than 50 million people
( $3 > 50 ) { print $0 }
#!/usr/bin/awk -f
# This script prints all the information relative only to
# nations whose name is longer than 5 characters
( length($1) > 5 ) { print $0 }
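Since fields can be used directly as numbers, a condition may also contain arithmetic. The sketch below selects nations by population density. It assumes, as the comments in the scripts suggest, that the second field is the area in thousands of square km and the third the population in millions, so that $3 * 1000 / $2 gives people per square km. The threshold of 100 is an arbitrary choice for illustration.

```shell
result=$(awk '($3 * 1000 / $2 > 100) { print $1, int($3 * 1000 / $2) }' <<'EOF'
India 1269 637 Asia
Italy 920 60 Europe
China 3692 866 Asia
EOF
)
echo "$result"
# Prints:
# India 501
# China 234
```

The built-in int() function truncates the result to a whole number.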
Regular expressions
These examples show the use of regular expressions in AWK. Note the slashes / that delimit the regular expression, and the ~ operator, which applies it to a single field instead of the whole record.
#!/usr/bin/awk -f
# This script prints all the information relative only to
# Asian nations
( /Asia/ ) { print $0 }
#!/usr/bin/awk -f
# This script prints all the information relative only to
# European nations
( /Europe/) { print $0 }
#!/usr/bin/awk -f
# This script prints all the information relative only to
# nations beginning with A
( $1 ~ /^A/ ) { print $0 }
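The difference between matching the whole record and matching a single field can be sketched as follows. The labels "record:" and "field:" are made up for this illustration. The anchor ^ ties the match to the beginning of the field, so only names that actually start with A are selected.

```shell
result=$(awk '
    /A/       { print "record:", $1 }   # an A anywhere in the record
    $1 ~ /^A/ { print "field:", $1 }    # an A at the start of the first field
' <<'EOF'
Sudan 968 19 Africa
Austria 83 8 Europe
Italy 920 60 Europe
EOF
)
echo "$result"
# Prints:
# record: Sudan
# record: Austria
# field: Austria
```

Sudan is selected by the first command only because of the A in Africa.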
Compound conditions
The following examples show how to combine conditions with boolean operators, which are very similar to those of the C programming language.
#!/usr/bin/awk -f
# This script prints all the information relative only to
# nations with more than 50 million people and an area
# of more than 2 million square km
( $3 > 50 && $2 > 2000) { print $0 }
#!/usr/bin/awk -f
# This script prints all the information relative only to
# African nations beginning with A
( $0~/Africa/ && $1~/^A/ ) { print $0 }
#!/usr/bin/awk -f
# This script prints all the information relative only to
# European nations beginning with A or S
( $0~/Europe/ && ($1~/^A/ || $1~/^S/) ) { print $0 }
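As a possible variation on the last script, the two alternatives joined by || can be folded into one regular expression with a character class, and a condition can be negated with !, as in C. The sketch below is one way to write this; the output labels are made up for illustration.

```shell
result=$(awk '
    $0 ~ /Europe/ && $1 ~ /^[AS]/    { print "A or S:", $1 }
    $0 ~ /Europe/ && !($1 ~ /^[AS]/) { print "other:", $1 }
' <<'EOF'
Sudan 968 19 Africa
Italy 920 60 Europe
Austria 83 8 Europe
Spain 504 45 Europe
EOF
)
echo "$result"
# Prints:
# other: Italy
# A or S: Austria
# A or S: Spain
```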