awk(1)
awk [opt] program [input]
-F <sepstr> field separator string (can be regex)
program awk program
input file or stdin if no file given
Input processing
Input is processed in two stages:
- Splitting input into a sequence of records. By default split at the newline character, but can be changed via the builtin RS variable.
- Splitting a record into fields. By default strings without whitespace, but can be changed via the builtin variable FS or command line option -F.
Fields are accessed as follows:
- $0 whole record
- $1 field one
- $2 field two
- ...
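The two splitting stages can be sketched with a small example. The separators `;` and `:` and the sample input are chosen only for illustration; the input has no trailing newline so that the last record stays clean.

```shell
# Stage 1: RS=";" splits the input into the records "a:b" and "c:d".
# Stage 2: FS=":" splits each record into the fields $1 and $2.
out=$(printf 'a:b;c:d' | awk 'BEGIN { RS = ";"; FS = ":" } { print NR ": " $1 " " $2 }')
echo "$out"
# 1: a b
# 2: c d
```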
Program
An awk program is composed of pairs of the form:
pattern { action }
The program is run against each record in the input stream. If a pattern
matches a record the corresponding action is executed and can access the
fields.
INPUT
  |
  v
record ----> ∀ pattern matched
  |                 |
  v                 v
fields ----> run associated action
Any valid awk expr can be a pattern.
An example is the regex pattern /abc/ { print $1 }, which prints the first
field if the record matches the regex /abc/. This form is shorthand for
$0 ~ /abc/ { print $1 }; see the regex comparison operator below.
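As a minimal sketch of how every pattern is tested against every record, the following program combines a regex pattern with a plain comparison expression. The sample input and the labels are made up for illustration.

```shell
# Each record is checked against both patterns, in program order.
# "abc 30" matches both, so both actions run for it.
out=$(printf 'abc 1\nxyz 20\nabc 30\n' | awk '
    /abc/   { print "regex: " $0 }
    $2 > 10 { print "expr: " $0 }')
echo "$out"
# regex: abc 1
# expr: xyz 20
# regex: abc 30
# expr: abc 30
```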
Special pattern
awk provides two special patterns, BEGIN and END, which can be used
multiple times. Actions with those patterns are executed exactly once.
- BEGIN actions are run before processing the first record
- END actions are run after processing the last record
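A short sketch of both special patterns around a regular per-record action; the variable name sum and the input numbers are arbitrary.

```shell
# BEGIN runs once before any input, END once after the last record.
out=$(printf '1\n2\n3\n' | awk '
    BEGIN { print "start" }
          { sum += $1 }
    END   { print "sum=" sum }')
echo "$out"
# start
# sum=6
```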
Special variables
- RS record separator: first char is the record separator, by default a newline
- FS field separator: regex to split records into fields, by default a single space, which splits on runs of whitespace
- NR number record: number of the current record
- NF number fields: number of fields in the current record
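To see NR and NF change from record to record, one can print them for a small made-up input with the default separators:

```shell
# NR counts records, NF counts the fields of the current record,
# and $NF is therefore the last field.
out=$(printf 'a b c\nd e\n' | awk '{ print NR ": " NF " fields, last=" $NF }')
echo "$out"
# 1: 3 fields, last=c
# 2: 2 fields, last=e
```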
Special statements & functions
- printf "fmt", args...
  Print format string, args are comma separated.
  - %s string
  - %d decimal
  - %x hex
  - %f float
  Width can be specified as %Ns, which reserves N chars for a string. For floats one can use %N.Mf, where N is the minimum total width, including the . and the M fractional digits.
- sprintf("fmt", expr, ...)
  Format the expressions according to the format string. Similar to printf, but this is a function and the return value can be assigned to a variable.
- strftime("fmt")
  Return the current time stamp formatted by fmt. This function is not part of POSIX awk, but is available in gawk.
  - %Y full year (eg 2020)
  - %m month (01-12)
  - %d day (01-31)
  - %F alias for %Y-%m-%d
  - %H hour (00-23)
  - %M minute (00-59)
  - %S second (00-59)
  - %T alias for %H:%M:%S
- S ~ R, S !~ R
  The regex comparison operator, where the former returns true if the string S matches the regex R, and the latter is the negated form. The regex can be either a constant or a dynamic regex.
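The following sketch exercises sprintf with width specifiers and the regex comparison operator with a dynamic regex. The variable names s and re and all sample values are illustrative only.

```shell
out=$(awk 'BEGIN {
    # %5s pads "ab" to 5 chars, %6.2f pads 3.14 to 6 chars, %x prints 255 as hex.
    s = sprintf("%5s|%6.2f|%x", "ab", 3.14159, 255)
    print s
    # A dynamic regex is a string that is interpreted as a regex at match time.
    re = "^a+$"
    if ("aaa" ~ re)  print "aaa matches"
    if ("abc" !~ re) print "abc does not match"
}')
echo "$out"
#    ab|  3.14|ff
# aaa matches
# abc does not match
```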
Examples
Filter records
awk 'NR%2 == 0 { print $0 }' <file>
The pattern NR%2 == 0 matches every second record and the action { print $0 }
prints the whole record.
Negative patterns
awk '!/^#/ { print $1 }' <file>
Matches records not starting with #.
Range patterns
echo -e "a\nFOO\nb\nc\nBAR\nd" | \
awk '/FOO/,/BAR/ { print }'
/FOO/,/BAR/ define a range pattern of begin_pattern, end_pattern. When
begin_pattern is matched the range is turned on and when the
end_pattern is matched the range is turned off. This matches every record
in the range inclusive.
An exclusive range must be handled explicitly, for example as follows.
echo -e "a\nFOO\nb\nc\nBAR\nd" | \
awk '/FOO/,/BAR/ { if ($0 !~ /FOO/ && $0 !~ /BAR/) print }'
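An alternative sketch for an exclusive range avoids the range pattern and tracks the state in a flag instead. The variable name inside is hypothetical, and the rule order matters: the flag is cleared before the print rule and set after it, so neither boundary record is printed.

```shell
out=$(printf 'a\nFOO\nb\nc\nBAR\nd\n' | awk '
    /BAR/  { inside = 0 }   # leave the range before printing
    inside { print }        # print only records strictly between the markers
    /FOO/  { inside = 1 }   # enter the range after the print rule')
echo "$out"
# b
# c
```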
Access last fields in records
echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'
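Access the last fields with arithmetic on the NF (number of fields) variable.
Because the two expressions are not separated by a comma, print concatenates
them and the output is fe.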
Split on multiple tokens
echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'
Use regex as field separator.
Capture in variables
# /proc/<pid>/status
# Name: cat
# ...
# VmRSS: 516 kB
# ...
for f in /proc/*/status; do
    awk '
        /^VmRSS/ { rss = $2/1024 }   # VmRSS is reported in kB
        /^Name/  { name = $2 }
        END      { printf "%16s %6d MB\n", name, rss }' "$f"
done | sort -k2 -n
We capture values from VmRSS and Name into variables and print them at the
END once processing all records is done.
Capture in array
echo 'a 10
b 2
b 4
a 1' | awk '{
vals[$1] += $2
cnts[$1] += 1
}
END {
for (v in vals)
printf "%s %d\n", v, vals[v] / cnts[v]
}'
Capture keys and values from different columns and sum up the values.
At the END we compute the average of each key.
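Two details are worth a variation of this example: the iteration order of for (k in array) is unspecified, and %d truncates the average. The sketch below, with the illustrative array names sum and cnt, prints one decimal place and pipes through sort for a stable order.

```shell
out=$(printf 'a 10\nb 2\nb 4\na 1\n' | awk '
    { sum[$1] += $2; cnt[$1]++ }
    END {
        # The traversal order is unspecified, so the output is sorted afterwards.
        for (k in sum)
            printf "%s %.1f\n", k, sum[k] / cnt[k]
    }' | sort)
echo "$out"
# a 5.5
# b 3.0
```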
Run shell command and capture output
awk '
/^Pid/ {
    cmd = "ps --no-header -o user " $2
    cmd | getline user
    close(cmd)
    print user
}' /proc/1/status
We build a ps command line, capture the first line of the command's output
in the user variable, and then print it.
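To read more than the first line, getline can be called in a loop until it no longer returns a positive value. The sketch below uses a harmless stand-in command built from echo instead of ps; the names cmd, line and n are illustrative.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    # getline returns 1 per line read, 0 at end of output, -1 on error.
    while ((cmd | getline line) > 0)
        printf "%d: %s\n", ++n, line
    close(cmd)   # release the pipe so the command could be run again
}')
echo "$out"
# 1: one
# 2: two
```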