awk Commands, Examples & Meaning

Learn to use awk for text and data extraction, data processing, validation, report generation and automation, with examples of if else conditions, comparisons, arrays, regular expressions and built-in operational variables, including their meaning and command syntax.

awk name & History

The name awk comes from the initials of its creators: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of awk was written in 1977 at AT&T Bell Laboratories. Paul Rubin wrote gawk (GNU awk) in 1986.

Using awk

awk can be used directly on the command line. An awk program can also be stored in a file and run either with awk -f or as an executable script.

awk Command Line

On the command line, awk works as a tool to process and format data from one or more input files, or from the output of another program.

Syntax to process a data file with an awk command:

awk '<awk command>' <file>

Or use the output of another command as the input, through a pipe:

<command> | awk '<awk command>'
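The two forms above can be tried side by side. This is a minimal sketch. second_col is a hypothetical helper name, and the sample data and temporary file exist only for the demo. When no file names are given, awk reads standard input, which is what the pipe form relies on.

```shell
# second_col (hypothetical helper): print field 2 of every input line.
# "$@" passes any file names on to awk; with none, awk reads standard input.
second_col() {
  awk '{ print $2 }' "$@"
}

tmp=$(mktemp)                        # illustrative sample data file
printf 'a 1\nb 2\n' > "$tmp"
second_col "$tmp"                    # data file as input
printf 'a 1\nb 2\n' | second_col     # command output as input, through a pipe
rm -f "$tmp"
```

Both calls print 1 and 2 on separate lines.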

awk field variables

awk processes its input line by line. It splits each line into columns (fields) and assigns them to variables:

$0 = Entire line
$1 = First Column
$2 = Second Column
$3 = Third Column

and so on. By default, a column is a word or string of characters separated from its neighbours by one or more spaces or tabs. Common Linux/Unix commands such as df, ls and ps produce columnar output, and awk is very useful for listing and processing those columns. The print statement is used to print variables.
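Here is a quick way to see the default column splitting described above. It is a sketch built on a hypothetical helper, fields_demo. Leading blanks are ignored, and a run of spaces or tabs counts as a single separator.

```shell
# fields_demo (hypothetical helper): show the first three fields in brackets.
fields_demo() {
  awk '{ print "[" $1 "] [" $2 "] [" $3 "]" }'
}

printf '   a  b\tc\n' | fields_demo   # prints: [a] [b] [c]
```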

Working with awk commands

awk commands are enclosed in single quotes. The shell passes everything from the first single quote after the awk options up to the matching single quote to awk as the program. Because of this, characters such as $ inside the quotes are not interpreted by the shell.
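The program sits in single quotes, so a shell variable such as $threshold is not expanded inside it. A common workaround is to pass the value in with awk's -v option rather than splicing it into the quoted text. The sketch below shows this with a hypothetical helper named above.

```shell
# above (hypothetical helper): print field 1 of lines whose field 2 exceeds $1.
# -v assigns the awk variable "limit" before any input is read.
above() {
  awk -v limit="$1" '$2 > limit { print $1 }'
}

threshold=60
printf 'x 70\ny 50\n' | above "$threshold"   # prints: x
```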

Awk Examples

Extract the Used column (the third field in df output) for each mount point from df output:

localhost ~]$ df | awk '{print $3 }'
Used
10831280
0
252
1020
0
176
123767
51118256
66006

A similar operation extracts the 1st and 4th columns from a file called testfile, which contains the following lines:

column1 column2 column3 column4
1111 2222 3333 4444
1111 2222 3333 4444
1111 2222 3333 4444

localhost ~]$ awk '{print $1,$4}' testfile
column1 column4
1111 4444
1111 4444
1111 4444

Separating the fields with a comma in the print statement inserts the output field separator between them. This separator is held in the special awk variable OFS (Output Field Separator). Its default is a single space, and it can be assigned any other value, such as the pipe symbol | in the example below.

localhost ~]$ awk '{OFS="|" ; print $3,$4}' testfile
column3|column4
3333|4444
3333|4444
3333|4444
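awk uses OFS only when it builds output from separate fields. The sketch below sets OFS with -v and then assigns $1 = $1, which makes awk rebuild the whole line with the new separator. to_pipes is a hypothetical helper name.

```shell
# to_pipes (hypothetical helper): replace the whitespace between columns with |.
# Assigning to a field forces awk to rebuild $0 using OFS.
to_pipes() {
  awk -v OFS='|' '{ $1 = $1; print }'
}

printf 'column1 column2  column3\n1111 2222 3333\n' | to_pipes
```

Without the $1 = $1 assignment, print with no arguments outputs the original line unchanged, because no field has been modified.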

awk BEGIN and END statement

A BEGIN block runs once before the first input line is read, and an END block runs once after the last line has been processed. The basic construction is: BEGIN { <statements> } <pattern> { <processing statements> } END { <statements> }

Example:

localhost ~]$ awk 'BEGIN { print "Count Records " }
/4444/ { ++num }
END { print "Recs " num }' testfile
Count Records
Recs 3
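In the example above, num is never set when no line matches, so END would print "Recs " with nothing after it. The sketch below initialises the counter in BEGIN and takes the pattern as a parameter. count_matching is a hypothetical name, and the pattern is used as a dynamic regular expression.

```shell
# count_matching (hypothetical helper): count the lines that match the
# regular expression passed as $1. n is set to 0 in BEGIN, so END always
# prints a number, even when no line matches.
count_matching() {
  awk -v pat="$1" '
    BEGIN     { n = 0 }
    $0 ~ pat  { n++ }
    END       { print "Recs", n }
  '
}

printf 'column1 column4\n1111 4444\n' | count_matching 4444   # prints: Recs 1
```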

awk program File

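awk programs can also be saved in a file and run with the -f option. Syntax: awk -f <program file> <datafile>. To make the file directly executable, put the location of the awk interpreter in its first line (check it with which awk; it may be /usr/bin/awk). Create an awk program file, chkrec, as below: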

#! /bin/awk -f
BEGIN { print "Count Records " }
/4444/ { ++num }
END { print "Recs " num }
Run the file with the -f option:
localhost ~]$ awk -f chkrec testfile
Count Records
Recs 3
or make it executable and run it directly, with the data file as an argument:
localhost ~]$ chmod 755 chkrec
localhost ~]$ ./chkrec testfile
Count Records
Recs 3
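The sketch below is a self-contained version of the -f workflow. The hypothetical function run_chkrec writes the chkrec program shown above to a temporary file and runs it with awk -f. It assumes mktemp is available. With -f, no interpreter path in a #! line is needed.

```shell
# run_chkrec (hypothetical helper): save the chkrec program to a temporary
# file, run it with awk -f on the given files (or standard input), clean up.
run_chkrec() {
  prog=$(mktemp) || return 1
  cat > "$prog" <<'EOF'
BEGIN { print "Count Records " }
/4444/ { ++num }
END { print "Recs " num }
EOF
  awk -f "$prog" "$@"
  status=$?
  rm -f "$prog"
  return "$status"
}

printf '1111 4444\n2222 3333\n' | run_chkrec
```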

Awk Example programs

Compare values

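Print the Available, Use% and Mounted on columns if the used percentage is more than 60%. A value such as 92% is not a plain number, so comparing it directly with "60" is a string comparison: 7% would count as more than 60, and 100% would not. Adding 0 converts the leading digits to a number, and NR == 1 keeps the header line.

localhost ~]$ df | awk 'NR == 1 || $5+0 > 60 { print $4,$5,$6 }'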
Available Use% Mounted
4522188 92% /home
32298 68% /boot/efi
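The sketch below shows the string comparison pitfall and a reusable form of the numeric test. over_limit is a hypothetical helper, and the df-like sample lines are made up for the demo.

```shell
# Two string constants are compared as strings: '7' sorts after '6',
# so the first test is true even though 7 < 60.
awk 'BEGIN { s = ("7%" > "60"); n = (7 > 60); print s, n }'   # prints: 1 0

# over_limit (hypothetical helper): print the mount point of every df data
# line whose Use% is above the limit in $1; "$5 + 0" turns "92%" into 92.
over_limit() {
  awk -v limit="$1" 'NR > 1 && $5 + 0 > limit { print $6 }'
}

printf 'Filesystem 1K-blocks Used Available Use%% Mounted on\n/dev/a 10 9 1 92%% /home\n/dev/b 10 1 9 7%% /boot\n/dev/c 10 10 0 100%% /data\n' | over_limit 60
```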

awk Sum operations

Add up the sizes of selected files (/var/log/yum*). The size column ($5 in ls -l output) of each line is added to the variable n, and the total is printed in the END block.

localhost ~]$ ls -l /var/log/yum* | awk '{ n += $5 }
END { print "Total bytes = ", n }'
Total bytes = 63665
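The same summing idea works as a reusable sketch. sum_col is a hypothetical helper that takes the column number as a parameter and uses $col to select that field. It prints n + 0 so that empty input gives 0 rather than a blank line.

```shell
# sum_col (hypothetical helper): add up the column whose number is given in $1.
sum_col() {
  awk -v col="$1" '{ n += $col } END { print n + 0 }'
}

printf 'x 10\ny 20\nz 12\n' | sum_col 2   # prints: 42
```

For the example above, ls -l /var/log/yum* | sum_col 5 adds up the size column in the same way.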

awk if else conditions

Check the used space. Print ok in front of a df line if its Use% is 60% or less, and Problem if it is more than 60%. As in the previous example, $5+0 forces a numeric comparison, and NR > 1 skips the header line.

localhost ~]$ df | awk 'NR > 1 { if ($5+0 > 60) print "Problem", $0
else
print "ok", $0
}'
ok /dev/mapper/fedora-root 51475068 10831316 38005928 23% /
ok devtmpfs 1956180 0 1956180 0% /dev
ok tmpfs 1966388 252 1966136 1% /dev/shm
ok tmpfs 1966388 992 1965396 1% /run
ok tmpfs 1966388 0 1966388 0% /sys/fs/cgroup
ok tmpfs 1966388 176 1966212 1% /tmp
ok /dev/sda9 487652 123767 334189 28% /boot
Problem /dev/mapper/fedora-home 58642620 51118476 4522188 92% /home
Problem /dev/sda2 98304 66006 32298 68% /boot/efi
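if and else can be chained with else if to sort lines into more than two groups. In this sketch the 90% "Critical" level is a hypothetical threshold added only for illustration, disk_status is a hypothetical helper name, and the df-like sample lines are made up.

```shell
# disk_status (hypothetical helper): label each df data line by its Use%.
# Thresholds are illustrative: above 90 -> Critical, above 60 -> Problem.
disk_status() {
  awk 'NR > 1 {
    used = $5 + 0                        # "92%" -> 92
    if (used > 90) { status = "Critical" }
    else if (used > 60) { status = "Problem" }
    else { status = "ok" }
    print status, $6
  }'
}

printf 'Filesystem 1K-blocks Used Available Use%% Mounted on\n/dev/a 10 9 1 95%% /data\n/dev/b 10 7 3 68%% /home\n/dev/c 10 1 9 7%% /boot\n' | disk_status
```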

awk for loop

Print the numbers 1 to 5 with a for loop by providing an initial value, a final condition and an increment expression.

localhost ~]$ awk 'BEGIN { for (i = 1; i <= 5; ++i) print i }'
1
2
3
4
5
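for loops are often used to walk over the fields of a line. This sketch counts down from NF (the number of fields on the line) and prints the columns in reverse order. reverse_fields is a hypothetical name.

```shell
# reverse_fields (hypothetical helper): print the fields of each line in
# reverse order, separated by OFS and ended with ORS.
reverse_fields() {
  awk '{ for (i = NF; i > 0; i--) printf "%s%s", $i, (i > 1 ? OFS : ORS) }'
}

printf 'a b c\n' | reverse_fields   # prints: c b a
```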

awk Arrays, creating and sorting

Create an array by assigning values to array indices:

A["ZZ"] = "Last"
A["DD"] = "Middle"
A["AA"] = "First"

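Sorting arrays (gawk only)

asort - Array Sort by value. The values are sorted and the array is re-indexed with 1, 2, 3 ...; the original indices are lost.
asort(A)
A[1] = "First"
A[2] = "Last"
A[3] = "Middle"

asorti - Array Sort by Indices. Starting again from the original array, the indices are sorted and become the values, again indexed 1, 2, 3 ...
asorti(A)
A[1] = "AA"
A[2] = "DD"
A[3] = "ZZ"

Both functions return the number of elements. To keep A unchanged, pass a second array to receive the sorted result, for example asort(A, B).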
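asort() and asorti() are gawk extensions, and the order in which for (key in array) visits elements is unspecified. A portable alternative is to build the array in awk and leave the ordering to the sort command. The sketch below does this with a hypothetical helper, count_words.

```shell
# count_words (hypothetical helper): count how often each word appears, then
# sort the "word count" lines; LC_ALL=C gives a locale-independent order.
count_words() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' | LC_ALL=C sort
}

printf 'b a b\nc a\n' | count_words
```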

awk regular expressions

gsub

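gsub(regexp, replacement [, target]) performs a global substitution. Every match of regexp in target (by default $0) is replaced with replacement, and gsub() returns the number of substitutions made.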
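Here is a minimal gsub() sketch. swap_colons is a hypothetical name. It replaces every : in the line with a space, then prints the count that gsub() returns followed by the modified line.

```shell
# swap_colons (hypothetical helper): replace every ":" with a space and
# report how many replacements gsub() made on each line.
swap_colons() {
  awk '{ n = gsub(/:/, " "); print n, $0 }'
}

printf 'a:b:c\n' | swap_colons   # prints: 2 a b c
```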

gensub()

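gensub() is a general substitution function, available in gawk only. It has more features than the standard sub() and gsub() functions, such as the ability to refer to parts of the regexp match in the replacement text. Its syntax is gensub(regexp, replacement, how [, target]). Set how to "g" to replace every match, or to a number n to replace only the nth match. The result is returned as a new string, and target (by default $0) is not changed. The example below replaces the first % on each line:

localhost ~]$ df | awk '{ print gensub(/%/, " Percent", 1) }'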
Filesystem              1K-blocks     Used Available Use Percent Mounted on
/dev/mapper/fedora-root  51475068 10831316  38005928  23 Percent /
devtmpfs                  1956180        0   1956180   0 Percent /dev
/dev/sda9                  487652   123767    334189  28 Percent /boot
/dev/mapper/fedora-home  58642620 51118476   4522188  92 Percent /home
/dev/sda2                   98304    66006     32298  68 Percent /boot/efi

index(in, find)

Return the position (counting from 1) of the first occurrence of the substring find in the string in, or 0 if it is not found.

localhost ~]$ awk 'BEGIN { print index("SomeLongString", "tr") }'
10
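index() is often paired with substr() to split a line at the first occurrence of a character. The sketch below assumes key=value style input and uses a hypothetical helper name, split_at_first_eq.

```shell
# split_at_first_eq (hypothetical helper): split each line at its first "="
# and print the two parts joined by "|"; lines without "=" are skipped.
split_at_first_eq() {
  awk '{
    p = index($0, "=")                 # position of the first "=", or 0
    if (p > 0)
      print substr($0, 1, p - 1) "|" substr($0, p + 1)
  }'
}

printf 'a=b=c\n' | split_at_first_eq   # prints: a|b=c
```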

length([string])

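Return the number of characters in string. With no argument, length() returns the length of the current line ($0). The example below prints the length of each line of testfile, including the spaces between the columns: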

localhost ~]$ awk ' { print length($0) }' testfile
31
29
29
29
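length() also works well as a pattern. This sketch prints only the lines longer than a given number of characters. A pattern with no action prints the whole line. long_lines is a hypothetical name.

```shell
# long_lines (hypothetical helper): print lines longer than $1 characters.
long_lines() {
  awk -v max="$1" 'length($0) > max'
}

printf 'abc\nabcdef\n' | long_lines 4   # prints: abcdef
```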

match(string, regexp [, array])

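Print every line that contains a lowercase letter. match() returns the position where regexp first matches in string, or 0 if there is no match, and it sets the variables RSTART and RLENGTH. The optional array argument is a gawk extension.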

localhost ~]$ awk ' match($0, /[a-z]/) { print $0 }' testfile
column1 column2 column3 column4
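RSTART and RLENGTH let you extract the matched text itself, not just test for it. The sketch below uses a hypothetical helper, first_number, to print the first run of digits on each line that has one.

```shell
# first_number (hypothetical helper): print the first run of digits per line.
first_number() {
  awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
}

printf 'abc 123 def 45\nnone\n' | first_number   # prints: 123
```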

split(string, array [, fieldsep [, seps ] ])

Split a list of rpm names at dashes.

Contents of the file rpms:

libhbalinux-1.0.16-2.fc20.x86_64
gucharmap-3.10.1-1.fc20.x86_64
libplist-1.11-2.fc20.x86_64
libgcc-4.8.3-7.fc20.i686
glx-utils-8.1.0-4.fc20.x86_64
vlgothic-fonts-20140801-1.fc20.noarch

Split each line at the dashes, keep the pieces in the array ary and print selected elements. The separators found are kept in an array called seps; this fourth argument is a gawk extension.

localhost ~]$ cat rpms | awk '{split($0, ary, "-", seps) ; print ary[1],ary[2],ary[3]}'
libhbalinux 1.0.16 2.fc20.x86_64
gucharmap 3.10.1 1.fc20.x86_64
libplist 1.11 2.fc20.x86_64
libgcc 4.8.3 7.fc20.i686
glx utils 8.1.0
vlgothic fonts 20140801

Print elements of both arrays, ary and the separator array seps:

localhost ~]$ cat rpms | awk '{split($0, ary, "-", seps) ; print ary[1],ary[2],ary[3],seps[1],seps[2]}'
libhbalinux 1.0.16 2.fc20.x86_64 - -
gucharmap 3.10.1 1.fc20.x86_64 - -
libplist 1.11 2.fc20.x86_64 - -
libgcc 4.8.3 7.fc20.i686 - -
glx utils 8.1.0 - -
vlgothic fonts 20140801 - -
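The piece count returned by split() can be used to rebuild package names that themselves contain dashes, such as glx-utils. This sketch assumes the usual RPM name-version-release layout, where the last two dash-separated pieces are the version and the release. rpm_name is a hypothetical helper, and it uses only the portable three-argument form of split().

```shell
# rpm_name (hypothetical helper): print the package name, assuming the last
# two dash-separated pieces are the version and the release.
rpm_name() {
  awk '{
    n = split($0, part, "-")      # number of pieces
    name = part[1]
    for (i = 2; i <= n - 2; i++)  # rejoin every piece except the last two
      name = name "-" part[i]
    print name
  }'
}

printf 'glx-utils-8.1.0-4.fc20.x86_64\nlibhbalinux-1.0.16-2.fc20.x86_64\n' | rpm_name
```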

sub(regexp, replacement [, target])

Substitute the first match of a pattern with a string. The example below replaces a dash followed by a digit with -->. sub() returns the number of substitutions made (0 or 1), so it also works as a pattern, and only the lines where a replacement happened are printed.

localhost ~]$ cat rpms | awk 'sub(/-[0-9]/, " --> ")'
libhbalinux --> .0.16-2.fc20.x86_64
gucharmap --> .10.1-1.fc20.x86_64
libplist --> .11-2.fc20.x86_64
libgcc --> .8.3-7.fc20.i686
glx-utils --> .1.0-4.fc20.x86_64
vlgothic-fonts --> 0140801-1.fc20.noarch
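The sketch below shows sub() and gsub() side by side, along with the special meaning of & in the replacement text (it stands for the matched text). sub_vs_gsub is a hypothetical helper. It works on copies of the line, so $0 is left unchanged.

```shell
# sub_vs_gsub (hypothetical helper): bracket the first "-digit" match with
# sub() and every such match with gsub(); & inserts the matched text.
sub_vs_gsub() {
  awk '{
    s = $0; g = $0
    sub(/-[0-9]/, "[&]", s)    # first match only
    gsub(/-[0-9]/, "[&]", g)   # every match
    print s
    print g
  }'
}

printf 'libplist-1.11-2.fc20.x86_64\n' | sub_vs_gsub
```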

substr(string, start [, length ])

Get a substring of a given length, starting at a given position.
Let's use this file, which has two fields:
localhost ~]$ cat nums
123456789 abcdef
Starting at the 3rd character of the first field, print two characters:

localhost ~]$ awk '{print substr($1,3,2) }' nums
34
Starting at the 3rd character of the second field, print two characters:
localhost ~]$ awk '{print substr($2,3,2) }' nums
cd
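If the length argument is omitted, substr() returns everything from the start position to the end of the string. Combined with length(), this gives the last n characters of a line. last_chars is a hypothetical helper name used for this sketch.

```shell
# last_chars (hypothetical helper): print the last $1 characters of each line.
last_chars() {
  awk -v n="$1" '{ print substr($0, length($0) - n + 1) }'
}

printf '123456789\n' | last_chars 3   # prints: 789
```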

tolower(string)

Return a copy of string with every uppercase letter converted to lowercase. Other characters are unchanged.

For example, tolower("MiXeD cAsE 123") returns "mixed case 123". The example below converts an entire file to lowercase:

localhost ~]$ cat letters
This is Just Some Random Text Here ..

localhost ~]$ awk '{ print tolower($0) }' letters
this is just some random text here ..

toupper(string)

Return a copy of string with every lowercase letter converted to uppercase:

localhost ~]$ awk '{ print toupper($0) }' letters
THIS IS JUST SOME RANDOM TEXT HERE ..

The conversion can also be applied to selected fields only. For example, to print only the first field in uppercase:

localhost ~]$ awk '{ print toupper($1) }' letters
THIS
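toupper() and substr() together can capitalize just the first letter of each line. Here is a minimal sketch, with capitalize as a hypothetical helper name.

```shell
# capitalize (hypothetical helper): uppercase only the first character of each line.
capitalize() {
  awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }'
}

printf 'this is just some random text here ..\n' | capitalize
```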

awk Built in Operational Variables

The following built-in variables can be read or set as the awk program requires.

IGNORECASE (gawk only)

If IGNORECASE is nonzero or non-null, then all string comparisons and all regular expression matching are case-independent.
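IGNORECASE only works in gawk. A portable way to match without regard to case is to lowercase both sides first. In this sketch index() is used instead of a regular expression, so the search text is treated literally. grep_ci is a hypothetical name.

```shell
# grep_ci (hypothetical helper): print lines containing $1, ignoring case.
grep_ci() {
  awk -v pat="$1" 'index(tolower($0), tolower(pat)) > 0'
}

printf 'An ERROR here\nfine\n' | grep_ci error   # prints: An ERROR here
```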

OFS

The Output Field Separator. It is printed between the fields output by a print statement. Its default value is " ", a string consisting of a single space.

localhost ~]$ awk '{ OFS="|" ; print $1,$2,$3,$4 }' testfile
column1|column2|column3|column4
1111|2222|3333|4444
1111|2222|3333|4444
1111|2222|3333|4444

The input field separator, FS, can be set with the -F option. The following example sets the field separator to : and prints the first field:

localhost ~]$ awk -F: '{ print $1 }' /etc/passwd
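FS controls how input lines are split and OFS controls how output fields are joined, so the two are often set together. This sketch reads /etc/passwd-style lines split at : and prints the user name and the last field, the login shell, separated by a tab. $NF is the last field of the line. user_shell is a hypothetical helper name.

```shell
# user_shell (hypothetical helper): split input at ":" and print the first
# and last fields separated by a tab.
user_shell() {
  awk -F: -v OFS='\t' '{ print $1, $NF }'
}

printf 'root:x:0:0:root:/root:/bin/bash\n' | user_shell
```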

ORS - Output Record Separator

The output record separator is written after every record (line) output by print. Its default value is "\n", the newline character. Let's use the rpms file from earlier to print the lines separated by the string ||:

localhost ~]$ awk '{ ORS="||" ; print $0}' rpms
libhbalinux-1.0.16-2.fc20.x86_64||gucharmap-3.10.1-1.fc20.x86_64||libplist-1.11-2.fc20.x86_64||libgcc-4.8.3-7.fc20.i686||glx-utils-8.1.0-4.fc20.x86_64||vlgothic-fonts-20140801-1.fc20.noarch||
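With ORS="||" the output ends with a trailing || and no final newline, as shown above. One alternative writes the separator only between lines with printf and ends with a single newline. The sketch below does this with a hypothetical helper, join_lines.

```shell
# join_lines (hypothetical helper): join all input lines with the separator in $1,
# putting it only between lines, then finish with one newline.
join_lines() {
  awk -v sep="$1" '{ printf "%s%s", (NR > 1 ? sep : ""), $0 }
                   END { if (NR > 0) print "" }'
}

printf 'a\nb\nc\n' | join_lines ,   # prints: a,b,c
```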

NF - Number of Fields in the current record, split on whitespace or on the FS value

Example: print each line of /etc/passwd followed by its number of :-separated fields

localhost ~]$ awk -F: '{ print $0,NF}' /etc/passwd
root:x:0:0:root:/root:/bin/bash 7
bin:x:1:1:bin:/bin:/sbin/nologin 7
daemon:x:2:2:daemon:/sbin:/sbin/nologin 7
adm:x:3:4:adm:/var/adm:/sbin/nologin 7
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin 7
sync:x:5:0:sync:/sbin:/bin/sync 7
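NF is handy for validation. An /etc/passwd line should have exactly 7 :-separated fields, and $NF is always the last field. check_fields is a hypothetical helper for this sketch. It reports lines with the wrong count and prints the user and shell for the rest.

```shell
# check_fields (hypothetical helper): flag lines that do not have 7 fields,
# otherwise print the first and last fields.
check_fields() {
  awk -F: 'NF != 7 { print "line " NR ": " NF " fields"; next }
           { print $1, $NF }'
}

printf 'root:x:0:0:root:/root:/bin/bash\nbroken:line\n' | check_fields
```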

RS

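The input record separator. Its default is a newline, but it can be changed to another value to suit the input file. Setting RS to the empty string puts awk in paragraph mode, where records are separated by one or more blank lines.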
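Here is a short sketch of paragraph mode. With RS set to the empty string, each block of lines separated by a blank line is one record. Newline also acts as a field separator, so $1 is the first word of each block. para_first is a hypothetical name.

```shell
# para_first (hypothetical helper): print the record number and first word of
# each blank-line-separated paragraph.
para_first() {
  awk 'BEGIN { RS = "" } { print NR ": " $1 }'
}

printf 'a b\nc\n\nd e\n' | para_first
```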

ARGC, ARGV

The command-line arguments available to awk programs are stored in an array called ARGV.

ARGC is the number of command-line arguments. ARGV holds the arguments themselves, indexed from 0 to ARGC - 1. The awk options and program text are not included, and ARGV[0] normally holds the name of the awk interpreter.

AWKPATH

gawk gets its search path for program files named with -f from the AWKPATH environment variable. If that variable does not exist, or if it has an empty value, gawk uses a default path, '.:/usr/local/share/awk'.
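The sketch below lists the command-line arguments from ARGV. show_args is a hypothetical helper name. The program has only a BEGIN block, so awk exits without trying to read the arguments as input files. The loop starts at 1 because the contents of ARGV[0] vary between awk implementations.

```shell
# show_args (hypothetical helper): print each argument with its ARGV index.
show_args() {
  awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' "$@"
}

show_args x y
```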