awk with examples
'awk' is one of the most versatile utilities on Unix. In fact, it is a full domain-specific language in its own right, with powerful text-processing capabilities. The examples below show how it can be used for text processing.
We will use the sample file below for all of the examples.
neeraj@ubuntu:~/WorkShop$ cat Country.txt
COUNTRY,CAPITAL,AREA IN SQ KM
India,Delhi,3287263
Japan,Tokoyo,377975
USA,Washington DC,9833520
UK,London,242495
Australia,Canberra,7692024
China,Beijing,9596961
Ukraine,Kyiv,603628
Russia,Moscow,17098246
Brazil,Brasília,8515767
South Africa,Cape Town,1221037
France,Paris,643801
Taiwan,Taipei,36197
Canada,Ottawa,9984670
1. Printing the entire content (all columns) of a comma (,) separated file. Within awk, "$0" represents the whole current line.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print $0}' Country.txt
COUNTRY,CAPITAL,AREA IN SQ KM
India,Delhi,3287263
Japan,Tokoyo,377975
USA,Washington DC,9833520
UK,London,242495
Australia,Canberra,7692024
China,Beijing,9596961
Ukraine,Kyiv,603628
Russia,Moscow,17098246
Brazil,Brasília,8515767
South Africa,Cape Town,1221037
France,Paris,643801
Taiwan,Taipei,36197
Canada,Ottawa,9984670
2. Printing a specific column of a comma (,) separated file. Within awk, "$n" represents column n; here we print the 2nd column.
Please note that any character can be the input separator (even a space); it is given in double quotes after "-F".
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print $2}' Country.txt
CAPITAL
Delhi
Tokoyo
Washington DC
London
Canberra
Beijing
Kyiv
Moscow
Brasília
Cape Town
Paris
Taipei
Ottawa
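To illustrate the note about separators, here is a minimal sketch with two other delimiters. The colon-separated and space-separated lines are made-up sample data (they are not in Country.txt) and are fed in with printf so the snippet runs on its own.

```shell
# Hypothetical colon-separated input: -F":" splits each line on ":".
colon_out=$(printf 'India:Delhi\nUK:London\n' | awk -F":" '{print $2}')
echo "$colon_out"
# Delhi
# London

# Hypothetical space-separated input: -F" " splits each line on whitespace.
space_out=$(printf 'India Delhi\nUK London\n' | awk -F" " '{print $2}')
echo "$space_out"
# Delhi
# London
```

Only the value after "-F" changes; the '{print $2}' part stays the same.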
3. You can use the built-in "tolower" function to print the entire content or a specific column in lower case.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print tolower($2)}' Country.txt
capital
delhi
tokoyo
washington dc
london
canberra
beijing
kyiv
moscow
brasília
cape town
paris
taipei
ottawa
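4. Similarly, the "toupper" function prints the entire content or a specific column in upper case. Note that the "í" in "Brasília" stays in lower case in this output; whether non-ASCII characters are converted depends on the awk implementation and the locale.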
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print toupper($2)}' Country.txt
CAPITAL
DELHI
TOKOYO
WASHINGTON DC
LONDON
CANBERRA
BEIJING
KYIV
MOSCOW
BRASíLIA
CAPE TOWN
PARIS
TAIPEI
OTTAWA
5. Here is how you can use "$0" and "toupper" together to print the entire content in upper case.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print toupper($0)}' Country.txt
COUNTRY,CAPITAL,AREA IN SQ KM
INDIA,DELHI,3287263
JAPAN,TOKOYO,377975
USA,WASHINGTON DC,9833520
UK,LONDON,242495
AUSTRALIA,CANBERRA,7692024
CHINA,BEIJING,9596961
UKRAINE,KYIV,603628
RUSSIA,MOSCOW,17098246
BRAZIL,BRASíLIA,8515767
SOUTH AFRICA,CAPE TOWN,1221037
FRANCE,PARIS,643801
TAIWAN,TAIPEI,36197
CANADA,OTTAWA,9984670
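One possible use of these functions is comparing a column without caring about case. The sketch below is only an illustration: it feeds two rows of the sample data in with printf and compares the lower-cased 2nd column with the string "delhi".

```shell
# tolower($2) is compared with a lower-case string, so "Delhi", "DELHI"
# and "delhi" in the input would all match.
ci_match=$(printf 'India,Delhi,3287263\nUK,London,242495\n' |
  awk -F"," 'tolower($2) == "delhi" {print $0}')
echo "$ci_match"
# India,Delhi,3287263
```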
6. You can use the built-in variable "NR" to print line numbers (row numbers).
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print NR,$0}' Country.txt
1 COUNTRY,CAPITAL,AREA IN SQ KM
2 India,Delhi,3287263
3 Japan,Tokoyo,377975
4 USA,Washington DC,9833520
5 UK,London,242495
6 Australia,Canberra,7692024
7 China,Beijing,9596961
8 Ukraine,Kyiv,603628
9 Russia,Moscow,17098246
10 Brazil,Brasília,8515767
11 South Africa,Cape Town,1221037
12 France,Paris,643801
13 Taiwan,Taipei,36197
14 Canada,Ottawa,9984670
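"NR" can also be used as a condition, for example to print only a range of lines. A small sketch with made-up one-letter lines as input:

```shell
# Print only lines 2 to 4, each prefixed with its line number.
range_out=$(printf 'a\nb\nc\nd\ne\n' | awk 'NR>=2 && NR<=4 {print NR, $0}')
echo "$range_out"
# 2 b
# 3 c
# 4 d
```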
7. You can use the built-in variable "NF" to print the number of fields (columns) present in each line, based on the field separator.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print NF}' Country.txt
3
3
3
3
3
3
3
3
3
3
3
3
3
3
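Because "NF" is recomputed for every line, it can be used to spot lines that do not have the expected number of columns. In the sketch below the second input line is deliberately broken (it is made-up data with the area missing):

```shell
# Report every line whose field count is not 3.
bad_lines=$(printf 'India,Delhi,3287263\nUK,London\nFrance,Paris,643801\n' |
  awk -F"," 'NF != 3 {print NR ": expected 3 fields, found " NF}')
echo "$bad_lines"
# 2: expected 3 fields, found 2
```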
8. "NF" can also be used to refer to a column. Here we print the last column with "$NF": since every line in this file has three fields, NF is 3 and "$NF" is the 3rd column.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print $NF}' Country.txt
AREA IN SQ KM
3287263
377975
9833520
242495
7692024
9596961
603628
17098246
8515767
1221037
643801
36197
9984670
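The same idea works with arithmetic on "NF". As a short sketch, "$(NF-1)" refers to the second-to-last column, shown here on two rows of the sample data:

```shell
# NF is 3 for these lines, so $(NF-1) is the 2nd column.
second_last=$(printf 'India,Delhi,3287263\nUK,London,242495\n' |
  awk -F"," '{print $(NF-1)}')
echo "$second_last"
# Delhi
# London
```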
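9. To print only the lines that satisfy a condition, put the condition before the action. This example prints the lines whose 3rd column is greater than 3287263. The header line is printed too, because its 3rd column is not a number, so awk compares it as a string and "A" sorts after "3".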
neeraj@ubuntu:~/WorkShop$ awk -F"," '$3>3287263{print $0}' Country.txt
COUNTRY,CAPITAL,AREA IN SQ KM
USA,Washington DC,9833520
Australia,Canberra,7692024
China,Beijing,9596961
Russia,Moscow,17098246
Brazil,Brasília,8515767
Canada,Ottawa,9984670
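The header line shows up in the output above. If that is not wanted, one way to leave it out is to add "NR>1" to the condition. This is a sketch on four rows of the sample data:

```shell
# NR>1 skips the first line (the header); the rest of the condition is unchanged.
big=$(printf 'COUNTRY,CAPITAL,AREA IN SQ KM\nIndia,Delhi,3287263\nUSA,Washington DC,9833520\nUK,London,242495\n' |
  awk -F"," 'NR>1 && $3>3287263 {print $0}')
echo "$big"
# USA,Washington DC,9833520
```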
10. This example prints rows based on an equality condition. Note that the "==" operator is used.
neeraj@ubuntu:~/WorkShop$ awk -F"," '$3==3287263{print $0}' Country.txt
India,Delhi,3287263
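The "==" operator also works with text, as long as the text is written in double quotes inside the awk program. A minimal sketch on two rows of the sample data:

```shell
# Compare the 2nd column with the string "London".
london=$(printf 'India,Delhi,3287263\nUK,London,242495\n' |
  awk -F"," '$2 == "London" {print $0}')
echo "$london"
# UK,London,242495
```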
11. Again, "$NF" holds the value of the last field (the 3rd column here), so it can be used in the condition.
neeraj@ubuntu:~/WorkShop$ awk -F"," '$NF>3287263{print $0}' Country.txt
COUNTRY,CAPITAL,AREA IN SQ KM
USA,Washington DC,9833520
Australia,Canberra,7692024
China,Beijing,9596961
Russia,Moscow,17098246
Brazil,Brasília,8515767
Canada,Ottawa,9984670
12. One more example with "$NF", this time using the equality operator to compare the value of the last field.
neeraj@ubuntu:~/WorkShop$ awk -F"," '$NF==3287263{print $0}' Country.txt
India,Delhi,3287263
13. By default, awk uses whitespace (spaces and tabs) as its input field separator. In this example we did not specify a separator, so each line is split on spaces; that is why only "South" is printed from the line "South Africa,Cape Town,1221037".
neeraj@ubuntu:~/WorkShop$ awk '{print $1}' Country.txt
COUNTRY,CAPITAL,AREA
India,Delhi,3287263
Japan,Tokoyo,377975
USA,Washington
UK,London,242495
Australia,Canberra,7692024
China,Beijing,9596961
Ukraine,Kyiv,603628
Russia,Moscow,17098246
Brazil,Brasília,8515767
South
France,Paris,643801
Taiwan,Taipei,36197
Canada,Ottawa,9984670
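With the default separator, several spaces or tabs in a row count as one separator, and spaces at the start of a line are ignored. The sketch below uses made-up words to show this:

```shell
# Line 1 has three spaces and then a tab between words; line 2 starts with two spaces.
ws_out=$(printf 'alpha   beta\tgamma\n  delta epsilon\n' | awk '{print $2}')
echo "$ws_out"
# beta
# epsilon
```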
14. Using "sum+=$N" you can calculate the sum of a specific column. The example below also uses the "BEGIN" and "END" blocks and the "FS" variable. The "BEGIN" block runs before any input is read and sets "FS", the field separator, just like "-F" does. The sum is then accumulated with "sum+=$3" for every line, and the "END" block prints the total after the last line.
neeraj@ubuntu:~/WorkShop$ awk 'BEGIN{FS=","}; {sum+=$3} {print $0} END{print sum;}' Country.txt
COUNTRY,CAPITAL,AREA IN SQ KM
India,Delhi,3287263
Japan,Tokoyo,377975
USA,Washington DC,9833520
UK,London,242495
Australia,Canberra,7692024
China,Beijing,9596961
Ukraine,Kyiv,603628
Russia,Moscow,17098246
Brazil,Brasília,8515767
South Africa,Cape Town,1221037
France,Paris,643801
Taiwan,Taipei,36197
Canada,Ottawa,9984670
69133584
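If only the total is needed, the "{print $0}" block can be dropped. The sketch below is one possible variant on three rows of the sample data: it skips the header with "NR>1" and also counts the rows. The variable names "sum" and "n" are arbitrary.

```shell
# Add up the 3rd column of the data rows and count them; print once at the end.
summary=$(printf 'COUNTRY,CAPITAL,AREA IN SQ KM\nIndia,Delhi,3287263\nUK,London,242495\nFrance,Paris,643801\n' |
  awk -F"," 'NR>1 {sum+=$3; n++} END {print "rows=" n, "sum=" sum}')
echo "$summary"
# rows=3 sum=4173559
```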
15. You can print a substring with the built-in function "substr". You provide the string (here a column), the start position within it, and the length.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print $1,substr($1,1,2)}' Country.txt
COUNTRY CO
India In
Japan Ja
USA US
UK UK
Australia Au
China Ch
Ukraine Uk
Russia Ru
Brazil Br
South Africa So
France Fr
Taiwan Ta
Canada Ca
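Positions in "substr" start at 1, and the length can be left out, in which case everything up to the end of the string is returned. A short sketch on two country names:

```shell
# substr($1,4) returns from the 4th character to the end;
# substr($1,1,3) returns the first three characters.
parts=$(printf 'Australia\nCanada\n' | awk '{print substr($1,4), substr($1,1,3)}')
echo "$parts"
# tralia Aus
# ada Can
```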
16. Functions can be combined too; here the result of "substr" is used as the input of "toupper".
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print $1,toupper(substr($1,1,2))}' Country.txt
COUNTRY CO
India IN
Japan JA
USA US
UK UK
Australia AU
China CH
Ukraine UK
Russia RU
Brazil BR
South Africa SO
France FR
Taiwan TA
Canada CA
17. You can pass an external or shell variable into awk with the "-v" option. This comes in handy when you need to compare a column with a value defined at runtime. In this example we print the row whose second column is "Delhi".
neeraj@ubuntu:~/WorkShop$ MyVar=Delhi
neeraj@ubuntu:~/WorkShop$ awk -F"," -v x="$MyVar" '$2 == x {print $0}' Country.txt
India,Delhi,3287263
neeraj@ubuntu:~/WorkShop$ MyVar=Paris
neeraj@ubuntu:~/WorkShop$ awk -F"," -v x="$MyVar" '$2 == x {print $0}' Country.txt
France,Paris,643801
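When the shell variable contains a space, it has to be written in double quotes after "-v"; otherwise the shell splits the value into two words and awk does not receive the program you intended. A minimal sketch, using "Cape Town" from the sample data:

```shell
MyVar="Cape Town"

# "$MyVar" is quoted, so awk receives x=Cape Town as a single argument.
cape=$(printf 'South Africa,Cape Town,1221037\nFrance,Paris,643801\n' |
  awk -F"," -v x="$MyVar" '$2 == x {print $0}')
echo "$cape"
# South Africa,Cape Town,1221037
```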
18. You can use awk like grep. In this example we search for the string "London".
neeraj@ubuntu:~/WorkShop$ awk '/London/' Country.txt
UK,London,242495
19. You can also use regular expressions when searching lines. In this example "." (dot) matches any single character, and we search for a six-character string that has "," on both sides.
neeraj@ubuntu:~/WorkShop$ awk '/,......,/' Country.txt
Japan,Tokoyo,377975
UK,London,242495
Russia,Moscow,17098246
Taiwan,Taipei,36197
Canada,Ottawa,9984670
20. Here is another example of using a regular expression in awk like grep. "^" anchors the match at the start of the line, so this prints the lines starting with "U".
neeraj@ubuntu:~/WorkShop$ awk '/^U/' Country.txt
USA,Washington DC,9833520
UK,London,242495
Ukraine,Kyiv,603628
21. The following example searches for "U" followed by any single character (".") and then a comma (,).
neeraj@ubuntu:~/WorkShop$ awk '/U.,/' Country.txt
UK,London,242495
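The patterns above are matched against the whole line. To match against one column only, awk has the "~" operator. The sketch below is an illustration on four rows of the sample data: it prints the countries whose capital (2nd column) starts with "C".

```shell
# /^C/ alone would match the lines starting with China and Canada;
# $2 ~ /^C/ tests only the 2nd column.
c_capitals=$(printf 'Australia,Canberra,7692024\nChina,Beijing,9596961\nCanada,Ottawa,9984670\nSouth Africa,Cape Town,1221037\n' |
  awk -F"," '$2 ~ /^C/ {print $1}')
echo "$c_capitals"
# Australia
# South Africa
```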
22. "awk" prints the fields separated by a space if you put a comma between them in "print".
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print $1,$2,$3}' Country.txt
COUNTRY CAPITAL AREA IN SQ KM
India Delhi 3287263
Japan Tokoyo 377975
USA Washington DC 9833520
UK London 242495
Australia Canberra 7692024
China Beijing 9596961
Ukraine Kyiv 603628
Russia Moscow 17098246
Brazil Brasília 8515767
South Africa Cape Town 1221037
France Paris 643801
Taiwan Taipei 36197
Canada Ottawa 9984670
23. "awk" prints the fields without any separator if you do not put a comma between them.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print $1 $2 $3}' Country.txt
COUNTRYCAPITALAREA IN SQ KM
IndiaDelhi3287263
JapanTokoyo377975
USAWashington DC9833520
UKLondon242495
AustraliaCanberra7692024
ChinaBeijing9596961
UkraineKyiv603628
RussiaMoscow17098246
BrazilBrasília8515767
South AfricaCape Town1221037
FranceParis643801
TaiwanTaipei36197
CanadaOttawa9984670
24. "awk" has a built-in variable "OFS" for the output field separator. It can be set in the "BEGIN" block; in this example the output is separated by , (comma).
neeraj@ubuntu:~/WorkShop$ echo "A B C" | awk 'BEGIN{OFS=","} {print $1,$2,$3}'
A,B,C
25. "OFS" can also be set with the "-v" option, as in the example below.
neeraj@ubuntu:~/WorkShop$ echo "A B C" | awk -v OFS="," '{print $1,$2,$3}'
A,B,C
26. If you wish to use "OFS" along with "$0", you need to make awk rebuild the line by assigning to a field, for example "$1=$1". Here is the example.
neeraj@ubuntu:~/WorkShop$ echo "A B C" | awk -v OFS="," '{$1=$1;print $0}'
A,B,C
27. "OFS" has no effect here because no field is assigned, so "$0" is printed unchanged.
neeraj@ubuntu:~/WorkShop$ echo "A B C" | awk 'BEGIN{OFS=","} {print $0}'
A B C
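28. Let's apply "OFS" to separate the output fields of our test file "Country.txt" with the "|" (pipe) symbol. "OFS" is set in the "BEGIN" block so that it is already in effect when "$1=$1" rebuilds the first line.
neeraj@ubuntu:~/WorkShop$ awk 'BEGIN{FS=","; OFS="|"} {$1=$1} {print $0}' Country.txt
COUNTRY|CAPITAL|AREA IN SQ KM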
India|Delhi|3287263
Japan|Tokoyo|377975
USA|Washington DC|9833520
UK|London|242495
Australia|Canberra|7692024
China|Beijing|9596961
Ukraine|Kyiv|603628
Russia|Moscow|17098246
Brazil|Brasília|8515767
South Africa|Cape Town|1221037
France|Paris|643801
Taiwan|Taipei|36197
Canada|Ottawa|9984670
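A more compact way to write this kind of conversion is to give both separators on the command line with "-F" and "-v". They are then set before the first line is read, so the first line is rebuilt with the new separator as well. A sketch on two rows of the sample data:

```shell
# -F"," sets the input separator, -v OFS="|" sets the output separator,
# and $1=$1 makes awk rebuild each line using OFS.
piped=$(printf 'India,Delhi,3287263\nUK,London,242495\n' |
  awk -F"," -v OFS="|" '{$1=$1; print}')
echo "$piped"
# India|Delhi|3287263
# UK|London|242495
```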
29. "awk" also has a built-in variable called "FILENAME", which holds the name of the current input file.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print FILENAME,$1}' Country.txt
Country.txt COUNTRY
Country.txt India
Country.txt Japan
Country.txt USA
Country.txt UK
Country.txt Australia
Country.txt China
Country.txt Ukraine
Country.txt Russia
Country.txt Brazil
Country.txt South Africa
Country.txt France
Country.txt Taiwan
Country.txt Canada
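"FILENAME" is most useful when awk reads more than one file, because it changes as awk moves from one file to the next. The sketch below creates two small files in a temporary directory; the file names asia.txt and europe.txt are made up for the illustration.

```shell
# Create two throwaway input files.
tmpdir=$(mktemp -d)
printf 'India\nJapan\n' > "$tmpdir/asia.txt"
printf 'France\n' > "$tmpdir/europe.txt"

# Each output line is prefixed with the file it came from.
names=$(cd "$tmpdir" && awk '{print FILENAME, $1}' asia.txt europe.txt)
echo "$names"
# asia.txt India
# asia.txt Japan
# europe.txt France

rm -rf "$tmpdir"
```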
30. You can also print your own strings and separators between fields, as in the example below.
neeraj@ubuntu:~/WorkShop$ awk -F"," '{print "Unix is great os,"$1","$2","$3}' Country.txt
Unix is great os,COUNTRY,CAPITAL,AREA IN SQ KM
Unix is great os,India,Delhi,3287263
Unix is great os,Japan,Tokoyo,377975
Unix is great os,USA,Washington DC,9833520
Unix is great os,UK,London,242495
Unix is great os,Australia,Canberra,7692024
Unix is great os,China,Beijing,9596961
Unix is great os,Ukraine,Kyiv,603628
Unix is great os,Russia,Moscow,17098246
Unix is great os,Brazil,Brasília,8515767
Unix is great os,South Africa,Cape Town,1221037
Unix is great os,France,Paris,643801
Unix is great os,Taiwan,Taipei,36197
Unix is great os,Canada,Ottawa,9984670
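The same technique can place text between the fields as well. A small sketch on two rows of the sample data, with wording chosen only for the illustration:

```shell
# Quoted strings and fields written next to each other are joined together.
sentences=$(printf 'India,Delhi,3287263\nUK,London,242495\n' |
  awk -F"," '{print "Capital of " $1 " is " $2}')
echo "$sentences"
# Capital of India is Delhi
# Capital of UK is London
```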
If you want to learn basic Unix commands in 1 hour, here is the link:
Basic Unix Commands in 1 Hour
If you want to learn Unix/Linux commands in detail, here is the link:
Learn Unix/Linux Commands in detail
Also, keep visiting my blog to learn more:
unixtechworld.blogspot.com