A Guide to playing with gawk

::--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--::
::              .ooO A Guide to playing with gawk by Wyzewun Ooo.           ::
::--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--::
::                                                                          ::
:: I was shocked at the number of people who don't know how to use (g)awk   ::
:: properly, so I decided to write up a guide to getting started with gawk  ::
:: for text formatting or whatever. Oh, I generally refer to gawk, but if   ::
:: you have an ancient *nix you may have a different version, but typing    ::
:: awk will probably get you whichever one you have anyway. Here's a little ::
:: chart of the evolution of the awk utility...                             ::
::                                                                          ::
::             awk ------> nawk ------> POSIXawk ------> gawk               ::
::                                                                          ::
:: Right, so let's try some simple stuff with awk first. Probably the most  ::
:: commonly known thing that one can do with awk is format columns. For     ::
:: example, a command like host -l gov.za produces output that looks like   ::
:: this...                                                                  ::
::                                                                          ::
:: <stuff cut out>                                                          ::
:: gp.gov.za has address 196.254.66.6                                       ::
:: <stuff cut out>                                                          ::
::                                                                          ::
:: Now, we want to format the output of our host command and save the IP    ::
:: addresses to a file called lame. We would type something to the effect   ::
:: of host -l gov.za | gawk '{print $4}' > lame                             ::
::                                                                          ::
:: We are telling awk to print only the fourth field (column), thus the $4, ::
:: so we end up with a list of the IPs of all the .gov.za hosts. ;)         ::
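Depending on the version of host, the zone listing can also contain lines that are not address records (for example name server lines), and {print $4} grabs whatever sits in field 4 of those too. A slightly safer variant, sketched below, only prints $4 when field 3 is the word "address". The two input lines are made up to stand in for real host output, sample_host is a hypothetical helper so the example runs without a network, and plain awk is used since any POSIX awk, gawk included, handles it.

```shell
# Hypothetical stand-in for "host -l gov.za" output; the lines are made up.
sample_host() {
  printf '%s\n' \
    'gp.gov.za has address 196.254.66.6' \
    'gov.za name server ns1.gov.za'
}

# Plain {print $4} also prints "ns1.gov.za" from the second line.
sample_host | awk '{print $4}'

# Only print field 4 on "<name> has address <ip>" lines.
sample_host | awk '$3 == "address" {print $4}'
```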
::                                                                          ::
:: Obviously, the above is used by script kiddies a helluva lot, so they    ::
:: can use their l33t0 mscan across a third of the internet, in the hope    ::
:: that they'll find some lame .edu host that they can root and feel elite. ::
:: *Sigh* So let's look at some more useful stuff, shall we? It won't help  ::
:: you pointlessly compromise machines, but it may help you become a        ::
:: proficient Unix user (imagine that).                                     ::
::                                                                          ::
:: Okey Dokey, awk can count the number of fields on each line as well. We  ::
:: could've done this with the previous example by typing something like    ::
:: host -l gov.za | gawk '{print NF ": " $0}'                               ::
::                                                                          ::
:: We are telling awk to print the number of fields (print NF), followed by ::
:: a colon and a space (": "), right at the beginning of each line of text  ::
:: ($0), so we get an output that will look like...                         ::
::                                                                          ::
:: 4: gp.gov.za has address 196.254.66.6                                    ::
::                                                                          ::
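:: You can use *awk for counting lines as well, instead of wc -l, by using  ::
:: NR (the number of records read so far) instead of NF. Printing NR on     ::
:: every line numbers the lines; printing it once in an END block (more on  ::
:: those later) gives the total line count.                                 ::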
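To make the NF/NR difference concrete, here is a small sketch with three made-up input lines (sample_lines is a hypothetical helper). Printing NR on every line numbers the lines, much like cat -n, while printing NR once in an END block gives the total, like wc -l.

```shell
# Hypothetical helper producing three made-up lines of input.
sample_lines() {
  printf '%s\n' 'one' 'two words' 'three more words'
}

# NF is the field count of the current line; NR is the line (record) number.
sample_lines | awk '{print NR ": " NF " fields: " $0}'

# In an END block, NR still holds the number of the last record read,
# which is the total line count.
sample_lines | awk 'END {print NR}'
```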
::                                                                          ::
:: I also find gawk useful for finding strings in files, when grep can't    ::
:: quite cut it. I could do something like gawk '/wyze1/' /etc/passwd and   ::
:: I would get an output like this...                                       ::
::                                                                          ::
:: wyze1:x:2005:12:wyze1:/home/wyze1:/bin/tcsh                              ::
:: drew:x:2006:13:wyze1:/home/drew:/bin/tcsh                                ::
::                                                                          ::
:: So, I hear you saying "So What? I can do that with grep!" Sure. You can. ::
:: But say you were only looking for the username wyze1, and not that drew  ::
:: account, which has wyze1 as its real name rather than its username. A    ::
:: plain grep for wyze1 can't tell the two apart. So, we use awk and do     ::
:: something like gawk -F: '$1 ~ /wyze1/' /etc/passwd (-F: splits each      ::
:: line on colons, and $1 is the username field) and I will only get the    ::
:: wyze1 account. Easy, huh? =)                                             ::
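One catch: ~ is a regular expression match, so $1 ~ /wyze1/ is true for any username that merely contains wyze1. In the sketch below, the wyze10 account is a made-up extra line added to show the difference, and sample_passwd is a hypothetical helper standing in for /etc/passwd. Comparing with == (or anchoring the regex as /^wyze1$/) matches the username exactly.

```shell
# Hypothetical passwd-style lines; wyze10 is a made-up extra account.
sample_passwd() {
  printf '%s\n' \
    'wyze1:x:2005:12:wyze1:/home/wyze1:/bin/tcsh' \
    'drew:x:2006:13:wyze1:/home/drew:/bin/tcsh' \
    'wyze10:x:2007:12:test:/home/wyze10:/bin/tcsh'
}

# Regex match on field 1: matches wyze1 and wyze10.
sample_passwd | awk -F: '$1 ~ /wyze1/ {print $1}'

# Exact string comparison on field 1: matches only wyze1.
sample_passwd | awk -F: '$1 == "wyze1" {print $1}'

# Anchored regex: also only wyze1.
sample_passwd | awk -F: '$1 ~ /^wyze1$/ {print $1}'
```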
::                                                                          ::
:: Say I have given myself 500 pointless accounts on my box, and have       ::
:: specified "Wyzewun" as the Real Name for some & "Wyze1" for others. Now, ::
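:: to make things more difficult, the Real Names of some other accounts     ::
:: which I DON'T want have been set to "NotSoWyze1" and "AnythingButWyze1", ::
:: so grep will find all sorts of accounts I don't want. So, I do something ::
:: like gawk -F: '$5 ~ /^Wyze/' /etc/passwd and I only find the accounts    ::
:: that I want, because the ^ anchors the match to the start of field 5     ::
:: (the Real Name): it must begin with "Wyze", and whatever follows         ::
:: doesn't matter. Careful with /Wyze*/ here: the * only applies to the     ::
:: "e" (zero or more of them) and the match can start anywhere in the       ::
:: field, so it would happily match "NotSoWyze1" as well.                   ::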
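Here is a small sketch of why the anchor matters, using four made-up accounts whose Real Names (field 5) are the ones from the example; sample_names and the a1 to a4 usernames are hypothetical. /Wyze*/ means "Wyz followed by zero or more e's, anywhere in the field", while /^Wyze/ requires the field to start with Wyze.

```shell
# Hypothetical accounts; field 5 holds the Real Names from the example.
sample_names() {
  printf '%s\n' \
    'a1:x:3001:12:Wyzewun:/home/a1:/bin/sh' \
    'a2:x:3002:12:Wyze1:/home/a2:/bin/sh' \
    'a3:x:3003:12:NotSoWyze1:/home/a3:/bin/sh' \
    'a4:x:3004:12:AnythingButWyze1:/home/a4:/bin/sh'
}

# Unanchored: "Wyz" plus zero or more "e" anywhere in field 5, so all four match.
sample_names | awk -F: '$5 ~ /Wyze*/ {print $5}'

# Anchored with ^: field 5 must start with "Wyze", so only the first two match.
sample_names | awk -F: '$5 ~ /^Wyze/ {print $5}'
```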
::                                                                          ::
:: Now, you can also write *awk programs using BEGIN and END blocks, at     ::
:: which point it starts to look a lot like a proper programming language.  ::
:: A BEGIN block runs once before any input is read, so it's the place to   ::
:: initialize variables, and an END block runs once after all the input     ::
:: has been read, so it's the place for things that depend on the whole     ::
:: input, like totals. Let's make an example program to find all users on   ::
:: our machine with the username or real name "drew"...                     ::
::                                                                          ::
:: BEGIN {                                                                  ::
::  FS = ":" # /etc/passwd separates stuff with colons, remember?           ::
::  OFS = "\t" # separate output fields with a tab                          ::
::  print "Username", "Real Name"                                           ::
::  }                                                                       ::
:: /drew/       {print $1, $5}                                              ::
::                                                                          ::
:: We then save this file as fk_is_lame.awk and then invoke it by typing    ::
:: gawk -f fk_is_lame.awk /etc/passwd and get an output like...             ::
::                                                                          ::
:: Username     Real Name                                                   ::
:: wizdumb      drew                                                        ::
:: drew         wyze1                                                       ::
::                                                                          ::
:: Easy enough. :) If we wanted to do something in an END block, like count ::
:: the matches, we could rewrite the program like this...                   ::
::                                                                          ::
:: BEGIN {                                                                  ::
::  FS = ":" # /etc/passwd separates stuff with colons, remember?           ::
::  OFS = "\t" # set the output field separator to a tab                    ::
::  counts = 0 # with no matches, print 0 instead of a blank                ::
::  print "Username", "Real Name"                                           ::
::  }                                                                       ::
:: /drew/       {print $1, $5 ; counts++}                                   ::
:: END {print counts " accounts found."}                                    ::
::                                                                          ::
:: So our output will then look something like...                           ::
::                                                                          ::
:: Username     Real Name                                                   ::
:: wizdumb      drew                                                        ::
:: drew         wyze1                                                       ::
:: 2 accounts found.                                                        ::
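If the order in which the blocks run is unclear, this tiny sketch (two made-up input lines) makes it visible: BEGIN fires before the first line is read, the rule with no pattern fires once per line, and END fires after the last line, when NR holds the total.

```shell
# BEGIN runs once before any input, the main rule once per input line,
# and END once after the last line has been read.
printf '%s\n' 'first' 'second' |
  awk 'BEGIN {print "start"}
       {print NR ": " $0}
       END {print "end, " NR " lines"}'
```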
::                                                                          ::
:: You can also do comparisons in awk, with the same operators you use in   ::
:: C, C++, Java, whatever (==, <, >, <=, >=, !=), plus ~ and !~. The only   ::
:: unfamiliar ones there should be ~ and !~, which mean "matched by" and    ::
:: "not matched by" a regular expression, respectively. And if the other    ::
:: operators aren't familiar, I highly recommend that you start learning    ::
:: to code. Not only is it an extremely rewarding experience, but it is     ::
:: damn useful, whether you're involved in the computer underground or      ::
:: not.                                                                     ::
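A quick sketch of a few of those operators on passwd-style data; the three lines and the sample_shells helper are made up for illustration. !~ keeps the lines whose field 7 (the login shell) is not matched by the regex, the numeric comparison on field 3 (the UID) works just as it would in C, and != compares two fields as strings.

```shell
# Hypothetical passwd-style lines (made up for illustration).
sample_shells() {
  printf '%s\n' \
    'wyze1:x:2005:12:wyze1:/home/wyze1:/bin/tcsh' \
    'drew:x:2006:13:wyze1:/home/drew:/bin/tcsh' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'
}

# !~ : field 7 (login shell) is NOT matched by /nologin/.
sample_shells | awk -F: '$7 !~ /nologin/ {print $1}'

# Numeric comparison: UID (field 3) of at least 2000.
sample_shells | awk -F: '$3 >= 2000 {print $1}'

# != : real name (field 5) differs from the username (field 1).
sample_shells | awk -F: '$5 != $1 {print $1}'
```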
::                                                                          ::
:: Another really powerful feature of awk is Range Patterns. Say I have     ::
:: access to an employee record sheet called list, laid out as              ::
:: Name:Employee ID:Salary, that looks like...                              ::
::                                                                          ::
:: Drew:666000:14000                                                        ::
:: Koos:231876:100                                                          ::
:: John:967123:18000                                                        ::
:: Marc:000666:16000                                                        ::
::                                                                          ::
:: I want to view all employees with a salary between 13000 and 17000 per   ::
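:: month. A range pattern like '$3 == 13000, $3 == 17000' won't do it:      ::
:: a range pattern pat1, pat2 matches a run of consecutive lines, from a    ::
:: line where pat1 is true through the next line where pat2 is true. No     ::
:: salary here is exactly 13000, so it prints nothing at all. What I want   ::
:: is two comparisons joined with && (logical AND), so I type...            ::
::                                                                          ::
:: gawk -F: '$3 >= 13000 && $3 <= 17000 {print $1, $3}' list               ::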
::                                                                          ::
:: And my result is...                                                      ::
::                                                                          ::
:: Drew 14000                                                               ::
:: Marc 16000                                                               ::
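For the record, here is what a real range pattern does on the same data, next to a plain compound condition for the salary band (sample_list just prints the four lines from the example; the helper name is hypothetical). The range /^Koos/, /^John/ starts at the first line matching /^Koos/ and stops after the next line matching /^John/, whatever the salaries are, while an equality-based range on salaries never starts if no salary equals the start value.

```shell
# Hypothetical helper: the employee list from the example, Name:Employee ID:Salary.
sample_list() {
  printf '%s\n' \
    'Drew:666000:14000' \
    'Koos:231876:100' \
    'John:967123:18000' \
    'Marc:000666:16000'
}

# Range pattern: from the Koos line through the next John line, inclusive.
sample_list | awk -F: '/^Koos/, /^John/ {print $1}'

# The salary band: two comparisons joined with && on every line.
sample_list | awk -F: '$3 >= 13000 && $3 <= 17000 {print $1, $3}'

# Equality-based range: no salary is exactly 13000, so nothing is printed.
sample_list | awk -F: '$3 == 13000, $3 == 17000 {print $1, $3}'
```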
::                                                                          ::
:: I could also do something simpler, like printing everyone with a salary  ::
:: of less than R1000: gawk -F: '$3 < 1000' list would only print Koos's    ::
:: details.                                                                 ::
::                                                                          ::
:: We could also use an if statement (with -F: and the same list), like     ::
:: so...                                                                    ::
::                                                                          ::
:: { if ($3 < 1000)                                                         ::
::     print $1 " is such a loser"                                          ::
::   else                                                                   ::
::     print $1 " is such a pimp" }                                         ::
::                                                                          ::
:: Drew is such a pimp                                                      ::
:: Koos is such a loser                                                     ::
:: John is such a pimp                                                      ::
:: Marc is such a pimp                                                      ::
::                                                                          ::
:: You can also use the shorthand ?: conditional expression from C/C++ and  ::
:: Java instead of a full if/else, which I personally prefer.               ::
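The if/else example above, written both ways as one-liners so they can be run directly; sample_list is a hypothetical helper that prints the four employee lines. The ?: form is an expression, so it can sit right inside the print, wrapped in parentheses so awk groups the comparison and the concatenation correctly.

```shell
# Hypothetical helper: the employee list from the example.
sample_list() {
  printf '%s\n' \
    'Drew:666000:14000' \
    'Koos:231876:100' \
    'John:967123:18000' \
    'Marc:000666:16000'
}

# if/else statement form.
sample_list | awk -F: '{ if ($3 < 1000) print $1 " is such a loser"; else print $1 " is such a pimp" }'

# ?: conditional expression form; prints the same four lines.
sample_list | awk -F: '{ print $1 " is such a " ($3 < 1000 ? "loser" : "pimp") }'
```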
::                                                                          ::
:: Errr... I really don't have time to finish this article and there's a    ::
:: whole bunch of stuff that I haven't covered. Hrmm. I'll make a sequel    ::
:: some time, okay? ;)                                                      ::
::                                                                          ::
::                               --=====--                                  ::
::                      <WGM> Don't code Java man!!!                        ::
::                       <WGM> Total MS-run Crap!!                          ::
::                <WGM> Code Delphi instead, less MS-based                  ::
::                               --=====--                                  ::
::                                                                          ::
::--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--::
