The trick to understanding awk in all its terse glory is to understand its defaults. I made a screencast explaining how awk works by deconstructing a script I’d previously written for this blog [1]. In this post we’ll deconstruct awk’s defaults so we can understand all those one-liner scripts that Stack Overflow answers throw your way.

The example

I have a file that contains the version info for my apps and I’d like to extract the first version number in there:

// appVersion.gradle

def baseCode = 30001

def appVersion = [
    product-1 : [
        name: "21.091.420",
        code: baseCode
    ],

    product-2: [
        name: "20.090.300",
        code: baseCode
    ],
//...

// I want to pluck 21.091.420 from this file

The first solution (meh)

Some quick googling revealed this Stack Overflow solution, which gets us close:

 gawk -F'"' '$0=$2' appVersion.gradle

# -- output --
# 21.091.420
# 20.090.300

I only need the first number though, so a quick way [2] to do this would be:

gawk -F'"' '$0=$2' appVersion.gradle | head -n 1

# -- output --
# 21.091.420

The problem with solution 1

  1. awk is powerful, and reaching for head to cover that last teeny tiny stretch seemed sacrilegious. I want this solution to be pure awk.
  2. What the heck does that incantation gawk '$0=$2' do? [3]

The basics

Let’s try to take that script apart piece by piece:

default input field delimiter

gawk  -F'"' '$0=$2' appVersion.gradle
#     ↑
#    input field delimiter

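If you don’t specify the input field delimiter, awk sensibly defaults to splitting on whitespace: runs of spaces and tabs count as a single delimiter, and leading or trailing whitespace is ignored. Passing -F" " explicitly asks for that same default. Let’s try some examples: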

echo "Hello kind world"  | gawk       '{print $2}'
echo "Hello kind world"  | gawk -F" " '{print $2}'

# -- output --
# kind

echo "Hello kind world"  | gawk -F"," '{print $2}'
# -- output --
# (an empty line: there is no comma to split on, so $2 is empty)

Notice how the line is split into numbered fields, where $1, $2 and $3 hold the three words in our example. $0 represents the entire line.
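To make the whitespace default a bit more concrete, here is a small sketch. The sample line and variable names are made up for illustration, and it uses plain awk (gawk should behave the same here). It compares the default separator with a literal single space, written as the regex [ ]:

```shell
# Made-up input: two leading spaces, a run of three spaces, and a tab.
line=$(printf '  Hello   kind\tworld')

# Default separator: runs of spaces/tabs act as one delimiter, leading blanks are ignored.
default_split=$(printf '%s\n' "$line" | awk '{ print NF ": " $2 }')

# Literal single space: every space delimits, so empty fields show up
# and the tab is no longer a separator.
single_space=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF ": " $3 }')

echo "$default_split"   # 3: kind
echo "$single_space"    # 6: Hello
```

With the literal separator the two leading spaces produce two empty fields, which is why "Hello" moves to $3.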

default syntax

If you watched my screencast you’ll remember that awk’s general syntax is as follows:

awk '
    BEGIN { a1; a2; a3; }      ← optional
    <pattern> { a1; a2; a3; }  ← pattern and/or action (at least one)
    END { a4; a6; }            ← optional
' <filename>

Most awk one-liners don’t use the BEGIN and END blocks.
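For completeness, here is a rough sketch of all three parts working together. The input and the variable names count and last are invented for this example, and plain awk is used (gawk should give the same result):

```shell
# BEGIN runs once before any input, the middle block runs for every line,
# END runs once after the last line.
summary=$(printf 'Hello\nkind\nworld\n' | awk '
    BEGIN { count = 0 }
    { count++; last = $0 }
    END { print count " lines, last: " last }
')

echo "$summary"   # 3 lines, last: world
```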

So looking back at my simple one-liner:

echo "Hello kind world"  \
    | gawk ' { print $2 }'
#            ↓
#            action block ✅

🛑 ✋ but wait, what’s going on with the original one-liner 👇?

gawk -F'"' '$0=$2' appVersion.gradle
#           ↑
#           🤔
#         is this a <pattern>?
#         is this an action block?

For this, we need to understand how the awk pattern recognition works:

# general syntax
gawk '<pattern> { a1; a2; a3; }'

echo "Hello kind world" \
  | gawk '0 { print $2 }'
#         ↑
#         forcing result of <pattern> match as 0

# -- output --
# no output

echo "Hello kind world" \
    | gawk '1 { print $2 }'
echo "Hello kind world" \
    | gawk '2 { print $2 }'
echo "Hello kind world" \
    | gawk '3 { print $2 }'
#           ↑
#           forcing result of <pattern> match as 3 / non-0

# -- output for all the above --
# kind

So the way <pattern> matching works: if the pattern evaluates to 0 [4] (or to an empty string), the condition is “false” and awk skips the action block. Any other value, meaning a non-zero number or a non-empty string, is “true” and awk executes the action block. Ok, back to the one-liner:
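Here is a small sketch for poking at those truthiness rules yourself. The check function is a hypothetical helper written just for this example (it is not part of awk), and it uses plain awk, which should match gawk for these cases:

```shell
# check PATTERN: prints "true" if awk ran the action for the pattern, else "false".
check() {
    echo "Hello kind world" \
        | awk "$1"' { hit = 1 } END { print (hit ? "true" : "false") }'
}

negative=$(check '(-1)')     # non-zero number, even a negative one
fraction=$(check '0.5')      # non-zero number
text=$(check '"text"')       # non-empty string
empty=$(check '""')          # empty string
zero=$(check '0')            # the number 0

echo "$negative $fraction $text $empty $zero"   # true true true false false
```

The negative number is wrapped in parentheses only so the shell argument does not start with a dash and get mistaken for an option.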

gawk -F'"' '$0=$2' appVersion.gradle
#           ↑
#         is this a valid pattern?
#               ✅ we're getting some non-0 value
#                  cause things are being printed
#         is this an action block? 🤔

So $0=$2 is coming back “true” for at least some lines, and some invisible default is being executed. Progress… but still many questions.

default action

Let’s try some commands. Notice the output for each of them:

echo "Hello kind world" | gawk '0 {print $0}'
echo "Hello kind world" | gawk '0 {print}'
echo "Hello kind world" | gawk '0'
echo "Hello kind world" | gawk ''

# -- output --
# no output


echo "Hello kind world" | gawk '1 {print $0}'
echo "Hello kind world" | gawk '1 {print}'
echo "Hello kind world" | gawk '1'

# -- output --
# Hello kind world

So when the <pattern> is false (0) nothing is printed, and when it is true (1) the default is to print the entire line ($0). In fact, if you leave out the action block entirely, awk assumes you want { print $0 }. (The empty program '' has no pattern and no action at all, so there is nothing to run.)
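The default action is what makes pattern-only one-liners work. A quick sketch with made-up input, using plain awk (gawk should behave the same):

```shell
# NF is 0 on a blank line, so the pattern is false there
# and only non-blank lines get the default print.
non_blank=$(printf 'one\n\nthree\n' | awk 'NF')

# A regex pattern with no action prints the lines that match.
has_kind=$(printf 'Hello kind world\nGoodbye\n' | awk '/kind/')

echo "$non_blank"
# one
# three
echo "$has_kind"    # Hello kind world
```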

variable reassignment

You know how we glorify immutability in most programming? awk ain’t having any of that.

You can mutate the heck out of anything. You can mutate the current line before you even run an action on it. Check this piece of code out:

echo "Hello kind world" | gawk '{print $0" <-> "$1" <-> "$2" <-> "$3}'

# -- output --
# Hello kind world <-> Hello <-> kind <-> world
#        ↑              ↑          ↑        ↑
#        $0             $1         $2       $3

echo "Hello kind world" | gawk '1 {$0="hijack"; print $0" <-> "$1" <-> "$2" <-> "$3}'
# -- output --
# hijack <-> hijack <->  <->
#    ↑         ↑       ↑     ↑
#    $0        $1      🙅    🙅

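Reassigning $0 makes awk re-split the line, which is why $2 and $3 come back empty above. And since a pattern is just an expression, you can do the same reassignment in the pattern slot, so the entire line is replaced even before the action block is executed.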
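A minimal sketch of reassignment in the pattern position, with made-up replacement values and plain awk (gawk should match):

```shell
# Assignment as the pattern: $0 becomes the old $3. The assigned value
# is a non-empty string, so the default action prints the new $0.
third=$(echo "Hello kind world" | awk '$0 = $3')
echo "$third"      # world

# Reassigning $0 re-splits the fields, so NF and $2 describe the new line.
resplit=$(echo "Hello kind world" | awk '($0 = "a b") { print NF ": " $2 }')
echo "$resplit"    # 2: b
```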

number of fields

Here’s the last piece that should help bring this all together: NF, awk’s built-in variable holding the number of fields on the current line. Given this file again:

// appVersion.gradle

def baseCode = 30001

def appVersion = [
    product-1 : [
        name: "21.091.420",
        code: baseCode
    ],

    product-2: [
        name: "20.090.300",
        code: baseCode
    ],
//...

gawk '{print NF ": "$0}' appVersion.gradle

# -- output --
# 0:
# 4: def baseCode = 30001
# 0:
# 4: def appVersion = [
# 3:     product-1 : [
# 2:         name: "21.091.420",
# 2:         code: baseCode
# 1:     ],
# 0:
# 2:     product-2: [
# 2:         name: "20.090.300",
# 2:         code: baseCode
# 1:     ],

The first solution (again)

All right, let’s do this one last time.

 gawk -F'"' '$0=$2' appVersion.gradle

# -- output --
# 21.091.420
# 20.090.300

What’s happening here is a beautiful symphony of awk defaults stacking on top of each other.

We first reassign the variable holding the entire line ($0) to $2. Remember that $2 holds the second token after splitting the original content of $0 with the input field separator ". Lines without a double quote have no second field, so $2 is empty there. This should help point out the resulting fields with the new field separator:

 gawk -F'"' '{print NF ": "$0}' appVersion.gradle
# 0:
# 1: def baseCode = 30001
# 0:
# 1: def appVersion = [
# 1:     product-1 : [
# 3:         name: "21.091.420",
# 1:         code: baseCode
# 1:     ],
# 0:
# 1:     product-2: [
# 3:         name: "20.090.300",
# 1:         code: baseCode
# 1:     ],
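To see the same thing from the $2 side, here is a sketch on a trimmed, made-up stand-in for appVersion.gradle (three lines only, so it stays self-contained). It uses plain awk; gawk should print the same:

```shell
# Trimmed stand-in for the real file (an assumption made for this example).
sample='def baseCode = 30001
        name: "21.091.420",
        code: baseCode'

# What does $2 hold on each line when the separator is a double quote?
fields=$(printf '%s\n' "$sample" | awk -F'"' '{ print NR ": [" $2 "]" }')
echo "$fields"
# 1: []
# 2: [21.091.420]
# 3: []

# Only the line with a non-empty $2 passes the pattern and gets printed.
printed=$(printf '%s\n' "$sample" | awk -F'"' '$0 = $2')
echo "$printed"    # 21.091.420
```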

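The original command should make sense now. On lines with a quoted string, $0=$2 evaluates to that non-empty string, which is true, so the default action prints the freshly reassigned $0. Everywhere else $2 is empty, the pattern is false, and nothing is printed: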

 gawk -F'"' '$0=$2' appVersion.gradle

# -- output --
# 21.091.420
# 20.090.300

💥

This is really such a gorgeous piece of code. Clever and poetic.

The final solution

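For my own solution, I made it a little less clever, a little more verbose and hopefully simpler to understand: only act on lines that split into exactly three fields, print the second one, and exit right away so only the first match is shown: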

gawk -F'"' 'NF==3 {print $2; exit}' appVersion.gradle

# -- output --
# 21.091.420
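One more reason I like the explicit form: the clever version leans on the truthiness of the value itself. The sketch below uses a made-up input whose first quoted value is "0" (not something from my real file) to show the difference. It uses plain awk; gawk should behave the same:

```shell
# Made-up input: the first quoted value is 0, the second a normal version string.
sample='        name: "0",
        name: "20.090.300",'

# Clever version: a field that looks like the number 0 counts as false,
# so that line is silently skipped.
clever=$(printf '%s\n' "$sample" | awk -F'"' '$0 = $2')

# Explicit version: decides on the field count, prints, then stops reading.
explicit=$(printf '%s\n' "$sample" | awk -F'"' 'NF == 3 { print $2; exit }')

echo "$clever"      # 20.090.300
echo "$explicit"    # 0
```

The NF==3 check does assume exactly one quoted string per line, which holds for the file in this post.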

Go forth and awk.


  1. If you want to understand the fundamentals of awk first, I recommend that screencast.

  2. a.k.a. the practical one, and the one I’d recommend to others on a time crunch.

  3. down the 🐰 hole we go

  4. In awk land 0 (or an empty string) is false; any non-zero number or non-empty string is true