Functional Programing in AWK

Speculative View on How FP Could Work in AWK

August 21, 2019

Note: This is an unedited draft.

AWK is a small utility language present on most unix style-systems today. The language it self is very niche, and has probably not aged very well when compared to other contemporary languages. In this post I would like to walk through what AWK might look like if it were given an update to support some functional programming features like lambdas and higher order functions. For the remainder of this article I will refer to this updated version as fp-awk.

Missing Features

AWK is a great tool for prototyping calculations of tabular data. It has the basic support for arrays and functions, however there are some things that trip up many folks when starting with AWK. First arrays can not be returned, they must be passed into a function as out-parameters. Second, functions are not first-class citizens, so they can not be passed around and called.

Let us look at an example of arrays as out-parameters. Below we have a simple AWK program that adds one point for every row that matches the team name. The points array is passed in every time we want to increment it. The points array lives as a global, but is assigned a local name inside of increment.

BEGIN {
  teamName=1
}

function increment(key, value, array) {
  array[key] = array[key] + value
}

$teamName ~ /teamAlice/   { increment("teamAlice", 1, points)}
$teamName ~ /teamBob/     { increment("teamBob", 1, points)}
$teamName ~ /teamCharlie/ { increment("teamCharlie", 1, points)}

Let’s say we want to create another array with a subset of the original, say an array only the highest scoring teams. We could define a function as such:

function topTeams(threshold, inputArray, outputArray) {
  for (key in inputArray) {
    if (inputArray[key] >= threshold) {
      outputArray[key] = inputArray[key]
    }
  }
}

END {
      topTeams(90, points, ninetyPointClub)
      // ...do something with ninetyPointClub...
    }

This function is simple, we want to copy only the subset of teams that are above a threshold so we can do something with it.

How much nicer would it be if we could instead return the function…have a look at the following hypothetical code:

function topTeams(threshold, inputArray) {
  local outputArray
  for (key in inputArray) {
    if (inputArray[key] >= threshold) {
      outputArray[key] = inputArray[key]
    }
  }

  return outputArray
}

END {
      ninetyPointClub = topTeams(90, points)
      // ...do something with ninetyPointClub...
    }

We can now in a sense start to see that in other languages with support for higher order functions and elementary functions on collections this migth look closer to:

function filter(predicate, array) {
  // ... some default implementation returning a new array of the subset ... 
}

function topTeams(threshold, value) {
  return value >= threshold
}

END {
      ninetyPointClub = filter(topTeams(90), points)
      // ...do something with ninetyPointClub...
    }

Now there’s quite a bit to unpack in this theoretical implementation. To make this happen in standard AWK we would need additional support for returning arrays, higher order functions, partial application, and currying.

Compatability

I feel that fp-awk could be compatible with existing AWK implementations simply by transcribing new features into an implementation that older AWK implementations understand.

Returning Arrays

For simplicity let’s look at how we migh implement the identity function that returns an array into an older version of AWK…

// In fp-awk:
newArray = identity(array)

// In standard AWK:
identity(array, newArray)

Any transpilation step would take the fp-awk assignment operation and append the target identifier as an out parameter to be processed in standard awk. While this example seems trivial, and the syntax provides very little, we gain much more usability when we compose array producing functions.

For example composing the identity function any number of times could generate the intermediate variables that would be required in standard AWK.

// In fp-awk:
newArray = identity(identity(identity(array)))

// In standard AWK:
identity(array, _tempArray1)
identity(_tempArray1, _tempArray2)
identity(_tempArray2, newArray)

Higher Order Functions

Let us take a look at how we might implement higher order functions.

In awk we are unable to read identifiers or push identifiers into a function, so what we end up having to do is function generation for any number of function cominations. For example:

// fp-awk
function map(fn, array) { 
  for (key in array) {
    newArray[key] = fn(array[key])
  }
  return newArray
}

result = map(identity, array)

// In standard AWK:
function map_identity(array, newArray) { 
  for (key in array) {
    newArray[key] = identity(array[key])
  }
}

map_identity(array, result)

Map accepts a single function as it’s first paramter, it then applies the function to each of the elements in the provided array. The only way to have an AWK compatible way to do this is to generate functions that bind the higher order function and the function parameter.

In our instance map and identity result in map_identity. We do get reuse from using the same combinations, but for varying combinations, we get a new function for every combination.

What about functions that return functions? We can follow a similar pattern as above, but generally endup with a more complex code generation.

// fp-awk
function getFn(choice) {
  if (choice == 1) {
    return identity
  } else {
    // Nothing
    return
  }
}

result = map(getFn($1), array)

// Standrd AWK 
function map_getFn(choice, array, result) {
  if (choice == 1) {
    map_identity(array, result)
  } else {
    // Nothing
  }
}

map_getFn($1, array, result)

Conclusion

This has been an interesting thought experiment. If AWK had support for returning array, and some ideas from functional programming it would help with the developer experience. We’ve really only touched a small part of what could be. There’s still issues of inter-operability with natively available functions. Perhaps someday fp-awk can become a reality, but for now, it was fun to think about how it could look syntactically and how one might go about backward compatability through code-generation.