Two little scripts: addup and sumup

utcc.utoronto.ca/~cks, cks, 2026-04-05 01:20

(Once again it's been a while since the last little script.)

Every so often I find myself in a situation where I have a bunch of lines with multiple columns and I want to either add up all of the numbers in one column (for example, to get total transfer volume from Apache log files) or add up all of the numbers in one column grouped by the value of a second column. This leads to two scripts, which I call 'addup' and 'sumup'.

Addup is a simple awk script that adds up all the values from some column:

#!/bin/sh
# add up column N
awk '{sum += $('$1') } END {print sum}'

(Looking at this now, I should use printf and specify a format to avoid scientific notation. A more sophisticated version would do things like allow you to set the column separator character(s) rather than just using the awk default of whitespace, but so far I haven't needed anything more.)
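A minimal sketch of what those two improvements might look like, written as a shell function. The name addup2, the optional separator argument, and the choice of "%.0f" are assumptions made for illustration and are not part of the script above; "%.0f" suits integer data such as byte counts and would need changing for fractional values. The field number is passed with awk's -v option instead of being spliced into the program text.

```shell
#!/bin/sh
# addup2: hypothetical variant of addup; sums column N of standard input.
# usage: addup2 N [SEP]   (SEP is an optional field separator, passed to awk -F)
addup2() {
    col=$1
    sep=$2
    if [ -n "$sep" ]; then
        # Explicit separator requested.
        awk -F "$sep" -v col="$col" '{ sum += $col } END { printf "%.0f\n", sum }'
    else
        # Fall back to awk's default whitespace splitting.
        awk -v col="$col" '{ sum += $col } END { printf "%.0f\n", sum }'
    fi
}
```

With this, something like "addup2 2 , <data.csv" would total the second comma-separated column, and large totals print as plain digits rather than in exponent form.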

My version of sumup is more complicated than I've described, partly because it does one of two things: it either counts how many times each value of a particular field occurs, or it computes the sum of another field for each value of that field. This sounds abstract, so let me make it more concrete. Suppose that you have a file of lines that look like:

300 thing1
800 thing2
900 thing1
100 thing3
[...]

Sumup can either tell you how many times each value of the second field occurs, or sum up the first field for each value of the second field (giving you 1200 for thing1, 800 for thing2, and 100 for thing3 in this simple case).

The actual sumup that I currently use is a Python program, partly so that I can conveniently print output sorted by the breakdown field. However, my older awk-based version is:

#!/bin/sh
# group lines by field number $1 and sum up field number $2 within each group.
# if no $2 is provided, it just counts the lines in each group.
(
if [ -n "$2" ]; then 
        awk '{sums[$'$1'] += $'$2'} END {for (i in sums) print sums[i], i}'
else
        awk '{sums[$'$1'] += 1} END {for (i in sums) print sums[i], i}'
fi
) | sort -nr

My memory is that this version works fine, although it's been a while since I used it.
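To make the argument order concrete, here is a sketch of the same logic wrapped in a shell function and run on the sample lines from earlier. The name sumup_demo is made up for illustration, and passing the field numbers with awk's -v option is a stylistic substitution for the quote splicing used above. The grouping field comes first and the field to sum comes second.

```shell
#!/bin/sh
# sumup_demo: the awk-based sumup logic wrapped as a function (illustrative name).
# usage: sumup_demo KEYFIELD [VALUEFIELD]
sumup_demo() {
    if [ -n "$2" ]; then
        # Sum VALUEFIELD for each distinct value of KEYFIELD.
        awk -v k="$1" -v v="$2" '{ sums[$k] += $v } END { for (i in sums) print sums[i], i }'
    else
        # No VALUEFIELD: count lines for each distinct value of KEYFIELD.
        awk -v k="$1" '{ sums[$k] += 1 } END { for (i in sums) print sums[i], i }'
    fi | sort -nr
}

# Sum field 1 grouped by field 2 for the sample data.
sumup_demo 2 1 <<'EOF'
300 thing1
800 thing2
900 thing1
100 thing3
EOF
# prints:
# 1200 thing1
# 800 thing2
# 100 thing3
```

Calling it with only the key field (sumup_demo 2) gives the per-value counts instead, with thing1 at the top because it appears twice.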

If there are relatively widely available Unix utilities that will do these jobs, I'm not aware of them, although I wouldn't be surprised if they've emerged by now.

PS: Looking at the sort of things I do with these tools, I should also write an 'avgup', although that strays into the lands of statistical analysis where I may also want things like the median.
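A first cut at such an avgup might look like the following sketch. The name, the argument convention, the two-decimal output format, and the decision to report only the arithmetic mean are all assumptions; a median would need the values sorted first, which this does not attempt.

```shell
#!/bin/sh
# avgup: hypothetical sketch; prints the arithmetic mean of column N of standard input.
# usage: avgup N
avgup() {
    awk -v col="$1" '
        { sum += $col; n++ }
        END {
            # Print nothing for empty input rather than dividing by zero.
            if (n > 0) printf "%.2f\n", sum / n
        }'
}
```

Run against the first column of the four sample lines above, this would report 525.00.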