2013-06-09

GitHub provides code frequency plots that show the number of lines added and removed within a repository over time:

[Figure: Code frequency plot from GitHub]

Here’s a quick-and-dirty method to visualize this information from any local repository in more chronological detail, using the calendarHeat R function from makeR:

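# Assumes GNU awk (three-argument match, long options) and an r command
# that accepts --eval, such as littler.
git log --format=format:%cd --date=short --shortstat --no-merges master \
| paste - - - | sort --key 1 \
| awk --field-separator "\t" '
# On each new date, print the totals gathered for the previous one.
$1 != date { if (date != "") print date, ins, del; date = $1; ins = 0; del = 0; }
# Add the insertion and deletion counts found in the shortstat field.
{ match($2, /([0-9]+) ins/, m); ins += m[1];
match($2, /([0-9]+) del/, m); del += m[1]; }
# Print the totals for the last date.
END { if (date != "") print date, ins, del; }' \
| r --eval '
library("makeR")
attach(read.table(textConnection(readLines("stdin"))))
png("heatmap.png")
calendarHeat(V1, sapply(pmax(V2, V3), log))'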

and the result:

[Figure: Calendar heatmap of Git repository for Rack]
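To make the middle of the pipeline easier to follow, here is a small sketch that performs the same per-date summation on three made-up commits. The function name aggregate_by_date and the sample data are my own illustration, not part of the original command, and the sketch deliberately sticks to POSIX awk features, so it may be a useful starting point where GNU awk is not available.

```shell
# Hypothetical helper: sum insertions and deletions per date.
# Input: lines of "date<TAB>shortstat", already sorted by date.
# Uses only POSIX awk features, unlike the gawk-specific pipeline above.
aggregate_by_date() {
  awk -F '\t' '
    # Return the number that precedes " <word>" in the shortstat text, or 0.
    function count(s, word,    n) {
      if (match(s, "[0-9]+ " word)) {
        n = substr(s, RSTART, RLENGTH)
        sub(/ .*/, "", n)
        return n + 0
      }
      return 0
    }
    # On each new date, print the totals gathered for the previous one.
    $1 != date { if (date != "") print date, ins, del; date = $1; ins = 0; del = 0 }
    { ins += count($2, "insertion"); del += count($2, "deletion") }
    # Print the totals for the last date.
    END { if (date != "") print date, ins, del }'
}

# Three made-up commits on two days.
printf '%s\t%s\n' \
  '2013-06-08' ' 1 file changed, 4 insertions(+), 1 deletion(-)' \
  '2013-06-08' ' 2 files changed, 10 deletions(-)' \
  '2013-06-09' ' 3 files changed, 25 insertions(+), 7 deletions(-)' \
  | aggregate_by_date
```

For this sample the output is one line per day, "2013-06-08 4 11" and "2013-06-09 25 7": the date followed by the summed insertions and deletions, which is the three-column table that the R step reads as V1, V2 and V3.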

Instead of showing line insertion and deletion counts separately, I’ve chosen a simplified metric: the counts are summed per day, and the larger of the two daily totals is plotted. I’ve colored it on a log scale to accentuate small variations.
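As a rough illustration of that metric, the sketch below takes made-up daily totals (the pipeline sums insertions and deletions per date before R sees them) and prints the larger of the two counts along with its natural logarithm, which is what pmax and log compute in the R step. The helper name heat_values and the numbers are assumptions for the example only.

```shell
# Hypothetical helper: read "date insertions deletions" lines and print the
# date, the larger of the two counts, and its natural logarithm.
heat_values() {
  awk '{ m = ($2 > $3) ? $2 : $3; printf "%s %d %.2f\n", $1, m, log(m) }'
}

# Made-up daily totals: a quiet day, a typical day, and a large refactoring.
printf '%s\n' \
  '2013-06-07 3 1' \
  '2013-06-08 40 12' \
  '2013-06-09 150 2000' \
  | heat_values
```

With these inputs the maxima are 3, 40 and 2000 lines, and their logarithms are about 1.10, 3.69 and 7.60. On the raw scale the large refactoring would dominate the color range; on the log scale the quiet and typical days remain visibly distinct from each other.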