Server statistics with UNIX

Let's construct a pipeline to parse some line-based text data!

Network connections and TLS for this gemini server are handled by relayd(8) (see my first blog post), which by default logs some basic information for each request. It spits out something like the following into /var/log/daemon:

Nov 15 23:52:48 bvnf relayd[89123]: relay gemini, session 170 (1 active), 0, XXX.XXX.XXX.XXX -> 127.0.0.1:11965, done
Nov 16 00:22:18 bvnf relayd[21497]: relay gemini, session 186 (1 active), 0, YYY.YYY.YYY.YYY -> 127.0.0.1:11965, done
Nov 16 06:49:01 bvnf relayd[89123]: relay gemini, session 170 (1 active), 0, XXX.XXX.XXX.XXX -> 127.0.0.1:11965, done

(XXX.XXX.XXX.XXX is a cleverly obfuscated IP address).

=> first blog post: setting up vger(8) on OpenBSD

Let's have a look at how many unique visitors the server has had since I set it up.

Firstly, extract the bits we want from the log:

$ awk '/relay gemini/ { printf("%s %s %s %s\n", $1, $2, $3, $13) } ' < /var/log/daemon | tee tmp
Nov 15 23:52:48 XXX.XXX.XXX.XXX
Nov 16 00:22:18 YYY.YYY.YYY.YYY
Nov 16 06:49:01 XXX.XXX.XXX.XXX
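
For comparison, here is a rough Python sketch of the same extraction. The sample lines and the 192.0.2.x addresses are made up to match the shape of the log above, and `extract` is just a name I picked for illustration.

```python
# Sketch: the same field extraction as the awk one-liner, in Python.
# SAMPLE_LOG is made-up data in the shape of the relayd lines, not real output.
SAMPLE_LOG = """\
Nov 15 23:52:48 bvnf relayd[89123]: relay gemini, session 170 (1 active), 0, 192.0.2.1 -> 127.0.0.1:11965, done
Nov 16 00:22:18 bvnf relayd[21497]: relay gemini, session 186 (1 active), 0, 192.0.2.2 -> 127.0.0.1:11965, done
Nov 16 00:30:00 bvnf sshd[123]: some unrelated daemon message
"""

def extract(lines):
    """Yield 'month day time ip' for each relayd gemini line."""
    for line in lines:
        if "relay gemini" not in line:
            continue  # same filter as the /relay gemini/ pattern
        fields = line.split()  # whitespace splitting, like awk's default
        # awk fields are 1-indexed, so $1, $2, $3, $13 become 0, 1, 2, 12
        yield " ".join([fields[0], fields[1], fields[2], fields[12]])

extracted = list(extract(SAMPLE_LOG.splitlines()))
print("\n".join(extracted))
```

The only real trap is the off-by-one between awk's 1-indexed fields and Python's 0-indexed lists.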

Now filter out lines with duplicate IP addresses:

$ sort -uk 4 < tmp | tee tmp.2
Nov 15 23:52:48 XXX.XXX.XXX.XXX
Nov 16 00:22:18 YYY.YYY.YYY.YYY

So we have a list of the first time each IP address made a request to the server.
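
As far as I know, POSIX only says that sort -u keeps one line from each set with equal keys, not which one, so "first" depends on the sort implementation. If that matters, a small sketch like the following makes the first-seen rule explicit. The input lines are made up, and `first_seen` is a hypothetical helper name.

```python
def first_seen(lines):
    """Keep the first line seen for each IP address (the fourth field)."""
    seen = {}
    for line in lines:
        ip = line.split()[3]
        seen.setdefault(ip, line)  # later repeats of an IP are ignored
    return list(seen.values())  # dicts preserve insertion order

# Made-up sample in the same shape as tmp
sample = [
    "Nov 15 23:52:48 192.0.2.1",
    "Nov 16 00:22:18 192.0.2.2",
    "Nov 16 06:49:01 192.0.2.1",
]
unique = first_seen(sample)
print("\n".join(unique))
```

This assumes the input is already in chronological order, which it is when it comes straight from the log.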

Now, we need some way to visualise these data; there are many options but for this post, let's use Python.

So that Python can read the dates as the correct data type, we need to put them into a more approachable format. We could try Unix time, but let's try ISO 8601-ish.

=> https://armaanb.net/iso8601.html

There are several ways to do this. Let's be (mostly) portable, and go for sed. Our task involves changing something like "Nov 15" to "2021-11-15".

$ sed 's/Oct/10/;s/Nov/11/;s/Dec/12/;' < tmp.2
11 15 23:52:48 XXX.XXX.XXX.XXX
11 16 00:22:18 YYY.YYY.YYY.YYY

It's a bit messy: only October to December are handled, so this code isn't going to work after December this year. However, I'm aware of that problem and it's ok, since I'm only doing this for fun, in November.

Now add in the year and change some spaces for hyphens:

$ sed 's/Oct/10/;s/Nov/11/;s/Dec/12/;s/^\(..\) /2021-\1-/' < tmp.2
2021-11-15 23:52:48 XXX.XXX.XXX.XXX
2021-11-16 00:22:18 YYY.YYY.YYY.YYY

Nice! One last thing: days in the log whose day number is less than 10 have been written using a single digit, which isn't the ISO way.

Pad those:

$ sed 's/Oct/10/;s/Nov/11/;s/Dec/12/;s/^\(..\) /2021-\1-/;' < tmp.2 \
  | sed 's/-\([0-9]\) /-0\1 /'
2021-11-15 23:52:48 XXX.XXX.XXX.XXX
2021-11-16 00:22:18 YYY.YYY.YYY.YYY
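
The same conversion can be sketched in Python, which makes the zero-padding step a bit more obvious. This is only an illustration under the same assumptions as the sed version: a hardcoded year and only three months. `MONTHS`, `YEAR` and `to_iso` are names I made up, and the input lines are invented.

```python
MONTHS = {"Oct": "10", "Nov": "11", "Dec": "12"}  # only the months the sed script handles
YEAR = "2021"  # hardcoded, as in the sed version

def to_iso(line):
    """Turn 'Nov 5 08:00:00 ip' into '2021-11-05 08:00:00 ip'."""
    month, day, rest = line.split(" ", 2)
    # int(day):02d pads single-digit days with a leading zero
    return f"{YEAR}-{MONTHS[month]}-{int(day):02d} {rest}"

iso_lines = [
    to_iso(line)
    for line in ["Nov 5 08:00:00 192.0.2.1", "Nov 15 23:52:48 192.0.2.2"]
]
print("\n".join(iso_lines))
```

A month outside the table raises a KeyError, which is arguably nicer than sed silently passing "Jan" through.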

Good! Note that we could move some of these calls around to make the whole pipeline a bit quicker (the IP address is now the third field, since awk joins the date with hyphens):

$ awk '/relay gemini/ {printf("2021-%s-%s %s %s\n", $1, $2, $3, $13) }' < /var/log/daemon \
  | sort -uk 3 \
  | sed 's/Nov/11/;s/Oct/10/;s/Dec/12/;s/-\([0-9]\) /-0\1 /' \
  | sort > tmp.3

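The ISO format is particularly useful here: the fields run from most to least significant and are zero-padded, so a plain lexical sort puts the lines in chronological order.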
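
Here is a small sketch of why that works, and why the padding step matters. The timestamps are made up for the example.

```python
from datetime import datetime

# Made-up timestamps, deliberately out of order
stamps = ["2021-11-16 00:22:18", "2021-10-31 12:00:00", "2021-11-05 08:00:00"]

# Sorting as plain text...
by_text = sorted(stamps)
# ...gives the same order as sorting by the parsed date
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"))
print(by_text == by_time)

# Without zero-padding, text order breaks: the 15th sorts before the 5th
unpadded = sorted(["2021-11-5", "2021-11-15"])
print(unpadded)
```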

Now, to make a graph, we don't care about which specific IP addresses are connecting when, so get rid of them:

$ cut -d' ' -f 1-2 < tmp.3 | tee tmp.4
2021-11-15 23:52:48
2021-11-16 00:22:18

Now Python makes it easy:

import numpy as np
import matplotlib.pyplot as plt

# Read one "YYYY-MM-DD HH:MM:SS" timestamp per line
with open("tmp.4", "r") as f:
    strtimes = f.read().splitlines()

# numpy parses the ISO-ish strings directly
times = sorted(np.datetime64(s) for s in strtimes)

plt.hist(times, bins=23)
plt.savefig("../media/004-unique-visitors.pdf")
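
For a quick look without a plotting library, the timestamps can also be tallied per day and printed as a text bar chart. This is a minimal sketch with made-up timestamps. `per_day` is a hypothetical helper, and it bins by calendar day rather than into 23 equal bins like the histogram.

```python
from collections import Counter

def per_day(stamps):
    """Count timestamps per calendar day (the part before the space)."""
    return Counter(s.split()[0] for s in stamps)

# Made-up sample in the same shape as tmp.4
counts = per_day([
    "2021-11-15 23:52:48",
    "2021-11-16 00:22:18",
    "2021-11-16 09:10:00",
])

# One row per day, one '#' per new visitor
for day, n in sorted(counts.items()):
    print(day, "#" * n)
```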

=> the histogram

It's not beautiful, but I can see what I wanted to. There was a wave of new people around the 31st October, just after I added a link to the server from my website, and then a bigger peak two days ago when I added a link to the blog on ew0k's antenna. Between these two events there's some noise, probably from random crawlers and me on various devices.

=> commit adding the gemini link to https://bvnf.space/ | antenna

--

written 2021-11-19

=> blog home | home
