2019-03-01 Podcast Numbers

I was wondering about the download statistics for my tiny little podcast. How would I figure this out? On my server, I keep four days of access logs. (See Privacy Policy for more information.) I posted the last episode three days ago and I verified that the oldest of my log files does not mention it. That means I didn’t miss any of the downloads.

So I grepped through the logs for the MP3 file and saved the matching lines to look through.

A quick inspection shows that I have to discard HEAD requests. I should just be counting GET requests!
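
To see how the requests split between HEAD and GET, something like the following could tally them per method. This is a minimal sketch using awk rather than Perl. It assumes the log uses the common “combined” format, where the first double-quoted field is the request line, and `count_methods` is just a name picked for illustration.

```shell
# Tally requests per HTTP method, reading log lines from standard input.
# Assumption: "combined" log format, so the first double-quoted field
# looks like: GET /some/file.mp3 HTTP/1.1
count_methods() {
  awk -F'"' '{ split($2, req, " "); count[req[1]]++ }
             END { for (m in count) print count[m], m }' | sort -k2
}
```

It would be called as `count_methods < 20-halberds-and-helmets.log` and prints one line per method, count first.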

Further inspection shows that quite a few of these requests have the status 206 (“partial content”), which means a single application is downloading the file in several parts. But how do I figure out which of them belong together? I’m trying to do this without looking at IP numbers.

Visually, it looks like I can determine what’s going on by looking at the minutes and the user agent for all the 206 results. Let’s try this. (And yes, I did have a HEAD request with a 206 result!)

grep 'GET.* 206 ' 20-halberds-and-helmets.log | perl -e '
# print the minute and the short user agent of every matching line
while (<STDIN>) {
  chomp; my ($ts, $ua) =
    /\[\d\d\/\w+\/\d\d\d\d:\d\d:(\d\d).*"([^"\/]*)[^"]*"$/;
  print "$ts $ua\n";
}
'

And here’s the result, with an arrow indicating the rows I consider to be “duplicates.”

20 Mozilla
44 AppleCoreMedia
44 AppleCoreMedia ←
39 iTMS
44 AppleCoreMedia
44 AppleCoreMedia ←
44 AppleCoreMedia ←
46 Mozilla
46 Mozilla ←
46 Mozilla ←
19 Mozilla

Manually counting them, I think we could get away with saying that we need to discount three AppleCoreMedia and two Mozilla results, right?
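
The manual count follows a simple rule: a row is a duplicate when it repeats the minute and user agent of the row directly before it. A rough sketch of that rule in awk, assuming the input is the “minute user-agent” listing above without the hand-drawn arrows (`count_duplicates` is a made-up name):

```shell
# Count duplicate rows per user agent.
# Input: lines of the form "MM user-agent" on standard input.
# A row counts as a duplicate when it is identical to the previous row.
count_duplicates() {
  awk 'NR > 1 && $0 == prev { ua = $0; sub(/^[0-9]+ /, "", ua); dup[ua]++ }
       { prev = $0 }
       END { for (ua in dup) print dup[ua], ua }' | sort -k2
}
```

Fed the eleven rows above, this prints `3 AppleCoreMedia` and `2 Mozilla`, matching the manual count. It is only as good as the heuristic: two different listeners using the same app in the same minute would be merged.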

So let’s count the hits per user agent and then we’ll correct for the partial content results above.

grep GET 20-halberds-and-helmets.log | perl -e '
my %count;
my $total = 0;
while (<STDIN>) {
  chomp; my ($ua) = /"([^"\/]*)[^"]*"$/;
  $count{$ua}++;
  $total++;
}
# most frequent user agents first
for my $ua (sort {$count{$b} <=> $count{$a}} keys %count) {
  printf("%5d %s\n", $count{$ua}, $ua);
}
print "---- --------------------\n";
printf("%5d total\n", $total);
'

This would be the result without correcting for the partial content:

   10 Mozilla
    5 AppleCoreMedia
    4 Pocket Casts
    3 Overcast
    3 PodcastAddict
    2 Dalvik
    2 okhttp
    2 Googlebot-Video
    2 stagefright
    1 AndroidDownloadManager
    1 iTMS
    1 Player FM
    1 iCatcher!
---- --------------------
   37 total

Making the correction I mentioned above:

    8 Mozilla ←
    4 Pocket Casts
    3 Overcast
    3 PodcastAddict
    2 AppleCoreMedia ←
    2 Googlebot-Video
    2 stagefright
    2 okhttp
    2 Dalvik
    1 AndroidDownloadManager
    1 iCatcher!
    1 iTMS
    1 Player FM
---- --------------------
   32 total ←
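
The counting and the correction could probably be folded into a single pass. The following is a sketch under assumptions, not what I actually ran: it assumes the “combined” log format (status code right after the quoted request line, user agent as the last quoted field), it uses awk instead of Perl, and `corrected_counts` is a hypothetical helper name. A 206 row is skipped when it repeats the minute and user agent of the previous 206 row.

```shell
# Count GET requests per user agent, reading log lines from standard input.
# Repeated 206 rows (same minute and user agent as the previous 206 row)
# are treated as further chunks of the same download and skipped.
corrected_counts() {
  awk -F'"' '
    $2 ~ /^GET / {
      split($3, f, " "); status = f[1]            # $3 holds " status bytes "
      match($1, /\[[^ ]+/)                        # "[dd/Mon/yyyy:HH:MM:SS"
      split(substr($1, RSTART, RLENGTH), t, ":")  # t[3] is the minute
      ua = $6; sub(/\/.*/, "", ua)                # user agent up to the first slash
      if (status == 206) {
        key = t[3] " " ua
        if (key == prev) next                     # same download, another chunk
        prev = key
      }
      count[ua]++; total++
    }
    END {
      for (ua in count) printf "%5d %s\n", count[ua], ua | "sort -rn"
      close("sort -rn")                           # flush the sorted rows first
      printf "%5d total\n", total
    }'
}
```

Usage would be `corrected_counts < 20-halberds-and-helmets.log`. The order of user agents with equal counts may differ from the tables above.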

The result shows that Pocket Casts is popular. I guess it’s a podcatcher. iCatcher! is the one I use. 🙂

I’m surprised that Mozilla is up there. When I looked at the details of the user agent strings, I noticed that they mostly belong to bots.

What the hell is Googlebot-Video doing here? Is Google offering audio search results somewhere? Using podcasts to train their AI overlords?

All the other user agents look like legitimate tools, frameworks, programming languages, libraries, etc.

That’s why I think that 32 people listened to episode 20 of the podcast, and a few probably didn’t listen to all of it.

​#Halberds and Helmets Podcast ​#Podcast ​#Administration

Comments

(Please contact me if you want to remove your comment.)

I use PodBean as my catcher, but I’m not sure what engine it uses for download.

– Shelby 2019-03-01 18:28 UTC


Hey! I’m the guy using Overcast! Home, work, and mobile.

I absolutely love your podcast, and your content holistically. More soon?

– Tim McDowell 2019-06-04 07:28 UTC


Thanks! Maybe. I wanted to talk about Mass Combat but then last session the players opted not to use the rules so I’m a bit stumped. 😆

– Alex Schroeder 2019-06-04 17:52 UTC
