VF-1 updates and tips

I'm very happy to have had both the time and motivation to get quite a
bit of good progress made on my gopher client VF-1 recently. This
post will mostly be an update on some of the new functionality. But
first, some usage tips! I noticed recently that tfurrows has been
keeping a list of tips[1] on his circumlunar gopherhole (where he
explores the bookmarking functionality more deeply than I ever have!),
and thought I might contribute a few of my own.

First of all, an embarrassing revelation - when I recently wrote my
allegedly definitive guide[2] to viewing "long stuff", I forgot about
one option for handling long menus! There are so many options even I
forget them. That said, this one is not one of my favourites; it was
an early hack to help people who were struggling in earlier days
before "less" worked on menus. When you use "ls" to list the current
menu selectors, you can give it the "-r" option (i.e. "ls -r") to view
the listing in the reverse of the usual order. Thus, items at the top
which would normally go flying off the top of your screen appear at
the bottom where you can always see them. The obvious downside, of
course, is that stuff is backward. I don't really recommend this
approach, but thought I'd mention it for completeness.

On to something a little more useful! You are probably already aware
that VF-1 lets you set any external command you like as a handler for
different kinds of content. By default, the handler for "text/plain"
is just good old "cat", which does nothing other than spit the text
onto your screen. If it overflows, you can run the "less" command to
look at it in your favourite pager. An alternative to this is to use
less as your text/plain handler, but feed it a few more options. For
the past few days I have been using "less -FXR %s" as my default plain
text handler. The -F option tells less to immediately quit if the
file is short enough that it fits entirely on one screen, and the -X
option tells it not to clear the screen after exiting (as is the
default behaviour of more). What this does is basically turn less
into an automatic "cat if short, less if long" viewer. The -R is
just there so that ANSI colour codes don't get mangled (more on that
later). This means stuff never flies off the top of your screen, and
you never have to manually run less to read the top of something.
This results in a pretty seamless experience and I think I'll stick
with it.
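
To make the handler idea concrete, here is a minimal Python sketch of
how a template like "less -FXR %s" could be turned into a command and
run on a saved file. This is not VF-1's actual code: the function
names build_handler_argv and run_handler are made up for illustration,
and the sketch assumes that %s stands for the path of a file holding
the fetched content.

```python
import shlex
import subprocess


def build_handler_argv(template, path):
    """Split a handler template into arguments, replacing %s with path.

    Hypothetical helper for illustration, not VF-1's real implementation.
    """
    return [path if arg == "%s" else arg for arg in shlex.split(template)]


def run_handler(template, path):
    """Run the handler and block until it exits (e.g. the user quits less)."""
    return subprocess.call(build_handler_argv(template, path))


if __name__ == "__main__":
    # Show the command that would be run for a saved file.
    print(build_handler_argv("less -FXR %s", "/tmp/example.txt"))
```

Passing an argument list rather than a shell string means a path with
spaces in it needs no extra quoting.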

Okay, time for new features.

Starting with something very minor, the "text/plain" handler is now
used for both item types 0 and 1, whereas previously it only worked
for type 0. This change was inspired by Tomasino who, when learning
about handlers, immediately set his to lolcat - something I'd never
heard of. I encourage you to check it out, even if only briefly for,
well, the lols. Basically you can pipe text through it and it uses
ANSI colour codes to render that text into a GLORIOUS RAINBOW. We're
talking hundreds of colours, each character slightly different from
its neighbours. Tomasino was disappointed that this worked on
content but not on menus (which in his case means his entire phlog),
so the handler is now applied to menus too and you can enjoy
ubiquitous rainbows in gopherspace. Tomasino was also disappointed
that the colours disappeared when he used the "less" command, because
until now that command ignored the text/plain handler and just fed
the content straight to less. Now, the "less" command runs your
text/plain handler and pipes the output of that to less (or rather,
less -R, to preserve colours), so you can get colours even when you
are lessing!
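
The "handler output piped into less -R" arrangement can be sketched in
Python roughly as follows. This is an illustration under assumptions,
not VF-1's real code: the function name page_handler_output is
invented, and the handler is assumed to write its result to standard
output.

```python
import subprocess


def page_handler_output(handler_argv, pager_argv=("less", "-R")):
    """Run the handler and feed whatever it prints into the pager.

    Hypothetical helper for illustration. Returns the pager's exit code.
    """
    handler = subprocess.Popen(list(handler_argv), stdout=subprocess.PIPE)
    pager = subprocess.Popen(list(pager_argv), stdin=handler.stdout)
    # Close our copy of the pipe so the handler sees a broken pipe
    # if the pager quits before reading everything.
    handler.stdout.close()
    pager.wait()
    handler.wait()
    return pager.returncode
```

For example, page_handler_output(["lolcat", "-f", "menu.txt"]) would
page a rainbow version of menu.txt, assuming lolcat is installed (its
-f flag forces colour output even when writing to a pipe).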

On to more fundamental changes. Tomasino has once again spurred me to
make some improvements, in his recent championing of better support
for the "+" item type which is used to specify redundant servers, i.e.
gopher servers which host a mirror of the content at the current
server. The RFC is pretty vague about exactly how this is supposed
to work. Most modern clients take a very minimal approach to
supporting this, and just list the mirror items like they would any
other link but do something minor to indicate "hey, this is a
mirror". I think the intent was probably for clients to do a bit
more with this. The RFC has various comments in it which make it
pretty clear (to me at least) that the target environment for gopher
was under-resourced university departments setting up servers on
whatever old and under-powered hardware they had lying around, and
spreading information over as many servers as possible to reduce
load. Early gopher servers were probably expected to fail regularly.
So VF-1 tries to handle + items in such a way as to reduce the pain
of server failures. After seeing that content at server A is mirrored
at server B, if an attempt to fetch something from server A later
during the same VF-1 session results in any kind of network error,
VF-1 will automatically try to fetch the content from server B
instead. The usefulness of this in 2019 is arguably limited - for
one thing, modern gopher servers are probably extremely powerful and
extremely under-loaded compared to early servers, and for another
there is no caching of redundant servers, so if the "main" server
you attempt to visit is down, you have no way to learn what the
backups are. It's not perfect, but it's better than nothing, and
I'm proud that VF-1 actually makes an effort to do something with
this information.
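
The failover behaviour described above can be sketched in a few lines
of Python. This is a simplified illustration, not VF-1's actual
implementation: the MirrorRegistry class, the fetch_with_failover
function and the caller-supplied fetch callable are all hypothetical
names, and "any kind of network error" is modelled as OSError, which
in Python 3 covers refused connections, timeouts and DNS failures.

```python
class MirrorRegistry:
    """Remembers mirrors seen in "+" items, for the current session only."""

    def __init__(self):
        self._mirrors = {}  # (host, port) -> list of (host, port) mirrors

    def add_mirror(self, primary, mirror):
        known = self._mirrors.setdefault(primary, [])
        if mirror not in known:
            known.append(mirror)

    def candidates(self, primary):
        """The primary server first, then its known mirrors in order seen."""
        return [primary] + self._mirrors.get(primary, [])


def fetch_with_failover(registry, server, selector, fetch):
    """Try the primary, then each known mirror, moving on at network errors.

    fetch(server, selector) is supplied by the caller and is expected
    to raise OSError when the server cannot be reached.
    """
    last_error = None
    for candidate in registry.candidates(server):
        try:
            return fetch(candidate, selector)
        except OSError as err:
            last_error = err
    raise last_error
```

Because the registry lives only in memory, it starts out empty in each
new session, which is exactly the limitation mentioned above.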

Speaking of being proud, the other significant changes are related
to text decoding, and I suspect VF-1 might now be the best gopher
client in town for people who regularly visit content encoded in a
variety of non-UTF-8 forms. Tomasino had nothing to do with this
change, which was instead prompted by the latest user at
circumlunar.space, tengu[3], who had some initial problems serving
Russian text from his gopherhole there, whether using UTF-8 or
older Cyrillic encodings like KOI8-R or CP1251. With some digging,
it turned out that this was mostly the fault of Gophernicus, but
VF-1 could stand some improvement too.

In the earliest versions of VF-1, I assumed that all text coming
over the wire would be either ASCII or UTF-8 (which decode
identically) and left it at that. This worked fine for about a
week until someone on BBOARD reported that VF-1 died when trying
to read some news article over at floodgap's feeds. It turned out
that the article contained a name with an accented character in it,
which was encoded in ISO-8859-1. So, I did a bit of research and
learned that the 3 most commonly used encodings on the web are, in
order, UTF-8, ISO-8859-1 and CP1251. So, I updated VF-1 to try
these, in order, moving down the list each time one failed.

If you know anything about text encoding you'll recognise how
naive this was. Any text which is valid CP1251 is also valid
ISO-8859-1, so an attempt to decode as ISO-8859-1 will never fail.
It may result in gibberish, but it won't throw an exception, and
so CP1251 text will never be decoded properly. So, now VF-1
attempts to decode everything as UTF-8 first and, if that fails,
tries a single fallback encoding. That fallback defaults to
ISO-8859-1, but it is under direct and easy user control using the
"set" command, so you can do "set encoding cp1251" to change the
fallback. If you regularly deal with just one non-UTF-8 encoding,
you can of course stick this in your ~/.vf1rc file to make it
permanent.
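
Here is a small Python sketch of the "UTF-8 first, then one fallback"
strategy, which also shows why the old three-step list could never
reach CP1251. The function name decode_with_fallback is invented for
this example and is not taken from VF-1's source. Using
errors="replace" on the fallback is my own assumption, so that a stray
undefined byte cannot crash the decode.

```python
def decode_with_fallback(data, fallback="iso-8859-1"):
    """Decode bytes as UTF-8 if possible, else with one fallback encoding.

    Hypothetical helper for illustration, not VF-1's real implementation.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: substitute undecodable bytes rather than raising.
        return data.decode(fallback, errors="replace")


if __name__ == "__main__":
    raw = "Привет".encode("cp1251")
    # ISO-8859-1 maps every byte to some character, so this never
    # raises. It just produces gibberish for Cyrillic input.
    print(decode_with_fallback(raw))
    # With the right fallback, the text comes out intact.
    print(decode_with_fallback(raw, fallback="cp1251"))
```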

But wait, there's more. There is a very nice Python library
called chardet which attempts to automatically detect text
encodings making use of language statistics. You can decode
CP1251 as if it were ISO-8859-1 no problem, but you'll end up with
gibberish text whose n-gram distribution won't match any natural
language. Chardet uses this fact to guess encodings and with a
little practice it seems to work quite well. Now, I am very proud
of the fact that VF-1 has no dependencies outside the Python
standard library and that all of the code is in one single file.
All of this makes it extremely easy to install, even in weird
environments where modern tools like pip are not available. I
don't ever want to change this, so VF-1 does not depend on
chardet. But, if you install it yourself, VF-1 will recognise that
it's there and adopt the alternative strategy of autodetecting
the encoding if UTF-8 fails, and will drop back to the
user-specified encoding only if chardet fails to identify an
encoding with confidence above 0.5. With chardet installed, I
was able to use VF-1 to cruise around some Russian gopher sites
tengu linked me to, and whether I encountered UTF-8, KOI8-R or
CP1251 encoding, it all Just Worked, which was tremendously
satisfying. VF-1+chardet seems bullet-proofly international,
which is fantastic.
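
The "use chardet if it happens to be installed" pattern might look
something like the following Python sketch. It is written under stated
assumptions rather than copied from VF-1: the names decode_text and
CONFIDENCE_THRESHOLD are invented, and the detect parameter exists
only so the logic can be exercised without chardet present. The call
chardet.detect() is real and returns a dict with "encoding" and
"confidence" keys.

```python
try:
    import chardet  # optional third-party library, never required
except ImportError:
    chardet = None

CONFIDENCE_THRESHOLD = 0.5  # threshold mentioned in the text above


def decode_text(data, fallback="iso-8859-1", detect=None):
    """UTF-8 first, then autodetection if available, then the fallback.

    Hypothetical helper for illustration, not VF-1's real implementation.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    if detect is None and chardet is not None:
        detect = chardet.detect
    if detect is not None:
        guess = detect(data)
        encoding = guess.get("encoding")
        if encoding and guess.get("confidence", 0) > CONFIDENCE_THRESHOLD:
            try:
                return data.decode(encoding)
            except (UnicodeDecodeError, LookupError):
                pass  # bad guess or unknown codec: use the fallback
    # Assumption: replace undecodable bytes rather than raising.
    return data.decode(fallback, errors="replace")
```

Wrapping the import in try/except keeps the single-file,
standard-library-only install working, while people who do install
chardet get the better behaviour for free.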

As an aside, I was amused to note that the chardet FAQ[4] has the
following entry:

    Yippie! Screw the standards, I'll just auto-detect everything!

    Don't do that. Virtually every format and protocol contains
    a method for specifying character encoding.

The FAQ goes on to talk about HTTP, HTML, XML, etc. Out here on
the plain text frontier, of course, there ain't any such thing
(well, maybe the /caps.txt hack does something about this,
hmm...), so I don't feel bad at all about auto-detecting
everything. It's pretty much the only choice we have.

For the record, this is not something that I think is worth
extending gopher to work around. There is a much simpler
and nicer solution, which is simply to use UTF-8 for absolutely
all new content in gopherspace, so that there is no need to
explicitly specify the character encoding.

That's all that's new, aside from some tiny tidy ups and fixes.
There are a few other small things I'd like to tackle, but it's
starting to feel pretty complete for me.

[1] gopher://circumlunar.space:70/1/~tfurrows/tips/
[2] gopher://circumlunar.space:70/0/~solderpunk/phlog/looking-at-long-stuff-with-vf1.txt
[3] gopher://circumlunar.space:70/1/~tengu/
[4] https://chardet.readthedocs.io/en/latest/faq.html
