2024-10-24 - Some more GitLab issues closed

I have closed another three issues at the old GitLab repositories. One of these closures actually could have happened at the same time as the 11 I closed last month, because no actual changes to the specification or decisions were necessary. The other two are the result of me deciding not to make changes to the specification on the relevant issues. So these are just "declarations" which don't obligate users or developers to do or stop doing anything. The closure of these three issues reduces the open issue count from 25 to 22.

The one I should have closed earlier was G04, "URL vs URI for link lines". In my August 15th post where I sorted all open issues into categories, I accidentally placed this one in the "Internationalisation questions" category along with P01 and G08, but I shouldn't have, because it doesn't concern internationalisation at all.

The URL vs URI distinction is not really a difficult one, but it's widely misunderstood and misused. Suffice it to say, they are not the same thing, neither term has ever been officially deprecated, and the distinction between them is in fact meaningful and important. All URLs are URIs, but not all URIs are URLs; it's as simple as that. I hold any statement contrary to the above to be nonsense. Here is the ground truth, straight from RFC3986 section 1.1.3:

The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").

The original Gemini specification spoke deliberately about URLs in link lines because, heck, that's what they were supposed to be. They are link lines, after all. You are supposed to be able to "click on them" (or whatever UI paradigm your client uses) and follow them somewhere. How on Earth can a client do that if what comes after the => does not describe the primary access mechanism for something? The overwhelming majority of link lines in Geminispace do in fact use URLs, like gemini://, gopher://, http(s):// and ftp://. Those are about the only things I ever imagined anybody using in a link line.
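To make the "what comes after the =>" point concrete, here is a minimal Python sketch of how a client might pull the target out of a link line and look at its scheme. The function names are made up for illustration, and the parsing assumes the usual shape of a link line ("=>", optional whitespace, a target with no embedded whitespace, then an optional label); it is not taken from any particular client.

```python
from urllib.parse import urlsplit


def parse_link_line(line):
    """Split a gemtext link line into (target, label), or return None.

    Assumes the usual shape: "=>", optional whitespace, a target with no
    embedded whitespace, then an optional human-readable label.
    """
    if not line.startswith("=>"):
        return None
    parts = line[2:].strip().split(maxsplit=1)
    if not parts:
        return None  # "=>" with nothing after it is not a usable link
    target = parts[0]
    label = parts[1] if len(parts) > 1 else ""
    return target, label


def link_scheme(target):
    """Return the lowercased scheme, or "" for a relative reference."""
    return urlsplit(target).scheme.lower()
```

A client could then decide what "following" the link means based on the scheme: fetch gemini:// itself, hand mailto: to a mail program, and so on.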

So why broaden the allowed scope to all URIs, which the new version of the specification does? Well, there are unobjectionable URIs that it seems a shame to prohibit, such as mailto: and magnet: (which works, after all, much like a URL, in that it can be used to trigger the act of downloading something from the internet). Are there objectionable URIs we certainly wouldn't want? Well, for me personally, I very strongly dislike the use of data: URIs in Gemini link lines. (RFC2397 actually defines data: as a URL scheme, not a URI scheme, but they can call it that until the cows come home; the actual definitions of URI and URL obviously take precedence, and it sure as heck isn't a URL. I suspect this is a historical matter, with RFC2397 predating the current definitions of the terms.) But I am reluctant to explicitly prohibit a single specific URI scheme by name. To be principled about it, I'd need to actually audit all 350 URI schemes currently registered with IANA and sort them into "allowed" and "forbidden" categories. That's a Herculean task for very little reward, and of course new schemes will be registered in the future and I don't want the Gemini spec to need to be constantly updated to keep in step. Practically speaking, it's either all or nothing.

There's also the consideration of the still open URI vs IRI issue. To my bafflement, RFC3987 defines an Internationalized Resource Identifier (IRI) but does not define an IRL or an IRN. Every sane person who has read both RFC3986 and RFC3987 understands perfectly well what IRLs and IRNs "are", yet they are not actually officially defined terms which one can refer to. This means that choosing to die on the hill of "locators, not identifiers!" necessarily involves saying "no" to internationalisation as well, and that seems like an extremely bad basis on which to make that decision.

So, for all these reasons, URIs and not URLs it is. There is another open issue (G16) which suggests an upper limit on the length of URIs in link lines to restrict the worst abuses of data:, and I think that's a good idea and will probably implement it in the next spec update. gemini:// URLs already have a maximum length, enforced by the maximum request header length, and most web browsers have a maximum length, too, so there's no compelling need to leave it completely open-ended.
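A length cap of this kind is cheap for a client to enforce. The sketch below is only an illustration: the post does not say what limit G16 proposes, so the 1024-byte figure is an assumption borrowed from the cap on request URIs, and the names are hypothetical.

```python
# Assumed limit for illustration only; issue G16 may settle on another value.
MAX_LINK_URI_BYTES = 1024


def link_target_too_long(target, limit=MAX_LINK_URI_BYTES):
    """True if the UTF-8 encoding of a link target exceeds the limit in bytes."""
    return len(target.encode("utf-8")) > limit
```

Measuring bytes rather than characters keeps the check consistent with a limit that is defined on what goes over the wire.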

The two issues I closed due to making decisions were both in the network transport specification repository.

P18 suggested changing the scope of client certificates to simply the hostname from which the request for the certificate originates, instead of the current behaviour where scope is bound to the URL path the request originates from and any paths below it. The proposed change would make it impossible to, e.g., host multiple distinct applications with separate user databases under the same hostname, or to have multiple distinct users at a pubnix or other shared hosting service operate their own separate "friends only" gemlogs secured by client certificates. I see those as perfectly valid use cases and I wouldn't want to rule them out without a compelling argument, which IMHO the issue fails to provide.
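For illustration, path-bound scoping might look something like the following in a client. This is a sketch under assumptions, not the specification's wording: the function name is hypothetical, "below" is taken to mean a prefix match on whole path segments, and ports are ignored for brevity.

```python
from urllib.parse import urlsplit


def cert_applies(scope_url, request_url):
    """True if request_url is on the scope's host, at or below its path.

    Assumption: "below" means a prefix match on whole path segments, so
    /app does not cover /application. Ports are ignored in this sketch.
    """
    scope, request = urlsplit(scope_url), urlsplit(request_url)
    if scope.hostname != request.hostname:
        return False
    scope_path = scope.path.rstrip("/")
    request_path = request.path or "/"
    return request_path == scope_path or request_path.startswith(scope_path + "/")
```

Under this rule a certificate activated at gemini://pubnix.example/~alice/ is sent to everything under /~alice/ but never to /~bob/, which is exactly the separation that hostname-wide scope would lose.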

P17 suggested removing the status code 11 ("sensitive input") from the specification. This issue spawned a surprising amount of discussion! Some fair points were made, and the next update to the spec will probably include some brief discussion of the security issues surrounding its use. However, it's also true that the overwhelming majority of the discussion in that thread took place under the assumption that the purpose of status 11 was to facilitate the use of passwords for authentication in a completely different way to the one I had in mind when proposing the status (which dates back to very early versions of Gemini where there was an explicit distinction between transient and persistent client certificates). There are now concrete examples of the intended paradigm (where passwords are used in conjunction with client certificates rather than in their place), complete with security warnings, in the recently published Gemini app developer's guide. Between this and the many widely known and used examples of authenticated user applications which do not make use of the maximally insecure paradigm which was of concern, I hope that the risk of widespread misuse of this feature is ameliorated.
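For readers who haven't met status 11, the client-side difference from plain status 10 is small: the prompt is the same, but the typed input should not be echoed. A rough sketch, assuming a response header of the form status, space, meta, CRLF, and using a hypothetical function name:

```python
def parse_input_request(header):
    """Return (prompt, sensitive) for a 1x response header, else None.

    Assumes the header has the shape "<status><space><meta>" followed by CRLF.
    """
    status, _, meta = header.rstrip("\r\n").partition(" ")
    if len(status) != 2 or not status.isdigit() or status[0] != "1":
        return None
    # Status 11 asks the client to treat the input like a password.
    return meta, status == "11"
```

A terminal client could then call getpass.getpass(prompt) when the second value is true and input(prompt) otherwise. This only covers how the input is collected; it says nothing about how a server should combine that input with client certificates.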

Original URL: gemini://geminiprotocol.net/news/2024_10_24.gmi