Martin Chang replied [1] to my post about Gemini crawlers [2], saying that it was his crawler that had sent links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1 and that he had decided to look into the issue. Well, he did, and he found it wasn't his issue, but mine.
Oh my.
Okay, so how did I end up generating links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1?
This is, first and foremost, a blog on the web. Each entry is stored as HTML (HyperText Markup Language), and when a request is made via gopher [3] or Gemini [4], the entries making up the request are retrieved and converted to the appropriate format [5]. As part of that conversion, links to the blog itself have to be translated appropriately, and that's where the error happened.
So, for example, the links for the above entry are collected:
Those links with a URL (Uniform Resource Locator) scheme are passed through as is, but #4 is special: not only is it a relative link to my blog, it also contains a URL fragment, and that's where things went pear-shaped. The code to do the URL translations parsed each link as a URL, but for relative links, I used the original string, not the parsed URL structure. As such, the code didn't work so well with URL fragments, and thus I ended up with links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1 (for the record, the same bug was in the gopher translation code as well).
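To make the distinction concrete, here is a minimal sketch in Python. It is not the blog's actual code; the function name `translate_link`, the `GEMINI_PREFIX` constant, the assumption that relative blog links map under /boston on the Gemini server, and the choice to drop the fragment are all illustrative guesses on my part. The point is only that a relative link gets rebuilt from the parsed path rather than from the raw string:

```python
from urllib.parse import urlsplit

# Assumption: blog entries are served under /boston on the Gemini server,
# as the links quoted in this post suggest.
GEMINI_PREFIX = "gemini://gemini.conman.org/boston"


def translate_link(href: str) -> str:
    """Translate one link from an HTML entry for the Gemini version."""
    parts = urlsplit(href)

    # Links that carry a scheme (http, gopher, gemini, ...) pass through as is.
    if parts.scheme:
        return href

    # Relative link to the blog: build the result from the parsed path,
    # not from the raw string, so a trailing "#fragment" cannot leak into
    # the path. Dropping the fragment entirely is an assumption of this sketch.
    path = "/" + parts.path.lstrip("/")
    return GEMINI_PREFIX + path
```

With this, a hypothetical relative link such as `/2008/04/30.1#note` is rebuilt from its path component alone, while a link like `gopher://gopher.conman.org:70/1Phlog:` is returned untouched.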
The fix, as for most bugs, was easy once the core issue was identified. The other issues I talked about are, as far as I can tell, not stuff I can fix.
=> gemini://gemini.clehaxze.tw/gemlog/2022/04-22-re-my-common-gemini-crawler-pitfalls.gmi [1]
=> /boston/2022/04/16.1 [2]
=> gopher://gopher.conman.org:70/1Phlog: [3]
=> gemini://gemini.conman.org/boston/ [4]
=> /boston/2021/12/06.2 [5]