Loopy links 🔄

One thing I found by looking at where my #hashtag crawler went was that some people have a bottomless pit of links.

The first one I saw was someone exposing a repository of the site content. That included a link to the content itself. Not a link to the actual site, but to the copy of it in the repo. That copy had a repo link, where you could find a site link, and so on. I spotted this when it got to several levels of site/repo/site/repo/site/repo and told the crawler to give up. I'm mildly curious how deep that could go. I suppose it's limited by the maximum length of a Gemini request URL, which the spec caps at 1024 bytes (assuming that either the server or the client respects that limit).
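As an illustration of the "give up" rule, here is a minimal sketch of a check a crawler might run before following a link. It is not the code from my crawler: the function name `looks_loopy` and the thresholds `max_depth` and `max_repeats` are made up for this example, and only the 1024-byte URL limit comes from the Gemini spec.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_URL_BYTES = 1024  # Gemini's limit on the length of a request URL


def looks_loopy(url, max_depth=12, max_repeats=2):
    """Guess whether a URL belongs to a self-nesting chain of links.

    max_depth and max_repeats are arbitrary thresholds chosen for
    illustration; tune them for the capsules you crawl.
    """
    # A URL this long could not be requested from a compliant server anyway.
    if len(url.encode("utf-8")) > MAX_URL_BYTES:
        return True
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    # site/repo/site/repo/... repeats the same path segments over and over.
    counts = Counter(segments)
    return any(n > max_repeats for n in counts.values())
```

With these defaults a path like /site/repo/site/repo/site/repo is rejected on the third "site", while an ordinary gemlog URL passes.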

There's another one where someone has a public bookmarking system which is paginated. The first page has a link like /bookmarks?2 that goes to the next page, and so on. The interesting part is that this link is there even if there are no more bookmarks. I tried /bookmarks?9999999999999999999 and it broke. One fewer 9 was ok. I told my crawler to give up on those too.
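One simple way to give up on endless pagination is to cap the page number. The sketch below assumes the page number is the whole query string, as in /bookmarks?2. The names `page_number` and `should_follow` and the `max_page` limit of 50 are invented for the example.

```python
from urllib.parse import urlsplit


def page_number(url):
    """Return the page number if the query string is a plain number, else None."""
    query = urlsplit(url).query
    return int(query) if query.isdecimal() else None


def should_follow(url, max_page=50):
    """Follow unnumbered links and numbered pages up to max_page (arbitrary cap)."""
    n = page_number(url)
    return n is None or n <= max_page
```

A cap like this will miss bookmarks on a capsule that really does have more than 50 pages, so it trades completeness for a crawl that is guaranteed to end.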

Having read Sean Conner's experiences with crawlers that won't give up on an infinite redirect loop, I think that some crawlers are probably probing the limits of those two capsules.
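For the redirect case, a crawler can count hops and remember where it has already been. This is a hedged sketch, not anyone's real crawler: `follow_redirects` and its `fetch` parameter are hypothetical, with `fetch` standing in for whatever function returns a (status, meta) pair for a URL. It also assumes redirect targets are absolute URLs, and the limit of 5 hops is just a choice for the example.

```python
def follow_redirects(url, fetch, max_redirects=5):
    """Follow Gemini 3x redirects, returning the final URL or None on a loop.

    fetch(url) must return (status, meta); for 3x statuses, meta is the
    redirect target, assumed here to be an absolute URL.
    """
    seen = {url}
    for _ in range(max_redirects + 1):
        status, meta = fetch(url)
        if not 30 <= status <= 39:
            return url  # not a redirect, so this is where we ended up
        url = meta
        if url in seen:
            return None  # we have been here before: a redirect loop
        seen.add(url)
    return None  # too many hops without reaching real content
```

The `seen` set catches short loops straight away, and the hop limit catches a server that keeps redirecting to fresh URLs forever.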

=> #hashtags | #crawler


Proxy Information

Original URL: gemini://freeshell.de/gemlog/2022-05-15_Loopy_links.gmi