Ancestors

Toot

Written by Dragan Espenschied on 2024-12-30 at 17:44

LLM training bots are crawling everything, including every change log entry in a MediaWiki, and doing it multiple times as well.

https://pod.geraspora.de/posts/17342163

=> More information about this toot | More toots from despens@post.lurk.org

Descendants

Written by Raphaël Bastide on 2024-12-30 at 18:43

@despens The web is not dead! It’s massively visited by LLM bots! Yeeeaaaay!

=> More information about this toot | More toots from raphael@post.lurk.org

Written by * on 2024-12-30 at 19:17

@despens with the output of all of these large models being rather unpredictable, it's almost surprising that their input is so consistently gathered by appropriating other people's work without any regard for the consequences, be it for indie hosting or click workers

=> More information about this toot | More toots from computersandblues@post.lurk.org

Written by Ed Summers on 2024-12-31 at 07:19

@despens wish he said more about his robots.txt -- I thought some of those bots were supposed to be checking that?

=> More information about this toot | More toots from edsu@social.coop

Written by Dragan Espenschied on 2025-01-10 at 06:01

@edsu …yeah they should. My experience is that a server I manage with a lot of material on it is constantly stressed by AI bots that do not reveal themselves in the user agent. Instead they show up as weird old browsers like Internet Explorer 9 or whatever, which is not a good blocking indicator because those could be genuine visits, given the emulation framework I'm also using… Overall that server needed an additional 2 GB of RAM to not constantly fail.

=> More information about this toot | More toots from despens@post.lurk.org

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113743015477866981