LLM training bots are crawling everything, including every change-log entry in a MediaWiki, and doing it multiple times over.
https://pod.geraspora.de/posts/17342163
=> More information about this toot | More toots from despens@post.lurk.org
@despens The web is not dead! It’s massively visited by LLM bots! Yeeeaaaay!
=> More information about this toot | More toots from raphael@post.lurk.org
@despens with the output of all of these large models being rather unpredictable, it's almost surprising that their input is so consistently gathered by appropriating other people's work without any regard for the consequences, be it for indie hosting or click workers
=> More information about this toot | More toots from computersandblues@post.lurk.org
@despens wish he said more about his robots.txt -- I thought some of those bots were supposed to be checking that?
=> More information about this toot | More toots from edsu@social.coop
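For crawlers that do check robots.txt, some vendors publish a user-agent token the file can address. A minimal sketch of an opt-out, using two publicly documented tokens (GPTBot for OpenAI, CCBot for Common Crawl); note that honoring these rules is voluntary on the crawler's side:

```
# Sketch: opt the whole site out for two documented AI crawlers.
# Compliance is voluntary; bots that hide their identity ignore this.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```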
@edsu …yeah, they should. My experience is that a server I manage, with a lot of material on it, is constantly stressed by AI bots that do not reveal themselves in the user agent. Instead they show up as weird old browsers like Internet Explorer 9 or whatever, which is not a good blocking indicator, because that might be true given the emulation framework I'm also using… Overall that server needed an additional 2 GB of RAM to not constantly fail.
=> More information about this toot | More toots from despens@post.lurk.org
This content has been proxied by September (ba2dc).
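Since the user-agent string can be (and here evidently is) spoofed, one server-side mitigation is to throttle by client IP instead of by browser string. A minimal sketch in nginx configuration; the zone name and the rate and burst values are illustrative assumptions, not taken from the thread:

```
# Sketch (nginx): rate-limit per client IP rather than trusting the
# User-Agent header. "crawlers", 1r/s, and burst=20 are placeholder
# values to illustrate the directive, not recommendations.
limit_req_zone $binary_remote_addr zone=crawlers:10m rate=1r/s;

server {
    listen 80;
    location / {
        limit_req zone=crawlers burst=20 nodelay;
    }
}
```

Per-IP limits still struggle against distributed crawls from many addresses, which matches the resource pressure described above.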