"AI haters…"
https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/
I didn't even read the whole headline of this article and I like it already. Mentions @tante and @algernon, gibberish-serving (and AI-poisoning?) Nepenthes (calling it malware? Oh no!!) …
All I'll say is that I fear something like Nepenthes will just tie up too many resources of mine (RAM, file handles, bandwidth). "CO₂ for the CO₂ god!" But perhaps I'm wrong? I should investigate.
=> More information about this toot | More toots from alex@social.alexschroeder.ch
@alex The DeepSeek news of the last month or so makes it clear that LLMs’ profligate use of computation isn’t a law of nature.
I’d really like to know if this portends a diminished appetite for web crawling.🤨
@tante @algernon
=> More information about this toot | More toots from babelcarp@social.tchncs.de
@babelcarp @alex @algernon DeepSeek is a distillation of larger models. Their fine-tuning was comparatively cheap, but they still needed the huge model as a base.
=> More information about this toot | More toots from tante@tldr.nettime.org
@tante If DeepSeek’s model is totally parasitic—to use a possibly unjust word—then maybe they did no crawling whatsoever?
@alex @algernon
=> More information about this toot | More toots from babelcarp@social.tchncs.de
@babelcarp @tante I wouldn't be surprised if the Chinese IPs @alex was seeing were DeepSeek. We just didn't know then.
Myself, I didn't see a significant number of Chinese IPs, but I only started digging into logs a few weeks ago; they may have stopped (or paused) crawling by then. Or they just didn't crawl my sites.
=> More information about this toot | More toots from algernon@come-from.mad-scientist.club