Ancestors

Written by Alex Schroeder on 2025-01-29 at 13:39

"AI haters…"

https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/

I didn't even read the whole headline of this article and I like it already. Mentions @tante and @algernon, gibberish-serving (and AI-poisoning?) Nepenthes (calling it malware? Oh no!!) …

All I'll say is that I fear something like Nepenthes will just tie up too many resources of mine (RAM, file handles, bandwidth). "CO₂ for the CO₂ god!" But perhaps I'm wrong? I should investigate.

=> More information about this toot | More toots from alex@social.alexschroeder.ch

Written by Lew Perin on 2025-01-29 at 15:48

@alex The DeepSeek news of the last month or so makes it clear that LLMs’ profligate use of computation isn’t a law of nature.

I’d really like to know if this portends a diminished appetite for web crawling.🤨

@tante @algernon

=> More information about this toot | More toots from babelcarp@social.tchncs.de

Written by tante on 2025-01-29 at 15:53

@babelcarp @alex @algernon DeepSeek is a distillation of larger models. Their finetuning was comparatively cheap, but they still needed the huge model as a base.

=> More information about this toot | More toots from tante@tldr.nettime.org

Toot

Written by Lew Perin on 2025-01-29 at 21:54

@tante If DeepSeek’s model is totally parasitic—to use a possibly unjust word—then maybe they did no crawling whatsoever?

@alex @algernon

=> More information about this toot | More toots from babelcarp@social.tchncs.de

Descendants

Written by algernon ludd on 2025-01-29 at 21:57

@babelcarp @tante I wouldn't be surprised if the Chinese IPs @alex was seeing were DeepSeek. We just didn't know then.

Myself, I didn't see a significant number of Chinese IPs, but I only started digging into logs a few weeks ago, so they may have stopped (or paused) crawling by then. Or they just didn't crawl my sites.

=> More information about this toot | More toots from algernon@come-from.mad-scientist.club
