Hey everyone! Hate AI web crawlers? Have some spare CPU cycles you want to use to punish them?
Meet Nepenthes!
https://zadzmo.org/code/nepenthes
This little guy runs nicely on low-power hardware, and generates an infinite maze of what appear to be static files with no exit links. Web crawlers will merrily hop right in and just ... get stuck in there! Optional randomized delay to waste their time and conserve your CPU, optional Markov babble to poison large language models.
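(For illustration, a minimal sketch of the trap idea, not Nepenthes' actual code: every URL deterministically yields a page of links deeper into the maze, so the same path always serves the same "static" file and a crawler never finds an exit. The function name and fanout below are illustrative assumptions.)

```python
# Minimal sketch of the maze idea (illustrative, not Nepenthes' code):
# links are derived from a hash of the path, so each page looks like a
# stable static file while fanning out into endless deeper pages.
import hashlib

def maze_page(path: str, fanout: int = 8) -> str:
    """Render a fake page whose links are derived from a hash of the path."""
    digest = hashlib.sha256(path.encode()).hexdigest()
    links = [f"{path.rstrip('/')}/{digest[i*4:(i+1)*4]}.html" for i in range(fanout)]
    body = "\n".join(f'<a href="{link}">{link}</a>' for link in links)
    return f"<html><body>{body}</body></html>"

print(maze_page("/maze/start"))  # the same path always yields the same links
```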
=> More information about this toot | More toots from aaron@zadzmo.org
@aaron am wondering what it would take to swap the text with Rick Astley lyrics.
=> More information about this toot | More toots from Workshopshed@mastodon.scot
@Workshopshed Trivial. It starts with no corpus by design; you provide one and POST it into a specific training input with curl.
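(Hedged example: the host, port, and endpoint path below are guesses; check the Nepenthes documentation for the real training URL.)

```sh
# Hypothetical URL -- consult the Nepenthes docs for the actual training endpoint.
curl -X POST --data-binary @astley_lyrics.txt http://localhost:8080/train
```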
=> More information about this toot | More toots from aaron@zadzmo.org
@aaron @Workshopshed I wonder if there would be worse but more efficient algorithms to replace the probably very accurate Markov chains you're using now...
=> More information about this toot | More toots from mdione@en.osm.town
@mdione Markov chains are extremely simple - and thus, fast. The way I put this one together also trades increased corpus size for more speed. In Nepenthes it has a depth of two, which is rather incoherent but the fastest you'll get with realistic text. I consider that extra incoherence to be a positive thing in this use case.
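(For illustration, a minimal in-memory sketch of the technique, assuming "depth of two" refers to a two-word lookup key; Nepenthes may count depth differently, and as the next paragraph explains, its corpus actually lives in SQLite rather than in a dict.)

```python
# Minimal in-memory depth-2 Markov babbler (illustrative sketch, not
# Nepenthes' code): each two-word prefix maps to every word observed
# to follow it in the corpus.
import random
from collections import defaultdict

def train(text):
    chain = defaultdict(list)
    words = text.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        chain[(a, b)].append(c)
    return chain

def babble(chain, length=30):
    out = list(random.choice(list(chain)))  # random starting prefix
    for _ in range(length):
        followers = chain.get((out[-2], out[-1]))
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

sample = "the quick brown fox jumps over the lazy dog and the quick dog naps"
print(babble(train(sample)))
```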
It's slowed, however, by the fact the corpus is stored in SQLite, and not RAM. This makes the bottleneck IO throughput for disk reads, somewhat mitigated by OS buffering if you have spare memory for it.
Holding the corpus entirely in memory is a thing I've done, but it both consumes a huge amount of RAM and requires retraining at every restart. @Workshopshed
=> More information about this toot | More toots from aaron@zadzmo.org
@mdione I tried several different SQLite schemas with various amounts and ways of normalization, and succeeded in reducing table or index sizes or simplifying query plans - but the current dead-simple basic one in use won every time, often by huge margins. I tried LightningMDB - its performance is truly exceptional. But ultimately, it was half as fast, because there's no way to represent the Markov corpus purely in key-value pairs. I got it to work by serializing a Lua table; that step completely swamped all performance gains and then some.
Feel free to try to find something faster. I'll be impressed if you do :)
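(A guess at what a "dead simple" schema of this kind might look like; the actual Nepenthes schema isn't shown in this thread. It also hints at why a pure key-value layout is awkward: each prefix key maps to many follower rows, which a key-value store like LMDB would force you to pack into one serialized value.)

```python
# Hypothetical "dead simple" corpus schema (illustrative; the real
# Nepenthes schema is not shown in this thread).
import sqlite3

db = sqlite3.connect("corpus.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS corpus (
    w1   TEXT NOT NULL,  -- first word of the prefix
    w2   TEXT NOT NULL,  -- second word of the prefix
    next TEXT NOT NULL   -- a word observed to follow that prefix
);
CREATE INDEX IF NOT EXISTS corpus_prefix ON corpus (w1, w2);
""")

# One read per generated word: each lookup hits disk unless the OS
# page cache already holds the relevant pages -- hence the IO
# bottleneck described above.
row = db.execute(
    "SELECT next FROM corpus WHERE w1 = ? AND w2 = ? "
    "ORDER BY random() LIMIT 1",
    ("the", "quick"),
).fetchone()
db.close()
```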
=> More information about this toot | More toots from aaron@zadzmo.org
@aaron thanks for all the details. I keep asking myself if we shouldn't document failures more...
=> More information about this toot | More toots from mdione@en.osm.town