Hey everyone! Hate AI web crawlers? Have some spare CPU cycles you want to use to punish them?
Meet Nepenthes!
https://zadzmo.org/code/nepenthes
This little guy runs nicely on low-power hardware, and generates an infinite maze of what appear to be static files with no exit links. Web crawlers will merrily hop right in and just ... get stuck in there! Optional randomized delay to waste their time and conserve your CPU, optional Markov babble to poison large language models.
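(For illustration, a minimal sketch of the trap idea, not Nepenthes' actual code: every URL deterministically yields a page of links deeper into the maze, so the same path always serves the same "static" file and a crawler never finds an exit. The function name and fanout below are illustrative assumptions.)

```python
# Minimal sketch of the maze idea (illustrative, not Nepenthes' code):
# links are derived from a hash of the path, so each page looks like a
# stable static file while fanning out into endless deeper pages.
import hashlib

def maze_page(path: str, fanout: int = 8) -> str:
    """Render a fake page whose links are derived from a hash of the path."""
    digest = hashlib.sha256(path.encode()).hexdigest()
    links = [f"{path.rstrip('/')}/{digest[i*4:(i+1)*4]}.html" for i in range(fanout)]
    body = "\n".join(f'<a href="{link}">{link}</a>' for link in links)
    return f"<html><body>{body}</body></html>"

print(maze_page("/maze/start"))  # the same path always yields the same links
```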
=> More information about this toot | More toots from aaron@zadzmo.org
@aaron am wondering what it would take to swap the text with Rick Astley lyrics.
=> More information about this toot | More toots from Workshopshed@mastodon.scot
@Workshopshed Trivial. It starts with no corpus by design; you provide one and POST it into a specific training input with curl.
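(Hedged example: the host, port, and endpoint path below are guesses; check the Nepenthes documentation for the real training URL.)

```sh
# Hypothetical URL -- consult the Nepenthes docs for the actual training endpoint.
curl -X POST --data-binary @astley_lyrics.txt http://localhost:8080/train
```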
=> More information about this toot | More toots from aaron@zadzmo.org
@aaron @Workshopshed I wonder if there would be worse but more efficient algorithms to replace the probably very accurate Markov chains you're using now...
=> More information about this toot | More toots from mdione@en.osm.town
@mdione Markov chains are extremely simple - and thus, fast. The way I put this one together also trades increased corpus size for more speed. In Nepenthes it has a depth of two, which is rather incoherent but the fastest you'll get with realistic text. I consider that extra incoherence to be a positive thing in this use case.
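(For illustration, a minimal in-memory sketch of the technique, assuming "depth of two" refers to a two-word lookup key; Nepenthes may count depth differently, and as the next paragraph explains, its corpus actually lives in SQLite rather than in a dict.)

```python
# Minimal in-memory depth-2 Markov babbler (illustrative sketch, not
# Nepenthes' code): each two-word prefix maps to every word observed
# to follow it in the corpus.
import random
from collections import defaultdict

def train(text):
    chain = defaultdict(list)
    words = text.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        chain[(a, b)].append(c)
    return chain

def babble(chain, length=30):
    out = list(random.choice(list(chain)))  # random starting prefix
    for _ in range(length):
        followers = chain.get((out[-2], out[-1]))
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

sample = "the quick brown fox jumps over the lazy dog and the quick dog naps"
print(babble(train(sample)))
```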
It's slowed, however, by the fact the corpus is stored in SQLite, and not RAM. This makes the bottleneck IO throughput for disk reads, somewhat mitigated by OS buffering if you have spare memory for it.
Holding the corpus entirely in memory is a thing I've done, but it both consumes a huge amount of RAM and requires retraining at every restart. @Workshopshed
=> More information about this toot | More toots from aaron@zadzmo.org
@mdione I tried several different SQLite schemas with various amounts and ways of normalization, and succeeded in reducing table or index sizes or simplifying query plans - but the current dead-simple basic one in use won every time, often by huge margins. I tried LightningMDB - its performance is truly exceptional. But ultimately, it was half as fast, because there's no way to represent the Markov corpus purely in key-value pairs. I got it to work by serializing a Lua table; that step completely swamped all performance gains and then some.
Feel free to try to find something faster. I'll be impressed if you do :)
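(A guess at what a "dead simple" schema of this kind might look like; the actual Nepenthes schema isn't shown in this thread. It also hints at why a pure key-value layout is awkward: each prefix key maps to many follower rows, which a key-value store like LMDB would force you to pack into one serialized value.)

```python
# Hypothetical "dead simple" corpus schema (illustrative; the real
# Nepenthes schema is not shown in this thread).
import sqlite3

db = sqlite3.connect("corpus.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS corpus (
    w1   TEXT NOT NULL,  -- first word of the prefix
    w2   TEXT NOT NULL,  -- second word of the prefix
    next TEXT NOT NULL   -- a word observed to follow that prefix
);
CREATE INDEX IF NOT EXISTS corpus_prefix ON corpus (w1, w2);
""")

# One read per generated word: each lookup hits disk unless the OS
# page cache already holds the relevant pages -- hence the IO
# bottleneck described above.
row = db.execute(
    "SELECT next FROM corpus WHERE w1 = ? AND w2 = ? "
    "ORDER BY random() LIMIT 1",
    ("the", "quick"),
).fetchone()
db.close()
```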
=> More information about this toot | More toots from aaron@zadzmo.org
@aaron thanks for all the details. I keep asking myself if we shouldn't document failures more...
=> More information about this toot | More toots from mdione@en.osm.town