Ancestors

Toot

Written by Aaron on 2025-01-14 at 22:58

Hey everyone! Hate AI web crawlers? Have some spare CPU cycles you want to use to punish them?

Meet Nepenthes!

https://zadzmo.org/code/nepenthes

This little guy runs nicely on low power hardware, and generates an infinite maze of what appear to be static files with no exit links. Web crawlers will merrily hop right in and just .... get stuck in there! Optional randomized delay to waste their time and conserve your CPU, optional markovbabble to poison large language models.

=> More informations about this toot | More toots from aaron@zadzmo.org
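A minimal sketch of how an endless maze can still look like static files. It is not taken from Nepenthes itself; the approach (seeding a random generator from the requested path) and the names `maze_links` and `/maze/` are assumptions made for illustration. Because the seed depends only on the path, the same URL always yields the same links, and every link points deeper into the maze rather than out of it.

```python
import hashlib
import random


def maze_links(path, count=5):
    """Return `count` child links for a maze page (hypothetical helper).

    The RNG is seeded from the requested path, so repeated requests for
    the same URL produce identical links, like a static file would.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    base = path.rstrip("/")
    # Every link stays underneath the current path: no exit links.
    return [f"{base}/{rng.getrandbits(32):08x}" for _ in range(count)]


if __name__ == "__main__":
    for link in maze_links("/maze/start"):
        print(link)
```

A crawler that follows any of these links lands on another page built the same way, so the maze never ends even though nothing is stored on disk.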

Descendants

Written by Aaron on 2025-01-14 at 23:11

The software gets its name from a genus of carnivorous pitcher plants with a climbing vine-like growth habit and often vividly colored traps to consume insects. They were popular in Victorian times as decoration in greenhouses.

He's one of mine, a hybrid cultivar. (This is almost #bloomscrolling )

=> View attached media | View attached media

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Aaron on 2025-01-14 at 23:20

If you decide to run this software, please let me know the instance URL. Partially for my own curiosity, but possible future plans might be having different instances coordinate with each other.

All feedback in general is welcome! Feel free to reach out for assistance as well. @ me publicly, DM, or email me; my primary address is also my Fediverse ID.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Aaron on 2025-01-15 at 01:50

Whelp, this blew past my most viral post so far, one which took three days, in about three hours.

It's almost like people hate this AI shit.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Aaron on 2025-01-15 at 17:26

It mentions AI, and number (boost count) go up every time I look at it.

Rich investors give me a shitload of money? I promise, Buckminster Fuller style, it will be lost with none returned to you.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by yetzt on 2025-01-15 at 08:49

@aaron so not from the star trek planet where riker and troi settled with their kids or the nanotech oil used to service droids in star wars.

=> More informations about this toot | More toots from yetzt@yetzt.me

Written by Aaron on 2025-01-15 at 08:55

@yetzt Not lore I'm familiar with, lol. I'm just (also) a plant nerd and it seemed really fitting.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Earthshine on 2025-01-15 at 17:57

@aaron @yetzt Big Prax Energy

=> View attached media

=> More informations about this toot | More toots from earthshine@hackers.town

Written by saxnot on 2025-01-16 at 00:47

@aaron ah i know that one

the cat is kicking it for play

poor plant

=> More informations about this toot | More toots from saxnot@chaos.social

Written by Irenes (many) on 2025-01-16 at 10:49

@aaron what a gorgeous plant :)

=> More informations about this toot | More toots from ireneista@irenes.space

Written by My camera shoots fascists on 2025-01-17 at 04:55

@aaron

Dunno about this species, but many pitcher plants have downward facing hair-like spines that make it impossible for their insect prey to back out. The bug just keeps going deeper and deeper until it lands in the little pool of the plant's digestive juices at the bottom. Truly devilish.

=> More informations about this toot | More toots from Mikal@sfba.social

Written by Aaron on 2025-01-17 at 05:05

@Mikal I don't recall hairs on the inside of any of my nepenthes plants, but, I've also not looked inside the pitchers in detail. My favorite detail is how the upper and lower pitchers vary in color.

Carnivores are really cool in general! I love sarracenia and darlingtonia quite a bit too but never acquired any.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by My camera shoots fascists on 2025-01-17 at 05:56

@aaron

I've seen lots of Darlingtonia in the wild in Northern California and southern Oregon.

=> More informations about this toot | More toots from Mikal@sfba.social

Written by Aaron on 2025-01-17 at 14:49

@Mikal That's awesome. I'd love to get out that way. I spent a weekend in Seattle but never left the city to see any of the real ecology.

Exploring the west coast has been on my list for a long time now, but it's such a long way to go.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by sb :q! on 2025-01-14 at 23:55

@aaron

This sounds like a lot of fun! I have a 56 core blade sitting around here somewhere...

=> More informations about this toot | More toots from sb@metroholografix.ca

Written by Aaron on 2025-01-14 at 23:59

@sb Oh let's. Fucking. GO! Sadly it can't saturate more than one CPU yet.

Yet.

....

I just posted a list of other projects I want to make headway on this year, and bam, now I see I missed one. Was halfway through something that'd easily max out that blade!

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by sb :q! on 2025-01-15 at 00:17

@aaron

I've just been working with multithreaded http requests in #python. Alas my #lua is limited or I'd offer to help make it so.

It would really be fun to set that thing up with a handful of VMs, each running some of these projects and collect some data.

=> More informations about this toot | More toots from sb@metroholografix.ca

Written by Aaron on 2025-01-15 at 00:37

@sb Nepenthes already aggregates statistics of IP and User-agent info, and it is indeed interesting to pick through.

Google is the only crawler smart enough to escape - but it keeps coming back eventually.

Facebook is the only one that seems to use IPv6.

There's a lot of shadow crawlers, with fake Chrome user agents to remain hidden, mostly in China.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by okanogen VerminEnemyFromWithin on 2025-01-15 at 23:41

@aaron @sb

50 some virtual machines fed by a proxy server?

=> More informations about this toot | More toots from Okanogen@mastodon.social

Written by Aaron on 2025-01-15 at 23:42

@Okanogen That would work for now. A lot of administrator overhead though.

@sb

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Marcos Dione on 2025-01-15 at 16:32

@sb @aaron do you? can I send you an SSD with OSM data for rendering maps? :-P

=> More informations about this toot | More toots from mdione@en.osm.town

Written by Oceanotter :verified_paw: on 2025-01-15 at 00:25

@aaron @bersl2 this is like barrier mazes in ghost in the shell

=> More informations about this toot | More toots from oceanotter@meow.social

Written by Aaron on 2025-01-15 at 00:38

@oceanotter @bersl2 LOL never thought of it that way, but yes!

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by algernon ludd on 2025-01-15 at 00:58

@aaron this is amazing! I used to serve the bee movie to ai bots, now I'll use this tool to trap them in an endless pit of markov garbage trained on the bee movie script. Thank you!

=> More informations about this toot | More toots from algernon@come-from.mad-scientist.club

Written by tired blip on 2025-01-15 at 07:14

@algernon @aaron This is a whole different level of those "the bee movie, but every time X, it gets slower" memes from years back :)

=> More informations about this toot | More toots from klardotsh@merveilles.town

Written by Aaron on 2025-01-15 at 07:27

@klardotsh @algernon @bees bzzzzzzz

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by millennial falcon on 2025-01-15 at 01:42

@aaron magNIFICO

=> More informations about this toot | More toots from falcennial@mastodon.social

Written by mkj on 2025-01-15 at 08:44

@aaron What, people on Fedi hate AI? No way. Everyone here absolutely loooooves playing games with generative AI.

=> More informations about this toot | More toots from mkj@social.mkj.earth

Written by Richard Webb on 2025-01-15 at 08:59

@aaron A site I support and contribute to is in a near permanent state of near DoS because of these things. It keeps going by shutting down parts of the site when stressed.

=> More informations about this toot | More toots from Fasgadh@mastodon.scot

Written by Aaron on 2025-01-15 at 14:32

@Fasgadh That is one use case I hope to support. It could easily be plugged into fail2ban or blocklistd but that hasn't happened yet.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by fedops 💙💛 on 2025-01-15 at 11:43

@aaron yes and we thank you for building this. Awesome stuff! 🙏

=> More informations about this toot | More toots from fedops@fosstodon.org

Written by Teknikal_Domain on 2025-01-15 at 03:48

@aaron should call it the ashtray...

=> More informations about this toot | More toots from tek_dmn@mastodon.tekdmn.me

Written by Aaron on 2025-01-15 at 03:51

@tek_dmn I am proud of my work and named it after beautiful living things that help me clean up my kitchen when I've forgotten to take out the trash for too long and get a drain fly infestation.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by hanno on 2025-01-15 at 08:52

@aaron

I think that is a reference to the "Ashtray Maze" in the video game Control. It is a pretty climactic scene in the game, so it would be a positive reference.

@tek_dmn

=> More informations about this toot | More toots from hanno@fosstodon.org

Written by Teknikal_Domain on 2025-01-15 at 10:59

@hanno @aaron that's exactly what it is!

Is Control actually that unknown?

=> More informations about this toot | More toots from tek_dmn@mastodon.tekdmn.me

Written by Aaron on 2025-01-15 at 14:44

@tek_dmn It's definitely unknown to me. Excluding the once every few years OpenTTD kick I haven't really been a gamer in decades; I think the last thing I really completed was the first Max Payne in 2001. @hanno

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Jay Thoden van Velzen ☁️​🛡️​:lolsob: on 2025-01-15 at 04:22

@aaron this is brilliant. thank you!

=> More informations about this toot | More toots from jaythvv@infosec.exchange

Written by Maoulkavien on 2025-01-15 at 07:33

@aaron a simple robots.txt might help differentiate between search engine crawlers and LLM crawlers, the latter often not even bothering to read said file.

So it might be possible to let robots know there is nothing worth reading here, and let robots that don't care get lost indefinitely :)

=> More informations about this toot | More toots from Maoulkavien

Written by Aaron on 2025-01-15 at 07:45

@Maoulkavien Google and Microsoft both run search engines - most alternative search engines are ultimately just front ends for Bing - and both are investing heavily in AI if not outright training their own models. There is absolutely nothing preventing Google from putting its search corpus into the LLM; in fact it's significantly more efficient than crawling the web twice.

Which is why, at the top of the project's web page, I place a clear warning that this WILL tank your search results.

Or sure, you could use robots.txt to give a warning to one of the biggest AI players where you placed your defensive minefield. Up to you.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Maoulkavien on 2025-01-15 at 07:54

@aaron Yeah that makes sense. Just sayin' there could be a slightly less aggressive approach that would not tank search results and punish only those not following standard implementations for how crawlers should behave.

This could be deployed alongside a "real" running website which would still tank/poison many LLMs in the long run.

Thanks for the tool though, I'll try and find some time to deploy it somewhere of mine 👍

=> More informations about this toot | More toots from Maoulkavien

Written by su_liam on 2025-01-15 at 16:45

@aaron @Maoulkavien Well, put robots.txt everywhere you don’t want them crawling. The traps remind them that maybe other people exist and have rights.

=> More informations about this toot | More toots from su_liam@mas.to

Written by Midgard on 2025-01-16 at 10:58

@aaron But this will also trap genuine search engines not using the web as LLM fodder... You could write a robots.txt that lets Googlebot through while allowing others.

https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/

=> More informations about this toot | More toots from midgard@framapiaf.org

Written by Aaron on 2025-01-16 at 16:07

@midgard I find it difficult to believe that Google wouldn't be using their existing index of the web as training material for their LLM projects.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Midgard on 2025-01-16 at 18:12

@aaron I meant: let Googlebot through, into the endless maze. Tell other search engines not to enter. Google respects robots.txt, but many AI scrapers disregard it, or so I read.

=> More informations about this toot | More toots from midgard@framapiaf.org
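A hedged sketch of the robots.txt arrangement described above: Googlebot is let into the maze while every other crawler is told to stay out. The `/maze/` path and the bot name `SomeOtherBot` are placeholders, not anything from the project. The rules are checked with Python's standard `urllib.robotparser`, which only shows how a well-behaved crawler would read them; scrapers that ignore robots.txt will walk in regardless.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: the tarpit is mounted under /maze/ (an assumption).
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /maze/

User-agent: *
Disallow: /maze/
"""


def may_enter(user_agent, url="/maze/index.html"):
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, may_enter(agent))
```

Pages outside `/maze/` are unaffected by these rules, so the rest of the site stays crawlable for everyone.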

Written by Aaron on 2025-01-16 at 19:01

@midgard This makes sense now. Response has been so overwhelming it has been difficult to keep track of context in individual threads.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by su_liam on 2025-01-15 at 16:42

@Maoulkavien @aaron The occasional trap to incentivize good behavior. Don’t ignore the owner’s rights or be punished. Give robots.txt some sharp teeth…

=> More informations about this toot | More toots from su_liam@mas.to

Written by corbẏn on 2025-01-15 at 07:44

@aaron Doesn't this worsen the AI crawlers' energy and carbon footprint instead of dropping their connection?

=> More informations about this toot | More toots from a_corbin@mas.to

Written by Aaron on 2025-01-15 at 07:59

@a_corbin Few to zero reasonable humans will click more than a dozen links inside the tarpit - so the gathered hit statistics can be used to aggregate a block list for dropping connections. That's a valid use case I intend to support.

Another major feature is the connection delay. I've kept crawlers waiting upwards of an entire minute for a single page to load - an entire minute during which they could have slurped down dozens of real pages elsewhere on the internet. This really hurts them.

Lastly, and I admit this is ugly and cold blooded, I see this as a war. War by definition is a waste of resources on both sides: I'm burning CPU time I've paid for to send them literal shit, in hopes the poisoning of their models costs them exponentially more than it costs me, hoping to push them into bankruptcy faster.

Because this is a bubble. It will eventually pop. It's simply too expensive for the debatable benefits viewed from any angle. The best thing for the planet is to pop the bubble ASAP and that's what I'm trying to speed up, fully aware it may hurt the planet somewhat more in the short term.

You are welcome to disagree with my calculus.

=> More informations about this toot | More toots from aaron@zadzmo.org
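The block list idea above can be sketched roughly as follows. This is an illustration only: the `(ip, path)` input shape, the function name `build_blocklist`, and the threshold of twelve (taken from "more than a dozen links") are assumptions, not the format Nepenthes actually reports.

```python
from collections import Counter


def build_blocklist(hits, threshold=12):
    """Return the IPs that requested more than `threshold` tarpit pages.

    `hits` is an iterable of (ip, path) tuples for requests that landed
    inside the tarpit (hypothetical input format).
    """
    counts = Counter(ip for ip, _path in hits)
    return sorted(ip for ip, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Documentation-range addresses, used purely as sample data.
    sample = [("192.0.2.1", f"/maze/{i}") for i in range(50)]
    sample += [("198.51.100.7", "/maze/a")] * 3
    print(build_blocklist(sample))
```

The resulting list could then be handed to whatever firewall or ban tool is in use; that wiring is left out here because the thread notes it does not exist yet.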

Written by cuan_knaggs on 2025-01-15 at 11:14

@aaron @a_corbin another option i've been contemplating is to give them 301 redirects to their own services

=> More informations about this toot | More toots from mensrea@freeradical.zone

Written by Aaron on 2025-01-15 at 14:29

@mensrea Oooof. I like that!

I've also considered various gzip bombs and an infinite chain of 302 redirects. Might still implement those one day.

@a_corbin

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by mkj on 2025-01-15 at 08:43

@a_corbin AI is still gonna AI; that's what AI does. But poisoning the dataset makes the resulting service less useful for people, even if only marginally, who are therefore less likely to pay for it. Even VCs presumably look at number of paying customers and active users, and Line Must Go Up.

So (without having looked closely) it seems like the Markov chain generator causes this to come at a short-term cost to discourage future use and reduce providers' financial incentives.

@aaron

=> More informations about this toot | More toots from mkj@social.mkj.earth

Written by Workshopshed on 2025-01-15 at 08:12

@aaron am wondering what it would take to swap the text with Rick Astley lyrics.

=> More informations about this toot | More toots from Workshopshed@mastodon.scot

Written by Aaron on 2025-01-15 at 08:30

@Workshopshed Trivial. It starts with no corpus by design; you provide one and POST it into a specific training input with curl.

=> More informations about this toot | More toots from aaron@zadzmo.org

Written by Marcos Dione on 2025-01-16 at 08:21

@aaron @Workshopshed I wonder if there would be worse but more efficient algorithms to replace the probably very accurate Markov Chains you're using now...

=> More informations about this toot | More toots from mdione@en.osm.town

Written by Aaron on 2025-01-16 at 08:31

@mdione Markov chains are extremely simple - and thus, fast. The way I put this one together also trades increased corpus size for more speed. In Nepenthes it has a depth of two, which is rather incoherent but the fastest you'll get with realistic text. I consider that extra incoherence to be a positive thing in this use case.

It's slowed, however, by the fact that the corpus is stored in SQLite, and not RAM. This causes the bottleneck to be the IO throughput of disk reads, somewhat mitigated by OS buffering if you have spare memory for it.

Holding the corpus entirely in memory is a thing I've done, but it both consumes a huge amount of RAM and requires retraining at every restart. @Workshopshed

=> More informations about this toot | More toots from aaron@zadzmo.org
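For readers unfamiliar with the technique, here is a minimal in-memory Markov babbler. It reads "a depth of two" as a two-word state, which is an interpretation rather than something confirmed in the thread, and the function names are made up for this sketch. It also shows the trade-off mentioned above: the whole table lives in RAM and has to be rebuilt on every start.

```python
import random
from collections import defaultdict


def train(text):
    """Map each pair of consecutive words to the words seen after it."""
    words = text.split()
    table = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        table[(a, b)].append(c)
    return table


def babble(table, length=20, seed=None):
    """Generate up to `length` words; needs a corpus of at least 3 words."""
    rng = random.Random(seed)
    a, b = rng.choice(sorted(table))
    out = [a, b]
    while len(out) < length:
        followers = table.get((a, b))
        if not followers:
            break  # dead end: this pair was only seen at the corpus end
        a, b = b, rng.choice(followers)
        out.append(b)
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran"
    print(babble(train(corpus), length=12))
```

With only two words of context the output wanders quickly, which matches the incoherence described in the post.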

Written by Aaron on 2025-01-16 at 08:52

@mdione I tried several different SQLite schemas with various amounts and ways of normalization, and succeeded in reducing table or index sizes or simplifying query plans - but the current dead simple basic one in use won every time, often by huge margins. I tried LightningMDB - its performance is truly exceptional. But ultimately, it was half as fast, because there's not a way to represent the Markov corpus purely in key-value pairs. I got it to work by serializing a Lua table; that step completely swamped all performance gains and then some.

Feel free to try to find something faster. I'll be impressed if you do :)

=> More informations about this toot | More toots from aaron@zadzmo.org
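As a starting point for anyone taking up that challenge, this is one guess at what a "dead simple" SQLite layout for a two-word Markov corpus could look like. The table name `chain`, its columns, and the index are assumptions for illustration; the actual Nepenthes schema is not shown in the thread.

```python
import random
import sqlite3


def open_corpus(path=":memory:"):
    """Open (or create) the corpus database; schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS chain (w1 TEXT, w2 TEXT, w3 TEXT)")
    db.execute("CREATE INDEX IF NOT EXISTS chain_prefix ON chain (w1, w2)")
    return db


def train(db, text):
    """Store every three-word window of `text` as one row."""
    words = text.split()
    with db:  # commit once for the whole batch
        db.executemany(
            "INSERT INTO chain VALUES (?, ?, ?)",
            zip(words, words[1:], words[2:]),
        )


def next_word(db, w1, w2, rng=random):
    """Pick a random follower of (w1, w2), or None at a dead end."""
    rows = db.execute(
        "SELECT w3 FROM chain WHERE w1 = ? AND w2 = ?", (w1, w2)
    ).fetchall()
    return rng.choice(rows)[0] if rows else None


if __name__ == "__main__":
    db = open_corpus()
    train(db, "the cat sat on the mat and the cat ran")
    print(next_word(db, "the", "cat"))
```

Pointing `open_corpus` at a file instead of `:memory:` keeps the corpus across restarts, at the cost of each lookup going through SQLite rather than a plain in-memory table.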

Written by Marcos Dione on 2025-01-16 at 09:12

@aaron thanks for all the details. I keep asking myself if we shouldn't document failures more...

=> More informations about this toot | More toots from mdione@en.osm.town

Written by Simon Michalke on 2025-01-15 at 08:26

@aaron

fuck yes, I have been waiting for this.

=> More informations about this toot | More toots from simon_m@infosec.exchange

Written by Simon Michalke on 2025-01-15 at 08:31

@aaron

So let's say I deny access to this tool via robots.txt to not deter good crawlers? Would that work?

I am thinking about deploying this on actual production websites.

=> More informations about this toot | More toots from simon_m@infosec.exchange

Written by Aaron on 2025-01-15 at 08:39

@simon_m You could if you want to. But at that point you'll be forcing respect for robots.txt and not directly harming AI as a whole.

I would strongly advise against putting this in a production site right away. Try it on something less critical and see what happens to your CPU load and bandwidth consumption first - keeping in mind it may take a while for crawlers to locate it.

=> More informations about this toot | More toots from aaron@zadzmo.org

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113829186026040387