Ancestors

Toot

Written by Clive Thompson on 2025-01-23 at 23:31

Via the Algorithmic Sabotage Research Group (@asrg) ...

... here's a list of code designed to poison the well for AI web-scrapers

https://tldr.nettime.org/@asrg/113867412641585520

(thanks to @peterfr for pointing this one out!)


=> More informations about this toot | More toots from clive@saturation.social

Descendants

Written by peterfr on 2025-01-23 at 23:41

@clive @asrg

That heading – Sabot in the Age of AI – is 👌🏽

=> More informations about this toot | More toots from peterfr@mastodon.art

Written by Clive Thompson on 2025-01-23 at 23:42

@peterfr @asrg

heh heh yes

=> More informations about this toot | More toots from clive@saturation.social

Written by Marisa on 2025-01-23 at 23:49

@clive @asrg @peterfr not sure what this means but sounds helpful!

=> More informations about this toot | More toots from marisa@mastodon.scot

Written by Clive Thompson on 2025-01-24 at 00:45

@marisa @asrg @peterfr

basically, one key way that companies like OpenAI train their language AI is by using "web crawler" software that roams around online, copying the text off web sites ("web scraping", as it's called) so they can have a consistently refreshed pile o' text for training their AI

you need lots of freshly written human words to train an AI -- and people are constantly writing stuff on their sites!

So what these tools do is ...

=> More informations about this toot | More toots from clive@saturation.social

Written by Clive Thompson on 2025-01-24 at 00:47

@marisa @asrg @peterfr

... attempt to detect if an OpenAI web-crawler is trying to copy the text off a web site ...

... and if so, they generate fake pages with crap text -- which the OpenAI web crawler assumes are real, and thus dutifully copies

So OpenAI winds up feeding junk/fake/mangled text as training material into its next version of ChatGPT

The attitude is: "So, you wanna copy our site, so you can train your AI -- without us getting a penny from you? Okay, here's some junk data"

=> More informations about this toot | More toots from clive@saturation.social
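
A rough sketch of the idea described above, not the code of any tool on the ASRG list: check the request's User-Agent for a known crawler token and, on a match, hand back generated filler instead of the real page. The token list, the word list, and the function names (`is_ai_crawler`, `junk_page`, `respond`) are all assumptions made up for illustration. Real tools use far more elaborate detection and text generation.

```python
import hashlib
import random

# Illustrative, incomplete list of crawler User-Agent substrings (an assumption,
# not a maintained blocklist).
CRAWLER_TOKENS = ("GPTBot", "CCBot")

# Tiny hypothetical vocabulary for the filler text.
WORDS = ["sabot", "loom", "gear", "thread", "spindle", "wooden",
         "shoe", "jam", "the", "a", "into", "quietly"]


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the known crawler tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in CRAWLER_TOKENS)


def junk_page(path: str, n_words: int = 40, n_links: int = 3) -> str:
    """Build a fake HTML page of random words plus links to more fake pages."""
    # Seed from the path so the same URL always yields the same junk.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    base = path.rstrip("/")
    links = "".join(
        f'<a href="{base}/{rng.randrange(10**6)}">more</a>' for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>{links}</body></html>"


def respond(user_agent: str, path: str, real_page: str) -> str:
    """Serve junk to suspected crawlers and the real page to everyone else."""
    return junk_page(path) if is_ai_crawler(user_agent) else real_page
```

Seeding the generator from the path means a crawler that revisits a URL sees a stable page, and the links lead it on to more generated pages. A crawler that sends a misleading User-Agent would slip past a check this simple.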

Written by Marisa on 2025-01-24 at 00:49

@clive Oh... I love it(!)

=> More informations about this toot | More toots from marisa@mastodon.scot

Written by Clive Thompson on 2025-01-24 at 00:51

@marisa

🤘 🤖

=> More informations about this toot | More toots from clive@saturation.social

Written by Marisa on 2025-01-24 at 01:05

@clive thank you for taking the time to explain. ✨ am learning so much here

=> More informations about this toot | More toots from marisa@mastodon.scot

Written by Clive Thompson on 2025-01-24 at 02:33

@marisa

that's what the internet is for!

=> More informations about this toot | More toots from clive@saturation.social

Written by Jonathan Lamothe on 2025-01-23 at 23:49

@clive I love this!

=> More informations about this toot | More toots from me@social.jlamothe.net

Written by Clive Thompson on 2025-01-24 at 00:49

@me

🤘

=> More informations about this toot | More toots from clive@saturation.social

Written by Devin on 2025-01-23 at 23:53

@clive @asrg @peterfr Always pleased to see when this idea comes around again -- we were doing this ~20 years ago when spammers were crawling the web looking for email addresses, and "make sure the 'Danger: Keep Out' sign has some danger behind it" keeps being useful in new ways.

=> More informations about this toot | More toots from carraway@sfba.social

Written by Clive Thompson on 2025-01-24 at 00:49

@carraway @asrg @peterfr

very cool!

=> More informations about this toot | More toots from clive@saturation.social

Written by 𝕎𝕦𝕝𝕗𝕪 on 2025-01-24 at 01:18

@clive @asrg @peterfr

“The Net interprets censorship as damage and routes around it.”

=> More informations about this toot | More toots from n_dimension@infosec.exchange

Written by elle mundy on 2025-01-24 at 04:48

@clive @asrg @peterfr iocaine has fork bomb vibes. i love it

=> More informations about this toot | More toots from exchgr@mastodon.world

Written by Chris on 2025-01-24 at 08:33

@clive @asrg @peterfr adding a watermark over text, making and flattening a pdf is a solid way to really mess with anything involving OCR for machine learning

=> More informations about this toot | More toots from ASprinkleofSage@mastodon.social

Written by Clive Thompson on 2025-01-24 at 16:40

@ASprinkleofSage @asrg @peterfr

yep yep

=> More informations about this toot | More toots from clive@saturation.social

Written by Raph V. on 2025-01-28 at 17:55

@clive @asrg @peterfr is there a solution that would work well with a Static Site Generator?

=> More informations about this toot | More toots from raph_v@mstdn.social

Written by ASRG on 2025-01-28 at 18:01

@raph_v @clive @peterfr .. Here’s an interesting approach from @gedankenstuecke that you might find helpful!

⟶ https://scholar.social/@gedankenstuecke/113899799818100252

=> More informations about this toot | More toots from asrg@tldr.nettime.org

Written by Clive Thompson on 2025-01-28 at 21:30

@raph_v @asrg @peterfr

good question!

I don't really know

=> More informations about this toot | More toots from clive@saturation.social

Written by peterfr on 2025-01-28 at 22:15

@clive @raph_v maybe you missed @asrg ‘s reply?

https://tldr.nettime.org/@asrg/113907290321491880

=> More informations about this toot | More toots from peterfr@mastodon.art

Written by peterfr on 2025-01-28 at 22:18

@clive @raph_v @asrg

@gedankenstuecke’s how-to …

https://scholar.social/@gedankenstuecke/113899799818100252

=> More informations about this toot | More toots from peterfr@mastodon.art

Written by Bastian Greshake Tzovaras on 2025-01-28 at 23:53

@peterfr @clive @raph_v @asrg thanks for sharing it! And happy to help if anyone tries and runs into issues!

=> More informations about this toot | More toots from gedankenstuecke@scholar.social

Written by Clive Thompson on 2025-01-30 at 12:51

@raph_v @peterfr @asrg

Oh I did, thank you for pointing it out!

=> More informations about this toot | More toots from clive@saturation.social

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113880280248033232