Via the Algorithmic Sabotage Research Group (@asrg) ...
... here's a list of code projects designed to poison the well for AI web scrapers
https://tldr.nettime.org/@asrg/113867412641585520
(thanks to @peterfr for pointing this one out!)
=> More information about this toot | More toots from clive@saturation.social
@clive @asrg
That heading – Sabot in the Age of AI – is 👌🏽
=> More information about this toot | More toots from peterfr@mastodon.art
@peterfr @asrg
heh heh yes
=> More information about this toot | More toots from clive@saturation.social
@clive @asrg @peterfr not sure what this means but sounds helpful!
=> More information about this toot | More toots from marisa@mastodon.scot
@marisa @asrg @peterfr
Basically, one key way that companies like OpenAI train their language AIs is by using "web crawler" software that roams around online, copying the text off web sites ("web scraping," as it's called), so they have a continually refreshed pile o' text for training their AI
You need lots of freshly written human words to train an AI -- and people are constantly writing stuff on their sites!
So what these tools do is ...
=> More information about this toot | More toots from clive@saturation.social
@marisa @asrg @peterfr
... attempt to detect whether an OpenAI web crawler is trying to copy the text off a web site ...
... and if so, they generate fake pages full of crap text -- which the OpenAI web crawler assumes are real, and thus dutifully copies
So OpenAI winds up feeding junk/fake/mangled text as training material into its next version of ChatGPT
The attitude is: "So, you wanna copy our site, so you can train your AI -- without us getting a penny from you? Okay, here's some junk data"
=> More information about this toot | More toots from clive@saturation.social
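A minimal Python sketch of the trick described above, assuming the crawler announces itself in its User-Agent header (OpenAI's crawler identifies as GPTBot). The marker list, the function names, and the word-level Markov chain used to make the junk text are illustrative assumptions, not the code of any tool on the ASRG list. A crawler can simply lie about its User-Agent, so this check is best-effort only.

```python
import random
from collections import defaultdict

# Hypothetical, incomplete list of User-Agent substrings used by some AI crawlers.
AI_CRAWLER_MARKERS = ("GPTBot", "CCBot", "ClaudeBot")


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known AI-crawler marker."""
    return any(marker in user_agent for marker in AI_CRAWLER_MARKERS)


def build_chain(corpus: str) -> dict[str, list[str]]:
    """Word-level Markov chain: map each word to the words that followed it."""
    words = corpus.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def babble(chain: dict[str, list[str]], length: int, rng: random.Random) -> str:
    """Walk the chain to produce plausible-looking but meaningless text.

    Assumes the chain is non-empty (the corpus had at least two words).
    """
    starts = list(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Dead end (word only ever appeared last): jump to a random start word.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


def page_for(user_agent: str, real_html: str, corpus: str, rng: random.Random) -> str:
    """Serve the real page to ordinary visitors and junk to suspected AI crawlers."""
    if not is_ai_crawler(user_agent):
        return real_html
    # Seed the chain with the site's own plain text so the junk resembles it.
    chain = build_chain(corpus)
    return "<html><body><p>" + babble(chain, 50, rng) + "</p></body></html>"
```

Seeding the chain with the site's own words keeps the junk statistically close to the real pages, which probably makes it harder for a scraper to filter out than obviously random text would be.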
@clive Oh... I love it(!)
=> More information about this toot | More toots from marisa@mastodon.scot
@marisa
🤘 🤖
=> More information about this toot | More toots from clive@saturation.social
@clive thank you for taking the time to explain. ✨ am learning so much here
=> More information about this toot | More toots from marisa@mastodon.scot
@marisa
that's what the internet is for!
=> More information about this toot | More toots from clive@saturation.social
@clive I love this!
=> More information about this toot | More toots from me@social.jlamothe.net
@me
🤘
=> More information about this toot | More toots from clive@saturation.social
@clive @asrg @peterfr Always pleased to see when this idea comes around again -- we were doing this ~20 years ago when spammers were crawling the web looking for email addresses, and "make sure the 'Danger: Keep Out' sign has some danger behind it" keeps being useful in new ways.
=> More information about this toot | More toots from carraway@sfba.social
@carraway @asrg @peterfr
very cool!
=> More information about this toot | More toots from clive@saturation.social
@clive @asrg @peterfr
“The Net interprets censorship as damage and routes around it.”
=> More information about this toot | More toots from n_dimension@infosec.exchange
@clive @asrg @peterfr iocaine has fork bomb vibes. i love it
=> More information about this toot | More toots from exchgr@mastodon.world
@clive @asrg @peterfr adding a watermark over the text, then making and flattening a PDF, is a solid way to really mess with anything that uses OCR for machine learning
=> More information about this toot | More toots from ASprinkleofSage@mastodon.social
@ASprinkleofSage @asrg @peterfr
yep yep
=> More information about this toot | More toots from clive@saturation.social
@clive @asrg @peterfr is there a solution that would work well with a Static Site Generator?
=> More information about this toot | More toots from raph_v@mstdn.social
@raph_v @clive @peterfr Here’s an interesting approach from @gedankenstuecke that you might find helpful!
⟶ https://scholar.social/@gedankenstuecke/113899799818100252
=> More information about this toot | More toots from asrg@tldr.nettime.org
@raph_v @asrg @peterfr
good question!
I don't really know
=> More information about this toot | More toots from clive@saturation.social
@clive @raph_v maybe you missed @asrg’s reply?
https://tldr.nettime.org/@asrg/113907290321491880
=> More information about this toot | More toots from peterfr@mastodon.art
@clive @raph_v @asrg
@gedankenstuecke’s how-to …
https://scholar.social/@gedankenstuecke/113899799818100252
=> More information about this toot | More toots from peterfr@mastodon.art
@peterfr @clive @raph_v @asrg thanks for sharing it! And happy to help if anyone tries and runs into issues!
=> More information about this toot | More toots from gedankenstuecke@scholar.social
@raph_v @peterfr @asrg
Oh I did, thank you for pointing it out!
=> More information about this toot | More toots from clive@saturation.social