Ancestors

Written by Clive Thompson on 2025-01-23 at 23:31

Via the Algorithmic Sabotage Research Group (@asrg) ...

... here's a list of code designed to poison the well for AI web-scrapers

https://tldr.nettime.org/@asrg/113867412641585520

(thanks to @peterfr for pointing this one out!)

=> More information about this toot | More toots from clive@saturation.social

Toot

Written by Marisa on 2025-01-23 at 23:49

@clive @asrg @peterfr not sure what this means but sounds helpful!

=> More information about this toot | More toots from marisa@mastodon.scot

Descendants

Written by Clive Thompson on 2025-01-24 at 00:45

@marisa @asrg @peterfr

basically, one key way that companies like OpenAI train their language AI is by using "web crawler" software that roams around online, copying the text off web sites ("web scraping", as it's called) so they can have a consistently refreshed pile o' text for training their AI

you need lots of freshly written human words to train an AI -- and people are constantly writing stuff on their sites!

So what these tools do is ...

Written by Clive Thompson on 2025-01-24 at 00:47

@marisa @asrg @peterfr

... attempt to detect if an OpenAI web-crawler is trying to copy the text off a web site ...

... and if so, it generates fake pages with crap text -- which the OpenAI web crawler assumes are real, and thus dutifully copies

So OpenAI winds up feeding junk/fake/mangled text as training material into its next version of ChatGPT

The attitude is: "So, you wanna copy our site, so you can train your AI -- without us getting a penny from you? Okay, here's some junk data"
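
A minimal sketch of the idea described above, assuming the tool spots the crawler by its User-Agent header and answers with scrambled text instead of the real page. The names `is_ai_crawler`, `junk_text` and `respond` are hypothetical and are not taken from any tool on the linked list. `GPTBot` is the user-agent token OpenAI publishes for its crawler; real tools may rely on other signals and may generate their junk very differently.

```python
import random

# Assumption: the crawler announces itself in the User-Agent header.
# "GPTBot" is the token OpenAI documents for its crawler.
AI_CRAWLER_TOKENS = ("GPTBot",)


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known AI-crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def junk_text(source: str, seed: int, n_words: int = 40) -> str:
    """Build nonsense by drawing words at random from the real page."""
    words = source.split()
    if not words:
        return ""
    rng = random.Random(seed)  # seeded so the same request gets the same junk
    return " ".join(rng.choice(words) for _ in range(n_words))


def respond(user_agent: str, real_page: str, seed: int = 0) -> str:
    """Serve the real page to ordinary visitors and junk to AI crawlers."""
    if is_ai_crawler(user_agent):
        return junk_text(real_page, seed)
    return real_page
```

An ordinary browser calling `respond` gets `real_page` back unchanged, while a request whose User-Agent contains `GPTBot` gets 40 randomly drawn words that look like site content but carry no meaning.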

Written by Marisa on 2025-01-24 at 00:49

@clive Oh... I love it(!)

Written by Clive Thompson on 2025-01-24 at 00:51

@marisa

🤘 🤖

Written by Marisa on 2025-01-24 at 01:05

@clive thank you for taking the time to explain. ✨ am learning so much here

Written by Clive Thompson on 2025-01-24 at 02:33

@marisa

that's what the internet is for!

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113880345198795966