Via the Algorithmic Sabotage Research Group (@asrg) ...
... here's a list of code designed to poison the well for AI web-scrapers
https://tldr.nettime.org/@asrg/113867412641585520
(thanks to @peterfr for pointing this one out!)
=> More informations about this toot | More toots from clive@saturation.social
@clive @asrg @peterfr not sure what this means but sounds helpful!
=> More informations about this toot | More toots from marisa@mastodon.scot
@marisa @asrg @peterfr
basically, one key way that companies like OpenAI train their language AI is by using "web crawler" software that roams around online, copying the text off web sites ("web scraping", as it's called) so they can have a consistently refreshed pile o' text for training their AI
you need lots of freshly written human words to train an AI -- and people are constantly writing stuff on their sites!
So what these tools do is ...
=> More informations about this toot | More toots from clive@saturation.social
@marisa @asrg @peterfr
... attempt to detect if an OpenAI web-crawler is trying to copy the text off a web site ...
... and if so, they generate fake pages with crap text -- which the OpenAI web crawler assumes are real, and thus dutifully copies
So OpenAI winds up feeding junk/fake/mangled text as training material into its next version of ChatGPT
The attitude is: "So, you wanna copy our site, so you can train your AI -- without us getting a penny from you? Okay, here's some junk data"
=> More informations about this toot | More toots from clive@saturation.social
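The detect-then-serve-junk idea described above can be sketched roughly as follows. This is only an illustration, not how any tool on the linked list actually works. It assumes the crawler identifies itself in its User-Agent header (GPTBot is the token OpenAI publishes for its crawler), and the function names is_ai_crawler, junk_text and respond are hypothetical. The "junk" here is just the real page's words shuffled; real tools may generate their fake text quite differently.

```python
import random

# Assumption: the crawler announces itself in the User-Agent header.
# "gptbot" matches OpenAI's published crawler token; extend as needed.
AI_CRAWLER_TOKENS = ("gptbot",)


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)


def junk_text(real_text: str, seed: int = 0) -> str:
    """Shuffle the page's words so the result is word salad (illustrative only)."""
    words = real_text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def respond(user_agent: str, real_text: str) -> str:
    """Serve the real page to ordinary visitors and junk to detected crawlers."""
    if is_ai_crawler(user_agent):
        return junk_text(real_text)
    return real_text


if __name__ == "__main__":
    page = "people are constantly writing fresh human words on their sites"
    print(respond("Mozilla/5.0 (X11; Linux) Firefox/128.0", page))
    print(respond("Mozilla/5.0 (compatible; GPTBot/1.0)", page))
```

A crawler that hides or fakes its User-Agent would slip past this check, so a header match is only the simplest possible form of detection.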
@clive Oh... I love it(!)
=> More informations about this toot | More toots from marisa@mastodon.scot
@marisa
🤘 🤖
=> More informations about this toot | More toots from clive@saturation.social
@clive thank you for taking the time to explain. ✨ am learning so much here
=> More informations about this toot | More toots from marisa@mastodon.scot
@marisa
that's what the internet is for!
=> More informations about this toot | More toots from clive@saturation.social