AI companies are gonna force Google into abandoning important crawling standards.
I'm seeing lots of traffic in various places lately about how much worse #Search is these days and a lot about how great LLM chatbots are now at finding information.
The reality of this is that Google is honouring the robots.txt file, as well as the other no-crawl hints that sites provide. LLM trainers are just scraping anything and everything they can, at the expense of site owners.
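For anyone who hasn't looked at what "honouring robots.txt" means in practice, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names (`ExampleLLMBot`, `SomeSearchBot`) and the `may_fetch` helper are all made up for illustration. A polite crawler runs a check like this before every request; a scraper that ignores the file just skips it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would download
    # /robots.txt from the target site first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The bot named in the file is shut out of everything.
    print(may_fetch("ExampleLLMBot", "https://example.org/blog/post"))
    # Any other bot falls under the "*" rules.
    print(may_fetch("SomeSearchBot", "https://example.org/blog/post"))
    print(may_fetch("SomeSearchBot", "https://example.org/private/notes"))
```

Nothing in the file is enforced by the server. It only works if the crawler chooses to run the check, which is the whole problem.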
If Google starts bleeding enough users, they will likewise be forced to abandon crawler instructions.
Site owners will of course then be forced to wall off their content. Get used to signing into sites to even see basic stuff. The entire Internet is about to go deep into #Enshittification and it'll basically be the death knell of the open idea of the WWW.
At least, I don't see a path through that doesn't end up there.
=> More information about this toot | More toots from fennix@infosec.space
@fennix the fact that neither @bsi nor @EUCommission makes honoring #RobotsTXT legally mandatory, under penalty of fines and forced disconnects, is a problem.
#WhatYouAllowIsWhatWillContinue applies here, and I know some folks intend to literally ban entire ASNs for hosting crawlers, because those literally #DDoS sites offline, and criminally incompetent, value-removing middlemen like #ClownFlare do jack shit about it even when tasked to do so.
#sarcasm #vent #AI #LLM #Enshittification
=> More information about this toot | More toots from kkarhan@infosec.space
@kkarhan @bsi @EUCommission
Until recently I was diving into the safety and security of residential proxies. They're effectively the end run around bot blocking, and they are what makes it possible to bypass the whole subnet-blocking approach.
They're not secure but that's for another storytime when life permits me to write it up.
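To make the subnet-blocking point concrete, here is a minimal sketch with Python's standard `ipaddress` module. The blocklist and the addresses are assumptions: they use the reserved documentation ranges, not real crawler networks, and `is_blocked` is a hypothetical helper, not anyone's actual firewall logic.

```python
import ipaddress

# Hypothetical blocklist standing in for "ban the whole hosting ASN".
# These are reserved documentation ranges, not real crawler networks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # Crawler connecting straight from the blocked hosting range.
    print(is_blocked("198.51.100.42"))
    # Same crawler relayed through a residential proxy: the site only
    # sees the proxy's address, which is on no blocklist.
    print(is_blocked("192.0.2.10"))
```

The check only ever sees the source address of the connection, so traffic relayed through someone's home connection looks like any other visitor.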
=> More information about this toot | More toots from fennix@infosec.space
@fennix
"“They were essentially siphoning off 80 [gigabytes] a day, or something crazy like that, from us,” he alleges. (Again, OpenAI disagrees with this.)"
Who are you gonna believe? The guy that OpenAI wants to take training data from, or the company that made a computer that can't do math?
=> More information about this toot | More toots from RnDanger@infosec.exchange
@RnDanger
Haha, right? Journalism died a while back and we just all forgot to have the wake.
=> More information about this toot | More toots from fennix@infosec.space