Ancestors

Toot

Written by Fennix on 2024-12-30 at 15:54

AI companies are gonna force Google into abandoning important crawling standards.

I'm seeing lots of traffic in various places lately about how much worse #Search is these days and a lot about how great LLM chatbots are now at finding information.

The reality of this is that Google is honouring the robots.txt file, as well as the other no-crawl hints that sites provide. LLM trainers are just scraping anything and everything they can, at the expense of site owners.
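For anyone unsure what "honouring robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleLLMBot`, and the helper `may_fetch` are made up for illustration; this is not how Google or any particular crawler actually implements it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish: one named bot is
# shut out entirely, everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("ExampleLLMBot", "https://example.org/post/1"))
    print(may_fetch("SomeSearchBot", "https://example.org/post/1"))
    print(may_fetch("SomeSearchBot", "https://example.org/private/x"))
```

The check is entirely voluntary: a crawler that never calls something like `may_fetch` before requesting a page is not stopped by anything in the file.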

If Google starts bleeding enough users, they will likewise be forced to abandon crawler instructions.

Site owners will of course then be forced to wall off their content. Get used to signing into sites to even see basic stuff. The entire Internet is about to go deep into #Enshittification and it'll basically be the death knell of the open idea of the WWW.

I don't see a path through that doesn't end up there at least.

=> More informations about this toot | More toots from fennix@infosec.space

Descendants

Written by Kevin Karhan :verified: on 2024-12-30 at 17:18

@fennix the fact that neither @bsi nor @EUCommission make honoring #RobotsTXT legally mandatory under penalty of fines and forced disconnects is a problem.

#WhatYouAllowIsWhatWillContinue applies here and I know some folks intend to literally ban entire ASNs for hosting crawlers, because those literally #DDoS sites offline and criminally incompetent, value-removing middlemen like #ClownFlare do jack shit about it even when tasked to do so.

#sarcasm #vent #AI #LLM #Enshittification
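As a rough illustration of what banning entire ASNs can look like on the receiving end, here is a minimal sketch using Python's standard `ipaddress` module. The ASN numbers and prefixes are documentation-only placeholders, and the table `BLOCKED_ASN_PREFIXES` and the helper `is_blocked` are hypothetical; a real setup would pull the announced prefixes for each ASN from routing data and would more likely enforce the block at the firewall.

```python
import ipaddress

# Hypothetical mapping of blocked ASNs to the prefixes they announce.
# All values are reserved documentation ranges, not real crawler hosts.
BLOCKED_ASN_PREFIXES = {
    64496: ["192.0.2.0/24", "2001:db8::/32"],
    64497: ["198.51.100.0/24"],
}

# Flatten the table once into network objects for fast membership checks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(prefix)
    for prefixes in BLOCKED_ASN_PREFIXES.values()
    for prefix in prefixes
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked prefix."""
    addr = ipaddress.ip_address(client_ip)
    # Only compare addresses and networks of the same IP version.
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_NETWORKS
    )


if __name__ == "__main__":
    print(is_blocked("192.0.2.55"))
    print(is_blocked("203.0.113.9"))
    print(is_blocked("2001:db8::1"))
```

A block like this only sees the address a request arrives from, so it does nothing about traffic relayed through addresses outside the listed ranges.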

=> More informations about this toot | More toots from kkarhan@infosec.space

Written by Fennix on 2024-12-30 at 17:24

@kkarhan @bsi @EUCommission

Until recently I was diving into the safety and security of residential proxies. They're effectively the end run around the bot problem, and they're what makes it possible to bypass the whole subnet-blocking approach.

They're not secure but that's for another storytime when life permits me to write it up.

=> More informations about this toot | More toots from fennix@infosec.space

Written by 2xfo on 2024-12-30 at 17:26

@fennix

"“They were essentially siphoning off 80 [gigabytes] a day, or something crazy like that, from us,” he alleges. (Again, OpenAI disagrees with this.)"

Who are you gonna believe? The guy that openAI wants to take training data from, or the company that made a computer that can't do math?

=> More informations about this toot | More toots from RnDanger@infosec.exchange

Written by Fennix on 2024-12-30 at 17:28

@RnDanger

Haha, right? Journalism died a while back and we all just forgot to have the wake.

=> More informations about this toot | More toots from fennix@infosec.space

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113742583211677318