Robots.txt is a declaration that website owners use to tell crawlers and #AI scrapers — the latter of which harvest content to train LLMs — which parts of a site they may access. AI scrapers routinely ignore robots.txt and advertise fake user agents to circumvent a site owner's intent and evade detection. It's cynical that #OpenAI called out #DeepSeek for doing the same thing. If your site has differentiated knowledge and depends on visitor relationships, block AI scrapers with #Cloudflare.
https://mastodon.social/@arstechnica/113908061814101292
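For context, a minimal robots.txt that disallows AI training crawlers might look like the sketch below. GPTBot (OpenAI) and CCBot (Common Crawl) are documented crawler user agents; check each vendor's current documentation before relying on these names — and note, as the post argues, that robots.txt is advisory only and non-compliant scrapers simply ignore it.

```
# Disallow known AI training crawlers (advisory; honored only by
# compliant bots -- non-compliant scrapers ignore robots.txt)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow all other crawlers
User-agent: *
Allow: /
```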
=> More information about this toot | View the thread | More toots from bretcarmichael@mastodon.social
=> View ai tag | View openai tag | View deepseek tag | View cloudflare tag