Robots.txt is a declaration that website owners use to tell crawlers and #AI scrapers — the latter of which harvest content to train LLMs — which parts of a site they may access. AI scrapers routinely ignore robots.txt and advertise fake user agents to circumvent a site owner's intent and evade detection. It's cynical that #OpenAI called out #DeepSeek for doing the same thing. If your site has differentiated knowledge and depends on visitor relationships, block AI scrapers with #Cloudflare.
https://mastodon.social/@arstechnica/113908061814101292
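For context, a minimal robots.txt that disallows AI training crawlers might look like the sketch below. GPTBot (OpenAI) and CCBot (Common Crawl) are documented crawler user agents; check each vendor's current documentation before relying on these names — and note, as the post argues, that robots.txt is advisory only and non-compliant scrapers simply ignore it.

```
# Disallow known AI training crawlers (advisory; honored only by
# compliant bots -- non-compliant scrapers ignore robots.txt)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow all other crawlers
User-agent: *
Allow: /
```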
=> More information about this toot | View the thread | More toots from bretcarmichael@mastodon.social
=> View ai tag | View openai tag | View deepseek tag | View cloudflare tag