Toot

Written by Bret Carmichael on 2025-01-29 at 12:05

A robots.txt file is a declaration that website owners use to state their terms to crawlers and #AI scrapers, the latter of which collect content to train LLMs. AI scrapers routinely ignore robots.txt and advertise fake user agents to circumvent a website owner’s intent and avoid detection. It’s cynical that #OpenAI called out #DeepSeek for doing the same. If your site has differentiated knowledge and depends on visitor relationships, block AI scrapers with #Cloudflare.
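
To illustrate why a faked user agent defeats robots.txt, here is a minimal sketch using Python's standard urllib.robotparser. The rules, the example.com URL, and the is_allowed helper are assumptions made up for illustration, not any real site's configuration. GPTBot is the user agent OpenAI documents for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that identifies itself honestly is told to stay out.
honest = is_allowed("GPTBot", "https://example.com/article")

# The same crawler claiming a browser-like user agent matches only the "*" rule.
disguised = is_allowed("Mozilla/5.0", "https://example.com/article")
```

The check depends entirely on the name the client reports about itself, and nothing forces a scraper to run the check at all. That is why enforcement has to happen on the server or at a proxy in front of it.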

https://mastodon.social/@arstechnica/113908061814101292


