Ancestors

Written by Eric Zarowny on 2024-08-01 at 13:24

@jasonkoebler I'm listening to your segment on the podcast about crawlers, and it reminded me of an incident we ran into with the FacebookExternalHit crawler (https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) back in late March. For reasons I cannot possibly fathom, the crawler got into a loop on our site, scanning the same few URLs over and over again at a rate of up to 80 requests per second for at least a few weeks. I don't think Meta had a separate crawler for training AI models at the time, but I'd have to check.


Toot

Written by Eric Zarowny on 2024-08-01 at 13:28

@jasonkoebler we were fortunate to have enough technical skill to add rate limiting for specific URLs on a per-user-agent basis. If you look around the Meta developer forums, you can find evidence of this happening to other people. I cannot imagine how less technically savvy companies deal with this, as there isn't really a way to contact someone at Meta about it. If you block this bot, Facebook won't be able to load previews of links to your site.
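The thread doesn't say how the limiting was actually implemented (it could just as easily have been a reverse-proxy rule). As a rough illustration only, here is a minimal Python sketch of a sliding-window limiter keyed on a user-agent substring plus an exact URL path. The class name, rule format, path, and limits are all hypothetical; the only assumption about the crawler is that its user-agent string contains the token "facebookexternalhit".

```python
import time
from collections import defaultdict, deque


class PerAgentPathLimiter:
    """Sliding-window rate limiter keyed on (user-agent substring, URL path).

    Hypothetical sketch: the rule format and limits are illustrative,
    not the configuration described in the thread.
    """

    def __init__(self, rules, clock=time.monotonic):
        # rules: {(ua_substring, path): (max_requests, window_seconds)}
        self.rules = rules
        self.clock = clock
        # (ua_substring, path) -> timestamps of requests allowed in the window
        self.hits = defaultdict(deque)

    def allow(self, user_agent, path):
        ua = user_agent.lower()
        for (ua_sub, rule_path), (limit, window) in self.rules.items():
            # Only the first matching rule is applied.
            if ua_sub.lower() in ua and path == rule_path:
                now = self.clock()
                q = self.hits[(ua_sub, rule_path)]
                # Drop timestamps that have fallen out of the window.
                while q and now - q[0] >= window:
                    q.popleft()
                if len(q) >= limit:
                    return False  # caller would respond with 429 Too Many Requests
                q.append(now)
                return True
        return True  # no rule matched: request is not limited


if __name__ == "__main__":
    # Hypothetical rule: at most 2 requests per second to one page from the crawler.
    limiter = PerAgentPathLimiter({("facebookexternalhit", "/some/page"): (2, 1.0)})
    crawler_ua = "facebookexternalhit/1.1"
    for _ in range(4):
        print(limiter.allow(crawler_ua, "/some/page"))
```

Because only matching crawler requests on the listed URLs are throttled, link previews still load at a sane rate instead of the bot being blocked outright. A token bucket would work just as well; the timestamp deque is used here only because it is easy to reason about.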


Descendants

Written by Jason Koebler on 2024-08-01 at 16:32

@ezarowny this is super interesting ... i am kind of underwater right now, but i'm going to file this away for the moment and might have some additional questions if we end up writing about this topic again. thank you for sending this my way

