A huge thanks to @yossarian for zizmor - it found a bunch of issues across all of our GitHub repos.
The only slight downside - I had to open over 140 pull requests to sort it all out!
https://github.com/woodruffw/zizmor
=> More information about this toot | View the thread
No other bot seems to have this issue.
I'm sure any well-behaved bot would limit the traffic it sends to the same URL path with different query strings once it stops getting any new meaningful content. I know some sites have content that's only reachable via varying query strings, but you'd hope they could add some protection to stop a bot from excessively scraping the site.
The amount of resources this is wasting is ridiculous.
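To make that concrete, here's a minimal sketch of the kind of protection a site could add in nginx, keying a request rate limit on $uri so that requests that differ only by query string share one bucket. The zone name, rate, matched user agents, ports and upstream are all illustrative assumptions, not anything from these posts.
```
# Sketch only: throttle crawlers that hammer one path with endlessly
# varying query strings. Names, rates and matched agents are assumptions.
upstream backend { server 127.0.0.1:8000; }   # placeholder app server

map $http_user_agent $crawler_limit_key {
    default      "";     # empty key = request is not rate limited
    ~*GPTBot     $uri;   # key on the path only, so ?page=1 and ?page=2
    ~*ClaudeBot  $uri;   # land in the same bucket
}

# Allow matching crawlers roughly 1 request/second per path.
limit_req_zone $crawler_limit_key zone=ai_crawlers:10m rate=1r/s;

server {
    listen 80;
    server_name example.com;            # placeholder

    location / {
        limit_req zone=ai_crawlers burst=5 nodelay;
        limit_req_status 429;           # tell well-behaved bots to back off
        proxy_pass http://backend;
    }
}
```
A real deployment would probably also key on the client address, but for a crawler looping over one path with endless query strings, the per-path key is the part that matters here.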
=> More information about this toot | View the thread
Time for another bot scraping rant.
GPTBot currently accounts for ~16% of HTTP request traffic across a bunch of our hosted sites. It's stuck in a loop across a variety of pages with filtered list results, constantly adding extra junk to the query string.
https://community.openai.com/t/bots-generating-errors-on-my-website/995234 describes it pretty well; in our case there are no errors, it's just relentless and refuses to give up.
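If a crawler is that persistent, the bluntest option is to refuse it on the affected endpoints rather than rate limit it. A rough sketch, again assuming nginx sits in front of the app; the /products path, ports and upstream name are hypothetical stand-ins, not details from the post.
```
# Sketch only: deny GPTBot on the filtered-list endpoints it loops over.
upstream backend { server 127.0.0.1:8000; }   # placeholder app server

map $http_user_agent $is_gptbot {
    default   0;
    ~*GPTBot  1;
}

server {
    listen 80;
    server_name example.com;          # placeholder

    location /products {              # hypothetical filtered-list pages
        if ($is_gptbot) { return 403; }
        proxy_pass http://backend;
    }
}
```
OpenAI also documents a GPTBot user-agent token for robots.txt, which is worth trying first, though it only helps as long as the crawler honours it.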
=> More information about this toot | View the thread
nginx rant - by default it accepts request URIs up to 8k in length and passes those through to your backend server just fine. However, if your backend server sends response headers of a similar length back, nginx drops the response and returns a 502 Bad Gateway instead.
Such an annoying mismatch in default configuration.
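For context, the mismatch is between two separate defaults: large_client_header_buffers (4 8k) controls how big a request line nginx will accept from the client, while proxy_buffer_size (one memory page, typically 4k or 8k) controls how big a response header it will accept back from the upstream before failing with a 502. A sketch of lining them up; the 16k/32k sizes and the upstream are illustrative assumptions, not recommendations from the post.
```
# Sketch: align the inbound and upstream buffer sizes so a header that
# nginx accepted from the client can also come back from the backend.
upstream backend { server 127.0.0.1:8000; }   # placeholder app server

server {
    listen 80;
    server_name example.com;                # placeholder

    large_client_header_buffers 4 16k;      # default is 4 8k

    location / {
        proxy_pass http://backend;
        proxy_buffer_size 16k;              # default is one page (4k/8k)
        proxy_buffers 8 16k;
        proxy_busy_buffers_size 32k;        # must be >= proxy_buffer_size
    }
}
```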
=> More information about this toot | View the thread