Ancestors

Written by froztbyte@awful.systems on 2024-10-21 at 05:48

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 27 October 2024

https://awful.systems/post/2668740

=> More informations about this toot | More toots from froztbyte@awful.systems

Written by froztbyte@awful.systems on 2024-10-21 at 07:21

has the era of active sabotage of the autoplag inputs begun? let’s hope so

Written by BlueMonday1984@awful.systems on 2024-10-21 at 10:16

Considering Glaze and Nightshade have been around for a while, and I talked about sabotaging scrapers back in July, arguably, it already has.

Hell, I ran across a much smaller scale case of this a couple days ago:

Not sure how effective it is, but if Elon’s stealing your data for his autoplag no matter what, you might as well try to force-feed it as much poison as you can.

Written by corbin@awful.systems on 2024-10-21 at 16:06

It’s almost completely ineffective, sorry. It’s certainly not as effective as exfiltrating weights via neighborly means.

On Glaze and Nightshade, my prior rant hasn’t yet been invalidated and there’s no upcoming mathematics which tilts the scales in favor of anti-training techniques. In general, scrapers for training sets are now augmented with alignment models, which test inputs to see how well the tags line up; your example might be rejected as insufficiently normal-cat-like.

I think that “force-feeding” is probably not the right metaphor. At scale, more effort goes into cleaning and tagging than into scraping; most of that “forced” input is destined to be discarded or retagged.
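
(Editorial sketch, not from the original post.) The filtering step described above could look roughly like the toy below, assuming the "alignment model" boils down to a similarity score between an image embedding and the embedding of its tag, with low scorers thrown out. The function names, the 3-dimensional vectors, and the 0.8 threshold are all made up for illustration; a real pipeline would get its embeddings from a trained model and is not necessarily structured this way.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_samples(samples, threshold=0.8):
    """Split (name, image_vec, tag_vec) samples into kept and rejected names.

    The threshold is a hypothetical value, not one taken from any real system.
    """
    kept, rejected = [], []
    for name, image_vec, tag_vec in samples:
        if cosine(image_vec, tag_vec) >= threshold:
            kept.append(name)
        else:
            rejected.append(name)
    return kept, rejected


# Made-up embeddings: both samples are tagged "cat" ([1, 0, 0]), but only the
# first image vector actually points in the "cat" direction.
SAMPLES = [
    ("ordinary_cat.jpg", [0.9, 0.1, 0.0], [1.0, 0.0, 0.0]),
    ("poisoned_cat.jpg", [0.1, 0.9, 0.2], [1.0, 0.0, 0.0]),
]

kept, rejected = filter_samples(SAMPLES)
print("kept:", kept)
print("rejected:", rejected)
```

In this toy, the poisoned sample is simply discarded before training, which is the sense in which most "forced" input never reaches the model.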

Toot

Written by froztbyte@awful.systems on 2024-10-21 at 18:28

yeah this is the thing I’ve been thinking a lot about

fucking reCaptcha is literally mass-weaponising users for data filtration, and there is no good counter besides just not using reCaptcha (which is something one can’t easily pull off without things like regulatory action, massive reputational problems that make people gtfo, etc)

I have similar worries about cloudflare being such a massive chokepoint and using that position to enable “ai bot filter” services. feels extremely monopolistic, but ianal and I’m not entirely sure what the case grounds/structure on that would be (if any)

the only other viable strategy at the moment is fully breaking contact with any potential bad traffic systems, and that’s extremely fucking dire because that’s yet another nail in the coffin of the increasingly less open internet

Descendants

Written by bitofhope@awful.systems on 2024-10-22 at 01:08

The whole Cloudflare bot detection is so weird and eerie. I’ve had issues where I can’t get past it presumably just because I’m using some in-application browser just to get a login cookie, but other times it just lets fucking curl through no questions asked.

Written by flavia on 2024-10-22 at 17:15

> it just lets fucking curl through no questions asked

Fucking what. I’ve heard of sites blocking curl and I’ve been able to get around it by copying user agent and sometimes cookies from the browser. Now I’m cursed with the knowledge that I could probably just scrape stuff from everywhere

=> More informations about this toot | More toots from ibt3321@lemmy.blahaj.zone

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113346824543760128