Ancestors

Toot

Written by Lilian Edwards on 2025-01-12 at 16:01

Finally, one previously advertised, but I'll throw it in here as it's actually edging nearer to "actual" publication (whatever the hell that means nowadays!!!!)

https://someone.elses.computer/@RDBinns/113622759079704904

=> More information about this toot | More toots from lilianedwards@someone.elses.computer

Descendants

Written by Ulrike Hahn on 2025-01-12 at 16:23

@lilianedwards @RDBinns 👆 the paper linked to here is directly relevant to the posts and discussions, circulating here in recent days, about ChatGPT providing output on people’s Mastodon posts.

In particular, it discusses potential data privacy remedies.

Written by Lilian Edwards on 2025-01-12 at 16:27

@UlrikeHahn @RDBinns I haven't seen this discussion - can u point me at some?

Written by Ulrike Hahn on 2025-01-12 at 16:29

@lilianedwards @RDBinns this was the first post I saw on this; there have been several independent ones since:

https://atomicpoet.org/objects/324d56ed-afe4-4165-b928-0de011d3b84e

=> More information about this toot | More toots from UlrikeHahn@fediscience.org

Written by Ulrike Hahn on 2025-01-12 at 16:31

@lilianedwards @RDBinns …and here is the most recent post on this that came into my TL:

https://aoir.social/@aram/113811386580314915

Written by Lilian Edwards on 2025-01-12 at 16:32

@UlrikeHahn @RDBinns I've been pretty exclusively on BSky, where the discussion was that it was impossible to keep anything private from scrapers but that the "real" fediverse could choose to do so? So this was wrong? Or is it just ordinary ignoring of robots.txt? @tnhh (this is the next thing we are writing!)

Written by Ulrike Hahn on 2025-01-12 at 16:45

@lilianedwards @RDBinns @tnhh I haven’t looked too far into this, but one aspect definitely seems to be ignoring robots.txt. The other is that a lot of servers have explicit guidance against scraping, not just an automated response, and my Mastodon server allows me to opt out of Mastodon-wide search.

So from a user consent perspective, it seems even more problematic to me than the already problematic Bluesky Hugging Face data set case…

The biggest problem, I think, is if they were also scraping followers-only posts…

Written by Ulrike Hahn on 2025-01-12 at 16:47

@lilianedwards @RDBinns @tnhh what I thought was the most interesting about your piece, though, was the stuff about generation and processing!

Written by Lilian Edwards on 2025-01-12 at 17:27

@UlrikeHahn @RDBinns @tnhh I suppose I am utterly cynical: scrapers will scrape unless physically compelled not to. But after that the argument goes to the effect of various tactics on consent and opt-out vs opt-in. Robots.txt is especially interesting right now as a first step in creative AI scraping regulation, yet it is simply not up to the job.

Written by Lilian Edwards on 2025-01-12 at 17:29

@UlrikeHahn @RDBinns @tnhh New EDPB and ICO guidance pretty much replicates our paper (and previous papers), which is nice (and ofc rather more authoritative!)

Written by Lilian Edwards on 2025-01-12 at 16:36

@UlrikeHahn @RDBinns thanks. This seems a lot like the arguments that broke out on BSky re a Hugging Face employee also scraping BSky.

Written by Reuben Binns⁉️ on 2025-01-20 at 11:27

@UlrikeHahn @lilianedwards thanks for sharing this! I think one difference (important from a DP perspective) here is that these results come from the service using live web searches to answer, rather than just from the contents of the 'raw' model, which was the context we were concerned with in our paper. (Not that this case isn't also important.)

=> More information about this toot | More toots from RDBinns@someone.elses.computer

Written by Ulrike Hahn on 2025-01-20 at 11:30

@RDBinns @lilianedwards indeed, and that (as noted) raises additional questions of interest, such as the fact that I, on my Mastodon server, have explicitly opted out of indexed search…

I’d love to see questions about the legal status of such options settled by a court.

Proxy Information

Original URL: gemini://mastogem.picasoft.net/thread/113816218362717301