@lilianedwards @RDBinns @tnhh I haven’t looked too far into this, but one aspect definitely seems to be ignoring robots.txt. The other is that a lot of servers have explicit guidance against scraping, not just an automated response, and my Mastodon server allows me to opt out of Mastodon-wide search.
So from a user consent perspective, it seems even more problematic to me than the already problematic Bluesky Hugging Face dataset case…
The biggest problem, I think, is if they were also scraping followers-only posts…
=> More information about this toot | View the thread | More toots from UlrikeHahn@fediscience.org