Newsmast can't stop setting up new domains to scrape the Fediverse for profit.
#FediBlock
newsmast.org
newsmast.social
newsmast.community
channel.org
#Ai #scraper #newsmast
=> More informations about this toot | More toots from PaulaToThePeople@climatejustice.social
@PaulaToThePeople is it really for-profit? this says it's a charity...
https://www.blog-pat.ch/our-birthday/
Or this:
https://www.newsmastfoundation.org/
It also doesn't mention AI but if it is using that, I'd like to know (and block them)
#newsmast
=> More informations about this toot | More toots from elduvelle@neuromatch.social
@elduvelle @PaulaToThePeople
"Charity" or not, they don't seem to understand boundaries. Also, organizing as a charity doesn't necessarily indicate that it's above board.
=> More informations about this toot | More toots from TheGreatLlama@kolektiva.social
@TheGreatLlama what do you mean about not understanding boundaries? Is it because they boost certain hashtags?
@PaulaToThePeople
=> More informations about this toot | More toots from elduvelle@neuromatch.social
@elduvelle @PaulaToThePeople
Servers keep blocking them and they keep creating new servers. Seems like block evasion to me.
=> More informations about this toot | More toots from TheGreatLlama@kolektiva.social
@TheGreatLlama Yeah, that's definitely not cool. Is there any record of this? @PaulaToThePeople
=> More informations about this toot | More toots from elduvelle@neuromatch.social
@elduvelle @PaulaToThePeople
Aside from everyone's ongoing memory of them continually showing up in fediblock posts over a couple of years now? Sorry, you'll have to work that out on your own... Though I'm sure whois searches might shed some light on when the different domains were registered.
=> More informations about this toot | More toots from TheGreatLlama@kolektiva.social
@TheGreatLlama @elduvelle @PaulaToThePeople
That is the main concern I keep coming across, that they have a bunch of server URLs and I don't know why they would need them. That doesn't necessarily mean block evasion is intended, but it's certainly a side effect.
Newsmast has responded to a chunk of this in a long response to one of my posts at...
https://newsmast.social/@newsmast/113859958444958207
=> More informations about this toot | More toots from Raccoon@techhub.social
@Raccoon @TheGreatLlama @elduvelle In March 2024 Newsmast sent emails to multiple Fediverse admins (I received a few because our foundation hosts more than one server) proudly presenting a PDF with creepy data about the server and the Fediverse.
I sent them a reply in the name of climatejustice.social and global to never contact us again, delete all the data they have about us and stop scraping us.
Since then they created new URLs and started federating with us from them, aka scraping us again.
=> More informations about this toot | More toots from PaulaToThePeople@climatejustice.social
@PaulaToThePeople @TheGreatLlama @elduvelle
I didn't see that because that email isn't under my purview, and we are currently busy with a move from the US to Canada, but I will go ahead and ask the admin anyway so I can get context when he has a moment...
@nicdex, do you have an email sent to us by Newsmast back in March? Paula here is talking about what sounds like some sort of data-mining report and I'd like to see that for a few reasons.
Something I will say, I'm not sure that "scraping" applies here: it sounds like they have bots that a human points at people who reliably post relevant things on hashtags, and that then boost posts they see with certain hashtags. That's basic Fedi functionality, following and boosting, like the bots people use to make Groups.
Can you give an example of something beyond just following and boosting? (Also, can you be more specific about what was in the email?)
=> More informations about this toot | More toots from Raccoon@techhub.social
@PaulaToThePeople @TheGreatLlama @elduvelle @nicdex
Also, Newsmast just got back to me, saying those URLs are 3 very different things, and thus not block-evasion. They are saying that the bots are all on Newsmast.Community, while .Social is the general use server, and that .Org is their foundation's web-front, which seems to be true based on what I see from our end.
https://newsmast.social/@newsmast/113861016173303067
=> More informations about this toot | More toots from Raccoon@techhub.social
@Raccoon @PaulaToThePeople @TheGreatLlama @elduvelle
Sorry, this email fell through the cracks. I'm sending you the report they attached in a separate DM or on our Discord chat.
=> More informations about this toot | More toots from nicdex@techhub.social
@nicdex @PaulaToThePeople @TheGreatLlama @elduvelle
Thanks Nic! I'm very curious, and I will look when I have a moment! 🙂
=> More informations about this toot | More toots from Raccoon@techhub.social
@nicdex @PaulaToThePeople @TheGreatLlama @elduvelle
Sitting in a doctor's office (I do that a lot these days) looking over the document. It looks like this is just some aggregate data on their own traffic and bots: they don't even mention any servers beyond Mastodon.Social. None of it seems particularly personal or alarming, just numbers from what their bots are doing.
Are you sure this is Scraping and not just a bunch of graphs on the general data of their bots' activity, gotten from following?
=> More informations about this toot | More toots from Raccoon@techhub.social
@nicdex @PaulaToThePeople @TheGreatLlama @elduvelle
This is the page where they published the doc in question for anyone who's curious...
https://www.newsmastfoundation.org/our-blog/mapping-fediverse-communities/
=> More informations about this toot | More toots from Raccoon@techhub.social
@Raccoon thanks! I would say that's pretty interesting... not sure I'd call it scraping if it's all from following accounts and "relays" (not sure what that is):
"data: all the content is on our server through federation, via follows and relays."
@nicdex @PaulaToThePeople @TheGreatLlama
=> More informations about this toot | More toots from elduvelle@neuromatch.social
@elduvelle @nicdex @TheGreatLlama
Removing Paula as she's asked to stop talking about it.
I agree that there isn't much troubling beyond perhaps not communicating enough with general staff around the network. I don't think it warrants a widespread block, simply because what they are doing is relatively innocuous, and I don't see anything happening beyond automating the boosting of public posts.
That said, it's always up to individual server staffs to decide who they should and shouldn't block for their community. If people feel the need to block this after seeing all this information, they should block it. Really though, I think this is a good example of doing a quick bit of research when someone posts something on the FediBlock tag.
=> More informations about this toot | More toots from Raccoon@techhub.social
@Raccoon @nicdex @PaulaToThePeople @elduvelle
Does anyone know if they're respecting hashtags like #nobot ?
I personally think that making something like this "opt out" rather than "opt in" is on the shady side, but I'd specifically like to opt out of this and all similar future endeavors.
=> More informations about this toot | More toots from TheGreatLlama@kolektiva.social
@TheGreatLlama I don't know but I just asked them there: https://neuromatch.social/@elduvelle/113861525918814300
We'll see if they answer!
@Raccoon @nicdex @PaulaToThePeople
=> More informations about this toot | More toots from elduvelle@neuromatch.social
@elduvelle @TheGreatLlama @nicdex Removing Paula because she asked to be removed.
Y'all should maybe be thinking out some questions to ask them, and just asking: they are responding pretty readily to communications, and they seem to be good-faith.
=> More informations about this toot | More toots from Raccoon@techhub.social
@elduvelle Hm, I'm not sure. Their websites don't give a lot of info, but look hella shady.
Cookie policy:
"Other cookies, which are optional, also enable us to track and target the interests of our users to enhance the experience on our Online Properties."
"Cookies are not the only way to recognize or track visitors to a website. We may use other, similar technologies from time to time, like web beacons (sometimes called “tracking pixels” or “clear gifs”). These are tiny graphics files that contain a unique identifier that enables us to recognize when someone has visited our Website or opened an email including them. This allows us, for example, to monitor the traffic patterns of users from one page within a website to another, to deliver or communicate with cookies, to understand whether you have come to the website from an online advertisement displayed on a third-party website, to improve site performance, and to measure the success of email marketing campaigns. In many instances, these technologies are reliant on cookies to function properly, and so declining cookies will impair their functioning."
This doesn't sound like something that should be part of the Fediverse to me.
=> More informations about this toot | More toots from PaulaToThePeople@climatejustice.social
@PaulaToThePeople hmm, but this is about the cookies for their website not their accounts on the Fediverse right?
I don't really know either 🤷
=> More informations about this toot | More toots from elduvelle@neuromatch.social
@PaulaToThePeople
newsmast.og is a typo.
should be newsmast.org
(sorry :neocat: )
=> More informations about this toot | More toots from alex@anarres.family
@alex Thanks, fixed. :)
=> More informations about this toot | More toots from PaulaToThePeople@climatejustice.social
@PaulaToThePeople @alex
Passing this along, Newsmast has responded to a chunk of this in a long response to one of my posts at...
https://newsmast.social/@newsmast/113859958444958207
=> More informations about this toot | More toots from Raccoon@techhub.social