A Mastodon server probably shouldn't pre-fetch media in the toots it receives.
Instead, make the clients request the media from their local server. If it isn't present, the client requests it directly from the source, then sends the result back to its local server for other people to use.
There's an opportunity for shenanigans, of course, if that first client decides to hand back something incorrect. However, you could defend against this in a number of ways, such as random client cache misses to make sure your user population converges on truth. Smaller servers with a community-oriented base could just accept the risk.
This seems much more "distributed" than having a central server responsible for everything.
=> More informations about this toot | More toots from yojimbo@hackers.town
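A minimal sketch of the "random client cache misses" defence from the toot above, assuming the server keeps a tally of what each client hands back and serves whichever version most clients agree on. The class name `ClientFedCache`, the `recheck_rate` parameter, and the majority-vote rule are illustrative assumptions, not anything Mastodon implements.

```python
import hashlib
import random
from collections import Counter


class ClientFedCache:
    """Hypothetical server-side media cache that is filled by client uploads."""

    def __init__(self, recheck_rate=0.1, rng=None):
        self.recheck_rate = recheck_rate  # fraction of hits turned into forced misses
        self.rng = rng or random.Random()
        self.blobs = {}   # sha256 hex digest -> media bytes
        self.votes = {}   # url -> Counter of digests submitted by clients

    def lookup(self, url):
        """Return cached media, or None to tell the client to fetch from source."""
        votes = self.votes.get(url)
        if not votes or self.rng.random() < self.recheck_rate:
            return None  # a real miss, or a random re-check
        digest, _count = votes.most_common(1)[0]
        return self.blobs[digest]

    def submit(self, url, data):
        """Record what a client says it fetched from the source."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        self.votes.setdefault(url, Counter())[digest] += 1
```

With `recheck_rate=0.1`, roughly one lookup in ten is answered with a miss even when media is cached, so a single forged upload is eventually outvoted as long as most clients are honest. It does nothing against a coordinated majority of lying clients.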
@yojimbo what you describe is a hash check
=> More informations about this toot | More toots from saxnot@chaos.social
@yojimbo do I understand correctly that this aims towards reduction of disk space?
do you understand that not everyone aims towards that and there are many who would rather design the backend to respond quickly and with the fewest round trips necessary, regardless of hardware cost?
=> More informations about this toot | More toots from saxnot@chaos.social
@saxnot Nope, it's to mitigate the "herd of elephants" problem where upstream media providers get swamped with simultaneous requests from every mastodon server that the referring toot went to, regardless of whether any of their user population actually viewed/rendered the post at all.
And also an attempt at thinking about the problem in a more distributed manner than I normally do.
=> More informations about this toot | More toots from yojimbo@hackers.town
@saxnot That's certainly one way to do part of the check.
Things get complicated when you consider that any of the clients submitting their media for local cache might be lying; and when the upstream changes the contents available from the URI.
If upstream only published content using a content hash as an identifier ... that would be interesting.
=> More informations about this toot | More toots from yojimbo@hackers.town
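A rough sketch of the hash check discussed above, assuming the originating post carries a content hash for its media. That is the "if upstream only published content using a content hash" case, not something the thread says ActivityPub does today. The function names and the `sha256-` prefix are made up for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the media bytes themselves (hypothetical format)."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


def accept_upload(expected_id: str, data: bytes) -> bool:
    """Accept a client's upload only if it matches the identifier in the post."""
    return content_id(data) == expected_id
```

Under this assumption a lying client cannot poison the cache, and changed upstream content gets a different identifier instead of silently replacing the old bytes. It does not help when the post only carries a plain URL.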
@yojimbo i'm not saying this
in fact, I think what you propose makes no sense.
=> More informations about this toot | More toots from saxnot@chaos.social
@saxnot /shrug that's OK, I didn't spend much longer thinking about the suggestion than it took me to type it. And already I've had a few helpful comments in response, so that's good too. I don't need to be "right" all the time :-)
=> More informations about this toot | More toots from yojimbo@hackers.town
@yojimbo with end-to-end signing, it should be pretty safe, i don't know if anyone in ActivityPub specifies/implements that, though.
(i.e. the originating client or server signs any media it publishes with an instance key, and any other node (e.g. the receiving server) in the network can easily verify the origin of the media)
=> More informations about this toot | More toots from jn@boopsnoot.de
@jn That's a nice way to increase confidence that the end-users aren't messing with you.
Would it also help identify when upstream changes their content?
=> More informations about this toot | More toots from yojimbo@hackers.town
@yojimbo (disclaimer, i haven't read the AP spec, i'm reasoning from first principles)
i would expect changes (edits) to be associated with some kind of change message in the protocol, which would tell downstream nodes to fetch the media again (possibly from a new URL). if it works roughly like that, signing won't be in the way.
for silent changes, however, caches are already a problem when not using signatures.
=> More informations about this toot | More toots from jn@boopsnoot.de
@yojimbo yep. Someday somebody will fix this. I want to believe
=> More informations about this toot | More toots from shlee@aus.social
@yojimbo one of the original authors intended activitypub to have a way more client-to-server (c2s) focus, but then everything just evolved to be s2s.
=> More informations about this toot | More toots from amd@gts.amd.im
@yojimbo why leave the client in charge of handling the cache miss? would it not be better to have the instance software fetch from source?
=> More informations about this toot | More toots from 0x57e11a@void.lgbt
@0x57e11a This is the current situation, and right now upstream servers get hit with a huge number of almost-simultaneous requests from un-coordinated mastodon servers when a toot with media in it goes out. Switching it to a client-driven activity means that this load would be spread out based on when humans are actually rendering the content, and hopefully this reduces the peak load problem.
=> More informations about this toot | More toots from yojimbo@hackers.town
@yojimbo oh ofc, having it only be loaded when the user needs to render it is far better! but to avoid the mess that is trusting the client’s upload, why not have the user ask the server for the cached media, and from there the server can handle making the request to the source url?
a cache hit:
client           server
|--cache please->|
|<-cached media--|
a cache miss (with client fetching media, your solution, 3 requests)
source            client           server
|                 |--cache please->|
|                 |<-no------------|
|<-media please---|                |
|----------media->|                |
|                 |----cache this->|
|                 |<-ok------------|
a cache miss (with server fetching media, this one's solution, 2 requests)
client           server            source
|--cache please->|                 |
|                |---media please->|
|                |<-media----------|
|<-media---------|                 |
=> More informations about this toot | More toots from 0x57e11a@void.lgbt
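The server-side cache miss in the diagrams above could look roughly like this sketch: nothing is fetched until a client asks, and concurrent requests for the same URL share a single upstream fetch. `ReadThroughCache` and the `fetch` callable are hypothetical names; a real instance would plug in its own HTTP client, expiry, and size limits.

```python
import threading


class ReadThroughCache:
    """Hypothetical lazy media cache: fetch from source only on first request."""

    def __init__(self, fetch):
        self.fetch = fetch            # assumed callable: url -> bytes
        self.media = {}               # url -> cached bytes
        self.locks = {}               # url -> lock held while fetching that url
        self.guard = threading.Lock() # protects the two dicts above

    def get(self, url):
        with self.guard:
            if url in self.media:
                return self.media[url]          # cache hit
            lock = self.locks.setdefault(url, threading.Lock())
        with lock:
            # Another request may have filled the cache while we waited.
            with self.guard:
                if url in self.media:
                    return self.media[url]
            data = self.fetch(url)              # the single upstream request
            with self.guard:
                self.media[url] = data
            return data
```

Each server then sends the source at most one request per URL, and only once one of its users actually renders the post, which spreads load over time instead of producing a burst when the toot federates.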
@0x57e11a Actually, I guess there's no particular reason that you couldn't use a normal cache miss mechanism like that :-)
Perhaps my idea was unduly influenced by seeing upstream site operators talking about actively blocking requests coming from a mastodon server user-agent ...
=> More informations about this toot | More toots from yojimbo@hackers.town
@yojimbo that is a valid reason to want a way around it, but ideally fixing the issue before that happens could reduce request amounts
plus good luck blocking the UA of every single fedi software (every misskey fork :neobot_woozy:)
=> More informations about this toot | More toots from 0x57e11a@void.lgbt
@0x57e11a I'm not affected by it ... but it seems like plenty of others are.
https://news.itsfoss.com/mastodon-link-problem/ is a few months old and perhaps not the best ... but it's a good indication of the pain.
JWZ's article uses regexes to look at the user agent ... https://www.jwz.org/blog/2022/11/mastodon-stampede/ and also notes that Mastodon recently changed their user-agent string https://github.com/mastodon/mastodon/pull/31192
=> More informations about this toot | More toots from yojimbo@hackers.town