Ancestors

Written by infinite love ⴳ on 2025-01-19 at 16:25

i think honestly what i want out of a storage solution is to like maybe upload stuff to some kind of object storage and then separately have a graph database or rdf quad store for the metadata. but i’m wondering if it can be made simpler without losing the ability to handle any general use case, or if this is as simple as it gets

(a filesystem is the wrong level of abstraction, i’m interested in logical handling of objects which may or may not have content that is either binary or text)

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 16:31

the area where my thinking seems to differ from most of what i've seen so far is that i don't make a real distinction between text and binary anymore. content is content. so this implies that the object storage is not just for media, but for everything, including text files. but the thing is that content should be strictly content. what most people get "wrong" with HTML for example is that they require an entire structural wrapper, which is itself wrapping not just a body but also a header

=> More informations about this toot | More toots from trwnh@mastodon.social

Toot

Written by infinite love ⴳ on 2025-01-19 at 16:37

entirely natural, of course -- HTML is most often used to serialize documents that are browsed on the Web. but for the most part, that's all presentational stuff! if you removed it, you would lose aesthetics, but the core of the message is only some certain part, that can be extracted and handled on its own. the rest of it is just part of the view or (re)presentation.

i've talked before about how you can combine body content with header metadata and get a document that can itself be body,

=> More informations about this toot | More toots from trwnh@mastodon.social

Descendants

Written by infinite love ⴳ on 2025-01-19 at 16:42

and you can of course unwrap those layers of containers of head+body as well. when there's nothing left to unwrap or extract, that leaves you with some base atomic content, which may be just some plain text.

i argue that this is not meaningfully different than, say, a png file. it's just that text is often inlined into the presentation layer. we don't go about browsing the metadata and then linking to the content, we just stick the content directly into the same view as the metadata...

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 16:55

but just as we can inline text content, we can also separate it and link to it, we can stick it into object storage and address it as that literal text. and we can then arbitrarily package and wrap it into whatever containers or as many layers of containers as we care to. we could take some plaintext content and wrap it in some HTML template that itself gets wrapped in what eventually becomes an HTML document which itself gets wrapped in an HTTP message. this is more or less how we do it rn.

=> More informations about this toot | More toots from trwnh@mastodon.social
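A minimal sketch of the layering described above, assuming a toy HTML template and a hand-built HTTP response. The function names (`wrap_fragment`, `wrap_document`, `wrap_http`) are made up for illustration and are not from the thread. It only shows how much wrapping ends up around a two-character message, and that the layers can be peeled off again.

```python
import html


def wrap_fragment(text: str) -> str:
    # Layer 1: plain text becomes an HTML fragment.
    return f"<p>{html.escape(text)}</p>"


def wrap_document(fragment: str, title: str) -> str:
    # Layer 2: the fragment is placed in a full document (head + body).
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body>{fragment}</body></html>"
    )


def wrap_http(document: str) -> bytes:
    # Layer 3: the document becomes the body of an HTTP message,
    # which again has its own head (the header block).
    body = document.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


content = "hi"
document = wrap_document(wrap_fragment(content), title="greeting")
message = wrap_http(document)

# Bytes that are wrapper rather than content.
overhead = len(message) - len(content.encode("utf-8"))

# Unwrapping the outermost layer: split the HTTP head from its body.
http_head, _, http_body = message.partition(b"\r\n\r\n")
```

Each layer repeats the same head+body shape, which is why the unwrapping can continue until only the atomic content is left.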

Written by infinite love ⴳ on 2025-01-19 at 17:00

the takeaway really, is that if we want the content to be replicable and manageable by some user(s), then it needs to be put into some data store, and it is exactly that data store that i am trying to model. the data model needs to support lossless serialization into whatever convenient format some person might ask for or need. i don't think "sql data dump" is an ideal export format, and i don't think "a really big json file" is ideal either. i want something that can be split up into atoms.

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 17:06

currently, my answer is "object storage to store the content + metadata store to describe the logical objects"

is there a better way? idk, but that's it, that's all i got rn

the object storage can be serialized to a filesystem of arbitrary files (even just text files), and the metadata store can be serialized to a bunch of RDF graphs (filesystem hierarchy mostly irrelevant but yeah you can just import an entire folder and subtree)

this seems convenient enough?

=> More informations about this toot | More toots from trwnh@mastodon.social
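A rough sketch of that split, under a few assumptions that are not in the thread: objects are keyed by a SHA-256 hash of their bytes, the metadata store is just an in-memory set of subject/predicate/object triples rather than a real RDF store, and the class names and the `mediaType` predicate are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


class ObjectStore:
    """Holds content as opaque bytes, one file per object."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, content: bytes) -> str:
        # Assumption: the key is derived from the content itself.
        key = "sha256-" + hashlib.sha256(content).hexdigest()
        (self.root / key).write_bytes(content)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class MetadataStore:
    """Describes logical objects as (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def about(self, subject: str) -> list:
        return sorted((p, o) for (s, p, o) in self.triples if s == subject)


store = ObjectStore(Path(tempfile.mkdtemp()))
meta = MetadataStore()

# Text and binary go through exactly the same path: content is content.
text_key = store.put("hi".encode("utf-8"))
png_key = store.put(b"\x89PNG\r\n\x1a\n")  # PNG signature as stand-in binary data

meta.add(text_key, "mediaType", "text/plain")
meta.add(png_key, "mediaType", "image/png")
```

The object store here already is "a filesystem of arbitrary files"; the triples could be written out in any RDF serialization without touching the content.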

Written by cyb3r w0fl on 2025-01-19 at 17:03

@trwnh it's funny to me how we often respond to this with not the big json file, because it's impractical and a deeper structure would be even worse, but json-lines of individual activities

=> More informations about this toot | More toots from alice@gts.void.dog

Written by infinite love ⴳ on 2025-01-19 at 17:08

@alice yeah this is likely because people just "know" json and how to work with it, but it's not at all efficient for large data stores lol

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by cyb3r w0fl on 2025-01-19 at 17:17

@trwnh yea json exists as a cross-language serialization convenience all the way, mostly i mean that we went towards a plain text list of serialized atomic activities instead of handling collections. (now im thinking of a world where we export and import AP collections as zip archives of xml files along with their signatures)

=> More informations about this toot | More toots from alice@gts.void.dog

Written by infinite love ⴳ on 2025-01-19 at 17:21

@alice AP collections are still a lil weird even conceptually (they have several flaws and bugs and are imo underspecced), but yes this is the general idea. it's all really just a graph merge at the end of the day. fedi doesn't even care about activities, it cares about the content (mostly Note but really Note.content primarily)

the other part of this is can you imagine stuffing an entire HTML Article into a JSON string? is that really a good idea. like really.

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by cyb3r w0fl on 2025-01-19 at 17:23

@trwnh i would have put all that json into a anyway

=> More informations about this toot | More toots from alice@gts.void.dog

Written by infinite love ⴳ on 2025-01-19 at 17:33

@alice i would split the head off into a separate descriptor

but yes this is basically the issue here. you have head and body, the content is ideally the last remaining body after you unwrap all the layers and extract whatever profiles. you shouldn't be required to use any specific format or container just to pass some atomic content around (metadata optional)

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by Erin 💽✨ on 2025-01-19 at 17:20

@trwnh @alice unironically more people need to learn how to work with goddamn binary formats and if not that MIME Multipart

=> More informations about this toot | More toots from erincandescent@erincandescent.net

Written by tuban_muzuru on 2025-01-19 at 20:33

@erincandescent @alice @trwnh

... if you're going to export huge amounts of data, really huge - have the decency to write a reader and encoder for the data you're trying to preserve. When someone, in the distant future, tries to decode your trove, they'll appreciate your foresight.

=> More informations about this toot | More toots from tuban_muzuru@ohai.social

Written by infinite love ⴳ on 2025-01-19 at 21:02

@tuban_muzuru @erincandescent @alice in most cases "always bet on plain text" is good enough for that kind of thing imo. this is more about strategy and architecture of like... managing content. a sort of storage strategy, one that can handle abstract backends

it's probably going to look less like an sql database and a lot more like object storage in the end: the blob being the content (even if it's as simple as a literal string), and the metadata being whatever attribute-value pairs

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by Erin 💽✨ on 2025-01-19 at 21:04

@trwnh @tuban_muzuru @alice I've got a bit of a visceral reaction to this after dealing with formats with 3 nested layers of Base64

But it's reasonable for purely textual data

=> More informations about this toot | More toots from erincandescent@erincandescent.net

Written by infinite love ⴳ on 2025-01-19 at 21:10

@erincandescent @alice @tuban_muzuru the goal of this thought exercise is to use exactly 0 layers of nesting

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by tuban_muzuru on 2025-01-19 at 22:34

@trwnh @erincandescent @alice

That's right. Drive space is cheap.

=> More informations about this toot | More toots from tuban_muzuru@ohai.social

Written by tuban_muzuru on 2025-01-19 at 22:35

@trwnh @erincandescent @alice

Yeah - I dislike writing a parser for anything but plain text.

=> More informations about this toot | More toots from tuban_muzuru@ohai.social

Written by Eugenus Optimus 🇺🇦 on 2025-01-19 at 17:09

@trwnh I dunno, you need a search over multiple instances of non-structured data that is organized into a hierarchy or mesh?

=> More informations about this toot | More toots from ujeenator@mastodon.social

Written by tech himbo on 2025-01-19 at 17:06

@trwnh what does splitting things into atoms buy us? it sounds nice conceptually, but im not sure i understand the use-case

=> More informations about this toot | More toots from tech_himbo@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 17:12

@tech_himbo the main use case is easier handling of the content as data. for the most part, content is just data meant for human consumption. but what we lack is convenience around managing content as data. think of how a CMS works for example. now say you have a friend who runs a different CMS. what's the easiest way to export/import some content from you to them? how do you "share" content? reuse? and so on.

having the atomic data makes it easier to handle, logic with, reason about, wrap up,

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 17:14

@tech_himbo so basically it amounts to operating at the same semantic level as your communication peers. if i say "hi" to you, the core of the message is "hi", but in order to get it across i need to wrap it in some document and then wrap that document in an HTTP message or whatever. that's a lot of overhead for literally just two characters!

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 17:18

@tech_himbo it compounds beyond that, too. obviously i am not the only person who has ever strung those two characters together in that order. but that string may be used by other people in other ways. in much the same way that bittorrent allows peers to split a file into packets and reconstruct it piecewise, the string "hi" may be simply a component in some other larger piece of data -- for example, the content "hi" can be paired with metadata showing that it came from:me and to:you

=> More informations about this toot | More toots from trwnh@mastodon.social
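One way to picture the reuse described here, assuming content is keyed by its hash (an assumption of this sketch, not something the thread specifies). The `put` helper and the `from`/`to`/`content` field names are hypothetical.

```python
import hashlib

# A toy in-memory object store: key -> bytes.
blobs = {}


def put(content: bytes) -> str:
    # The same bytes always map to the same key, so storing
    # "hi" twice does not create a second copy.
    key = hashlib.sha256(content).hexdigest()
    blobs[key] = content
    return key


# Two unrelated messages that happen to carry the same content.
first = {"from": "me", "to": "you", "content": put(b"hi")}
second = {"from": "someone-else", "to": "their-friend", "content": put(b"hi")}
```

Both descriptors point at a single stored "hi"; only the metadata around it differs.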

Written by tech himbo on 2025-01-19 at 17:21

@trwnh doesn’t this double the problem, though? if CMS A uses HTML snippets, and CMS B uses markdown with a header, you just need to translate from HTML to markdown. but if A uses JSON-LD and plaintext, and B uses XML and RTF, then you have to do two translations. although maybe your proposal is to ban everything but plaintext — in which case, why make plaintext the standard over any other format?

=> More informations about this toot | More toots from tech_himbo@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 17:27

@tech_himbo not so much "make plaintext the standard" as much as it is "make the human meaningful thing the standard"

so i could write plaintext or i could write semantic html or i could write markdown or asciidoc or whatever. doesn't matter. the serialization is less important than the actual information being conveyed

so CMS A and CMS B don't need to agree on jsonld or xml, but they do need to agree on what an "article" is. they can then (separately) describe that "article" with metadata.

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 17:29

@tech_himbo i am for the most part using the RDF data model here because it's generic enough to allow describing basically anything in the form of a graph. sharing some metadata is just a graph merge. sharing the content should be as simple as just copying one file (which can be any format, text or binary)

=> More informations about this toot | More toots from trwnh@mastodon.social
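As a sketch of "sharing metadata is just a graph merge": if a graph is modelled as a plain set of triples, merging is set union. This assumes there are no blank nodes (a real RDF merge has to keep blank nodes from different graphs apart), and the identifiers below are placeholders.

```python
Triple = tuple  # (subject, predicate, object), all plain strings here


def merge(*graphs: set) -> set:
    # Union of all triples; duplicates collapse automatically.
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


mine = {
    ("article:1", "title", "Hello"),
    ("article:1", "content", "sha256-abc"),  # placeholder content key
}
theirs = {
    ("article:1", "title", "Hello"),
    ("article:1", "author", "person:1"),
}

combined = merge(mine, theirs)
```

The content itself never passes through the merge; it is copied separately as one file.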

Written by infinite love ⴳ on 2025-01-19 at 17:30

@tech_himbo the proposal is basically "store content separately from metadata" and "have metadata link to content instead of inlining it in your canonical data format"

=> More informations about this toot | More toots from trwnh@mastodon.social

Written by tech himbo on 2025-01-19 at 17:30

@trwnh so, they need to agree on a serialization format for articles, and we need a mapping from A’s metadata format to B’s metadata format to enable import/export. is that right?

=> More informations about this toot | More toots from tech_himbo@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 17:37

@tech_himbo no, they need to agree on the semantics of the "content". serializations and formats can be anything, and can be negotiated between peers ("i understand a b c", "i understand c d e", "okay let's agree to use c for this session")

this is about the semantic content model basically

in practical terms say instead of sending you an entire HTML document i just sent you a single paragraph element or perhaps only its inner text

=> More informations about this toot | More toots from trwnh@mastodon.social
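The negotiation in quotes above could look something like this. It is a sketch only: `negotiate` is a hypothetical helper, and picking the first shared format in the sender's preference order is an assumption.

```python
from typing import Optional


def negotiate(mine: list, theirs: list) -> Optional[str]:
    # Walk my formats in preference order and pick the first one
    # the other peer also understands.
    understood = set(theirs)
    for fmt in mine:
        if fmt in understood:
            return fmt
    return None  # no common format for this session


agreed = negotiate(["a", "b", "c"], ["c", "d", "e"])
```

The agreed format only matters for the session; what both sides hold onto is the content and its semantics.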

Written by tech himbo on 2025-01-19 at 18:52

@trwnh ok, so the requirement is that both services must support a common format, even if that format isn’t a broad standard. is that right?

=> More informations about this toot | More toots from tech_himbo@mastodon.social

Written by infinite love ⴳ on 2025-01-19 at 20:30

@tech_himbo yes, but also the most straightforward solution to content is to literally just pass it along 1:1 without any containers or metadata

the "problem" is essentially that, for something like an HTML document saved to disk as .html, we pre-bundle the content in the middle of a bunch of presentational stuff that is not content. or for a JSON document, we put an escaped string as the value of some key. i'm saying we don't need to always do that

=> More informations about this toot | More toots from trwnh@mastodon.social
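To make the escaped-string point concrete, here is a small comparison using Python's standard `json` module. The `contentRef` key and the `sha256-` key format are hypothetical; the thread only says that metadata should link to content rather than inline it.

```python
import hashlib
import json

article = '<article><p class="lead">She said "hi".</p></article>'

# Inlined: the markup becomes the value of a key, so every double
# quote in it has to be escaped.
inlined = json.dumps({"type": "Article", "content": article})

# Linked: the descriptor only carries a reference; the article bytes
# would live in object storage, untouched.
content_key = "sha256-" + hashlib.sha256(article.encode("utf-8")).hexdigest()
linked = json.dumps({"type": "Article", "contentRef": content_key})
```

Inlining still round-trips, but the content can no longer be read or copied as-is without going through a JSON parser first.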

Proxy Information

Original URL: gemini://mastogem.picasoft.net/thread/113855996642179692