Ancestors

Toot

Written by Tobin Baker on 2024-12-13 at 20:38

There seems to be convergence among MVCC DB systems (except those requiring address stability) to the "update-in-place with undo log" approach to version storage. Why not use this approach to make any standalone data structure multiversioned? Just take any update-in-place structure and add an append-only undo log. The primary structure stores current versions and the undo log stores previous versions. Each entry in either the primary structure or the undo log has a pointer (or log offset) to the previous version in the undo log, so the primary structure effectively holds a linked list of versions for each entry. Because the undo log is totally ordered by update recency, it can easily be truncated at a particular LSN or timestamp (e.g., the timestamp of the oldest active transaction in a database). For a concurrent primary structure, the undo log itself can provide a total order on operations (e.g., via fetch-and-add on the next log offset), which may have additional applications besides version management (e.g., in replication).

=> More informations about this toot | More toots from tobinbaker@discuss.systems

Descendants

Written by Tobin Baker on 2024-12-13 at 23:09

(Of course, you'd need to maintain tombstone entries in the primary structure for recently deleted keys; to avoid space blowup you could augment log truncation with an async "vacuum" pass over the primary structure to remove all tombstones pointing to a log entry older than the truncation point.)

=> More informations about this toot | More toots from tobinbaker@discuss.systems

Written by Tobin Baker on 2024-12-13 at 23:11

(Also, this approach is efficient enough for MVCC scenarios but probably not for true point-in-time queries; for that you'd want something more specialized like a time-split B-tree.)

=> More informations about this toot | More toots from tobinbaker@discuss.systems

Written by Tobin Baker on 2024-12-14 at 00:32

(Maybe you could optimize this a bit for queries over current versions, at the expense of queries in the past, by maintaining a "tombstone" instance of the data structure that only holds entries for deleted keys, pointing into the undo log as before. That would require merging queries over both structures for queries in the past.)

=> More informations about this toot | More toots from tobinbaker@discuss.systems

Written by Per Vognsen on 2024-12-14 at 00:36

@tobinbaker There's also the "fat node" approach where each node can contain multiple versions (at least 2) so you can amortize the O(log n) path copy to O(1) for isolated updates. You don't need to overwrite the existing version in place for that.

=> More informations about this toot | More toots from pervognsen@mastodon.social

Written by Tobin Baker on 2024-12-14 at 00:39

@pervognsen Right, I haven't really worked out the pros/cons of this approach WRT traditional persistent data structures (either imperative or purely functional). It's clearly heavily optimized (just like MVCC databases) for queries over the latest versions.

=> More informations about this toot | More toots from tobinbaker@discuss.systems

Written by Tony Finch on 2024-12-14 at 01:31

@tobinbaker for in-memory data structures any mutate-in-place requires a lot of locking; if it’s a read-heavy workload it’s probably better to copy on write with lock-free safe memory reclamation, like RCU

=> More informations about this toot | More toots from fanf@mendeddrum.org

Written by Tobin Baker on 2024-12-14 at 02:06

@fanf concurrent (lock-based) btrees with lock-free reads have been known since the 80s, eg https://www.csd.uoc.gr/~hy460/pdf/p650-lehman.pdf

=> More informations about this toot | More toots from tobinbaker@discuss.systems

Proxy Information

Original URL: gemini://mastogem.picasoft.net/thread/113647439282206899
Status Code: Success (20)
Meta: text/gemini
Capsule Response Time: 274.089824 milliseconds
Gemini-to-HTML Time: 1.456785 milliseconds

This content has been proxied by September (ba2dc).