disk: replaces lmdb with book #936
base: develop
Overall looks good, couple of comments:
On my laptop it took 0.320835 seconds to iterate over 119054 events. On ~dozreg-toplud (far from the busiest ship on the network) there are epochs with ~20M events. This means that it would take around a minute to just read the event log in order to boot. Surely the last offset should just be stored in the header, and [...]
`_book_scan_end` will also attempt to truncate all events after a corrupted event was encountered. Is this desirable?
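The "around a minute" estimate in the first comment can be checked with a linear extrapolation. The helper below is only an illustration: the name `scan_seconds` is hypothetical, and it assumes the per-event scan cost stays constant, which ignores event size and cache effects.

```c
/* Linearly extrapolate a measured log-scan time to a larger event count.
 * Assumption: cost per event is constant (ignores event size, caching). */
double scan_seconds(double measured_s, double measured_events,
                    double target_events)
{
  double per_event = measured_s / measured_events;  /* seconds per event */
  return per_event * target_events;
}
```

With the figures quoted above, `scan_seconds(0.320835, 119054, 20e6)` comes to about 54 seconds, which agrees with the reviewer's estimate.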
This PR replaces LMDB with book, a custom append-only file-based event log persistence layer tailored to Urbit's sequential access patterns.
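As a rough illustration of the append-only idea, here is a minimal sketch of a length-framed log written with `pwrite` and read back with `pread`. It is not `book`'s actual layout or API: the framing (an 8-byte length before and after each payload) and the names `sketch_append` and `sketch_read` are assumptions made for this example only.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose pread/pwrite under strict -std modes */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical framing, NOT book's real format: [u64 len][payload][u64 len].
 * Lengths are written in host byte order to keep the sketch short. */

/* Append one record at byte offset `off`.
 * Returns the offset just past the record, or -1 on a failed or short write. */
int64_t sketch_append(int fd, int64_t off, const void* buf, uint64_t len)
{
  if ( pwrite(fd, &len, sizeof(len), (off_t)off) != (ssize_t)sizeof(len) )
    return -1;
  off += (int64_t)sizeof(len);
  if ( len && pwrite(fd, buf, len, (off_t)off) != (ssize_t)len )
    return -1;
  off += (int64_t)len;
  if ( pwrite(fd, &len, sizeof(len), (off_t)off) != (ssize_t)sizeof(len) )
    return -1;
  return off + (int64_t)sizeof(len);
}

/* Read the record starting at `off` into a malloc'd buffer (caller frees).
 * Returns the offset of the next record, or -1 on a short read or when the
 * leading and trailing lengths disagree (a torn or corrupt record). */
int64_t sketch_read(int fd, int64_t off, void** out, uint64_t* len_out)
{
  uint64_t head, tail;
  if ( pread(fd, &head, sizeof(head), (off_t)off) != (ssize_t)sizeof(head) )
    return -1;
  off += (int64_t)sizeof(head);

  void* buf = malloc(head ? head : 1);
  if ( !buf ) return -1;

  if ( head && pread(fd, buf, head, (off_t)off) != (ssize_t)head ) {
    free(buf);
    return -1;
  }
  off += (int64_t)head;

  if (  pread(fd, &tail, sizeof(tail), (off_t)off) != (ssize_t)sizeof(tail)
     || tail != head ) {
    free(buf);
    return -1;
  }
  *out     = buf;
  *len_out = head;
  return off + (int64_t)sizeof(tail);
}
```

Because every call passes an explicit offset, nothing depends on the file descriptor's cursor position. A real implementation would also need to bound the length it trusts before allocating, retry short writes, and fix a byte order.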
Motivation
Unlimited event size
LMDB's general-purpose key-value store features (random access, transactions) are unnecessary overhead for Urbit's strictly append-only event log. With LMDB, reducing log size on disk is impossible (due to B+tree) and maximum value size (event size, in our case) is limited to 4GB or less. This new API provides a simpler, more focused solution.
Faster writes
Additionally, write speeds with `book` will exceed LMDB's, thus removing a potential bottleneck (should we approach it after integrating SKA with the core operating function).
Implementation
Events are stored in `book.log`, with a preceding immutable header.
Events on-disk are written as `deed`s, which are jam buffers sandwiched by heads and tails.
`reed`s are used to represent `deed`s temporarily.
Finally, a `u3_book` structure is used for operations like reading, writing, etc., internally and in the `disk.c` API.
`pread` and `pwrite` syscalls are used for thread-safety and stateless (no cursor position tracking) operation.
Features:
- [...] (`u3_book_walk_*`)
- [...] `libuv` (maintains existing async patterns)
- [...] (`play -f`) replay support
- [...] `pread` and `pwrite`
- [...] (`u3_lmdb_*` functions)
Testing
Tests focus on failure mode, edge case, and recovery scenarios.
Run:
zig build book-test
Compatibility
This PR changes how events are stored in future epochs, but it continues to use LMDB to store global pier metadata in the top-level log directory (`$pier/.urb/log/data.mdb`). This ensures that helpful error messages can be printed even when users attempt to boot their book-style piers with old binaries. Note that the top-level metadata should be considered canonical. Metadata stored within epochs (`meta.bin`, as of this PR) stays consistent with the top level, as far as I can tell.
To-do