Context
Recently we have been experimenting with and discussing various improvements and optimizations to the WAL/WBL: the NHCB work, the CT work, some planned fixes 1, and many optimizations we want to try (not writing unchanged samples, storing histogram bucketing separately, storing multiple similar samples more efficiently, adding details to segments to allow faster replays/sharding, etc.).
This is all beautiful and amazing, but during the NHCB work we noticed that:
The WAL is not versioned in general. There are some "extensibility" mechanisms, such as record.Type with roughly 247 unused values left, but we have ideas for changes that go beyond a single record. Plus, it's not really effective to end up with 10 Histogram record types just because we iterated 10 times. Versioning alone also does not immediately help with rollout/migration of data.
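To make that constraint concrete, here is a rough sketch of how this extension point works today: each record's payload is preceded by a single type byte, and decoding dispatches on it. The constant values below are illustrative, not the exact ones from tsdb/record:

```go
package main

import "fmt"

// Type is the first byte of every WAL record and is the main built-in
// extension point today. Values here are illustrative placeholders.
type Type uint8

const (
	Series           Type = 1
	Samples          Type = 2
	Tombstones       Type = 3
	Exemplars        Type = 4
	Metadata         Type = 6
	HistogramSamples Type = 7
)

// decode dispatches purely on the type byte: a reader that sees an
// unknown value can only drop the whole record, so every schema
// iteration burns another value from the finite type space.
func decode(rec []byte) error {
	if len(rec) == 0 {
		return fmt.Errorf("empty record")
	}
	switch Type(rec[0]) {
	case Series, Samples, Tombstones, Exemplars, Metadata, HistogramSamples:
		return nil // known record: decode payload rec[1:]
	default:
		return fmt.Errorf("unknown record type %d", rec[0])
	}
}

func main() {
	fmt.Println(decode([]byte{2, 0xAA})) // known: Samples
	fmt.Println(decode([]byte{42}))      // unknown: old readers must skip it entirely
}
```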
Even if the WAL were versioned, the migration options, risks, and patterns are not well documented. How could a contributor optimize/improve a WAL record and understand that it might require a two-fold migration rollout? How can we automate this migration, or allow users to explicitly skip it because they are running in agent mode or are happy with non-revertability? What if we double-write instead of doing a 2-step migration?
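To make the double-write option concrete, a minimal sketch, assuming hypothetical encodeV1/encodeV2 encoders and a Log-style batch append (not the real tsdb/record or wlog API):

```go
package main

import "fmt"

// sample, encodeV1 and encodeV2 are hypothetical stand-ins for the real
// record encoders; the byte payloads here are meaningless placeholders.
type sample struct {
	ref uint64
	t   int64
	v   float64
}

func encodeV1(s []sample) []byte { return []byte{2, byte(len(s))} }  // old record type
func encodeV2(s []sample) []byte { return []byte{42, byte(len(s))} } // new record type

// logger mirrors a WAL-style batch append: records passed to one Log
// call land in the same segment together.
type logger interface{ Log(recs ...[]byte) error }

// doubleWrite emits the same samples in both encodings during the
// migration window: old binaries replay v1 and drop v2 as an unknown
// record type, while new binaries prefer v2 and ignore the v1 duplicate.
func doubleWrite(w logger, samples []sample) error {
	return w.Log(encodeV1(samples), encodeV2(samples))
}

// printLog is a stand-in WAL for demonstration.
type printLog struct{}

func (printLog) Log(recs ...[]byte) error {
	for _, r := range recs {
		fmt.Printf("record type=%d len=%d\n", r[0], len(r))
	}
	return nil
}

func main() {
	_ = doubleWrite(printLog{}, []sample{{ref: 1, t: 1000, v: 2.5}})
}
```

The cost is temporarily writing (and replaying) everything twice; the benefit is that a rollback to an older binary stays safe for the whole window.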
The WAL does not have "unknown fields" or schema mechanisms like other protocols do (e.g. protobuf, capnproto). One exception is the Metadata record, which supports unknown labels on decoding. Unknown fields make no-migration scenarios possible as long as you only add things to the schema. They also increase the overhead of encoding/decoding/storing, but wouldn't that overhead be a good trade-off for the amount of optimizations and saved SWE/Ops time it unlocks?
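For illustration, here is a protobuf-style tag-length-value sketch of what unknown-field support could look like; none of this is the current WAL record layout:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical payload layout: each field is <varint tag><varint len><value>.
// appendField writes one such field; new writers can append new tags freely.
func appendField(b []byte, tag uint64, val []byte) []byte {
	b = binary.AppendUvarint(b, tag)
	b = binary.AppendUvarint(b, uint64(len(val)))
	return append(b, val...)
}

// decode collects the fields it knows and silently skips the rest, so
// records written by a newer version still replay cleanly on an old one.
func decode(b []byte, known map[uint64]func([]byte)) error {
	for len(b) > 0 {
		tag, n := binary.Uvarint(b)
		if n <= 0 {
			return fmt.Errorf("bad tag")
		}
		b = b[n:]
		l, n := binary.Uvarint(b)
		if n <= 0 || uint64(len(b[n:])) < l {
			return fmt.Errorf("bad length")
		}
		val := b[n : n+int(l)]
		b = b[n+int(l):]
		if h, ok := known[tag]; ok {
			h(val)
		} // unknown tag: skipped, not an error
	}
	return nil
}

func main() {
	rec := appendField(nil, 1, []byte("series"))
	rec = appendField(rec, 99, []byte("added-by-future-version"))
	_ = decode(rec, map[uint64]func([]byte){
		1: func(v []byte) { fmt.Printf("field 1: %s\n", v) },
	})
}
```

The tag and length prefixes are exactly the per-field overhead the paragraph above refers to.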
The main motivation here is development velocity: we need to be able to experiment with different optimizations and features to effectively maintain Prometheus across old and new use cases.
Proposal
Add better schema / unknown-fields support, perhaps considering https://capnproto.org/ or https://flatbuffers.dev/. We can start slow by experimenting with capnproto on specific records to see the efficiency impact.
The alternative is some basic size-based logic (e.g. for every record) that lets decoders skip certain things at the end of the record. This limits some options, e.g. the ability to deprecate certain fields in the future (maybe fine, since we have the record type). It requires reinventing the wheel a bit, but it may be easier to change now and cheaper (although it will still likely require buffering a lot more when encoding, in order to know the size); see the sketch below.
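A minimal sketch of that size-based alternative, again under an assumed layout: the encoder prefixes the "known" region with its length, so an old decoder reads what it understands and skips straight past any trailing additions. This also shows the buffering cost, since the length must be written first:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Assumed layout: <varint len><len bytes of v1 fields><optional trailing
// extensions>. The v1 region must be fully buffered before the record
// can be emitted, because its length comes first.
func encode(v1Fields, extensions []byte) []byte {
	rec := binary.AppendUvarint(nil, uint64(len(v1Fields)))
	rec = append(rec, v1Fields...)
	return append(rec, extensions...) // old readers never look past len
}

// decodeOld returns only the region an old binary understands,
// ignoring any extension bytes appended by newer versions.
func decodeOld(rec []byte) ([]byte, error) {
	l, n := binary.Uvarint(rec)
	if n <= 0 || uint64(len(rec[n:])) < l {
		return nil, fmt.Errorf("corrupt record")
	}
	return rec[n : n+int(l)], nil
}

func main() {
	rec := encode([]byte("v1 payload"), []byte("future fields"))
	known, _ := decodeOld(rec)
	fmt.Printf("%s\n", known) // prints "v1 payload"
}
```

Unlike tagged unknown fields, this only supports appending: fields inside the length-prefixed region can never be removed or reordered, which is the deprecation limitation noted above.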
Document the WAL migration strategies a contributor has to think through when proposing schema changes.
WDYT? Thanks @krajorama @bboreham for the initial discussions around this already!