forked from cockroachdb/cockroach
25789: ts: Implement new on-disk format for time series r=mrtracy a=mrtracy

The first commit is cockroachdb#25587 and can be ignored for this PR.

Implement a new *columnar* on-disk format for time series samples, replacing the previous row-like format.

Previously, each slab of time series data contained a collection of "samples", where each sample was a message containing a number of fields: the timestamp (offset) and a number of value fields intended to contain "rolled-up" data for long-term, low-resolution storage.

The new format is columnar: the top-level slab contains multiple parallel arrays, with each array containing the ordered values for the individual samples. This gives us all of the advantages we were missing in the previous layout:

+ High-resolution data can leave the aggregate fields completely empty; only the "last" aggregate field needs to be populated. This will reduce the in-memory size of each sample from the full Sample structure down to an int32 and a float64. This also means we can add several more aggregates without inflating the size of each sample.
+ The columnar format takes advantage of protobuf repeated field packing, which should save considerable space for the encoded on-disk format.
+ When querying, we can iterate directly over the data fields we need, which may improve data locality (and thus cache-miss performance) or allow us to more aggressively release memory for data that is not needed.

This commit does the following:

+ Define new columnar fields in roachpb/internal.proto.
+ Write new C++ merge logic for the columnar fields. This does not replace the existing row logic; it lives alongside it.
+ Create an upgrade path in the merge logic for row-formatted slabs; this occurs whenever columnar data is merged into an existing key with row-formatted data. This ensures that any individual slab of data will contain only row-formatted samples or column-formatted samples, but never both.

Release note: none.

Co-authored-by: Matt Tracy <[email protected]>
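
The row-to-column upgrade described above can be illustrated with a minimal Go sketch. This is not the commit's code: the actual merge logic is written in C++ and the real fields live in roachpb/internal.proto. The names `RowSample`, `Slab`, `upgradeToColumnar` and `MergeColumnar` are hypothetical, only the offset and "last" columns are modelled (as for high-resolution data), and the rule that an incoming sample replaces an existing one at the same offset is an assumption made for illustration.

```go
package main

// RowSample is a hypothetical stand-in for the old row-like sample:
// every sample carries every field, including unused roll-up aggregates.
type RowSample struct {
	Offset int32   // timestamp offset within the slab
	Last   float64 // most recent value
	Sum    float64 // example roll-up field, unused for high-resolution data
	Max    float64 // example roll-up field, unused for high-resolution data
}

// Slab is a hypothetical container that holds either row-formatted
// samples or columnar parallel arrays, but never both at once.
type Slab struct {
	Samples []RowSample // row format
	Offset  []int32     // columnar format: one entry per sample
	Last    []float64   // columnar format: parallel to Offset
}

// upgradeToColumnar converts a row-formatted slab into parallel arrays.
// A columnar slab is returned unchanged.
func upgradeToColumnar(s Slab) Slab {
	if len(s.Samples) == 0 {
		return s
	}
	var out Slab
	for _, r := range s.Samples {
		out.Offset = append(out.Offset, r.Offset)
		out.Last = append(out.Last, r.Last)
	}
	return out
}

// MergeColumnar merges columnar data into an existing slab. If the existing
// slab is row-formatted it is upgraded first, so the result is purely
// columnar. Both inputs are assumed to be sorted by offset; on a duplicate
// offset the incoming sample wins (an assumption of this sketch).
func MergeColumnar(existing, incoming Slab) Slab {
	existing = upgradeToColumnar(existing)
	var out Slab
	i, j := 0, 0
	for i < len(existing.Offset) || j < len(incoming.Offset) {
		takeExisting := j >= len(incoming.Offset) ||
			(i < len(existing.Offset) && existing.Offset[i] < incoming.Offset[j])
		if takeExisting {
			out.Offset = append(out.Offset, existing.Offset[i])
			out.Last = append(out.Last, existing.Last[i])
			i++
			continue
		}
		// Drop the existing sample when the incoming one has the same offset.
		if i < len(existing.Offset) && existing.Offset[i] == incoming.Offset[j] {
			i++
		}
		out.Offset = append(out.Offset, incoming.Offset[j])
		out.Last = append(out.Last, incoming.Last[j])
		j++
	}
	return out
}
```

In this sketch a high-resolution sample costs one int32 and one float64 in the columnar form, and a reader that needs only the offsets can walk `Offset` without touching the value column.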
Showing 12 changed files with 2,161 additions and 151 deletions.