storage: sstables virtual SQL table #102604
Comments
Hi @jbowens, please add a C-ategory label to your issue. Check out the label system docs. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
@jbowens @nicktrav @RahulAggarwal1016 I figured we should brainstorm in more detail. I am pasting a more detailed description from @nicktrav and there are some questions below to think about.
Should the API use raw keys or should it use SQL-level values (e.g. table + PK)? In the latter case, it might make more sense to extend
I assume we want to return info about relevant SSTables from all replicas?
Would it be important to make it easily visible in the result whether an SSTable is from a replica or from the leaseholder?
I vote raw keys; there have been instances where we've wanted better observability into the liveness range's on-disk data, which wouldn't be expressible with the high-level SQL values. I think this is also what some other storage internal funcs use (
If a query filters on store ID, is it possible to apply that filter before the fan out, or would that be impossible or too tricky through
I don't think it's too important, as long as we can look up whether the relevant store is leaseholder for a particular range somewhere (I think we have this, but IDK where). We have an existing DB.SSTables function that we can use and build off. We want to expose sstable properties, but it's a bit unfortunate that this would require loading every table in the database even if we might only want properties for some tables. |
Thanks! The problem with the virtual table proposal is that we don't really support doing anything smart with range filters (for the key). The only thing we can optimize is strict equality on a column. I believe it would be more appropriate to have a couple of built-in functions along the lines of:
Two proposals for what these functions output:
Those two built-in functions make sense. I don't have much of an opinion on the output format. If we output rows, would it make sense to put some mandatory fields as top-level columns (eg, level, file number, size, etc)? The |
I'll need to investigate if it's possible to have a set-generating-function that can be unnested to multiple columns.
I think we can start by just pulling all metadata into memory and reassess if performance is an issue. Isn't this metadata already in memory inside Pebble? |
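As a rough illustration of the "pull all metadata into memory, then filter" approach, here is a minimal Go sketch. The `sstableInfo` type, its fields, and the `filterBySpan` helper are hypothetical stand-ins invented for this example; they are not Pebble's actual types or API. The sketch assumes inclusive file bounds and a half-open query span `[start, end)`.

```go
package main

import (
	"bytes"
	"fmt"
)

// sstableInfo is a hypothetical, simplified stand-in for per-file metadata.
// It is not Pebble's actual type.
type sstableInfo struct {
	Level    int
	FileNum  uint64
	Size     uint64
	Smallest []byte // inclusive lower bound of the file's keys
	Largest  []byte // inclusive upper bound of the file's keys
}

// overlaps reports whether the file's [Smallest, Largest] bounds intersect
// the query span [start, end).
func overlaps(f sstableInfo, start, end []byte) bool {
	return bytes.Compare(f.Smallest, end) < 0 && bytes.Compare(f.Largest, start) >= 0
}

// filterBySpan scans the in-memory metadata and keeps only the files whose
// bounds overlap [start, end).
func filterBySpan(files []sstableInfo, start, end []byte) []sstableInfo {
	var out []sstableInfo
	for _, f := range files {
		if overlaps(f, start, end) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []sstableInfo{
		{Level: 0, FileNum: 1, Size: 100, Smallest: []byte("a"), Largest: []byte("c")},
		{Level: 6, FileNum: 2, Size: 200, Smallest: []byte("d"), Largest: []byte("f")},
		{Level: 6, FileNum: 3, Size: 300, Smallest: []byte("g"), Largest: []byte("k")},
	}
	// Only files 2 and 3 overlap the span [e, h).
	for _, f := range filterBySpan(files, []byte("e"), []byte("h")) {
		fmt.Printf("L%d %06d size=%d\n", f.Level, f.FileNum, f.Size)
	}
}
```

A linear scan like this is the simplest thing that could work; whether it is fast enough depends on how many files a store holds, which is the performance question raised above.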
Sounds good.
All the |
And |
I am writing down here some pointers around existing functionality that has some overlap with what we're trying to implement.
Set-generating functions
The proposed
demo@127.0.0.1:26257/demoapp/defaultdb> SELECT unnest(ARRAY[1,2,3]);
unnest
----------
1
2
3
(3 rows)
# Can also be used as if it's a table:
demo@127.0.0.1:26257/demoapp/defaultdb> SELECT * FROM unnest(ARRAY[1,2,3]);
unnest
----------
1
2
3
(3 rows)
Issuing RPCs out to other nodes
Our proposed function will need to send an RPC to each node to retrieve the relevant SSTables. An existing function that issues an RPC is crdb_internal.compact_engine_span. The implementation calls out to the StorageEngineClient which is handled on the other side in the stores server. There will be some additional work to plumb through the list of all nodes. But as a start, we can first implement a version of
Pebble side
The stores server code above will call out to Pebble, specifically DB.SSTables. It would be good to add an
Created draft RFC for querying SSTable metrics. Related issue: cockroachdb#102604 Release note: None
Created draft RFC for querying SSTable metrics. Informs: cockroachdb#102604 Release note: None
104222: rfc: query SSTable metrics r=RahulAggarwal1016 a=RahulAggarwal1016 Draft RFC for querying SSTable metrics. Related issue: #102604 Release note: None Co-authored-by: Rahul Aggarwal <[email protected]>
Should we go for creating a new variant? Also where in code is the |
This change adds a `WithApproximateSpanBytes` option that can be passed to a `db.SSTables()` call. When the option is set, each result includes an `approximateSpanBytes` metric: the approximate number of bytes in the table that overlap the given key span. More detail in cockroachdb/cockroach#102604 (comment) Informs: cockroachdb/cockroach#102604
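To illustrate the general shape of an opt-in metric like this, here is a small Go sketch using the functional-options pattern. It is a toy model, not Pebble's code: `table`, `entry`, `tableInfo`, `listTables`, and the lowercase `withApproximateSpanBytes` are hypothetical, and passing the span directly to the option is an assumption made to keep the example self-contained. The toy sums exact entry sizes, whereas a real implementation would presumably estimate.

```go
package main

import "fmt"

// entry and table are toy stand-ins for an sstable's contents.
type entry struct {
	Key  string
	Size uint64
}

type table struct {
	FileNum uint64
	Entries []entry
}

// tableInfo is the per-file result. ApproximateSpanBytes stays zero unless
// the caller opted in.
type tableInfo struct {
	FileNum              uint64
	ApproximateSpanBytes uint64
}

// listOptions collects the settings applied by listOption values.
type listOptions struct {
	spanBytes  bool
	start, end string // query span is [start, end)
}

type listOption func(*listOptions)

// withApproximateSpanBytes borrows its name from the change description;
// taking the span as arguments here is an assumption for illustration.
func withApproximateSpanBytes(start, end string) listOption {
	return func(o *listOptions) {
		o.spanBytes = true
		o.start, o.end = start, end
	}
}

// listTables returns one tableInfo per table, computing the span metric only
// when the option was passed.
func listTables(tables []table, opts ...listOption) []tableInfo {
	var o listOptions
	for _, opt := range opts {
		opt(&o)
	}
	infos := make([]tableInfo, 0, len(tables))
	for _, t := range tables {
		info := tableInfo{FileNum: t.FileNum}
		if o.spanBytes {
			for _, e := range t.Entries {
				if e.Key >= o.start && e.Key < o.end {
					info.ApproximateSpanBytes += e.Size
				}
			}
		}
		infos = append(infos, info)
	}
	return infos
}

func main() {
	tables := []table{
		{FileNum: 1, Entries: []entry{{"a", 10}, {"b", 20}, {"c", 30}}},
		{FileNum: 2, Entries: []entry{{"x", 40}}},
	}
	for _, info := range listTables(tables, withApproximateSpanBytes("b", "d")) {
		fmt.Printf("%06d approximateSpanBytes=%d\n", info.FileNum, info.ApproximateSpanBytes)
	}
}
```

Making the metric opt-in means callers who only want the file listing do not pay for the extra per-file work.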
Currently, SSTable-level statistics are difficult to obtain and require work from the customer and support teams to find appropriate files to pull from the filesystem to send to us. As a result, the new SRF added in this pull-request returns SSTable-level statistics that overlap a provided key span for a certain node and store id. RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/query_sst_metrics.md Informs: cockroachdb#102604 Release note: None
104739: Added new generator function for querying sstables. r=RahulAggarwal1016 a=RahulAggarwal1016 Currently, SSTable-level statistics are difficult to obtain and require work from the customer and support teams to find appropriate files to pull from the filesystem to send to us. As a result, the new SRF added in this pull-request returns SSTable-level statistics that overlap a provided key span for a certain node and store id. RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/query_sst_metrics.md Informs: #102604 Release note: None Co-authored-by: Rahul Aggarwal <[email protected]>
This pr has the following fixes for the builtin `crdb_internal.sstable_metrics` 1. Remove the ',' from `node_id` 2. Change `approximate_span_bytes` to be a `uint64` instead of `[]byte` 3. Convert the `MVCCTimeInterval` user property to be human readable. Informs: cockroachdb#102604 Release-note: None
105813: concurrency: re-introduce exclusive lock strength into the lock table r=nvanbenschoten a=arulajmani First 6 commits from #105474 This patch re-introduces exclusive lock strength into the lock table. Now, unreplicated locks are considered to have exclusive lock strength whereas replicated locks have intent lock strength. The testing diff follows from this mapping. This distinction between exclusive lock strength and intent lock strength is not meaningful for serializable transactions by default. As such, this patch doesn't change anything functionally. However, once we plumb in the isolation level of a request in all the right places, this change will be useful. In particular, it'll allow non-locking reads from read committed transactions to not block on exclusive locks. Informs #94729 Release note: None

106943: server: move procCh init into Server.serveImpl r=rafiss a=ecwall Informs #105448 This changes `procCh` to a `sync.WaitGroup` because the channel is never read from and moves initialization into `Server.serveImpl`. Also `processCommandsAsync` is changed to `processCommands` and the goroutine is created inside `Server.serverImpl` to avoid needing a `procCh` parameter. Release note: None

107303: builtins: `crdb_internal.sstable_metrics` fixes r=RahulAggarwal1016 a=RahulAggarwal1016 This pr has the following fixes for the builtin `crdb_internal.sstable_metrics` 1. Remove the extra `,` from `node_id` 2. Change `approximate_span_bytes` to be a `uint64` instead of `[]byte` Next Steps: - Fix `MVCCTimeInterval` display format Informs: #102604 Release-note: None

Co-authored-by: Arul Ajmani <[email protected]> Co-authored-by: Evan Wall <[email protected]> Co-authored-by: Rahul Aggarwal <[email protected]>
This is a bit of a wishlist item, but it would be nice if we could query the sstables as a virtual SQL table. The various sstable properties could be exposed through this same interface.
Current Progress So Far:
- approximateSpanBytes row on cockroach side
- WithApproximateSpanBytes filter option on pebble side
Jira issue: CRDB-27555