- Feature Name: sstable_metrics
- Status: draft
- Start Date: 2023-05-31
- Authors: Rahul Aggarwal
- Cockroach Issue: #102604
Storage Team engineers are often involved in support escalations that require inspecting SSTable-level statistics. Today these statistics are difficult to obtain. The customer and the support team must locate the appropriate files on the filesystem and send them to us. This RFC proposes a way for operators to query SSTable metrics directly, which is useful when debugging storage issues that affect a specific key range. The feature will be implemented as a set-returning function (SRF) and used as follows:
```sql
SELECT * FROM crdb_internal.engine_stats('start-key', 'end-key');
```

or

```sql
SELECT * FROM crdb_internal.engine_stats(node-id, store-id, 'start-key', 'end-key');
```
Audience: CockroachDB team members
The proposed solution is a new SRF added to the existing built-in generators. The SRF will have two overloads, one for each variant above. The four-argument variant talks only to the specified node and store. The two-argument variant must send an RPC to every node to retrieve all relevant SSTables. This can be achieved by calling Dial for each node inside a function exposed through evalCtx. That function will call out to the StorageEngineClient, and the request will be handled in the stores server (see the Pebble side below).
The SRF will be structured similarly to json_populate_record, i.e., using a generator that returns one output row at a time.
Pebble will be used inside the stores server code, specifically through DB.SSTables. When calling DB.SSTables, we will need to specify an SSTableOption: a function that filters SSTables to the key range specified by the user. Because each file's FileMetadata records its smallest and largest keys, the filtering can be performed on the FileMetadata alone. This lets us skip unnecessary getTableProperties calls for files outside the range, which matters because those calls can read metadata from storage and affect caches.
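The key-range filter only needs each file's bounds. A file whose keys span [Smallest, Largest] (inclusive) overlaps a half-open user range [start, end) when Largest >= start and Smallest < end. The sketch below models this as a functional option. `fileMetadata`, `sstablesOptions`, `WithKeyRangeFilter` and `selectTables` are illustrative names, not Pebble's exact API. Treating `end` as exclusive and `nil` as unbounded are assumptions.

```go
package main

import "bytes"

// fileMetadata is a hypothetical, trimmed-down stand-in for Pebble's
// FileMetadata: a file number plus inclusive user-key bounds.
type fileMetadata struct {
	FileNum  uint64
	Smallest []byte
	Largest  []byte
}

// sstablesOptions collects the effect of the functional options below.
type sstablesOptions struct {
	filter func(m *fileMetadata) bool
}

// SSTableOption is an illustrative functional option, as described above.
type SSTableOption func(*sstablesOptions)

// WithKeyRangeFilter keeps files overlapping the half-open range [start, end).
// A nil start compares lowest, and a nil end means "no upper bound".
func WithKeyRangeFilter(start, end []byte) SSTableOption {
	return func(o *sstablesOptions) {
		o.filter = func(m *fileMetadata) bool {
			// The file lies entirely before start.
			if bytes.Compare(m.Largest, start) < 0 {
				return false
			}
			// The file begins at or after the exclusive end.
			if end != nil && bytes.Compare(m.Smallest, end) >= 0 {
				return false
			}
			return true
		}
	}
}

// selectTables decides which files to report using metadata only. In the
// real code, table properties would be loaded just for the files kept here.
func selectTables(files []fileMetadata, opts ...SSTableOption) []uint64 {
	var o sstablesOptions
	for _, opt := range opts {
		opt(&o)
	}
	var out []uint64
	for i := range files {
		if o.filter != nil && !o.filter(&files[i]) {
			continue
		}
		out = append(out, files[i].FileNum)
	}
	return out
}
```

For example, with files spanning [a, c], [d, f] and [g, k], the range [c, g) keeps the first two files. The first is kept because its inclusive Largest key equals start. The third is dropped because it begins exactly at the exclusive end.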
- Define the required functions with stub implementations
- Create the new SSTableOption that filters by key range
- Implement the generator function for single-node queries
- Add logic to support multi-node queries
Audience: all participants in the RFC review.
- How will we get the list of all nodes that need to receive an RPC when the user does not specify a node-id?
- What columns do we want to display?
  - node_id
  - store_id
  - file_num
  - json_props
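One way to populate the candidate columns is to serialize each SSTable's properties into a single JSON value. With that approach, new properties do not require schema changes. The sketch below assumes this design. `engineStatsRow`, `makeRow` and the property names in the example are hypothetical, and a plain `map[string]any` stands in for Pebble's table properties.

```go
package main

import (
	"encoding/json"
)

// engineStatsRow mirrors the candidate output columns (hypothetical).
type engineStatsRow struct {
	NodeID    int32  // node_id
	StoreID   int32  // store_id
	FileNum   uint64 // file_num
	JSONProps string // json_props: serialized table properties
}

// makeRow serializes an SSTable's properties into the json_props column.
// encoding/json sorts map keys, so the output is deterministic.
func makeRow(nodeID, storeID int32, fileNum uint64, props map[string]any) (engineStatsRow, error) {
	b, err := json.Marshal(props)
	if err != nil {
		return engineStatsRow{}, err
	}
	return engineStatsRow{NodeID: nodeID, StoreID: storeID, FileNum: fileNum, JSONProps: string(b)}, nil
}
```

For example, `makeRow(1, 2, 42, map[string]any{"num_entries": 10, "data_size": 4096})` yields `json_props` equal to `{"data_size":4096,"num_entries":10}`.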