Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: sstables virtual SQL table #102604

Closed
6 tasks done
jbowens opened this issue Apr 28, 2023 · 18 comments
Closed
6 tasks done

storage: sstables virtual SQL table #102604

jbowens opened this issue Apr 28, 2023 · 18 comments
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team

Comments

@jbowens
Copy link
Collaborator

jbowens commented Apr 28, 2023

This is a bit of a wishlist item, but it would be nice if we could query the sstables as a virtual SQL table. The various sstable properties could be exposed through this same interface.

Current Progress So Far:

  • add approximateSpanBytes row on cockroach side
  • add WithApproximateSpanBytes filter option on pebble side
  • create set generating function (cockroach)
  • figure out iterator structure to display rows (cockroach)
  • make RPC request to retrieve information from Pebble (cockroach)
  • add sstable filter option (pebble)

Jira issue: CRDB-27555

@jbowens jbowens added A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Apr 28, 2023
@blathers-crl
Copy link

blathers-crl bot commented Apr 28, 2023

Hi @jbowens, please add a C-ategory label to your issue. Check out the label system docs.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@RaduBerinde
Copy link
Member

@jbowens @nicktrav @RahulAggarwal1016 I figured we should brainstorm in more detail. I am pasting a more detailed description from @nicktrav and there are some questions below to think about.

One way of making these summary statistics more readily available is to allow a user to run a builtin SQL command from a SQL shell - e.g. SELECT * FROM crdb_internal.engine_stats WHERE start_key > ‘\x00’ AND end_key < ‘\xff’. The builtin would return SSTable level properties from the tables that match the predicates, presented in a row-oriented format (e.g. one row per node, store, SSTable tuple with a JSON blob of the table stats and properties).

This command would ideally avoid needing to scan through actual data. Instead, the command could leverage the in-memory table stats maintained in Pebble. This would allow for a relatively fast and inexpensive means of taking our a read state on the DB, and iterating over tables that match the various predicates.

Should the API use raw keys or should it use SQL-level values (e.g. table + PK)? In the latter case, it might make more sense to extend SHOW RANGES instead of a new construct. That said, we can use SHOW RANGES WITH KEYS to get the raw keys so a raw key API would be fine too.

I assume we want to return info about relevant SSTables from all replicas? Would it be important to make it easily visible in the result whether an SSTable is from a replica or from the leaseholder?

@jbowens
Copy link
Collaborator Author

jbowens commented May 23, 2023

Should the API use raw keys or should it use SQL-level values (e.g. table + PK)? In the latter case, it might make more sense to extend SHOW RANGES instead of a new construct. That said, we can use SHOW RANGES WITH KEYS to get the raw keys so a raw key API would be fine too.

I vote raw keys; there have been instances where we've wanted better observability into the liveness range's on-disk data, which wouldn't be expressible with the high-level SQL values. I think this is also what some other storage internal funcs use (compact_engine_span).

I assume we want to return info about relevant SSTables from all replicas?

If a query filters on store ID, is it possible to apply that filter before the fan out, or would that be impossible or too tricky through crdb_internal? If it's too tricky, then I think the caller should have to pass a single store ID and operators can always construct queries to query every store.

Would it be important to make it easily visible in the result whether an SSTable is from a replica or from the leaseholder?

I don't think it's too important, as long as we can look up whether the relevant store is leaseholder for a particular range somewhere (I think we have this, but IDK where).


We have an existing DB.SSTables function that we can use and build off. We want to expose sstable properties, but it's a bit unfortunate that this would require loading every table in the database even if we might only want properties for some tables.

@RaduBerinde
Copy link
Member

Thanks! The problem with the virtual table proposal is that we don't really support doing anything smart with range filters (for the key). The only thing we can optimize is strict equality on a column.

I believe it would be more appropriate to have a couple of built-in functions along the lines of:

# Returns stats for all sstables (on all nodes and stores) that intersect this key range.
SELECT crdb_internal.engine_stats('startkey', 'endkey')

# Returns stats for all sstables on the given store that intersect this key range.
SELECT crdb_internal.engine_stats('startkey', 'endkey', store-id)

Two proposals for what these functions output:

  1. multiple rows with one JSON object per row (one for each SSTable).
  2. an ARRAY of JSON objects, one JSON object for each SStable. Note that an array can easily be converted in multiple rows by wrapping the function call in unnest().

I don't think it's too important, as long as we can look up whether the relevant store is leaseholder for a particular range somewhere (I think we have this, but IDK where).

SHOW RANGES presents all this information.

@jbowens
Copy link
Collaborator Author

jbowens commented May 25, 2023

Those two built-in functions make sense.

I don't have much of an opinion on the output format. If we output rows, would it make sense to put some mandatory fields as top-level columns (eg, level, file number, size, etc)?

The DB.SSTables function currently returns the sstables as an array of slices. Do we need an iterator-style interface for SQL to consume, or would SQL end up pulling all the returned sstable metadata into memory at once regardless?

@RaduBerinde
Copy link
Member

If we output rows, would it make sense to put some mandatory fields as top-level columns (eg, level, file number, size, etc)?

I'll need to investigate if it's possible to have a set-generating-function that can be unnested to multiple columns.

The DB.SSTables function currently returns the sstables as an array of slices. Do we need an iterator-style interface for SQL to consume, or would SQL end up pulling all the returned sstable metadata into memory at once regardless?

I think we can start by just pulling all metadata into memory and reassess if performance is an issue. Isn't this metadata already in memory inside Pebble?

@jbowens
Copy link
Collaborator Author

jbowens commented May 25, 2023

I think we can start by just pulling all metadata into memory and reassess if performance is an issue.

Sounds good.

Isn't this metadata already in memory inside Pebble?

All the FileMetadata is, but the 'properties' from the properties block are only in-memory for tables that are currently in the table cache.

@RaduBerinde
Copy link
Member

RaduBerinde commented May 25, 2023

All the FileMetadata is, but the 'properties' from the properties block are only in-memory for tables that are currently in the table cache.

And FileMetadata would be sufficient to filter out based on what intersects a key range, right? It sounds like the best API would be one that allows you to specify a key range of interest. We could extend SSTables like that.

@RaduBerinde
Copy link
Member

I am writing down here some pointers around existing functionality that has some overlap with what we're trying to implement.

Set-generating functions

The proposed crdb_internal.engine_stats is a set-generating function (SRF). It will be useful to look at how other set-generating functions are defined and maybe play around with how they work. Some examples:

  • unnest is a function that takes an array and returns each element as a different row. Example:
demo@127.0.0.1:26257/demoapp/defaultdb> SELECT unnest(ARRAY[1,2,3]);                                                                                                                                                                                                                                                                                    
  unnest
----------
       1
       2
       3
(3 rows)

# Can also be used as if it's a table:
demo@127.0.0.1:26257/demoapp/defaultdb> SELECT * FROM unnest(ARRAY[1,2,3]);                                                                                                                                                                                                                                                                             
  unnest
----------
       1
       2
       3
(3 rows)
  • json_populate_recordset is an example of an SRF that returns "labeled tuples" which can be treated like multiple columns; some examples in the tests
  • pg_get_keywords is another example of a multi-column SRF. Try to run SELECT pg_get_keywords() and SELECT * FROM pg_get_keywords() in cockroach demo.

Issuing RPCs out to other nodes

Our proposed function will need to send an RPC to each node to retrieve the relevant SSTables. An existing function that issues an RPC is crdb_internal.compact_engine_span. The implementation calls out to the StorageEngineClient which is handled on the other side in the stores server.

There will be some additional work to plumb through the list of all nodes. But as a start, we can first implement a version of crdb_internal.engine_stats that takes the node ID and the store ID (just like compact_engine_span does), e.g. SELECT * FROM crdb_internal.engine_stats(node-id, store-id, start-key, end-key).

Pebble side

The stores server code above will call out to Pebble, specifically DB.SSTables. It would be good to add an SSTablesOption that filters the tables on a key span; we can perform the filtering using only the FileMetadata and avoid unnecessary getTableProperties calls.

raggar added a commit to raggar/cockroach that referenced this issue Jun 1, 2023
Created draft RFC for querying SSTable metrics.

Related issue: cockroachdb#102604
Release note: None
raggar added a commit to raggar/cockroach that referenced this issue Jun 7, 2023
Created draft RFC for querying SSTable metrics.

Related issue: cockroachdb#102604
Release note: None
raggar added a commit to raggar/cockroach that referenced this issue Jun 7, 2023
Created draft RFC for querying SSTable metrics.

Related issue: cockroachdb#102604
Release note: None
raggar added a commit to raggar/cockroach that referenced this issue Jun 7, 2023
Created draft RFC for querying SSTable metrics.

Related issue: cockroachdb#102604
Release note: None
@raggar raggar self-assigned this Jun 8, 2023
raggar added a commit to raggar/cockroach that referenced this issue Jun 8, 2023
Created draft RFC for querying SSTable metrics.

Informs: cockroachdb#102604
Release note: None
raggar added a commit to raggar/cockroach that referenced this issue Jun 8, 2023
Created draft RFC for querying SSTable metrics.

Informs: cockroachdb#102604
Release note: None
craig bot pushed a commit that referenced this issue Jun 11, 2023
104222: rfc: query SSTable metrics r=RahulAggarwal1016 a=RahulAggarwal1016

Draft RFC for querying SSTable metrics.

Related issue: #102604
Release note: None

Co-authored-by: Rahul Aggarwal <[email protected]>
@raggar
Copy link
Contributor

raggar commented Jun 28, 2023

Another option is to have a SHOW RANGES variant that automatically obtains all the relevant sst metrics.

Should we go for creating a new variant? Also where in code is the SHOW RANGES command?

raggar added a commit to raggar/pebble that referenced this issue Jun 28, 2023
This change allows an option `WithApproximateSpanBytes` to be included
to a `db.SSTables()` call. This will add a metric `approximateSpanBytes`
which will be the number of bytes that overlap the given key span.

More detail in cockroachdb/cockroach#102604 (comment)

Informs: cockroachdb/cockroach#102604
raggar added a commit to raggar/pebble that referenced this issue Jun 28, 2023
This change allows an option `WithApproximateSpanBytes` to be included
to a `db.SSTables()` call. This will add a metric `approximateSpanBytes`
which will be the number of bytes that overlap the given key span.

More detail in cockroachdb/cockroach#102604 (comment)

Informs: cockroachdb/cockroach#102604
raggar added a commit to raggar/pebble that referenced this issue Jun 28, 2023
This change allows an option `WithApproximateSpanBytes` to be included
to a `db.SSTables()` call. This will add a metric `approximateSpanBytes`
which will be the number of bytes that overlap the given key span.

More detail in cockroachdb/cockroach#102604 (comment)

Informs: cockroachdb/cockroach#102604
raggar added a commit to raggar/pebble that referenced this issue Jun 28, 2023
This change allows an option `WithApproximateSpanBytes` to be included
to a `db.SSTables()` call. This will add a metric `approximateSpanBytes`
which will be the number of bytes that overlap the given key span.

More detail in cockroachdb/cockroach#102604 (comment)

Informs: cockroachdb/cockroach#102604
raggar added a commit to raggar/pebble that referenced this issue Jun 28, 2023
This change allows an option `WithApproximateSpanBytes` to be included
to a `db.SSTables()` call. This will add a metric `approximateSpanBytes`
which will be the number of bytes that overlap the given key span.

More detail in cockroachdb/cockroach#102604 (comment)

Informs: cockroachdb/cockroach#102604
raggar added a commit to raggar/pebble that referenced this issue Jun 29, 2023
This change allows an option `WithApproximateSpanBytes` to be included
to a `db.SSTables()` call. This will add a metric `approximateSpanBytes`
which will be the number of bytes that overlap the given key span.

More detail in cockroachdb/cockroach#102604 (comment)

Informs: cockroachdb/cockroach#102604
raggar added a commit to raggar/pebble that referenced this issue Jun 29, 2023
This change allows an option `WithApproximateSpanBytes` to be included
to a `db.SSTables()` call. This will add a metric `approximateSpanBytes`
which will be the number of bytes that overlap the given key span.

More detail in cockroachdb/cockroach#102604 (comment)

Informs: cockroachdb/cockroach#102604
raggar added a commit to cockroachdb/pebble that referenced this issue Jun 29, 2023
This change allows an option `WithApproximateSpanBytes` to be included
to a `db.SSTables()` call. This will add a metric `approximateSpanBytes`
which will be the number of bytes that overlap the given key span.

More detail in cockroachdb/cockroach#102604 (comment)

Informs: cockroachdb/cockroach#102604
raggar added a commit to raggar/cockroach that referenced this issue Jun 30, 2023
Currently, SSTable-level statistics are difficult to obtain and require
work from the customer and support teams to find appropriate files to
pull from the filesystem to send to us. As a result, the new SRF added
in this pull-request returns SSTable-level statistics that overlap a
provided key span for a certain node and store id.

RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/query_sst_metrics.md

Informs: cockroachdb#102604
Release note: None
raggar added a commit to raggar/cockroach that referenced this issue Jun 30, 2023
Currently, SSTable-level statistics are difficult to obtain and require
work from the customer and support teams to find appropriate files to
pull from the filesystem to send to us. As a result, the new SRF added
in this pull-request returns SSTable-level statistics that overlap a
provided key span for a certain node and store id.

RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/query_sst_metrics.md

Informs: cockroachdb#102604
Release note: None
raggar added a commit to raggar/cockroach that referenced this issue Jul 5, 2023
Currently, SSTable-level statistics are difficult to obtain and require
work from the customer and support teams to find appropriate files to
pull from the filesystem to send to us. As a result, the new SRF added
in this pull-request returns SSTable-level statistics that overlap a
provided key span for a certain node and store id.

RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/query_sst_metrics.md

Informs: cockroachdb#102604
Release note: None
craig bot pushed a commit that referenced this issue Jul 5, 2023
104739: Added new generator function for querying sstables. r=RahulAggarwal1016 a=RahulAggarwal1016

Currently, SSTable-level statistics are difficult to obtain and require
work from the customer and support teams to find appropriate files to
pull from the filesystem to send to us. As a result, the new SRF added
in this pull-request returns SSTable-level statistics that overlap a
provided key span for a certain node and store id.

<img width="1796" alt="image" src="https://github.com/cockroachdb/cockroach/assets/35639417/c88a6e36-e76f-48da-b74a-2220f1e15327">

RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/query_sst_metrics.md

Informs: #102604
Release note: None

Co-authored-by: Rahul Aggarwal <[email protected]>
@jbowens jbowens closed this as completed Jul 19, 2023
raggar added a commit to raggar/cockroach that referenced this issue Jul 20, 2023
This pr has the following fixes for the builtin `crdb_internal.sstable_metrics`

1. Remove the ',' from `node_id`
2. Change `approximate_span_bytes` to be a `uint64` instead of `[]byte`
3. Convert the `MVCCTimeInterval` user property to be human readable.

Informs: cockroachdb#102604
Release-note: None
raggar added a commit to raggar/cockroach that referenced this issue Jul 21, 2023
This pr has the following fixes for the builtin `crdb_internal.sstable_metrics`

1. Remove the ',' from `node_id`
2. Change `approximate_span_bytes` to be a `uint64` instead of `[]byte`
3. Convert the `MVCCTimeInterval` user property to be human readable.

Informs: cockroachdb#102604
Release-note: None
craig bot pushed a commit that referenced this issue Jul 21, 2023
105813: concurrency: re-introduce exclusive lock strength into the lock table r=nvanbenschoten a=arulajmani

First 6 commits from #105474

This patch re-introduces exclusive lock strength into the lock table.
Now, unreplicated locks are considered to have exclusive lock strength
whereas replicated locks have intent lock strength. The testing diff
follows from this mapping.

This distinction between exclusive lock strength and intent lock
strength is not meaningful for serializable transactions by default.
As such, this patch doesn't change anything functionally. However, once
we plumb in the isolation level of a request in all the right places,
this change will be useful. In particular, it'll allow non-locking reads
from read committed transactions to not block on exclusive locks.

Informs #94729

Release note: None

106943: server: move procCh init into Server.serveImpl r=rafiss a=ecwall

Informs #105448

This changes `procCh` to a `sync.WaitGroup` because the channel is never read
from and moves initialization into `Server.serveImpl`.

Also `processCommandsAsync` is changed to `processCommands` and the goroutine
is created inside `Server.serverImpl` to avoid needing a `procCh` parameter.

Release note: None

107303: builtins: `crdb_internal.sstable_metrics` fixes r=RahulAggarwal1016 a=RahulAggarwal1016

This pr has the following fixes for the builtin `crdb_internal.sstable_metrics`

1. Remove the extra `,` from `node_id`
2. Change `approximate_span_bytes` to be a `uint64` instead of `[]byte`

Next Steps:
- Fix `MVCCTimeInterval` display format 

Informs: #102604
Release-note: None

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Rahul Aggarwal <[email protected]>
@jbowens jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team
Projects
No open projects
Archived in project
Development

No branches or pull requests

3 participants