Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: leasing of virtual table privileges adds to latency on first use #93182

Closed
ajwerner opened this issue Dec 7, 2022 · 2 comments · Fixed by #93557
Closed

sql: leasing of virtual table privileges adds to latency on first use #93182

ajwerner opened this issue Dec 7, 2022 · 2 comments · Fixed by #93557
Labels
A-multiregion Related to multi-region A-sql-privileges SQL privilege handling and permission checks. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Dec 7, 2022

Describe the problem
In 22.2 we added synthetic privileges which are used for things like external connections, and, critically, virtual tables. These privileges need to be checked when iterating through various system catalog schemas (pg_catalog, information schema, SHOW TABLES, ...).

We populate these caches lazily.

Describe the solution you'd like
We could eagerly launch routines to populate these caches for the different virtual tables concurrently rather than serially.

Describe alternatives you've considered
We could do some sort of special initialization to preload data. These tables are almost always empty and are small when populated. We could scan it all and avoid the cost of all the parallel work.

Epic: CRDB-18596

Jira issue: CRDB-22207

@ajwerner ajwerner added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-privileges SQL privilege handling and permission checks. A-multiregion Related to multi-region labels Dec 7, 2022
@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Dec 7, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-schema-deprecated Use T-sql-foundations instead and removed T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Dec 7, 2022
@ajwerner
Copy link
Contributor Author

We could eagerly launch routines to populate these caches for the different virtual tables concurrently rather than serially.

This turns out to be a less-than-ideal approach because there are 290 virtual tables. It's a lot of work to fetch them all in parallel if you go "the long way" by executing queries against each of them. It's hard to get at the cache directly to have it do the cheaper thing.

@ajwerner
Copy link
Contributor Author

Ideally we could load all of the data for all the virtual tables in one scan.

ajwerner added a commit to ajwerner/cockroach that referenced this issue Dec 12, 2022
This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when
a user attempts to read a virtual table like pg_class or run `SHOW TABLES` it
will force the privileges to be determined for each virtual table (of which
there are 290 at the time of writing). This sequential process can be somewhat
slow in a single region cluster and will be *very* slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by kicking off goroutines
to fetch these privileges eagerly at server startup. This is okay, but is
pretty expensive. Each privilege fetch itself internally runs a query which
means we're running 290 statements in 290 transactions to fetch this
information.

Ideally we'd do something better, like pre-warm the cache with one scan against
the system table.

Partially addresses cockroachdb#93182.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup, with some parallelism, to mitigate the latency when accessing
pg_catalog right after the server boots up.
rafiss pushed a commit to rafiss/cockroach that referenced this issue Dec 12, 2022
This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when
a user attempts to read a virtual table like pg_class or run `SHOW TABLES` it
will force the privileges to be determined for each virtual table (of which
there are 290 at the time of writing). This sequential process can be somewhat
slow in a single region cluster and will be *very* slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by scanning the
table eagerly during server startup.

Partially addresses cockroachdb#93182.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup to mitigate the latency when accessing pg_catalog right after the
server boots up.
rafiss pushed a commit to rafiss/cockroach that referenced this issue Dec 12, 2022
fixes cockroachdb#93182

This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when
a user attempts to read a virtual table like pg_class or run `SHOW TABLES` it
will force the privileges to be determined for each virtual table (of which
there are 290 at the time of writing). This sequential process can be somewhat
slow in a single region cluster and will be *very* slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by scanning the
table eagerly during server startup.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup to mitigate the latency when accessing pg_catalog right after the
server boots up.
rafiss pushed a commit to rafiss/cockroach that referenced this issue Dec 12, 2022
fixes cockroachdb#93182

This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when
a user attempts to read a virtual table like pg_class or run `SHOW TABLES` it
will force the privileges to be determined for each virtual table (of which
there are 290 at the time of writing). This sequential process can be somewhat
slow in a single region cluster and will be *very* slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by scanning the
table eagerly during server startup.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup to mitigate the latency when accessing pg_catalog right after the
server boots up.
rafiss pushed a commit to rafiss/cockroach that referenced this issue Dec 13, 2022
fixes cockroachdb#93182

This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when
a user attempts to read a virtual table like pg_class or run `SHOW TABLES` it
will force the privileges to be determined for each virtual table (of which
there are 290 at the time of writing). This sequential process can be somewhat
slow in a single region cluster and will be *very* slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by scanning the
table eagerly during server startup.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup to mitigate the latency when accessing pg_catalog right after the
server boots up.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Dec 14, 2022
Fixes cockroachdb#93182

This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when a
user attempts to read a virtual table like pg_class or run SHOW TABLES it will
force the privileges to be determined for each virtual table (of which there
are 290 at the time of writing). This sequential process can be somewhat slow
in a single region cluster and will be very slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by scanning the table
eagerly during server startup.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup to mitigate the latency when accessing pg_catalog right after the
server boots up.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Dec 14, 2022
Fixes cockroachdb#93182

This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when a
user attempts to read a virtual table like pg_class or run SHOW TABLES it will
force the privileges to be determined for each virtual table (of which there
are 290 at the time of writing). This sequential process can be somewhat slow
in a single region cluster and will be very slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by scanning the table
eagerly during server startup.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup to mitigate the latency when accessing pg_catalog right after the
server boots up.
craig bot pushed a commit that referenced this issue Dec 14, 2022
93153: rttanalysis: don't count leasing the database desc r=andreimatei a=andreimatei

A bunch of rtt-analysis tests were counting a request for leasing the database descriptor. This is not interesting. This patch makes the test framework lease it first through a "USE" statement.

The number of KV requests required for leasing is currently mis-counted. We count 1, but in reality it's 4. A different patch will correct the miscounting that, at which point that would be too significant for the tests.

Release note: None
Epic: None

93325: multitenant: retain range splits after TRUNCATE for secondary tenants r=knz a=ecwall

Fixes #69499
Fixes #82944

Existing split points are preserved after a TRUNCATE statement is executed by a secondary tenant.

Release note: None

93354: tracing: disallow children of sterile span with different Tracer r=andreimatei a=andreimatei

Before this patch, creating a "child" of a sterile span with a different Tracer than the one used to create the sterile span was tolerated - on the argument that sterile spans don't actually get children (the would-be child span is created as a root), so the arguments for not allowing a children to be created with different tracers don't apply. At the same time, creating a child of a noop span with a different Tracer than the noop span's Tracer was documented to not be allowed. In practice, it was, because the code was confused [1].

This patch disallows creating children of sterile spans with a different tracer, for consistency with all the other spans. The patch also makes it a panic for the children of noop spans to be created with a different Tracer.

This is all meant as a cleanup / code simplification.

[1] WithParent(sp) meant to treat sterile spans differently than noop spans but it was using sp.IsSterile(), which returns true for both. As such, it was unintentionally returning an empty parent option. startSpanGeneric() meant to check the tracer of parent noop spans, but it was failing to actually do so because it was going through the opts.Parent.empty().

Release note: None
Epic: None

93545: sql: make SHOW RANGES FOR TABLE include all indexes r=ajwerner a=knz

Informs #80906.
Fixes #93546.
Fixes #82995.

Release note (backward-incompatible change): `SHOW RANGES FOR TABLE`
now includes rows for all indexes that support the table. Prior to
this change, `SHOW RANGES FOR TABLE foo` was an alias for `SHOW RANGES
FOR INDEX foo@primary`. This was causing confusion, as it would miss
data for secondary indexes. It is still possible to filter to just the
primary index using `SHOW RANGES FOR INDEX foo@primary`.

The statement output now also includes the index name.

93557: syntheticprivilegecache: scan all privileges at startup  r=ajwerner a=ajwerner

#### syntheticprivilegecache: move caching logic out of sql
This is a pure refactor to move the logic for caching synthetic privileges
from the sql package. This will make it easier to add features later.

#### syntheticprivilegecache: scan all privileges at startup 


Fixes #93182

This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when a
user attempts to read a virtual table like pg_class or run SHOW TABLES it will
force the privileges to be determined for each virtual table (of which there
are 290 at the time of writing). This sequential process can be somewhat slow
in a single region cluster and will be very slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by scanning the table
eagerly during server startup.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup to mitigate the latency when accessing pg_catalog right after the
server boots up.

93563: pgwire: handle decoding Geometry/Geography in binary r=rafiss a=otan

Resolves #81066
Resolves #93352

Release note (bug fix): Previously, CockroachDB would error when receiving Geometry/Geography using binary parameters. This is now resolved.

93618: cli,cliccl: move some mt commands to cliccl r=ajwerner a=ajwerner

Part of #91714

Epic: none

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
@craig craig bot closed this as completed in 268d0f4 Dec 14, 2022
ajwerner added a commit to ajwerner/cockroach that referenced this issue Dec 15, 2022
Fixes cockroachdb#93182

This commit attempts to alleviate the pain caused by synthetic virtual table
privileges introduced in 22.2. We need to fetch privileges for virtual tables
to determine whether the user has access. This is done lazily. However, when a
user attempts to read a virtual table like pg_class or run SHOW TABLES it will
force the privileges to be determined for each virtual table (of which there
are 290 at the time of writing). This sequential process can be somewhat slow
in a single region cluster and will be very slow in an MR cluster.

This patch attempts to somewhat alleviate this pain by scanning the table
eagerly during server startup.

Release note (performance improvement): In 22.2 we introduced privileges on
virtual tables (system catalogs like pg_catalog, information_schema, and
crdb_internal). A problem with this new feature is that we now must fetch those
privileges into a cache before we can use those tables or determine their
visibility in other system catalogs. This process used to occur on-demand, when
the privilege was needed. Now we'll fetch these privileges eagerly during
startup to mitigate the latency when accessing pg_catalog right after the
server boots up.
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multiregion Related to multi-region A-sql-privileges SQL privilege handling and permission checks. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
1 participant