
Save index recommendations per statement fingerprint id #83782

Closed
maryliag opened this issue Jul 4, 2022 · 1 comment · Fixed by #85343
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments


maryliag commented Jul 4, 2022

Collect index recommendations per fingerprint id.

We don't want to compute recommendations for every execution, so only calculate one if no recommendation has been calculated in-memory so far. Steps:

  1. Statement is executed
  2. Check if there is a recommendation in-memory (crdb_internal.cluster_statement_statistics)
    2a. If there isn't one, set a new flag to true indicating it should be calculated, update the execution path to check this flag, then calculate the recommendation and save it in-memory
    2b. If there is one, keep the flag as false
  3. When the in-memory data is flushed, replace the persisted value with the latest value from memory

This solution keeps the recommendations up to date without paying the cost of recalculating on every execution.

Challenge: since the recommendation is calculated the first time a statement is added to the in-memory store, there will be a spike of calculations right after a flush, when everything in memory is new again. We need a way to spread the calculation over the 10-minute flush period.
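The steps above can be sketched roughly as follows (a minimal Python illustration with hypothetical names, not the actual CockroachDB Go code):

```python
# Hypothetical sketch of the "compute only when not already in memory" flow.
class StatementStatsCache:
    def __init__(self, recommend):
        self.recommend = recommend  # callback that computes a recommendation
        self.in_memory = {}         # fingerprint id -> recommendation
        self.persisted = {}         # stand-in for system.statement_statistics

    def record_execution(self, fingerprint_id, stmt):
        # Step 2: only compute if nothing is cached in memory yet (step 2a);
        # an existing entry is left untouched (step 2b).
        if fingerprint_id not in self.in_memory:
            self.in_memory[fingerprint_id] = self.recommend(stmt)

    def flush(self):
        # Step 3: replace persisted values with the latest in-memory ones,
        # then clear memory (which is what causes the post-flush spike).
        self.persisted.update(self.in_memory)
        self.in_memory.clear()

calls = []
cache = StatementStatsCache(lambda stmt: calls.append(stmt) or "CREATE INDEX ...")
cache.record_execution(1, "SELECT * FROM t WHERE i = 1")
cache.record_execution(1, "SELECT * FROM t WHERE i = 2")
print(len(calls))  # prints 1: the recommendation was computed only once
```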

Jira issue: CRDB-17261

@maryliag maryliag added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 4, 2022
@maryliag maryliag self-assigned this Jul 4, 2022
maryliag added a commit to maryliag/cockroach that referenced this issue Jul 12, 2022
As part of cockroachdb#83782, we will generate index recommendations
per statement fingerprint id. As a safety measure, this creates the
cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`
that can be disabled in case there are any issues with
performance.

Release note (sql change): Creation of the cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`,
which can be disabled if index recommendation generation is
causing performance issues.
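The gating this setting provides behaves roughly like the following (a hedged Python sketch with hypothetical names, purely to illustrate the guard, not the actual CockroachDB implementation):

```python
# Hypothetical sketch: a boolean cluster setting that gates recommendation
# collection, defaulting to enabled.
cluster_settings = {
    "sql.metrics.statement_details.index_recommendation_collection.enabled": True,
}

def maybe_collect_recommendation(stmt, generate):
    # Skip the potentially expensive generation entirely when disabled.
    if not cluster_settings[
        "sql.metrics.statement_details.index_recommendation_collection.enabled"
    ]:
        return None
    return generate(stmt)

print(maybe_collect_recommendation("SELECT 1", lambda s: "CREATE INDEX ..."))
# prints: CREATE INDEX ...
```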
craig bot pushed a commit that referenced this issue Jul 13, 2022
84282: sql: new cluster setting for index recommendation r=maryliag a=maryliag

As part of #83782, we will generate index recommendations
per statement fingerprint id. As a safety measure, this creates the
cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`
that can be disabled in case there are any issues with
performance.

Release note (sql change): Creation of the cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`,
which can be disabled if index recommendation generation is
causing performance issues.

84316: authors: add Jake to authors r=j82w a=j82w

Release note: None

84367: kvserver: Fix missing increment in SSTSnapshotStorage refcount r=erikgrinaker a=itsbilal

Previously, in rare cases where we had two SSTSnapshotStorageScratch
instances for a given range, we would not have incremented the refcount
for the second scratch creation, leading to an inconsistent-refcount
panic down the line. This change fixes that and adds a test exercising
the concurrent case.

Bug popped up in #84100.

Release note: None.

Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
@kevin-v-ngo

It would be amazing if this recommendation could be included in our telemetry pipeline and in our slow execution/outlier insights (versus at the fingerprint level). I'm assuming our approach of storing this information in statement_statistics would enable us to easily add the information at the execution level and to our internal logging channel?

FYI @THardy98, @matthewtodd

maryliag added a commit to maryliag/cockroach that referenced this issue Jul 19, 2022
This commit adds a new column index_recommendations
STRING[] to:
crdb_internal.node_statement_statistics
crdb_internal.cluster_statement_statistics
system.statement_statistics
crdb_internal.statement_statistics

Part of cockroachdb#83782

Release note (sql change): Adding new column index_recommendations
to crdb_internal.node_statement_statistics,
crdb_internal.cluster_statement_statistics, system.statement_statistics
and crdb_internal.statement_statistics
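Once this lands, the new column can be inspected with a query along these lines (the column name comes from this commit; the filter is illustrative):

```sql
-- Show statements that have at least one index recommendation.
SELECT fingerprint_id, index_recommendations
FROM crdb_internal.statement_statistics
WHERE array_length(index_recommendations, 1) > 0;
```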
craig bot pushed a commit that referenced this issue Jul 26, 2022
84618: sql: add new index_recommendation column r=maryliag a=maryliag

This commit adds a new column index_recommendations
STRING[] to:
crdb_internal.node_statement_statistics
crdb_internal.cluster_statement_statistics
system.statement_statistics
crdb_internal.statement_statistics

Part of #83782

Release note (sql change): Adding new column index_recommendations
to crdb_internal.node_statement_statistics,
crdb_internal.cluster_statement_statistics, system.statement_statistics
and crdb_internal.statement_statistics

84674: changefeedccl/schemafeed: ignore unwatched column drops  r=jayshrivastava a=jayshrivastava

Closes #80982

Release note (enterprise change): Previously, if you dropped
a column with the schema_change_policy 'stop', the changefeed
would stop. Dropping a column with a different policy would
result in previous rows being retransmitted with the
dropped column omitted.

In some cases, a changefeed may target specific columns
(a column family) of a table. In these cases, if a non-target
column is dropped, it does not make sense to stop the changefeed
or retransmit values because the column was not visible to
a consumer sink to begin with.

With this change, dropping a non-target column from a
table will not stop the changefeed when the
schema_change_policy is 'stop'. With any other policy,
dropping a non-target column will not trigger a backfill.

85034: sql: better tracing of apply joins r=yuzefovich a=yuzefovich

**sql: add tracing spans for each iteration of apply join**

This commit creates a separate tracing span for each iteration of apply
join and recursive CTE execution to make the traces easier to digest.

Release note: None

**sql: log DistSQL diagrams when tracing is enabled**

This commit makes it so that we always include all DistSQL diagrams into
the trace. This could be especially helpful when executing apply join
iterations to understand the plan that each iteration gets. This commit
also removes an environment variable that used to control this logging
previously since I don't think anyone has used it in years now that we
have better tools for debugging (like a stmt bundle).

Informs: https://github.com/cockroachlabs/support/issues/1681.

Release note: None

85042: logictest: remove spec-planning configs r=yuzefovich a=yuzefovich

This commit removes `local-spec-planning`, `fakedist-spec-planning`, and
`5node-spec-planning` logic test configurations since they seem to be
not very useful at the moment. They were introduced to support the new
DistSQL spec factory, but that work has been postponed with no active
development at the moment, so it seems silly to run most of the logic
tests through the configs that are largely duplicate of the other
default ones (because most of the `Construct*` methods are not
implemented in the new factory). Once the active development on the new
factory resumes, it'll be pretty easy to bring them back to life, but at
the moment let's reduce the amount of tests we run without really losing
any test coverage.

Release note: None

Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
maryliag added a commit to maryliag/cockroach that referenced this issue Jul 26, 2022
Fixes cockroachdb#83782

Release note (sql change): Generate index recommendations per
statement fingerprint id and save them to its statistics.
maryliag added a commit to maryliag/cockroach that referenced this issue Aug 8, 2022
We want to collect index recommendations
per statement and save them to the statement_statistics table.
To accomplish this, a new cache map was created.
The key for the map is a combination of the
statement fingerprint, the database and the plan hash,
since those are the parameters that would affect
an index recommendation (at least among the parameters we use
as keys for the statement fingerprint ID).

This cache has a limit on unique recommendations,
controlled by a new cluster setting called
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.
If this limit is reached, we delete entries whose
last update was more than 24h ago.

A new recommendation is generated if it has been more
than 1h since the last recommendation (saved in the cache)
and the statement has at least a few executions (so we handle
the case where we have just one execution each of several
different fingerprints, for which it is not worth calculating
recommendations).

The recommendations are saved as a list, with each
recommendation being "type : sql command", e.g.
`{"creation : CREATE INDEX ON t2 (i) STORING (k);"}`
and
`{"replacement : CREATE UNIQUE INDEX ON t1 (i) STORING (k);
DROP INDEX t1@existing_t1_i;"}`

Closes cockroachdb#83782

Release note (sql change): Index recommendations are generated
for statements. New recommendations are generated every hour and
are available in `system.statement_statistics` and
`crdb_internal.statement_statistics`.
New cluster setting
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.
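The cache described in this commit message can be sketched as follows (a hedged Python illustration using the numbers from the message; the names, the minimum execution count of 5, and the data layout are assumptions, not the actual Go implementation):

```python
# Hypothetical sketch of the index-recommendations cache: keyed by
# (fingerprint, database, plan hash), capped in size, evicting entries
# idle for more than 24h, and regenerating at most once per hour.
MAX_ENTRIES = 100_000        # sql.metrics.statement_details.max_mem_reported_idx_recommendations
REGEN_AFTER = 60 * 60        # regenerate at most once per hour (seconds)
EVICT_AFTER = 24 * 60 * 60   # evict entries untouched for more than 24h
MIN_EXEC_COUNT = 5           # skip one-off fingerprints (assumed threshold)

class IdxRecCache:
    def __init__(self):
        # (fingerprint, database, plan_hash) -> [recommendations, last_update, exec_count]
        self.entries = {}

    def should_generate(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            if len(self.entries) >= MAX_ENTRIES:
                self._evict(now)
            self.entries[key] = [None, now, 1]
            return False  # first execution: too few to bother
        entry[2] += 1
        if entry[2] < MIN_EXEC_COUNT:
            return False
        # Regenerate when there is no recommendation yet, or the last is stale.
        return entry[0] is None or now - entry[1] >= REGEN_AFTER

    def store(self, key, recommendations, now):
        entry = self.entries[key]
        entry[0], entry[1] = recommendations, now

    def _evict(self, now):
        stale = [k for k, e in self.entries.items() if now - e[1] > EVICT_AFTER]
        for k in stale:
            del self.entries[k]

cache = IdxRecCache()
key = ("SELECT * FROM t WHERE i = _", "defaultdb", 42)
print([cache.should_generate(key, now=0) for _ in range(5)])
# prints [False, False, False, False, True]: only the 5th execution triggers generation
```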
craig bot pushed a commit that referenced this issue Aug 10, 2022
85343: sql: generate index recommendations r=maryliag a=maryliag

We want to collect index recommendations
per statement and save them to the statement_statistics table.
To accomplish this, a new cache map was created.
The key for the map is a combination of the
statement fingerprint, the database and the plan hash,
since those are the parameters that would affect
an index recommendation (at least among the parameters we use
as keys for the statement fingerprint ID).

This cache has a limit on unique recommendations,
controlled by a new cluster setting called
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.
If this limit is reached, we delete entries whose
last update was more than 24h ago.

A new recommendation is generated if it has been more
than 1h since the last recommendation (saved in the cache)
and the statement has at least a few executions (so we handle
the case where we have just one execution each of several
different fingerprints, for which it is not worth calculating
recommendations).

The recommendations are saved as a list, with each
recommendation being "type : sql command", e.g.
`{"creation : CREATE INDEX ON t2 (i) STORING (k);"}`
and
`{"replacement : CREATE UNIQUE INDEX ON t1 (i) STORING (k);
DROP INDEX t1@existing_t1_i;"}`

Fixes #83782

---
Different approaches tested:
| branch  | Cache | Min Execution Count | Clear Cache|
| ------------- | ------------- | ------------- | ------------- |
| idx-rec-cache-cnt | Yes  | 1 | Older than 24h |
| idx-rec-large-cnt | Yes  | 10 | Older than 24h |
| idx-rec-random | Yes  | 1 | Random 1000 |
| idx-rec-no-cnt | Yes  | 0 | Older than 24h |
| idx-rec-no-cache | No  | N/A | N/A |

Min Execution Count is the minimum number of executions a statement must have before we decide to generate a recommendation.

Running TPC C on all branches, results:
 | branch | elapsed | tpmC | efc | avg(ms) | p50(ms) | p90(ms) | p95(ms) | p99(ms) | pMax(ms)|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| main | 4200.1s | 12.4 | 96.8%  | 19.8 | 18.9 | 25.2 | 31.5 | 46.1 | 302.0 | 
| idx-rec-cache-cnt | 3941.0s |12.6 | 98.3% | 18.8 | 18.9 | 23.1 | 25.2 | 37.7 | 52.4 | 
| idx-rec-large-cnt | 4138.8s | 12.3 | 95.7% | 18.0 | 17.8 | 22.0 | 24.1 | 39.8 | 130.0 | 
| idx-rec-random | 4321.9s | 12.6 | 98.1% | 19.3 | 18.9 | 24.1 | 27.3 | 39.8 | 209.7 | 
| idx-rec-no-cnt |  4200.2s | 12.4 | 96.4% | 19.3 | 18.9 | 24.1 | 26.2 | 44.0 | 67.1 | 
| idx-rec-no-cache | 4199.9s | 12.5 | 97.4% | 31.1 | 27.3 | 52.4 | 60.8 | 83.9 | 151.0 | 

The most significant change is when there is no cache, confirming the cache is necessary.  
The decision on keeping or removing the execution count check wasn't clear, so I decided to keep the count with a value of 5 and run another workload.

Running YCSB A:

 | branch | elapsed | errors | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms)|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| main | 4200.1s | 0 | 33334570 | 7936.6 | 4.0 | 2.9 | 11.0 | 18.9 | 436.2 |
| idx-rec-cache-cnt | 4012.4s | 0 | 20153193 | 5022.7 | 6.4 | 3.8 | 19.9 | 37.7 | 1879.0 | 
| idx-rec-no-cnt | 4200.1s | 0 | 29112702 | 6931.4 | 4.6 | 3.1 | 13.1 | 23.1 | 1342.2 | 
| idx-rec-no-cache | 4200.1s | 0 | 27216030 | 6479.8 | 4.9 | 3.4 | 13.6 | 23.1 | 872.4 |

Keeping the execution count performs worse when there is more contention, but since we have customers that generate many similar fingerprints (e.g. by using different column orders on selects), we want to avoid creating recommendations for those cases, since doing so would be almost the same as having no cache.
So I created another option (branch idx-rec-cache), where we do keep a count, but once it reaches the minimum we stop updating it (helping with contention on the cache entry itself).
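The "stop updating once the minimum is reached" idea can be sketched as a bounded counter (a hedged Python illustration; the threshold of 5 and the names are assumptions):

```python
# Hypothetical sketch of the bounded execution counter: once the threshold
# is reached we stop writing to the shared entry, reducing contention on
# the hot cache entry under concurrent executions.
MIN_EXEC_COUNT = 5

def bump(entry):
    """entry holds an execution 'count'; returns True once eligible."""
    if entry["count"] >= MIN_EXEC_COUNT:
        return True  # threshold already met: no further writes needed
    entry["count"] += 1
    return entry["count"] >= MIN_EXEC_COUNT

e = {"count": 0}
print([bump(e) for _ in range(7)], e["count"])
# prints [False, False, False, False, True, True, True] 5
```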
Running workloads with those options again:

TPC C
 | branch | elapsed | tpmC | efc | avg(ms) | p50(ms) | p90(ms) | p95(ms) | p99(ms) | pMax(ms)|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| main | 4200.1s | 12.6 | 98.0% | 19.0 | 18.9 | 22.0 | 23.1 | 27.3 | 62.9 |
| idx-rec-cache | 4200.0s | 12.5 | 97.0% | 19.3 | 19.9 | 23.1 | 24.1 | 31.5 | 62.9 |
| idx-rec-cnt | 4200.1s | 12.6 | 97.9% |  20.0 | 19.9 | 24.1 | 26.2 | 39.8 | 79.7 |
| idx-rec-no-cache | 4200.3s | 12.3 | 96.0% | 26.3 | 26.2 | 29.4 | 31.5 | 46.1 | 75.5 |

YCSB A
 | branch | elapsed | errors | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms)|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| main | 4200.0s  | 0 | 27374582 | 6517.8 | 4.9 | 3.4 | 13.1 | 28.3 | 872.4 |
| idx-rec-cache | 4200.0s | 0 | 26575359 | 6327.4 | 5.1 | 3.5 | 13.6 | 29.4 | 906.0 |
| idx-rec-cnt | 4199.9s | 0 | 27459203 | 6538.1 | 4.9 | 3.4 | 13.1 | 28.3 | 704.6 |
| idx-rec-no-cache | 4199.9s | 0 | 24826078 | 5911.1 | 5.4 | 3.7 | 14.2 | 30.4 | 738.2 |

---

Release note (sql change): Index recommendations are generated
for statements. New recommendations are generated every hour and
are available in `system.statement_statistics` and
`crdb_internal.statement_statistics`.
New cluster setting
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.

85833:  sql: fix flaky size error in max_size session serialization test r=cucaroach a=cucaroach

Fixes: #85821
    
Release note: None

Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
@craig craig bot closed this as completed in 4655f97 Aug 10, 2022