
Save index recommendations per statement fingerprint id #83782

Closed
maryliag opened this issue Jul 4, 2022 · 1 comment · Fixed by #85343
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments


maryliag commented Jul 4, 2022

Collect index recommendations per fingerprint id.

We don't want to compute recommendations for every execution, so only calculate one if no recommendation has been calculated in-memory so far. Steps:

  1. Statement is executed
  2. Check if there is a recommendation in-memory (crdb_internal.cluster_statement_statistics)
    2a. If there isn't one, set a new flag to true indicating it should be calculated, update the execution path to check this flag, then calculate the recommendation and save it in-memory
    2b. If there is one, keep the flag as false
  3. When the in-memory data is flushed, replace the persisted value with the latest value from memory

This solution keeps the recommendations up to date without paying the cost of recalculating on every execution.

Challenge: since the recommendation is calculated the first time a statement is added to the in-memory store, there will be a spike of calculations right after a flush, when everything in memory is new again. We need a way to spread the calculation over the 10-minute flush period.
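The steps above can be sketched roughly as follows (a minimal Python illustration with hypothetical names, not the actual CockroachDB Go code):

```python
# Hypothetical sketch of the "compute only when not already in memory" flow.
class StatementStatsCache:
    def __init__(self, recommend):
        self.recommend = recommend  # callback that computes a recommendation
        self.in_memory = {}         # fingerprint id -> recommendation
        self.persisted = {}         # stand-in for system.statement_statistics

    def record_execution(self, fingerprint_id, stmt):
        # Step 2: only compute if nothing is cached in memory yet (step 2a);
        # an existing entry is left untouched (step 2b).
        if fingerprint_id not in self.in_memory:
            self.in_memory[fingerprint_id] = self.recommend(stmt)

    def flush(self):
        # Step 3: replace persisted values with the latest in-memory ones,
        # then clear memory (which is what causes the post-flush spike).
        self.persisted.update(self.in_memory)
        self.in_memory.clear()

calls = []
cache = StatementStatsCache(lambda stmt: calls.append(stmt) or "CREATE INDEX ...")
cache.record_execution(1, "SELECT * FROM t WHERE i = 1")
cache.record_execution(1, "SELECT * FROM t WHERE i = 2")
print(len(calls))  # prints 1: the recommendation was computed only once
```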

Jira issue: CRDB-17261

@maryliag maryliag added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 4, 2022
@maryliag maryliag self-assigned this Jul 4, 2022
maryliag added a commit to maryliag/cockroach that referenced this issue Jul 12, 2022
As part of cockroachdb#83782, we will generate index recommendations
per statement fingerprint id. As a safety measure, this creates the
cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`
that can be disabled in case there are any issues with
performance.

Release note (sql change): Creation of the cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`,
which can be disabled if index recommendation generation is
causing performance issues.
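The gating this setting provides behaves roughly like the following (a hedged Python sketch with hypothetical names, purely to illustrate the guard, not the actual CockroachDB implementation):

```python
# Hypothetical sketch: a boolean cluster setting that gates recommendation
# collection, defaulting to enabled.
cluster_settings = {
    "sql.metrics.statement_details.index_recommendation_collection.enabled": True,
}

def maybe_collect_recommendation(stmt, generate):
    # Skip the potentially expensive generation entirely when disabled.
    if not cluster_settings[
        "sql.metrics.statement_details.index_recommendation_collection.enabled"
    ]:
        return None
    return generate(stmt)

print(maybe_collect_recommendation("SELECT 1", lambda s: "CREATE INDEX ..."))
# prints: CREATE INDEX ...
```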
craig bot pushed a commit that referenced this issue Jul 13, 2022
84282: sql: new cluster setting for index recommendation r=maryliag a=maryliag

As part of #83782, we will generate index recommendations
per statement fingerprint id. As a safety measure, this creates the
cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`
that can be disabled in case there are any issues with
performance.

Release note (sql change): Creation of the cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`,
which can be disabled if index recommendation generation is
causing performance issues.

84316: authors: add Jake to authors r=j82w a=j82w

Release note: None

84367: kvserver: Fix missing increment in SSTSnapshotStorage refcount r=erikgrinaker a=itsbilal

Previously, in rare cases where we had two SSTSnapshotStorageScratch
instances for a given range, we would not have incremented the refcount
for the second scratch creation, leading to an inconsistent-refcount
panic down the line. This change fixes that and adds a test exercising
the concurrent case.

Bug popped up in #84100.

Release note: None.

Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
@kevin-v-ngo

It would be amazing if this recommendation could be included in our telemetry pipeline and in our slow execution/outlier insights (versus at the fingerprint level). I'm assuming our approach of storing this information in statement_statistics would enable us to easily add the information at the execution level and to our internal logging channel?

FYI @THardy98, @matthewtodd

maryliag added a commit to maryliag/cockroach that referenced this issue Jul 19, 2022
This commit adds a new column index_recommendations
STRING[] to:
crdb_internal.node_statement_statistics
crdb_internal.cluster_statement_statistics
system.statement_statistics
crdb_internal.statement_statistics

Part of cockroachdb#83782

Release note (sql change): Adding new column index_recommendations
to crdb_internal.node_statement_statistics,
crdb_internal.cluster_statement_statistics, system.statement_statistics
and crdb_internal.statement_statistics
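Once this lands, the new column can be inspected with a query along these lines (the column name comes from this commit; the filter is illustrative):

```sql
-- Show statements that have at least one index recommendation.
SELECT fingerprint_id, index_recommendations
FROM crdb_internal.statement_statistics
WHERE array_length(index_recommendations, 1) > 0;
```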
craig bot pushed a commit that referenced this issue Jul 26, 2022
84618: sql: add new index_recommendation column r=maryliag a=maryliag

This commit adds a new column index_recommendations
STRING[] to:
crdb_internal.node_statement_statistics
crdb_internal.cluster_statement_statistics
system.statement_statistics
crdb_internal.statement_statistics

Part of #83782

Release note (sql change): Adding new column index_recommendations
to crdb_internal.node_statement_statistics,
crdb_internal.cluster_statement_statistics, system.statement_statistics
and crdb_internal.statement_statistics

84674: changefeedccl/schemafeed: ignore unwatched column drops  r=jayshrivastava a=jayshrivastava

Closes #80982

Release note (enterprise change): Previously, if you dropped
a column with the schema_change_policy 'stop', the changefeed
would stop. Dropping a column with a different policy would
result in previous rows being retransmitted with the
dropped column omitted.

In some cases, a changefeed may target specific columns
(a column family) of a table. In these cases, if a non-target
column is dropped, it does not make sense to stop the changefeed
or retransmit values because the column was not visible to
a consumer sink to begin with.

With this change, dropping a non-target column from a
table will not stop the changefeed when the
schema_change_policy is 'stop'. With any other policy,
dropping a non-target column will not trigger a backfill.

85034: sql: better tracing of apply joins r=yuzefovich a=yuzefovich

**sql: add tracing spans for each iteration of apply join**

This commit creates a separate tracing span for each iteration of apply
join and recursive CTE execution to make the traces easier to digest.

Release note: None

**sql: log DistSQL diagrams when tracing is enabled**

This commit makes it so that we always include all DistSQL diagrams into
the trace. This could be especially helpful when executing apply join
iterations to understand the plan that each iteration gets. This commit
also removes an environment variable that used to control this logging
previously since I don't think anyone has used it in years now that we
have better tools for debugging (like a stmt bundle).

Informs: https://github.com/cockroachlabs/support/issues/1681.

Release note: None

85042: logictest: remove spec-planning configs r=yuzefovich a=yuzefovich

This commit removes `local-spec-planning`, `fakedist-spec-planning`, and
`5node-spec-planning` logic test configurations since they seem to be
not very useful at the moment. They were introduced to support the new
DistSQL spec factory, but that work has been postponed with no active
development at the moment, so it seems silly to run most of the logic
tests through the configs that are largely duplicate of the other
default ones (because most of the `Construct*` methods are not
implemented in the new factory). Once the active development on the new
factory resumes, it'll be pretty easy to bring them back to life, but at
the moment let's reduce the amount of tests we run without really losing
any test coverage.

Release note: None

Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
maryliag added a commit to maryliag/cockroach that referenced this issue Jul 26, 2022
Fixes cockroachdb#83782

Release note (sql change): Generate index recommendations per
statement fingerprint id and save them to its statistics.
maryliag added a commit to maryliag/cockroach that referenced this issue Aug 8, 2022
We want to collect index recommendations
per statement and save them to the statement_statistics table.
To accomplish this, a new cache map was created.
The key for the map is a combination of the
statement fingerprint, the database and the plan hash,
since those are the parameters that would affect
an index recommendation (at least among the parameters we use
as keys for the statement fingerprint ID).

This cache has a limit on unique recommendations,
controlled by a new cluster setting called
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.
If this limit is reached, we delete entries whose
last update was more than 24h ago.

A new recommendation is generated if it has been more
than 1h since the last recommendation (saved in the cache)
and the statement has at least a few executions (so we handle
the case where we have just one execution each of several
different fingerprints, for which it is not worth calculating
recommendations).

The recommendations are saved as a list, with each
recommendation being "type : sql command", e.g.
`{"creation : CREATE INDEX ON t2 (i) STORING (k);"}`
and
`{"replacement : CREATE UNIQUE INDEX ON t1 (i) STORING (k);
DROP INDEX t1@existing_t1_i;"}`

Closes cockroachdb#83782

Release note (sql change): Index recommendations are generated
for statements. New recommendations are generated every hour and
are available in `system.statement_statistics` and
`crdb_internal.statement_statistics`.
New cluster setting
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.
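The cache described in this commit message can be sketched as follows (a hedged Python illustration using the numbers from the message; the names, the minimum execution count of 5, and the data layout are assumptions, not the actual Go implementation):

```python
# Hypothetical sketch of the index-recommendations cache: keyed by
# (fingerprint, database, plan hash), capped in size, evicting entries
# idle for more than 24h, and regenerating at most once per hour.
MAX_ENTRIES = 100_000        # sql.metrics.statement_details.max_mem_reported_idx_recommendations
REGEN_AFTER = 60 * 60        # regenerate at most once per hour (seconds)
EVICT_AFTER = 24 * 60 * 60   # evict entries untouched for more than 24h
MIN_EXEC_COUNT = 5           # skip one-off fingerprints (assumed threshold)

class IdxRecCache:
    def __init__(self):
        # (fingerprint, database, plan_hash) -> [recommendations, last_update, exec_count]
        self.entries = {}

    def should_generate(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            if len(self.entries) >= MAX_ENTRIES:
                self._evict(now)
            self.entries[key] = [None, now, 1]
            return False  # first execution: too few to bother
        entry[2] += 1
        if entry[2] < MIN_EXEC_COUNT:
            return False
        # Regenerate when there is no recommendation yet, or the last is stale.
        return entry[0] is None or now - entry[1] >= REGEN_AFTER

    def store(self, key, recommendations, now):
        entry = self.entries[key]
        entry[0], entry[1] = recommendations, now

    def _evict(self, now):
        stale = [k for k, e in self.entries.items() if now - e[1] > EVICT_AFTER]
        for k in stale:
            del self.entries[k]

cache = IdxRecCache()
key = ("SELECT * FROM t WHERE i = _", "defaultdb", 42)
print([cache.should_generate(key, now=0) for _ in range(5)])
# prints [False, False, False, False, True]: only the 5th execution triggers generation
```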
craig bot pushed a commit that referenced this issue Aug 10, 2022
85343: sql: generate index recommendations r=maryliag a=maryliag

We want to collect index recommendations
per statement and save them to the statement_statistics table.
To accomplish this, a new cache map was created.
The key for the map is a combination of the
statement fingerprint, the database and the plan hash,
since those are the parameters that would affect
an index recommendation (at least among the parameters we use
as keys for the statement fingerprint ID).

This cache has a limit on unique recommendations,
controlled by a new cluster setting called
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.
If this limit is reached, we delete entries whose
last update was more than 24h ago.

A new recommendation is generated if it has been more
than 1h since the last recommendation (saved in the cache)
and the statement has at least a few executions (so we handle
the case where we have just one execution each of several
different fingerprints, for which it is not worth calculating
recommendations).

The recommendations are saved as a list, with each
recommendation being "type : sql command", e.g.
`{"creation : CREATE INDEX ON t2 (i) STORING (k);"}`
and
`{"replacement : CREATE UNIQUE INDEX ON t1 (i) STORING (k);
DROP INDEX t1@existing_t1_i;"}`

Fixes #83782

---
Different approaches tested:
| branch  | Cache | Min Execution Count | Clear Cache|
| ------------- | ------------- | ------------- | ------------- |
| idx-rec-cache-cnt | Yes  | 1 | Older than 24h |
| idx-rec-large-cnt | Yes  | 10 | Older than 24h |
| idx-rec-random | Yes  | 1 | Random 1000 |
| idx-rec-no-cnt | Yes  | 0 | Older than 24h |
| idx-rec-no-cache | No  | N/A | N/A |

Min Execution Count is the minimum number of executions a statement must have before we decide to generate a recommendation.

Running TPC C on all branches, results:
 | branch | elapsed | tpmC | efc | avg(ms) | p50(ms) | p90(ms) | p95(ms) | p99(ms) | pMax(ms)|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| main | 4200.1s | 12.4 | 96.8%  | 19.8 | 18.9 | 25.2 | 31.5 | 46.1 | 302.0 | 
| idx-rec-cache-cnt | 3941.0s |12.6 | 98.3% | 18.8 | 18.9 | 23.1 | 25.2 | 37.7 | 52.4 | 
| idx-rec-large-cnt | 4138.8s | 12.3 | 95.7% | 18.0 | 17.8 | 22.0 | 24.1 | 39.8 | 130.0 | 
| idx-rec-random | 4321.9s | 12.6 | 98.1% | 19.3 | 18.9 | 24.1 | 27.3 | 39.8 | 209.7 | 
| idx-rec-no-cnt |  4200.2s | 12.4 | 96.4% | 19.3 | 18.9 | 24.1 | 26.2 | 44.0 | 67.1 | 
| idx-rec-no-cache | 4199.9s | 12.5 | 97.4% | 31.1 | 27.3 | 52.4 | 60.8 | 83.9 | 151.0 | 

The most significant change is when there is no cache, confirming the cache is necessary.  
The decision on keeping or removing the execution count check wasn't clear, so I decided to keep the count with a value of 5 and run another workload.

Running YCSB A:

 | branch | elapsed | errors | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms)|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| main | 4200.1s | 0 | 33334570 | 7936.6 | 4.0 | 2.9 | 11.0 | 18.9 | 436.2 |
| idx-rec-cache-cnt | 4012.4s | 0 | 20153193 | 5022.7 | 6.4 | 3.8 | 19.9 | 37.7 | 1879.0 | 
| idx-rec-no-cnt | 4200.1s | 0 | 29112702 | 6931.4 | 4.6 | 3.1 | 13.1 | 23.1 | 1342.2 | 
| idx-rec-no-cache | 4200.1s | 0 | 27216030 | 6479.8 | 4.9 | 3.4 | 13.6 | 23.1 | 872.4 |

Keeping the execution count performs worse when there is more contention, but since we have customers that generate many similar fingerprints (e.g. by using different column orders on selects), we want to avoid creating recommendations for those cases, since doing so would be almost the same as having no cache.
So I created another option (branch idx-rec-cache), where we do keep a count, but once it reaches the minimum we stop updating it (helping with contention on the cache entry itself).
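The "stop updating once the minimum is reached" idea can be sketched as a bounded counter (a hedged Python illustration; the threshold of 5 and the names are assumptions):

```python
# Hypothetical sketch of the bounded execution counter: once the threshold
# is reached we stop writing to the shared entry, reducing contention on
# the hot cache entry under concurrent executions.
MIN_EXEC_COUNT = 5

def bump(entry):
    """entry holds an execution 'count'; returns True once eligible."""
    if entry["count"] >= MIN_EXEC_COUNT:
        return True  # threshold already met: no further writes needed
    entry["count"] += 1
    return entry["count"] >= MIN_EXEC_COUNT

e = {"count": 0}
print([bump(e) for _ in range(7)], e["count"])
# prints [False, False, False, False, True, True, True] 5
```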
Running workloads with those options again:

TPC C
 | branch | elapsed | tpmC | efc | avg(ms) | p50(ms) | p90(ms) | p95(ms) | p99(ms) | pMax(ms)|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| main | 4200.1s | 12.6 | 98.0% | 19.0 | 18.9 | 22.0 | 23.1 | 27.3 | 62.9 |
| idx-rec-cache | 4200.0s | 12.5 | 97.0% | 19.3 | 19.9 | 23.1 | 24.1 | 31.5 | 62.9 |
| idx-rec-cnt | 4200.1s | 12.6 | 97.9% |  20.0 | 19.9 | 24.1 | 26.2 | 39.8 | 79.7 |
| idx-rec-no-cache | 4200.3s | 12.3 | 96.0% | 26.3 | 26.2 | 29.4 | 31.5 | 46.1 | 75.5 |

YCSB A
 | branch | elapsed | errors | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms)|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| main | 4200.0s  | 0 | 27374582 | 6517.8 | 4.9 | 3.4 | 13.1 | 28.3 | 872.4 |
| idx-rec-cache | 4200.0s | 0 | 26575359 | 6327.4 | 5.1 | 3.5 | 13.6 | 29.4 | 906.0 |
| idx-rec-cnt | 4199.9s | 0 | 27459203 | 6538.1 | 4.9 | 3.4 | 13.1 | 28.3 | 704.6 |
| idx-rec-no-cache | 4199.9s | 0 | 24826078 | 5911.1 | 5.4 | 3.7 | 14.2 | 30.4 | 738.2 |

---

Release note (sql change): Index recommendations are generated
for statements. New recommendations are generated every hour and
are available in `system.statement_statistics` and
`crdb_internal.statement_statistics`.
New cluster setting
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.

85833:  sql: fix flaky size error in max_size session serialization test r=cucaroach a=cucaroach

Fixes: #85821
    
Release note: None

Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
@craig craig bot closed this as completed in 4655f97 Aug 10, 2022