opt, sql: locality optimized scans with more than 1 row #64862
Labels
A-multiregion
Related to multi-region
A-partitioning
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
docs-done
docs-known-limitation
T-multiregion
Comments
rytaft
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
A-partitioning
A-multiregion
Related to multi-region
T-multiregion
T-sql-queries
SQL Queries Team
labels
May 7, 2021
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Aug 26, 2021
This commit updates the logic for planning locality optimized search to allow the optimization to be planned if there are fewer than 10,000 keys selected. The optimization is not yet supported for scans with a hard limit. Informs cockroachdb#64862 Release justification: Low risk, high benefit change to existing functionality. Release note (performance improvement): locality optimized search is now supported for scans that are guaranteed to return 10000 keys or less. This optimization allows the execution engine to avoid visiting remote regions if all requested keys are found in the local region, thus reducing the latency of the query.
exalate-issue-sync
bot
added
T-multi-region
and removed
T-multiregion
T-sql-queries
SQL Queries Team
labels
Aug 26, 2021
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Aug 27, 2021
This commit updates the logic for planning locality optimized search to allow the optimization to be planned if there are no more than 10,000 keys selected. The optimization is not yet supported for scans with a hard limit. Informs cockroachdb#64862 Release justification: Low risk, high benefit change to existing functionality. Release note (performance improvement): locality optimized search is now supported for scans that are guaranteed to return 10000 keys or less. This optimization allows the execution engine to avoid visiting remote regions if all requested keys are found in the local region, thus reducing the latency of the query.
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Aug 27, 2021
This commit updates the logic for planning locality optimized search to allow the optimization to be planned if there are no more than 100,000 keys selected. The optimization is not yet supported for scans with a hard limit. Informs cockroachdb#64862 Release justification: Low risk, high benefit change to existing functionality. Release note (performance improvement): locality optimized search is now supported for scans that are guaranteed to return 100,000 keys or less. This optimization allows the execution engine to avoid visiting remote regions if all requested keys are found in the local region, thus reducing the latency of the query.
yuzefovich
pushed a commit
to rytaft/cockroach
that referenced
this issue
Aug 28, 2021
This commit updates the logic for planning locality optimized search to allow the optimization to be planned if there are no more than 100,000 keys selected. The optimization is not yet supported for scans with a hard limit. Informs cockroachdb#64862 Release justification: Low risk, high benefit change to existing functionality. Release note (performance improvement): locality optimized search is now supported for scans that are guaranteed to return 100,000 keys or less. This optimization allows the execution engine to avoid visiting remote regions if all requested keys are found in the local region, thus reducing the latency of the query.
craig bot
pushed a commit
that referenced
this issue
Aug 28, 2021
69238: server: create api to query persisted stats by date range r=xinhaoz a=xinhaoz

This commit creates a new API endpoint /_status/combinedstmts to fetch combined in-memory and persisted statements and transactions from crdb_internal.statement_statistics and crdb_internal.transaction_statistics. The request supports optional start and end parameters which represent the unix time at which the data was aggregated. The parameters start, end and combined have also been added to the StatementsRequest message. Setting combined to true will forward the request to fetch data from the new combined api, using the start and end parameters provided.

Release justification: Category 2 low-risk updates to new functionality

Release note (api change): New endpoint /_status/combinedstmts to retrieve persisted and in-memory statements from crdb_internal.statement_statistics and crdb_internal.transaction_statistics by aggregated_ts range. The request supports optional query string parameters start and end, which are the date range in unix time. The response returned is currently the response expected from /_status/statements. /_status/statements has also been updated to support the parameters combined, start, and end. If combined is true, then the statements endpoint will use /_status/combinedstmts with the optional parameters start and end.

69395: opt: support locality optimized search for scans with more than 1 row r=yuzefovich a=rytaft

This commit updates the logic for planning locality optimized search to allow the optimization to be planned if there are fewer than 100,000 keys selected. The optimization is not yet supported for scans with a hard limit.

Informs #64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search is now supported for scans that are guaranteed to return 100000 keys or less. This optimization allows the execution engine to avoid visiting remote regions if all requested keys are found in the local region, thus reducing the latency of the query.

69469: ts: include histogram quantiles in tsdump r=dhartunian a=tbg

`cockroach debug tsdump` previously silently did not return metrics backed by histograms. This is for technical reasons related to the bookkeeping of metrics names and is rectified here by requiring some extra tagging of metrics that are histograms so that they can be picked up by tsdump. It's not pretty, but pragmatic: it works and it'll be clear to anyone adding a histogram in the future how to proceed, even if they may wonder why things work in such a roundabout manner (and if they're curious about that, the relevant issues are linked in comments as well).

I also renamed AllMetricsNames to AllInternalTimeseriesMetricsNames to make clear what is being returned.

Demo:

```
killall -9 cockroach; rm -rf cockroach-data; ./cockroach start-single-node --insecure --background && \
./cockroach workload run kv --init \
'postgres://[email protected]:26257?sslmode=disable' --duration=300s && \
./cockroach debug tsdump --format=raw --insecure > tsdump.gob && \
killall -9 cockroach && rm -rf cockroach-data && \
COCKROACH_DEBUG_TS_IMPORT_FILE=tsdump.gob ./cockroach start-single-node --insecure
```

![image](https://user-images.githubusercontent.com/5076964/131134624-b5471621-d23b-4ce7-9026-e8aeb3613231.png)

Release justification: low-risk observability fix

Release note (ops change): The ./cockroach debug tsdump command now downloads histogram timeseries it silently omitted previously.

Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Adding docs-known-limitation tag to note that locality optimized search only works for queries selecting up to 100,000 unique keys. It does not yet work for |
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Jan 24, 2022
This commit adds locality optimized scan support for queries which place a hard limit on the number of rows returned via the LIMIT clause. This optimization benefits tables with REGIONAL BY ROW locality by splitting the spans accessed into a local spans set and a remote spans set, combined via a UNION ALL operation where each branch of the UNION ALL has the same hard limit as the original SELECT query block. If the limit is reached by scanning just the local spans, then latency is improved. The optimization is not applied if the LIMIT is more than the KV batch size of 100000 rows or if the number of spans in the scan exceeds 10000. This commit also adds an improvement to span merging to avoid merging local spans with remote spans in order to maximize the number of queries that can utilize locality optimized scan. Informs cockroachdb#64862 Release note (Performance Improvement): Queries with a LIMIT clause applied against a single table, either explicitly written, or implicit such as in an uncorrelated EXISTS subquery, now scan that table with improved latency if the table is defined with LOCALITY REGIONAL BY ROW and the number of qualified rows residing in the local region is less than or equal to the hard limit (sum of the LIMIT clause and optional OFFSET clause values). This optimization is only applied if the hard limit is 100000 or less.
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Jan 31, 2022
This commit adds locality optimized scan support for queries which place a hard limit on the number of rows returned via the LIMIT clause. This optimization benefits tables with REGIONAL BY ROW locality by splitting the spans accessed into a local spans set and a remote spans set, combined via a UNION ALL operation where each branch of the UNION ALL has the same hard limit as the original SELECT query block. If the limit is reached by scanning just the local spans, then latency is improved. The optimization is not applied if the LIMIT is more than the KV batch size of 100000 rows or if the number of spans in the scan exceeds 10000. This commit also adds an improvement to span merging to avoid merging local spans with remote spans in order to maximize the number of queries that can utilize locality optimized scan. Informs cockroachdb#64862 Release note (Performance Improvement): Queries with a LIMIT clause applied against a single table, either explicitly written, or implicit such as in an uncorrelated EXISTS subquery, now scan that table with improved latency if the table is defined with LOCALITY REGIONAL BY ROW and the number of qualified rows residing in the local region is less than or equal to the hard limit (sum of the LIMIT clause and optional OFFSET clause values). This optimization is only applied if the hard limit is 100000 or less.
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Feb 4, 2022
This commit adds locality optimized scan support for queries which place a hard limit on the number of rows returned via the LIMIT clause. This optimization benefits tables with REGIONAL BY ROW locality by splitting the spans accessed into a local spans set and a remote spans set, combined via a UNION ALL operation where each branch of the UNION ALL has the same hard limit as the original SELECT query block. If the limit is reached by scanning just the local spans, then latency is improved. The optimization is not applied if the LIMIT is more than the KV batch size of 100000 rows or if the number of spans in the scan exceeds 10000. This commit also adds an improvement to span merging to avoid merging local spans with remote spans in order to maximize the number of queries that can utilize locality optimized scan. Informs cockroachdb#64862 Release note (Performance Improvement): Queries with a LIMIT clause applied against a single table, either explicitly written, or implicit such as in an uncorrelated EXISTS subquery, now scan that table with improved latency if the table is defined with LOCALITY REGIONAL BY ROW and the number of qualified rows residing in the local region is less than or equal to the hard limit (sum of the LIMIT clause and optional OFFSET clause values). This optimization is only applied if the hard limit is 100000 or less.
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Feb 8, 2022
This commit adds locality optimized scan support for queries which place a hard limit on the number of rows returned via the LIMIT clause. This optimization benefits tables with REGIONAL BY ROW locality by splitting the spans accessed into a local spans set and a remote spans set, combined via a UNION ALL operation where each branch of the UNION ALL has the same hard limit as the original SELECT query block. If the limit is reached by scanning just the local spans, then latency is improved. The optimization is not applied if the LIMIT is more than the KV batch size of 100000 rows or if the number of spans in the scan exceeds 10000. This commit also adds an improvement to span merging to avoid merging local spans with remote spans in order to maximize the number of queries that can utilize locality optimized scan. Fixes cockroachdb#64862 Release note (Performance Improvement): Queries with a LIMIT clause applied against a single table, either explicitly written, or implicit such as in an uncorrelated EXISTS subquery, now scan that table with improved latency if the table is defined with LOCALITY REGIONAL BY ROW and the number of qualified rows residing in the local region is less than or equal to the hard limit (sum of the LIMIT clause and optional OFFSET clause values). This optimization is only applied if the hard limit is 100000 or less.
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Feb 16, 2022
This commit adds locality optimized scan support for queries which place a hard limit on the number of rows returned via the LIMIT clause. This optimization benefits tables with REGIONAL BY ROW locality by splitting the spans accessed into a local spans set and a remote spans set, combined via a UNION ALL operation where each branch of the UNION ALL has the same hard limit as the original SELECT query block. If the limit is reached by scanning just the local spans, then latency is improved. The optimization is not applied if the LIMIT is more than the KV batch size of 100000 rows or if the number of spans in the scan exceeds 10000. This commit also adds an improvement to span merging to avoid merging local spans with remote spans in order to maximize the number of queries that can utilize locality optimized scan. Fixes cockroachdb#64862 Release note (Performance Improvement): Queries with a LIMIT clause applied against a single table, either explicitly written, or implicit such as in an uncorrelated EXISTS subquery, now scan that table with improved latency if the table is defined with LOCALITY REGIONAL BY ROW and the number of qualified rows residing in the local region is less than or equal to the hard limit (sum of the LIMIT clause and optional OFFSET clause values). This optimization is only applied if the hard limit is 100000 or less.
craig bot
pushed a commit
that referenced
this issue
Feb 16, 2022
75431: opt: locality optimized scan for queries with a LIMIT clause r=msirek a=msirek

This commit adds locality optimized scan support for queries which place a hard limit on the number of rows returned via the LIMIT clause. This optimization benefits tables with REGIONAL BY ROW locality by splitting the spans accessed into a local spans set and a remote spans set, combined via a UNION ALL operation where each branch of the UNION ALL has the same hard limit as the original SELECT query block. If the limit is reached by scanning just the local spans, then latency is improved. The optimization is not applied if the LIMIT is more than the KV batch size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging local spans with remote spans in order to maximize the number of queries that can utilize locality optimized scan.

Fixes #64862

Release note (Performance Improvement): Queries with a LIMIT clause applied against a single table, either explicitly written, or implicit such as in an uncorrelated EXISTS subquery, now scan that table with improved latency if the table is defined with LOCALITY REGIONAL BY ROW and the number of qualified rows residing in the local region is less than or equal to the hard limit (sum of the LIMIT clause and optional OFFSET clause values). This optimization is only applied if the hard limit is 100000 or less.

76639: builtins: fix topological sort for SHOW CREATE ALL r=otan a=rafiss

fixes #76077

Release note (bug fix): Fixed an error that could sometimes happen when sorting the output of the SHOW CREATE ALL TABLES command.

Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
RajivTS
pushed a commit
to RajivTS/cockroach
that referenced
this issue
Mar 6, 2022
This commit adds locality optimized scan support for queries which place a hard limit on the number of rows returned via the LIMIT clause. This optimization benefits tables with REGIONAL BY ROW locality by splitting the spans accessed into a local spans set and a remote spans set, combined via a UNION ALL operation where each branch of the UNION ALL has the same hard limit as the original SELECT query block. If the limit is reached by scanning just the local spans, then latency is improved. The optimization is not applied if the LIMIT is more than the KV batch size of 100000 rows or if the number of spans in the scan exceeds 10000. This commit also adds an improvement to span merging to avoid merging local spans with remote spans in order to maximize the number of queries that can utilize locality optimized scan. Fixes cockroachdb#64862 Release note (Performance Improvement): Queries with a LIMIT clause applied against a single table, either explicitly written, or implicit such as in an uncorrelated EXISTS subquery, now scan that table with improved latency if the table is defined with LOCALITY REGIONAL BY ROW and the number of qualified rows residing in the local region is less than or equal to the hard limit (sum of the LIMIT clause and optional OFFSET clause values). This optimization is only applied if the hard limit is 100000 or less.
A locality optimized scan is a scan that is known to produce a limited number of rows, and it is planned such that local nodes (those in the same region as the gateway) are searched for those rows before remote nodes. If all rows are found locally, there is no need for the execution engine to communicate with remote nodes. This can significantly reduce the latency of queries in a multi-region cluster.
#60831 added support for locality optimized scans when it was known that a scan would produce at most 1 row due to selecting a single value of a unique key. However, it should be possible to apply this optimization for any scan with a known finite number of rows. For example, if a scan has a hard limit, we should be able to avoid visiting remote nodes if the limit is reached by scanning only local nodes.
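To make the hard-limit case concrete, here is a minimal Python sketch of the idea, not CockroachDB's actual implementation. The names `locality_optimized_scan`, `scan_local`, and `scan_remote` are hypothetical stand-ins for the local and remote branches of the plan; each branch is capped at the same hard limit, and the remote branch is only invoked when the local one comes up short.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]  # illustrative row type, an assumption for this sketch


def locality_optimized_scan(
    scan_local: Callable[[], Iterable[Row]],
    scan_remote: Callable[[], Iterable[Row]],
    hard_limit: int,
) -> List[Row]:
    """Return up to hard_limit rows, visiting remote regions only if needed."""
    # Local branch, capped at the hard limit.
    rows = list(islice(scan_local(), hard_limit))
    if len(rows) >= hard_limit:
        # Limit satisfied locally: scan_remote is never called.
        return rows
    # Remote branch: fetch only the rows still missing.
    rows.extend(islice(scan_remote(), hard_limit - len(rows)))
    return rows
```

With `hard_limit=1` this reduces to the single-row case described above: if the row is found locally, the remote callable is never invoked.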
We'll probably want to set a threshold and only apply this optimization if the limit is, say, less than 10000 rows (the size of a kv batch).
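A threshold check along these lines could be sketched as follows. The constants come from the commit messages that later referenced this issue (a hard limit of at most 100,000 rows, computed as LIMIT plus OFFSET, and at most 10,000 spans in the scan) rather than from the 10000 figure floated above. The function and constant names are hypothetical and do not correspond to optimizer code.

```python
# Values quoted from the commit messages on this issue; names are assumptions.
MAX_HARD_LIMIT = 100_000  # described there as the KV batch size
MAX_SPANS = 10_000


def hard_limit(limit: int, offset: int = 0) -> int:
    """Hard limit as described in the release note: LIMIT plus optional OFFSET."""
    return limit + offset


def can_use_locality_optimized_scan(limit: int, offset: int, num_spans: int) -> bool:
    """Return True if both thresholds from the commit messages are met."""
    return hard_limit(limit, offset) <= MAX_HARD_LIMIT and num_spans <= MAX_SPANS
```

For example, `can_use_locality_optimized_scan(50, 10, 3)` passes both checks, while a scan with 10,001 spans would be rejected regardless of its limit.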
See the original issue for more context and details: #55185.
Epic CRDB-9596