Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt, sql: locality optimized scans with more than 1 row #64862

Closed
rytaft opened this issue May 7, 2021 · 1 comment · Fixed by #75431
Closed

opt, sql: locality optimized scans with more than 1 row #64862

rytaft opened this issue May 7, 2021 · 1 comment · Fixed by #75431
Assignees
Labels
A-multiregion Related to multi-region A-partitioning C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. docs-done docs-known-limitation T-multiregion

Comments

@rytaft
Copy link
Collaborator

rytaft commented May 7, 2021

A locality optimized scan is a scan that is known to produce a limited number of rows, and it is planned such that local nodes (those in the same region as the gateway) are searched for those rows before remote nodes. If all rows are found locally, there is no need for the execution engine to communicate with remote nodes. This can significantly reduce the latency of queries in a multi-region cluster.

#60831 added support for locality optimized scans when it was known that a scan would produce at most 1 row due to selecting a single value of a unique key. However, it should be possible to apply this optimization for any scan with a known finite number of rows. For example, if a scan has a hard limit, we should be able to avoid visiting remote nodes if the limit is reached by scanning only local nodes.

We'll probably want to set a threshold and only apply this optimization if the limit is, say, less than 10000 rows (the size of a kv batch).

See the original issue for more context and details: #55185.

Epic CRDB-9596

@rytaft rytaft added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-partitioning A-multiregion Related to multi-region T-multiregion T-sql-queries SQL Queries Team labels May 7, 2021
@rytaft rytaft self-assigned this Aug 26, 2021
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 26, 2021
This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are fewer than 10,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs cockroachdb#64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 10000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 26, 2021
This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are fewer than 10,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs cockroachdb#64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 10000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 27, 2021
This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are no more than 10,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs cockroachdb#64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 10000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 27, 2021
This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are no more than 100,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs cockroachdb#64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 100,000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 27, 2021
This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are no more than 100,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs cockroachdb#64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 100,000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.
yuzefovich pushed a commit to rytaft/cockroach that referenced this issue Aug 28, 2021
This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are no more than 100,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs cockroachdb#64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 100,000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.
craig bot pushed a commit that referenced this issue Aug 28, 2021
69238: server: create api to query persisted stats by date range r=xinhaoz a=xinhaoz

This commit creates a new API endpoint /_status/combinedstmts
to fetch combined in-memory and persisted statements and
transactions from crdb_internal.statement_statistics
and crdb_internal.transaction_statistics. The request
supports optional start and end parameters which
represent the unix time at which the data was aggregated.

The parameteres start, end and combined have also been
added to the StatementsRequest message. Setting combined
to true will forward the request to fetch data from the
new combined api, using the start and end parameters
provided.

Release justification: Category 2 low-risk updates to new
functionality

Release note (api change): New endpoint /_status/combinedstmts
to retrieve persisted and in-memory statements from
crdb_internal.statement_statistics and
crdb_internal.transaction_statistics by aggregated_ts
range. The request supports optional query string
parameters start and end, which are the date range in unix
time. The response returned is currently the response
expected from /_status/statements.

/_status/statements has also been udpated to support
the parameters combined, start, and end.
If combined is true, then the statements endpoint will
use /_status/combinedstmts with the optional parameters
start and end.

69395: opt: support locality optimized search for scans with more than 1 row r=yuzefovich a=rytaft

This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are fewer than 100,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs #64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 100000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.

69469: ts: include histogram quantiles in tsdump r=dhartunian a=tbg

`cockroach debug tsdump` previously silently did not return metrics
backed by histograms. This is for technical reasons related to the
bookkeeping of metrics names and is rectified here by requiring some
extra tagging of metrics that are histograms so that they can be picked
up by tsdump. It's not pretty, but pragmatic: it works and it'll be
clear to anyone adding a histogram in the future how to proceed, even if
they may wonder why things work in such a roundabout manner (and if
they're curious about that, the relevant issues are linked in comments
as well).

I also renamed AllMetricsNames to AllInternalTimeseriesMetricsNames
to make clear what is being returned.

Demo:

```
killall -9 cockroach; rm -rf cockroach-data;
./cockroach start-single-node --insecure --background && \
./cockroach workload run kv --init \
    'postgres://[email protected]:26257?sslmode=disable' --duration=300s && \
./cockroach debug tsdump --format=raw --insecure > tsdump.gob && \
killall -9 cockroach && rm -rf cockroach-data && \
COCKROACH_DEBUG_TS_IMPORT_FILE=tsdump.gob ./cockroach start-single-node --insecure
```

![image](https://user-images.githubusercontent.com/5076964/131134624-b5471621-d23b-4ce7-9026-e8aeb3613231.png)

Release justification: low-risk observability fix
Release note (ops change): The ./cockroach debug tsdump command now
downloads histogram timeseries it silently omitted previously.

Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
@rytaft
Copy link
Collaborator Author

rytaft commented Oct 18, 2021

Adding docs-known-limitation tag to note that locality optimized search only works for queries selecting up to 100,000 unique keys. It does not yet work for LIMIT.

@msirek msirek self-assigned this Jan 16, 2022
msirek pushed a commit to msirek/cockroach that referenced this issue Jan 24, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Informs cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
msirek pushed a commit to msirek/cockroach that referenced this issue Jan 31, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Informs cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
msirek pushed a commit to msirek/cockroach that referenced this issue Feb 4, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Informs cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
msirek pushed a commit to msirek/cockroach that referenced this issue Feb 4, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Informs cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
msirek pushed a commit to msirek/cockroach that referenced this issue Feb 4, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Informs cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
msirek pushed a commit to msirek/cockroach that referenced this issue Feb 8, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Fixes cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
msirek pushed a commit to msirek/cockroach that referenced this issue Feb 8, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Fixes cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
msirek pushed a commit to msirek/cockroach that referenced this issue Feb 16, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Fixes cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
craig bot pushed a commit that referenced this issue Feb 16, 2022
75431: opt: locality optimized scan for queries with a LIMIT clause r=msirek a=msirek

This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Fixes #64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.

76639: builtins: fix topological sort for SHOW CREATE ALL r=otan a=rafiss

fixes #76077

Release note (bug fix): Fixed an error that could sometimes happen when
sorting the output of the SHOW CREATE ALL TABLES command.

Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
@craig craig bot closed this as completed in 2fdd9ae Feb 16, 2022
RajivTS pushed a commit to RajivTS/cockroach that referenced this issue Mar 6, 2022
This commit adds locality optimized scan support for queries which
place a hard limit on the number of rows returned via the LIMIT clause.
This optimization benefits tables with REGIONAL BY ROW locality by
splitting the spans accessed into a local spans set and a remote spans
set, combined via a UNION ALL operation where each branch of the UNION
ALL has the same hard limit as the original SELECT query block. If the
limit is reached by scanning just the local spans, then latency is
improved.

The optimization is not applied if the LIMIT is more than the KV batch
size of 100000 rows or if the number of spans in the scan exceeds 10000.

This commit also adds an improvement to span merging to avoid merging
local spans with remote spans in order to maximize the number of queries
that can utilize locality optimized scan.

Fixes cockroachdb#64862

Release note (Performance Improvement): Queries with a LIMIT clause
applied against a single table, either explicitly written, or implicit
such as in an uncorrelated EXISTS subquery, now scan that table with
improved latency if the table is defined with LOCALITY REGIONAL BY ROW
and the number of qualified rows residing in the local region is less
than or equal to the hard limit (sum of the LIMIT clause and optional
OFFSET clause values). This optimization is only applied if the hard
limit is 100000 or less.
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multiregion Related to multi-region A-partitioning C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. docs-done docs-known-limitation T-multiregion
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants