
sql: make transaction_rows_{read|written}_err and large_full_scan_rows guardrails halt query execution #70473

Open
michae2 opened this issue Sep 21, 2021 · 4 comments
Labels
A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) docs-done docs-known-limitation O-postmortem Originated from a Postmortem action item. P-3 Issues/test failures with no fix SLA T-sql-queries SQL Queries Team

Comments

@michae2
Collaborator

michae2 commented Sep 21, 2021

We recently added guardrails to limit the number of rows read or written by a single transaction. A transaction reading more than transaction_rows_read_err rows (or writing more than transaction_rows_written_err rows) now fails with an error.

To simplify the implementation, we chose to emit this error after the violating query has already finished executing and delivered its results to the application. This is still quite useful: well-behaved applications will see the error, and the transaction will be aborted. But there is a risk that poorly written applications might not see the error after receiving the results of a read-only autocommit query. There is also a risk that the violating query will have already hurt cluster performance before hitting the guardrail.

It would be more defensive for the current query execution to halt immediately upon violating one of these limits.

Here's an example:

CREATE TABLE t (i INT PRIMARY KEY);
INSERT INTO t SELECT generate_series(0, 19);
SET transaction_rows_read_err = 10;
SET transaction_rows_written_err = 10;

root@localhost:26257/defaultdb> SELECT * FROM t;
  i
------
   0
   1
   2
   3
   4
   5
   6
   7
   8
   9
  10
  11
  12
  13
  14
  15
  16
  17
  18
  19
(20 rows)
(error encountered after some results were delivered)
ERROR: txn has read 20 rows, which is above the limit: TxnID cea43f29-a4a0-4996-a92d-f283c2626462 SessionID 16a6bd70640294880000000000000001
SQLSTATE: 54000

root@localhost:26257/defaultdb> INSERT INTO t SELECT generate_series(20, 39) RETURNING i;
  i
------
  20
  21
  22
  23
  24
  25
  26
  27
  28
  29
  30
  31
  32
  33
  34
  35
  36
  37
  38
  39
(20 rows)
(error encountered after some results were delivered)
ERROR: txn has written 20 rows, which is above the limit: TxnID 32334f62-5054-4c26-afca-5b88ff5a317f SessionID 16a6bd70640294880000000000000001
SQLSTATE: 54000

Jira issue: CRDB-10088

@michae2 michae2 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-execution Relating to SQL execution. T-sql-queries SQL Queries Team labels Sep 21, 2021
@vy-ton
Contributor

vy-ton commented Oct 14, 2021

> There is also a risk that the violating query will have already hurt cluster performance before hitting the guardrail.

@rytaft @jordanlewis We should revisit this issue. The motivation for these guardrails is to protect the cluster before it destabilizes.

@rytaft
Collaborator

rytaft commented Oct 26, 2021

This is likely a non-trivial amount of work. We need to determine whether it's actually needed. @vy-ton

@michae2
Collaborator Author

michae2 commented Sep 28, 2022

Note for posterity: when fixing this, we might be able to take advantage of whatever mechanism is used for transaction timeouts (though there will be more to this fix than just invoking that mechanism).
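
For reference, a minimal sketch of the timeout behavior this note refers to, using the statement_timeout and transaction_timeout session variables (a hedged illustration; exact error output elided). Unlike the row guardrails today, these timeouts interrupt a query mid-execution rather than erroring after results have been delivered:

SET statement_timeout = '1s';    -- cap any single statement at 1 second
SET transaction_timeout = '5s';  -- cap the whole explicit transaction at 5 seconds
BEGIN;
SELECT * FROM t;  -- if either timeout fires, execution halts immediately
COMMIT;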

@michae2
Collaborator Author

michae2 commented May 31, 2023

When we fix this, it would be nice to use the same technique to make a full scan halt immediately if it reads more than large_full_scan_rows when disallow_full_table_scans is in use.
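
For context, a rough sketch of how these two settings interact today (a hedged illustration; the check is made against the optimizer's row estimate before execution, so a scan whose actual row count exceeds the estimate is not halted mid-execution, which is the gap described above):

SET disallow_full_table_scans = true;
SET large_full_scan_rows = 100;
-- Full scans estimated to exceed 100 rows are rejected at planning time;
-- this issue proposes also halting a running scan once it has actually
-- read more than large_full_scan_rows rows.
SELECT * FROM t;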

rytaft added a commit to rytaft/cockroach that referenced this issue Jun 2, 2023
Prior to this commit, setting transaction_rows_read_err to a non-zero
value would cause a transaction to fail as soon as a statement caused
the total number of rows read to exceed transaction_rows_read_err. However,
it was possible that each statement could read many more than
transaction_rows_read_err rows. This commit adds logic so that a single
scan never reads more than transaction_rows_read_err+1 rows if
transaction_rows_read_err is set.

Informs cockroachdb#70473

Release note (performance improvement): If transaction_rows_read_err is
set to a non-zero value, we now ensure that any single scan never reads
more than transaction_rows_read_err+1 rows. This prevents transactions that
would error due to the transaction_rows_read_err setting from causing
a large performance overhead due to large scans.
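
Using the reproduction from the issue description, the intended effect of this change is roughly the following (a hedged sketch; exact error output elided):

SET transaction_rows_read_err = 10;
SELECT * FROM t;  -- t contains 20 rows
-- Before this commit: the scan read and returned all 20 rows, and the
-- error was raised only after the results had been delivered.
-- After this commit: the scan itself is capped at
-- transaction_rows_read_err + 1 = 11 rows, so the statement fails
-- without streaming the full result set.
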
craig bot pushed a commit that referenced this issue Jun 5, 2023
104290: sql: make transaction_rows_read_err prevent large scans r=rytaft a=rytaft

Prior to this commit, setting `transaction_rows_read_err` to a non-zero value would cause a transaction to fail as soon as a statement caused the total number of rows read to exceed `transaction_rows_read_err`. However, it was possible that each statement could read many more than `transaction_rows_read_err` rows. This commit adds logic so that a single scan never reads more than `transaction_rows_read_err+1` rows if `transaction_rows_read_err` is set.

Informs #70473

Release note (performance improvement): If `transaction_rows_read_err` is set to a non-zero value, we now ensure that any single scan never reads more than `transaction_rows_read_err+1` rows. This prevents transactions that would error due to the `transaction_rows_read_err` setting from causing a large performance overhead due to large scans.

104337: sql: fix reset_sql_stats to truncate activity tables r=j82w a=j82w

Previously, resetting stats from the UI or via crdb_internal.reset_sql_stats() would only reset the statement_statistics and transaction_statistics tables, leaving stale data in the sql_activity table. Resetting stats now truncates the sql_activity table as well.

Fixes: #104321
Epic: none

Release note (sql change): Fix crdb_internal.reset_sql_stats() to clean up the sql_activity table, which works as a cache for the stats.

Co-authored-by: Rebecca Taft <[email protected]>
Co-authored-by: j82w <[email protected]>
rytaft added a commit to rytaft/cockroach that referenced this issue Jun 5, 2023
Prior to this commit, setting transaction_rows_read_err to a non-zero
value would cause a transaction to fail as soon as a statement caused
the total number of rows read to exceed transaction_rows_read_err. However,
it was possible that each statement could read many more than
transaction_rows_read_err rows. This commit adds logic so that a single
scan never reads more than transaction_rows_read_err+1 rows if
transaction_rows_read_err is set.

Informs cockroachdb#70473

Release note (performance improvement): If transaction_rows_read_err is
set to a non-zero value, we now ensure that any single scan never reads
more than transaction_rows_read_err+1 rows. This prevents transactions that
would error due to the transaction_rows_read_err setting from causing
a large performance overhead due to large scans. In rare cases, this change
may disable cross-range parallelism of the scan operation for some queries,
which can result in increased query latency.
@rytaft rytaft changed the title sql: make transaction_rows_{read|written}_err guardrails halt query execution sql: make transaction_rows_{read|written}_err and large_full_scan_rows guardrails halt query execution Jun 13, 2023
@mgartner mgartner moved this to 23.2 Release in SQL Queries Jul 24, 2023
@michae2 michae2 moved this from 23.2 Release to 24.1 Release in SQL Queries Sep 12, 2023
@mgartner mgartner moved this from 24.1 Release to New Backlog in SQL Queries Nov 28, 2023
@michae2 michae2 added the O-postmortem Originated from a Postmortem action item. label Sep 20, 2024
@michae2 michae2 moved this from Backlog to Triage in SQL Queries Sep 20, 2024
@mgartner mgartner moved this from Triage to Backlog in SQL Queries Sep 24, 2024
@mgartner mgartner added the P-3 Issues/test failures with no fix SLA label Sep 24, 2024
Projects
Status: Backlog
5 participants