Skip to content

Commit

Permalink
sql: add txn rows written/read guardrails
Browse files Browse the repository at this point in the history
TODO: introduce metrics on the number of txns that reached the limits.

Release note (ops change): TODO.
  • Loading branch information
yuzefovich committed Aug 21, 2021
1 parent 8d8a763 commit aca4928
Show file tree
Hide file tree
Showing 34 changed files with 3,055 additions and 604 deletions.
114 changes: 114 additions & 0 deletions docs/generated/eventlog.md
Original file line number Diff line number Diff line change
Expand Up @@ -1939,6 +1939,62 @@ set to a non-zero value, AND
| `FullIndexScan` | Whether the query contains a full secondary index scan. | no |
| `TxnCounter` | The sequence number of the SQL transaction inside its session. | no |

### `txn_rows_read_limit_reached`

An event of type `txn_rows_read_limit_reached` is recorded when a transaction tries to read more
rows than cluster setting `sql.defaults.transaction_rows_read_log`. There
will only be a single record for a single transaction (unless it is retried)
even if there are more statement within the transaction that haven't been
executed yet.




#### Common fields

| Field | Description | Sensitive |
|--|--|--|
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |
| `Statement` | A normalized copy of the SQL statement that triggered the event. The statement string contains a mix of sensitive and non-sensitive details (it is redactable). | partially |
| `Tag` | The statement tag. This is separate from the statement string, since the statement string can contain sensitive information. The tag is guaranteed not to. | no |
| `User` | The user account that triggered the event. The special usernames `root` and `node` are not considered sensitive. | depends |
| `DescriptorID` | The primary object descriptor affected by the operation. Set to zero for operations that don't affect descriptors. | no |
| `ApplicationName` | The application name for the session where the event was emitted. This is included in the event to ease filtering of logging output by application. Application names starting with a dollar sign (`$`) are not considered sensitive. | depends |
| `PlaceholderValues` | The mapping of SQL placeholders to their values, for prepared statements. | yes |
| `TxnID` | | no |
| `SessionID` | | no |
| `ViolatesTxnRowsLimitErr` | | no |
| `ReadKind` | ReadKind indicates that the "rows read" limit is reached if true and the "rows written" limit otherwise. | no |

### `txn_rows_written_limit_reached`

An event of type `txn_rows_written_limit_reached` is recorded when a transaction tries to write more
rows than cluster setting `sql.defaults.transaction_rows_written_log`. There
will only be a single record for a single transaction (unless it is retried)
even if there are more mutation statement within the transaction that haven't
been executed yet.




#### Common fields

| Field | Description | Sensitive |
|--|--|--|
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |
| `Statement` | A normalized copy of the SQL statement that triggered the event. The statement string contains a mix of sensitive and non-sensitive details (it is redactable). | partially |
| `Tag` | The statement tag. This is separate from the statement string, since the statement string can contain sensitive information. The tag is guaranteed not to. | no |
| `User` | The user account that triggered the event. The special usernames `root` and `node` are not considered sensitive. | depends |
| `DescriptorID` | The primary object descriptor affected by the operation. Set to zero for operations that don't affect descriptors. | no |
| `ApplicationName` | The application name for the session where the event was emitted. This is included in the event to ease filtering of logging output by application. Application names starting with a dollar sign (`$`) are not considered sensitive. | depends |
| `PlaceholderValues` | The mapping of SQL placeholders to their values, for prepared statements. | yes |
| `TxnID` | | no |
| `SessionID` | | no |
| `ViolatesTxnRowsLimitErr` | | no |
| `ReadKind` | ReadKind indicates that the "rows read" limit is reached if true and the "rows written" limit otherwise. | no |

## SQL Slow Query Log (Internal)

Events in this category report slow query execution by
Expand Down Expand Up @@ -2006,6 +2062,64 @@ the "slow query" condition.
| `FullIndexScan` | Whether the query contains a full secondary index scan. | no |
| `TxnCounter` | The sequence number of the SQL transaction inside its session. | no |

### `txn_rows_read_limit_reached_internal`

An event of type `txn_rows_read_limit_reached_internal` is recorded when an internal transaction
tries to read more rows than cluster setting
`sql.defaults.transaction_rows_read_log` or
`sql.defaults.transaction_rows_read_err`. There will only be a single
record for a single transaction (unless it is retried) even if there are more
mutation statement within the transaction that haven't been executed yet.




#### Common fields

| Field | Description | Sensitive |
|--|--|--|
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |
| `Statement` | A normalized copy of the SQL statement that triggered the event. The statement string contains a mix of sensitive and non-sensitive details (it is redactable). | partially |
| `Tag` | The statement tag. This is separate from the statement string, since the statement string can contain sensitive information. The tag is guaranteed not to. | no |
| `User` | The user account that triggered the event. The special usernames `root` and `node` are not considered sensitive. | depends |
| `DescriptorID` | The primary object descriptor affected by the operation. Set to zero for operations that don't affect descriptors. | no |
| `ApplicationName` | The application name for the session where the event was emitted. This is included in the event to ease filtering of logging output by application. Application names starting with a dollar sign (`$`) are not considered sensitive. | depends |
| `PlaceholderValues` | The mapping of SQL placeholders to their values, for prepared statements. | yes |
| `TxnID` | | no |
| `SessionID` | | no |
| `ViolatesTxnRowsLimitErr` | | no |
| `ReadKind` | ReadKind indicates that the "rows read" limit is reached if true and the "rows written" limit otherwise. | no |

### `txn_rows_written_limit_reached_internal`

An event of type `txn_rows_written_limit_reached_internal` is recorded when an internal transaction
tries to write more rows than cluster setting
`sql.defaults.transaction_rows_written_log` or
`sql.defaults.transaction_rows_written_err`. There will only be a single
record for a single transaction (unless it is retried) even if there are more
mutation statement within the transaction that haven't been executed yet.




#### Common fields

| Field | Description | Sensitive |
|--|--|--|
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |
| `Statement` | A normalized copy of the SQL statement that triggered the event. The statement string contains a mix of sensitive and non-sensitive details (it is redactable). | partially |
| `Tag` | The statement tag. This is separate from the statement string, since the statement string can contain sensitive information. The tag is guaranteed not to. | no |
| `User` | The user account that triggered the event. The special usernames `root` and `node` are not considered sensitive. | depends |
| `DescriptorID` | The primary object descriptor affected by the operation. Set to zero for operations that don't affect descriptors. | no |
| `ApplicationName` | The application name for the session where the event was emitted. This is included in the event to ease filtering of logging output by application. Application names starting with a dollar sign (`$`) are not considered sensitive. | depends |
| `PlaceholderValues` | The mapping of SQL placeholders to their values, for prepared statements. | yes |
| `TxnID` | | no |
| `SessionID` | | no |
| `ViolatesTxnRowsLimitErr` | | no |
| `ReadKind` | ReadKind indicates that the "rows read" limit is reached if true and the "rows written" limit otherwise. | no |

## SQL User and Role operations

Events in this category pertain to SQL statements that modify the
Expand Down
4 changes: 4 additions & 0 deletions docs/generated/settings/settings-for-tenants.txt
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,10 @@ sql.defaults.results_buffer.size byte size 16 KiB default size of the buffer tha
sql.defaults.serial_normalization enumeration rowid default handling of SERIAL in table definitions [rowid = 0, virtual_sequence = 1, sql_sequence = 2, sql_sequence_cached = 3]
sql.defaults.statement_timeout duration 0s default value for the statement_timeout; default value for the statement_timeout session setting; controls the duration a query is permitted to run before it is canceled; if set to 0, there is no timeout
sql.defaults.stub_catalog_tables.enabled boolean true default value for stub_catalog_tables session setting
sql.defaults.transaction_rows_read_err integer 0 the limit for the number of rows read by a SQL transaction which - once reached - will fail the transaction (or will trigger a logging event to SQL_INTERNAL_PERF for internal transactions); use 0 to disable
sql.defaults.transaction_rows_read_log integer 0 the threshold for the number of rows read by a SQL transaction which - once reached - will trigger a logging event to SQL_PERF (or SQL_INTERNAL_PERF for internal transactions); use 0 to disable
sql.defaults.transaction_rows_written_err integer 0 the limit for the number of rows written by a SQL transaction which - once reached - will fail the transaction (or will trigger a logging event to SQL_INTERNAL_PERF for internal transactions); use 0 to disable
sql.defaults.transaction_rows_written_log integer 0 the threshold for the number of rows written by a SQL transaction which - once reached - will trigger a logging event to SQL_PERF (or SQL_INTERNAL_PERF for internal transactions); use 0 to disable
sql.defaults.vectorize enumeration on default vectorize mode [on = 0, on = 2, experimental_always = 3, off = 4]
sql.defaults.zigzag_join.enabled boolean true default value for enable_zigzag_join session setting; allows use of zig-zag join by default
sql.distsql.max_running_flows integer 500 maximum number of concurrent flows that can be run on a node
Expand Down
4 changes: 4 additions & 0 deletions docs/generated/settings/settings.html
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,10 @@
<tr><td><code>sql.defaults.serial_normalization</code></td><td>enumeration</td><td><code>rowid</code></td><td>default handling of SERIAL in table definitions [rowid = 0, virtual_sequence = 1, sql_sequence = 2, sql_sequence_cached = 3]</td></tr>
<tr><td><code>sql.defaults.statement_timeout</code></td><td>duration</td><td><code>0s</code></td><td>default value for the statement_timeout; default value for the statement_timeout session setting; controls the duration a query is permitted to run before it is canceled; if set to 0, there is no timeout</td></tr>
<tr><td><code>sql.defaults.stub_catalog_tables.enabled</code></td><td>boolean</td><td><code>true</code></td><td>default value for stub_catalog_tables session setting</td></tr>
<tr><td><code>sql.defaults.transaction_rows_read_err</code></td><td>integer</td><td><code>0</code></td><td>the limit for the number of rows read by a SQL transaction which - once reached - will fail the transaction (or will trigger a logging event to SQL_INTERNAL_PERF for internal transactions); use 0 to disable</td></tr>
<tr><td><code>sql.defaults.transaction_rows_read_log</code></td><td>integer</td><td><code>0</code></td><td>the threshold for the number of rows read by a SQL transaction which - once reached - will trigger a logging event to SQL_PERF (or SQL_INTERNAL_PERF for internal transactions); use 0 to disable</td></tr>
<tr><td><code>sql.defaults.transaction_rows_written_err</code></td><td>integer</td><td><code>0</code></td><td>the limit for the number of rows written by a SQL transaction which - once reached - will fail the transaction (or will trigger a logging event to SQL_INTERNAL_PERF for internal transactions); use 0 to disable</td></tr>
<tr><td><code>sql.defaults.transaction_rows_written_log</code></td><td>integer</td><td><code>0</code></td><td>the threshold for the number of rows written by a SQL transaction which - once reached - will trigger a logging event to SQL_PERF (or SQL_INTERNAL_PERF for internal transactions); use 0 to disable</td></tr>
<tr><td><code>sql.defaults.vectorize</code></td><td>enumeration</td><td><code>on</code></td><td>default vectorize mode [on = 0, on = 2, experimental_always = 3, off = 4]</td></tr>
<tr><td><code>sql.defaults.zigzag_join.enabled</code></td><td>boolean</td><td><code>true</code></td><td>default value for enable_zigzag_join session setting; allows use of zig-zag join by default</td></tr>
<tr><td><code>sql.distsql.max_running_flows</code></td><td>integer</td><td><code>500</code></td><td>maximum number of concurrent flows that can be run on a node</td></tr>
Expand Down
10 changes: 10 additions & 0 deletions pkg/sql/conn_executor.go
Original file line number Diff line number Diff line change
Expand Up @@ -1216,6 +1216,16 @@ type connExecutor struct {
rowsRead int64
bytesRead int64

// rowsWritten tracks the number of rows written (modified) by all
// statements in this txn so far.
rowsWritten int64

// rowsWrittenLogged and rowsReadLoggen indicates whether we have
// already logged an event about reaching written/read rows setting,
// respectively.
rowsWrittenLogged bool
rowsReadLogged bool

// hasAdminRole is used to cache if the user running the transaction
// has admin privilege. hasAdminRoleCache is set for the first statement
// in a transaction.
Expand Down
109 changes: 109 additions & 0 deletions pkg/sql/conn_executor_exec.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/fsm"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/log/eventpb"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/cockroachdb/cockroach/pkg/util/tracing"
"github.com/cockroachdb/errors"
Expand Down Expand Up @@ -1020,6 +1021,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(

ex.extraTxnState.rowsRead += stats.rowsRead
ex.extraTxnState.bytesRead += stats.bytesRead
ex.extraTxnState.rowsWritten += stats.rowsWritten

// Record the statement summary. This also closes the plan if the
// plan has not been closed earlier.
Expand All @@ -1031,6 +1033,107 @@ func (ex *connExecutor) dispatchToExecutionEngine(
ex.server.cfg.TestingKnobs.AfterExecute(ctx, stmt.String(), res.Err())
}

if limitsErr := ex.handleTxnRowsWrittenReadLimits(ctx); limitsErr != nil && res.Err() == nil {
res.SetError(limitsErr)
}

return err
}

// handleTxnRowsWrittenReadLimits checks whether the current transaction has
// reached the limits on the number of rows written/read and logs the
// corresponding event or returns an error. It should be called after executing
// a single statement.
func (ex *connExecutor) handleTxnRowsWrittenReadLimits(ctx context.Context) error {
sd := ex.sessionData
var commonSQLEventDetails *eventpb.CommonSQLEventDetails
var err error

shouldLogRowsWritten := sd.TxnRowsWrittenLog != 0 && ex.extraTxnState.rowsWritten >= sd.TxnRowsWrittenLog
shouldErrRowsWritten := sd.TxnRowsWrittenErr != 0 && ex.extraTxnState.rowsWritten >= sd.TxnRowsWrittenErr
if shouldLogRowsWritten || shouldErrRowsWritten {
commonSQLEventDetails = ex.planner.getCommonSQLEventDetails()
info := eventpb.CommonTxnRowsLimitReachedDetails{
TxnID: ex.state.mu.txn.ID().String(),
SessionID: ex.sessionID.String(),
ViolatesTxnRowsLimitErr: shouldErrRowsWritten,
ReadKind: false,
}
if shouldErrRowsWritten && ex.executorType == executorTypeInternal {
// Internal work should never err and always log if violating either
// limit.
shouldErrRowsWritten = false
shouldLogRowsWritten = true
}
if ex.extraTxnState.rowsWrittenLogged {
// We have already logged an event about this transaction.
shouldLogRowsWritten = false
} else {
ex.extraTxnState.rowsWrittenLogged = shouldLogRowsWritten
}
if shouldLogRowsWritten {
var event eventpb.EventPayload
if ex.executorType == executorTypeInternal {
event = &eventpb.TxnRowsWrittenLimitReachedInternal{
CommonSQLEventDetails: *commonSQLEventDetails,
CommonTxnRowsLimitReachedDetails: info,
}
} else {
event = &eventpb.TxnRowsWrittenLimitReached{
CommonSQLEventDetails: *commonSQLEventDetails,
CommonTxnRowsLimitReachedDetails: info,
}
}
log.StructuredEvent(ctx, event)
}
if shouldErrRowsWritten {
err = pgerror.WithCandidateCode(&info, pgcode.ProgramLimitExceeded)
}
}

shouldLogRowsRead := sd.TxnRowsReadLog != 0 && ex.extraTxnState.rowsRead >= sd.TxnRowsReadLog
shouldErrRowsRead := sd.TxnRowsReadErr != 0 && ex.extraTxnState.rowsRead >= sd.TxnRowsReadErr
if shouldLogRowsRead || shouldErrRowsRead {
if commonSQLEventDetails == nil {
commonSQLEventDetails = ex.planner.getCommonSQLEventDetails()
}
info := eventpb.CommonTxnRowsLimitReachedDetails{
TxnID: ex.state.mu.txn.ID().String(),
SessionID: ex.sessionID.String(),
ViolatesTxnRowsLimitErr: shouldErrRowsRead,
ReadKind: true,
}
if shouldErrRowsRead && ex.executorType == executorTypeInternal {
// Internal work should never err and always log if violating either
// limit.
shouldErrRowsRead = false
shouldLogRowsRead = true
}
if ex.extraTxnState.rowsReadLogged {
// We have already logged an event about this transaction.
shouldLogRowsRead = false
} else {
ex.extraTxnState.rowsReadLogged = shouldLogRowsRead
}
if shouldLogRowsRead {
var event eventpb.EventPayload
if ex.executorType == executorTypeInternal {
event = &eventpb.TxnRowsReadLimitReachedInternal{
CommonSQLEventDetails: *commonSQLEventDetails,
CommonTxnRowsLimitReachedDetails: info,
}
} else {
event = &eventpb.TxnRowsReadLimitReached{
CommonSQLEventDetails: *commonSQLEventDetails,
CommonTxnRowsLimitReachedDetails: info,
}
}
log.StructuredEvent(ctx, event)
}
if shouldErrRowsRead && err == nil {
err = pgerror.WithCandidateCode(&info, pgcode.ProgramLimitExceeded)
}
}
return err
}

Expand Down Expand Up @@ -1074,6 +1177,8 @@ type topLevelQueryStats struct {
bytesRead int64
// rowsRead is the number of rows read from disk.
rowsRead int64
// rowsWritten is the number of rows written.
rowsWritten int64
}

// execWithDistSQLEngine converts a plan to a distributed SQL physical plan and
Expand Down Expand Up @@ -1652,6 +1757,9 @@ func (ex *connExecutor) recordTransactionStart() (
ex.extraTxnState.accumulatedStats = execstats.QueryLevelStats{}
ex.extraTxnState.rowsRead = 0
ex.extraTxnState.bytesRead = 0
ex.extraTxnState.rowsWritten = 0
ex.extraTxnState.rowsWrittenLogged = false
ex.extraTxnState.rowsReadLogged = false
if txnExecStatsSampleRate := collectTxnStatsSampleRate.Get(&ex.server.GetExecutorConfig().Settings.SV); txnExecStatsSampleRate > 0 {
ex.extraTxnState.shouldCollectTxnExecutionStats = txnExecStatsSampleRate > ex.rng.Float64()
}
Expand All @@ -1678,6 +1786,7 @@ func (ex *connExecutor) recordTransactionStart() (
ex.extraTxnState.accumulatedStats = execstats.QueryLevelStats{}
ex.extraTxnState.rowsRead = 0
ex.extraTxnState.bytesRead = 0
ex.extraTxnState.rowsWritten = 0

if ex.server.cfg.TestingKnobs.BeforeRestart != nil {
ex.server.cfg.TestingKnobs.BeforeRestart(ex.Ctx(), ex.extraTxnState.autoRetryReason)
Expand Down
12 changes: 6 additions & 6 deletions pkg/sql/create_stats.go
Original file line number Diff line number Diff line change
Expand Up @@ -610,12 +610,12 @@ func (r *createStatsResumer) Resume(ctx context.Context, execCtx interface{}) er
evalCtx.ExecCfg, txn,
0, /* depth: use event_log=2 for vmodule filtering */
eventLogOptions{dst: LogEverywhere},
sqlEventCommonExecPayload{
user: evalCtx.SessionData.User(),
appName: evalCtx.SessionData.ApplicationName,
stmt: redact.Sprint(details.Statement),
stmtTag: "CREATE STATISTICS",
placeholders: nil, /* no placeholders known at this point */
&eventpb.CommonSQLEventDetails{
Statement: redact.Sprint(details.Statement),
Tag: "CREATE STATISTICS",
User: evalCtx.SessionData.User().Normalized(),
ApplicationName: evalCtx.SessionData.ApplicationName,
PlaceholderValues: []string{}, /* no placeholders known at this point */
},
eventLogEntry{
targetID: int32(details.Table.ID),
Expand Down
Loading

0 comments on commit aca4928

Please sign in to comment.