113952: log: add protobuf messages for telemetry txn events r=xinhaoz a=xinhaoz

This commit adds the messages listed below to `telemetry.proto` in
preparation for sending transaction executions to the telemetry
channel. The transaction event that is eventually sent should contain
all execution information currently being tracked for transaction
fingerprints.

- `SampledTransaction`: contains fields equivalent to the execution
information stored by `CollectedTransactionStatistics` from
app_stats.proto, but represents a single txn execution instead
of aggregated executions of a transaction fingerprint.
- `SampledExecStats`: used as a field in `SampledTransaction`, it
contains execution stats that are sampled. This event is the equivalent
of `ExecStats` from app_stats.proto but for a single execution.
- `MVCCIteratorStats`: used in `SampledExecStats` above, the equivalent of
`MVCCIteratorStats` from app_stats.proto but for a single execution.

In addition, to support the above fields, a couple of additional code
templates have been added for generating the JSON log encoding (a sketch
of the resulting encoding follows the list):
- The `array_of_uint64` type is now handled for JSON logs.
- `nestedMessage` has been added as a custom type in `gen.go`. Object field
types can be assigned to this type in order to generate them as nested
objects.
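
To illustrate, here is a minimal, hand-rolled Go sketch of the encoding behavior these templates enable; the type and field names below mirror the messages above but are hypothetical stand-ins, not the actual generated code:

```go
package main

import (
	"bytes"
	"fmt"
)

// mvccIteratorStats is a hypothetical stand-in for the new nested message;
// only two of its fields are shown.
type mvccIteratorStats struct {
	StepCount uint64
	SeekCount uint64
}

func main() {
	stats := mvccIteratorStats{StepCount: 3, SeekCount: 1}
	fingerprintIDs := []uint64{101, 202}

	var buf bytes.Buffer
	buf.WriteString(`{"StatementFingerprintIDs":[`)
	// array_of_uint64: each element is emitted as a JSON number.
	for i, id := range fingerprintIDs {
		if i > 0 {
			buf.WriteByte(',')
		}
		fmt.Fprintf(&buf, "%d", id)
	}
	buf.WriteString(`],"MVCCIteratorStats":`)
	// nestedMessage: the field is emitted as a nested JSON object rather
	// than being flattened into the parent event.
	fmt.Fprintf(&buf, `{"StepCount":%d,"SeekCount":%d}`, stats.StepCount, stats.SeekCount)
	buf.WriteByte('}')

	fmt.Println(buf.String())
	// {"StatementFingerprintIDs":[101,202],"MVCCIteratorStats":{"StepCount":3,"SeekCount":1}}
}
```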

Part of: #108284

Release note: None

114666: opt: reduce planning time for queries with many joins r=mgartner a=mgartner

Prior to this commit, some queries with many joins would perform a large
number of allocations when calculating the selectivity of null-rejecting
join filters. This was due to `statisticsBuilder.selectivityFromNullsRemoved`
allocating a single-column set for each not-null column, and allocating
column statistics for each set.

Many of those allocations, and much of the unnecessary computation spent
traversing the expression tree, are now avoided. This is made possible by
the observation that the selectivity of a null-rejecting filter is always 1
if the column was already not-null in the input.
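
As a hedged sketch of that observation (simplified types, not the actual `statisticsBuilder` code): each constrained column scales the selectivity by its non-null fraction, and a column that is already not-null contributes a factor of exactly 1, so it can be skipped before any column statistics are allocated.

```go
package main

import "fmt"

// selectivityFromNullsRemoved sketches the idea: a null-rejecting filter
// scales the row count by each constrained column's non-null fraction.
// Columns already not-null in the input would contribute (1 - 0) = 1, so
// they are skipped without building per-column statistics.
func selectivityFromNullsRemoved(
	nullFraction map[int]float64, // column ID -> fraction of NULL values in the input
	notNullInInput map[int]bool, // columns the input already guarantees non-NULL
	constrainedCols []int, // columns constrained by null-rejecting filters
) float64 {
	selectivity := 1.0
	for _, col := range constrainedCols {
		if notNullInInput[col] {
			continue // factor is exactly 1; no work needed
		}
		selectivity *= 1 - nullFraction[col]
	}
	return selectivity
}

func main() {
	nullFraction := map[int]float64{1: 0.25, 2: 0}
	notNull := map[int]bool{2: true} // column 2 is already not-null
	fmt.Println(selectivityFromNullsRemoved(nullFraction, notNull, []int{1, 2})) // 0.75
}
```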

Epic: None

Release note: None


115509: span: Re-initialize iterator when forwarding r=miretskiy a=miretskiy

Re-initialize the iterator when forwarding the span
frontier timestamp. The underlying btree may be
mutated (by a merge operation), invalidating a previously
constructed iterator.

The btree implementation is also hardened against misuse
when the span frontier is mutated while iterating.
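
Below is a self-contained Go sketch of both ideas using a toy tree in place of the real interval btree; the types and methods are illustrative, not the actual `pkg/util/span` API. A generation counter lets the iterator detect mutation underneath it (the hardening), and the caller re-initializes the iterator after the mutating call (the fix).

```go
package main

import "fmt"

// tree is a toy stand-in for the frontier's interval btree. It bumps a
// generation counter on every mutation so iterators can detect misuse.
type tree struct {
	spans []string
	gen   int
}

// merge simulates a mutating operation such as merging adjacent spans
// when the frontier timestamp is forwarded.
func (t *tree) merge() {
	t.spans = []string{"[a,c)"}
	t.gen++
}

type iterator struct {
	t   *tree
	gen int
	pos int
}

func (t *tree) iter() *iterator { return &iterator{t: t, gen: t.gen} }

// next returns the current span and advances, panicking if the tree was
// mutated since the iterator was constructed (the hardening).
func (it *iterator) next() (string, bool) {
	if it.gen != it.t.gen {
		panic("iterator invalidated: tree mutated while iterating")
	}
	if it.pos >= len(it.t.spans) {
		return "", false
	}
	s := it.t.spans[it.pos]
	it.pos++
	return s, true
}

func main() {
	t := &tree{spans: []string{"[a,b)", "[b,c)"}}
	it := t.iter()
	s, _ := it.next()
	fmt.Println("visited", s)

	t.merge()     // forwarding the frontier may merge spans, mutating the tree
	it = t.iter() // the fix: re-initialize the iterator after the mutation
	for s, ok := it.next(); ok; s, ok = it.next() {
		fmt.Println("visited", s)
	}
}
```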

Fixes #115411
Fixes #115528
Fixes #115512
Fixes #115490
Fixes #115488
Fixes #115487
Fixes #115483

Release note: None

Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
4 people committed Dec 4, 2023
4 parents 9722711 + c9adcd8 + 5d626ba + e571747 commit cac8b23
Showing 26 changed files with 935 additions and 165 deletions.
84 changes: 84 additions & 0 deletions docs/generated/eventlog.md
@@ -2879,6 +2879,29 @@ An event of type `hot_ranges_stats`
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |

### `m_v_c_c_iterator_stats`

Internal storage iteration statistics for a single execution.


| Field | Description | Sensitive |
|--|--|--|
| `StepCount` | StepCount collects the number of times the iterator moved forward or backward over the DB's underlying storage keyspace. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `StepCountInternal` | StepCountInternal collects the number of times the iterator moved forward or backward over LSM internal keys. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `SeekCount` | SeekCount collects the number of times the iterator moved to a specific key/value pair in the DB's underlying storage keyspace. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `SeekCountInternal` | SeekCountInternal collects the number of times the iterator moved to a specific LSM internal key. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `BlockBytes` | BlockBytes collects the bytes in the loaded SSTable data blocks. For details, see pebble.InternalIteratorStats. | no |
| `BlockBytesInCache` | BlockBytesInCache collects the subset of BlockBytes in the block cache. For details, see pebble.InternalIteratorStats. | no |
| `KeyBytes` | KeyBytes collects the bytes in keys that were iterated over. For details, see pebble.InternalIteratorStats. | no |
| `ValueBytes` | ValueBytes collects the bytes in values that were iterated over. For details, see pebble.InternalIteratorStats. | no |
| `PointCount` | PointCount collects the count of point keys iterated over. For details, see pebble.InternalIteratorStats. | no |
| `PointsCoveredByRangeTombstones` | PointsCoveredByRangeTombstones collects the count of point keys that were iterated over that were covered by range tombstones. For details, see pebble.InternalIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `RangeKeyCount` | RangeKeyCount collects the count of range keys encountered during iteration. For details, see pebble.RangeKeyIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `RangeKeyContainedPoints` | RangeKeyContainedPoints collects the count of point keys encountered within the bounds of a range key. For details, see pebble.RangeKeyIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `RangeKeySkippedPoints` | RangeKeySkippedPoints collects the count of the subset of ContainedPoints point keys that were skipped during iteration due to range-key masking. For details, see pkg/storage/engine.go, pebble.RangeKeyIteratorStats, and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |



### `recovery_event`

An event of type `recovery_event` is an event that is logged on every invocation of BACKUP,
@@ -2924,6 +2947,27 @@ logged whenever a BACKUP and RESTORE job completes or fails.
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |

### `sampled_exec_stats`

An event of type `sampled_exec_stats` contains execution statistics that apply to both statements
and transactions. These stats as a whole are collected using a sampling approach.
These exec stats are meant to contain the same fields as ExecStats in
app_stats.proto but are for a single execution rather than aggregated executions.
Fields in this struct should be updated in sync with app_stats.proto.


| Field | Description | Sensitive |
|--|--|--|
| `NetworkBytes` | NetworkBytes collects the number of bytes sent over the network. | no |
| `MaxMemUsage` | MaxMemUsage collects the maximum memory usage that occurred on a node. | no |
| `ContentionTime` | ContentionTime collects the time in seconds that statements in the transaction spent contending. | no |
| `NetworkMessages` | NetworkMessages collects the number of messages that were sent over the network. | no |
| `MaxDiskUsage` | MaxDiskUsage collects the maximum temporary disk usage that occurred. This is set in cases where a query had to spill to disk, e.g. when performing a large sort where not all of the tuples fit in memory. | no |
| `CPUSQLNanos` | CPUSQLNanos collects the CPU time spent executing SQL operations in nanoseconds. Currently, it is only collected for statements without mutations that have a vectorized plan. | no |
| `MVCCIteratorStats` | Internal storage iteration statistics. | yes |



### `sampled_query`

An event of type `sampled_query` is the SQL query event logged to the telemetry channel. It
Expand Down Expand Up @@ -3030,6 +3074,46 @@ contains common SQL event/execution details.
| `BulkJobId` | The job id for bulk job (IMPORT/BACKUP/RESTORE). | no |
| `StmtPosInTxn` | The statement's index in the transaction, starting at 1. | no |

### `sampled_transaction`

An event of type `sampled_transaction` is the event logged to telemetry at the end of transaction execution.


| Field | Description | Sensitive |
|--|--|--|
| `User` | User is the user account that triggered the transaction. The special usernames `root` and `node` are not considered sensitive. | depends |
| `ApplicationName` | ApplicationName is the application name for the session where the transaction was executed. This is included in the event to ease filtering of logging output by application. | no |
| `TxnCounter` | TxnCounter is the sequence number of the SQL transaction inside its session. | no |
| `SessionID` | SessionID is the ID of the session that initiated the transaction. | no |
| `TransactionID` | TransactionID is the id of the transaction. | no |
| `TransactionFingerprintID` | TransactionFingerprintID is the fingerprint ID of the transaction. This can be used to find the transaction in the console. | no |
| `Committed` | Committed indicates if the transaction committed successfully. We want to include this value even if it is false. | no |
| `ImplicitTxn` | ImplicitTxn indicates if the transaction was an implicit one. We want to include this value even if it is false. | no |
| `StartTimeUnixNanos` | StartTimeUnixNanos is the time the transaction was started. Expressed as unix time in nanoseconds. | no |
| `EndTimeUnixNanos` | EndTimeUnixNanos is the time the transaction finished (either committed or aborted). Expressed as unix time in nanoseconds. | no |
| `ServiceLatNanos` | ServiceLatNanos is the time to service the whole transaction, from start to end of execution. | no |
| `SQLSTATE` | SQLSTATE is the SQLSTATE code for the error, if an error was encountered. Empty/omitted if no error. | no |
| `ErrorText` | ErrorText is the text of the error if any. | partially |
| `NumRetries` | NumRetries is the number of times the txn was retried automatically by the server. | no |
| `LastAutoRetryReason` | LastAutoRetryReason is a string containing the reason for the last automatic retry. | partially |
| `StatementFingerprintIDs` | StatementFingerprintIDs is an array of statement fingerprint IDs belonging to this transaction. | yes |
| `NumRows` | NumRows is the total number of rows returned across all statements. | no |
| `RetryLatNanos` | RetryLatNanos is the amount of time spent retrying the transaction. | no |
| `CommitLatNanos` | CommitLatNanos is the amount of time spent committing the transaction after all statement operations. | no |
| `IdleLatNanos` | IdleLatNanos is the amount of time spent waiting for the client to send statements while the transaction is open. | no |
| `BytesRead` | BytesRead is the number of bytes read from disk. | no |
| `RowsRead` | RowsRead is the number of rows read from disk. | no |
| `RowsWritten` | RowsWritten is the number of rows written to disk. | no |
| `SampledExecStats` | SampledExecStats is a nested field containing execution statistics. This field will be omitted if the stats were not sampled. | yes |


#### Common fields

| Field | Description | Sensitive |
|--|--|--|
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |
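
To make the tables above concrete, here is a rough sketch of the kind of payload such an event carries. The struct is hand-written for illustration with only a handful of the documented fields (the generated type has all of them), and the values are made up:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sampledTransaction mirrors a few of the documented fields; it is an
// illustrative stand-in, not the generated proto type.
type sampledTransaction struct {
	Timestamp               int64    `json:"Timestamp"`
	EventType               string   `json:"EventType"`
	User                    string   `json:"User"`
	ApplicationName         string   `json:"ApplicationName"`
	Committed               bool     `json:"Committed"`
	NumRetries              int64    `json:"NumRetries"`
	StatementFingerprintIDs []uint64 `json:"StatementFingerprintIDs"`
}

func main() {
	ev := sampledTransaction{
		Timestamp:               1701734400000000000, // nanoseconds since the Unix epoch
		EventType:               "sampled_transaction",
		User:                    "root",
		ApplicationName:         "my_app",
		Committed:               true,
		NumRetries:              0,
		StatementFingerprintIDs: []uint64{101, 202},
	}
	out, err := json.Marshal(ev)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```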

### `schema_descriptor`

An event of type `schema_descriptor` is an event for schema telemetry, whose purpose is
@@ -3650,7 +3650,7 @@ project
│ ├── key columns: [13 17 12 11] = [6 2 5 1]
│ ├── lookup columns are key
│ ├── immutable
│ ├── stats: [rows=0.01031997, distinct(1)=0.01032, null(1)=0, distinct(2)=0.01032, null(2)=0, distinct(5)=0.01032, null(5)=0, distinct(6)=0.01032, null(6)=0, distinct(10)=0.0102714, null(10)=0, distinct(11)=0.01032, null(11)=0, distinct(12)=0.01032, null(12)=0, distinct(13)=0.01032, null(13)=0, distinct(16)=0.0102638, null(16)=0, distinct(17)=0.01032, null(17)=0, distinct(18)=0.010244, null(18)=0, distinct(19)=0.010244, null(19)=0, distinct(20)=0.0102638, null(20)=0, distinct(21)=0.0102638, null(21)=0, distinct(2,5,6)=0.01032, null(2,5,6)=0]
│ ├── stats: [rows=0.01031997, distinct(1)=0.01032, null(1)=0, distinct(2)=0.01032, null(2)=0, distinct(5)=0.01032, null(5)=0, distinct(6)=0.01032, null(6)=0, distinct(11)=0.01032, null(11)=0, distinct(12)=0.01032, null(12)=0, distinct(13)=0.01032, null(13)=0, distinct(17)=0.01032, null(17)=0, distinct(2,5,6)=0.01032, null(2,5,6)=0]
│ ├── cost: 112.088874
│ ├── fd: ()-->(2,5,6,12,13,17,20,21), (16)-->(18,19), (11)==(1,16), (16)==(1,11), (12)==(5,20), (20)==(5,12), (13)==(6,21), (21)==(6,13), (6)==(13,21), (2)==(17), (17)==(2), (5)==(12,20), (1)==(11,16)
│ ├── distribution: ap-southeast-2
@@ -3660,7 +3660,7 @@
│ │ ├── key columns: [21 16] = [21 16]
│ │ ├── lookup columns are key
│ │ ├── immutable
│ │ ├── stats: [rows=1.101776, distinct(10)=1.08174, null(10)=0, distinct(11)=0.936667, null(11)=0, distinct(12)=0.936667, null(12)=0, distinct(13)=0.936667, null(13)=0, distinct(16)=0.936667, null(16)=0, distinct(17)=0.709904, null(17)=0, distinct(18)=0.693547, null(18)=0, distinct(19)=0.693547, null(19)=0, distinct(20)=0.936667, null(20)=0, distinct(21)=0.936667, null(21)=0]
│ │ ├── stats: [rows=1.101776, distinct(11)=0.936667, null(11)=0, distinct(12)=0.936667, null(12)=0, distinct(13)=0.936667, null(13)=0, distinct(16)=0.936667, null(16)=0, distinct(17)=0.709904, null(17)=0, distinct(20)=0.936667, null(20)=0, distinct(21)=0.936667, null(21)=0]
│ │ ├── cost: 107.514948
│ │ ├── fd: ()-->(12,13,17,20,21), (16)-->(18,19), (11)==(16), (16)==(11), (12)==(20), (20)==(12), (13)==(21), (21)==(13)
│ │ ├── distribution: ap-southeast-2
@@ -3680,7 +3680,7 @@
│ │ │ ├── project
│ │ │ │ ├── columns: "lookup_join_const_col_@17":28 str:10 abc_id:11 x.id2:12 x.crdb_region:13
│ │ │ │ ├── immutable
│ │ │ │ ├── stats: [rows=3.666667, distinct(10)=3.11719, null(10)=0, distinct(11)=3.11719, null(11)=0, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0, distinct(28)=1, null(28)=0]
│ │ │ │ ├── stats: [rows=3.666667, distinct(11)=3.11719, null(11)=0, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0, distinct(28)=1, null(28)=0]
│ │ │ │ ├── cost: 73.3666668
│ │ │ │ ├── fd: ()-->(12,13,28)
│ │ │ │ ├── distribution: ap-southeast-2
@@ -3689,7 +3689,7 @@
│ │ │ │ │ ├── columns: str:10 abc_id:11 x.id2:12 x.crdb_region:13
│ │ │ │ │ ├── constraint: /13/12/10/11/9: [/'ap-southeast-2'/'68088706-02c6-47d1-b993-a421cd761f2b' - /'ap-southeast-2'/'68088706-02c6-47d1-b993-a421cd761f2b']
│ │ │ │ │ ├── immutable
│ │ │ │ │ ├── stats: [rows=3.666667, distinct(10)=3.11719, null(10)=0, distinct(11)=3.11719, null(11)=0, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0, distinct(12,13)=1, null(12,13)=0]
│ │ │ │ │ ├── stats: [rows=3.666667, distinct(11)=3.11719, null(11)=0, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0, distinct(12,13)=1, null(12,13)=0]
│ │ │ │ │ │ histogram(13)= 0 3.6667
│ │ │ │ │ │ <--- 'ap-southeast-2'
│ │ │ │ │ ├── cost: 73.2733335
2 changes: 2 additions & 0 deletions pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
@@ -642,6 +642,7 @@ func TestRangeFeedMetricsManagement(t *testing.T) {

frontier, err := span.MakeFrontier(fooSpan)
require.NoError(t, err)
frontier = span.MakeConcurrentFrontier(frontier)

// This error causes rangefeed to restart.
transientErrEvent := kvpb.RangeFeedEvent{
@@ -887,6 +888,7 @@ func TestMuxRangeFeedCanCloseStream(t *testing.T) {

frontier, err := span.MakeFrontier(fooSpan)
require.NoError(t, err)
frontier = span.MakeConcurrentFrontier(frontier)

expectFrontierAdvance := func() {
t.Helper()
3 changes: 3 additions & 0 deletions pkg/sql/appstatspb/app_stats.proto
@@ -313,6 +313,9 @@ message TxnStats {
// ExecStats contains execution statistics that apply to both statements
// and transactions. These stats are currently collected using a sampling
// approach.
// When adding additional fields to this message, please make the corresponding
// changes to 'SampledExecStats' in telemetry.proto to keep the two messages in sync
// with respect to the information stored.
message ExecStats {
// Count keeps track of how many times execution stats were recorded. This is
// not necessarily equal to the number of times a statement/transaction was
4 changes: 2 additions & 2 deletions pkg/sql/opt/exec/execbuilder/testdata/explain_redact
@@ -2308,7 +2308,7 @@ project
├── project
│ ├── columns: c:2 b:1
│ ├── immutable
│ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=1000, null(2)=0]
│ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
│ ├── cost: 1088.44
│ ├── key: (1)
│ ├── fd: (1)-->(2)
@@ -2378,7 +2378,7 @@ project
├── project
│ ├── columns: c:2(float!null) b:1(float!null)
│ ├── immutable
│ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=1000, null(2)=0]
│ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
│ ├── cost: 1088.44
│ ├── key: (1)
│ ├── fd: (1)-->(2)
14 changes: 7 additions & 7 deletions pkg/sql/opt/exec/execbuilder/testdata/inverted_index
@@ -3302,14 +3302,14 @@ inner-join (lookup geo_table)
│ ├── columns: geo_table2.k:1 geo_table2.geom:2 geo_table.k:11
│ ├── inverted-expr
│ │ └── st_intersects(geo_table2.geom:2, geo_table.geom:12)
│ ├── stats: [rows=10000, distinct(1)=999.957, null(1)=0, distinct(11)=999.957, null(11)=0]
│ ├── stats: [rows=10000, distinct(11)=999.957, null(11)=0]
│ ├── cost: 41812.84
│ ├── key: (1,11)
│ ├── fd: (1)-->(2)
│ ├── distribution: test
│ ├── scan geo_table2
│ │ ├── columns: geo_table2.k:1 geo_table2.geom:2
│ │ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=100, null(2)=10]
│ │ ├── stats: [rows=1000, distinct(2)=100, null(2)=10]
│ │ ├── cost: 1108.82
│ │ ├── key: (1)
│ │ ├── fd: (1)-->(2)
@@ -3395,14 +3395,14 @@ left-join (lookup geo_table)
│ ├── first join in paired joiner; continuation column: continuation:16
│ ├── inverted-expr
│ │ └── st_intersects(geo_table2.geom:2, geo_table.geom:12)
│ ├── stats: [rows=10000, distinct(1)=1000, null(1)=0, distinct(11)=999.957, null(11)=0]
│ ├── stats: [rows=10000, distinct(11)=999.957, null(11)=0]
│ ├── cost: 41812.84
│ ├── key: (1,11)
│ ├── fd: (1)-->(2), (11)-->(16)
│ ├── distribution: test
│ ├── scan geo_table2
│ │ ├── columns: geo_table2.k:1 geo_table2.geom:2
│ │ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
│ │ ├── stats: [rows=1000]
│ │ ├── cost: 1108.82
│ │ ├── key: (1)
│ │ ├── fd: (1)-->(2)
@@ -3434,7 +3434,7 @@ semi-join (lookup geo_table)
│ ├── first join in paired joiner; continuation column: continuation:16
│ ├── inverted-expr
│ │ └── st_intersects(geo_table2.geom:2, geo_table.geom:12)
│ ├── stats: [rows=10000, distinct(1)=999.957, null(1)=0, distinct(11)=999.957, null(11)=0]
│ ├── stats: [rows=10000, distinct(11)=999.957, null(11)=0]
│ ├── cost: 41812.84
│ ├── key: (1,11)
│ ├── fd: (1)-->(2), (11)-->(16)
@@ -3474,14 +3474,14 @@ anti-join (lookup geo_table)
│ ├── first join in paired joiner; continuation column: continuation:16
│ ├── inverted-expr
│ │ └── st_intersects(geo_table2.geom:2, geo_table.geom:12)
│ ├── stats: [rows=10000, distinct(1)=1000, null(1)=0, distinct(11)=999.957, null(11)=0]
│ ├── stats: [rows=10000, distinct(11)=999.957, null(11)=0]
│ ├── cost: 41812.84
│ ├── key: (1,11)
│ ├── fd: (1)-->(2), (11)-->(16)
│ ├── distribution: test
│ ├── scan geo_table2
│ │ ├── columns: geo_table2.k:1 geo_table2.geom:2
│ │ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
│ │ ├── stats: [rows=1000]
│ │ ├── cost: 1108.82
│ │ ├── key: (1)
│ │ ├── fd: (1)-->(2)
12 changes: 11 additions & 1 deletion pkg/sql/opt/memo/statistics_builder.go
@@ -1340,7 +1340,17 @@ func (sb *statisticsBuilder) buildJoin(
corr := sb.correlationFromMultiColDistinctCountsForJoin(constrainedCols, leftCols, rightCols, join, s)
s.ApplySelectivity(sb.selectivityFromConstrainedCols(constrainedCols, histCols, join, s, corr))
s.ApplySelectivity(sb.selectivityFromUnappliedConjuncts(numUnappliedConjuncts))
s.ApplySelectivity(sb.selectivityFromNullsRemoved(join, relProps.NotNullCols, constrainedCols))

// Ignore columns that are already not-null in the input when calculating
// selectivity from null-removing filters - the selectivity would always be
// 1.
ignoreCols := constrainedCols
if relProps.NotNullCols.Intersects(h.leftProps.NotNullCols) ||
relProps.NotNullCols.Intersects(h.rightProps.NotNullCols) {
ignoreCols = ignoreCols.Union(h.leftProps.NotNullCols)
ignoreCols.UnionWith(h.rightProps.NotNullCols)
}
s.ApplySelectivity(sb.selectivityFromNullsRemoved(join, relProps.NotNullCols, ignoreCols))

// Update distinct counts based on equivalencies; this should happen after
// selectivityFromMultiColDistinctCounts and selectivityFromEquivalencies.