Capture transaction execution statistics as a new event #108284

Closed
kevin-v-ngo opened this issue Aug 7, 2023 · 0 comments · Fixed by #115722
Assignees
Labels
A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

kevin-v-ngo commented Aug 7, 2023

It's difficult to monitor and troubleshoot CockroachDB without SQL statistics in downstream APM and observability tools.

This issue tracks emitting a new transaction-level event with statistics similar to what we have for the statement-level event, so that users can identify issues and correlate them with specific statements and up to their application. Note the following expected behavior:

  • Users should be able to correlate transaction-level and statement-level statistics in their downstream system
  • Users should be able to correlate this new event with console statistics when performing deeper database troubleshooting using the console
  • Users should be able to view the performance of a transaction fingerprint over time and drill into any outliers

Jira issue: CRDB-30411

Epic CRDB-25399

kevin-v-ngo added the C-enhancement and T-cluster-observability labels Aug 7, 2023
xinhaoz self-assigned this Oct 30, 2023
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Nov 3, 2023
Previously, with the JSON logging format, it was not possible to emit
boolean fields that were false. Even if a boolean field was marked as
`includeempty`, it was only emitted when true, as `field: true`.
While always omitting false boolean fields is more space efficient,
for certain fields it's helpful to emit the field in all
instances and explicitly state its `false` value.

This patch makes it possible for `fieldName: false` to be emitted
in the json logging format when the field is marked as `includeempty`.
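
A minimal Python sketch of the behavior described above, not the actual Go code generator in `util/log`. The helper names and the `include_empty` parameter are hypothetical and only model the rule "omit false booleans unless the field is marked `includeempty`":

```python
import json


def encode_bool_field(name, value, include_empty=False):
    """Return the JSON fragment for a boolean field, or '' if it is omitted.

    Hypothetical helper: a false value is dropped unless the field is
    marked include_empty, in which case `"name":false` is emitted.
    """
    if not value and not include_empty:
        return ""  # default: false booleans are left out to save space
    return f"{json.dumps(name)}:{json.dumps(value)}"


def encode_event(fields):
    """Encode (name, value, include_empty) tuples as one JSON object."""
    parts = [encode_bool_field(n, v, e) for n, v, e in fields]
    return "{" + ",".join(p for p in parts if p) + "}"
```

With this rule, a field that must always be visible downstream is declared with `include_empty=True`, while other false fields stay out of the payload.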

Epic: none
Part of: cockroachdb#108284

Release note: None
craig bot pushed a commit that referenced this issue Nov 9, 2023
113757: util/log: allow bool fields to be emitted as false in json format r=xinhaoz a=xinhaoz

113992: sql: add disable_changefeed_replication session variable r=miretskiy,yuzefovich a=andyyang890

This patch adds a `disable_changefeed_replication` session variable
that can be used to disable changefeed replication for changes that
occur within a session. Right now, the session variable has no effect
but in later commits, it will be plumbed to the KV layer.

Fixes #114071

Release note: None

Co-authored-by: Xin Hao Zhang
Co-authored-by: Andy Yang
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Nov 10, 2023
This commit adds the messages listed below to `telemetry.proto` in
preparation for sending transaction executions to the telemetry
channel. The transaction event that is eventually sent should contain
all execution information currently being tracked for transaction
fingerprints.

- `SampledTransaction`: contains fields equivalent to the execution
information stored by `CollectedTransactionStatistics` from
`app_stats.proto`, but represents a single txn execution instead
of aggregated executions of a transaction fingerprint.
- `SampledExecStats`: used as a field in `SampledTransaction`, it
contains execution stats that are sampled. This message is the equivalent
of `ExecStats` from `app_stats.proto` but for a single execution.
- `MVCCIteratorStats`: used in `SampledExecStats` above, the equivalent of
`MVCCIteratorStats` from `app_stats.proto` but for a single execution.

In addition, to support the above fields, a couple of additional
code templates have been added for generating JSON log encoding:
- the `array_of_uint64` type is now handled for JSON logs
- `nestedMessage` has been added as a custom type in `gen.go`. Object field
types can be assigned to this type in order to generate them as nested
objects.
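
To illustrate the nesting described above, here is a small Python sketch of the three messages encoded as nested JSON objects. Only the message names come from the commit; every field name below is a hypothetical placeholder, and the real definitions live in `telemetry.proto`:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MVCCIteratorStats:
    step_count: int = 0  # hypothetical field
    seek_count: int = 0  # hypothetical field


@dataclass
class SampledExecStats:
    network_bytes: int = 0  # hypothetical field
    # Nested message: encoded as a JSON object inside the parent object.
    mvcc_iterator_stats: MVCCIteratorStats = field(default_factory=MVCCIteratorStats)


@dataclass
class SampledTransaction:
    # Array of uint64 values, encoded as a JSON array of numbers.
    statement_fingerprint_ids: List[int] = field(default_factory=list)  # hypothetical field
    sampled_exec_stats: SampledExecStats = field(default_factory=SampledExecStats)


def to_json(txn):
    """Encode a single transaction execution as nested JSON."""
    return json.dumps(asdict(txn), sort_keys=True)
```

`asdict` recurses into nested dataclasses, which mirrors how a `nestedMessage` field is rendered as an object inside its parent rather than flattened.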

Part of: cockroachdb#108284

Release note: None
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Nov 29, 2023
craig bot pushed a commit that referenced this issue Dec 4, 2023
113952: log: add protobuf messages for telemetry txn events r=xinhaoz a=xinhaoz

114666: opt: reduce planning time for queries with many joins r=mgartner a=mgartner

Prior to this commit, some queries with many joins would perform a large
number of allocations calculating the selectivity of null-rejecting join
filters. This was due to `statisticsBuilder.selectivityFromNullsRemoved`
allocating a single-column set for each not-null column, and allocating
column statistics for each set.

Many of those allocations, and much of the unnecessary computation needed to traverse
the expression tree, are now avoided. This is made possible by the
realization that the selectivity of a null-rejecting filter is always 1
if the column was already not-null in the input.
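
The short-circuit can be sketched in Python as follows. This is an illustration only: the function signature is hypothetical, and the per-column formula (fraction of non-null rows, multiplied across columns) is an assumed simplification rather than the optimizer's actual computation:

```python
def selectivity_from_nulls_removed(row_count, null_counts, already_not_null):
    """Estimate the selectivity of filters that reject NULLs.

    row_count:        number of input rows
    null_counts:      {column: number of NULL values in the input}
    already_not_null: columns known to be not-null in the input
    """
    selectivity = 1.0
    for col, nulls in null_counts.items():
        if col in already_not_null or row_count == 0:
            # Selectivity is 1 for this column, so skip it without
            # building any per-column statistics.
            continue
        selectivity *= 1.0 - nulls / row_count
    return selectivity
```

Skipping columns that are already not-null avoids the per-column work entirely, which is where the allocation savings come from.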

Epic: None

Release note: None


115509: span: Re-initialize iterator when forwarding r=miretskiy a=miretskiy

Re-initialize iterator when forwarding span
frontier timestamp.  The underlying btree may be
mutated (by merge operation) invalidating previously
constructed iterator.

Btree implementation is also hardened against mis-use
when mutating span frontier while iterating.

Fixes #115411
Fixes #115528
Fixes #115512
Fixes #115490
Fixes #115488
Fixes #115487
Fixes #115483

Release notes: None

Co-authored-by: Xin Hao Zhang
Co-authored-by: Marcus Gartner
Co-authored-by: Yevgeniy Miretskiy
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Dec 5, 2023
…on mode

Previously, we dropped logging BEGIN statements in telemetry transaction
sampling mode due to BEGIN not having an associated transaction execution
id at the time of logging. For transaction sampling we were using the
transaction execution id to track the transaction through its execution
in order to log all of its statements. Since BEGIN statements did not have
an id, we could not start tracking with BEGIN. This commit enables us to
include BEGIN statements by using a combination of the session id and
session txn counter as the tracking id for telemetry, instead of the
execution id. This allows us to start tracking transactions at BEGIN instead
of at the statement after it.
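
A rough Python sketch of the tracking id described above. The class names are hypothetical, and incrementing the counter at BEGIN is an assumption made for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnTrackingID:
    """Tracking id built from the session id and the session txn counter."""
    session_id: str
    txn_counter: int


class Session:
    def __init__(self, session_id):
        self.session_id = session_id
        self.txn_counter = 0

    def begin(self):
        # Assumption: the counter advances when a transaction starts, so
        # the id is already available when BEGIN itself is logged.
        self.txn_counter += 1
        return self.current_tracking_id()

    def current_tracking_id(self):
        return TxnTrackingID(self.session_id, self.txn_counter)
```

Because both components are known to the session before any statement runs, BEGIN and every later statement in the same transaction map to the same id.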

Part of: cockroachdb#108284

Release note (sql change): Telemetry logging - "transaction" sampling
mode will now log BEGIN statements when they are present in a sampled
transaction.
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 10, 2024
This change modifies the telemetry transaction sampling process to be simpler.
Previously for telemetry transaction sampling the telemetry logging struct
managed the tracking of sampled transactions via a map of execution
ids. If a transaction was determined to be sampled, we would input an entry
in this map and for each statement, we would check the map to see if the
statement belongs to a tracked transaction. Instead of using a map, we
can mark the transaction as being sampled by telemetry in the conn executor.
This removes the need for concurrent data struct access when the transaction
is marked as being sampled, since each statement no longer needs to read an
entry from the shared map. This also removes the need to track the number
of transactions currently being sampled for telemetry, as this was introduced
to manage the memory used by the map.

We will now determine if the transaction should be logged to telemetry at
the start of transaction execution or at the start of a transaction restart.
The transaction will be marked for telemetry logging if enough time has
elapsed since the last transaction was sampled or if session tracing is on.
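
As a rough illustration of the decision described above, here is a Python sketch (not the conn executor code). The class and method names are hypothetical, the minimum interval is assumed to be derived from the sampling frequency setting, and resetting the timer on a tracing-forced sample is also an assumption:

```python
class TelemetrySampler:
    def __init__(self, min_interval_s):
        # Assumed to be 1 / (maximum sampling frequency).
        self.min_interval_s = min_interval_s
        self.last_sampled_time = None

    def should_sample_txn(self, now, tracing_on=False):
        """Decide at transaction start (or restart) whether to sample."""
        due = (
            self.last_sampled_time is None
            or now - self.last_sampled_time >= self.min_interval_s
        )
        if tracing_on or due:
            self.last_sampled_time = now
            return True
        return False
```

The result would be stored on the transaction itself, so later statements read a per-transaction flag instead of consulting a shared map.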

Transaction statements will be logged according to the following settings:
- sql.telemetry.transaction_sampling.frequency controls the frequency at
which we sample a transaction. If a transaction is marked to be sampled
by telemetry, this means we will log all of its statement execution events
to telemetry, up to a maximum of
`sql.telemetry.transaction_sampling.statement_events_per_transaction.max`
statements.
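
The per-transaction cap can be sketched like this in Python. The names are hypothetical; `max_stmt_events` stands in for the value of `sql.telemetry.transaction_sampling.statement_events_per_transaction.max`:

```python
class SampledTxnState:
    """Per-transaction telemetry state (hypothetical sketch)."""

    def __init__(self, sampled, max_stmt_events):
        self.sampled = sampled
        self.max_stmt_events = max_stmt_events
        self.stmt_events_emitted = 0

    def should_emit_stmt(self):
        """Return True if the next statement event should be logged."""
        if not self.sampled:
            return False
        if self.stmt_events_emitted >= self.max_stmt_events:
            return False  # cap reached for this transaction
        self.stmt_events_emitted += 1
        return True
```

Since the state belongs to a single transaction, the check needs no locking or shared data structure.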

Part of: cockroachdb#108284

Release note (ops change): New cluster settings:
- `sql.telemetry.transaction_sampling.statement_events_per_transaction.max`:
controls the maximum number of statement events to emit per sampled
transaction for TELEMETRY
- `sql.telemetry.transaction_sampling.frequency`: controls the maximum
frequency at which we sample transactions for telemetry
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 10, 2024
This commit sends SampledTransaction events to the telemetry channel.
In "transaction" sampling mode, if a transaction is marked to be logged
to telemetry, we will emit a SampledTransaction event on transaction
end containing transaction level stats. It is expected that if a transaction
event exists in telemetry, its statement events will also have been logged
(with a maximum number according to the setting
sql.telemetry.transaction_sampling.statement_events_per_transaction.max).

Closes: cockroachdb#108284

Release note (ops change): Transactions sampled for the telemetry logging
channel will now emit a SampledTransaction event. To sample transactions,
set the cluster setting `sql.telemetry.query_sampling.mode = 'transaction'`
and enable telemetry logging via `sql.telemetry.query_sampling.enabled = true`.
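
For reference, the two settings from the release note can be turned into `SET CLUSTER SETTING` statements with a small Python helper. The helper is a hypothetical convenience for scripting; the setting names and values are the ones quoted in the release note:

```python
def cluster_setting_stmt(name, value):
    """Format a SET CLUSTER SETTING statement for a bool, string, or number."""
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, str):
        # Quote strings and escape embedded single quotes.
        literal = "'" + value.replace("'", "''") + "'"
    else:
        literal = str(value)
    return f"SET CLUSTER SETTING {name} = {literal};"


ENABLE_TXN_SAMPLING = [
    cluster_setting_stmt("sql.telemetry.query_sampling.enabled", True),
    cluster_setting_stmt("sql.telemetry.query_sampling.mode", "transaction"),
]
```

The resulting statements can be run from any SQL client connected to the cluster by a user allowed to change cluster settings.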
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 12, 2024
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 12, 2024
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 16, 2024
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 17, 2024
This change modifies the telemetry transaction sampling process to be simpler.
Previously for telemetry transaction sampling the telemetry logging struct
managed the tracking of sampled transactions via a map of execution
ids. If a transaction was determined to be sampled, we would input an entry
in this map and for each statement, we would check the map to see if the
statement belongs to a tracked transaction. Instead of using a map, we
can mark the transaction as being sampled by telemetry in the conn executor.
This removes the need for concurrent data struct access when the transaction
is marked as being sampled, since each statement no longer needs to read an
entry from the shared map. This also removes the need to track the number
of transactions currently being sampled for telemetry, as this was introduced
to manage the memory used by the map.

We will now determine if the transaction should be logged to telemetry at
the start of transaction execution or at the start of a transaction restart.
The transaction will be marked for telemetry logging if enough time has
elapsed since the last transaction was sampled or if session tracing is on.

Transaction statements will be logged according to the following settings:
- sql.telemetry.transaction_sampling.frequency controls the frequency at
which we sample a transaction. If a transaction is marked to be sampled
by telemetry, this means we will log all of its statement execution events
to telemetry, up to a maximum of
`sql.telemetry.transaction_sampling.statement_events_per_transaction.max`
statements.

Additional items in this commit:
- Renames `TelemetryLoggingMetrics` to `telemetryLoggingMetrics` since it is not used
in other packages.
- Renames `lastEmittedTime` -> `lastSampledTime` in the telemetryLogging struct
as the old name is no longer representative of what this timestamp is. With the introduction
of transaction sampling, the emitted event time is not necessarily the time at which
we decide to sample an event.
- Creates a datadriven test handler for telemetry logging.  Datadriven telemetry
logging tests should be created in the dir pkg/sql/testdata/telemetry_logging.
  - `telemetry_logging/logging` contains tests on verifying emitted logs.
  - `telemetry_logging/logging_decision` contains unit tests for the functions
    `shouldEmitTransactionLog` and `shouldEmitStatementLog`.

Epic: none

Release note: None

Part of: cockroachdb#108284

Release note (ops change): New cluster settings:
 - sql.telemetry.transaction_sampling.statement_events_per_transaction.max:
controls the maximum number of statement events to emit per sampled
transaction for TELEMETRY
- sql.telemetry.transaction_sampling.frequency: controls the maximum
frequency at which we sample transactions for telemetry
craig bot pushed a commit that referenced this issue Jan 18, 2024
115733: telemetry: track telemetry transactions through conn executor r=xinhaoz a=xinhaoz

Co-authored-by: Xin Hao Zhang
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 19, 2024
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 22, 2024
craig bot pushed a commit that referenced this issue Jan 26, 2024
115722: telemetry: log transaction exec events to TELEMETRY r=xinhaoz a=xinhaoz

### 1. [sql/telemetry: add SkippedTransactions to SampledTransaction proto](a356e63) 

This commit adds the field SkippedTransactions to the
SampledTransaction protobuf to count the number of transactions
that were not sampled while telemetry transaction logging
is enabled. The corresponding field is added to the
telemetryLogging struct and will be used in the following
commit to track skipped transactions. Some whitespace in the
SampledTransaction proto definition is adjusted.
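
One way such a counter could work, sketched in Python. The names are hypothetical, and resetting the count each time a sampled event is emitted is an assumption; the commit only states that the field counts transactions that were not sampled:

```python
class SkippedTxnCounter:
    """Counts transactions that were not sampled (hypothetical sketch)."""

    def __init__(self):
        self.skipped = 0

    def record_skipped(self):
        self.skipped += 1

    def take(self):
        """Return the current count and reset it (assumed behavior)."""
        count, self.skipped = self.skipped, 0
        return count


def build_sampled_txn_event(counter, stats):
    """Attach the skipped count to the next sampled transaction event."""
    event = dict(stats)
    event["SkippedTransactions"] = counter.take()
    return event
```

A downstream consumer could use the count to estimate how many executions each sampled event stands for.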

Epic: none

Release note (sql change): New field `SkippedTransactions` in
the SampledTransaction event, which is emitted to the TELEMETRY
logging channel when telemetry logging is enabled and set to
"transaction" mode.

### 2. [eventpb: make MVCCIteratorStats in SampledExecStats non-nullable](6610467) 

This field should always exist in SampledExecStats. Since
SampledTransaction is the only user of this message right now
and is yet to be used we can safely change the proto definition.

Epic: none

Release note: None

### 3. [telemetry: log transaction exec events to TELEMETRY](cc0d1bb)

This commit sends SampledTransaction events to the telemetry channel.
In "transaction" sampling mode, if a transaction is marked to be logged
to telemetry, we will emit a SampledTransaction event on transaction
end containing transaction level stats. It is expected that if a transaction
event exists in telemetry, its statement events will also have been logged
(with a maximum number according to the setting
sql.telemetry.transaction_sampling.statement_events_per_transaction.max).

Closes: #108284

Release note (ops change): Transactions sampled for the telemetry logging
channel will now emit a SampledTransaction event. To sample transactions,
set the cluster setting `sql.telemetry.query_sampling.mode = 'transaction'`
and enable telemetry logging via `sql.telemetry.query_sampling.enabled = true`.

118325: datapathutils: update comment for `DebuggableTempDir` r=rail a=rickystewart

Improve some of the wording here. Also I accidentally wrote "temp" instead of "test" which is confusing.

Epic: none
Release note: None

Co-authored-by: Xin Hao Zhang
Co-authored-by: Ricky Stewart
craig bot closed this as completed in 9199146 Jan 26, 2024
jlinder pushed a commit that referenced this issue Jan 29, 2024
jlinder pushed a commit that referenced this issue Jan 29, 2024
jlinder pushed a commit that referenced this issue Jan 29, 2024
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 29, 2024
This commit sends SampledTransaction events to the telemetry channel.
In "transaction" sampling mode, if a transaction is marked to be logged
to telemetry, we will emit a SampledTransaction event on transaction
end containing transaction level stats. It is expected that if a transaction
event exists in telemetry, its statement events will also have been logged
(with a maximum number according to the setting
sql.telemetry.transaction_sampling.statement_events_per_transaction.max).
Transaction recording for telemetry is decided at the start of transaction
execution (including on restarts), and will not be refreshed for the remainder
of transaction execution.

Closes: cockroachdb#108284

Release note (ops change): Transactions sampled for the telemetry logging
channel will now emit a SampledTransaction event. To sample transactions,
set the cluster setting `sql.telemetry.query_sampling.mode = 'transaction'`
and enable telemetry logging via `sql.telemetry.query_sampling.enabled = true`.