Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add statement fingerprint ID to sampled query #84182

Merged

Conversation

THardy98
Copy link

Partially addresses: #71328

This change adds the statement fingerprint ID to the sampled query log,
allowing us to measure the uniqueness of a workload.

Release note (sql change): Sampled query telemetry log now includes the
statement's fingerprint ID.

@THardy98 THardy98 requested review from a team July 11, 2022 15:53
@THardy98 THardy98 requested a review from a team as a code owner July 11, 2022 15:53
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@THardy98 THardy98 force-pushed the add_stmt_fingerprint_to_sampled_query branch 3 times, most recently from cb74c19 to d710737 Compare July 12, 2022 17:14
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 4 of 5 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)


-- commits line 2 at r1:
nit: this commit should follow the guidelines too :)


-- commits line 10 at r3:
nit: is this user-visible change? In other words, do CRDB users care about this? Or only CRDB staff? If it's the latter, then we should not add a release note for it.


pkg/sql/conn_executor.go line 1371 at r3 (raw file):

		// currentStatementFingerprintID is the statement fingerprint ID of the last statement
		// dispatched to the execution engine.
		currentStatementFingerprintID roachpb.StmtFingerprintID

I think I would move it outside of extraTxnState since the field is not updated in the scope of the transaction. Currently, it seems like it's just a placeholder to pass the stmt fingerprint ID from recordStatementSummary to maybeLogStatement, and we don't modify it in other methods like onTxnRestart.

How about not storing it explicitly at all? Something like this:

diff --git a/pkg/sql/conn_executor_exec.go b/pkg/sql/conn_executor_exec.go
index 98878f46f3..f76d427c0a 100644
--- a/pkg/sql/conn_executor_exec.go
+++ b/pkg/sql/conn_executor_exec.go
@@ -1060,6 +1060,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
                res.DisableBuffering()
        }
 
+       var stmtFingerprintID roachpb.StmtFingerprintID
        defer func() {
                planner.maybeLogStatement(
                        ctx,
@@ -1071,6 +1072,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
                        ex.statsCollector.PhaseTimes().GetSessionPhaseTime(sessionphase.SessionQueryReceived),
                        &ex.extraTxnState.hasAdminRoleCache,
                        ex.server.TelemetryLoggingMetrics,
+                       stmtFingerprintID,
                )
        }()
 
@@ -1184,7 +1186,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
 
        // Record the statement summary. This also closes the plan if the
        // plan has not been closed earlier.
-       ex.recordStatementSummary(
+       stmtFingerprintID = ex.recordStatementSummary(
                ctx, planner,
                int(ex.state.mu.autoRetryCounter), res.RowsAffected(), res.Err(), stats,
        )
diff --git a/pkg/sql/executor_statement_metrics.go b/pkg/sql/executor_statement_metrics.go
index c44dc589c7..27dd659502 100644
--- a/pkg/sql/executor_statement_metrics.go
+++ b/pkg/sql/executor_statement_metrics.go
@@ -116,7 +116,7 @@ func (ex *connExecutor) recordStatementSummary(
        rowsAffected int,
        stmtErr error,
        stats topLevelQueryStats,
-) {
+) roachpb.StmtFingerprintID {
        phaseTimes := ex.statsCollector.PhaseTimes()
 
        // Collect the statistics.
@@ -235,6 +235,8 @@ func (ex *connExecutor) recordStatementSummary(
                        sessionAge,
                )
        }
+
+       return stmtFingerprintID
 }
 
 func (ex *connExecutor) updateOptCounters(planFlags planFlags) {

pkg/sql/exec_log.go line 17 at r3 (raw file):

	"context"
	"fmt"
	"github.com/cockroachdb/cockroach/pkg/roachpb"

nit: ordering is off.


pkg/util/log/eventpb/telemetry.proto line 63 at r3 (raw file):

  // Statement fingerprint ID of the query.
  uint64 statement_fingerprint_id = 12 [(gogoproto.customname) = "StatementFingerprintID", (gogoproto.jsontag) = ',omitempty'];

nit: we probably could use roachpb.StatementFingerprintID type here too, not sure if it's important enough, up to you.

nit: is this sensitive? Should this be marked as "redact:\"nonsensitive\""?

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)


pkg/sql/exec_log.go line 389 at r3 (raw file):

		}
		if telemetryMetrics.maybeUpdateLastEmittedTime(telemetryMetrics.timeNow(), requiredTimeElapsed) {
			database := p.CurrentDatabase()

nit: is this change to assign p.CurrentDatabase() to an intermediate variable necessary, if it's only used in a single place?

Code quote:

database := p.CurrentDatabase()

@THardy98 THardy98 force-pushed the add_stmt_fingerprint_to_sampled_query branch from d710737 to 0c70fa9 Compare July 12, 2022 20:06
Copy link
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @THardy98, and @yuzefovich)


-- commits line 10 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: is this user-visible change? In other words, do CRDB users care about this? Or only CRDB staff? If it's the latter, then we should not add a release note for it.

I'd say yes, users ingesting telemetry logs will see this new field.


pkg/sql/conn_executor.go line 1371 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think I would move it outside of extraTxnState since the field is not updated in the scope of the transaction. Currently, it seems like it's just a placeholder to pass the stmt fingerprint ID from recordStatementSummary to maybeLogStatement, and we don't modify it in other methods like onTxnRestart.

How about not storing it explicitly at all? Something like this:

diff --git a/pkg/sql/conn_executor_exec.go b/pkg/sql/conn_executor_exec.go
index 98878f46f3..f76d427c0a 100644
--- a/pkg/sql/conn_executor_exec.go
+++ b/pkg/sql/conn_executor_exec.go
@@ -1060,6 +1060,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
                res.DisableBuffering()
        }
 
+       var stmtFingerprintID roachpb.StmtFingerprintID
        defer func() {
                planner.maybeLogStatement(
                        ctx,
@@ -1071,6 +1072,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
                        ex.statsCollector.PhaseTimes().GetSessionPhaseTime(sessionphase.SessionQueryReceived),
                        &ex.extraTxnState.hasAdminRoleCache,
                        ex.server.TelemetryLoggingMetrics,
+                       stmtFingerprintID,
                )
        }()
 
@@ -1184,7 +1186,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
 
        // Record the statement summary. This also closes the plan if the
        // plan has not been closed earlier.
-       ex.recordStatementSummary(
+       stmtFingerprintID = ex.recordStatementSummary(
                ctx, planner,
                int(ex.state.mu.autoRetryCounter), res.RowsAffected(), res.Err(), stats,
        )
diff --git a/pkg/sql/executor_statement_metrics.go b/pkg/sql/executor_statement_metrics.go
index c44dc589c7..27dd659502 100644
--- a/pkg/sql/executor_statement_metrics.go
+++ b/pkg/sql/executor_statement_metrics.go
@@ -116,7 +116,7 @@ func (ex *connExecutor) recordStatementSummary(
        rowsAffected int,
        stmtErr error,
        stats topLevelQueryStats,
-) {
+) roachpb.StmtFingerprintID {
        phaseTimes := ex.statsCollector.PhaseTimes()
 
        // Collect the statistics.
@@ -235,6 +235,8 @@ func (ex *connExecutor) recordStatementSummary(
                        sessionAge,
                )
        }
+
+       return stmtFingerprintID
 }
 
 func (ex *connExecutor) updateOptCounters(planFlags planFlags) {

Yea, that's a better idea. Changed to return from recordStatementSummary instead.


pkg/sql/exec_log.go line 389 at r3 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: is this change to assign p.CurrentDatabase() to an intermediate variable necessary, if it's only used in a single place?

Nope, moving it to the log field.


pkg/util/log/eventpb/telemetry.proto line 63 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we probably could use roachpb.StatementFingerprintID type here too, not sure if it's important enough, up to you.

nit: is this sensitive? Should this be marked as "redact:\"nonsensitive\""?

I tried to use roachpb.StatementFingerprintID earlier, but there's an import cycle. We don't lose any information using uint64, seems fine.

By default, numbers aren't marked for redaction.

Copy link
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @THardy98, and @yuzefovich)


pkg/sql/exec_log.go line 17 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: ordering is off.

Fixed.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks!

Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @THardy98)

@THardy98 THardy98 force-pushed the add_stmt_fingerprint_to_sampled_query branch from 0c70fa9 to bd314c4 Compare July 12, 2022 20:30
Copy link
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier and @yuzefovich)


-- commits line 2 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: this commit should follow the guidelines too :)

Oh, didn't know that, fixed.

Copy link
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome TYFR :)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier and @yuzefovich)

Copy link
Member

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, 7 of 8 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier and @yuzefovich)

Thomas Hardy added 2 commits July 13, 2022 09:27
Remove unused `Fingerprint()` function from `roachpb`.

Release note: None
This change adds the statement fingerprint ID to the sampled query log,
allowing us to measure the uniqueness of a workload.

Release note (sql change): Sampled query telemetry log now includes the
statement's fingerprint ID.
@THardy98 THardy98 force-pushed the add_stmt_fingerprint_to_sampled_query branch from bd314c4 to 95d655a Compare July 13, 2022 13:39
@THardy98
Copy link
Author

Rebased, fixed some merge conflicts

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thank you!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz and @yuzefovich)

@THardy98
Copy link
Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 15, 2022

Build succeeded:

@THardy98
Copy link
Author

blathers backport 22.1 21.2

@blathers-crl
Copy link

blathers-crl bot commented Jul 21, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 95d655a to blathers/backport-release-21.2-84182: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants