-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add crdb_internal.reset_sql_stats() builtin #62175
Conversation
Couldn't recreate the CI failures both locally and on GCE worker. Will take another look tomorrow. |
Fixed the test. CI is green. Ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! @maryliag would be great if you could review as well even if it's just to ask questions.
Reviewed 10 of 15 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)
docs/generated/http/full.md, line 2839 at r1 (raw file):
## SQLStatisticsReset `POST /_status/sqlstatisticsreset`
I would change the name here to be more in line with the builtin. It also sounds more "active", i.e. "resetsqlstats" instead of "sqlstatisticsreset". I also think it's worth it to shorten statistics to stats here and elsewhere.
docs/generated/http/full.md, line 2873 at r1 (raw file):
| Field | Type | Label | Description | Support status | | ----- | ---- | ----- | ----------- | -------------- | | num_of_nodes_resetted | [int32](#cockroach.server.serverpb.SQLStatisticsResetResponse-int32) | | Number of nodes that have resetted their SQL stats. | [reserved](#support-status) |
nit: here and elsewhere s/resetted/reset
.
Also, why return the number of nodes reset? Is it used anywhere?
docs/generated/sql/functions.md, line 2717 at r1 (raw file):
<tr><td><a name="crdb_internal.range_stats"></a><code>crdb_internal.range_stats(key: <a href="bytes.html">bytes</a>) → jsonb</code></td><td><span class="funcdesc"><p>This function is used to retrieve range statistics information as a JSON object.</p> </span></td></tr> <tr><td><a name="crdb_internal.reset_sql_statistics"></a><code>crdb_internal.reset_sql_statistics() → <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>This function can be used to clear the collected SQL-related statistics.</p>
nit: s/can be used/is used
and s/SQL-related/SQL
pkg/server/sql_stats.go, line 23 at r1 (raw file):
) func (s *statusServer) SQLStatisticsReset(
cc @yuzefovich when you added the contention API you mentioned a v2 status server, should this method be implemented there?
As mentioned before, I think it'd be nicer to s/SQLStatisticsReset/ResetSQLStats
pkg/server/sql_stats.go, line 72 at r1 (raw file):
response.NumOfNodesResetted++ }, func(nodeID roachpb.NodeID, err error) {
Empty fn?
pkg/server/stats_test.go, line 264 at r1 (raw file):
func resetClusterStats(ctx context.Context, cluster serverutils.TestClusterInterface) { // Flush stats at the beginning of the test.
nit: remove comment, I think this belongs outside the function (as you've done)
pkg/server/stats_test.go, line 287 at r1 (raw file):
// Flush stats at the beginning of the test. resetClusterStats(ctx, testCluster)
Why do the stats need to be reset at the start?
pkg/server/stats_test.go, line 300 at r1 (raw file):
statsPreReset, err := status.Statements(ctx, &serverpb.StatementsRequest{}) if err != nil { t.Fatal(err)
Consider using require
(e.g. require.NoError
in this case) to help with vebosity in this test.
pkg/server/tenant_status.go, line 119 at r1 (raw file):
) (*serverpb.SQLStatisticsResetResponse, error) { // TODO(azhng): Currently a tenant status server only reset its local SQL stats, // this needs to be updated once the pod-to-pod communication is implemented.
I think it'd be nice to add a top-level comment about this rather than just have it in this function.
pkg/server/testserver.go, line 763 at r1 (raw file):
// Since SQLStatusServer requires a pointer to SQLServer, it is constructed after SQLServer is // constructed and retrospectively added to SQLServer's execCfg.
nit: I would just delete this comment.
pkg/server/serverpb/status.go, line 28 at r1 (raw file):
ListContentionEvents(context.Context, *ListContentionEventsRequest) (*ListContentionEventsResponse, error) ListLocalContentionEvents(context.Context, *ListContentionEventsRequest) (*ListContentionEventsResponse, error) SQLStatisticsReset(ctx context.Context, request *SQLStatisticsResetRequest) (*SQLStatisticsResetResponse, error)
cc @Azhng in a follow up PR, could you add the actual stats endpoints? This should support us displaying stats in a tenant's DB Console. If it doesn't work it'd be nice to know this sooner rather than later cc @jordanlewis
pkg/sql/sem/tree/eval.go, line 3271 at r1 (raw file):
// SQLStatisticsResetter is an interface embedded in EvalCtx which can be used by // the builtins to reset SQL stats. type SQLStatisticsResetter interface {
nit: s/SQLStatisticsResetter/SQLStatsResetter
pkg/sql/sem/tree/eval.go, line 3272 at r1 (raw file):
// the builtins to reset SQL stats. type SQLStatisticsResetter interface { SQLStatsReset(ctx context.Context) error
Nice.
s/SQLStatsReset/ResetSQLStats
and comment that the interface exists to avoid a dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nit picking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @Azhng)
docs/generated/http/full.md, line 2873 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
nit: here and elsewhere
s/resetted/reset
.Also, why return the number of nodes reset? Is it used anywhere?
My thoughts here is that since resetting the stats requiring fan-out RPCs to all nodes in the cluster, it is possible for number of RPCs to fail for various reasons.
If the reset operation only partially failed, it is difficult to communicate the partial failure using a simple boolean. Hence I opted for a numeric value here.
But yes I should probably change the return value of the builtin from boolean to an int value.
pkg/server/sql_stats.go, line 72 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Empty fn?
It is empty because I think iterateNodes
does not check for nil
before calling errorFn
. I was using statement.go
as a reference, and seems like it simply ignored the error if the RPC fails. Since the result of RPC is communicate through number of nodes where reset operations were successful, I don't think additional error handling logic is required here.
Though I think the alternative is to modify the iterateNodes
to check for nil
before calling errorFn
. I don't have particular preference for either, what's your thoughts on this?
efd0ee9
to
1d86a2b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @Azhng)
pkg/server/sql_stats.go, line 23 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
cc @yuzefovich when you added the contention API you mentioned a v2 status server, should this method be implemented there?
As mentioned before, I think it'd be nicer to
s/SQLStatisticsReset/ResetSQLStats
My thinking is that for now it is ok to implement things as v1 only, but soon (in 21.2 release) we'll need to implement everything as v2, so it's up to Archer to decide when to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asubiotto can you PTAL ?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @maryliag)
docs/generated/http/full.md, line 2839 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I would change the name here to be more in line with the builtin. It also sounds more "active", i.e. "resetsqlstats" instead of "sqlstatisticsreset". I also think it's worth it to shorten statistics to stats here and elsewhere.
Done.
docs/generated/http/full.md, line 2873 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
reset
, notresetted
Done.
pkg/server/sql_stats.go, line 23 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
My thinking is that for now it is ok to implement things as v1 only, but soon (in 21.2 release) we'll need to implement everything as v2, so it's up to Archer to decide when to do it.
Done.
pkg/server/stats_test.go, line 287 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Why do the stats need to be reset at the start?
Hmm seems like it's not needed. Removed
pkg/server/stats_test.go, line 300 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Consider using
require
(e.g.require.NoError
in this case) to help with vebosity in this test.
Done.
pkg/server/stats_test.go, line 313 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
the previous test had
but found:
and this one doesn't have the:
. Change one of them so they're the same
Done.
pkg/server/tenant_status.go, line 119 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I think it'd be nice to add a top-level comment about this rather than just have it in this function.
Done.
pkg/server/tenant_status.go, line 122 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
NumOfNodesReset
Done.
pkg/server/testserver.go, line 763 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
nit: I would just delete this comment.
Removed.
pkg/server/serverpb/status.go, line 28 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
cc @Azhng in a follow up PR, could you add the actual stats endpoints? This should support us displaying stats in a tenant's DB Console. If it doesn't work it'd be nice to know this sooner rather than later cc @jordanlewis
Sounds good, will send a follow up PR after this.
pkg/server/serverpb/status.proto, line 1020 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
comment and variable name (make sure to fix all places this variable is called):
reset
and notresetted
Done.
pkg/sql/sem/tree/eval.go, line 3272 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Nice.
s/SQLStatsReset/ResetSQLStats
and comment that the interface exists to avoid a dependency
ResetSQLStats
is already a function in conn_executor.go
. I suppose I can rename this to ResetClusterSQLStats
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 13 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @Azhng, and @maryliag)
pkg/server/sql_stats.go, line 72 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
It is empty because I think
iterateNodes
does not check fornil
before callingerrorFn
. I was usingstatement.go
as a reference, and seems like it simply ignored the error if the RPC fails. Since the result of RPC is communicate through number of nodes where reset operations were successful, I don't think additional error handling logic is required here.Though I think the alternative is to modify the
iterateNodes
to check fornil
before callingerrorFn
. I don't have particular preference for either, what's your thoughts on this?
So why not use this mechanism instead of returning the number of nodes in the response? I think it's not good usability to have the NumNodesReset
field because you're just pushing the error handling logic one layer up. The caller shouldn't have to care or know the details about how this RPC is executed. I think what you should do is remove that field and instead combine errors in this function and return an error as normal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 13 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @Azhng, and @maryliag)
pkg/server/tenant_status.go, line 119 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Done.
Sorry, I meant a top-level file comment. i.e. all functions need to be updated once we have pod-to-pod communication. Even better, we probably would remove this tenant status server entirely and use the normal status server once we remove any dependencies unavailable to tenants.
Appears to be a flake. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @maryliag)
pkg/server/sql_stats.go, line 72 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
So why not use this mechanism instead of returning the number of nodes in the response? I think it's not good usability to have the
NumNodesReset
field because you're just pushing the error handling logic one layer up. The caller shouldn't have to care or know the details about how this RPC is executed. I think what you should do is remove that field and instead combine errors in this function and return an error as normal.
Changed the response to have an array of Error protobuf objects.
pkg/server/tenant_status.go, line 119 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Sorry, I meant a top-level file comment. i.e. all functions need to be updated once we have pod-to-pod communication. Even better, we probably would remove this tenant status server entirely and use the normal status server once we remove any dependencies unavailable to tenants.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @Azhng, and @maryliag)
pkg/server/sql_stats.go, line 72 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Changed the response to have an array of Error protobuf objects.
What I meant is use errors.CombineErrors
and return the result as an error here (i.e. nil, err
) this will return the error as a GRPC error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @maryliag)
pkg/server/sql_stats.go, line 72 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
What I meant is use
errors.CombineErrors
and return the result as an error here (i.e.nil, err
) this will return the error as a GRPC error
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 15 files at r1, 1 of 13 files at r2, 10 of 10 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @maryliag)
TFTR! bors r=asubiotto |
Build failed (retrying...): |
Merge conflict. |
Previously, there was no mechanism to immediately clear SQL statistics. Users would have to wait until the reset interval expires. This commit creates a builtin to immediately clears SQL stats. Release note (sql change): SQL stats can now be cleared using crdb_internal.reset_sql_stats()
bors r=asubiotto |
Build succeeded: |
Just passing by and this looks great! Nice work. I noticed that the PR description says |
Hi @rafiss , thanks for the heads up. Though this PR only partially addressed the issue by adding a builtin and an RPC endpoint, we are still working on adding the frontend components that will allow uses to reset SQL stats from db console. |
Gotcha! I was wondering if it was on purpose but just wanted to point it out in case |
Previously, there was no mechanism to immediately clear SQL statistics. Users
would have to wait until the reset interval expires. This commit creates a
builtin to immediately clears SQL stats.
Release note (sql change): SQL stats can now be cleared using
crdb_internal.reset_sql_stats()
Addresses #33315