Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add crdb_internal.reset_sql_stats() builtin #62175

Merged
merged 1 commit into from
Mar 24, 2021

Conversation

Azhng
Copy link
Contributor

@Azhng Azhng commented Mar 17, 2021

Previously, there was no mechanism to immediately clear SQL statistics. Users
would have to wait until the reset interval expires. This commit creates a
builtin to immediately clears SQL stats.

Release note (sql change): SQL stats can now be cleared using
crdb_internal.reset_sql_stats()

Addresses #33315

@Azhng Azhng requested a review from asubiotto March 17, 2021 22:01
@Azhng Azhng requested review from a team as code owners March 17, 2021 22:01
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Azhng
Copy link
Contributor Author

Azhng commented Mar 18, 2021

Couldn't recreate the CI failures both locally and on GCE worker. Will take another look tomorrow.

@Azhng
Copy link
Contributor Author

Azhng commented Mar 18, 2021

Fixed the test. CI is green. Ready for review

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! @maryliag would be great if you could review as well even if it's just to ask questions.

Reviewed 10 of 15 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


docs/generated/http/full.md, line 2839 at r1 (raw file):

## SQLStatisticsReset

`POST /_status/sqlstatisticsreset`

I would change the name here to be more in line with the builtin. It also sounds more "active", i.e. "resetsqlstats" instead of "sqlstatisticsreset". I also think it's worth it to shorten statistics to stats here and elsewhere.


docs/generated/http/full.md, line 2873 at r1 (raw file):

| Field | Type | Label | Description | Support status |
| ----- | ---- | ----- | ----------- | -------------- |
| num_of_nodes_resetted | [int32](#cockroach.server.serverpb.SQLStatisticsResetResponse-int32) |  | Number of nodes that have resetted their SQL stats. | [reserved](#support-status) |

nit: here and elsewhere s/resetted/reset.

Also, why return the number of nodes reset? Is it used anywhere?


docs/generated/sql/functions.md, line 2717 at r1 (raw file):

<tr><td><a name="crdb_internal.range_stats"></a><code>crdb_internal.range_stats(key: <a href="bytes.html">bytes</a>) &rarr; jsonb</code></td><td><span class="funcdesc"><p>This function is used to retrieve range statistics information as a JSON object.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.reset_sql_statistics"></a><code>crdb_internal.reset_sql_statistics() &rarr; <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>This function can be used to clear the collected SQL-related statistics.</p>

nit: s/can be used/is used and s/SQL-related/SQL


pkg/server/sql_stats.go, line 23 at r1 (raw file):

)

func (s *statusServer) SQLStatisticsReset(

cc @yuzefovich when you added the contention API you mentioned a v2 status server, should this method be implemented there?

As mentioned before, I think it'd be nicer to s/SQLStatisticsReset/ResetSQLStats


pkg/server/sql_stats.go, line 72 at r1 (raw file):

			response.NumOfNodesResetted++
		},
		func(nodeID roachpb.NodeID, err error) {

Empty fn?


pkg/server/stats_test.go, line 264 at r1 (raw file):

func resetClusterStats(ctx context.Context, cluster serverutils.TestClusterInterface) {
	// Flush stats at the beginning of the test.

nit: remove comment, I think this belongs outside the function (as you've done)


pkg/server/stats_test.go, line 287 at r1 (raw file):

	// Flush stats at the beginning of the test.
	resetClusterStats(ctx, testCluster)

Why do the stats need to be reset at the start?


pkg/server/stats_test.go, line 300 at r1 (raw file):

	statsPreReset, err := status.Statements(ctx, &serverpb.StatementsRequest{})
	if err != nil {
		t.Fatal(err)

Consider using require (e.g. require.NoError in this case) to help with vebosity in this test.


pkg/server/tenant_status.go, line 119 at r1 (raw file):

) (*serverpb.SQLStatisticsResetResponse, error) {
	// TODO(azhng): Currently a tenant status server only reset its local SQL stats,
	// 	this needs to be updated once the pod-to-pod communication is implemented.

I think it'd be nice to add a top-level comment about this rather than just have it in this function.


pkg/server/testserver.go, line 763 at r1 (raw file):

	// Since SQLStatusServer requires a pointer to SQLServer, it is constructed after SQLServer is
	// constructed and retrospectively added to SQLServer's execCfg.

nit: I would just delete this comment.


pkg/server/serverpb/status.go, line 28 at r1 (raw file):

	ListContentionEvents(context.Context, *ListContentionEventsRequest) (*ListContentionEventsResponse, error)
	ListLocalContentionEvents(context.Context, *ListContentionEventsRequest) (*ListContentionEventsResponse, error)
	SQLStatisticsReset(ctx context.Context, request *SQLStatisticsResetRequest) (*SQLStatisticsResetResponse, error)

cc @Azhng in a follow up PR, could you add the actual stats endpoints? This should support us displaying stats in a tenant's DB Console. If it doesn't work it'd be nice to know this sooner rather than later cc @jordanlewis


pkg/sql/sem/tree/eval.go, line 3271 at r1 (raw file):

// SQLStatisticsResetter is an interface embedded in EvalCtx which can be used by
// the builtins to reset SQL stats.
type SQLStatisticsResetter interface {

nit: s/SQLStatisticsResetter/SQLStatsResetter


pkg/sql/sem/tree/eval.go, line 3272 at r1 (raw file):

// the builtins to reset SQL stats.
type SQLStatisticsResetter interface {
	SQLStatsReset(ctx context.Context) error

Nice.

s/SQLStatsReset/ResetSQLStats and comment that the interface exists to avoid a dependency

Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nit picking

docs/generated/http/full.md Outdated Show resolved Hide resolved
pkg/server/serverpb/status.proto Outdated Show resolved Hide resolved
pkg/server/tenant_status.go Outdated Show resolved Hide resolved
pkg/server/stats_test.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @Azhng)


docs/generated/http/full.md, line 2873 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: here and elsewhere s/resetted/reset.

Also, why return the number of nodes reset? Is it used anywhere?

My thoughts here is that since resetting the stats requiring fan-out RPCs to all nodes in the cluster, it is possible for number of RPCs to fail for various reasons.
If the reset operation only partially failed, it is difficult to communicate the partial failure using a simple boolean. Hence I opted for a numeric value here.
But yes I should probably change the return value of the builtin from boolean to an int value.


pkg/server/sql_stats.go, line 72 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Empty fn?

It is empty because I think iterateNodes does not check for nil before calling errorFn. I was using statement.go as a reference, and seems like it simply ignored the error if the RPC fails. Since the result of RPC is communicate through number of nodes where reset operations were successful, I don't think additional error handling logic is required here.

Though I think the alternative is to modify the iterateNodes to check for nil before calling errorFn. I don't have particular preference for either, what's your thoughts on this?

@Azhng Azhng force-pushed the reset-stats branch 4 times, most recently from efd0ee9 to 1d86a2b Compare March 22, 2021 18:22
@Azhng Azhng changed the title sql: add crdb_internal.reset_sql_statistics() builtin sql: add crdb_internal.reset_sql_stats() builtin Mar 22, 2021
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @Azhng)


pkg/server/sql_stats.go, line 23 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

cc @yuzefovich when you added the contention API you mentioned a v2 status server, should this method be implemented there?

As mentioned before, I think it'd be nicer to s/SQLStatisticsReset/ResetSQLStats

My thinking is that for now it is ok to implement things as v1 only, but soon (in 21.2 release) we'll need to implement everything as v2, so it's up to Archer to decide when to do it.

Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asubiotto can you PTAL ?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @maryliag)


docs/generated/http/full.md, line 2839 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I would change the name here to be more in line with the builtin. It also sounds more "active", i.e. "resetsqlstats" instead of "sqlstatisticsreset". I also think it's worth it to shorten statistics to stats here and elsewhere.

Done.


docs/generated/http/full.md, line 2873 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

reset, not resetted

Done.


pkg/server/sql_stats.go, line 23 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

My thinking is that for now it is ok to implement things as v1 only, but soon (in 21.2 release) we'll need to implement everything as v2, so it's up to Archer to decide when to do it.

Done.


pkg/server/stats_test.go, line 287 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Why do the stats need to be reset at the start?

Hmm seems like it's not needed. Removed


pkg/server/stats_test.go, line 300 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Consider using require (e.g. require.NoError in this case) to help with vebosity in this test.

Done.


pkg/server/stats_test.go, line 313 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

the previous test had but found: and this one doesn't have the :. Change one of them so they're the same

Done.


pkg/server/tenant_status.go, line 119 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I think it'd be nice to add a top-level comment about this rather than just have it in this function.

Done.


pkg/server/tenant_status.go, line 122 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

NumOfNodesReset

Done.


pkg/server/testserver.go, line 763 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: I would just delete this comment.

Removed.


pkg/server/serverpb/status.go, line 28 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

cc @Azhng in a follow up PR, could you add the actual stats endpoints? This should support us displaying stats in a tenant's DB Console. If it doesn't work it'd be nice to know this sooner rather than later cc @jordanlewis

Sounds good, will send a follow up PR after this.


pkg/server/serverpb/status.proto, line 1020 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

comment and variable name (make sure to fix all places this variable is called): reset and not resetted

Done.


pkg/sql/sem/tree/eval.go, line 3272 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Nice.

s/SQLStatsReset/ResetSQLStats and comment that the interface exists to avoid a dependency

ResetSQLStats is already a function in conn_executor.go. I suppose I can rename this to ResetClusterSQLStats.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 13 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @Azhng, and @maryliag)


pkg/server/sql_stats.go, line 72 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

It is empty because I think iterateNodes does not check for nil before calling errorFn. I was using statement.go as a reference, and seems like it simply ignored the error if the RPC fails. Since the result of RPC is communicate through number of nodes where reset operations were successful, I don't think additional error handling logic is required here.

Though I think the alternative is to modify the iterateNodes to check for nil before calling errorFn. I don't have particular preference for either, what's your thoughts on this?

So why not use this mechanism instead of returning the number of nodes in the response? I think it's not good usability to have the NumNodesReset field because you're just pushing the error handling logic one layer up. The caller shouldn't have to care or know the details about how this RPC is executed. I think what you should do is remove that field and instead combine errors in this function and return an error as normal.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 13 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @Azhng, and @maryliag)


pkg/server/tenant_status.go, line 119 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Done.

Sorry, I meant a top-level file comment. i.e. all functions need to be updated once we have pod-to-pod communication. Even better, we probably would remove this tenant status server entirely and use the normal status server once we remove any dependencies unavailable to tenants.

@Azhng
Copy link
Contributor Author

Azhng commented Mar 23, 2021

Appears to be a flake.

Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @maryliag)


pkg/server/sql_stats.go, line 72 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

So why not use this mechanism instead of returning the number of nodes in the response? I think it's not good usability to have the NumNodesReset field because you're just pushing the error handling logic one layer up. The caller shouldn't have to care or know the details about how this RPC is executed. I think what you should do is remove that field and instead combine errors in this function and return an error as normal.

Changed the response to have an array of Error protobuf objects.


pkg/server/tenant_status.go, line 119 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Sorry, I meant a top-level file comment. i.e. all functions need to be updated once we have pod-to-pod communication. Even better, we probably would remove this tenant status server entirely and use the normal status server once we remove any dependencies unavailable to tenants.

Done

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @Azhng, and @maryliag)


pkg/server/sql_stats.go, line 72 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Changed the response to have an array of Error protobuf objects.

What I meant is use errors.CombineErrors and return the result as an error here (i.e. nil, err) this will return the error as a GRPC error

Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @maryliag)


pkg/server/sql_stats.go, line 72 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

What I meant is use errors.CombineErrors and return the result as an error here (i.e. nil, err) this will return the error as a GRPC error

Done.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 15 files at r1, 1 of 13 files at r2, 10 of 10 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag)

@Azhng
Copy link
Contributor Author

Azhng commented Mar 24, 2021

TFTR!

bors r=asubiotto

@craig
Copy link
Contributor

craig bot commented Mar 24, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 24, 2021

Merge conflict.

Previously, there was no mechanism to immediately clear SQL statistics. Users
would have to wait until the reset interval expires. This commit creates a
builtin to immediately clears SQL stats.

Release note (sql change): SQL stats can now be cleared using
crdb_internal.reset_sql_stats()
@Azhng
Copy link
Contributor Author

Azhng commented Mar 24, 2021

bors r=asubiotto

@craig craig bot merged commit 8aace6d into cockroachdb:master Mar 24, 2021
@craig
Copy link
Contributor

craig bot commented Mar 24, 2021

Build succeeded:

@rafiss
Copy link
Collaborator

rafiss commented Mar 25, 2021

Just passing by and this looks great! Nice work.

I noticed that the PR description says Addresses #33315 -- just as a heads up, that will not automatically close the issue. If you use Closes #33315 or Fixes #33315, then it would. See https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword

@Azhng
Copy link
Contributor Author

Azhng commented Mar 25, 2021

Hi @rafiss , thanks for the heads up. Though this PR only partially addressed the issue by adding a builtin and an RPC endpoint, we are still working on adding the frontend components that will allow uses to reset SQL stats from db console.

@rafiss
Copy link
Collaborator

rafiss commented Mar 25, 2021

Gotcha! I was wondering if it was on purpose but just wanted to point it out in case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: sqlsmith: crdb_internal.reset_sql_stats(): sql stats resetter not set
6 participants