Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql,server: change indexusagestats subsystem to issue cluster RPC fanout #66639

Merged
merged 1 commit into from
Jul 26, 2021

Conversation

Azhng
Copy link
Contributor

@Azhng Azhng commented Jun 18, 2021

sql: introduce crdb_internal.index_usage_stats virtual table

This commit introduce crdb_internal.index_usage_stats virtual
table that is backed by new clusterindexusagestats package. This
new package implements a variant of the indexusagestats interface
and serves the data by issuing cluster RPC fanout.

Addresses #64740

Followup to #66451

Release note (sql change): introduce crdb_internal.index_usage_statistics
virtual table to surface index usage statistics.
sql.metrics.index_usage_stats.enabled cluster setting can be used to
turn on/off the subsystem. It is default to true.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Azhng Azhng force-pushed the index-usage-stats-cluster branch 2 times, most recently from 4061552 to e5445a2 Compare June 19, 2021 22:33
@Azhng Azhng force-pushed the index-usage-stats-cluster branch 3 times, most recently from d55bae2 to 1dafb7e Compare June 29, 2021 16:17
@Azhng Azhng requested review from a team and mgartner June 30, 2021 18:39
@Azhng Azhng marked this pull request as ready for review June 30, 2021 18:39
@Azhng Azhng requested a review from a team as a code owner June 30, 2021 18:39
@Azhng Azhng removed the request for review from a team June 30, 2021 18:39
@Azhng Azhng force-pushed the index-usage-stats-cluster branch 3 times, most recently from 556e47b to 2623363 Compare July 8, 2021 02:11
@blathers-crl
Copy link

blathers-crl bot commented Jul 8, 2021

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the O-community Originated from the community label Jul 8, 2021
@Azhng Azhng removed the O-community Originated from the community label Jul 8, 2021
@Azhng Azhng force-pushed the index-usage-stats-cluster branch from 2623363 to 47b69e0 Compare July 8, 2021 03:35
@Azhng Azhng force-pushed the index-usage-stats-cluster branch 3 times, most recently from 14580f3 to c94a165 Compare July 13, 2021 16:59
@Azhng Azhng force-pushed the index-usage-stats-cluster branch from c94a165 to d1e6011 Compare July 14, 2021 16:26
craig bot pushed a commit that referenced this pull request Jul 14, 2021
66451: sql: introduce index usage stats subsystem r=mgartner a=Azhng

First commit:

**roachpb: add protobuf definition for index usage statistics**

Release note: None

Second commit:

**sql: introduce index usage stats subsystem**

This commit introduce a new index usage stats subsystem living inside
the sql package. This subsystem is responsible for asynchronously
collecting index usage statistics from the planner and its is created
within the sql.Server.

Release note: None

Related PR:
* 1/3: Current PR
* 2/3: #66639
* 3/3: #66640

67426: builtin: add {de,}compress builtins r=pbardea a=pbardea

```
[email protected]:26257/movr> select compress('hello there', 'gzip');
                                                        compress
------------------------------------------------------------------------------------------------------------------------
  \037\213\010\000\000\000\000\000\000\377\312H\315\311\311W(\311H-J\005\004\000\000\377\377P\351\022m\013\000\000\000
(1 row)

[email protected]:26257/movr> select decompress(compress('hello there', 'gzip'), 'gzip');
  decompress
---------------
  hello there
(1 row)

[email protected]:26257/movr> select length(repeat('hi there', 100)::bytes);
  length
----------
     800
(1 row)

[email protected]:26257/movr> select length(compress(repeat('hi there', 100)::bytes, 'gzip'));
  length
----------
      39
(1 row)

[email protected]:26257/movr> select decompress('not compressed', 'gzip');
ERROR: decompress(): failed to decompress: gzip: invalid header
[email protected]:26257/movr> select decompress('bad codec', 'zip');
ERROR: decompress(): only 'gzip' codec is supported for decompress()
SQLSTATE: 22023
```

This commits adds builtins to compress and decompress bytes. The only
codec currently supported is `gzip`.

Release note (sql change): New builtins, compress(data, codec) and
decompress(data, codec), are added which can compress and decompress
bytes with the specified codec.  Gzip is the only currently supported
codec.


67540: dev: infer and pass in an appropriate `dev` config to bazel invocations r=rail a=rickystewart

Passing `--config=dev` for `dev` builds is a good choice, and
`--config=devdarwinx86_64` is required for most dev builds on macOS
machines. Automatically pass that flag unless the user requests not to.

Release note: None

67584: parser: move scan.go to new scanner package  r=knz,otan a=rafiss

Fixes #64710

See individual commits.

This extracts an interface for sqlSymType so that scan.go can keep using it.

This allows scan.go to live in its own package so that other components can
use its functionality without needing to pull in a large dependency
on sql/sem/tree.

Release note: None

67614: backfill: fix backfills for indexes on non-null virtual columns r=mgartner a=mgartner

Refactoring in #65514 broke backfills for indexes that include non-null
virtual columns. This specific case was not tested so it was not caught
earlier. This commit fixes the issue and adds a test.

Fixes #67528

There is no release note because this bug was not included in any prior
releases.

Release note: None

Co-authored-by: Azhng <[email protected]>
Co-authored-by: Paul Bardea <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
@Azhng Azhng force-pushed the index-usage-stats-cluster branch from d1e6011 to 5692451 Compare July 14, 2021 22:17
Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your "follow up"on the PR description is pointing to itself :)

Reviewed 2 of 21 files at r3, 31 of 31 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @mgartner)


pkg/server/index_usage_stats_test.go, line 215 at r4 (raw file):

	// When being stressed, there's possibility that it might takes a while for
	// the stats payload to be ingested.
	// time.Sleep(3 * time.Second)

nit: remove unnecessary comments


pkg/server/index_usage_stats_test.go, line 218 at r4 (raw file):

	// Check local node stats.
	// Fetch stats reader from each individual

nit: period


pkg/sql/idxusage/cluster_idx_usage_stats.go, line 18 at r4 (raw file):

)

// ClusterIndexUsageStats implements the Reader interface and is able to returns

nit: return

@Azhng Azhng force-pushed the index-usage-stats-cluster branch from 5692451 to 9c627f0 Compare July 16, 2021 20:36
@Azhng Azhng force-pushed the index-usage-stats-cluster branch from 9c627f0 to 897bc93 Compare July 19, 2021 18:29
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @mgartner)

@Azhng
Copy link
Contributor Author

Azhng commented Jul 19, 2021

Friendly ping @mgartner

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 21 files at r3, 9 of 31 files at r4, 4 of 9 files at r5, 4 of 5 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @maryliag)


pkg/server/index_usage_stats.go, line 55 at r6 (raw file):

			statsReader :=
				s.sqlServer.pgServer.SQLServer.GetLocalIndexStatisticsReader()
			return serializeIndexUsageStats(req, statsReader)

Should req be localReq here?


pkg/server/index_usage_stats.go, line 58 at r6 (raw file):

		}

		statusServer, err := s.dialNode(ctx, requestedNodeID)

nit: maybe name statusServer to statusClient or client to make it clear its not of type statusServer and that it's a client to a remote status server


pkg/server/index_usage_stats.go, line 63 at r6 (raw file):

		}

		return statusServer.IndexUsageStatistics(ctx, localReq)

Should localReq be req here?


pkg/server/index_usage_stats.go, line 87 at r6 (raw file):

	var combinedError error
	errFn := func(_ roachpb.NodeID, nodeFnError error) {
		if nodeFnError != nil {

nit: iterateNodes should not call this function with a nil error, so you can remove this check.


pkg/server/index_usage_stats.go, line 115 at r6 (raw file):

	}

	err := reader.ForEach(idxusage.IteratorOptions{

Creating a reader interface and implementing a local and cluster implementation of the interface only for the purpose of using either here seems over-engineered for what needs to be accomplished: aggregating index usage stats response from multiple nodes. Also, the cluster implementation just uses a local implementation which has all the complexity and overhead of dealing with concurrent reads and write. Would it be simpler to remove the interface, implement a single struct for local index usage stats, and perform aggregation of the results here?


pkg/server/index_usage_stats.go, line 118 at r6 (raw file):

		SortedTableID: req.OrderedTableID,
		SortedIndexID: req.OrderedIndexID,
		Max:           max,

It'd be nice to not fetch more than max usage stats from remote nodes in the first place, if possible. Or at least to fetch only up to max from each node because we'll never need more than that. If that's a lot more effort, maybe leave a TODO somewhere?


pkg/server/serverpb/status.proto, line 1126 at r6 (raw file):

  // by the RPC. If this field is left unset, then all stats data will be
  // returned at once.
  oneof max {

Did you try using optional here instead of oneof?


pkg/sql/idxusage/cluster_idx_usage_stats.go, line 26 at r6 (raw file):

	// usage stats, therefore, we do not call LocalIndexUsageStats.Start() method.
	storage      *LocalIndexUsageStats
	statusServer serverpb.SQLStatusServer

nit: this field is never used

@Azhng Azhng force-pushed the index-usage-stats-cluster branch 2 times, most recently from b6597e8 to 3437ee3 Compare July 21, 2021 20:34
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @mgartner)


pkg/server/index_usage_stats.go, line 87 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: iterateNodes should not call this function with a nil error, so you can remove this check.

done.


pkg/server/index_usage_stats.go, line 115 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Creating a reader interface and implementing a local and cluster implementation of the interface only for the purpose of using either here seems over-engineered for what needs to be accomplished: aggregating index usage stats response from multiple nodes. Also, the cluster implementation just uses a local implementation which has all the complexity and overhead of dealing with concurrent reads and write. Would it be simpler to remove the interface, implement a single struct for local index usage stats, and perform aggregation of the results here?

I agree this is a bit awkward. I think the root cause here is that this RPC endpoint is consumed by two different clients:

  1. The code that populates crdb_internal virtual table. This client needs an aggregated reader so it can populate the virtual table.
  2. The DB Console. This client needs the serialized data returning from the RPC endpoint. Preferably aggregated. If not, the DB Console will aggregated in the frontend.

The max param was introduced here to distinguish between those two type of use case. But this does introduce quite a bit of awkwardness.

I think a better approach would be: this endpoint does not perform aggregation at all. It returns all the statistics directly to the caller without aggregation. Then the code responsible for crdb_internal virtual table would be the one performing aggregation.

Then, the DB Console will be calling to a different endpoint, (possibly the v2 status API that's being built right now) and that API will be responsible to fetching data from the crdb_internal virtual table.

I updated the code to reflect this.


pkg/server/index_usage_stats.go, line 118 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It'd be nice to not fetch more than max usage stats from remote nodes in the first place, if possible. Or at least to fetch only up to max from each node because we'll never need more than that. If that's a lot more effort, maybe leave a TODO somewhere?

Removed max from the request protobuf now since this API will be primary used for populating the crdb_internal virtual table as mentioned in above comment. There's not quite necessary to limit number of rows returning from this endpoint.


pkg/server/serverpb/status.proto, line 1126 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Did you try using optional here instead of oneof?

Removed max param here. See the comment above.


pkg/sql/idxusage/cluster_idx_usage_stats.go, line 26 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: this field is never used

Done.

Azhng added a commit to Azhng/cockroach that referenced this pull request Jul 21, 2021
This commit introduce crdb_internal.index_usage_stats virtual
table that is backed by new clusterindexusagestats package. This
new package implements a variant of the indexusagestats interface
and serves the data by issuing cluster RPC fanout.

Addresses cockroachdb#64740

Followup to cockroachdb#66639

Release note (sql change): introduce crdb_internal.index_usage_statistics
virtual table to surface index usage statistics.
sql.metrics.index_usage_stats.enabled cluster setting can be used to
turn on/off the subsystem. It is default to true.
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 15 of 16 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @mgartner)


pkg/server/index_usage_stats.go, line 63 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Should localReq be req here?

Looking at this again, and I think localReq is correct. This is basically telling another node "give me your local stats", is that correct? Might be worth a comment for clarity.


pkg/server/index_usage_stats.go, line 115 at r6 (raw file):

Previously, Azhng (Archer Zhang) wrote…

I agree this is a bit awkward. I think the root cause here is that this RPC endpoint is consumed by two different clients:

  1. The code that populates crdb_internal virtual table. This client needs an aggregated reader so it can populate the virtual table.
  2. The DB Console. This client needs the serialized data returning from the RPC endpoint. Preferably aggregated. If not, the DB Console will aggregated in the frontend.

The max param was introduced here to distinguish between those two type of use case. But this does introduce quite a bit of awkwardness.

I think a better approach would be: this endpoint does not perform aggregation at all. It returns all the statistics directly to the caller without aggregation. Then the code responsible for crdb_internal virtual table would be the one performing aggregation.

Then, the DB Console will be calling to a different endpoint, (possibly the v2 status API that's being built right now) and that API will be responsible to fetching data from the crdb_internal virtual table.

I updated the code to reflect this.

Sounds simpler to me. 👍


pkg/server/index_usage_stats_test.go, line 195 at r7 (raw file):

	}()

	// Records a non-full scan.

nit add: "over t_a_idx".


pkg/server/index_usage_stats_test.go, line 199 at r7 (raw file):

	require.NoError(t, err)

	// Records an zigzag join.

nit add: "that scans both t_a_idx and t_b_idx"


pkg/server/index_usage_stats_test.go, line 203 at r7 (raw file):

	require.NoError(t, err)

	// Record a full scan.

nit add: "over t_b_idx"


pkg/server/index_usage_stats_test.go, line 213 at r7 (raw file):

	// When being stressed, there's possibility that it might takes a while for
	// the stats payload to be ingested.
	// time.Sleep(3 * time.Second)

Is this leftover?


pkg/server/index_usage_stats_test.go, line 260 at r7 (raw file):

		statsEntries++
		switch stats.Key.IndexID {
		case 2: // t@t_a_idx

nit: could this instead be case indexKeyA.IndexID?


pkg/sql/conn_executor.go, line 272 at r7 (raw file):

	pool *mon.BytesMonitor

	// indexUsageStats tracks the index usage statistics.

nit: Make it clear in this comment that it only tracks usage for the gateway node of a query, not the entire cluster.

@Azhng Azhng force-pushed the index-usage-stats-cluster branch from 3437ee3 to 0a95d40 Compare July 22, 2021 17:37
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @mgartner)


pkg/server/index_usage_stats.go, line 63 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Looking at this again, and I think localReq is correct. This is basically telling another node "give me your local stats", is that correct? Might be worth a comment for clarity.

Yes. I'll add a comment.


pkg/server/index_usage_stats.go, line 115 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Sounds simpler to me. 👍

Done.


pkg/server/index_usage_stats_test.go, line 213 at r7 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is this leftover?

Ah yes. I cleaned this up earlier but probably messed up the rebase and it sneaked back. 🤦

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 16 files at r7, 3 of 3 files at r8.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag and @mgartner)


pkg/server/index_usage_stats.go, line 63 at r6 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Yes. I'll add a comment.

Thanks!

@Azhng
Copy link
Contributor Author

Azhng commented Jul 26, 2021

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 26, 2021

Build succeeded:

@craig craig bot merged commit 82b32cf into cockroachdb:master Jul 26, 2021
Azhng added a commit to Azhng/cockroach that referenced this pull request Aug 3, 2021
This commit introduce crdb_internal.index_usage_stats virtual
table that is backed by new clusterindexusagestats package. This
new package implements a variant of the indexusagestats interface
and serves the data by issuing cluster RPC fanout.

Addresses cockroachdb#64740

Followup to cockroachdb#66639

Release note (sql change): introduce crdb_internal.index_usage_statistics
virtual table to surface index usage statistics.
sql.metrics.index_usage_stats.enabled cluster setting can be used to
turn on/off the subsystem. It is default to true.
Azhng added a commit to Azhng/cockroach that referenced this pull request Aug 4, 2021
This commit introduce crdb_internal.index_usage_stats virtual
table that is backed by new clusterindexusagestats package. This
new package implements a variant of the indexusagestats interface
and serves the data by issuing cluster RPC fanout.

Addresses cockroachdb#64740

Followup to cockroachdb#66639

Release note (sql change): introduce crdb_internal.index_usage_statistics
virtual table to surface index usage statistics.
sql.metrics.index_usage_stats.enabled cluster setting can be used to
turn on/off the subsystem. It is default to true.
Azhng added a commit to Azhng/cockroach that referenced this pull request Aug 4, 2021
This commit introduce crdb_internal.index_usage_stats virtual
table that is backed by new clusterindexusagestats package. This
new package implements a variant of the indexusagestats interface
and serves the data by issuing cluster RPC fanout.

Addresses cockroachdb#64740

Followup to cockroachdb#66639

Release note (sql change): introduce crdb_internal.index_usage_statistics
virtual table to surface index usage statistics.
sql.metrics.index_usage_stats.enabled cluster setting can be used to
turn on/off the subsystem. It is default to true.
Azhng added a commit to Azhng/cockroach that referenced this pull request Aug 5, 2021
This commit introduce crdb_internal.index_usage_stats virtual
table that is backed by new clusterindexusagestats package. This
new package implements a variant of the indexusagestats interface
and serves the data by issuing cluster RPC fanout.

Addresses cockroachdb#64740

Followup to cockroachdb#66639

Release note (sql change): introduce crdb_internal.index_usage_statistics
virtual table to surface index usage statistics.
sql.metrics.index_usage_stats.enabled cluster setting can be used to
turn on/off the subsystem. It is default to true.
craig bot pushed a commit that referenced this pull request Aug 5, 2021
66640: sql: introduce crdb_internal.index_usage_stats virtual table r=Azhng a=Azhng

This commit introduce crdb_internal.index_usage_stats virtual
table that is backed by new clusterindexusagestats package. This
new package implements a variant of the indexusagestats interface
and serves the data by issuing cluster RPC fanout.

Release note (sql change): introduce crdb_internal.index_usage_statistics
virtual table to surface index usage statistics.
sql.metrics.index_usage_stats.enabled cluster setting can be used to
turn on/off the subsystem. It is default to true.
sql.metrics.index_usage_stats.reset_interval can change the reset
interval of the collected statistics. It is default to 1 hour.

Addresses #64740

Followup to #66639

Co-authored-by: Azhng <[email protected]>
sajjadrizvi pushed a commit to sajjadrizvi/cockroach that referenced this pull request Aug 10, 2021
This commit introduce crdb_internal.index_usage_stats virtual
table that is backed by new clusterindexusagestats package. This
new package implements a variant of the indexusagestats interface
and serves the data by issuing cluster RPC fanout.

Addresses cockroachdb#64740

Followup to cockroachdb#66639

Release note (sql change): introduce crdb_internal.index_usage_statistics
virtual table to surface index usage statistics.
sql.metrics.index_usage_stats.enabled cluster setting can be used to
turn on/off the subsystem. It is default to true.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants