Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, server: expose contention info via vtable and StatusServer #60713

Merged
merged 1 commit into from
Feb 25, 2021

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Feb 18, 2021

This commit adds two new endpoints to the status server
contention_events and local_contention_events (as well as
corresponding methods) for using internally that expose the contention
information that comes from contention.Registry. These endpoints
require the user to either be an admin or have VIEWACTIVITY grant.

Additionally, this commit introduces new
crdb_internal.{cluster,node}_contention_events virtual tables that
provide access to the contention info via SQL.

Fixes: #57114.

Release justification: low risk update to new functionality.

Release note (sql change): New virtual tables
crdb_internal.cluster_contention_events and
crdb_internal.node_contention_events are introduced which expose
cluster-wide and gateway-only, respectively, contention information. In
order to access it the user needs to be either an admin or have
VIEWACTIVITY grant.

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Feb 18, 2021
@yuzefovich yuzefovich requested review from a team as code owners February 18, 2021 01:11
@yuzefovich yuzefovich removed request for a team February 18, 2021 01:11
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the contention-more branch 2 times, most recently from c679ef1 to 1244fa8 Compare February 23, 2021 04:51
@yuzefovich yuzefovich changed the title WIP more contention work sql, server: expose contention info via vtable and StatusServer Feb 23, 2021
@yuzefovich yuzefovich requested review from asubiotto and a team February 23, 2021 04:52
@yuzefovich
Copy link
Member Author

First two commits are in separate PR, but otherwise I think this is RFAL.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so exciting!! @tbg @awoods187

Reviewed 6 of 6 files at r1, 7 of 7 files at r2, 29 of 29 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/server/status.go, line 307 at r3 (raw file):

	ctx = b.AnnotateCtx(ctx)

	sessionUser, isAdmin, err := b.privilegeChecker.getUserAndRole(ctx)

not your code, but I'm surprised that this VIEWACTIVITY check is not extracted since I think a lot of these methods probably care about it and it's pretty repetitive. Do you mind extracting this privilege check (and the error return)?.


pkg/server/status.go, line 1852 at r3 (raw file):

}

// ListLocalContentionEvents returns a list of contention events on this node.

nit: add ordering information since this is also included in the docs.


pkg/server/status.go, line 2062 at r3 (raw file):

}

func (s *statusServer) listContentionEventsHelper(

Is this used anywhere other than ListContentionEvents? Since there's not much in there, I think it's fine to put the code in ListContentionEvents and remove this separate method.


pkg/server/status.go, line 2079 at r3 (raw file):

			return resp.Events, nil
		}
		return nil, err

nit: put error case inside if.


pkg/server/status.go, line 2193 at r3 (raw file):

	ctx = s.AnnotateCtx(ctx)

	if _, _, err := s.privilegeChecker.getUserAndRole(ctx); err != nil {

Does this call do anything?


pkg/sql/crdb_internal.go, line 1690 at r3 (raw file):

const contentionEventsSchemaPattern = `
CREATE TABLE crdb_internal.%s (

We should probably inform the caller of the eviction policy in the virtual table comment field.


pkg/sql/logictest/testdata/logic_test/crdb_internal, line 215 at r3 (raw file):


query IIITTTI colnames
SELECT * FROM crdb_internal.node_contention_events WHERE table_id < 0

I think there are also some logic tests that @tbg added to produce contention events. Would probably be good to check that those contention events make it into the virtual table.

@tbg
Copy link
Member

tbg commented Feb 23, 2021

Wohoo! Can we have a sample printout of that vtable pretty please? :-)

@knz
Copy link
Contributor

knz commented Feb 23, 2021

As discussed elsewhere, the implementation approach taken here is amplifying the problem identified in #60584 and so we should try to find a different approach instead.

@yuzefovich yuzefovich force-pushed the contention-more branch 2 times, most recently from c382b55 to 0d46308 Compare February 24, 2021 05:33
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Feb 24, 2021
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbg, here is an example (I ran a single node cluster and followed the queries of contention_event logic test file):

root@:26257/defaultdb> select * from crdb_internal.cluster_contention_events;
  table_id | index_id | num_contention_events | cumulative_contention_time |                      key                      |                txn_id                | count
-----------+----------+-----------------------+----------------------------+-----------------------------------------------+--------------------------------------+--------
        64 |        1 |                     1 | 00:00:00.083261            | \310\211\375\010\323R\200\013\212\000\001\210 | 7e9f5456-ed01-4525-97d1-a72dfd06a339 |     1
(1 row)

Time: 32ms total (execution 32ms / network 0ms)

root@:26257/defaultdb> select * from crdb_internal.node_contention_events;
  table_id | index_id | num_contention_events | cumulative_contention_time |                      key                      |                txn_id                | count
-----------+----------+-----------------------+----------------------------+-----------------------------------------------+--------------------------------------+--------
        64 |        1 |                     1 | 00:00:00.083261            | \310\211\375\010\323R\200\013\212\000\001\210 | 7e9f5456-ed01-4525-97d1-a72dfd06a339 |     1
(1 row)

Time: 0ms total (execution 0ms / network 0ms)

As discussed elsewhere, the implementation approach taken here is amplifying the problem identified in #60584 and so we should try to find a different approach instead.

Ack that the current approach is broken, but it will have to be fixed at the same time as existing violations are.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @itsbilal, and @tbg)


pkg/server/status.go, line 307 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

not your code, but I'm surprised that this VIEWACTIVITY check is not extracted since I think a lot of these methods probably care about it and it's pretty repetitive. Do you mind extracting this privilege check (and the error return)?.

Actually, VIEWACTIVITY is used only for SESSIONS, and there we have special handling that allows any user to see its own sessions (even if that other is not admin nor has VIEWACTIVITY permissions), so I won't be touching any of the old code.


pkg/server/status.go, line 1852 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: add ordering information since this is also included in the docs.

Good point, done.


pkg/server/status.go, line 2062 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Is this used anywhere other than ListContentionEvents? Since there's not much in there, I think it's fine to put the code in ListContentionEvents and remove this separate method.

Refactored. I originally implemented api V2 (which was using the helper) but decided to get rid of it.


pkg/server/status.go, line 2193 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Does this call do anything?

Hm, good question. I copy-pasted this from ListSessions, but it appears that it doesn't do much other than checking whether a user is specified in the context.

I'm guessing maybe the reasoning is that we will issue ListLocalContentionEvents RPC for each node, and each node locally performs the permissions check, so if the user doesn't have permissions, we should be getting errors from all of the nodes. However, I think we could do that check upfront and fail quickly if not enough permissions. cc @itsbilal to see whether my reasoning is correct.


pkg/sql/crdb_internal.go, line 1690 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

We should probably inform the caller of the eviction policy in the virtual table comment field.

Done.


pkg/sql/logictest/testdata/logic_test/crdb_internal, line 215 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I think there are also some logic tests that @tbg added to produce contention events. Would probably be good to check that those contention events make it into the virtual table.

Done.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

As discussed elsewhere, the implementation approach taken here is amplifying the problem identified in #60584 and so we should try to find a different approach instead.

Ack on my end as well. I agree there is a problem, but this change does not break anything new, so it only slightly increases the existing technical debt by introducing one more API call to be moved. Since we're so close to release, I think the extremely high benefit of this change (allowing users to observe contention) justifies merging this.

Reviewed 11 of 11 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @tbg, and @yuzefovich)


pkg/server/status.go, line 1868 at r4 (raw file):

// On the middle level, all SingleKeyContention objects are ordered by their
// keys lexicographically.
// On the lowest level, all SingleTxnContention objects are ordered by the

nit: maybe it's just me, but I feel like this is a little hard to understand. Maybe reword to something likeordered by the number of times that transaction was observed to contend with other transactions. Up to you.

@yuzefovich yuzefovich force-pushed the contention-more branch 2 times, most recently from 7a6850d to e853f08 Compare February 24, 2021 19:29
This commit adds two new endpoints to the status server
`contention_events` and `local_contention_events` (as well as
corresponding methods) for using internally that expose the contention
information that comes from `contention.Registry`. These endpoints
require the user to either be an admin or have `VIEWACTIVITY` grant.

Additionally, this commit introduces new
`crdb_internal.{cluster,node}_contention_events` virtual tables that
provide access to the contention info via SQL.

Release justification: low risk update to new functionality.

Release note (sql change): New virtual tables
`crdb_internal.cluster_contention_events` and
`crdb_internal.node_contention_events` are introduced which expose
cluster-wide and gateway-only, respectively, contention information. In
order to access it the user needs to be either an admin or have
`VIEWACTIVITY` grant.
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @tbg)

@craig
Copy link
Contributor

craig bot commented Feb 25, 2021

Build failed:

@yuzefovich
Copy link
Member Author

Unrelated flake.

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 25, 2021

Build failed:

@yuzefovich
Copy link
Member Author

Flaked on #61091. Let's try one more time.

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 25, 2021

Build succeeded:

@craig craig bot merged commit c7e0888 into cockroachdb:master Feb 25, 2021
@yuzefovich yuzefovich deleted the contention-more branch February 25, 2021 06:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: implement contention registry and expose through StatusServer and virtual table
5 participants