-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql, server: expose contention info via vtable and StatusServer #60713
Conversation
c679ef1
to
1244fa8
Compare
First two commits are in separate PR, but otherwise I think this is RFAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so exciting!! @tbg @awoods187
Reviewed 6 of 6 files at r1, 7 of 7 files at r2, 29 of 29 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/server/status.go, line 307 at r3 (raw file):
ctx = b.AnnotateCtx(ctx) sessionUser, isAdmin, err := b.privilegeChecker.getUserAndRole(ctx)
not your code, but I'm surprised that this VIEWACTIVITY
check is not extracted since I think a lot of these methods probably care about it and it's pretty repetitive. Do you mind extracting this privilege check (and the error return)?.
pkg/server/status.go, line 1852 at r3 (raw file):
} // ListLocalContentionEvents returns a list of contention events on this node.
nit: add ordering information since this is also included in the docs.
pkg/server/status.go, line 2062 at r3 (raw file):
} func (s *statusServer) listContentionEventsHelper(
Is this used anywhere other than ListContentionEvents
? Since there's not much in there, I think it's fine to put the code in ListContentionEvents
and remove this separate method.
pkg/server/status.go, line 2079 at r3 (raw file):
return resp.Events, nil } return nil, err
nit: put error case inside if.
pkg/server/status.go, line 2193 at r3 (raw file):
ctx = s.AnnotateCtx(ctx) if _, _, err := s.privilegeChecker.getUserAndRole(ctx); err != nil {
Does this call do anything?
pkg/sql/crdb_internal.go, line 1690 at r3 (raw file):
const contentionEventsSchemaPattern = ` CREATE TABLE crdb_internal.%s (
We should probably inform the caller of the eviction policy in the virtual table comment field.
pkg/sql/logictest/testdata/logic_test/crdb_internal, line 215 at r3 (raw file):
query IIITTTI colnames SELECT * FROM crdb_internal.node_contention_events WHERE table_id < 0
I think there are also some logic tests that @tbg added to produce contention events. Would probably be good to check that those contention events make it into the virtual table.
Wohoo! Can we have a sample printout of that vtable pretty please? :-) |
As discussed elsewhere, the implementation approach taken here is amplifying the problem identified in #60584 and so we should try to find a different approach instead. |
c382b55
to
0d46308
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg, here is an example (I ran a single node cluster and followed the queries of contention_event
logic test file):
root@:26257/defaultdb> select * from crdb_internal.cluster_contention_events;
table_id | index_id | num_contention_events | cumulative_contention_time | key | txn_id | count
-----------+----------+-----------------------+----------------------------+-----------------------------------------------+--------------------------------------+--------
64 | 1 | 1 | 00:00:00.083261 | \310\211\375\010\323R\200\013\212\000\001\210 | 7e9f5456-ed01-4525-97d1-a72dfd06a339 | 1
(1 row)
Time: 32ms total (execution 32ms / network 0ms)
root@:26257/defaultdb> select * from crdb_internal.node_contention_events;
table_id | index_id | num_contention_events | cumulative_contention_time | key | txn_id | count
-----------+----------+-----------------------+----------------------------+-----------------------------------------------+--------------------------------------+--------
64 | 1 | 1 | 00:00:00.083261 | \310\211\375\010\323R\200\013\212\000\001\210 | 7e9f5456-ed01-4525-97d1-a72dfd06a339 | 1
(1 row)
Time: 0ms total (execution 0ms / network 0ms)
As discussed elsewhere, the implementation approach taken here is amplifying the problem identified in #60584 and so we should try to find a different approach instead.
Ack that the current approach is broken, but it will have to be fixed at the same time as existing violations are.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @itsbilal, and @tbg)
pkg/server/status.go, line 307 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
not your code, but I'm surprised that this
VIEWACTIVITY
check is not extracted since I think a lot of these methods probably care about it and it's pretty repetitive. Do you mind extracting this privilege check (and the error return)?.
Actually, VIEWACTIVITY
is used only for SESSIONS
, and there we have special handling that allows any user to see its own sessions (even if that other is not admin nor has VIEWACTIVITY permissions), so I won't be touching any of the old code.
pkg/server/status.go, line 1852 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
nit: add ordering information since this is also included in the docs.
Good point, done.
pkg/server/status.go, line 2062 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Is this used anywhere other than
ListContentionEvents
? Since there's not much in there, I think it's fine to put the code inListContentionEvents
and remove this separate method.
Refactored. I originally implemented api V2 (which was using the helper) but decided to get rid of it.
pkg/server/status.go, line 2193 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Does this call do anything?
Hm, good question. I copy-pasted this from ListSessions
, but it appears that it doesn't do much other than checking whether a user is specified in the context.
I'm guessing maybe the reasoning is that we will issue ListLocalContentionEvents
RPC for each node, and each node locally performs the permissions check, so if the user doesn't have permissions, we should be getting errors from all of the nodes. However, I think we could do that check upfront and fail quickly if not enough permissions. cc @itsbilal to see whether my reasoning is correct.
pkg/sql/crdb_internal.go, line 1690 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
We should probably inform the caller of the eviction policy in the virtual table comment field.
Done.
pkg/sql/logictest/testdata/logic_test/crdb_internal, line 215 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I think there are also some logic tests that @tbg added to produce contention events. Would probably be good to check that those contention events make it into the virtual table.
Done.
0d46308
to
305baef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed elsewhere, the implementation approach taken here is amplifying the problem identified in #60584 and so we should try to find a different approach instead.
Ack on my end as well. I agree there is a problem, but this change does not break anything new, so it only slightly increases the existing technical debt by introducing one more API call to be moved. Since we're so close to release, I think the extremely high benefit of this change (allowing users to observe contention) justifies merging this.
Reviewed 11 of 11 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @tbg, and @yuzefovich)
pkg/server/status.go, line 1868 at r4 (raw file):
// On the middle level, all SingleKeyContention objects are ordered by their // keys lexicographically. // On the lowest level, all SingleTxnContention objects are ordered by the
nit: maybe it's just me, but I feel like this is a little hard to understand. Maybe reword to something likeordered by the number of times that transaction was observed to contend with other transactions
. Up to you.
7a6850d
to
e853f08
Compare
This commit adds two new endpoints to the status server `contention_events` and `local_contention_events` (as well as corresponding methods) for using internally that expose the contention information that comes from `contention.Registry`. These endpoints require the user to either be an admin or have `VIEWACTIVITY` grant. Additionally, this commit introduces new `crdb_internal.{cluster,node}_contention_events` virtual tables that provide access to the contention info via SQL. Release justification: low risk update to new functionality. Release note (sql change): New virtual tables `crdb_internal.cluster_contention_events` and `crdb_internal.node_contention_events` are introduced which expose cluster-wide and gateway-only, respectively, contention information. In order to access it the user needs to be either an admin or have `VIEWACTIVITY` grant.
e853f08
to
df8cd50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @tbg)
Build failed: |
Unrelated flake. bors r+ |
Build failed: |
Flaked on #61091. Let's try one more time. bors r+ |
Build succeeded: |
This commit adds two new endpoints to the status server
contention_events
andlocal_contention_events
(as well ascorresponding methods) for using internally that expose the contention
information that comes from
contention.Registry
. These endpointsrequire the user to either be an admin or have
VIEWACTIVITY
grant.Additionally, this commit introduces new
crdb_internal.{cluster,node}_contention_events
virtual tables thatprovide access to the contention info via SQL.
Fixes: #57114.
Release justification: low risk update to new functionality.
Release note (sql change): New virtual tables
crdb_internal.cluster_contention_events
andcrdb_internal.node_contention_events
are introduced which exposecluster-wide and gateway-only, respectively, contention information. In
order to access it the user needs to be either an admin or have
VIEWACTIVITY
grant.