-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: add /api/v2/ tree with auth/pagination, port listSessions endpoint #58436
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wohoo 💯
I lkove the pagination logic.
Can you spell out where you'd like the review iterations to focus first? I can see 3 areas:
- pagination
- API serveHTTP, authentication and authorization
- actual API endpoints we want to serve and their request/response payloads
Reviewed 15 of 15 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/server/api.go, line 60 at r1 (raw file):
// next offset that can be used to return the next "limit" results, or // len(result) if there are no more results. func simplePaginate(input interface{}, limit, offset int) (result interface{}, next int) {
can you move the pagination functions to a separate file pagination.go
so that api.go
focuses on the api endpoints.
pkg/server/api.go, line 205 at r1 (raw file):
} type authenticationV2Server struct {
ditto api_auth.go
for this
pkg/server/api.go, line 445 at r1 (raw file):
username := security.MakeSQLUsernameFromPreNormalizedString( req.Context().Value(webSessionUserKey{}).(string)) if admin, err := r.a.hasAdminRole(req.Context(), username); err != nil || !admin {
we probably want different role checks for different API endpoints. Not all APIs require admin.
pkg/server/api.go, line 488 at r1 (raw file):
}{ {"sessions/", a.listSessions, true /* requiresAuth */ , true /* requiresAdmin */}, {"hotranges/", a.hotRanges, true /* requiresAuth */ , true /* requiresAdmin */},
ok for sessions, but not for hotranges, ranges and nodes.
The current implementation of these endpoints in RPC-land is defective because it exposes too much data in the response payloads.
We need not only a new HTTP API tree; we need entirely different request/response payloads for the v2 api tree.
pkg/server/status.go, line 1240 at r1 (raw file):
b := &kv.Batch{} b.Scan(startKey, endKey)
If there's a limit
you can use this to restrict the number of rows scanned.
Also if the pagination was done using a cursor (nodes already seen) you can encode the value of limit
as startKey
.
(This works with topology changes because we only add node IDs and never reuse past node IDs for new nodes yet.)
pkg/server/status.go, line 1256 at r1 (raw file):
} } sort.Slice(resp.Nodes, func(i, j int) bool {
Our K/V store already sorts the entries by their keys. No way to reuse that pre-sorting here?
pkg/server/status.go, line 1973 at r1 (raw file):
nodeIDs = append(nodeIDs, nodeID) } sort.Slice(nodeIDs, func(i, j int) bool {
if there's a way to guarantee that the result of nodesStatusWithLiveness is already sorted (I suspect there is, cf my remark about the sorted K/V pairs above), then this sort here is not needed.
pkg/server/testserver.go, line 879 at r1 (raw file):
// GetRootAuthenticatedHTTPClient implements the TestServerInterface. func (ts *TestServer) GetRootAuthenticatedHTTPClient() (http.Client, error) {
Can you explain why you felt this was necessary? What does testing with root
add that's not already supported by the admin user defined above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this elsewhere: bilal wishes to focus on pagination first.
Here's a copy-paste of the salient points of the discussion:
-
Let's not forget that pagination is a means to an end, end is to limit resource usage by API server
- How to enforce that client-provided pagination parameters don't cause excessive resource usage?
- Beware that the end-points in a fan-out may also have large resource sets, this needs to be limited as well
- Resource usage is not just number of elements, it's number of bytes
- Size abstraction (method)
- Limit as bytes + number of items at the low level
- Number of items is the client-provided parameters (with a possible upper bound if none specified)
- Number of bytes is the limit that crdb cares about (cluster setting?)
-
There are two different sub-projects:
- Acknowledge/implement the mixed limits in the shared pagination logic at the API level
- Separately, implement per-node limits in the leaves of fan-out APIs
- We understand that the current per-node internal RPCs don't do this
- Adding to the existing RPCs might be burdensome due to cross-version compat requirements with existing admin UI and CLI logic
- Maybe not be shy about creating a separate set of RPCs to power the API v2 (with its own lifecycle and standardized parameters for pagination and limits)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some of the suggested improvements, and will be making the rest soon. Thanks for the review!. I also un-modified vendor
for now just to ease frequent rebases. I'll re-vendor gorilla/mux
before merge.
Also, as per our conversation elsewhere: I did explore replacing the http.HandlerFunc
-style methods with a more customized Request
object for our needs, but that had one drawback - it would have limited the pluggability of handlers across different types of Muxes, whether that's the authenticationMux, or the authorizationMux
(currently requireAdminWrapper
), or gorilla.Mux
for pattern matching.
At best I'd have to have the middlewares do their thing and store, say, the username in the context like they already are, then have another wrapper around the endpoint-specific handler that pulls it all out of the context and puts it in the custom request object. That seems like an extra step with little advantage, so I'm just going to prefer the way it is right now.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)
pkg/server/api.go, line 60 at r1 (raw file):
Previously, knz (kena) wrote…
can you move the pagination functions to a separate file
pagination.go
so thatapi.go
focuses on the api endpoints.
Done.
pkg/server/api.go, line 205 at r1 (raw file):
Previously, knz (kena) wrote…
ditto
api_auth.go
for this
Done.
pkg/server/api.go, line 445 at r1 (raw file):
Previously, knz (kena) wrote…
we probably want different role checks for different API endpoints. Not all APIs require admin.
Good point - noted. I'll work on generalizing this more shortly, and turning this into a "roleRequirementWrapper" or something.
pkg/server/api.go, line 488 at r1 (raw file):
Previously, knz (kena) wrote…
ok for sessions, but not for hotranges, ranges and nodes.
The current implementation of these endpoints in RPC-land is defective because it exposes too much data in the response payloads.
We need not only a new HTTP API tree; we need entirely different request/response payloads for the v2 api tree.
Ah okay, makes sense. To speed up time to merge I let this PR just be the http tree, but ideally we shouldn't be relying on the protobuf specs or embedding them at all.
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, knz (kena) wrote…
Can you explain why you felt this was necessary? What does testing with
root
add that's not already supported by the admin user defined above?
This is necessary because ListSessions doesn't return server-wide sessions unless it's run as a superuser (so, root). Calling as a regular admin user returns only the sessions created by that admin user.
6e53cba
to
d618d16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. I like the new breakdown.
See comments below.
Also a general question: is there a limit on the size of the arguments to an API request? For example, I could imagine we want a hard block on pagination state tokens that are larger than X kilobytes, to avoid excessive memory use when unmarshaling them.
Reviewed 12 of 12 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/server/api.go, line 488 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Ah okay, makes sense. To speed up time to merge I let this PR just be the http tree, but ideally we shouldn't be relying on the protobuf specs or embedding them at all.
I'm ok with merging the PR to be "just the http tree" but then I'll request you remove all but sessions/
here before this merges.
(Ok to keep it for testing until you're happy with the PR)
pkg/server/api_auth.go, line 52 at r2 (raw file):
func (a *authenticationV2Server) registerRoutes() { a.bindEndpoint("login/", a.login)
Is there a way to centralize all the routes in one file, instead of having registerRoutes()
calls spread through multiple files. It's important for auditing the code and helping newcomers to the code base.
pkg/server/api_test.go, line 1 at r1 (raw file):
// Copyright 2020 The Cockroach Authors.
maybe make this pagination_test.go
.
pkg/server/pagination.go, line 40 at r2 (raw file):
val := reflect.ValueOf(input) if limit <= 0 || val.Kind() != reflect.Slice { return val.Interface(), offset
is this not an API error instead?
also you need to check offset < 0
also maybe this entire function would benefit from a unit test with various conditions of bounds
pkg/server/pagination.go, line 66 at r2 (raw file):
// adding any nodes to the end of p.nodesToQuery that don't already exist in p. // sortedNodeIDs must be a sorted slice of all currently-live nodes. func (p *paginationState) mergeNodeIDs(sortedNodeIDs []roachpb.NodeID) {
I think function also need a couple of unit tests to check the edge cases.
pkg/server/pagination.go, line 78 at r2 (raw file):
j := 0 for i := range sortedNodeIDs { for j < len(allNodeIDs) && allNodeIDs[j] < sortedNodeIDs[i] {
I think sort.Search
is what you need here, may make the code clearer.
pkg/server/pagination.go, line 92 at r2 (raw file):
} func (p *paginationState) UnmarshalText(text []byte) error {
maybe worth having a unit test with a set of pagination states, and checks that marshal -> unmarshal cycles reproduce the original content
pkg/server/pagination.go, line 107 at r2 (raw file):
res := make([]roachpb.NodeID, 0, len(parts)) for _, part := range parts { part = strings.TrimSpace(part)
I don't think you need to trim spaces and check the length below. If the forma tis invalid, Atoi
will return an error anyway.
pkg/server/pagination.go, line 111 at r2 (raw file):
continue } val, err := strconv.Atoi(part)
here and below, check the node ID is not zero or negative. Maybe use strconv.ParseUint
with 32 bit precision instead, the same as node IDs, to check for overflows.
pkg/server/pagination.go, line 140 at r2 (raw file):
} func (p *paginationState) MarshalText() (text []byte, err error) {
I think it would be useful to add docstrings here and above to explain the format of marshaled pagination state tokens.
pkg/server/pagination.go, line 169 at r2 (raw file):
} // rpcNodePaginator allows for concurrent requests
nit: period missing at end.
pkg/server/pagination.go, line 229 at r2 (raw file):
r.mu.currentLen -= r.pagState.inProgressIndex } if r.mu.currentLen >= r.limit {
if the limit is larger than the total size of responses combined, this condition is never reached. In which case do you store r.done
and close reponseChan
then?
pkg/server/pagination.go, line 236 at r2 (raw file):
r.mu.turnCond.Broadcast() } err := contextutil.RunWithTimeout(ctx, "dial node", base.NetworkTimeout, func(ctx context.Context) error {
nit: if err := ...; err != nil {
pkg/server/pagination.go, line 278 at r2 (raw file):
r.errorFn(res.nodeID, res.err) } else if res.len > 0 && count < r.limit { var startIdx, endIdx int
I'd recommend extracing the computation of startIdx/count/endIdx in a different function so it's easier to review and understand for the next maintainer of the code.
pkg/server/pagination.go, line 301 at r2 (raw file):
response = res.value.Interface() } r.responseFn(res.nodeID, response)
don't you need to pass ctx
here and check for cancellation inside the response function too?
pkg/server/pagination.go, line 319 at r2 (raw file):
} case <-ctx.Done(): err = errors.Errorf("request of %s canceled before completion", r.errorCtx)
Should this condition not interrupt the loop somehow?
Also, you may need errors.CombineErrors
if you're collecting errors across multiple nodes.
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
This is necessary because ListSessions doesn't return server-wide sessions unless it's run as a superuser (so, root). Calling as a regular admin user returns only the sessions created by that admin user.
Can you double-check this? The condition in getLocalSessions()
is if isAdmin
, not if user == security.RootUser
. Have you verified that root gets different behavior from a non-root admin?
(Conversely, if you did verify that root
has a different behavior, can you point to me in the code where we implement this special case? It seems wrong and may need to be corrected.)
d618d16
to
3ad7af0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer the general question: no, there's no current limit on size of arguments to an API request, but base64 and this encoding is relatively compact as far as these go. Really long URLs are unadvisable, but I don't think we'll run into this issue in any topology.
Made some more significant changes:
- Most of the paginatedIterateNodes logic is now in rpcPaginator and unit tested, some more of it is in pagState.paginate and also unit tested
- Centralized authv2Server route registration in the apiV2Server
- Some minor bug fixes / cancellation checks etc as suggested.
- Added a role checking mux, though it's not fully fleshed out yet
Big TODOs that still remain for this PR:
- Delete handlers other than sessions (for now)
- Add docstrings on how to add new endpoints
- Improve roleAuthorizationMux to allow for any arbitrary
roleoption
. - anything else marked unresolved below
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)
pkg/server/api.go, line 488 at r1 (raw file):
Previously, knz (kena) wrote…
I'm ok with merging the PR to be "just the http tree" but then I'll request you remove all but
sessions/
here before this merges.(Ok to keep it for testing until you're happy with the PR)
I'm okay with dropping the other handlers for now, and moving them to another PR while we distill down the responses. I think that approach makes sense.
pkg/server/api_auth.go, line 52 at r2 (raw file):
Previously, knz (kena) wrote…
Is there a way to centralize all the routes in one file, instead of having
registerRoutes()
calls spread through multiple files. It's important for auditing the code and helping newcomers to the code base.
Done. While I don't think it'll ever be truly centralized; in particular because I see the value in having sub-servers that encapsulate any variables / logic common to a set of endpoints but not all of them. authServer
is one example of such a sub-server. That said, I've improved visibility by having one "global map" of all apiV2 endpoints, and then the sub-servers can then reimplement/re-route endpoints the way they wish.
Your TODO on documenting this in a docstring still stands.
pkg/server/api_test.go, line 1 at r1 (raw file):
Previously, knz (kena) wrote…
maybe make this
pagination_test.go
.
Done.
pkg/server/pagination.go, line 40 at r2 (raw file):
Previously, knz (kena) wrote…
is this not an API error instead?
also you need to check offset < 0
also maybe this entire function would benefit from a unit test with various conditions of bounds
Done. Unit tests still a TODO for this function.
pkg/server/pagination.go, line 66 at r2 (raw file):
Previously, knz (kena) wrote…
I think function also need a couple of unit tests to check the edge cases.
This should already be unit tested, see TestPaginationState.
pkg/server/pagination.go, line 78 at r2 (raw file):
Previously, knz (kena) wrote…
I think
sort.Search
is what you need here, may make the code clearer.
I did change to sort.Search as it's cleaner, but on closer inspection it's less efficient than the previous approach as that one is guaranteed to just be O(max(n,m))
while sort.Search is O(nlogm)
. You can think of this as more of a merge of two sorted runs. I don't think the difference matters much, so I'm happy with it either way.
pkg/server/pagination.go, line 92 at r2 (raw file):
Previously, knz (kena) wrote…
maybe worth having a unit test with a set of pagination states, and checks that marshal -> unmarshal cycles reproduce the original content
There's one example of this in TestPaginationState.
pkg/server/pagination.go, line 111 at r2 (raw file):
Previously, knz (kena) wrote…
here and below, check the node ID is not zero or negative. Maybe use
strconv.ParseUint
with 32 bit precision instead, the same as node IDs, to check for overflows.
Done.
pkg/server/pagination.go, line 169 at r2 (raw file):
Previously, knz (kena) wrote…
nit: period missing at end.
Done. The entire comment was incomplete.
pkg/server/pagination.go, line 229 at r2 (raw file):
Previously, knz (kena) wrote…
if the limit is larger than the total size of responses combined, this condition is never reached. In which case do you store
r.done
and closereponseChan
then?
Yep - in that case we rely on the for numNodes
loop to figure this out down in the consumer.
pkg/server/pagination.go, line 278 at r2 (raw file):
Previously, knz (kena) wrote…
I'd recommend extracing the computation of startIdx/count/endIdx in a different function so it's easier to review and understand for the next maintainer of the code.
Done. It's in paginationState.paginate
now, and is unit tested.
pkg/server/pagination.go, line 301 at r2 (raw file):
Previously, knz (kena) wrote…
don't you need to pass
ctx
here and check for cancellation inside the response function too?
This is a very simple function as it just aggregates the results, so no. I did add more ctx cancellation checks in this function though
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, knz (kena) wrote…
Can you double-check this? The condition in
getLocalSessions()
isif isAdmin
, notif user == security.RootUser
. Have you verified that root gets different behavior from a non-root admin?(Conversely, if you did verify that
root
has a different behavior, can you point to me in the code where we implement this special case? It seems wrong and may need to be corrected.)
It's in the hasViewActivity
role over there. You're right in that we don't necessarily need a superuser, but I thought that having a superuser http.Client could be useful in tests for multiple reasons anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like the convergence so far!
Reviewed 10 of 10 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)
pkg/server/pagination.go, line 78 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I did change to sort.Search as it's cleaner, but on closer inspection it's less efficient than the previous approach as that one is guaranteed to just be
O(max(n,m))
while sort.Search isO(nlogm)
. You can think of this as more of a merge of two sorted runs. I don't think the difference matters much, so I'm happy with it either way.
I'm OK with your previous algorithm then, but I'd rather see it extracted into a separate function with a docstring which motivates its existence.
pkg/server/pagination.go, line 111 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Done.
This is still not checking for 32-bit overflow though. You'd get this and the neg check for free with ParseUint
.
pkg/server/pagination.go, line 91 at r3 (raw file):
} // limitNodeResponse processes the response from a given node, and returns
comment does not start with the same name as the function.
pkg/server/pagination.go, line 300 at r3 (raw file):
r.mu.currentLen += resp.len if nodeID == r.pagState.inProgress { // We're resuming partway
nit: period at end of sentence.
pkg/server/pagination_test.go, line 30 at r3 (raw file):
) func TestPaginationState(t *testing.T) {
I'll welcome comments here:
-
a docstring above the function which documents the datadriven DSL
-
inline comments throughout the function that spell out the general steps taken by the test, in prose.
The second part is important for the review, because it enables me (the reviewer) to check that the code effectively mirrors your own understanding of what you intended to do.
pkg/server/pagination_test.go, line 93 at r3 (raw file):
} return "ok" case "merge-node-ids":
nit: our guide style suggests (or should suggest, if it doesn't already), an empty line before each case
after the first.
pkg/server/pagination_test.go, line 136 at r3 (raw file):
} func TestRPCPaginator(t *testing.T) {
ditto comments here
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
It's in the
hasViewActivity
role over there. You're right in that we don't necessarily need a superuser, but I thought that having a superuser http.Client could be useful in tests for multiple reasons anyway.
what is "over there"?
4a5150a
to
8ad6ebe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)
8bbae42
to
58bbb32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs! I should have addressed all the TODOs meant for this PR so far, with the exception of telemetry counters. I'll add those early tomorrow. Thanks again for the reviews so far.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/server/api.go, line 488 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I'm okay with dropping the other handlers for now, and moving them to another PR while we distill down the responses. I think that approach makes sense.
Done.
pkg/server/pagination.go, line 78 at r2 (raw file):
Previously, knz (kena) wrote…
I'm OK with your previous algorithm then, but I'd rather see it extracted into a separate function with a docstring which motivates its existence.
I moved back to the original algorithm and improved comments around here. Since this method is just a sort-then-merge, and this is the merge side of it, I'm not sure if I'd want to split it away from the sorting half of the function. That said, if you still want me to split it up, and the new comments aren't sufficient, I'd be happy to do that.
pkg/server/pagination.go, line 107 at r2 (raw file):
Previously, knz (kena) wrote…
I don't think you need to trim spaces and check the length below. If the forma tis invalid,
Atoi
will return an error anyway.
The thing is we don't want to return an error if len(part) == 0
, as MarshalText can sometimes do that. The encoding leaves a stray comma at the end. Maybe I could change that, but the current MarshalText
is lazier and the UnmarshalText
is slightly more permissive
pkg/server/pagination.go, line 111 at r2 (raw file):
Previously, knz (kena) wrote…
This is still not checking for 32-bit overflow though. You'd get this and the neg check for free with
ParseUint
.
Done.
pkg/server/pagination.go, line 140 at r2 (raw file):
Previously, knz (kena) wrote…
I think it would be useful to add docstrings here and above to explain the format of marshaled pagination state tokens.
Done.
pkg/server/pagination.go, line 319 at r2 (raw file):
Previously, knz (kena) wrote…
Should this condition not interrupt the loop somehow?
Also, you may need
errors.CombineErrors
if you're collecting errors across multiple nodes.
The combination would have to happen in errorFunc. I'm leaving that up to the caller to handle, as that's how we do it with iterateNodes
, but maybe it's better to handle the aggregation inside the paginator?
Also yes, I've added the loop interruption now.
pkg/server/pagination.go, line 91 at r3 (raw file):
Previously, knz (kena) wrote…
comment does not start with the same name as the function.
Done.
pkg/server/pagination_test.go, line 30 at r3 (raw file):
Previously, knz (kena) wrote…
I'll welcome comments here:
a docstring above the function which documents the datadriven DSL
inline comments throughout the function that spell out the general steps taken by the test, in prose.
The second part is important for the review, because it enables me (the reviewer) to check that the code effectively mirrors your own understanding of what you intended to do.
Done. Let me know if you need more inline comments; reading the tests and comments from top to bottom should help a lot in understanding it.
pkg/server/pagination_test.go, line 136 at r3 (raw file):
Previously, knz (kena) wrote…
ditto comments here
Done.
pkg/server/status.go, line 1240 at r1 (raw file):
Previously, knz (kena) wrote…
If there's a
limit
you can use this to restrict the number of rows scanned.Also if the pagination was done using a cursor (nodes already seen) you can encode the value of
limit
asstartKey
.
(This works with topology changes because we only add node IDs and never reuse past node IDs for new nodes yet.)
N/A as this code change got deferred to a future PR.
pkg/server/status.go, line 1256 at r1 (raw file):
Previously, knz (kena) wrote…
Our K/V store already sorts the entries by their keys. No way to reuse that pre-sorting here?
N/A as this code change got deferred to a future PR.
pkg/server/status.go, line 1973 at r1 (raw file):
Previously, knz (kena) wrote…
if there's a way to guarantee that the result of nodesStatusWithLiveness is already sorted (I suspect there is, cf my remark about the sorted K/V pairs above), then this sort here is not needed.
The status server returns a map, so the sort order isn't guaranteed. Is there another node-listing API that guarantees a sort order? Either way it might just be cheaper to sort on the fly to get the order we want.
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, knz (kena) wrote…
what is "over there"?
"over there" is getLocalSessions
. It transparently translates calls to be about the current user's sessions only, if the user doesn't have the ViewActivity
roleoption. The superuser, by definition, has all role options, but admins don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about good to go with just a few minor things remaining.
Reviewed 12 of 12 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/server/api.go, line 34 at r5 (raw file):
res, err := json.Marshal(payload) if err != nil { panic(err)
I'm not keen on naked panics.
I appreciate that you want this to fail in tests if there's an error; however in production at worst we'll want to log an error and just fail the request with an error.
Maybe this is an opportunity to introduce some kind of assert package: a function with two variants with build tags. I think test code is built with a special tag already. With the tag present this can panic, and without it trhere would just be an error logged.
You can also move the marshal above w.WriteHeader
above; so that if it fails the API fails properly with HTTP 500.
pkg/server/pagination.go, line 107 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
The thing is we don't want to return an error if
len(part) == 0
, as MarshalText can sometimes do that. The encoding leaves a stray comma at the end. Maybe I could change that, but the currentMarshalText
is lazier and theUnmarshalText
is slightly more permissive
Thanks for explaining. Worth copy-pasting this in an explanatory comment too then.
pkg/server/status.go, line 1973 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
The status server returns a map, so the sort order isn't guaranteed. Is there another node-listing API that guarantees a sort order? Either way it might just be cheaper to sort on the fly to get the order we want.
You're good here.
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
"over there" is
getLocalSessions
. It transparently translates calls to be about the current user's sessions only, if the user doesn't have theViewActivity
roleoption. The superuser, by definition, has all role options, but admins don't.
Wait what? A non-root user with role admin
does not get the VIEWACTIVITY privilege? That sounds like a bug if confirmed.
cb3dbc3
to
64d3ef3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! Added a telemetry counter for list-sessions too. Wondering if I should add a more general solution to that, but I can also see it being unhelpful to have counters for every single API request. Or maybe I'm wrong and we probably should be adding them to every route definition and calling them in an outer http.Handler. Happy to hear some opinions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/server/api.go, line 34 at r5 (raw file):
Previously, knz (kena) wrote…
I'm not keen on naked panics.
I appreciate that you want this to fail in tests if there's an error; however in production at worst we'll want to log an error and just fail the request with an error.Maybe this is an opportunity to introduce some kind of assert package: a function with two variants with build tags. I think test code is built with a special tag already. With the tag present this can panic, and without it trhere would just be an error logged.
You can also move the marshal above
w.WriteHeader
above; so that if it fails the API fails properly with HTTP 500.
Done. Woops. I meant to remove this at some time but forgot. Thanks!
pkg/server/pagination.go, line 107 at r2 (raw file):
Previously, knz (kena) wrote…
Thanks for explaining. Worth copy-pasting this in an explanatory comment too then.
Done.
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, knz (kena) wrote…
Wait what? A non-root user with role
admin
does not get the VIEWACTIVITY privilege? That sounds like a bug if confirmed.
Yep, that's the case. You can see the TestListSessionsV2
test actively reconstructs that situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about to go. Just a nit about the telemetry thing, and then you'll be done.
Reviewed 9 of 9 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/server/api.go, line 30 at r6 (raw file):
// Telemetry counters for API endpoints. var listSessionsCounter = telemetry.GetCounterOnce("api.v2.list-sessions")
that won't do - we really want the counters to be auto-generated, so that we are guarnateed telemetry even if the next engineers doesn't think about it.
I invite you to review getServerEndpointCounter()
in server.go and how it's used, and do something similar.
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Yep, that's the case. You can see the
TestListSessionsV2
test actively reconstructs that situation.
That is ... a problem. Let's not block this PR on this, but please add a comment to explain this finding, and TODO that this may need to be simplified if we find that this distinction does not need to exist.
64d3ef3
to
899cca6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! Cleaned up the GetRootAuthenticatedHTTPClient
stuff after I made the admin/root user propagation fix. Also did the automatic telemetry count decoration. PTAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/server/api.go, line 30 at r6 (raw file):
Previously, knz (kena) wrote…
that won't do - we really want the counters to be auto-generated, so that we are guarnateed telemetry even if the next engineers doesn't think about it.
I invite you to review
getServerEndpointCounter()
in server.go and how it's used, and do something similar.
Done. Let me know how you find it now.
pkg/server/testserver.go, line 879 at r1 (raw file):
Previously, knz (kena) wrote…
That is ... a problem. Let's not block this PR on this, but please add a comment to explain this finding, and TODO that this may need to be simplified if we find that this distinction does not need to exist.
Ended up investigating it - it was bad context conversion for gRPC fan-out calls. Latest revision fixes it. Thanks for highlighting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/server/api.go, line 116 at r7 (raw file):
var handler http.Handler handler = &callCountDecorator{ counter: telemetry.GetCounterOnce(fmt.Sprintf("api.v2.%s", route.name)),
Can you simply use the url as route name for the counter. This removes an argument, and also makes it easier for PMs to interpret the data.
899cca6
to
8d4a575
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz)
pkg/server/api.go, line 116 at r7 (raw file):
Previously, knz (kena) wrote…
Can you simply use the url as route name for the counter. This removes an argument, and also makes it easier for PMs to interpret the data.
Done. Might need to change this when we add keyword URLs that go like /sessions/{session_id}/
but for now this works.
a984a33
to
1149238
Compare
…point This change adds the skeleton for a new API tree that lives in `/api/v2/` in the http listener, and currently reimplements the `/sessions/` endpoint that is also implemented in `/_status/`. The new v2 API tree avoids the need to use GRPC Gateway, as well as cookie-based authentication which is less intuitive and idiomatic for REST APIs. Instead, for authentication, it uses a new session header that needs to be set on every request. As many RPC fan-out APIs use statusServer.iterateNodes, this change implements a pagination-aware method, paginatedIterateNodes, that works on a sorted set of node IDs and arranges results in such a way to be able to return the next `limit` results of an arbitary slice after the `next` cursor passed in. An example of how this works in practice is the new `/api/v2/sessions/` endpoint. A dependency on gorilla/mux is added to be able to pattern-match arguments in the URL. This was already an indirect dependency; now it's a direct dependency of cockroach. TODO that are likely to fall over into future PRs: - API Documentation, need to explore using swagger here. - Porting over remaining /_admin/ and /_status/ APIs, incl. SQL based ones Part of cockroachdb#55947. Release note (api change): Adds a new API tree, in /api/v2/*, currently undocumented, that avoids the use of and cookie-based authentication in favour of sessions in headers, and support for pagination.
1149238
to
a889429
Compare
TFTR! Used the last couple pushes to fix lint issues. bors r+ |
Build succeeded: |
This change adds the skeleton for a new API tree that lives in
/api/v2/
in the http listener, and currently reimplements the/sessions/
endpoint that is also implemented in
/_status/
. The new v2 API tree avoidsthe need to use GRPC Gateway, as well as cookie-based authentication which is
less intuitive and idiomatic for REST APIs. Instead, for authentication,
it uses a new session header that needs to be set on every request.
As many RPC fan-out APIs use statusServer.iterateNodes, this change
implements a pagination-aware method, paginatedIterateNodes, that
works on a sorted set of node IDs and arranges results in such a way
to be able to return the next
limit
results of an arbitary sliceafter the
next
cursor passed in. An example of how this works in practiceis the new
/api/v2/sessions/
endpoint.A dependency on gorilla/mux is added to be able to pattern-match
arguments in the URL. This was already an indirect dependency; now it's
a direct dependency of cockroach.
TODO that are likely to fall over into future PRs:
Part of #55947.
Release note (api change): Adds a new API tree, in /api/v2/*, currently
undocumented, that avoids the use of and cookie-based
authentication in favour of sessions in headers, and support for pagination.