Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: add /api/v2/ tree with auth/pagination, port listSessions endpoint #58436

Merged
merged 1 commit into from
Feb 1, 2021

Conversation

itsbilal
Copy link
Contributor

@itsbilal itsbilal commented Jan 4, 2021

This change adds the skeleton for a new API tree that lives in
/api/v2/ in the http listener, and currently reimplements the /sessions/
endpoint that is also implemented in /_status/. The new v2 API tree avoids
the need to use GRPC Gateway, as well as cookie-based authentication which is
less intuitive and idiomatic for REST APIs. Instead, for authentication,
it uses a new session header that needs to be set on every request.

As many RPC fan-out APIs use statusServer.iterateNodes, this change
implements a pagination-aware method, paginatedIterateNodes, that
works on a sorted set of node IDs and arranges results in such a way
to be able to return the next limit results of an arbitary slice
after the next cursor passed in. An example of how this works in practice
is the new /api/v2/sessions/ endpoint.

A dependency on gorilla/mux is added to be able to pattern-match
arguments in the URL. This was already an indirect dependency; now it's
a direct dependency of cockroach.

TODO that are likely to fall over into future PRs:

  • API Documentation, need to explore using swagger here.
  • Porting over remaining /_admin/ and /_status/ APIs, incl. SQL based ones

Part of #55947.

Release note (api change): Adds a new API tree, in /api/v2/*, currently
undocumented, that avoids the use of and cookie-based
authentication in favour of sessions in headers, and support for pagination.

@itsbilal itsbilal requested a review from knz January 4, 2021 22:47
@itsbilal itsbilal self-assigned this Jan 4, 2021
@itsbilal itsbilal requested a review from a team as a code owner January 4, 2021 22:47
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wohoo 💯
I lkove the pagination logic.

Can you spell out where you'd like the review iterations to focus first? I can see 3 areas:

  • pagination
  • API serveHTTP, authentication and authorization
  • actual API endpoints we want to serve and their request/response payloads

Reviewed 15 of 15 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/server/api.go, line 60 at r1 (raw file):

// next offset that can be used to return the next "limit" results, or
// len(result) if there are no more results.
func simplePaginate(input interface{}, limit, offset int) (result interface{}, next int) {

can you move the pagination functions to a separate file pagination.go so that api.go focuses on the api endpoints.


pkg/server/api.go, line 205 at r1 (raw file):

}

type authenticationV2Server struct {

ditto api_auth.go for this


pkg/server/api.go, line 445 at r1 (raw file):

	username := security.MakeSQLUsernameFromPreNormalizedString(
		req.Context().Value(webSessionUserKey{}).(string))
	if admin, err := r.a.hasAdminRole(req.Context(), username); err != nil || !admin {

we probably want different role checks for different API endpoints. Not all APIs require admin.


pkg/server/api.go, line 488 at r1 (raw file):

	}{
		{"sessions/", a.listSessions, true /* requiresAuth */ , true /* requiresAdmin */},
		{"hotranges/", a.hotRanges, true /* requiresAuth */ , true /* requiresAdmin */},

ok for sessions, but not for hotranges, ranges and nodes.

The current implementation of these endpoints in RPC-land is defective because it exposes too much data in the response payloads.

We need not only a new HTTP API tree; we need entirely different request/response payloads for the v2 api tree.


pkg/server/status.go, line 1240 at r1 (raw file):

	b := &kv.Batch{}
	b.Scan(startKey, endKey)

If there's a limit you can use this to restrict the number of rows scanned.

Also if the pagination was done using a cursor (nodes already seen) you can encode the value of limit as startKey.
(This works with topology changes because we only add node IDs and never reuse past node IDs for new nodes yet.)


pkg/server/status.go, line 1256 at r1 (raw file):

		}
	}
	sort.Slice(resp.Nodes, func(i, j int) bool {

Our K/V store already sorts the entries by their keys. No way to reuse that pre-sorting here?


pkg/server/status.go, line 1973 at r1 (raw file):

		nodeIDs = append(nodeIDs, nodeID)
	}
	sort.Slice(nodeIDs, func(i, j int) bool {

if there's a way to guarantee that the result of nodesStatusWithLiveness is already sorted (I suspect there is, cf my remark about the sorted K/V pairs above), then this sort here is not needed.


pkg/server/testserver.go, line 879 at r1 (raw file):

// GetRootAuthenticatedHTTPClient implements the TestServerInterface.
func (ts *TestServer) GetRootAuthenticatedHTTPClient() (http.Client, error) {

Can you explain why you felt this was necessary? What does testing with root add that's not already supported by the admin user defined above?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this elsewhere: bilal wishes to focus on pagination first.

Here's a copy-paste of the salient points of the discussion:

  • Let's not forget that pagination is a means to an end, end is to limit resource usage by API server

    • How to enforce that client-provided pagination parameters don't cause excessive resource usage?
    • Beware that the end-points in a fan-out may also have large resource sets, this needs to be limited as well
    • Resource usage is not just number of elements, it's number of bytes
    • Size abstraction (method)
      • Limit as bytes + number of items at the low level
      • Number of items is the client-provided parameters (with a possible upper bound if none specified)
      • Number of bytes is the limit that crdb cares about (cluster setting?)
  • There are two different sub-projects:

    • Acknowledge/implement the mixed limits in the shared pagination logic at the API level
    • Separately, implement per-node limits in the leaves of fan-out APIs
      • We understand that the current per-node internal RPCs don't do this
      • Adding to the existing RPCs might be burdensome due to cross-version compat requirements with existing admin UI and CLI logic
      • Maybe not be shy about creating a separate set of RPCs to power the API v2 (with its own lifecycle and standardized parameters for pagination and limits)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)

Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some of the suggested improvements, and will be making the rest soon. Thanks for the review!. I also un-modified vendor for now just to ease frequent rebases. I'll re-vendor gorilla/mux before merge.

Also, as per our conversation elsewhere: I did explore replacing the http.HandlerFunc-style methods with a more customized Request object for our needs, but that had one drawback - it would have limited the pluggability of handlers across different types of Muxes, whether that's the authenticationMux, or the authorizationMux (currently requireAdminWrapper), or gorilla.Mux for pattern matching.

At best I'd have to have the middlewares do their thing and store, say, the username in the context like they already are, then have another wrapper around the endpoint-specific handler that pulls it all out of the context and puts it in the custom request object. That seems like an extra step with little advantage, so I'm just going to prefer the way it is right now.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)


pkg/server/api.go, line 60 at r1 (raw file):

Previously, knz (kena) wrote…

can you move the pagination functions to a separate file pagination.go so that api.go focuses on the api endpoints.

Done.


pkg/server/api.go, line 205 at r1 (raw file):

Previously, knz (kena) wrote…

ditto api_auth.go for this

Done.


pkg/server/api.go, line 445 at r1 (raw file):

Previously, knz (kena) wrote…

we probably want different role checks for different API endpoints. Not all APIs require admin.

Good point - noted. I'll work on generalizing this more shortly, and turning this into a "roleRequirementWrapper" or something.


pkg/server/api.go, line 488 at r1 (raw file):

Previously, knz (kena) wrote…

ok for sessions, but not for hotranges, ranges and nodes.

The current implementation of these endpoints in RPC-land is defective because it exposes too much data in the response payloads.

We need not only a new HTTP API tree; we need entirely different request/response payloads for the v2 api tree.

Ah okay, makes sense. To speed up time to merge I let this PR just be the http tree, but ideally we shouldn't be relying on the protobuf specs or embedding them at all.


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, knz (kena) wrote…

Can you explain why you felt this was necessary? What does testing with root add that's not already supported by the admin user defined above?

This is necessary because ListSessions doesn't return server-wide sessions unless it's run as a superuser (so, root). Calling as a regular admin user returns only the sessions created by that admin user.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I like the new breakdown.

See comments below.

Also a general question: is there a limit on the size of the arguments to an API request? For example, I could imagine we want a hard block on pagination state tokens that are larger than X kilobytes, to avoid excessive memory use when unmarshaling them.

Reviewed 12 of 12 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/server/api.go, line 488 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Ah okay, makes sense. To speed up time to merge I let this PR just be the http tree, but ideally we shouldn't be relying on the protobuf specs or embedding them at all.

I'm ok with merging the PR to be "just the http tree" but then I'll request you remove all but sessions/ here before this merges.

(Ok to keep it for testing until you're happy with the PR)


pkg/server/api_auth.go, line 52 at r2 (raw file):

func (a *authenticationV2Server) registerRoutes() {
	a.bindEndpoint("login/", a.login)

Is there a way to centralize all the routes in one file, instead of having registerRoutes() calls spread through multiple files. It's important for auditing the code and helping newcomers to the code base.


pkg/server/api_test.go, line 1 at r1 (raw file):

// Copyright 2020 The Cockroach Authors.

maybe make this pagination_test.go.


pkg/server/pagination.go, line 40 at r2 (raw file):

	val := reflect.ValueOf(input)
	if limit <= 0 || val.Kind() != reflect.Slice {
		return val.Interface(), offset

is this not an API error instead?

also you need to check offset < 0

also maybe this entire function would benefit from a unit test with various conditions of bounds


pkg/server/pagination.go, line 66 at r2 (raw file):

// adding any nodes to the end of p.nodesToQuery that don't already exist in p.
// sortedNodeIDs must be a sorted slice of all currently-live nodes.
func (p *paginationState) mergeNodeIDs(sortedNodeIDs []roachpb.NodeID) {

I think function also need a couple of unit tests to check the edge cases.


pkg/server/pagination.go, line 78 at r2 (raw file):

	j := 0
	for i := range sortedNodeIDs {
		for j < len(allNodeIDs) && allNodeIDs[j] < sortedNodeIDs[i] {

I think sort.Search is what you need here, may make the code clearer.


pkg/server/pagination.go, line 92 at r2 (raw file):

}

func (p *paginationState) UnmarshalText(text []byte) error {

maybe worth having a unit test with a set of pagination states, and checks that marshal -> unmarshal cycles reproduce the original content


pkg/server/pagination.go, line 107 at r2 (raw file):

		res := make([]roachpb.NodeID, 0, len(parts))
		for _, part := range parts {
			part = strings.TrimSpace(part)

I don't think you need to trim spaces and check the length below. If the forma tis invalid, Atoi will return an error anyway.


pkg/server/pagination.go, line 111 at r2 (raw file):

				continue
			}
			val, err := strconv.Atoi(part)

here and below, check the node ID is not zero or negative. Maybe use strconv.ParseUint with 32 bit precision instead, the same as node IDs, to check for overflows.


pkg/server/pagination.go, line 140 at r2 (raw file):

}

func (p *paginationState) MarshalText() (text []byte, err error) {

I think it would be useful to add docstrings here and above to explain the format of marshaled pagination state tokens.


pkg/server/pagination.go, line 169 at r2 (raw file):

}

// rpcNodePaginator allows for concurrent requests

nit: period missing at end.


pkg/server/pagination.go, line 229 at r2 (raw file):

			r.mu.currentLen -= r.pagState.inProgressIndex
		}
		if r.mu.currentLen >= r.limit {

if the limit is larger than the total size of responses combined, this condition is never reached. In which case do you store r.done and close reponseChan then?


pkg/server/pagination.go, line 236 at r2 (raw file):

		r.mu.turnCond.Broadcast()
	}
	err := contextutil.RunWithTimeout(ctx, "dial node", base.NetworkTimeout, func(ctx context.Context) error {

nit: if err := ...; err != nil {


pkg/server/pagination.go, line 278 at r2 (raw file):

				r.errorFn(res.nodeID, res.err)
			} else if res.len > 0 && count < r.limit {
				var startIdx, endIdx int

I'd recommend extracing the computation of startIdx/count/endIdx in a different function so it's easier to review and understand for the next maintainer of the code.


pkg/server/pagination.go, line 301 at r2 (raw file):

					response = res.value.Interface()
				}
				r.responseFn(res.nodeID, response)

don't you need to pass ctx here and check for cancellation inside the response function too?


pkg/server/pagination.go, line 319 at r2 (raw file):

			}
		case <-ctx.Done():
			err = errors.Errorf("request of %s canceled before completion", r.errorCtx)

Should this condition not interrupt the loop somehow?

Also, you may need errors.CombineErrors if you're collecting errors across multiple nodes.


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This is necessary because ListSessions doesn't return server-wide sessions unless it's run as a superuser (so, root). Calling as a regular admin user returns only the sessions created by that admin user.

Can you double-check this? The condition in getLocalSessions() is if isAdmin, not if user == security.RootUser. Have you verified that root gets different behavior from a non-root admin?

(Conversely, if you did verify that root has a different behavior, can you point to me in the code where we implement this special case? It seems wrong and may need to be corrected.)

Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer the general question: no, there's no current limit on size of arguments to an API request, but base64 and this encoding is relatively compact as far as these go. Really long URLs are unadvisable, but I don't think we'll run into this issue in any topology.

Made some more significant changes:

  1. Most of the paginatedIterateNodes logic is now in rpcPaginator and unit tested, some more of it is in pagState.paginate and also unit tested
  2. Centralized authv2Server route registration in the apiV2Server
  3. Some minor bug fixes / cancellation checks etc as suggested.
  4. Added a role checking mux, though it's not fully fleshed out yet

Big TODOs that still remain for this PR:

  1. Delete handlers other than sessions (for now)
  2. Add docstrings on how to add new endpoints
  3. Improve roleAuthorizationMux to allow for any arbitrary roleoption.
  4. anything else marked unresolved below

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)


pkg/server/api.go, line 488 at r1 (raw file):

Previously, knz (kena) wrote…

I'm ok with merging the PR to be "just the http tree" but then I'll request you remove all but sessions/ here before this merges.

(Ok to keep it for testing until you're happy with the PR)

I'm okay with dropping the other handlers for now, and moving them to another PR while we distill down the responses. I think that approach makes sense.


pkg/server/api_auth.go, line 52 at r2 (raw file):

Previously, knz (kena) wrote…

Is there a way to centralize all the routes in one file, instead of having registerRoutes() calls spread through multiple files. It's important for auditing the code and helping newcomers to the code base.

Done. While I don't think it'll ever be truly centralized; in particular because I see the value in having sub-servers that encapsulate any variables / logic common to a set of endpoints but not all of them. authServer is one example of such a sub-server. That said, I've improved visibility by having one "global map" of all apiV2 endpoints, and then the sub-servers can then reimplement/re-route endpoints the way they wish.

Your TODO on documenting this in a docstring still stands.


pkg/server/api_test.go, line 1 at r1 (raw file):

Previously, knz (kena) wrote…

maybe make this pagination_test.go.

Done.


pkg/server/pagination.go, line 40 at r2 (raw file):

Previously, knz (kena) wrote…

is this not an API error instead?

also you need to check offset < 0

also maybe this entire function would benefit from a unit test with various conditions of bounds

Done. Unit tests still a TODO for this function.


pkg/server/pagination.go, line 66 at r2 (raw file):

Previously, knz (kena) wrote…

I think function also need a couple of unit tests to check the edge cases.

This should already be unit tested, see TestPaginationState.


pkg/server/pagination.go, line 78 at r2 (raw file):

Previously, knz (kena) wrote…

I think sort.Search is what you need here, may make the code clearer.

I did change to sort.Search as it's cleaner, but on closer inspection it's less efficient than the previous approach as that one is guaranteed to just be O(max(n,m)) while sort.Search is O(nlogm). You can think of this as more of a merge of two sorted runs. I don't think the difference matters much, so I'm happy with it either way.


pkg/server/pagination.go, line 92 at r2 (raw file):

Previously, knz (kena) wrote…

maybe worth having a unit test with a set of pagination states, and checks that marshal -> unmarshal cycles reproduce the original content

There's one example of this in TestPaginationState.


pkg/server/pagination.go, line 111 at r2 (raw file):

Previously, knz (kena) wrote…

here and below, check the node ID is not zero or negative. Maybe use strconv.ParseUint with 32 bit precision instead, the same as node IDs, to check for overflows.

Done.


pkg/server/pagination.go, line 169 at r2 (raw file):

Previously, knz (kena) wrote…

nit: period missing at end.

Done. The entire comment was incomplete.


pkg/server/pagination.go, line 229 at r2 (raw file):

Previously, knz (kena) wrote…

if the limit is larger than the total size of responses combined, this condition is never reached. In which case do you store r.done and close reponseChan then?

Yep - in that case we rely on the for numNodes loop to figure this out down in the consumer.


pkg/server/pagination.go, line 278 at r2 (raw file):

Previously, knz (kena) wrote…

I'd recommend extracing the computation of startIdx/count/endIdx in a different function so it's easier to review and understand for the next maintainer of the code.

Done. It's in paginationState.paginate now, and is unit tested.


pkg/server/pagination.go, line 301 at r2 (raw file):

Previously, knz (kena) wrote…

don't you need to pass ctx here and check for cancellation inside the response function too?

This is a very simple function as it just aggregates the results, so no. I did add more ctx cancellation checks in this function though


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, knz (kena) wrote…

Can you double-check this? The condition in getLocalSessions() is if isAdmin, not if user == security.RootUser. Have you verified that root gets different behavior from a non-root admin?

(Conversely, if you did verify that root has a different behavior, can you point to me in the code where we implement this special case? It seems wrong and may need to be corrected.)

It's in the hasViewActivity role over there. You're right in that we don't necessarily need a superuser, but I thought that having a superuser http.Client could be useful in tests for multiple reasons anyway.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like the convergence so far!

Reviewed 10 of 10 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)


pkg/server/pagination.go, line 78 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I did change to sort.Search as it's cleaner, but on closer inspection it's less efficient than the previous approach as that one is guaranteed to just be O(max(n,m)) while sort.Search is O(nlogm). You can think of this as more of a merge of two sorted runs. I don't think the difference matters much, so I'm happy with it either way.

I'm OK with your previous algorithm then, but I'd rather see it extracted into a separate function with a docstring which motivates its existence.


pkg/server/pagination.go, line 111 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Done.

This is still not checking for 32-bit overflow though. You'd get this and the neg check for free with ParseUint.


pkg/server/pagination.go, line 91 at r3 (raw file):

}

// limitNodeResponse processes the response from a given node, and returns

comment does not start with the same name as the function.


pkg/server/pagination.go, line 300 at r3 (raw file):

		r.mu.currentLen += resp.len
		if nodeID == r.pagState.inProgress {
			// We're resuming partway

nit: period at end of sentence.


pkg/server/pagination_test.go, line 30 at r3 (raw file):

)

func TestPaginationState(t *testing.T) {

I'll welcome comments here:

  1. a docstring above the function which documents the datadriven DSL

  2. inline comments throughout the function that spell out the general steps taken by the test, in prose.

The second part is important for the review, because it enables me (the reviewer) to check that the code effectively mirrors your own understanding of what you intended to do.


pkg/server/pagination_test.go, line 93 at r3 (raw file):

			}
			return "ok"
		case "merge-node-ids":

nit: our guide style suggests (or should suggest, if it doesn't already), an empty line before each case after the first.


pkg/server/pagination_test.go, line 136 at r3 (raw file):

}

func TestRPCPaginator(t *testing.T) {

ditto comments here


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

It's in the hasViewActivity role over there. You're right in that we don't necessarily need a superuser, but I thought that having a superuser http.Client could be useful in tests for multiple reasons anyway.

what is "over there"?

@itsbilal itsbilal force-pushed the api-new-tree branch 2 times, most recently from 4a5150a to 8ad6ebe Compare January 25, 2021 23:58
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)

@itsbilal itsbilal force-pushed the api-new-tree branch 3 times, most recently from 8bbae42 to 58bbb32 Compare January 26, 2021 23:00
Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! I should have addressed all the TODOs meant for this PR so far, with the exception of telemetry counters. I'll add those early tomorrow. Thanks again for the reviews so far.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/server/api.go, line 488 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I'm okay with dropping the other handlers for now, and moving them to another PR while we distill down the responses. I think that approach makes sense.

Done.


pkg/server/pagination.go, line 78 at r2 (raw file):

Previously, knz (kena) wrote…

I'm OK with your previous algorithm then, but I'd rather see it extracted into a separate function with a docstring which motivates its existence.

I moved back to the original algorithm and improved comments around here. Since this method is just a sort-then-merge, and this is the merge side of it, I'm not sure if I'd want to split it away from the sorting half of the function. That said, if you still want me to split it up, and the new comments aren't sufficient, I'd be happy to do that.


pkg/server/pagination.go, line 107 at r2 (raw file):

Previously, knz (kena) wrote…

I don't think you need to trim spaces and check the length below. If the forma tis invalid, Atoi will return an error anyway.

The thing is we don't want to return an error if len(part) == 0, as MarshalText can sometimes do that. The encoding leaves a stray comma at the end. Maybe I could change that, but the current MarshalText is lazier and the UnmarshalText is slightly more permissive


pkg/server/pagination.go, line 111 at r2 (raw file):

Previously, knz (kena) wrote…

This is still not checking for 32-bit overflow though. You'd get this and the neg check for free with ParseUint.

Done.


pkg/server/pagination.go, line 140 at r2 (raw file):

Previously, knz (kena) wrote…

I think it would be useful to add docstrings here and above to explain the format of marshaled pagination state tokens.

Done.


pkg/server/pagination.go, line 319 at r2 (raw file):

Previously, knz (kena) wrote…

Should this condition not interrupt the loop somehow?

Also, you may need errors.CombineErrors if you're collecting errors across multiple nodes.

The combination would have to happen in errorFunc. I'm leaving that up to the caller to handle, as that's how we do it with iterateNodes, but maybe it's better to handle the aggregation inside the paginator?

Also yes, I've added the loop interruption now.


pkg/server/pagination.go, line 91 at r3 (raw file):

Previously, knz (kena) wrote…

comment does not start with the same name as the function.

Done.


pkg/server/pagination_test.go, line 30 at r3 (raw file):

Previously, knz (kena) wrote…

I'll welcome comments here:

  1. a docstring above the function which documents the datadriven DSL

  2. inline comments throughout the function that spell out the general steps taken by the test, in prose.

The second part is important for the review, because it enables me (the reviewer) to check that the code effectively mirrors your own understanding of what you intended to do.

Done. Let me know if you need more inline comments; reading the tests and comments from top to bottom should help a lot in understanding it.


pkg/server/pagination_test.go, line 136 at r3 (raw file):

Previously, knz (kena) wrote…

ditto comments here

Done.


pkg/server/status.go, line 1240 at r1 (raw file):

Previously, knz (kena) wrote…

If there's a limit you can use this to restrict the number of rows scanned.

Also if the pagination was done using a cursor (nodes already seen) you can encode the value of limit as startKey.
(This works with topology changes because we only add node IDs and never reuse past node IDs for new nodes yet.)

N/A as this code change got deferred to a future PR.


pkg/server/status.go, line 1256 at r1 (raw file):

Previously, knz (kena) wrote…

Our K/V store already sorts the entries by their keys. No way to reuse that pre-sorting here?

N/A as this code change got deferred to a future PR.


pkg/server/status.go, line 1973 at r1 (raw file):

Previously, knz (kena) wrote…

if there's a way to guarantee that the result of nodesStatusWithLiveness is already sorted (I suspect there is, cf my remark about the sorted K/V pairs above), then this sort here is not needed.

The status server returns a map, so the sort order isn't guaranteed. Is there another node-listing API that guarantees a sort order? Either way it might just be cheaper to sort on the fly to get the order we want.


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, knz (kena) wrote…

what is "over there"?

"over there" is getLocalSessions. It transparently translates calls to be about the current user's sessions only, if the user doesn't have the ViewActivity roleoption. The superuser, by definition, has all role options, but admins don't.

@itsbilal itsbilal changed the title server: add /api/v2/ tree with auth/pagination, port some status endpoints server: add /api/v2/ tree with auth/pagination, port listSessions endpoint Jan 26, 2021
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about good to go with just a few minor things remaining.

Reviewed 12 of 12 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/server/api.go, line 34 at r5 (raw file):

	res, err := json.Marshal(payload)
	if err != nil {
		panic(err)

I'm not keen on naked panics.
I appreciate that you want this to fail in tests if there's an error; however in production at worst we'll want to log an error and just fail the request with an error.

Maybe this is an opportunity to introduce some kind of assert package: a function with two variants with build tags. I think test code is built with a special tag already. With the tag present this can panic, and without it trhere would just be an error logged.

You can also move the marshal above w.WriteHeader above; so that if it fails the API fails properly with HTTP 500.


pkg/server/pagination.go, line 107 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The thing is we don't want to return an error if len(part) == 0, as MarshalText can sometimes do that. The encoding leaves a stray comma at the end. Maybe I could change that, but the current MarshalText is lazier and the UnmarshalText is slightly more permissive

Thanks for explaining. Worth copy-pasting this in an explanatory comment too then.


pkg/server/status.go, line 1973 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The status server returns a map, so the sort order isn't guaranteed. Is there another node-listing API that guarantees a sort order? Either way it might just be cheaper to sort on the fly to get the order we want.

You're good here.


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

"over there" is getLocalSessions. It transparently translates calls to be about the current user's sessions only, if the user doesn't have the ViewActivity roleoption. The superuser, by definition, has all role options, but admins don't.

Wait what? A non-root user with role admin does not get the VIEWACTIVITY privilege? That sounds like a bug if confirmed.

@itsbilal itsbilal force-pushed the api-new-tree branch 2 times, most recently from cb3dbc3 to 64d3ef3 Compare January 27, 2021 23:00
Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! Added a telemetry counter for list-sessions too. Wondering if I should add a more general solution to that, but I can also see it being unhelpful to have counters for every single API request. Or maybe I'm wrong and we probably should be adding them to every route definition and calling them in an outer http.Handler. Happy to hear some opinions.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/server/api.go, line 34 at r5 (raw file):

Previously, knz (kena) wrote…

I'm not keen on naked panics.
I appreciate that you want this to fail in tests if there's an error; however in production at worst we'll want to log an error and just fail the request with an error.

Maybe this is an opportunity to introduce some kind of assert package: a function with two variants with build tags. I think test code is built with a special tag already. With the tag present this can panic, and without it trhere would just be an error logged.

You can also move the marshal above w.WriteHeader above; so that if it fails the API fails properly with HTTP 500.

Done. Woops. I meant to remove this at some time but forgot. Thanks!


pkg/server/pagination.go, line 107 at r2 (raw file):

Previously, knz (kena) wrote…

Thanks for explaining. Worth copy-pasting this in an explanatory comment too then.

Done.


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, knz (kena) wrote…

Wait what? A non-root user with role admin does not get the VIEWACTIVITY privilege? That sounds like a bug if confirmed.

Yep, that's the case. You can see the TestListSessionsV2 test actively reconstructs that situation.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about to go. Just a nit about the telemetry thing, and then you'll be done.

Reviewed 9 of 9 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/server/api.go, line 30 at r6 (raw file):

// Telemetry counters for API endpoints.
var listSessionsCounter = telemetry.GetCounterOnce("api.v2.list-sessions")

that won't do - we really want the counters to be auto-generated, so that we are guarnateed telemetry even if the next engineers doesn't think about it.

I invite you to review getServerEndpointCounter() in server.go and how it's used, and do something similar.


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Yep, that's the case. You can see the TestListSessionsV2 test actively reconstructs that situation.

That is ... a problem. Let's not block this PR on this, but please add a comment to explain this finding, and TODO that this may need to be simplified if we find that this distinction does not need to exist.

Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! Cleaned up the GetRootAuthenticatedHTTPClient stuff after I made the admin/root user propagation fix. Also did the automatic telemetry count decoration. PTAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/server/api.go, line 30 at r6 (raw file):

Previously, knz (kena) wrote…

that won't do - we really want the counters to be auto-generated, so that we are guarnateed telemetry even if the next engineers doesn't think about it.

I invite you to review getServerEndpointCounter() in server.go and how it's used, and do something similar.

Done. Let me know how you find it now.


pkg/server/testserver.go, line 879 at r1 (raw file):

Previously, knz (kena) wrote…

That is ... a problem. Let's not block this PR on this, but please add a comment to explain this finding, and TODO that this may need to be simplified if we find that this distinction does not need to exist.

Ended up investigating it - it was bad context conversion for gRPC fan-out calls. Latest revision fixes it. Thanks for highlighting!

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo a final tweak

Reviewed 6 of 6 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/server/api.go, line 116 at r7 (raw file):

		var handler http.Handler
		handler = &callCountDecorator{
			counter: telemetry.GetCounterOnce(fmt.Sprintf("api.v2.%s", route.name)),

Can you simply use the url as route name for the counter. This removes an argument, and also makes it easier for PMs to interpret the data.

Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz)


pkg/server/api.go, line 116 at r7 (raw file):

Previously, knz (kena) wrote…

Can you simply use the url as route name for the counter. This removes an argument, and also makes it easier for PMs to interpret the data.

Done. Might need to change this when we add keyword URLs that go like /sessions/{session_id}/ but for now this works.

@itsbilal itsbilal force-pushed the api-new-tree branch 6 times, most recently from a984a33 to 1149238 Compare February 1, 2021 18:00
…point

This change adds the skeleton for a new API tree that lives in
`/api/v2/` in the http listener, and currently reimplements the `/sessions/`
endpoint that is also implemented in `/_status/`. The new v2 API tree avoids
the need to use GRPC Gateway, as well as cookie-based authentication which is
less intuitive and idiomatic for REST APIs. Instead, for authentication,
it uses a new session header that needs to be set on every request.

As many RPC fan-out APIs use statusServer.iterateNodes, this change
implements a pagination-aware method, paginatedIterateNodes, that
works on a sorted set of node IDs and arranges results in such a way
to be able to return the next `limit` results of an arbitary slice
after the `next` cursor passed in. An example of how this works in practice
is the new `/api/v2/sessions/` endpoint.

A dependency on gorilla/mux is added to be able to pattern-match
arguments in the URL. This was already an indirect dependency; now it's
a direct dependency of cockroach.

TODO that are likely to fall over into future PRs:
 - API Documentation, need to explore using swagger here.
 - Porting over remaining /_admin/ and /_status/ APIs, incl. SQL based ones

Part of cockroachdb#55947.

Release note (api change): Adds a new API tree, in /api/v2/*, currently
undocumented, that avoids the use of and cookie-based
authentication in favour of sessions in headers, and support for pagination.
@itsbilal
Copy link
Contributor Author

itsbilal commented Feb 1, 2021

TFTR! Used the last couple pushes to fix lint issues.

bors r+

@craig craig bot merged commit f37712a into cockroachdb:master Feb 1, 2021
@craig
Copy link
Contributor

craig bot commented Feb 1, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants