Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/ui: Make tracez v2 page node-aware #90244

Merged
merged 1 commit into from
Nov 17, 2022
Merged

Conversation

benbardin
Copy link
Collaborator

@benbardin benbardin commented Oct 19, 2022

Release note: None

Demo video below. Note that it starts on the v1 trace page, and I navigate to the new one by typing in the address bar.

Screen.Recording.2022-10-19.at.11.23.02.AM.mov

Part of #83679

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@benbardin benbardin force-pushed the tracez2 branch 2 times, most recently from cce83ed to 004f444 Compare October 19, 2022 17:16
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

Reviewed 1 of 9 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @benbardin, @dhartunian, and @sjbarag)


-- commits line 4 at r1:
Please put more in this commit message.
Please explain how the interaction with the cookie works. The cookie in question is the one used for routing, right? I'm surprised our page needs understand anything about it.

Am I right to assume that if there's no node ID in the URL and no cookie, I get the data from the node serving the page? I think that'd be a good default. We'll probably need to find a way to resolve the node ID of the local node. If that's too much work, and also in cases where the local node does not yet have a node id, I think showing it as "local" in the nodes dropdown would be fine.
Another thing I want to make sure is that the page works against the current node even if the current node can't talk to anybody else and even if the /nodes endpoint returns an error. Reliably getting data from the local node is important for a lower-level tool like this page.

Related to all this, I also want to make sure that we're not making it hard for this page to work for tenant sql servers. I don't know if it works currently, but it'll need to work soon. Even if we don't figure out how to list all of the tenant's pods, I want at least it shouldn't get harder to make getting data from the server serving the page.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 88 at r1 (raw file):

  const nodeIDStr = getMatchParamByName(match, "nodeID");

  // Load iniital data.

iniitial


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 97 at r1 (raw file):

      return;
    }
    refreshSnapshots(parseInt(nodeIDStr));

if we can lift the parseInts out of these functions, I think that'd be good.


pkg/ui/workspaces/db-console/src/app.spec.tsx line 502 at r1 (raw file):

  describe("'/debug/tracez_v2/snapshot/:id' path", () => {
    test("routes to <ScheduleDetails> component", () => {

is this "ScheduleDetails" right?

Copy link
Collaborator Author

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, @dhartunian, and @sjbarag)


-- commits line 4 at r1:

Previously, andreimatei (Andrei Matei) wrote…

Please put more in this commit message.
Please explain how the interaction with the cookie works. The cookie in question is the one used for routing, right? I'm surprised our page needs understand anything about it.

Am I right to assume that if there's no node ID in the URL and no cookie, I get the data from the node serving the page? I think that'd be a good default. We'll probably need to find a way to resolve the node ID of the local node. If that's too much work, and also in cases where the local node does not yet have a node id, I think showing it as "local" in the nodes dropdown would be fine.
Another thing I want to make sure is that the page works against the current node even if the current node can't talk to anybody else and even if the /nodes endpoint returns an error. Reliably getting data from the local node is important for a lower-level tool like this page.

Related to all this, I also want to make sure that we're not making it hard for this page to work for tenant sql servers. I don't know if it works currently, but it'll need to work soon. Even if we don't figure out how to list all of the tenant's pods, I want at least it shouldn't get harder to make getting data from the server serving the page.

Discussed offline - we agreed to ignore the cookie entirely, and use the local node if none is supplied. We'll attempt to look up the local node's ID, but won't block on that in case it isn't available for some reason.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 88 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

iniitial

Done.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 97 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

if we can lift the parseInts out of these functions, I think that'd be good.

Done.


pkg/ui/workspaces/db-console/src/app.spec.tsx line 502 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

is this "ScheduleDetails" right?

nope! fixed :)

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @benbardin, @dhartunian, and @sjbarag)


pkg/ui/workspaces/cluster-ui/src/api/statusApi.ts line 13 at r2 (raw file):

import { cockroach } from "@cockroachlabs/crdb-protobuf-client";
import { fetchData } from "src/api";
import NodeRequestMessage = cockroach.server.serverpb.NodeRequest;

is suffixing such imports with Message a usual thing to do?


pkg/ui/workspaces/cluster-ui/src/api/statusApi.ts line 19 at r2 (raw file):

export function getNodeUI(
  nodeID: string, // String to support `local`

we always pass "local" for this param. Let's get rid of it. Also see below.


pkg/ui/workspaces/cluster-ui/src/api/statusApi.ts line 26 at r2 (raw file):

  return fetchData(
    NodeResponseMessage,
    `${STATUS_PREFIX}/nodes_ui/${nodeID}`,

Instead of this nodes_ui endpoint, can we use this thing? Seems much lighter weight; I believe it doesn't involve any RPCs (i.e. the ID of the node that served the page is embedded into the page source somehow).

In fact, I think the way to go is to reuse the NodeIDSelector component that we already have (and NodeIDSelectorConnected also?). We don't want the selector setting any cookies, like it does on the Advanced Debug page, but I think that part is already factored out.
The concerns about the page working when the list of nodes cannot be retrieved still apply - hopefully that component doesn't behave badly or we can make it not behave badly.
FWIW, I've asked @dhartunian about reusing that component, and he seems to support it. Let's get him on the review too.


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 43 at r2 (raw file):

    // "local," it still queries gossip while doing it. We'd like to avoid a hard dependency
    // on that to support malfunctioning clusters or nodes.
    return path;

I'm curious to see what this gossip reference is about. I've very superficially traced what I think is the relevant code to here, and it looks to me like, gossip or no gossip, a "local" value for the nodeID turns into a true return from that function, which I'd hope makes everything just work.


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 45 at r2 (raw file):

    return path;
  }
  return path + `?remote_node_id=${nodeID}`;

Clever use of ?remote_node_id.


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 53 at r2 (raw file):

  return fetchData(
    cockroach.server.serverpb.ListTracingSnapshotsResponse,
    proxyNonLocalNode(`${API_PREFIX}/trace_snapshots`, nodeID),

instead of munging all the URLs yourself, I'd vote we push the ability to add the magic arg inside fetchData.

Copy link
Collaborator Author

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, @dhartunian, and @sjbarag)


pkg/ui/workspaces/cluster-ui/src/api/statusApi.ts line 13 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

is suffixing such imports with Message a usual thing to do?

no, turns out! We do it in tracezApi, perhaps it's more common in db-console. But I'll remove that there too.


pkg/ui/workspaces/cluster-ui/src/api/statusApi.ts line 19 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

we always pass "local" for this param. Let's get rid of it. Also see below.

removing the file!


pkg/ui/workspaces/cluster-ui/src/api/statusApi.ts line 26 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Instead of this nodes_ui endpoint, can we use this thing? Seems much lighter weight; I believe it doesn't involve any RPCs (i.e. the ID of the node that served the page is embedded into the page source somehow).

In fact, I think the way to go is to reuse the NodeIDSelector component that we already have (and NodeIDSelectorConnected also?). We don't want the selector setting any cookies, like it does on the Advanced Debug page, but I think that part is already factored out.
The concerns about the page working when the list of nodes cannot be retrieved still apply - hopefully that component doesn't behave badly or we can make it not behave badly.
FWIW, I've asked @dhartunian about reusing that component, and he seems to support it. Let's get him on the review too.

Oh, dataFromServer is perfect here, thanks!

Using NodeIDSelector doesn't sit quite right with me - it's in db-console, not cluster-ui, so I'd have to move it; the current dropdown using the appropriate design language off the bat; and I'm not sure we'll actually be using this very often (maybe if we expand /debug functionality a ton it'd make sense? but we're not there yet.) Curious to hear what David thinks, I'll ping him!


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 43 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm curious to see what this gossip reference is about. I've very superficially traced what I think is the relevant code to here, and it looks to me like, gossip or no gossip, a "local" value for the nodeID turns into a true return from that function, which I'd hope makes everything just work.

Yup, you're right! I saw Gossip.Get and assumed it did something RPC-like, but it's just wrapping a value with proto.


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 45 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Clever use of ?remote_node_id.

Thank you!


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 53 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

instead of munging all the URLs yourself, I'd vote we push the ability to add the magic arg inside fetchData.

Yeah, I see what you mean, but I'm not sure I see how to do that smoothly. The challenge I'm not sure how to handle, is that fetchData() already takes a bunch of arguments:

export const fetchData = <P extends ProtoBuilder<P>, T extends ProtoBuilder<T>>(
  RespBuilder: T,
  path: string,
  ReqBuilder?: P,
  reqPayload?: FirstConstructorParameter<P>,
  timeout?: string,
): Promise<InstanceType<T>>

This leads to some silly-looking calls like

src/api/statementsApi.ts
40:  return fetchData(
41-    cockroach.server.serverpb.StatementsResponse,
42-    `${STATEMENTS_PATH}?${queryStr}`,
43-    null,
44-    null,
45-    "30M",

By the numbers, fetchData() has 20 uses. Of those 20, 13 have no query params. Of the 7 that do, 4 are in this new tracezApi, and 3 are elsewhere (in statementsApi.ts and jobsApi.ts).

The usability hit from requiring a third null for fetchData() calls 65% of the time, seems more important to me than the usability gain from supporting query params 35% of the time. But reasonable people can disagree, or perhaps you just see something I've missed? What do you think?

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @benbardin, @dhartunian, and @sjbarag)


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 43 at r2 (raw file):

Previously, benbardin (Ben Bardin) wrote…

Yup, you're right! I saw Gossip.Get and assumed it did something RPC-like, but it's just wrapping a value with proto.

then this comment needs adjustment, right?


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 53 at r2 (raw file):

Previously, benbardin (Ben Bardin) wrote…

Yeah, I see what you mean, but I'm not sure I see how to do that smoothly. The challenge I'm not sure how to handle, is that fetchData() already takes a bunch of arguments:

export const fetchData = <P extends ProtoBuilder<P>, T extends ProtoBuilder<T>>(
  RespBuilder: T,
  path: string,
  ReqBuilder?: P,
  reqPayload?: FirstConstructorParameter<P>,
  timeout?: string,
): Promise<InstanceType<T>>

This leads to some silly-looking calls like

src/api/statementsApi.ts
40:  return fetchData(
41-    cockroach.server.serverpb.StatementsResponse,
42-    `${STATEMENTS_PATH}?${queryStr}`,
43-    null,
44-    null,
45-    "30M",

By the numbers, fetchData() has 20 uses. Of those 20, 13 have no query params. Of the 7 that do, 4 are in this new tracezApi, and 3 are elsewhere (in statementsApi.ts and jobsApi.ts).

The usability hit from requiring a third null for fetchData() calls 65% of the time, seems more important to me than the usability gain from supporting query params 35% of the time. But reasonable people can disagree, or perhaps you just see something I've missed? What do you think?

I'd introduce a fetchDataEx() that groups all the optional args in a struct.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 87 at r3 (raw file):

  const columnTitle = match.params.columnTitle || "";

  const snapshotIDStr = getMatchParamByName(match, "snapshotID");

do we need this variable? If possible, I think it'd be nicer if we dealt only with snapshotID in int form.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 118 at r3 (raw file):

  // If no node was provided, navigate to the default if provided, or the first.
  useEffect(() => {

any change we can combine this and the next effect into one?


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 126 at r3 (raw file):

      targetNodeID = "local";
    }
    console.log(targetNodeID);

get rid of this

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, @benbardin, and @sjbarag)


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 139 at r3 (raw file):

    history.location.pathname =
      "/debug/tracez_v2/node/" + nodeID + "/snapshot/" + lastSnapshotID;
    history.replace(history.location);

can we merge the two useEffect calls that modify the pathname? I'd be worried about a double refresh or either of them clobbering each others work.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 177 at r3 (raw file):

      // Load the new snapshot.
      history.location.pathname =
        "/debug/tracez_v2/node/" +

can the base path you're using everywhere be derived from the router props? then maybe the page is resilient to base URL changes.

Copy link
Collaborator Author

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @andreimatei, @dhartunian, and @sjbarag)


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 43 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

then this comment needs adjustment, right?

Yes, and better, the whole thing can go away!


pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts line 53 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'd introduce a fetchDataEx() that groups all the optional args in a struct.

Ah, gotcha. With the updated proxying arg I think that's probably not worth it right now, but if I keep running into it I can make that upgrade.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 87 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

do we need this variable? If possible, I think it'd be nicer if we dealt only with snapshotID in int form.

Yup, good idea!


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 118 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

any change we can combine this and the next effect into one?

Done.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 126 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

get rid of this

Done.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 139 at r3 (raw file):

Previously, dhartunian (David Hartunian) wrote…

can we merge the two useEffect calls that modify the pathname? I'd be worried about a double refresh or either of them clobbering each others work.

Done.


pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx line 177 at r3 (raw file):

Previously, dhartunian (David Hartunian) wrote…

can the base path you're using everywhere be derived from the router props? then maybe the page is resilient to base URL changes.

discussed offline - yes, but we'd have to move the route structure into this component. We don't use that pattern elsewhere (though we probably should!) so considered out of scope for now.

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @andreimatei, and @sjbarag)

@benbardin benbardin force-pushed the tracez2 branch 3 times, most recently from beb2625 to f5de352 Compare November 14, 2022 21:58
Adds an easy toggle for selecting across nodes without reproxying every time.
If no node is provided, will redirect with the local node by default, and
attempt to look up its ID.

Release note: None
@benbardin
Copy link
Collaborator Author

bors r+

@benbardin
Copy link
Collaborator Author

bors cancel

@benbardin
Copy link
Collaborator Author

bors r-

@rail
Copy link
Member

rail commented Nov 16, 2022

bors ping

@craig
Copy link
Contributor

craig bot commented Nov 16, 2022

pong

@rail
Copy link
Member

rail commented Nov 16, 2022

bors r=andreimatei,dhartunian

@craig
Copy link
Contributor

craig bot commented Nov 16, 2022

Build failed:

@benbardin
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 16, 2022

Build failed:

@benbardin
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 17, 2022

Build succeeded:

@craig craig bot merged commit 0540026 into cockroachdb:master Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants