Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtins: add generator builtin for span payloads #60784

Merged
merged 1 commit into from
Feb 22, 2021

Conversation

angelapwen
Copy link

@angelapwen angelapwen commented Feb 19, 2021

Following up from #60616. Part of addressing #55733.

Previously we introduced a crdb_internal.payloads_for_span
builtin that returned a JSONB array representing all payloads for
a particular span. To improve the user experience of the builtin
as well as prevent OOMs resulting from builtin usage, we replace it
with a generator builtin that returns a table representing all payloads
for a span, where each row represents a single payload.

The new builtin, also called crdb_internal.payloads_for_span, has
columns for the payload_type so that the user has the ability to
easily filter on the type, and payload_jsonb so the user can use
jsonb builtins to filter further.

Release note (sql change): Update crdb_internal.payloads_for_span
builtin to return a table instead of a JSONB array. Each row of the
table represents one payload for the given span. It has columns for
payload_type and payload_jsonb.

@angelapwen angelapwen added the do-not-merge bors won't merge a PR with this label. label Feb 19, 2021
@angelapwen angelapwen self-assigned this Feb 19, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@angelapwen angelapwen force-pushed the payload-generator-builtin branch 2 times, most recently from 01fc0ba to 953ebe7 Compare February 19, 2021 10:04
@angelapwen
Copy link
Author


pkg/sql/logictest/testdata/logic_test/contention_event, line 68 at r2 (raw file):

# TODO(angelapwen) This should have 2 columns, payload_type and full_payload.
query T colnames
WITH spans AS (

This PR is not ready yet because I am trying to figure out why the builtin (named crdb_internal.payloads_for_span_as_table for now actually seems to return one column where each row is a tuple. It should return two columns, payload_type and full_payload so that we are able to filter on them.

I seem to be adhering to the way to return a row in the generator overload correctly, and in the functions.md doc everything seems to be labeled appropriately (see crdb_internal.check_consistency as an example of another generator builtin that returns several columns.

After I can get the builtin to return the columns, I will modify this logic test to look more like the one above (filtering on the payload_type column). This is the current output of the logic test so that reviewers can see the status of the builtin.

@angelapwen angelapwen force-pushed the payload-generator-builtin branch 2 times, most recently from e283b62 to e997837 Compare February 19, 2021 10:52
@knz
Copy link
Contributor

knz commented Feb 19, 2021

for now actually seems to return one column where each row is a tuple. It should return two columns

It does return two columns.

Try this: SELECT * FROM crdb_internal.payloads_for_span_as_table(123)

@angelapwen
Copy link
Author

angelapwen commented Feb 19, 2021

@knz when I do so I only get one column, which is called crdb_internal.payloads_for_span_as_table:

query T colnames
SELECT crdb_internal.payloads_for_span_as_table(123);
 ----
crdb_internal.payloads_for_span_as_table

If I run it with query TT I get an "expected two columns, but found one" error.

@knz
Copy link
Contributor

knz commented Feb 19, 2021

I wrote SELECT * FROM ..., not SELECT .... The two forms are different behavior.

@angelapwen
Copy link
Author

A ha I missed that! I see. So it is an issue with the way my query is written and not in the generator.

@knz
Copy link
Contributor

knz commented Feb 19, 2021

It's not really an "issue" - that's the regular, expected semantics of set-returning functions.

  • if you use a SRF as a table, then it returns a table
  • if you use a SRF as a scalar, then it returns a tuple

@angelapwen
Copy link
Author

Understood, I meant it was an 'issue' with the way I wrote the logic test and expected a table after using the generator as a scalar. I'm having trouble with the syntax then — how can I generate a table comprised of sub-tables that each represent the return value of the SRF as a table for a span ID?

Basically I want to write something like

  SELECT span_id
  FROM crdb_internal.node_inflight_trace_spans
  WHERE trace_id = crdb_internal.trace_id()
) SELECT *
  FROM crdb_internal.payloads_for_span_as_table(SELECT * FROM spans);

but I cannot pass in a table as the arguments to the SRF.

@knz
Copy link
Contributor

knz commented Feb 19, 2021

You need to use the LATERAL keyword as explained here https://www.postgresql.org/docs/13/queries-table-expressions.html

@angelapwen angelapwen force-pushed the payload-generator-builtin branch from e997837 to f47f53f Compare February 19, 2021 12:58
@angelapwen angelapwen changed the title [WIP] builtins: add generator builtin for span payloads builtins: add generator builtin for span payloads Feb 19, 2021
@angelapwen angelapwen removed the do-not-merge bors won't merge a PR with this label. label Feb 19, 2021
@angelapwen
Copy link
Author


pkg/sql/logictest/testdata/logic_test/builtin_function, line 2921 at r3 (raw file):

query TTT colnames
SELECT *
FROM crdb_internal.payloads_for_span(0)

I wanted to write a logic test for the schema of the built-in but I'm not sure I can simply pass in 0 as the span — it is possible that a span's ID is 0 right?

Copy link
Author

@angelapwen angelapwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @knz TIL about LATERAL!

The PR is now ready for review, and I added a couple of questions/comments of mine as well related to testing.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/sql/logictest/testdata/logic_test/contention_event, line 62 at r3 (raw file):

) SELECT count(*) > 0
  FROM payloads
  WHERE payload_type = '"type.googleapis.com/cockroach.roachpb.ContentionEvent"';

I fetch the value of the type by using FetchValKey() in JSON which replicates the -> operator in the JSONB builtin. There is no replication for ->> which prints the value in string format (without the double quotes). Is this an alright user experience? I'm not sure a user would know to filter on a double-quoted in the string unless they SELECT * from the table first.

I can strip the double quotes in a hacky way, unless there is a better way.

@angelapwen angelapwen force-pushed the payload-generator-builtin branch from f47f53f to d22ede5 Compare February 19, 2021 15:02
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r1, 3 of 5 files at r3, 1 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, @knz, and @tbg)


pkg/sql/logictest/testdata/logic_test/builtin_function, line 2921 at r3 (raw file):

Previously, angelapwen (Angela P Wen) wrote…

I wanted to write a logic test for the schema of the built-in but I'm not sure I can simply pass in 0 as the span — it is possible that a span's ID is 0 right?

Very unlikely, but yes. We could instead try using a negative number, which is definitely not possible.


pkg/sql/logictest/testdata/logic_test/contention_event, line 62 at r3 (raw file):

) SELECT count(*) > 0
  FROM payloads
  WHERE payload_type = '"type.googleapis.com/cockroach.roachpb.ContentionEvent"';

I wonder if we should strip out everything but roachpb.ContentionEvent. Seems like gunk that would be visible everywhere.


pkg/sql/sem/builtins/generator_builtins.go, line 344 at r3 (raw file):

			payloadsForSpanGeneratorType,
			makePayloadsForSpanGenerator,
			"Returns the payload(s) of the span whose ID is passed in the argument.",

Returns the payload(s) for the requested span.


pkg/sql/sem/builtins/generator_builtins.go, line 1396 at r3 (raw file):

// over all recordings for a given Span.
type payloadsForSpanGenerator struct {
	// The span to iterate over, set by the constructor.

No need to mention that it's set by the constructor.


pkg/sql/sem/builtins/generator_builtins.go, line 1399 at r3 (raw file):

	span *tracing.Span

	// recordingIndex maintains the current position of the index of the iterator

I understand that we're not iterating through all child recordings at the very beginning to try and be a bit more "stream-ey" about it, but does that actually help us with anything? We already have a handle on the tracing.Span, which already has all the payloads accessible to it in-memory. So it's already been allocated somewhere. What's the downside of simply slurping up everything the very first time and flattening it out in payloads?

(Asking more for my own understanding. The PR as written also obviously works, but I didn't realize there was a need to do anything fancy here.)


pkg/sql/sem/builtins/generator_builtins.go, line 1426 at r3 (raw file):

	spanID := uint64(*(args[0].(*tree.DInt)))
	span, found := ctx.Settings.Tracer.GetActiveSpanFromID(spanID)
	// A span may not be found if its ID was surfaced previously but its

Or it may just be a made up ID, we can skip saying anything here.

@angelapwen angelapwen force-pushed the payload-generator-builtin branch from d22ede5 to e727493 Compare February 19, 2021 15:49
@blathers-crl blathers-crl bot requested a review from irfansharif February 19, 2021 15:49
Copy link
Author

@angelapwen angelapwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/sql/logictest/testdata/logic_test/builtin_function, line 2921 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Very unlikely, but yes. We could instead try using a negative number, which is definitely not possible.

I'm worried about using a negative number because of this issue: #60668 — we are coercing a Tree.DInt (which is an int64) into a uint64, which is what the span ID is saved as internally, so I believe a negative number is also likely to be cast into some possible span ID.


pkg/sql/logictest/testdata/logic_test/contention_event, line 62 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I wonder if we should strip out everything but roachpb.ContentionEvent. Seems like gunk that would be visible everywhere.

I could write a function to strip it out but worried it would be a bit hacky. The other possible event type right now is "type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats" so I guess that could just be sql/distqlrun.ComponentStats if I stripped out "type.googleapis.com/cockroach"?


pkg/sql/sem/builtins/generator_builtins.go, line 344 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Returns the payload(s) for the requested span.

Done.


pkg/sql/sem/builtins/generator_builtins.go, line 1396 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

No need to mention that it's set by the constructor.

Done.


pkg/sql/sem/builtins/generator_builtins.go, line 1399 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I understand that we're not iterating through all child recordings at the very beginning to try and be a bit more "stream-ey" about it, but does that actually help us with anything? We already have a handle on the tracing.Span, which already has all the payloads accessible to it in-memory. So it's already been allocated somewhere. What's the downside of simply slurping up everything the very first time and flattening it out in payloads?

(Asking more for my own understanding. The PR as written also obviously works, but I didn't realize there was a need to do anything fancy here.)

The memory issue was something @knz had brought up in the non-generator builtin PR so I will defer to him to answer this. I hadn't realized that all the payloads were already accessible in-memory and thought they were accessed on the fly when the Structured() visitor is called.


pkg/sql/sem/builtins/generator_builtins.go, line 1426 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Or it may just be a made up ID, we can skip saying anything here.

Done.

@angelapwen angelapwen force-pushed the payload-generator-builtin branch from e727493 to ba06d6c Compare February 19, 2021 16:13
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a non-expert in sql code, I'll defer to you+Rafa.

@angelapwen
Copy link
Author


pkg/sql/logictest/testdata/logic_test/builtin_function, line 2921 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

You could also just say WHERE false which is less ambiguous about what you want.

Thank you! That makes sense. Done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

although, as discussed elsewhere - I do not see the payload_pretty column as useful nor necessary, and in fact it nearly doubles memory usage of the results.

Reviewed 1 of 5 files at r1, 4 of 6 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/sql/sem/builtins/generator_builtins.go, line 1399 at r3 (raw file):

We already have a handle on the tracing.Span, which already has all the payloads accessible to it in-memory. So it's already been allocated somewhere

It's allocated somewhere but it's not accounted and not restricted by memory budgets. That is in fact a problem and a bug that needs to be fixed.

What's the downside of simply slurping up everything the very first time and flattening it out in payloads?

  1. The JSON blobs are alloc-heavy and amplify the allocations from the span by a factor 2x-3x. This is not needed.

  2. The streaming approach ensures that all the individual bits go through the explicit memory budgeting limits.

FROM payload_types
WHERE payload_type = 'type.googleapis.com/cockroach.roachpb.ContentionEvent';
FROM payloads
WHERE payload_type = 'roachpb.ContentionEvent'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, how does it actually render the contention event?

Previously we introduced a `crdb_internal.payloads_for_span`
builtin that returned a JSONB array representing all payloads for
a particular span. To improve the user experience of the builtin
as well as prevent OOMs resulting from builtin usage, we replace it
with a generator builtin that returns a table representing all payloads
for a span, where each row represents a single payload.

The new builtin, also called `crdb_internal.payloads_for_span`, has
columns for the `payload_type` so that the user has the ability to
easily filter on the type, and `payload_jsonb` so the user can use
jsonb builtins to filter further.

Release note (sql change): Update `crdb_internal.payloads_for_span`
builtin to return a table instead of a JSONB array. Each row of the
table represents one payload for the given span. It has columns for
`payload_type` and `payload_jsonb`.
@angelapwen angelapwen force-pushed the payload-generator-builtin branch from 737a05c to 8210796 Compare February 22, 2021 10:10
Copy link
Author

@angelapwen angelapwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raphael convinced me that we don't need the pretty column as there is already a builtin called jsonb_pretty that the user could use on the payload_jsonb value if they wanted to view it that way ✨

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/sql/logictest/testdata/logic_test/contention_event, line 62 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I'm curious, how does it actually render the contention event?

JSONB output:

{"@type": "type.googleapis.com/cockroach.roachpb.ContentionEvent", "duration": "0.006409s", "key": "vYkSawABiA==", "txnMeta": {"epoch": 0, "id": "aed8bb08-cbb2-4c16-9edc-f46864079f0d", "key": "vYkSawABiA==", "minTimestamp": {"logical": 0, "synthetic": false, "wallTime": "1613988895236491000"}, "priority": 785132, "sequence": 1, "writeTimestamp": {"logical": 0, "synthetic": false, "wallTime": "1613988895236491000"}}}

@tbg
Copy link
Member

tbg commented Feb 22, 2021

BTW I was just trying to use this in #60754 (comment) and it's hard to use because it's payloads_per_span and not payloads_for_trace. Can we add payloads_for_trace as a view on payloads_for_span? (Doesn't have to be now).

@tbg
Copy link
Member

tbg commented Feb 22, 2021

Huh, I tried a manual attempt (the insert took ~4s because I was holding an intent open in another txn) and I'm not seeing any contention meta:

Time: 0ms total (execution 0ms / network 0ms)

[email protected]:26257/defaultdb  OPEN> insert into tx_serializable_sums(num) select sum(num)::int from tx_serializable_sums;INSERT 1

Time: 3.922s total (execution 3.922s / network 0.000s)

[email protected]:26257/defaultdb  OPEN> with spans as (select span_id from crdb_internal.node_inflight_trace_spans where trace_id = crdb_internal.trace_id()), payloads as (select * from spans, lateral crdb_internal.payloads_for_span(spans.span_id)) select * from payloads;
        span_id       | crdb_internal.payloads_for_span
----------------------+----------------------------------
  6479027071542099544 | []
  7132622407058633032 | []
  4426974785644592795 | []
   403294142450210244 | []
  4714758890355102862 | []
  3909666706374677468 | []
  1150940528138474845 | []
  7640069654626708429 | []
  7693281255713796796 | []
  8504838299517633469 | []
  5792428408619171572 | []
(11 rows)

Time: 2ms total (execution 2ms / network 0ms)

@tbg
Copy link
Member

tbg commented Feb 22, 2021

(btw this is on master, not this PR)

@angelapwen
Copy link
Author

BTW I was just trying to use this in #60754 (comment) and it's hard to use because it's payloads_per_span and not payloads_for_trace. Can we add payloads_for_trace as a view on payloads_for_span? (Doesn't have to be now).

Yes, that's the plan! Raphael mentioned that we could add this within the SQL layer as a delegate for syntactic sugar on the builtin_for_trace.

@knz
Copy link
Contributor

knz commented Feb 22, 2021

Huh, I tried a manual attempt (the insert took ~4s because I was holding an intent open in another txn) and I'm not seeing any contention meta:

This seems to be a problem unrelated to this PR. I would advise filing a separate issue.

@tbg
Copy link
Member

tbg commented Feb 22, 2021

Huh, I tried a manual attempt (the insert took ~4s because I was holding an intent open in another txn) and I'm not seeing any contention meta:

This seems to be a problem unrelated to this PR. I would advise filing a separate issue.

I'm currently trying to repro in a unit test. Will file an issue as appropriate, definitely not something for this PR.

@angelapwen
Copy link
Author

Huh, I tried a manual attempt (the insert took ~4s because I was holding an intent open in another txn) and I'm not seeing any contention meta:

This seems to be a problem unrelated to this PR. I would advise filing a separate issue.

I'm currently trying to repro in a unit test. Will file an issue as appropriate, definitely not something for this PR.

I'm also not sure why your 2nd column populates as the scalar version of the builtin 🤔 When I run the same query I get:

span_id | payload_type | payload_jsonb
----------+--------------+----------------

@angelapwen
Copy link
Author

bors r=irfansharif,knz

@angelapwen
Copy link
Author

@tbg was background tracing on when you tried it? I'm surprised you also didn't see the ComponentStats payloads that look like:

{"@type": "type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats", "component": {"flowId": "94aefff6-8e52-4fc1-a997-8f4eed1dfd3c", "id": 0, "nodeId": 1, "type": "FLOW"}, "exec": {"execTime": {"valuePlusOne": "0s"}, "maxAllocatedDisk": {"valuePlusOne": "0"}, "maxAllocatedMem": {"valuePlusOne": "0"}}, "flowStats": {"maxMemUsage": {"valuePlusOne": "1"}}, "inputs": [], "kv": {"bytesRead": {"valuePlusOne": "0"}, "contentionTime": {"valuePlusOne": "0s"}, "kvTime": {"valuePlusOne": "0s"}, "tuplesRead": {"valuePlusOne": "0"}}, "netRx": {"bytesReceived": {"valuePlusOne": "0"}, "deserializationTime": {"valuePlusOne": "0s"}, "latency": {"valuePlusOne": "0s"}, "messagesReceived": {"valuePlusOne": "0"}, "tuplesReceived": {"valuePlusOne": "0"}, "waitTime": {"valuePlusOne": "0s"}}, "netTx": {"bytesSent": {"valuePlusOne": "0"}, "messagesSent": {"valuePlusOne": "0"}, "tuplesSent": {"valuePlusOne": "0"}}, "output": {"numBatches": {"valuePlusOne": "0"}, "numTuples": {"valuePlusOne": "0"}}}

@angelapwen
Copy link
Author

Well, I guess it would have to be to get the span IDs..

@tbg
Copy link
Member

tbg commented Feb 22, 2021

Heh, I trip over this every time! I bet it's this one: #59315

@angelapwen
Copy link
Author

Heh, I trip over this every time! I bet it's this one: #59315

Sigh of relief 😌

@craig
Copy link
Contributor

craig bot commented Feb 22, 2021

Build failed:

@angelapwen
Copy link
Author

Unrelated test flakes.

bors r=irfansharif,knz

tbg added a commit to tbg/cockroach that referenced this pull request Feb 22, 2021
It's nice that it works, but the final query is a monster! I'm glad
there are [plans] to improve the UX.

[plans]: cockroachdb#60784 (comment)

Release note: None
@craig
Copy link
Contributor

craig bot commented Feb 22, 2021

Build succeeded:

@craig craig bot merged commit 7074004 into cockroachdb:master Feb 22, 2021
@angelapwen angelapwen deleted the payload-generator-builtin branch February 22, 2021 12:37
@angelapwen
Copy link
Author

Currently this builtin, payloads_for_span, has a name that implies that we can surface payloads for any span ID passed into the builtin. This builtin will actually at the moment only work appropriately for a root span ID. If a non-root span ID is entered, the builtin will return false (indicating failure). This is because the map of spans being introspected only holds root spans (see #61407).

We may want to consider changing the builtin name to payloads_for_root_span if we expect customer usage, or at least change some of the generated documentation here so the description of the builtin is accurate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants