Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fall back to raw SQL for statement diagnostics bundle, and add errors.txt #94440

Merged
merged 2 commits into from
Jan 4, 2023

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented Dec 29, 2022

sql: fall back to raw SQL for statement diagnostics bundle

When gathering the statement for a statement diagnostics bundle, fall
back to using the raw SQL when we cannot pretty-print the statement AST,
which happens for statements that fail with certain errors.

(This also fixes a crash in addTrace, which was not handling these
error cases. It looks like this crash was partially fixed in #56768 but
not completely.)

Fixes: #94416 #56705

Epic: None

Release note (bug fix): Fix a crash that occurs on the gateway node when
collecting a statement diagnostics bundle (e.g. EXPLAIN ANALYZE (DEBUG)) on a statement that fails with certain errors. This crash has
existed in various forms since the introduction of statement bundles in
v20.1.0.

sql: add errors.txt to statement diagnostics bundles

Add an errors.txt file to statement diagnostics bundles containing the
printed messages of various error objects.

Informs: #94416

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@michae2 michae2 changed the title 94416 sql: fall back to raw SQL for statement diagnostics bundle, and add errors.txt Dec 29, 2022
@michae2
Copy link
Collaborator Author

michae2 commented Dec 29, 2022

Now for that bad AOST query we get this statement.sql (notice the EXPLAIN ANALYZE (DEBUG) hasn't been stripped off):

EXPLAIN ANALYZE (DEBUG) SELECT * FROM t AS OF SYSTEM TIME '-0.0000001s'

and this errors.txt:

res error:
<nil>

payload error:
AS OF SYSTEM TIME: interval value '-0.0000001s' too small, absolute value must be >= 1µs

ret error:
<nil>

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this! :lgtm:

Reviewed 2 of 2 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)


-- commits line 17 at r1:
Can you make it explicit that this would crash the gateway node? Also, do we know how far back this was present?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :lgtm:

Reviewed 2 of 2 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/explain_bundle.go line 215 at r1 (raw file):

}

// buildStatement saves the pretty-printed statement, without arguments.

nit: /s/buildStatement/buildPrettyStatement/.


pkg/sql/explain_bundle.go line 234 at r1 (raw file):

}

// addStatement adds the pretty-printed statement as file statement.txt.

nit: maybe mention that the pretty-printed statement is stored in b.stmt.


pkg/sql/explain_bundle.go line 134 at r2 (raw file):

	trace tracingpb.Recording,
	placeholders *tree.PlaceholderInfo,
	resErr, payloadErr, retErr error,

nit: let's rename resErr to queryErr (it's an error encountered during the query optimization / execution) and retErr to commErr (since it should be about an error communicating with the client).


pkg/sql/explain_bundle.go line 450 at r2 (raw file):

		resErr, payloadErr, retErr,
	)
	b.z.AddFile("errors.txt", output)

It'd be good to add a simple unit test for this.

When gathering the statement for a statement diagnostics bundle, fall
back to using the raw SQL when we cannot pretty-print the statement AST,
which happens for statements that fail with certain errors.

(This also fixes a crash in addTrace, which was not handling these
error cases. It looks like this crash was partially fixed in cockroachdb#56768 but
not completely.)

Fixes: cockroachdb#94416 cockroachdb#56705

Epic: None

Release note (bug fix): Fix a crash that occurs on the gateway node when
collecting a statement diagnostics bundle (e.g. `EXPLAIN ANALYZE
(DEBUG)`) on a statement that fails with certain errors. This crash has
existed in various forms since the introduction of statement bundles in
v20.1.0.
Add an errors.txt file to statement diagnostics bundles containing the
printed messages of various error objects.

Informs: cockroachdb#94416

Release note: None
Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @yuzefovich)


-- commits line 17 at r1:

Previously, mgartner (Marcus Gartner) wrote…

Can you make it explicit that this would crash the gateway node? Also, do we know how far back this was present?

Done. Looks like this existed back in v20.1.0 (when statement bundles were first introduced), and then was partially fixed in #56768 but remained an issue until now.


pkg/sql/explain_bundle.go line 215 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: /s/buildStatement/buildPrettyStatement/.

Done.


pkg/sql/explain_bundle.go line 234 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: maybe mention that the pretty-printed statement is stored in b.stmt.

Done.


pkg/sql/explain_bundle.go line 134 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: let's rename resErr to queryErr (it's an error encountered during the query optimization / execution) and retErr to commErr (since it should be about an error communicating with the client).

Ah, thank you! Done.


pkg/sql/explain_bundle.go line 450 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It'd be good to add a simple unit test for this.

Now checked in TestExplainAnalyzeDebug/error.

@michae2
Copy link
Collaborator Author

michae2 commented Jan 3, 2023

TFYRs!

bors r=mgartner,yuzefovich

@craig
Copy link
Contributor

craig bot commented Jan 4, 2023

Build succeeded:

@craig craig bot merged commit a4c71fd into cockroachdb:master Jan 4, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jan 4, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 4f66dcc to blathers/backport-release-22.1-94440: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


error creating merge commit from 4f66dcc to blathers/backport-release-22.2-94440: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash: Node crashes when using EXPLAIN ANALYZE (DEBUG) AS OF SYSTEM TIME with a very small time interval
4 participants