-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util/pretty: mitigate stack overflows of Pretty #110374
util/pretty: mitigate stack overflows of Pretty #110374
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach taken here works if we choose to error out when the SQL is sufficiently complex.
But is that the right UX wrt user expectations?
We are looking here at a formatter. If the SQL is sufficiently complex, we still have an option to render the SQL -- using the simple formatter (the one that's not pretty).
What of avoiding an error
return in Pretty()
and instead have it call .AST.String()
if memory was exceeded?
Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/util/pretty/pretty.go
line 119 at r1 (raw file):
} const maxBeDepth = 100_000
arguably i could see us use 10.000 here (anything above 1000 looks suspicious to me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I agree with @knz that we should make an effort to try to render the SQL even if it is slightly ugly.
Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/cmd/reduce/reduce/reducesql/reducesql.go
line 443 at r1 (raw file):
p, err := cfg.Pretty(stmt) if err != nil { panic(err)
Maybe instead of panic we should revert to using stmt
?
pkg/internal/sqlsmith/sqlsmith.go
line 195 at r1 (raw file):
p, err := prettyCfg.Pretty(stmt) if err != nil { panic(err)
Same comment here.
pkg/sql/explain_bundle.go
line 266 at r1 (raw file):
b.stmt, err = cfg.Pretty(b.plan.stmt.AST) if err != nil { return err
If we hit this error, instead of giving up on saving the statement, I propose we copy this logic from the b.plan.stmt.AST == nil
case above so that the bundle still has something useful:
b.stmt = stmtRawSQL
// If we're collecting a redacted bundle, redact the raw SQL completely.
if b.flags.RedactValues && b.stmt != "" {
b.stmt = string(redact.RedactedMarker())
}
If you want to save the formatting error, could add something like "\n-- failed to format SQL: %v"
to the b.stmt.
pkg/util/pretty/pretty.go
line 119 at r1 (raw file):
} const maxBeDepth = 100_000
Personally I would go even smaller (maybe 10k) but this is also fine.
pkg/util/pretty/pretty.go
line 125 at r1 (raw file):
defer func() { b.beDepth-- }() if b.beDepth > maxBeDepth { return nil, errors.AssertionFailedf("max call stack depth of be exceeded")
Another idea, if @knz doesn't like this, is to panic instead of adding error handling to every stack frame.
lol, yes I am known to advertise for this type of error handling, which would be super justified here. |
9879564
to
9d51f6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the UX could improve, but I'd argue this is already a vast improvement. An internal error is strictly better than a crashed node.
IMO it should be up to the caller of Pretty to decide what to do if there is an error, but if you both feel strongly that it should just spit out a "non-pretty" string I can make that change.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @michae2)
pkg/util/pretty/pretty.go
line 119 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Personally I would go even smaller (maybe 10k) but this is also fine.
Heh, you two are on the same page today, huh? 😉 Done.
pkg/util/pretty/pretty.go
line 119 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
arguably i could see us use 10.000 here (anything above 1000 looks suspicious to me).
Done.
pkg/util/pretty/pretty.go
line 125 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Another idea, if @knz doesn't like this, is to panic instead of adding error handling to every stack frame.
A panic here won't be guaranteed to be caught by a recover
. I originally wanted to do this but from what I could tell the code path for the explain bundle would not recover from the panic, so it'd end up crashing the node.
Or are you suggesting to add |
yes, that. panic-based error handling should never cross package boundaries. |
9d51f6a
to
45fa1e1
Compare
45fa1e1
to
340f65d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that. panic-based error handling should never cross package boundaries.
👍 Done.
I added two more commits to avoid some import cycles. PTAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @michae2)
pkg/cmd/reduce/reduce/reducesql/reducesql.go
line 443 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Maybe instead of panic we should revert to using
stmt
?
Done.
pkg/internal/sqlsmith/sqlsmith.go
line 195 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Same comment here.
Done.
pkg/sql/explain_bundle.go
line 266 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
If we hit this error, instead of giving up on saving the statement, I propose we copy this logic from the
b.plan.stmt.AST == nil
case above so that the bundle still has something useful:b.stmt = stmtRawSQL // If we're collecting a redacted bundle, redact the raw SQL completely. if b.flags.RedactValues && b.stmt != "" { b.stmt = string(redact.RedactedMarker()) }If you want to save the formatting error, could add something like
"\n-- failed to format SQL: %v"
to the b.stmt.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 28 of 28 files at r3, 10 of 10 files at r4, 14 of 14 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
-- commits
line 18 at r4:
💯 nice
pkg/kv/kvpb/errors.go
line 1631 at r3 (raw file):
// StoreDescNotFoundError is reported when a stored descriptor is missing. type StoreDescNotFoundError struct {
For the sake of error encode/decode, I recommend you keep a single data type (maybe with a boolean var to distinguish node/store); then you will want to add this to the init function:
errorutilpath := reflect.TypeOf(errorutil.XXX_TBD{}).PkgPath()
errors.RegisterTypeMigration(errorutilpath, "*errorutil.descriptorNotFound", &DescNotFoundError)
where XXX_TBD
is a sentinel type remaining behind in errorutil
to enable retrieving the PkgPath via reflect.
pkg/sql/explain_bundle.go
line 261 at r5 (raw file):
b.stmt, err = cfg.Pretty(b.plan.stmt.AST) if err != nil { // Use the raw statement string if pretty-printing fails.
I'm not keen on having a blanket ignore of the object in err
. This feels like a candidate for errors.Is
. If there's any other error later in the Pretty interface that is more "Serious" we won't want that to get ignored.
pkg/sql/show_create_clauses.go
line 173 at r5 (raw file):
} query, err = cfg.Pretty(parsed.AST) if err != nil {
ditto my previous comment about errors.Is
pkg/sql/show_create_clauses.go
line 174 at r5 (raw file):
query, err = cfg.Pretty(parsed.AST) if err != nil { log.Warningf(ctx, "error pretty-printing query for view %s (%v): %+v",
What is this warning going to help with? What should an operator do if they see it?
pkg/sql/sem/builtins/builtins.go
line 11162 at r5 (raw file):
p, err := p.Pretty(stmts[idx].AST) if err != nil { return "", err
I think this is a case where we probably want an alternate representation if this one fails. This code is used when displaying queries in DB Console.
pkg/util/pretty/pretty.go
line 141 at r5 (raw file):
defer func() { b.beDepth-- }() if b.beDepth > maxBeDepth { panic(errors.AssertionFailedf("max call stack depth of be exceeded"))
possibly a custom error marker here to enable errors.Is
.
340f65d
to
964e2ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @michae2)
Previously, knz (Raphael 'kena' Poss) wrote…
💯 nice
👍
pkg/kv/kvpb/errors.go
line 1631 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
For the sake of error encode/decode, I recommend you keep a single data type (maybe with a boolean var to distinguish node/store); then you will want to add this to the init function:
errorutilpath := reflect.TypeOf(errorutil.XXX_TBD{}).PkgPath() errors.RegisterTypeMigration(errorutilpath, "*errorutil.descriptorNotFound", &DescNotFoundError)where
XXX_TBD
is a sentinel type remaining behind inerrorutil
to enable retrieving the PkgPath via reflect.
Done.
pkg/sql/explain_bundle.go
line 261 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I'm not keen on having a blanket ignore of the object in
err
. This feels like a candidate forerrors.Is
. If there's any other error later in the Pretty interface that is more "Serious" we won't want that to get ignored.
Great catch. Done.
pkg/sql/show_create_clauses.go
line 173 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
ditto my previous comment about
errors.Is
Done.
pkg/sql/show_create_clauses.go
line 174 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
What is this warning going to help with? What should an operator do if they see it?
It wasn't intended for an operator. I thought it might aid us in debugging in the future, but I'm not sure exactly how. I've removed it.
pkg/sql/sem/builtins/builtins.go
line 11162 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I think this is a case where we probably want an alternate representation if this one fails. This code is used when displaying queries in DB Console.
Makes sense, done.
pkg/util/pretty/pretty.go
line 141 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
possibly a custom error marker here to enable
errors.Is
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 31 files at r6, 9 of 10 files at r7, 15 of 15 files at r8, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/kv/kvpb/errors.go
line 1662 at r8 (raw file):
func (e *DescNotFoundError) SafeFormatError(p errors.Printer) (next error) { if e.isStore { p.Printf("store descriptor with store ID %d was not found", e.storeID)
nit: maybe
s := redact.SafeString("node")
id := uint64(e.NodeID)
if e.isStore {
s = "store"
id = uint64(e.StoreID)
}
p.Printf("%s descriptor with %s ID %d was not found", s, s, id)
(you could even collapse the two ID members into a single integer field)
pkg/sql/sem/builtins/builtins.go
line 11161 at r8 (raw file):
for idx := range stmts { p, err := p.Pretty(stmts[idx].AST) if err != nil {
errors.Is
964e2ec
to
5caf95a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @michae2)
pkg/kv/kvpb/errors.go
line 1662 at r8 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit: maybe
s := redact.SafeString("node") id := uint64(e.NodeID) if e.isStore { s = "store" id = uint64(e.StoreID) } p.Printf("%s descriptor with %s ID %d was not found", s, s, id)(you could even collapse the two ID members into a single integer field)
👍 Done.
pkg/sql/sem/builtins/builtins.go
line 11161 at r8 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
errors.Is
Done.
pkg/util/pretty/pretty.go
line 135 at r8 (raw file):
// maxBeDepth is the maximum allowed recursive call depth of be. If the depth // exceeds this value, be will panic. const maxBeDepth = 10_000
I raised this back to 100,000 because this test was failing to pretty-round-trip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo settling on the max size constant.
Reviewed 1 of 19 files at r11.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/util/pretty/pretty.go
line 135 at r8 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I raised this back to 100,000 because this test was failing to pretty-round-trip.
I would rather change the test than the constant here. These tests were not principled.
5caf95a
to
5fb5720
Compare
Previously, knz (Raphael 'kena' Poss) wrote…
The test looks like a regression test and it's not a terribly obscure case—it's an insert of ~400 rows. Shouldn't we be able to handle such a statement? |
I would expect so, but not with a recursion depth of 100000. What is the effective recursion depth used for it? |
The depth for that test grows to 19,361. |
141d2d7
to
dfd056c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 28 files at r3, 6 of 31 files at r6, 25 of 25 files at r9, 11 of 11 files at r10, 15 of 19 files at r11, 1 of 1 files at r12, 3 of 3 files at r13, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz and @mgartner)
pkg/util/pretty/pretty.go
line 135 at r8 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The test looks like a regression test and it's not a terribly obscure case—it's an insert of ~400 rows. Shouldn't we be able to handle such a statement?
🤷 I'm ok either way.
(Personally I think Go's default stack limit of 1 GiB is giant but I'm coming from a previous codebase where we could only use 20 KiB stacks, so I'm biased.)
ef5ae0d
to
209b8ef
Compare
Previously, michae2 (Michael Erickson) wrote…
I updated it to 50,000, which allows the test to pass with plenty of buffer and also catches the stack overflow. Let me know what y'all think. |
i'm ok with it. |
209b8ef
to
0190f24
Compare
Errors for missing store and node descriptors have been moved from the errorutil package to the kvpb package, because the errorutil package is a low-level package to aid in working with errors in general. It should not contain facilities for creating specific errors as this muddles the package and can lead to import cycles. Release note: None
The `SendReport` function has been moved out of the errorutil package and into a new sentryutil package. This avoids muddling the errorutil package with a Sentry-specific function, and it breaks errorutil's dependence on `pkg/settings` and `pkg/util/log/logcrash`. Release note: None
This commit reduces the chance of a stack overflow from recursive calls of `*beExec.be`. The `Pretty` function will now return an internal error if the recursive depth of `be` surpasses 10,000. Informs cockroachdb#91197 Release note: None
0190f24
to
e8d3de2
Compare
TFTRs! bors r+ |
Build failed: |
Flaked. bors r+ |
Build succeeded: |
`StoreNotFoundError` and `NodeNotFoundError` errors were moved to the `kvpb` pkg in cockroachdb#110374. As part of the move, `crdb_internal` functions which checked if the error were `DescNotFoundError` were also updated so that node/store not found errors would be recognized e.g. ``` errors.Is(kvpb.NewNodeNotFoundError(nodeID), &kvpb.DescNotFoundError{}) ``` This didn't work, because the error doesn't match the reference error variable being given. It does match the type. Update these error assertions to use `HasType` instead. Resolves: cockroachdb#111084 Epic: none Release note: None
…111232 110967: asim: enable random zone config event generation r=kvoli a=wenyihu6 Previously, zone config event generation used hardcoded span configurations. This limits our ability to test the allocator more thoroughly. To improve this, this patch enables random span configs to be generated and applied as part of the simulation. These configurations are generated by randomly selecting the primary region, region V.S. zone survival goal, and leaseholder preference. ``` The following command is now supported: "rand_events" [cycle_via_random_survival_goals] ``` Part of: #106192 Release Note: none Epic: none 111192: bulk: allow caller-configurable priority in SSTBatcher r=adityamaru a=stevendanna This adds the ability for some callers to use a higher admission priority in SSTBatcher. This is helpful for streaming where we want to run at a priority that isn't subject to the elastic admission regime. Epic: none Release note: None 111206: kv: fix (store|node) not found err checking r=erikgrinaker a=kvoli `StoreNotFoundError` and `NodeNotFoundError` errors were moved to the `kvpb` pkg in #110374. As part of the move, `crdb_internal` functions which checked if the error were `DescNotFoundError` were also updated so that node/store not found errors would be recognized e.g. ``` errors.Is(kvpb.NewNodeNotFoundError(nodeID), &kvpb.DescNotFoundError{}) ``` This didn't work, because the error doesn't match the reference error variable being given. It does match the type. Update these error assertions to use `HasType` instead. Resolves: #111084 Epic: none Release note: None 111214: release: fix roachtest artifacts name r=srosenberg a=rail This fixes the roachtest artifacts directory name. Epic: none Release note: None 111217: cloud/azure: Fix azure schemes r=benbardin a=benbardin Part of: https://cockroachlabs.atlassian.net/browse/CRDB-31120 Release note (bug fix): Fixes azure schemes in storage, kms and external conns. 111223: storage: populate timestamp field in lock table values r=nvanbenschoten a=nvanbenschoten Informs #109645. This commit starts writing the `Timestamp` field in lock table `MVCCMetadata` values for shared and exclusive locks. This mirrors the behavior of intent locks. This is not strictly needed, as the timestamp is always equal to `Txn.WriteTimestamp`, but it is cheap to do and helps unify some stats logic, which uses this field to compute "lock age". Maybe we'll get rid of this for all lock strengths one day... Release note: None 111230: authors: add xhesikam to authors r=xhesikam a=xhesikam Release note: None Epic: None 111231: backupccl: add missing ctx cancel check r=msbutler a=adityamaru In #111159 we deduced from the stacks a situation in which the goroutine draining `spanCh` had exited due to a context cancelation but the writer was not listening for a ctx cancelation. This manifests as a stuck restore when using the non-default make simple import spans implementation. Fixes: #111159 Release note: None 111232: kv: deflake TestLeaseExpirationBasedDrainTransferWithExtension r=nvanbenschoten a=nvanbenschoten Informs #110715, which will be fixed by a non-clean backport (see 42e45b4) of this commit. This commit deflakes `TestLeaseExpirationBasedDrainTransferWithExtension` by disabling the node suspect timer in leaseTransferTest tests. These tests manually control the clock and have a habit of inducing destabilizing clock jumps. In this case, if n2 looked at liveness immediately after one of these manual clock jumps, it would mark n1 as suspect and refuse to transfer it the lease for the 30 second `server.time_after_store_suspect`, which is longer than the 5 second `server.shutdown.lease_transfer_wait`. This would cause the test to fail. Before this patch, the test would fail under stress race in about 8 minutes. Since the patch, it hasn't failed in over 30 minutes. Release note: None Co-authored-by: wenyihu6 <[email protected]> Co-authored-by: Steven Danna <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Rail Aliiev <[email protected]> Co-authored-by: Ben Bardin <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: xhesikam <[email protected]> Co-authored-by: adityamaru <[email protected]>
errorutil: moved kv-specific errors out of errorutil
Errors for missing store and node descriptors have been moved from the
errorutil package to the kvpb package, because the errorutil package is
a low-level package to aid in working with errors in general. It should
not contain facilities for creating specific errors as this muddles the
package and can lead to import cycles.
Release note: None
errorutil: move SendReport to new sentryutil package
The
SendReport
function has been moved out of the errorutil packageand into a new sentryutil package. This avoids muddling the errorutil
package with a Sentry-specific function, and it breaks errorutil's
dependence on
pkg/settings
andpkg/util/log/logcrash
.Release note: None
util/pretty: mitigate stack overflows of Pretty
This commit reduces the chance of a stack overflow from recursive calls
of
*beExec.be
. ThePretty
function will now return an internal errorif the recursive depth of
be
surpasses 10,000.Informs #91197
Release note: None