
sql: fix internal executor when it encounters a retry error #101477

Merged: 5 commits merged into cockroachdb:master from the fix-rewind branch on Apr 19, 2023

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Apr 13, 2023

This PR contains several commits that fix a long-standing bug in the internal executor that could make it double count some things (rows or metadata) when an internal retry error occurs.

In particular, since at least 21.1 (when we introduced the "streaming" internal executor in #59330), if the IE encounters a retry error after it has communicated some results to the client, it proceeds to retry the corresponding command as if the incomplete execution and the retry error never happened. In other words, it was possible for some rows to be "counted" twice (either included multiple times directly in the result set, or indirectly in the "rows affected" number).

This PR fixes the problem by returning the retry error to the client in cases where some actual rows have already been communicated, and by resetting the number of "rows affected" when "rewinding the stmt buffer" so that the statement can be retried transparently.

Fixes: #98558.
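To make the failure mode concrete, here is a minimal, self-contained Go sketch of the problem (the names and types are made up for illustration and are not the actual internal executor code): a command is transparently re-executed after one of its rows has already reached the client, so that row ends up in the result twice.

```go
package main

import (
	"errors"
	"fmt"
)

// pushRows pretends to stream the rows of a statement to the client. On the
// first attempt it pushes one row and then hits a retryable error; on the
// second attempt it runs to completion.
func pushRows(attempt int, sink *[]int) error {
	*sink = append(*sink, 1)
	if attempt == 0 {
		return errors.New("restart transaction") // retryable error
	}
	*sink = append(*sink, 2)
	return nil
}

func main() {
	var rows []int
	for attempt := 0; ; attempt++ {
		if err := pushRows(attempt, &rows); err == nil {
			break
		}
		// Bug: the command is re-executed from the start, but the row that
		// was already pushed before the retry error is not discarded.
	}
	fmt.Println(rows) // [1 1 2] -- the first row is counted twice
}
```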

@yuzefovich yuzefovich added the backport-23.1.x (Flags PRs that need to be backported to 23.1) and backport-23.1.0 labels on Apr 13, 2023
@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the fix-rewind branch 7 times, most recently from 8f90a3d to f183f6d on April 14, 2023 01:49
@yuzefovich yuzefovich changed the title from "sql: fix internal executor when it retries" to "sql: fix internal executor when it encounters a retry error" on Apr 14, 2023
@yuzefovich yuzefovich marked this pull request as ready for review April 14, 2023 01:52
@yuzefovich yuzefovich requested a review from a team as a code owner April 14, 2023 01:52
@yuzefovich yuzefovich requested a review from a team April 14, 2023 01:52
@yuzefovich yuzefovich requested review from a team as code owners April 14, 2023 01:52
@yuzefovich yuzefovich requested review from miretskiy, michae2, rafiss, cucaroach and knz and removed request for a team, miretskiy and michae2 April 14, 2023 01:52
@yuzefovich
Member Author

@rafiss @knz I think the two of you have the most context around this code, so I'm asking for review from both of you; @cucaroach I think you might have looked into this stuff recently, so I'd welcome your review too. Extra scrutiny is much appreciated given the trickiness and ubiquity of the internal executor.

@yuzefovich
Member Author

Oh, I think @ZhouXing19 also has exposure to this code - I'd welcome your review too!

@knz
Contributor

knz commented Apr 14, 2023

Amazing work. I'll review this shortly.

Contributor

@knz knz left a comment


For the 2nd commit, could I interest you in a test (something like the one from the 3rd commit) which checks that the rows affected count is the expected value even when a retry has occurred?

I haven't fully reviewed the changes to sql/internal.go yet because I find them intimidating :)

Reviewed 4 of 4 files at r1, 7 of 7 files at r2, 3 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @rafiss, @yuzefovich, and @ZhouXing19)


-- commits line 60 at r3:
For columns I'm not sure -- a retry could potentially bring a txn over a schema change that alters the number of columns?


-- commits line 76 at r4:
What happens if Exec shares a kv.Txn with the caller? In that case, if there's a retry error, it's not correct to just retry inside the internal executor, because the caller may have performed some other SQL using the same kv.Txn prior to calling Exec, and all of that needs to be retried too.

I think we are only allowed to retry things inside internalexecutor if the caller specified a nil kv.Txn to start with. Where is this checked?

@yuzefovich yuzefovich force-pushed the fix-rewind branch 2 times, most recently from f67fec5 to b6b091c on April 19, 2023 00:18
Member Author

@yuzefovich yuzefovich left a comment


That's a fair point. I can't say that I fully grokked how all this machinery works when I implemented the "streaming" internal executor a couple of years ago. Now that I've spent a significant amount of time debugging this code, I have a decent mental model, so I just added another commit (now second) that attempts to outline how all the moving pieces fit together, and then with each subsequent commit I tried to update the corresponding comment accordingly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @knz, @rafiss, and @ZhouXing19)


-- commits line 60 at r3:

Previously, knz (Raphael 'kena' Poss) wrote…

Can we get some help from Faizan or someone else on the schema team to get a working test?

Do you think this is a blocking question? I'm not saying we shouldn't introduce a test like this, but it seems like an edge case that we can leave a TODO for.

Contributor

@knz knz left a comment


This new doc is very good. Thank you.

Reviewed 1 of 1 files at r13, 1 of 10 files at r14, 2 of 3 files at r15, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @rafiss, @yuzefovich, and @ZhouXing19)


-- commits line 60 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Do you think this is a blocking question? I'm not saying we shouldn't introduce a test like this, but it seems like an edge case that we can leave a TODO for.

No it is not blocking this PR, but we still need this test to happen.


pkg/sql/internal.go line 818 at r13 (raw file):

// It's also worth noting that execInternal doesn't return until the
// connExecutor reaches the execution engine (i.e. until after the query
// planning has been performed). This is needed in order to avoid concurrent

I don't understand why this fully prevents concurrent access to the kv.Txn. If more rows are returned to the caller via the iterator, aren't those rows the product of a currently-used kv.Txn? What if there's another call to IE "under" the iterator?


pkg/sql/internal.go line 820 at r13 (raw file):

// planning has been performed). This is needed in order to avoid concurrent
// access to the txn.
// TODO(yuzefovich): currently, this statement is not entirely true if the retry

I'm curious - what is not true about it?


pkg/sql/internal.go line 833 at r13 (raw file):

// internalClientComm implements the ClientLock interface in such a fashion as
// if any command can be transparently retried.
// TODO(yuzefovich): fix this.

Maybe worth filing issues and linking them from the comment.

This commit changes the `RestrictedCommandResult` interface a bit to
make `SetRowsAffected` just set the number of rows affected, rather than
increment it. This method is supposed to be called only once, so this
change seems reasonable on its own, but it also has an additional
benefit for the follow-up commit. In particular, in the follow-up commit
we will preserve the ability to automatically retry statements of
non-ROWS stmt type by the internal executor.

Note that this commit on its own fixes the behavior for the "rows
affected" statements (the only known occurrence of the bug that is
generally fixed in the following commit).

Release note: None
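As an illustration of why the "set" semantics matter, here is a hedged sketch (the types below are made up; only the `SetRowsAffected` name comes from the interface mentioned above) contrasting the old incrementing behavior with the new setting behavior when a command is re-executed after a retry:

```go
package main

import "fmt"

// incrementingResult models the old behavior: SetRowsAffected adds to the
// running count, so a re-executed command double counts.
type incrementingResult struct{ n int }

func (r *incrementingResult) SetRowsAffected(n int) { r.n += n }

// settingResult models the new behavior: SetRowsAffected overwrites the
// count, so re-execution after a retry is harmless.
type settingResult struct{ n int }

func (r *settingResult) SetRowsAffected(n int) { r.n = n }

func main() {
	inc, set := &incrementingResult{}, &settingResult{}
	// The statement affects 3 rows, hits a retry error after reporting that,
	// and is then transparently re-executed, reporting 3 rows again.
	for attempt := 0; attempt < 2; attempt++ {
		inc.SetRowsAffected(3)
		set.SetRowsAffected(3)
	}
	fmt.Println(inc.n, set.n) // 6 3 -- only the "set" variant stays correct
}
```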
This commit fixes a bug with the internal executor when it encounters an
internal retry. Previously, the implementation of the "rewind
capability" was such that we always assumed that we can rewind the
StmtBuf for the IE at any point; however, that is not generally true. In
particular, if we communicated anything to the client of the IE's
connExecutor (`rowsIterator` in this context), then we cannot rewind the
current command we're evaluating. In theory, we could have pushed some
rows through the iterator only to encounter the internal retry later,
which would then lead to re-executing the command from the start (in
other words, the rows that we pushed before the retry would be
double-pushed); in practice, however, we haven't actually seen this (at
least not yet). What we have seen is the number of rows affected being
double counted. That particular reproduction was already fixed in the
previous commit, but this commit fixes the problem more generally.

This commit makes it so that for every `streamingCommandResult` we are
tracking whether it has already communicated something to the client so
that the result can no longer be rewound, and then we use that tracking
mechanism to correctly implement the rewind capability. We have three
possible types of data that we communicate:
- rows
- number of rows affected
- column schema.

If the retry happens after some rows have been communicated, we're out
of luck - there is no way we can retry the stmt internally, so from now
on we will return a retry error to the client.

If the retry happens after "rows affected", then given the adjustment in
the previous commit we can proceed transparently.

In order to avoid propagating the retry error when it occurs after the
column schema has been received but before any rows have been pushed,
this commit adjusts the behavior to always keep the latest column
schema; thus, we can still proceed transparently in this case as well.

This bug has been present since at least 21.1 when
`streamingCommandResult` was introduced. However, since we now might
return a retry error in some cases, this could lead to test failures or
flakes, or even to errors in some internal CRDB operations that execute
statements of ROWS type (if there is no appropriate retry logic), so
I intend to only backport this to 23.1. There is also no release note
since the only failure we've seen is the double counted "rows affected"
number, the likelihood of which has significantly increased due to the
jobs system refactor (i.e. mostly 23.1 is affected AFAIK).

Additionally, this commit makes it so that we correctly block the
`execInternal` call until the first actual, non-metadata result is seen
(this behavior is needed to correctly synchronize access to the txn
before the stmt is given to the execution engine).

Release note: None
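A minimal sketch of the rewind-tracking idea described above, with hypothetical type and method names (not the actual `streamingCommandResult` or ClientLock implementation): the result records whether anything non-replayable has been communicated, the rewind is refused once rows have been sent, while column schema and rows affected remain safe to resend.

```go
package main

import (
	"errors"
	"fmt"
)

// streamingResult is a stand-in for a per-command result. It records whether
// anything that cannot be transparently replayed has reached the client.
type streamingResult struct {
	cannotRewind bool
	cols         []string
	rowsAffected int
}

// SetColumns: the column schema is safe to resend, so just keep the latest.
func (r *streamingResult) SetColumns(cols []string) { r.cols = cols }

// SetRowsAffected: safe under retries given "set" (not "increment") semantics.
func (r *streamingResult) SetRowsAffected(n int) { r.rowsAffected = n }

// AddRow: once an actual row reaches the client, we can no longer pretend
// the command never ran.
func (r *streamingResult) AddRow(row []interface{}) { r.cannotRewind = true }

// rewindAndRetry models the rewind decision: retry transparently only if
// nothing irreversible was communicated; otherwise surface the retry error.
func rewindAndRetry(r *streamingResult, retryErr error) error {
	if r.cannotRewind {
		return retryErr // the client has to handle the retry itself
	}
	return nil // transparent retry is allowed
}

func main() {
	r := &streamingResult{}
	r.SetColumns([]string{"k"})
	fmt.Println(rewindAndRetry(r, errors.New("restart"))) // <nil>: can rewind
	r.AddRow([]interface{}{1})
	fmt.Println(rewindAndRetry(r, errors.New("restart"))) // restart: cannot
}
```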
This commit teaches the internal executor to avoid propagating retry
errors to the client when executing `Exec{Ex}` methods. In those
methods, we only return the number of rows affected; thus, in order to
preserve the ability to transparently retry all statements (even if they
are of the ROWS statement type), we simply need to be able to reset the
rows affected number on the `rowsIterator` if a result is being
discarded (which indicates that a retry error has been encountered and
the rewind mechanism is at play).

This structure relies on the assumption that we push at most one command
into the StmtBuf that results in "rows affected", which is true at the
time of writing, and I don't foresee that changing, at least for now.
(This assumption would be incorrect only if we were to allow executing
multiple statements as part of a single method invocation on the
internal executor, but I don't think that's going to happen - we might
only do so in a few tests.)

Release note: None
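A small runnable sketch of the reset-on-discard behavior described above, using illustrative names rather than the real `rowsIterator` fields:

```go
package main

import "fmt"

// rowsIter is a stand-in for the iterator handed back to Exec{Ex} callers;
// it only accumulates the rows-affected count.
type rowsIter struct{ rowsAffected int }

func (it *rowsIter) addRowsAffected(n int) { it.rowsAffected += n }

// resultDiscarded is the hook invoked when the rewind mechanism throws away
// a partially produced result; resetting lets the re-execution start from
// zero.
func (it *rowsIter) resultDiscarded() { it.rowsAffected = 0 }

func main() {
	it := &rowsIter{}
	it.addRowsAffected(3) // first attempt reports 3 rows affected
	it.resultDiscarded()  // retry error encountered; the result is discarded
	it.addRowsAffected(3) // the re-execution reports 3 rows affected again
	fmt.Println(it.rowsAffected) // 3, not 6
}
```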
Member Author

@yuzefovich yuzefovich left a comment


Thanks for the reviews everyone!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @knz, @rafiss, and @ZhouXing19)


-- commits line 60 at r3:

Previously, knz (Raphael 'kena' Poss) wrote…

No it is not blocking this PR, but we still need this test to happen.

Left a TODO for now.


pkg/sql/internal.go line 818 at r13 (raw file):
It prevents concurrent access between the two goroutines at play - the rowsIterator goroutine and the "internal" connExecutor goroutine. If the "synchronous" ieResultChannel is used, only one goroutine will be active at any point in time. When more rows are returned to the caller, the caller was blocked on the ieResultChannel while those rows were being produced.

What if there's another call to IE "under" the iterator?

This question is spot on - we actually put this synchronization in place due to a bug with "nested" IE #62415. I expanded the comment.
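For illustration, here is a hedged sketch of the synchronous hand-off idea (made-up names, not the actual `ieResultChannel` API): unbuffered channels ensure the producing connExecutor goroutine and the consuming rowsIterator goroutine are never active at the same time, so they never touch the shared txn concurrently.

```go
package main

import "fmt"

// syncResultChannel hands results from a producer goroutine to a consumer
// goroutine using unbuffered channels, so at most one of them is ever
// runnable at a time.
type syncResultChannel struct {
	data   chan int      // producer blocks here until the consumer receives
	resume chan struct{} // producer then waits to be told to continue
}

func newSyncResultChannel() *syncResultChannel {
	return &syncResultChannel{data: make(chan int), resume: make(chan struct{})}
}

// addResult is called by the producer; it returns only once the consumer has
// taken the result and explicitly asked for more.
func (c *syncResultChannel) addResult(v int) {
	c.data <- v
	<-c.resume
}

// nextResult and unblock are called by the consumer.
func (c *syncResultChannel) nextResult() int { return <-c.data }
func (c *syncResultChannel) unblock()        { c.resume <- struct{}{} }

func main() {
	c := newSyncResultChannel()
	go func() { // plays the role of the "internal" connExecutor goroutine
		for i := 1; i <= 3; i++ {
			c.addResult(i) // paused while the consumer runs
		}
	}()
	for i := 0; i < 3; i++ { // plays the role of the rowsIterator goroutine
		fmt.Println(c.nextResult())
		c.unblock()
	}
}
```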


pkg/sql/internal.go line 820 at r13 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I'm curious - what is not true about it?

The problem was that if the retry occurs after the column schema was sent but before any rows were communicated, this call wouldn't block until the control flow is in the execution engine (when reevaluating the stmt after the retry). In other words, if we had a sequence like

  • send columns
  • encounter a retry
  • send columns (1)
  • go into execution engine (2)

the blocking was done only up until (1), but it should be up until (2). This is fixed in the fourth commit by looping until we see non-column data.
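Sketched below (with hypothetical types, not the real code) is the looping idea: keep consuming column-schema results, remembering only the latest one, and return to the caller only once the first non-column datum arrives.

```go
package main

import "fmt"

type ieResult struct {
	cols []string // set for column-schema metadata
	row  []int    // set for an actual row
	err  error
}

// waitForFirstNonColumnResult loops past any number of column-schema results
// (each one replacing the previous, so only the latest schema is kept) and
// returns on the first row, error, or channel closure.
func waitForFirstNonColumnResult(ch <-chan ieResult) (cols []string, first *ieResult) {
	for r := range ch {
		if r.cols != nil && r.row == nil && r.err == nil {
			cols = r.cols // a retry may resend the schema; keep the latest
			continue
		}
		rCopy := r
		return cols, &rCopy
	}
	return cols, nil
}

func main() {
	ch := make(chan ieResult, 4)
	ch <- ieResult{cols: []string{"k"}} // columns
	ch <- ieResult{cols: []string{"k"}} // columns resent after a retry
	ch <- ieResult{row: []int{1}}       // first actual row
	close(ch)
	cols, first := waitForFirstNonColumnResult(ch)
	fmt.Println(cols, first.row) // [k] [1]
}
```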


pkg/sql/internal.go line 833 at r13 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Maybe worth filing issues and linking them from the comment.

This is fixed in the fourth commit too - this TODO is the general bug this PR is fixing.

@craig
Contributor

craig bot commented Apr 19, 2023

Build failed:

@yuzefovich
Member Author

Known flake #101519.

bors r+

@craig
Contributor

craig bot commented Apr 19, 2023

Build succeeded:

@craig craig bot merged commit 3042c4f into cockroachdb:master Apr 19, 2023
@yuzefovich yuzefovich deleted the fix-rewind branch April 19, 2023 18:06
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Nov 14, 2023
This commit introduces a limit on the number of times the
InternalExecutor machinery can retry errors in `Exec{Ex}` methods. That
logic was introduced in c09860b in
order to reduce the impact of fixes in other commits in cockroachdb#101477.
However, there was no limit on the number of retries, and we recently
hit a stack overflow twice in our own testing, seemingly in this code
path. Thus, this commit adds a limit, 5 by default.

It is stored in the session data, but it actually has no meaning in the
user's session, so this commit also adds a corresponding cluster setting
to be able to change the default used in the internal executor (just in
case we need it).

Note that I'm not sure exactly what bug, if any, can lead to the stack
overflow. One seemingly-unlikely theory is that there is no bug, meaning
that we were simply retrying forever because the stmts were pushed by
higher priority txns every time. Still, this change seems beneficial on
its own and should prevent stack overflows even if we don't fully
understand the root cause.

There is no release note given we've only seen this twice in our own
testing and it involved cluster-to-cluster streaming.

Release note: None
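For reference, a minimal sketch of a bounded retry loop along these lines (the constant and names are illustrative, not the actual cluster setting or session variable):

```go
package main

import (
	"errors"
	"fmt"
)

var errRetryable = errors.New("restart transaction")

// execWithRetries retries transparently only a bounded number of times, so a
// pathological workload cannot retry without bound.
func execWithRetries(run func() error, maxRetries int) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = run(); !errors.Is(err, errRetryable) {
			return err // success or a non-retryable error
		}
	}
	return err // give up and surface the retryable error to the caller
}

func main() {
	attempts := 0
	err := execWithRetries(func() error {
		attempts++
		return errRetryable // e.g. always pushed by a higher-priority txn
	}, 5)
	fmt.Println(attempts, err) // 6 restart transaction
}
```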
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Nov 15, 2023
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Nov 15, 2023
craig bot pushed a commit that referenced this pull request Nov 16, 2023
114398: sql: introduce limit on number of retries in IntExec.Exec{Ex} methods r=yuzefovich a=yuzefovich

This commit introduces a limit on the number of times the InternalExecutor machinery can retry errors in `Exec{Ex}` methods. That logic was introduced in c09860b in order to reduce the impact of fixes in other commits in #101477. However, there was no limit on the number of retries, and we recently hit a stack overflow twice in our own testing, seemingly in this code path. Thus, this commit adds a limit, 5 by default.

Note that I'm not sure exactly what bug, if any, can lead to the stack overflow. One seemingly-unlikely theory is that there is no bug, meaning that we were simply retrying forever because the stmts were pushed by higher priority txns every time. Still, this change seems beneficial on its own and should prevent stack overflows even if we don't fully understand the root cause.

An additional improvement is that we now track the depth of recursion, and once it exceeds 1000, we'll return an error. This should prevent the stack overflows due to other reasons.

There is no release note given we've only seen this twice in our own testing and it involved cluster-to-cluster streaming.

Fixes: #109197.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Nov 16, 2023