Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: do not evaluate AOST timestamp in session migrations #108503

Merged
merged 1 commit into from
Aug 10, 2023

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Aug 10, 2023

fixes https://github.com/cockroachlabs/support/issues/2510
refs #108305
Release note (bug fix): Fixed a bug where a session migration performed by SHOW TRANSFER STATE would not handle prepared statements that used the AS OF SYSTEM TIME clause. Users who encountered this bug would see errors such as expected 1 or 0 for number of format codes, got N. This bug was present since v22.2.0.

@rafiss rafiss added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Aug 10, 2023
@rafiss rafiss requested a review from a team as a code owner August 10, 2023 08:17
@rafiss rafiss requested a review from rharding6373 August 10, 2023 08:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rafiss
Copy link
Collaborator Author

rafiss commented Aug 10, 2023

NB: I'm only backporting this to v23.1 and not 22.2 since we should prioritize fixing this for Serverless clusters, which are all on 23.1 now.

@rafiss rafiss requested review from yuzefovich, ecwall and jaylim-crl and removed request for rharding6373 August 10, 2023 08:18
Copy link
Collaborator

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Can we also log the error here instead of swallowing it during session migration?

if err := prepare(ctx, ex.state.mu.txn); err != nil && origin != PreparedStatementOriginSessionMigration {
return nil, err
}

As an aside, @jeffswenson thinks that we should have errored out if we fail to prepare any statements during deserialization instead of having it error when the user uses them.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

Release note (bug fix): Fixed a bug where a session migration performed
by SHOW TRANSFER STATE would not handle prepared statements that used
the AS OF SYSTEM TIME clause. Users who encountered this bug would see
errors such as `expected 1 or 0 for number of format codes, got N`. This
bug was present since v22.2.0.
@rafiss rafiss force-pushed the fix-aost-session-migration branch from 2fad53a to 5befaa7 Compare August 10, 2023 14:50
@rafiss
Copy link
Collaborator Author

rafiss commented Aug 10, 2023

Added the warning log.

As an aside, @jeffswenson thinks that we should have errored out if we fail to prepare any statements during deserialization instead of having it error when the user uses them.

I disagree with this - even without session migrations, it is possible for a prepared statement to become invalid. One example is if a schema change occurs on a table referenced by a prepared statement. That was the motivation for this commit: dda2fa9. If we error out eagerly, then that would be contrary to the goal of making session migrations transparent to the user.

IMO the missing piece here is that we weren't logging the reason why a statement couldn't be prepared.

@jeffswenson
Copy link
Collaborator

Added the warning log.

As an aside, @jeffswenson thinks that we should have errored out if we fail to prepare any statements during deserialization instead of having it error when the user uses them.

I disagree with this - even without session migrations, it is possible for a prepared statement to become invalid. One example is if a schema change occurs on a table referenced by a prepared statement. That was the motivation for this commit: dda2fa9. If we error out eagerly, then that would be contrary to the goal of making session migrations transparent to the user.

IMO the missing piece here is that we weren't logging the reason why a statement couldn't be prepared.

The current behavior only makes sense to me IFF we are confident that all of the errors are the result of schema changes. If there is a bug in the prepare logic that impacts transferred sessions, it would be much better for the transfer to fail than to have the session limp along with corrupted state. The warning log is an improvement, but I'm still worried this will hide hard to track down bugs.

Thinking through how applications are developed, I'm skeptical that prepared statements broken by schema changes is actually a common scenario. Usually schema changes are backwards compatible with the running application.

@rafiss
Copy link
Collaborator Author

rafiss commented Aug 10, 2023

The current behavior only makes sense to me IFF we are confident that all of the errors are the result of schema changes.

I have confidence. This bug being fixed seems to be the first issue we've heard about in a over year of this feature being used, so I believe we can get to a place where unexpected errors during session transfers do not happen unless something is majorly wrong.

Given how commonly people ask about invalidated prepared statements in bug trackers, I think it is probably more common than you predict, and failing eagerly would be more disruptive. (The user's experience would be that their connection was dropped for no clear reason.)

@jeffswenson
Copy link
Collaborator

Given how commonly people ask about invalidated prepared statements in bug trackers, I think it is probably more common than you predict, and failing eagerly would be more disruptive. (The user's experience would be that their connection was dropped for no clear reason.)

I'm open to keeping the current implementation as long as we periodically check for this log warning.

That said, if there is a broken plan, closing the connection is likely the least disruptive thing we can do. We can't migrate sessions in the middle of a transaction and as long as they are using a connection pool, the connection will be reopened for them.

@rafiss
Copy link
Collaborator Author

rafiss commented Aug 10, 2023

I'm open to keeping the current implementation as long as we periodically check for this log warning.

for sure. i will add it to our SLI dashboard

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 2 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rafiss)

@rafiss
Copy link
Collaborator Author

rafiss commented Aug 10, 2023

tftr!

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 10, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 10, 2023

This PR was included in a batch that was canceled, it will be automatically retried

@craig
Copy link
Contributor

craig bot commented Aug 10, 2023

Build succeeded:

@craig craig bot merged commit c0fbeb3 into cockroachdb:master Aug 10, 2023
@blathers-crl
Copy link

blathers-crl bot commented Aug 10, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 5befaa7 to blathers/backport-release-23.1-108503: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants