Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discuss] Improve the process around backporting and version checks #8070

Open
msfroh opened this issue Jun 14, 2023 · 7 comments
Open

[Discuss] Improve the process around backporting and version checks #8070

msfroh opened this issue Jun 14, 2023 · 7 comments
Labels
discuss Issues intended to help drive brainstorming and decision making

Comments

@msfroh
Copy link
Collaborator

msfroh commented Jun 14, 2023

Describe the bug
When people backport an API change to the 2.x branch, the backward-compatibility tests on the main branch are broken until the version check in main is updated.

To Reproduce
Steps to reproduce the behavior:

  1. Write code that involves a change to something that gets serialized and sent between nodes. Gate it behind a check like if (out.getVersion().onOrAfter(Version.V_3_0_0))
  2. Submit a PR and get your code merged into the main branch.
  3. Create a backport PR for the 2.x branch that changes the version check to the upcoming 2.x release (e.g. V_2_9_0).
  4. Backward compatibility / mixed cluster tests on main are broken.
  5. Create a PR against main that changes the version check to the upcoming 2.x release, to match the 2.x branch.

The mixed cluster tests fail because the main / 3.0 nodes don't send the new stuff to the 2.x / 2.9.0 nodes, but the 2.9.0 nodes are expecting the new stuff.

Changing the order of PRs wouldn't help. If you change the version check on main to V_2_9_0 before the code is backported to 2.x, then the tests would fail because main / 3.0 sends the 2.x / 2.9.0 nodes the new stuff before they know what to do with it.

In a perfect world, we would be able to atomically merge the backport PR and the updated version check on main in a single operation. Alas the world is imperfect and we can't change multiple branches in a single operation.

Proposed solution

The best solution that I've been able to think of is:

  1. Contributors post both the backport PR and the version check update PR, and link them to each other.
  2. Maintainers:
    a. Confirm that the version check update PR exists (and is just making the correct version check change) before merging the backport PR.
    b. Commit to merging the version check update PR really soon after merging the backport PR.

This is a terrible solution that relies on good intentions and asks humans to implement a transaction. I hate this solution and would love to hear better suggestions.

Note: The version check update PR will have failing mixed cluster tests until the backport PR has been merged. We could suggest that maintainers merge the backport PR, retry Gradle Check on the version check update PR, confirm that it now passes, then merge the version check update PR. That would leave other people's Gradle Checks broken for ~35 minutes. Of course, this also assumes no flaky test failures. Hahahaha!

Additional context
See e.g:

@msfroh msfroh added discuss Issues intended to help drive brainstorming and decision making untriaged labels Jun 14, 2023
@reta
Copy link
Collaborator

reta commented Jun 15, 2023

@msfroh this is definitely very painful experience, I have though about it a lot, and the solution you've proposed does make sense. There are two other options I would like to bring in:

  • target the initial pull request to the minimal designated release, not main: it would basically change the flow from backport to forwardport with the benefit the version change dance won't be needed
  • automate the process: it is relatively easy to spot such version checks in the pull request, the automation we have could be extended to (that assumes automation will work but sadly it fails more often than not now):
    a) change the version check on auto backport
    b) create the forwardport request to target branch

@andrross
Copy link
Member

Contributors post both the backport PR and the version check update PR, and link them to each other.

One caveat here is that the version check update PR on main will fail its gradle check until the backport PR is merged.

target the initial pull request to the minimal designated release, not main: it would basically change the flow from backport to forwardport with the benefit the version change dance won't be needed

The thing I don't like about this approach is there is no forcing function to do the forward port. You can develop and release your feature and "forget" to do the forward port and it might not be missed until the release of the next major version (which can be a year or more). With the backport process if you forget to do the backport then your feature doesn't get shipped.


@msfroh @reta I have also thought about this a lot and every option seems to have downsides. I'm going to suggest one more option (which might be the worst one yet):

  • The bwc tests on main should run against the last released minor version, not the in-flight next minor version. We'd be testing bwc against a fixed version and avoid this moving target problem across branches. When the next minor version release branch is cut, main starts doing bwc tests against that branch, and at that point we have to update the version guards in main at every place that changed a serialization format.

@shwetathareja
Copy link
Member

shwetathareja commented Jun 16, 2023

When the next minor version release branch is cut, main starts doing bwc tests against that branch, and at that point we have to update the version guards in main at every place that changed a serialization format.

@andrross isn't this just delaying the problem for longer, rather we should fix the versions in main as soon as backport PR is merged in 2.x branch. It would be difficult to chase the owners then and maintainers would need to take care unless we are planning to automate it.

automate the process: it is relatively easy to spot such version checks in the pull request,

+1 on automating this by adding relevant labels

@andrross
Copy link
Member

@andrross isn't this just delaying the problem for longer

@shwetathareja Yes (I did say it was a bad idea after all :) ). It does delay the problem, but also isolates the breakages to a single point in time on a single PR. Let's use the upcoming 2.9 release as an example. When the 2.9 release branch is cut, there would be a PR on main that would change it to start running the bwc tests against the new 2.9 branch. The tests would not pass in that PR until all the places that require the version guard to be changed from 3.0 to 2.9 are updated. This should be fairly straightforward (just search for Version.V_3_0_0 and change every occurrence related to a feature that is in the 2.9 release). In this model the backports never cause build breakages on main. The reason I think this idea is a non-starter is that if there are legitimate backward compatibility problems between the 2.9 and 3.0 (more complicated than just getting the version guards correct), then you wouldn't find them until the release process starts which is too late.

What is still unsatisfying about the automation approach is that it doesn't prevent build breakages due to backporting a feature. At best it can just reduce the time window in which things are broken.

@dblock
Copy link
Member

dblock commented Jun 21, 2023

Would introducing a version label that shows intent help?

We would annotate the feature as 2.x_intended and the mixed cluster tests in main would try 2.x, fail, then try 3.x and pass. We then just need a mechanism to ensure that we don't leave any intended for a long time.

@andrross
Copy link
Member

Would introducing a version label that shows intent help?

@dblock How is the existing backport 2.x label different from what you're describing? I know it specifically triggers a backport workflow, but doesn't that also express the intent to support the feature in that version?

@dblock
Copy link
Member

dblock commented Jun 26, 2023

I am saying that "2.x_intended" would be in code, not in a GitHub issue/pr/workflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues intended to help drive brainstorming and decision making
Projects
None yet
Development

No branches or pull requests

6 participants