pre-release testing strategy #16
The main worry I have is that if we're not modifying the default channel, then I think we lose out on the testing of, e.g., clippy that will happen over the course of the few days (Tuesday/Wednesday) between having dev-static out and shipping the release itself. I think it's unlikely people will switch to using a versioned release in editor configurations, etc.
I am curious about this aspect -- in theory, the dev-static artifacts are mostly equivalent to the real stable release, so I'm not sure that it's actually that bad to be using them. Perhaps I am weighing this trade-off of daily usage against the probability of finding problems differently from you, though?
I agree with the need for testing. How about setting an environment variable within a particular terminal that all the tools respect? Once you close your terminal, you drop that environment variable, which resets everything back to the normal default state. It would also let you run multiple terminals side by side, each with a different toolchain, for comparison purposes.
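rustup already has an override variable that behaves roughly this way: the rustup proxies consult `RUSTUP_TOOLCHAIN` on every invocation. A minimal sketch of the per-terminal workflow:

```shell
# In this terminal only: every rustup proxy (cargo, rustc, cargo-clippy, ...)
# resolves to the chosen toolchain while the variable is set.
export RUSTUP_TOOLCHAIN=beta

# A second terminal without the variable keeps using the default (stable)
# toolchain, so the two can be compared side by side.
# Closing this terminal drops the variable; nothing persistent changes.
```
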
@ckaran Even without that, I do think people who already run clippy are likely to test the new release with it. I'd prefer to see people intentionally testing, and knowing that was what they were doing, rather than accidentally testing without having "I'm trying a pre-release that I need to report bugs about, but I can always revert if there's an issue" at the top of their mind. I'd prefer that even at the expense of less testing, because it seems important that people's use of "stable" match the latest stable we've actually released. I agree that generally our pre-releases end up matching the final release in practice, but we ask for testing to catch last-minute bugs, which, if found, would be one case where they might not match.
It might be helpful to take a step back and describe the goals for the pre-release builds. As I see them, given the time window, there are several goals:
I am not sure either of these goals is sufficiently worthwhile for the test, as currently placed, to make much sense. It may be better for us to just make a post, say, the Monday after the release reminding users to update to the latest beta release -- that may be better placed in terms of getting us testing, and it removes the need for the release team to coordinate dev-static (which does not seem particularly useful, as the two goals above are largely covered by beta releases already, IMO). Are there other goals we might have in mind for the dev-static release?
Sounds like the problem is (mostly) solved then! If there is good enough documentation for this, then I think everyone will be happy. |
@Mark-Simulacrum Reminding people to install/update the beta channel and to test their code with it might well be more productive than doing a last-minute "please try the stable pre-release". Have we, historically, gotten critical bug reports out of the pre-release testing that we haven't gotten from beta? |
I did a survey of the internals posts that I could quickly find. "Potentially critical" breakage:
Non-critical breakage:
No 'bad' feedback before the release, modulo non-per-release discussion and release note change suggestions:
So we've had 3/58 releases with 'critical' feedback on the internals thread, and 8/58 releases have had some feedback given. I could not find threads for 1.0-1.6 (internals search is failing me), but I am optimistically assuming all was largely OK. Not sure what conclusions to draw yet, but it's likely also instructive to look at point releases, which are likely a superset of critical breakage / "missed" problems, primarily adding in rare security fixes. I'm going by the blog to try to find all these point releases. If there's a fix that was previously "unknowable" (e.g., a security point release), I'll call that out, as that means it is not really relevant for the discussion here (since prerelease artifact testing can't have found it). Issues are X/Y, where Y is the total number of issues "fixed", and X is the number of critical issues (i.e., not just incidental backports due to the occurrence of the release), as quickly assessed by me in hindsight.
Overall it seems like there's basically been one(!) case where a point release was issued for a fix that couldn't have been detected on betas, ignoring cases where we have fairly last-minute backports -- and no beta is produced with those commits. My gut feeling is that those commits rarely introduce breakage, so I am not too worried, but it's worth calling out.

In general, I think this data shows that prerelease artifact testing is not particularly useful. We very rarely get reports of breakage on those threads; 2-3 cases at most in the last ~5 years. The point release data also suggests a significantly (~10x) higher rate of bugs which are not found by the prerelease testing we currently do, or at least, not fixed. Of the 21 point releases since 1.0, it seems like the vast majority of issues could have been detected on beta releases -- and so it is unclear that having dedicated artifacts is that instrumental.

We currently have those artifacts on Tuesday morning if there are no problems, per the release process, as they're produced Monday evening (US). This gives us a roughly 48-hour maximum window to rebuild those artifacts, which is quite short, considering an issue needs to get found, noticed by someone on the team(s), a fix identified and developed, and finally artifacts kicked off. This typically involves at least 3-4 people, which is a lot to engage in such a short timespan. It also creates additional overhead for the release team, currently adding ~2 public posts (internals + Inside Rust) that need to be made ASAP to maximize effectiveness, leaving less slack for babysitting artifacts and such.

All that to say: I think we need to revisit the strategy of asking to test these artifacts at the last minute like this.
Currently, we typically do not have those artifacts until the last minute (we almost always have a final round of beta backports included in the stable artifact PR), so moving this testing earlier essentially amounts to an ask to test beta, IMO. How we motivate users to test beta has never been all that clear, as it yields no real benefit for most users beyond helping the Rust project avoid possible regressions; it's not clear whether a regular call to test will be impactful.

It might be that the right solution lies closer to a push to get Rust users onto beta for all local development -- through inclusion of asks to switch to beta in release blog posts, maybe rustup suggestions after stable upgrades. It's also suggestive that just dropping this additional step from our release process would likely not seriously harm us in practice, and so may be advisable in that regard -- something to consider, at least.

Open to hearing thoughts; obviously there is lots to digest here, and I'll be thinking more on this as well.
Is there a way of setting an environment variable that builds projects using all available toolchains? E.g., I have the nightly, beta, and stable channels on my machine, but I only ever build using stable unless I need something that is only available on nightly. Instead of actively selecting and switching between the various toolchains, what about an opt-in mode that builds with all of them? As an added sanity measure, we could also print out a warning each time the tools are used while that mode is active.
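No such all-toolchains mode exists in rustup or cargo today, but the effect can be approximated by looping over `rustup toolchain list` with `rustup run` (both real subcommands); the wrapper function below is purely illustrative:

```shell
# Illustrative wrapper (not a rustup feature): run `cargo check`
# once per installed toolchain and report any failures.
check_all_toolchains() {
    rustup toolchain list | while read -r tc _; do
        echo "==> checking with $tc"
        rustup run "$tc" cargo check || echo "==> FAILED on $tc"
    done
}
```

Calling `check_all_toolchains` in a project directory would then surface beta/nightly regressions without changing any defaults.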
@ckaran I don't know how easily we could support something like that. However, the concept of "print a warning each time" is a helpful one. We could, without too much difficulty, make rustup print such a warning. If we wanted to move the call for testing earlier, that could just as easily become an ask to test beta.
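As a sketch of what "print a warning each time" could look like (this is not an existing rustup feature; the wrapper below just illustrates the intended behavior in shell):

```shell
# Illustrative wrapper: nag on every invocation while a toolchain
# override is active, then run the real command.
cargo_with_warning() {
    if [ -n "${RUSTUP_TOOLCHAIN:-}" ]; then
        echo "warning: toolchain override active (RUSTUP_TOOLCHAIN=$RUSTUP_TOOLCHAIN)" >&2
    fi
    cargo "$@"
}
```
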
@joshtriplett OK: the idea is a script that you call within a shell, which sets the toolchain environment variables for that shell only. If you want to test the new tools, call the script first; when you discard the shell, you're back on your default toolchain.
@ckaran Are you just trying to prevent people from having to rebuild their code when they switch back to stable? Because they'll have to rebuild it anyway when they upgrade to the stable release. |
@joshtriplett No, I'm trying to ensure that they don't forget that they're testing and then wonder why their build times are slow, etc. By using a script that sets ephemeral shell variables, as soon as that shell is discarded, you're back to your good old default toolchain. I also think that having a script that you have to call within the shell each time you want to set the variables will help avoid the problem of setting the default toolchain and then forgetting that you did so. Maybe, instead of all of the above, we need something built into the tooling itself.
The trouble with environment variables is that IDEs will not always see them, and so won't run the correct language server. I'd maybe suggest using a marker file instead.
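rustup does support exactly this kind of marker: a `rust-toolchain.toml` (or legacy bare `rust-toolchain`) file in the project root pins the toolchain for every rustup-proxied command run in that tree, and language servers launched through the proxies pick it up too. For example:

```toml
# rust-toolchain.toml at the project root; delete the file to revert.
[toolchain]
channel = "beta"
```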
I don't interact all that much with pre-releases, but has the obvious question been asked: how often do people forget they're on a pre-release toolchain? Is this a big problem or pain point for a lot of folks?
On the original question of having users override their "stable" channel -- what does rustup actually look at when deciding if an update is needed? The channel manifests?

On the broader question of whether this call for testing is useful, I don't know. I do often use that src tarball for a scratch build on Fedora infrastructure, but I don't remember ever having an issue to report back from that. Usually, if anything, I just find some small build changes that I need to adapt in my rpm spec. That's nice for me to get ahead of release day, at least. :)
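On the manifest question: rustup fetches the channel manifest (e.g. `channel-rust-stable.toml` from static.rust-lang.org) and compares it against what the installed toolchain recorded. Roughly, the relevant fields look like this (an illustrative excerpt, not a verbatim copy; the hash is a placeholder):

```toml
manifest-version = "2"
date = "2021-09-09"

[pkg.rust]
# The version string embeds a commit hash and build date (placeholders here).
version = "1.55.0 (0000000 2021-09-06)"
```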
Coming back to this: I'm increasingly finding myself agreeing with @Mark-Simulacrum's assessment that maybe we just don't need to ask people to test the release artifacts in advance. (If we do, I continue to think we should suggest that people install it by version.) |
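Installing by version, rather than redefining `stable`, is already supported; 1.55.0 below is a stand-in for whichever release is under test, and the wrapper function is just for illustration:

```shell
# Sketch of a versioned test session; `rustup toolchain install`,
# the `+<toolchain>` proxy syntax, and `rustup toolchain uninstall`
# are all real rustup features.
test_versioned_release() {
    rustup toolchain install 1.55.0 &&
    cargo +1.55.0 check &&
    rustup toolchain uninstall 1.55.0   # clean up; `stable` was never touched
}
```
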
An interesting midway alternative could be a dedicated pre-release channel. But I do generally agree that testing against beta covers most of the same ground.
@joshtriplett suggested in the 1.55 thread:
I wanted to open this issue so we're not trying to discuss this on internals threads for every release, since the issue is somewhat decoupled from any single release in practice.