CI is using an incompatible version of the Conda runtime #177
Comments
Weird. I can reproduce this locally. Notably, when I run It would be good to understand what's going on here—is it a Micromamba bug? are we using it wrong?—but I imagine regardless of what's happening, we might still want to change |
Hmm, that's helpful info. It doesn't explain the behavior in the ncov run though? That one resolved to |
Yeah. Weird.
Seeing the same issue in the seasonal-flu CI now, so it's no longer limited to this repo.
There's a weird issue with the Conda runtime where the initial setup uses an older version.¹ Do an update of the default runtime after setup to ensure we are using the latest version. We could remove this extra step after we rework `nextstrain setup` to use the same logic as `nextstrain update`. ¹ nextstrain/mpox#177
I can reproduce this locally even with the standalone install of Nextstrain CLI by setting up a new Conda runtime from scratch, which makes sense given we think this is a Micromamba issue.
Quick fix for the CI while we figure out the underlying issue.
Thanks for the hot fix for CI! I started digging into what's going on inside Micromamba by doing roughly this:
$ cd $(mktemp -d)
$ export NEXTSTRAIN_HOME=$PWD
$ nextstrain debugger
(Pdb) interact
>>> from nextstrain.cli.runner.conda import micromamba, setup_micromamba
>>> setup_micromamba()
>>> micromamba("create", "-vvv", "--dry-run", "nextstrain-base")
… I checked whether the package index it's using, https://conda.anaconda.org/nextstrain/linux-64/repodata.json, contains the latest package version. It does. Then I diffed the two index entries to see if anything stood out, but nothing does. Next up: digging into the actual solver logs.
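The index check described above can be sketched in Python. This is a minimal illustration, not what was actually run: the function names are hypothetical, and it assumes the usual `repodata.json` layout with package records under the top-level `packages` and `packages.conda` keys.

```python
import json
from urllib.request import urlopen

# Channel index URL as given in the comment above.
REPODATA_URL = "https://conda.anaconda.org/nextstrain/linux-64/repodata.json"


def package_versions(repodata: dict, name: str = "nextstrain-base") -> list:
    """Return the sorted, de-duplicated versions of `name` listed in a channel index."""
    records = {
        **repodata.get("packages", {}),        # .tar.bz2 records
        **repodata.get("packages.conda", {}),  # .conda records
    }
    versions = {r["version"] for r in records.values() if r.get("name") == name}
    # nextstrain-base versions are UTC timestamps (e.g. 20230830T164409Z),
    # so a plain string sort is also chronological.
    return sorted(versions)


def fetch_repodata(url: str = REPODATA_URL) -> dict:
    """Download and parse a channel's repodata.json (hypothetical helper)."""
    with urlopen(url) as response:
        return json.load(response)


# Usage (needs network access):
#   print(package_versions(fetch_repodata())[-1])
```

If the last entry is the version you expect, the index itself is not the problem and the solver is the next place to look.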
The solver starts by considering the latest version, Since I'm guessing that some difference in the solver or resolution algorithm between the conda-base builds and this version of Micromamba is causing the former to produce something the latter thinks is in conflict. Since they should use broadly the same solver/algo (libmamba → libsolv), this would imply that using a newer Micromamba version might fix this. But also, I expect pinning the
…but actually is not sufficient on its own:
Ok, my reading of the libsolv details in the previous comment, plus double checking the two suitesparse 5.10.1 packages available on conda-forge, has me thinking that boa (used in conda-base builds) is producing a bad solve for the versions of suitesparse and metis. Micromamba seems correct here. (But then again, it also does the upgrade to the latest conda-base just fine?? I'm still confused by that.) I upgraded Micromamba to 1.5.0 (latest version), and it still doesn't like the latest package, but at least it has a better error message:
This matches my reading of libsolv above.
Still very confused how "install old, update to latest" works and how other CI jobs installed the latest just fine (e.g.). This feels like something changing at a distance.
I'd thought maybe Nextstrain CLI 7.2.0's relatively recent upgrade of Micromamba 1.0.0 → 1.1.0 might have been implicated, but 1.0.0 exhibits the same issues locally and besides, 7.2.0 was released 2 weeks ago, well before recent CI jobs like the one linked above passed.
Looks like new builds of metis 5.1.0 and 5.1.1 were released a couple days ago, maybe some changes in dependencies there? Edit: Oh wait, I see. It is using the latest metis build but still able to install suitesparse. Huh...
I think I have it figured out. Don't think it's our fault. Let me confirm.
Anaconda appears to have incorrect indexing of (both builds of) suitesparse 5.10.1. Compare the metadata for the distribution (used post-install to solve deps for subsequent installs):
with the metadata in the channel index (used pre-install to solve deps):
This is why initial install of I confirmed that the distribution metadata API is indeed returning the metadata from the actual distribution:
and that it's all the same as what's in the local install:
In short:
and we're hitting caching issues. (An index is a cache.)
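The comparison between index metadata and distribution metadata boils down to diffing two `depends` lists for the same build. A minimal sketch follows; the function name and the example dependency strings are hypothetical placeholders, not the real suitesparse metadata.

```python
def diff_depends(index_depends, dist_depends):
    """Compare the `depends` list from the channel index with the one
    shipped inside the package distribution itself."""
    index_set, dist_set = set(index_depends), set(dist_depends)
    return {
        "only_in_index": sorted(index_set - dist_set),
        "only_in_distribution": sorted(dist_set - index_set),
    }


# Hypothetical example data: the index carries a tighter pin on one dependency.
index_depends = ["libexample >=1.0", "otherlib >=5.1.0,<5.1.1.0a0"]
dist_depends = ["libexample >=1.0", "otherlib >=5.1.0,<5.2.0a0"]

print(diff_depends(index_depends, dist_depends))
```

Any non-empty result means the solver sees different constraints before install (index) than it does after install (distribution), which is the kind of mismatch described in this comment.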
It gets messier: the difference in the index vs. distribution metadata is not accidental, but intentional. I went to read about channel indexing and noticed this step (emphasis mine):
That raised my eyebrows. So I read further about repodata patching, which mentioned how conda-forge applies repodata patches using https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/. Any sign of suitesparse or metis in there? Oh, you bet! conda-forge/conda-forge-repodata-patches-feedstock@2a2c288 Committed just a couple days ago. So this is intentional, to fix an actual ABI breakage, but it has the side-effect of breaking a previously-working combination of packages. This unfortunate risk is noted by the Conda docs linked to above:
I think if we rebuild conda-base again, now that the hotfixing is in place, we'll be ok for new installs again. Will confirm that next.
That conda-forge repodata patches change was merged 30 Aug at about 10:19 US/Pacific. To take effect it then would have to be built, uploaded, and finally used by Anaconda during index update. Our latest nextstrain-base version ( |
Rebuild is looking promising already.
So assuming that rebuild mostly* resolves the issue for now, how do we avoid similar issues in the future? One way might be having scheduled CI in conda-base that regularly tests if the latest package version is still initially installable (similar to how Nextstrain CLI regularly tests if its standalone installers still work, since they're also dependent on external resources). If that test breaks, we get an early warning to see what's up. If we're really fancy, we could potentially even try to detect certain kinds of breakages like this one and automatically remediate them by kicking off another package build. * nextstrain-base versions between
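A scheduled installability check like the one proposed above could look roughly like this. It is a sketch under assumptions: the function names, the channel list, and the environment name are illustrative, and it assumes `micromamba create --dry-run` exits non-zero when the solve fails.

```python
import subprocess


def dry_run_command(spec,
                    channels=("nextstrain", "conda-forge", "bioconda"),
                    env_name="install-check"):
    """Build a `micromamba create --dry-run` command for a package spec.

    The channel list and environment name are assumptions for illustration.
    """
    cmd = ["micromamba", "create", "--dry-run", "--yes", "--name", env_name]
    for channel in channels:
        cmd += ["--channel", channel]
    cmd.append(spec)
    return cmd


def is_installable(spec, run=subprocess.run):
    """Return True if a fresh solve for `spec` succeeds.

    `run` is injectable so the check can be exercised without Micromamba.
    """
    result = run(dry_run_command(spec), capture_output=True, text=True)
    return result.returncode == 0


# Usage in a scheduled CI job (requires micromamba on PATH):
#   if not is_installable("nextstrain-base"):
#       raise SystemExit("nextstrain-base is not installable from scratch")
```

Testing a fresh solve matters here because updating an existing environment kept working while new installs did not.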
This looks resolved by the just-released nextstrain-base |
Closing this as the reported issue is resolved. We'd maybe like to do more to prevent it from happening in the future, but I opened a conda-base issue for that: nextstrain/conda-base#41
…to catch issues¹ earlier. Since successful installation relies on external resources out of our control, we want to regularly test it to ensure we know when an external change breaks it. ¹ e.g. <nextstrain/mpox#177>
This is our intent and expectation, and it's good to be explicit about it. It may surface more installation issues, such as the one we observed in monkeypox CI¹ with the latest version not being installable, but obscuring those or surfacing them later on is _probably_ worse than addressing them head on earlier. This change will mean that macOS 10.14 users, if any, would have to use NEXTSTRAIN_CONDA_BASE_PACKAGE="nextstrain-base ==20230615T171309Z" since newer versions aren't installable for them.² This behaviour also parallels update's behaviour since "runner.conda: Explicitly specify a nextstrain-base version when updating" (d6e4f2b). It was an oversight (on my part) to not use the same behaviour during setup, but at the time I was focused on fixing an update bug. ¹ <nextstrain/mpox#177> ² <nextstrain/conda-base#38>
This is our intent and expectation, and it's good to be explicit about it. It may surface more installation issues, such as the one we observed in monkeypox CI¹ with the latest version not being installable, but obscuring those or surfacing them later on is _probably_ worse than addressing them head on earlier. This change will mean that macOS 10.14 users, if any, would have to use NEXTSTRAIN_CONDA_BASE_PACKAGE="nextstrain-base ==20230615T171309Z" since newer versions aren't installable for them.² This behaviour also parallels update's behaviour since "runner.conda: Explicitly specify a nextstrain-base version when updating" (d6e4f2b). It was an oversight (on my part) to not use the same behaviour during setup, but at the time I was focused on fixing an update bug. Related-to: <nextstrain/conda-base#41> Related-to: <nextstrain/conda-base#42> ¹ <nextstrain/mpox#177> ² <nextstrain/conda-base#38>
…to catch issues like the one in monkeypox CI¹ earlier. Since successful installation relies on external resources out of our control, we want to regularly test it to ensure we know when an external change breaks it. * As it stands currently, this isn't strictly the _latest_ package, just that there's _some_ package version that's installable. To ensure the former, we'd have to query the latest version (e.g. similar to what devel/download-latest does) and pass it into setup-nextstrain-cli as an input, which would then pass it to `nextstrain` in NEXTSTRAIN_CONDA_BASE_PACKAGE. Alternatively, we're likely to update `nextstrain setup conda` anyway to query and install the latest version itself explicitly, just like `nextstrain update conda` does, and so this workflow can simply wait for that change to happen. Resolves <#41>. ¹ e.g. <nextstrain/mpox#177>
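Passing an explicit version to `nextstrain` through NEXTSTRAIN_CONDA_BASE_PACKAGE, as described above, might be sketched like this. The helper name is hypothetical; the value format follows the `nextstrain-base ==20230615T171309Z` example used elsewhere in this thread.

```python
import os


def setup_environment(version, base_env=None):
    """Return a copy of the environment with the Conda base package pinned.

    Hypothetical helper: `version` would come from a "query the latest
    version" step that is not shown here.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NEXTSTRAIN_CONDA_BASE_PACKAGE"] = f"nextstrain-base =={version}"
    return env


# Usage (requires Nextstrain CLI on PATH):
#   import subprocess
#   subprocess.run(["nextstrain", "setup", "conda"],
#                  env=setup_environment("20230615T171309Z"), check=True)
```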
…to catch issues like the one in monkeypox CI¹ earlier. Since successful installation relies on external resources out of our control, we want to regularly test it to ensure we know when an external change breaks it. Resolves <#41>. ¹ e.g. <nextstrain/mpox#177>
Currently (as observed in #176), the Conda runtime job instance of pathogen-ci is failing with the following error:
Augur version 22.1.0 is coming from this version of the Conda runtime: nextstrain-base 20230717T174555Z.
This used to work without any noticeable changes. Example: when the Augur minimum version was bumped to 22.2.0, Augur version 22.2.0 was available in this CI run. Notably, the version of the Conda runtime is nextstrain-base 20230731T212806Z.
This also seems to be working fine in the ncov repo, where the latest run resolved to nextstrain-base 20230830T164409Z.
My outstanding question is: why is an older version of the Conda runtime being resolved now, and seemingly only in this repo?