Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure cached packages installed in CI test phase #14

Open
jakirkham opened this issue Jan 24, 2024 · 9 comments
Open

Ensure cached packages installed in CI test phase #14

jakirkham opened this issue Jan 24, 2024 · 9 comments

Comments

@jakirkham
Copy link
Member

Recently we ran into an issue on a project (cuCIM) where older packages of the project (libcucim & cucim from 23.12) were installed instead of the most recent packages from the PR (24.02.00a*). This made the installation look successful. However old issues that had been fixed in the development branch (branch-24.02) were not getting picked up

This was ultimately caused by a solver issue. However we were not able to ascertain that until we pinned the packages installed in the test phase to the latest version. Then the solver issue became clear and we could work to resolve that

Think we should take a closer look at this issue and come up with a way to guarantee the cached packages are picked up as opposed to some other packages. Attempted to do this more directly by using the <channel>::<package> syntax, but this didn't work well with file based channels. Maybe there is a better way to do this

@jameslamb
Copy link
Member

@jakirkham I'm a bit confused by what specifically "cached" means in your description, but I came here to report an issue that felt like the conda equivalent of #69 because that's how @vyasr described this issue in #69 (comment).

In cugraph 24.10, near the end of burndown, we observed a situation where 24.12 packages were getting pulled in on CI jobs targeting branch-24.10 (rapidsai/cugraph#4690).

It seems like the root cause was this pattern, that's used across RAPIDS CI scripts:

# create an environment
rapids-dependency-file-generator \
  --output conda \
  --file-key test_notebooks \
  --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee env.yaml

rapids-mamba-retry env create --yes -f env.yaml -n test

# download packages built in earlier CI jobs
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python)

# install those packages
rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  --channel "${PYTHON_CHANNEL}" \
  libcugraph pylibcugraph cugraph

Adding those locally-downloaded files with --channel doesn't force conda install to use the packages... it's free to choose other packages to satisfy constraints like "cugraph". Including those from other RAPIDS releases!

It would be safer for CI purposes to do something like this for that last install:

RAPIDS_VERSION="$(rapids-version-major-minor)"

rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  --channel "${PYTHON_CHANNEL}" \
  "libcugraph=${RAPIDS_VERSION}.*" \
  "pylibcugraph=${RAPIDS_VERSION}.*" \
  "cugraph=${RAPIDS_VERSION}.*"

That would prevent the solver from ever falling back to older releases (e.g. using 24.08 packages on the 24.10 branch) picking up newer releases (e.g. using 24.12 packages on the 24.10 branch).

I think it's worth making changes like that across RAPIDS in 24.12, what do you think?

cc @bdice @msarahan @KyleFromNVIDIA

@jakirkham
Copy link
Member Author

Meaning the packages from S3

TBH this is something that would be solved by strict channel priority

@jameslamb
Copy link
Member

Got it, ok.

I don't think the particular issue I just noted about would be solved by strict channel priority. It's a case where conda chose 24.12.* packages from the rapidsai-nightly channel when we wanted to use 24.10.* packages from the rapidsai-nightly channel.

@jakirkham
Copy link
Member Author

Hmm...am not following

With your use case, it sounds like cuGraph packages should be in ${CPP_CHANNEL} and ${PYTHON_CHANNEL} correct?

If we added those channels higher in the channel order than rapidsai-nightly and enabled strict channel priority, it would only grab the cuGraph packages from ${CPP_CHANNEL} and ${PYTHON_CHANNEL} and ignore rapidsai-nightly. This is what we want, right?

What am I missing?

@jameslamb
Copy link
Member

🙈 you're absolutely right John, the 24.10.* are coming from those local channels, not rapidsai-nightly, and strict channel priority would solve this. Sorry, I was not thinking about this the right way. Thanks for correcting me.

We might still want to consider something like the script changes I mentioned above, since they're a relatively small amount of effort compared to enforcing strict channel priority.

@jakirkham
Copy link
Member Author

All good. Just wanted to make sure I wasn't missing something

Agree that would be a good improvement. We did the same thing in cuCIM: rapidsai/cucim#680

Another improvement that has been helpful has been this suggestion by Charles ( #22 ). It is good both for overall runtime and reliability

@vyasr
Copy link
Contributor

vyasr commented Oct 4, 2024

The reason that I likened this issue to #69 is that fundamentally --find-links is analogous to adding a conda channel because it gives package manager an extra place to pull dependencies from. The main difference between pip and conda in this regard is that pip has no concept of channel priority, so with pip you had to introduce constraints in order to force installation of the desired file. With conda, as John pointed out the channel priority should solve the problem. In the interim, though, the suggestion that you proposed would work and would be the closest direct port possible of the pip constraints approach that you've added.

@jameslamb
Copy link
Member

Yep yep makes sense, thanks for saying you also support the idea. I'm gonna plan on doing this (adding the constraints in scripts) next week, on 24.12. Seems like a few hours of effort, well worth it in my opinion to minimize pain like what we're experiencing in cugraph for 24.10 and trying to fix in rapidsai/cugraph#4690

@jameslamb
Copy link
Member

I created a sub-issue to track the work of updating those scripts: #106.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants