Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - Conda-Store strips environment vars from env.yaml spec. #719

Closed
dharhas opened this issue Jan 4, 2024 · 7 comments · Fixed by #721
Closed

[BUG] - Conda-Store strips environment vars from env.yaml spec. #719

dharhas opened this issue Jan 4, 2024 · 7 comments · Fixed by #721
Assignees
Labels
area: configuration ⚙️ area: dependencies 📦 Issues related to conda-store dependencies area: user experience 👩🏻‍💻 Items impacting the end-user experience type: bug 🐛 Something isn't working

Comments

@dharhas
Copy link
Member

dharhas commented Jan 4, 2024

Describe the bug

The conda env.yaml spec allows setting environment variables :

https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#setting-environment-variables

We need this to enable GPU builds when using conda-store. It looks like conda-store is stripping the env variable section from the yaml.

Expected behavior

The environment variables are maintained in the yaml and used in building the environment.

How to Reproduce the problem?

Use the following env:

channels:
  - conda-forge
dependencies:
  - pytorch
  - ipykernel
variables:
   CONDA_OVERRIDE_CUDA: 11.8

build the environment and then open the yaml again and the variables section is missing.

if you look at the generated lock file you can also see that the env variables were not used:

{
version: 1,
metadata: {
content_hash: {
linux-64: "2a255c84a3c70a2fec49eb99f5f45a464a73d812efc7902ae201538876fcb49a"
},
channels: [
{
url: "conda-forge",
used_env_vars: [ ]
}
],
platforms: [
"linux-64"
],
sources: [
"environment.yaml"
]
},

Output

No response

Versions and dependencies used.

No response

Anything else?

No response

@dharhas dharhas added the type: bug 🐛 Something isn't working label Jan 4, 2024
@kcpevey kcpevey added this to the Release 2024.1.1 milestone Jan 4, 2024
@nkaretnikov nkaretnikov self-assigned this Jan 4, 2024
@nkaretnikov nkaretnikov moved this from New 🚦 to TODO 📬 in conda-store 🐍 Jan 4, 2024
@amjames
Copy link

amjames commented Jan 4, 2024

I think this is an issue with conda-lock. When parsing the environment.yaml they simply don't consider the variables section at all.

Short term, The run_lock api we are using does have a with_cuda=<version string> option. We could check for CONDA_OVERRIDE_CUDA variable explicitly and use it to set that option. I am unsure if using the with_cuda argument this way will properly sidestep the the underlying issue with building cuda-envs on non-cuda machines. The option triggers changes to the channel set and a warning if the toolkit is pulled into the spec implicitly rather than as an explicit dependency.

Long term, the handling of the variables section should be fixed upstream in conda-lock.

@nkaretnikov
Copy link
Contributor

nkaretnikov commented Jan 4, 2024

I'll take a look and say for sure. It could be that our specification parser just strips it before conda-lock is called.

UPD: Looks like Andrew is right. Variable parsing landed in #424, but then it broke once we switched to conda-lock.

For reference, in conda-lock, the parsing is done via parse_environment_file, which only parses pre-defined fields and ignores the variables field.

I think it should be possible to pass __cuda via virtual_package_spec since it affects how cuda is detected in make_lock_files, but it requires having a separate file just for virtual packages, so not worth it.

It seems like setting with_cuda based on the value of the said env var is the easiest workaround since it ends up creating a FakePackage via default_virtual_package_repodata in conda-lock anyway.

UPD2: Unlike the server, conda-store-ui doesn't have support for variables, even when you use the raw YAML field, so using the above will only work via the server admin environment edit page.

@nkaretnikov nkaretnikov moved this from TODO 📬 to In Progress 🏗 in conda-store 🐍 Jan 4, 2024
nkaretnikov added a commit to nkaretnikov/conda-store that referenced this issue Jan 5, 2024
@nkaretnikov nkaretnikov moved this from In Progress 🏗 to In review 👀 in conda-store 🐍 Jan 5, 2024
@nkaretnikov
Copy link
Contributor

Status update: opened a PR with the quick fix on the server side. Will need to open another PR on the UI side to make this work via the UI (inlcuding via raw YAML) because the UI strips variables. With the current PR, it will only work via the server admin environment page.

@amjames
Copy link

amjames commented Jan 5, 2024

Follow up:

#721 will provide a path for the CONDA_OVERRIDE_CUDA variable to be respected. However generally the problem that the variables section does not persist into the lock file will still exist. Is there any need in the foreseeable future for other variables to be used?

I am asking because 1) the use of the CONDA_OVERRIDE_CUDA variable may give the impression that environmental variables propagate from the environment.yaml through to the final environment properly, so if we are not going to support that it should be documented somehow, and 2) In order to generally support the variables section of the environment.yaml changes will have to be made upstream in conda-lock.

cc @dharhas

@dharhas
Copy link
Member Author

dharhas commented Jan 5, 2024

I vote on making the appropriate fixes upstream in conda-lock. In the short term, does #721 at least allow us to build GPU enabled environments?

@nkaretnikov
Copy link
Contributor

@dharhas

does #721 at least allow us to build GPU enabled environments?

I've only tested this by generating a lockfile on a CPU-only machine, then manually transferring that lockfile and building from it on a machine with a GPU. Described in more detail here: #721 (comment)

I think we should test on Nebari: run a CPU-only instance, build, then switch to a GPU-enabled instance and see if it works. But I don't have access to a Nebari deployment where I could try that at the moment. I'll follow up offline.

nkaretnikov added a commit to nkaretnikov/conda-store that referenced this issue Jan 16, 2024
@github-project-automation github-project-automation bot moved this from In review 👀 to Done 💪🏾 in conda-store 🐍 Jan 16, 2024
@nkaretnikov
Copy link
Contributor

Status update: merged the quick fix, but it'll only work via the admin UI. To make this work via the new UI, this needs to be reviewed and merged as well: conda-incubator/conda-store-ui#354

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: configuration ⚙️ area: dependencies 📦 Issues related to conda-store dependencies area: user experience 👩🏻‍💻 Items impacting the end-user experience type: bug 🐛 Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants