Update to work with nixpkgs unstable #1

Merged: bcdarwin merged 2 commits into main from unstable-update on Mar 22, 2023

Conversation

bcdarwin (Contributor) commented Feb 22, 2023

Just a bunch of updates, trying to get things simplified a bit by using nixpkgs expressions when possible before doing any refactoring. Currently one big monolithic commit -- can be split out into more logical pieces before final review. Also note I have not built the full GPU-ized version here.

A few issues:

  • sometimes it's not clear why a package has been overridden -- in future this should probably be indicated with a comment when it's not obvious.
  • not all Python packages existing in the overlay are actually built by nix develop
  • it's not clear which packages are actually needed by us and which are there for demonstration purposes (the list of dependencies seems long enough I suspect the former)
  • many of the included packages have no tests (also a problem for nixpkgs python+js packages tbh)
  • as a result, there are no clear acceptance criteria for the repo.

Nonetheless, initial comments welcome.

bcdarwin (Contributor, Author) commented Feb 23, 2023

So, currently we cannot build this package set due to NixOS/nixpkgs#217878, which prevents (at least) torchvision from building. It's not trivial to resolve this until NVIDIA releases a cuDNN compatible with CUDA 12.

Currently, options include:

  1. build torchvision without CUDA support (unsure how speed/memory of our models will be affected)
  2. revert part or all of the gcc11Stdenv -> gcc12Stdenv bump (mass rebuild, unclear extent and whether it will work)
  3. keep the old package set for now.

At the moment we're leaning toward keeping the old package set, so marking this draft for now.
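
Option 1 could look roughly like the overlay below. This is a sketch under assumptions, not what this PR does: it assumes the nixpkgs torchvision expression takes its CUDA setting from its `torch` argument, and that the Python package set exposes a `torchWithoutCuda` attribute. Both should be checked against the pinned nixpkgs revision before relying on it.

```nix
# Hypothetical overlay: build torchvision against a CPU-only torch.
# Assumption: torchvision derives its CUDA support from its `torch` input.
final: prev: {
  python310 = prev.python310.override {
    packageOverrides = pyFinal: pyPrev: {
      torchvision = pyPrev.torchvision.override {
        # Assumption: `torchWithoutCuda` exists in this nixpkgs revision.
        torch = pyFinal.torchWithoutCuda;
      };
    };
  };
}
```

An overlay like this would be passed in the `overlays` list when importing nixpkgs in flake.nix. The effect on model speed and memory remains the open question noted in option 1.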

bcdarwin marked this pull request as draft on February 23, 2023 19:54
bcdarwin force-pushed the unstable-update branch 2 times, most recently from f72f44e to 6747a34, on February 23, 2023 20:00
bcdarwin force-pushed the unstable-update branch 3 times, most recently from 119de93 to 976274c, on March 8, 2023 22:09
bcdarwin force-pushed the unstable-update branch 3 times, most recently from 85c8ee9 to 6a681e4, on March 15, 2023 16:34
bcdarwin (Contributor, Author) commented

@cfhammill this should be ready to go -- once we think it's ready I'll split apart the commit before merging.

bcdarwin (Contributor, Author) commented

> So, currently we cannot build this package set due to NixOS/nixpkgs#217878, which prevents (at least) torchvision from building. It's not trivial to resolve this until NVIDIA releases a cuDNN compatible with CUDA 12.
>
> Currently, options include:
>
> 1. build torchvision without CUDA support (unsure how speed/memory of our models will be affected)
> 2. revert part or all of the gcc11Stdenv -> gcc12Stdenv bump (mass rebuild, unclear extent and whether it will work)
> 3. keep the old package set for now.
>
> At the moment we're leaning toward keeping the old package set, so marking this draft for now.

This has been mostly addressed by some recent CUDA-related fixes in Nixpkgs. The main outstanding issue for us is NixOS/nixpkgs#220341, which stops us from building grad-cam or from enabling cudaSupport globally (since this also breaks python310Packages.jax which is somehow in our dependency chain).

bcdarwin force-pushed the unstable-update branch 3 times, most recently from 57d525c to 07517dc, on March 20, 2023 17:23
bcdarwin marked this pull request as ready for review on March 20, 2023 17:23
bcdarwin requested a review from cfhammill on March 20, 2023 17:24
Three review threads on flake.nix (all resolved, one on outdated code)
cfhammill (Collaborator) commented

I think jax is sneaking in via optuna, not bad to have around though. Does this mean jax can't be built with CUDA support at all (like grad-cam) or just doesn't work with cuda enabled globally?

bcdarwin (Contributor, Author) commented

The latter; I've added jaxlibWithCuda to the dependencies for now (until config.enableCuda works) to verify that we have the correct build.
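
The workaround described here might be sketched as follows. This is an illustration under assumptions, not the repository's actual flake.nix: the system, the dev shell layout, and the `allowUnfree` setting are hypothetical, and the global switch is written as `config.cudaSupport` (the name used earlier in this thread), which may differ from what the flake uses.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux"; # assumption: target platform
      pkgs = import nixpkgs {
        inherit system;
        config.allowUnfree = true; # assumption: needed for the unfree CUDA packages
        # Global switch deliberately left off while NixOS/nixpkgs#220341 is open:
        # config.cudaSupport = true;
      };
    in {
      devShells.${system}.default = pkgs.mkShell {
        packages = [
          # Request the CUDA-enabled jaxlib explicitly instead of globally.
          (pkgs.python310.withPackages (ps: [ ps.jax ps.jaxlibWithCuda ]))
        ];
      };
    };
}
```

Requesting the CUDA variant per package keeps the rest of the package set on its default (non-CUDA) build, which is why it sidesteps the breakage seen with the global setting.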

bcdarwin requested a review from cfhammill on March 21, 2023 14:12
cfhammill (Collaborator) commented

Do you know why the build results are different for jaxWithCuda and with CUDA enabled globally?

bcdarwin (Contributor, Author) commented Mar 21, 2023

With config.enableCuda set, the build hits NixOS/nixpkgs#220341. Hopefully that will be fixed soon, though.

bcdarwin merged commit b369e04 into main on Mar 22, 2023
bcdarwin deleted the unstable-update branch on March 23, 2023 15:53