Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python310Packages.jax: 0.3.6 -> 0.3.16 #183051

Merged
merged 1 commit into from
Aug 30, 2022

Conversation

mcwitt
Copy link
Contributor

@mcwitt mcwitt commented Jul 27, 2022

Description of changes

Upgrades jax to 0.3.16 and jaxlib to 0.3.15.

Still a draft because this depends on #183052.

Note: This marks the darwin build as broken because jax>=0.3.14 sets macos_minimum_os=10.14, but as far as I can tell 10.12 is the latest supported for x86_64 in nixpkgs (relevant issue: #101229).

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@mcwitt mcwitt changed the title python3Packages.jax: 0.3.6 -> 0.3.15 WIP: python3Packages.jax: 0.3.6 -> 0.3.15 Jul 27, 2022
@mcwitt mcwitt changed the title WIP: python3Packages.jax: 0.3.6 -> 0.3.15 WIP: python310Packages.jax: 0.3.6 -> 0.3.15 Jul 27, 2022
@mcwitt mcwitt force-pushed the squashed/upgrade-jax branch 4 times, most recently from 07ec157 to 7b4dce1 Compare July 29, 2022 23:55
@mcwitt
Copy link
Contributor Author

mcwitt commented Aug 3, 2022

Status update: I'm stuck trying to get this to build on x86_64 darwin, getting errors like the following:

external/org_tensorflow/tensorflow/stream_executor/allocator_stats.h:45:3: error: no template named 'optional' in namespace 'std'; did you mean 'absl::optional'?

I'm less familiar with darwin, but it seems like this might be an issue with nixpkgs using an unsupported darwin SDK version for x86_64. Since v0.3.14, jax has bumped macos_minimum_os to 10.14, but as far as I can tell nixpkgs currently only supports 10.12 for x86_64 (relevant issue: #101229).

For now I'm going to mark this broken on darwin since I haven't been able to figure out a workaround.

@mcwitt mcwitt force-pushed the squashed/upgrade-jax branch 2 times, most recently from 65472a1 to 1da50b0 Compare August 3, 2022 16:02
@mweinelt mweinelt requested a review from samuela August 5, 2022 21:20
Copy link
Member

@samuela samuela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together @mcwitt! Overall this looks thorough and most of the changes make sense to me. Just a few questions here and there...

Comment on lines +126 to 136
] ++ lib.optionals (!stdenv.isDarwin) [
nsync
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that the Darwin build was being disabled until we get the macOS 10.14 SDKs?

Copy link
Contributor Author

@mcwitt mcwitt Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came from syncing with the tensorflow derivation (here:

] ++ lib.optionals (!stdenv.isDarwin) [
nsync
];
)
My thought was to leave the darwin-specific workarounds in place (even though the build is disabled for now) in case switching to the newer SDK fixes the remaining issues. I'm not really sure of the recommended practice in this situation, though -- happy to remove the gating if that seems preferable (and I guess we may end up doing it anyway if the SDK upgrade also fixes the nsync build on darwin).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, yeah now that we will have darwin builds working soon (shout out to @uri-canva!), we can figure how to make this work after all

pkgs/development/python-modules/jaxlib/default.nix Outdated Show resolved Hide resolved
@@ -197,7 +203,9 @@ let
"typing_extensions_archive"
"wrapt"
"zlib"
];
] ++ lib.optionals (!stdenv.isDarwin) [
"nsync" # fails to build on darwin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the package fails to build on darwin entirely, or that it fails to build without this particular line of code? Or does it mean that the vendored version of nsync fails to build?

Copy link
Contributor Author

@mcwitt mcwitt Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it means that nsync in nixpkgs is not supported on darwin (darwin is not listed in platforms:

# On macOS we get an error for some reason:
# > mkdir: cannot create directory 'build': File exists
platforms = lib.platforms.linux;
)

(This was taken from the tensorflow derivation.)

@uri-canva
Copy link
Contributor

The darwin build might be ok after #184395, since that updates the package to use SDK 11.0, which is available in nixpkgs, it's just not the default one.

@samuela
Copy link
Member

samuela commented Aug 6, 2022

The darwin build might be ok after #184395, since that updates the package to use SDK 11.0, which is available in nixpkgs, it's just not the default one.

Once we get #184395 merged, we should def rebase this on top of that

@mcwitt mcwitt force-pushed the squashed/upgrade-jax branch 2 times, most recently from d9dbfae to 605d135 Compare August 6, 2022 17:40
@ofborg ofborg bot requested a review from samuela August 6, 2022 18:11
@mcwitt
Copy link
Contributor Author

mcwitt commented Aug 24, 2022

Result of nixpkgs-review pr 183051 run on x86_64-linux 1

I skimmed over the breakages. Many seem to be caused by the following:

  • chex: broken by upstream removal of tree_multimap. Fixed by upgrading to latest (0.1.4)
  • jmp: broken by upstream removal of tree_multimap. Fixed by upgrading to latest (google-deepmind/jmp@260e5ba)
  • distrax: many tests fail; upgrade to latest not possible since it adds a numpy<1.23 version bound
  • tensorflow-datasets: long-running test suite; not sure if it'll be fixed by the above fixes or a version bump

The others seem to be broken by unrelated issues (mostly incompatible scipy and rich versions).

python310Packages.jaxlib: 0.3.0 -> 0.3.15
@mcwitt
Copy link
Contributor Author

mcwitt commented Aug 25, 2022

Unfortunately rebasing with #184395 merged hasn't magically fixed the darwin build.

  • Bazel fails to find libtool. I think this is fixed by adding cctools to nativeBuildInputs (mcwitt@4f8786c)
  • ERROR: /private/tmp/nix-build-bazel-build-jaxlib-0.3.15.drv-0/output/external/llvm-project/mlir/BUILD.bazel:6549:11: Compiling mlir/lib/TableGen/Argument.cpp failed: undeclared inclusion(s) in rule '@llvm-project//mlir:TableGen' (log). @uri-canva do you have any insight here?

@uri-canva
Copy link
Contributor

I have not seen the undeclared inclusion(s) in rule in my testing. Did you test if jaxlib works on its own without this branch? Maybe it's already broken in master.

@uri-canva
Copy link
Contributor

It shows as successful 9 hours ago: https://hydra.nixos.org/eval/1777705?filter=jaxlib&compare=1777668&full=

@uri-canva
Copy link
Contributor

Oh I see, it's probably some new transitive dependency from the version upgrade.

@uri-canva
Copy link
Contributor

Seems to be a known issue with bazel:
tensorflow/tensorflow#55563
bazelbuild/bazel#15359

@uri-canva
Copy link
Contributor

The workarounds might be a bit hard for us though, not sure how well gcc works on darwin, but we can try.

@samuela
Copy link
Member

samuela commented Aug 25, 2022

I skimmed over the breakages. Many seem to be caused by the following:

Thanks for digging into these failures @mcwitt!

Unfortunately rebasing with #184395 merged hasn't magically fixed the darwin build.

Just to confirm: @mcwitt are you building on x86_64-darwin?

@mcwitt
Copy link
Contributor Author

mcwitt commented Aug 25, 2022

Just to confirm: @mcwitt are you building on x86_64-darwin?

@samuela yes, the undeclared inclusion(s) error occurs for x86_64-darwin. Here's the ofborg build log: https://logs.nix.ci/?key=nixos/nixpkgs.183051&attempt_id=fe336da2-c899-425e-b4fc-a0e9328d45b0

@samuela
Copy link
Member

samuela commented Aug 25, 2022

It appears that python310Packages.jax.x86_64-darwin is also failing on master but due to a different failure: https://hydra.nixos.org/build/188060892/nixlog/1

@samuela
Copy link
Member

samuela commented Aug 25, 2022

Is there any hope trying to fix or patch the x86_64-darwin bug ourselves? It appears to be a quite hairy issue in bazel IIUC... Perhaps we should also raise it with the JAX folks?

@mcwitt
Copy link
Contributor Author

mcwitt commented Aug 25, 2022

I'm ok letting these slide for now as long as someone promises to come back and do the necessary upgrades after this PR is merged.

@samuela I'd be happy to handle upgrading chex and jmp after this is merged (already have the updates building in local branches so it shouldn't be too much work 🤞 )

@samuela
Copy link
Member

samuela commented Aug 29, 2022

@mcwitt @uri-canva How do ya'll feel about merging this as is, despite the fact that it breaks the x86_64-darwin build? On the one hand I hate to break the build... OTOH a) this breakage isn't our fault, b) I don't want PRs to get stuck forever. wdyt?

@uri-canva
Copy link
Contributor

Yeah I think it's ok, I only just fixed the darwin build, and I'm not sure how long it was broken before that, so I don't think there would be anyone depending on it.

@uri-canva
Copy link
Contributor

Let's mark it as broken and add a comment with all the context / links to the issues for future reference. That way when someone looks at it in the future they won't have to start figuring it out from scratch.

@samuela
Copy link
Member

samuela commented Aug 30, 2022

Yeah, I def don't want you to think that the x86_64-darwin work was for nothing @uri-canva! We should def commit to getting darwin support back up and running as soon as upstream progress on this issue allows.

@SuperSandro2000
Copy link
Member

so we can merge this now or not?

@samuela samuela merged commit 2307799 into NixOS:master Aug 30, 2022
@samuela
Copy link
Member

samuela commented Aug 30, 2022

I think we can!

@mcwitt mcwitt deleted the squashed/upgrade-jax branch August 30, 2022 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants