{lib}[foss/2022a] TensorFlow v2.11.0 w/ Python 3.10.4 (+ CUDA 11.7.0) #17241
…orFlow-2.11.0-foss-2022a.eb and patches: TensorFlow-2.11.0_disable-avx512-extensions.patch, TensorFlow-2.11.0_fix-eigen-atan-on-PPC.patch, TensorFlow-2.11.0_fix-link-error.patch, TensorFlow-2.11.0_remove-libclang-and-io-gcs-deps.patch, TensorFlow-2.5.0_fix-arm-vector-intrinsics.patch
Test report by @Flamefire |
ec8faa9 to ca2e0f3
I hope I am wrong, but downloading this MR and checking it against
|
You are right, they are from the other PRs and I forgot about them. The unmerged #17101 likely hides those missing files from the CI check/output by the bot. I added the required PRs to the description. See all my TF PRs here ;-) |
Test report by @Flamefire |
Test report by @Flamefire |
@surak I added those patches to this PR. I hope that doesn't lead to conflicts when the others are merged |
Test report by @Flamefire |
Test report by @SebastianAchilles |
After discussion with the developers of Eigen I'm testing a variation of the patch for PPC. --> Converted to draft in the meantime. This only affects PPC, so any other architecture can test & use this already. |
Test report by @Flamefire |
I have this patch, TensorFlow-2.9.1_fix-protobuf-include-def.patch:
Fix an issue where google/protobuf/port_def.inc is not found.
diff -ruN tensorflow-2.9.1_old/third_party/systemlibs/protobuf.BUILD tensorflow-2.9.1/third_party/systemlibs/protobuf.BUILD
--- tensorflow-2.9.1_old/third_party/systemlibs/protobuf.BUILD 2022-11-10 16:57:13.649126750 +0100
+++ tensorflow-2.9.1/third_party/systemlibs/protobuf.BUILD 2022-11-10 17:00:42.548576599 +0100
@@ -43,4 +43,6 @@
],
),
"wrappers": ("google/protobuf/wrappers.proto", []),
+ "port_def": ("google/protobuf/port_def.inc", []),
+ "coded_stream": ("google/protobuf/io/coded_stream.h", []),
}
RELATIVE_WELL_KNOWN_PROTOS = [proto[1][0] for proto in WELL_KNOWN_PROTO_MAP.items()] |
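A small sketch of what the two added entries change, assuming the map values are (relative path, dependency keys) tuples as the diff suggests. The map below is a truncated, illustrative copy and not the full BUILD file; how the BUILD file consumes the resulting list afterwards is not shown in this thread.

```python
# Truncated, illustrative copy of WELL_KNOWN_PROTO_MAP from
# third_party/systemlibs/protobuf.BUILD (assumption: only three entries shown).
# Each value is (path relative to the include root, list of dependency keys).
WELL_KNOWN_PROTO_MAP = {
    "wrappers": ("google/protobuf/wrappers.proto", []),
    # Entries added by the patch:
    "port_def": ("google/protobuf/port_def.inc", []),
    "coded_stream": ("google/protobuf/io/coded_stream.h", []),
}

# Same comprehension as in the diff: items() yields (key, (path, deps)),
# so proto[1][0] picks the path of each entry.
RELATIVE_WELL_KNOWN_PROTOS = [proto[1][0] for proto in WELL_KNOWN_PROTO_MAP.items()]

for path in RELATIVE_WELL_KNOWN_PROTOS:
    print(path)
```

With the patch applied, port_def.inc and io/coded_stream.h end up in the list alongside the .proto files, which is the only effect the diff itself has.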
If I apply the patch above, I end up here:
ERROR: /dev/shm/strube1/jusuf/TensorFlow/2.11.0/foss-2022a-CUDA-11.7/TensorFlow/tensorflow-2.11.0/tensorflow/cc/BUILD:824:11: Compiling tensorflow/cc/framework/cc_op_gen_main.cc failed: undeclared inclusion(s) in rule '//tensorflow/cc:cc_op_gen_main':
this rule is missing dependency declarations for the following files included by 'tensorflow/cc/framework/cc_op_gen_main.cc':
'bazel-out/k8-opt/bin/external/com_google_protobuf/google/protobuf/io/coded_stream.h'
Target //tensorflow/tools/pip_package:build_pip_package failed to build |
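When comparing logs like the one above across builds, it can help to pull out just the headers Bazel complains about. A rough sketch, assuming the message layout seen in this log (a "missing dependency declarations" line followed by one quoted path per line); the function name is hypothetical and other Bazel versions may format the message differently.

```python
import re
from typing import List


def missing_inclusions(log: str) -> List[str]:
    """Return the quoted paths listed after 'missing dependency declarations'."""
    start = log.find("missing dependency declarations")
    if start == -1:
        return []
    paths = []
    # Skip the rest of the marker line (it quotes the including source file).
    for line in log[start:].splitlines()[1:]:
        match = re.fullmatch(r"\s*'([^']+)'\s*", line)
        if match is None:
            break  # first line that is not a lone quoted path ends the list
        paths.append(match.group(1))
    return paths


LOG = """\
this rule is missing dependency declarations for the following files included by 'tensorflow/cc/framework/cc_op_gen_main.cc':
  'bazel-out/k8-opt/bin/external/com_google_protobuf/google/protobuf/io/coded_stream.h'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
"""

for path in missing_inclusions(LOG):
    print(path)
```

Here the only reported header is the protobuf coded_stream.h under bazel-out.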
Are you using the updated easyblock? |
I am using the one in the latest EasyBuild, 4.7.0. Is there a newer one around? |
Yes, see the PR description: easybuilders/easybuild-easyblocks#2854 |
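One way to pick up the updated easyblock before it is released is to pull it in from its PR at build time. A minimal sketch, assuming the installed EasyBuild supports the eb options --from-pr, --include-easyblocks-from-pr and --robot; the helper name is hypothetical, and the command is only assembled and printed, not executed.

```python
import shlex
from typing import List


def eb_command(easyconfigs_pr: int, easyblocks_pr: int) -> List[str]:
    """Build an eb command line that takes easyconfigs and easyblocks from PRs."""
    return [
        "eb",
        "--from-pr", str(easyconfigs_pr),                   # this easyconfigs PR
        "--include-easyblocks-from-pr", str(easyblocks_pr), # updated easyblock
        "--robot",                                          # resolve dependencies
    ]


# PR numbers taken from this thread: #17241 and easybuild-easyblocks#2854.
print(shlex.join(eb_command(17241, 2854)))
```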
I see that flatbuffers is there, but it tells me this:
Can it be that we are missing the |
I put that into the main flatbuffers which is referenced in the description: #17114 So it doesn't require 2 dependencies (anymore) |
It works for me, installed on Jülich's Juwels Booster, Jureca DC, Juwels Cluster, Jusuf and the HDFML machine. |
Test report by @VRehnberg |
Test report by @smoors |
I'm running into
which looks similar to bazelbuild/bazel#15359. Has anyone seen that before? Did I forget to rebuild something, include some patch, ...? (NB: I am running with Update: I'm running with |
Test report by @branfosj |
Test report by @branfosj |
Test report by @casparvl |
Apart from LLVM, which I think you still intended to remove, all looks good to me now. We also have plenty of successful tests now.
The only thing not working is --rpath support, but I don't want to block this PR over that.
@boegelbot please test @ jsc-zen2 |
@casparvl: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1474245051 processed Message to humans: this is just bookkeeping information for me, |
Looks good to me. I'll give it one final test run just to make sure that the removal of LLVM as build dep doesn't affect anything. Once that completes, I'd consider this ready to be merged.
@boegelbot please test @ generoso |
@casparvl: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1474552675 processed Message to humans: this is just bookkeeping information for me, |
Test report by @boegelbot |
Test report by @casparvl |
Going in, thanks @Flamefire! |
Test report by @boegelbot |
(created using eb --new-pr)
[ ] Requires {lib}[foss/2021b] TensorFlow v2.8.4 w/ CUDA-11.4.1 and fix patches + extensions in easyconfig for TensorFlow 2.8.4 w/ foss/2021b #17058
[ ] Requires {lib}[foss/2022a] TensorFlow v2.9.1 w/ Python 3.10.4 #17092