-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First PyTorch tests for TorchScript inference CPU/CUDA #43475
Conversation
0bfcc4a
to
590cdd2
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43475/38037
|
A new Pull Request was created by @valsdav (Davide Valsecchi) for master. It involves the following packages:
The following packages do not have a category, yet: PhysicsTools/PyTorch @cmsbuild, @tvami can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
} | ||
} | ||
|
||
std::string testBasePyTorch::cmsswPath(std::string path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be feasible to use edm::FileInPath
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smuzaffar Should this binary file be placed in cms-data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid adding binary files to cmssw. It is better to move it to cms-data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm changing this to use the cmsml
docker image to generate the binary files for tests on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header file seems to be very tied to the testTorchSimpleDnn.cc
. Would it be feasible to just include the contents of the header in the source file? Or is the header expected to be used by multiple source files in the future?
Same question for testBaseCUDA.h
and testTorchSimpleDnnCUDA.cc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to follow the same approach as in the TensorFlow tests https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/TensorFlow/test/testBaseCUDA.h and I'm planning to add more tests with a similar structure
// runModel("/data/user/dvalsecc/simple_dnn.pt", cuda); | ||
|
||
// return 0; | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this commented out code still useful?
// runModel("/data/user/dvalsecc/simple_dnn.pt", cuda); | ||
|
||
// return 0; | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this commented out code still useful?
<bin name="testTorchTimeSeries" file="time_serie_prediction.cpp"> | ||
<use name="pytorch"/> | ||
<use name="pytorch-cuda"/> | ||
</bin> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smuzaffar Does this test binary definition need any of the <iftool name="cuda">
, or does the binary get ignored implicitly when cuda
is not available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests came from the @iarspider PR #41162
We can move them in the new PyTorch subpackage I think.
#include "FWCore/Utilities/interface/Exception.h" | ||
#include "FWCore/Utilities/interface/ResourceInformation.h" | ||
#include "HeterogeneousCore/CUDAServices/interface/CUDAInterface.h" | ||
#include "HeterogeneousCore/CUDAUtilities/interface/requireDevices.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This #include
seems to be unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is used to test the presence of devices if (!cms::cudatest::testDevices())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I see the call to cms::cudatest::testDevices()
is in testTorchSimpleDnnCUDA.cc
, so this #include
should be moved there.
On the other hand, that check is not really needed, because (nowadays) scram b runtests
will skip the test if the node has no GPU. And in the case of USER_UNIT_TESTS=cuda scram b runtests
we want the test program to fail. Therefore, I'd suggest to either remove the call to cms::cudatest::testDevices()
, or if you want more clear error message in case of devices not being available, use the not testDevices()
to print the clearer error message and fail the test program.
<use name="FWCore/ServiceRegistry"/> | ||
<use name="FWCore/Utilities"/> | ||
<use name="HeterogeneousCore/CUDAServices"/> | ||
<use name="HeterogeneousCore/CUDAUtilities"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the dependence on CUDAUtilities
is not really needed.
test parameters:
|
@cmsbuild, please test |
@valsdav coukd you add a |
Pull request #43475 was updated. @wpmccormack, @valsdav, @tvami can you please check and sign again. |
590cdd2
to
07c40c0
Compare
+1 Size: This PR adds an extra 20KB to repository Comparison SummarySummary:
GPU Comparison SummarySummary:
|
+ml |
+1
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
Just wanted to add that a README with docs and a more refined user-facing interface for PyTorch models will be added in a later PR. We wanted to merge this one adding basic tests for the pytorch inference functionality. |
assign core |
New categories assigned: core @Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
+Core |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
Are these unit test failures coming from this PR? |
Yes. The cause is in
i.e. we either need the container for Other tests in this package show a printout
Does PyTorch try to initialize CUDA every time? Is there a way to control the CUDA initialization explicitly? |
PR description:
This PR adds a new subpackage
PhysicsTools/PyTorch
, including the first tests for TorchScriptinference on CPU and CUDA.
The PR is built on top of #41162 and cmsdist PR cms-sw/cmsdist#8388, which is adding C++ PyTorch library support.
PR validation:
A simple model is exported in TorchScript and saved to disk. The model is loaded in cmssw and runned on CPU and CUDA.
The code to generate the model is included in the PR, but cannot be run directly in CMSSW, as we are not including the Python interface to PyTorch in cmsdist for the moment.
@smuzaffar @makortel @iarspider @wpmccormack