Metrics fails or wrong output device with CUDA #5919
Comments
Hi @yiheng-wang-nv, there seems to be an issue here. Could you please take a look and perhaps submit a PR to fix it? Thanks in advance. |
Hi @matteo-bastico , for
for |
Signed-off-by: Yiheng Wang <[email protected]>

Fixes #5919.

### Description
This PR unifies the input and output tensor devices for the following metrics:
1. HausdorffDistanceMetric
2. SurfaceDiceMetric
3. SurfaceDistanceMetric

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.
Hi @yiheng-wang-nv, the output of
|
Hi @matteo-bastico , thanks for posting the versions. I double checked the code, it seems the latest pytorch dev version (
Therefore, I did not encounter the same error. When I downgraded to 1.13.1, the same error occurred. |
Signed-off-by: Yiheng Wang <[email protected]>

Fixes #5919.

### Description
This PR fixes the device issue in the function `compute_generalized_dice`, so that CUDA tensor inputs no longer raise errors.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.
Describe the bug
Hello, some metrics have bugs when used with CUDA tensors. In particular, Generalized Dice Score fails with the following error when both `y_pred` and `y` are on the same CUDA device:
`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`
In contrast, the output of Average Surface Distance is always on the CPU, even when both inputs are on the GPU.
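The error message above is the one PyTorch raises when a tensor created on the default device (CPU) is combined with a CUDA tensor. The following is a minimal sketch of that general failure mode and the usual fix, not the actual MONAI code; the helper names `class_weights_buggy` and `class_weights_fixed` are hypothetical and exist only for illustration.

```python
import torch


def class_weights_buggy(y: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: the new tensor is created on the default device
    # (CPU), no matter where `y` lives.
    return torch.ones(y.shape[1])


def class_weights_fixed(y: torch.Tensor) -> torch.Tensor:
    # Same helper, but the new tensor inherits the device and dtype of `y`.
    return torch.ones(y.shape[1], device=y.device, dtype=y.dtype)


# Fall back to CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
y = torch.ones(2, 3, 4, 4, device=device)  # (batch, class, H, W)

# Per-class sums, shape (batch, class), located on y.device.
sums = y.sum(dim=(2, 3))

# Stays on y.device because the weights were created there.
scores = sums * class_weights_fixed(y)
```

On a CUDA machine, replacing `class_weights_fixed` with `class_weights_buggy` in the last line mixes a `cuda:0` tensor with a CPU tensor and triggers the device-mismatch `RuntimeError`; on a CPU-only machine both variants run, which is why this class of bug is easy to miss in CPU-only tests.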
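To Reproduce
For Generalized Dice Score: computing the metric with `y_pred` and `y` on the same CUDA device will raise the `RuntimeError`.
For Average Surface Distance: computing the metric with both inputs on the GPU will return a `tensor([[...]], dtype=torch.float64)` instead of `tensor([[...]], device='cuda:0')` (see Mean Dice for correct behavior).
Expected behavior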
The metrics are computed without raising errors when the inputs are on the GPU, and the returned output is on the same device as the inputs.
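One way to express this expectation as a reusable check is sketched below. It assumes the metric is a callable taking `(y_pred, y)` and returning a single tensor; `check_output_device` and `toy_overlap` are hypothetical names introduced here, and `toy_overlap` is a stand-in, not a MONAI metric.

```python
import torch


def check_output_device(metric_fn, y_pred, y):
    """Run metric_fn(y_pred, y) and verify the result is on the inputs' device."""
    out = metric_fn(y_pred, y)
    if out.device != y_pred.device:
        raise AssertionError(
            f"output is on {out.device}, but inputs are on {y_pred.device}"
        )
    return out


def toy_overlap(y_pred, y):
    # Stand-in metric: fraction of matching elements per sample and class.
    return (y_pred == y).float().mean(dim=(2, 3))


# Fall back to CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
y_pred = torch.ones(2, 3, 4, 4, device=device)  # (batch, class, H, W)
y = torch.ones(2, 3, 4, 4, device=device)

result = check_output_device(toy_overlap, y_pred, y)
```

Passing a real metric function in place of `toy_overlap` would turn this into a regression check for the behavior described in this issue, provided it is run on a machine where CUDA is available.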
Environment
Monai: 1.1.0