
CUDARelativeDifferencePrior #1408

Merged · 51 commits · Jul 8, 2024

Conversation

Imraj-Singh (Contributor)

First attempt at integrating the CudaRelativeDifferencePrior class into STIR. The idea was to create a child of RelativeDifferencePrior that overrides the compute_value/compute_gradient methods with CUDA-accelerated counterparts, and then to add a set_up_cuda method. This is in a first-draft state with work left to do:

  1. There are a lot of warnings of the form "-- virtual function override intended?".
  2. Only a 3x3x3 neighbourhood size is implemented.
  3. Threads and blocks sizes are not set well.
  4. The set_up_cuda is a bit clunky (I did have it override the set-up previously but got a bit confused with CRTP things)

At present it does run, with a test comparing outputs of the CPU/GPU versions. I could include the test in the pull request, but I need to spend some time turning it into a ctest, I guess.

@KrisThielemans (Collaborator) left a comment

Thanks!

There are a lot of warnings with regards to -- virtual function override intended?.

please give an example

The set_up_cuda is a bit clunky (I did have it override the set-up previously but got a bit confused with CRTP things)

This won't help, as STIR will never call set_up_cuda. There's too much CRTP going on here (if any). Do you mean calling the baseclass set_up()?

I think it'd make more sense to use recon_buildblock/CUDA than CUDA_stir (why have stir twice?)

CMakeLists.txt Outdated
Comment on lines 212 to 222
find_package(CUDAToolkit)
if (CUDAToolkit_FOUND)
set(CMAKE_CUDA_ARCHITECTURES "all")
find_package(CUDA REQUIRED)
include_directories("${CUDA_INCLUDE_DIRS}")
set(STIR_WITH_STIR_CUDA ON)
if(STIR_WITH_STIR_CUDA)
enable_language(CUDA)
message(STATUS "STIR CUDA support enabled. Using CUDA version ${CUDAToolkit_VERSION}.")
endif()
endif()
Collaborator

I think this is the other way around. I'd probably use https://cmake.org/cmake/help/latest/module/CheckLanguage.html#module:CheckLanguage first, if present, then enable_language, then find_package(CUDAToolkit) which probably needs a REQUIRED.

Before all this, I'd do a check on CMAKE_VERSION. Certainly at least 3.18, but 3.23 if we want to use "all" for [CMAKE_CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html#prop_tgt:CUDA_ARCHITECTURES) (which we do). If the CMake version is too old, generate a FATAL_ERROR with a message to set DISABLE_STIR_CUDA=ON.
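A sketch of that ordering (untested; the option names DISABLE_STIR_CUDA and STIR_WITH_CUDA are taken from this PR, everything else is an assumption):

```cmake
# Hedged sketch only -- not the final CMakeLists.txt logic.
if(NOT DISABLE_STIR_CUDA)
  # "all" for CMAKE_CUDA_ARCHITECTURES needs CMake 3.23
  if(CMAKE_VERSION VERSION_LESS 3.23)
    message(FATAL_ERROR "CMake >= 3.23 is required for CUDA support. "
                        "Set DISABLE_STIR_CUDA=ON to build without CUDA.")
  endif()
  include(CheckLanguage)
  check_language(CUDA)          # sets CMAKE_CUDA_COMPILER if a compiler is found
  if(CMAKE_CUDA_COMPILER)
    enable_language(CUDA)
    set(CMAKE_CUDA_ARCHITECTURES "all")
    find_package(CUDAToolkit REQUIRED)
    set(STIR_WITH_CUDA ON)
    message(STATUS "STIR CUDA support enabled. Using CUDA version ${CUDAToolkit_VERSION}.")
  else()
    message(WARNING "No CUDA compiler found; building without CUDA support.")
  endif()
endif()
```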

Collaborator

Also, don't use include_directories but target_include_directories

Contributor Author

done here: d3f3732

Contributor Author

It's this line:

if (STIR_WITH_CUDA)
  target_include_directories(stir_registries PRIVATE ${CUDA_INCLUDE_DIRS})
endif()

src/cmake/STIRConfig.cmake.in (resolved)

START_NAMESPACE_STIR

static void
Collaborator

let's not repeat those. We can make them static members of RelativeDifferencePrior or wherever they need to be

Contributor Author

I guess this is with regard to computing the weights, @KrisThielemans?

Collaborator

yes

@Imraj-Singh (Contributor Author)

There are a lot of warnings with regards to -- virtual function override intended?.

please give an example
Looks like this:

/home/user/sirf/STIR/src/include/stir/Array.h(177): warning #997-D: overloaded function "stir::VectorWithOffset<T>::resize [with T=stir::Array<1, float>]" is hidden by "stir::Array<num_dimensions, elemT>::resize [with num_dimensions=2, elemT=float]" -- virtual function override intended?
          detected during:
            instantiation of class "stir::Array<num_dimensions, elemT> [with num_dimensions=2, elemT=float]" 
(109): here
            instantiation of class "stir::Array<num_dimensions, elemT> [with num_dimensions=3, elemT=float]" 
/home/user/sirf/STIR/src/include/stir/recon_buildblock/RelativeDifferencePrior.h(177): here

/home/user/sirf/STIR/src/include/stir/Array.h(180): warning #997-D: overloaded function "stir::VectorWithOffset<T>::grow [with T=stir::Array<2, float>]" is hidden by "stir::Array<num_dimensions, elemT>::grow [with num_dimensions=3, elemT=float]" -- virtual function override intended?
          detected during instantiation of class "stir::Array<num_dimensions, elemT> [with num_dimensions=3, elemT=float]" 
/home/user/sirf/STIR/src/include/stir/recon_buildblock/RelativeDifferencePrior.h(177): here

/home/user/sirf/STIR/src/include/stir/Array.h(177): warning #997-D: overloaded function "stir::VectorWithOffset<T>::resize [with T=stir::Array<2, float>]" is hidden by "stir::Array<num_dimensions, elemT>::resize [with num_dimensions=3, elemT=float]" -- virtual function override intended?
          detected during instantiation of class "stir::Array<num_dimensions, elemT> [with num_dimensions=3, elemT=float]" 
/home/user/sirf/STIR/src/include/stir/recon_buildblock/RelativeDifferencePrior.h(177): here

/home/user/sirf/STIR/src/include/stir/recon_buildblock/RelativeDifferencePrior.h(152): warning #997-D: function "stir::GeneralisedPrior<DataT>::set_up(const std::shared_ptr<const DataT> &) [with DataT=stir::DiscretisedDensity<3, float>]" is hidden by "stir::RelativeDifferencePrior<elemT>::set_up [with elemT=float]" -- virtual function override intended?
          detected during:
            instantiation of class "stir::RelativeDifferencePrior<elemT> [with elemT=float]" 
/home/user/sirf/STIR/src/include/stir/recon_buildblock/CUDA/CudaRelativeDifferencePrior.h(12): here
            instantiation of class "stir::CudaRelativeDifferencePrior<elemT> [with elemT=float]" 
/home/user/sirf/STIR/src/recon_buildblock/CUDA/CudaRelativeDifferencePrior.cu(376): here

@KrisThielemans (Collaborator)

ok. Nothing to do with this PR. Would you mind creating a separate issue for it? Please add which compiler you're using. (It's a false positive, but would be nice to have a clean way to prevent the warning)

@Imraj-Singh (Contributor Author)

ok. Nothing to do with this PR. Would you mind creating a separate issue for it? Please add which compiler you're using. (It's a false positive, but would be nice to have a clean way to prevent the warning)

I am having some issues getting it to register correctly in the stir registries. Perhaps it's related to that? I am not sure I am correctly making CudaRelativeDifferencePrior a child of RelativeDifferencePrior. I am a bit hesitant to open an issue as I think something is a bit wrong at the moment.

CMakeLists.txt (outdated, resolved)
@@ -244,6 +244,9 @@ add_library(stir_registries OBJECT ${STIR_REGISTRIES})
# TODO, really should use stir_libs.cmake
target_include_directories(stir_registries PRIVATE ${STIR_INCLUDE_DIR})
target_include_directories(stir_registries PRIVATE ${Boost_INCLUDE_DIR})
if (STIR_WITH_CUDA)
target_include_directories(stir_registries PRIVATE ${CUDA_INCLUDE_DIRS})
Collaborator

I really hope we don't need CUDA_INCLUDE_DIRS for the registries. STIR .h files ideally have no CUDA includes at all. It's going to create trouble.

Contributor Author

I thought we'd want the target to be the registries, as we need cuda_runtime.h when building CUDA files that will be in the registry... Also, I couldn't quite figure out where else to put the target_include_directories; where would you suggest?

@KrisThielemans (Collaborator), May 13, 2024

As far as I can see, cuda_runtime.h should NOT be included in src/include/stir/recon_buildblock/CUDA/CudaRelativeDifferencePrior.h, but in the .cxx/.cu. That means that the registries wouldn't need to know about the CUDA include dir, and therefore you don't need the target_include_directories at all. (The .cu files need it, but they get it by depending on the CUDA::cudart target.)


#include "stir/recon_buildblock/RelativeDifferencePrior.h"
#include "stir/DiscretisedDensity.h"
#include <cuda_runtime.h>
Collaborator

move to cxx

Contributor Author

this is done, but dim3 is CUDA-specific; I changed it to a native struct and removed the cuda_runtime include. Aren't the other two needed?

@KrisThielemans (Collaborator)

ok. Nothing to do with this PR. Would you mind creating a separate issue for it? Please add which compiler you're using. (It's a false positive, but would be nice to have a clean way to prevent the warning)

I am having some issues getting it to register correctly in the stir registries. Perhaps it's related to that?

no

I am not sure I am correctly making CudaRelativeDifferencePrior a child of RelativeDifferencePrior. I am a bit hesistant to open a issue as I think something is a bit wrong at the moment.

Please show errors/problems. I have no time to try this myself. sorry.

@Imraj-Singh (Contributor Author)

ok. Nothing to do with this PR. Would you mind creating a separate issue for it? Please add which compiler you're using. (It's a false positive, but would be nice to have a clean way to prevent the warning)

I am having some issues getting it to register correctly in the stir registries. Perhaps it's related to that?

no

I am not sure I am correctly making CudaRelativeDifferencePrior a child of RelativeDifferencePrior. I am a bit hesistant to open a issue as I think something is a bit wrong at the moment.

Please show errors/problems. I have no time to try this myself. sorry.

Thanks @KrisThielemans

I was running ./STIR/build/src/utilities/stir_list_registries and did not see the prior under GeneralisedPrior<DiscretisedDensity<3,float>>. The output looks like this:

WARNING: FactoryRegistry:: overwriting previous value of key in registry.
     key: None
------------ ProjectorByBinPair --------------
Matrix
None
Parallelproj
Separate Projectors
------------ ForwardProjectorByBin --------------
Matrix
None
Parallelpr etc etc

Is the overwriting causing the issue, or have I not updated all the necessary files to register it correctly? Part of my concern is whether I updated everything needed for STIR to include the new class in the registries. This is all quite new and pretty confusing to me.

@KrisThielemans (Collaborator)

Do you have the equivalent of https://github.com/UCL/STIR/blob/8ced2d73933420457e0bc76074964b7d4ff00f0c/src/recon_buildblock/RelativeDifferencePrior.cxx#L140C1-L141C97 ?
Maybe these days this can be done in the .h file (which would be cleanest):

// Explicit template instantiations
template class stir::CudaRelativeDifferencePrior<float>;
template <typename elemT>
const char* const CudaRelativeDifferencePrior<elemT>::registered_name = "Cuda Relative Difference Prior";
Contributor Author

@KrisThielemans yes, I added an overwrite of it

Collaborator

please use the exact same syntax as in the lines quoted above; C++ is a bit tricky there. (Or try and see what happens if you put the initialiser in the .h file.) If that doesn't work, comment out the one in RelativeDifferencePrior, or remove that from the registry completely. I don't think we've ever tried deriving from an existing class in the registry.

Contributor Author

@KrisThielemans is the reason for not deriving from an existing class in the registry that it's just a bad idea? I couldn't think of a better one...

Collaborator

Not sure what you mean. Ideally we can derive from an existing class, but if it doesn't work, we'll have to find another way...

So first thing to do is to fix your syntax and see what happens.

@KrisThielemans (Collaborator)

@Imraj-Singh any chance you can pick this up? I think my suggestions might resolve your trouble. Also, it'll need a merge with master of course.

I'd like to move the main CUDA support stuff elsewhere...

@Imraj-Singh (Contributor Author)

@Imraj-Singh any chance you can pick this up? I think my suggestions might resolve your trouble. Also, it'll need a merge with master of course.

I'd like to move the main CUDA support stuff elsewhere...

Just back; I will spend some time on this over the next couple of days.

@Imraj-Singh (Contributor Author)

@KrisThielemans I am still having trouble registering the class derived from an already-registered class... The error is below.

/usr/bin/ld: ../recon_buildblock/librecon_buildblock.a(CudaRelativeDifferencePrior.cu.o):(.data.rel.ro.local+0x0): multiple definition of `stir::CudaRelativeDifferencePrior::registered_name'; ../CMakeFiles/stir_registries.dir/recon_buildblock/recon_buildblock_registries.cxx.o:(.data.rel.ro.local+0x0): first defined here
collect2: error: ld returned 1 exit status
gmake[3]: *** [src/utilities/CMakeFiles/stir_list_registries.dir/build.make:220: src/utilities/stir_list_registries] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:3108: src/utilities/CMakeFiles/stir_list_registries.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:3115: src/utilities/CMakeFiles/stir_list_registries.dir/rule] Error 2
gmake: *** [Makefile:894: stir_list_registries] Error 2

I'll try removing the registration in RelativeDifferencePrior.

@Imraj-Singh (Contributor Author)

@KrisThielemans I tried removing the registration of RelativeDifferencePrior but am still getting errors. FYI, I took all the content out of the CUDA files to try to diagnose the issue.

@KrisThielemans (Collaborator)

/usr/bin/ld: ../recon_buildblock/librecon_buildblock.a(CudaRelativeDifferencePrior.cu.o):(.data.rel.ro.local+0x0): multiple definition of stir::CudaRelativeDifferencePrior::registered_name;

This is because your initialisation of registered_name is in the .h file. That means it will exist in every .cxx/.cu file that includes it, so there are multiple definitions.

Possible solutions:

  • move it to the .cu
  • move it to a new .cxx
  • try to initialise inside the class definition, not outside. This apparently needs C++17 (https://en.cppreference.com/w/cpp/language/static), but we've moved to that anyway (you might have to merge master):
    template <typename elemT>
    class CudaRelativeDifferencePrior : public RelativeDifferencePrior<elemT> {
      public:
          using RelativeDifferencePrior<elemT>::RelativeDifferencePrior;
          // Name which will be used when parsing a GeneralisedPrior object
          inline static const char* const registered_name = "CudaRelativeDifferencePrior";
          ...

If the latter works, it's my preferred solution (by far clearest for the user), and we could do it for the rest of STIR now as well.

@KrisThielemans (Collaborator)

Also, please add check_language(CUDA) and, if it isn't there, set DISABLE_STIR_CUDA=ON, possibly with a warning. At present, if you don't have CUDA, it just fails without any hint of what to do.

Problems with numerical gradients.

I had to revert the change to
GeneralisedPrior::set_penalisation_factor which set
_already_set_up to false, as that made test_priors fail
on all priors (as the test changes the penalisation factor)
@KrisThielemans (Collaborator)

This should be nearly done now. For a 256x251x211 image with 2 cylinders, cuda_rdp_tests gives

norms: diff of gradients: 5.09262e-07 org: 64.8277 rel: 7.85563e-09
The Prior Value (CUDA) = 3385.94
The Prior Value (CPU) = 3385.94
Difference = 3.06435e-05
Value CUDA time = 1.35823s
Value CPU time = 8.6639s
Gradient CUDA time = 0.555103s
Gradient CPU time = 12.3153s

It also works for a smaller image with much more structure.

Sadly, I have problems in test_priors.

----- test Cuda Relative Difference Prior_no_kappa_with_eps  --> Gradient

INFO: Computing gradient

INFO: Computing objective function at target

INFO: Computing gradient of objective function by numerical differences (this will take a while)
Error : unequal values are 1.71268 and 1.61074. gradient
Numerical gradient test failed with for Cuda Relative Difference Prior_no_kappa_with_eps
Writing diagnostic files Cuda Relative Difference Prior_no_kappa_with_eps_target.hv, *gradient.hv (and *numerical_gradient.hv if full gradient test is used)
----- test Cuda Relative Difference Prior_no_kappa_with_eps  --> Hessian-vector product for convexity
----- test Cuda Relative Difference Prior_no_kappa_with_eps  --> Hessian against numerical
Error : unequal values are 1.19587 and 1.21844. Hessian
Numerical-Hessian test failed with for Cuda Relative Difference Prior_no_kappa_with_eps prior
----- testing Hessian-vector product (accumulate_Hessian_times_input)

INFO: Comparing Hessian*dx with difference of gradients

INFO: Computing gradient

ERROR: CUDA error in compute_gradient kernel execution: invalid argument
terminate called after throwing an instance of 'std::runtime_error'
  what():
ERROR: CUDA error in compute_gradient kernel execution: invalid argument

Aborted

Doing a visual check on the analytic vs numerical gradient indicates they are very similar, but numerically the diff is rather large.
[image: analytic vs numerical gradient side by side, with their difference]
(left images have range ~[-15.5,5.6] while the diff on the right has range ~[-.5,5])

The kernel invalid argument is in

compute-sanitizer --tool memcheck src/recon_test/test_priors
...
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemcpy.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x34d163]
=========                in /usr/lib/wsl/drivers/nvdd.inf_amd64_fa2520b946613838/libcuda.so.1.1
=========     Host Frame:cudaMemcpy [0x6b8aa]
=========                in /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12
=========     Host Frame:void stir::array_to_device<3, float>(float*, stir::Array<3, float> const&) [0x28b8a4]
=========                in /home/kris/devel/buildCondaGT/builds/STIR/build/src/recon_test/test_priors
=========     Host Frame:stir::CudaRelativeDifferencePrior<float>::compute_gradient(stir::DiscretisedDensity<3, float>&, stir::DiscretisedDensity<3, float> const&) [0x28bab3]

I don't know why.

@KrisThielemans (Collaborator)

Note that the test image is very small (8x9x10). Not sure if that matters or not. @Imraj-Singh ?

- there was a missing epsilon in the gradient kernel
- abs was used, which is potentially dangerous, so now using std::abs
- switch value back to double, as otherwise test_priors still failed
The Hessian() functions all incorrectly threw an error if kappa
was alright.

I've now moved this test into the check() functions, cleaning
code a little bit.
- tests were never run with kappa
- RDP limit tests with kappa were wrong

Still numerical problems for the Hessian test for RDP
@KrisThielemans (Collaborator)

The above problems have now been resolved. 2 things remaining

  • the numerical Hessian test for the RDP fails with kappa (independent of the CUDA stuff)
  • GHA cannot run CUDA, so we have to disable the tests for CudaRDP in test_priors if there's no CUDA run-time

RDP test was failing with kappa as eps was too large.
Now it is set relative to the image max.
CUDA run-time is not available on GitHub Actions,
so we need to be able to disable those tests.
@KrisThielemans (Collaborator)

Everything fine now, except that there's no CUDA run-time on GHA, so we have to disable that test somehow.

@KrisThielemans (Collaborator)

All good now! (except a Zenodo time-out)

@KrisThielemans (Collaborator)

Should be done now. I've added the timings to stir_timings. With default mMR image size, I get on my desktop (AMD Ryzen 9 5900 12-Core, RTX 3070, default num_threads=21) (timings in ms, first is CPU-time, second is wall-clock time)

        RDP_value                                                637.000                         524.449
        RDP_grad                                                4607.667                         879.874
        Cuda_RDP_value                                          1732.778                          86.533
        Cuda_RDP_grad                                            919.778                          46.042

compared to parallelproj forward proj 225ms and back proj 573ms.

Thanks @Imraj-Singh!

@KrisThielemans (Collaborator) left a comment

All done

@KrisThielemans KrisThielemans merged commit b153fda into UCL:master Jul 8, 2024
8 of 9 checks passed