Faster hessian_matrix_* and structure_tensor_eigvals via analytical eigenvalues for the 3D case #434
Conversation
Thank you!
I can't go over this PR in detail due to a lack of knowledge of this algorithm, but it looks good to me.
Benchmark result for
If we instead benchmark the full process of getting the eigenvalues from the original image rather than its hessian (i.e. benchmarking
More than 100x improvement in runtime in 3D and more than 10x reduction in memory used
…ernel avoids some instances of nan occurring in 3d ridge filter test cases
…obtaining the eigenvalues sorted by magnitude
Force-pushed from 87f6dab to 26982a2
I had to update the eigenvalue kernels to always use
@gpucibot merge
closes #354
This PR implements faster 2D and 3D pixelwise eigenvalue computations for hessian_matrix_eigvals, structure_tensor_eigvals and hessian_matrix_det. The 2D case already had a fairly fast code path, but it is further improved here by switching from a fused kernel to an elementwise kernel, which removes the need for a separate call to cupy.stack. In 3D, runtime is reduced by ~30x for float32 and by more than 100x for float64. The 3D case also uses far less memory than before (more than a 20x reduction). For example, computing the eigenvalues for a (128, 128, 128) float32 array would previously run out of memory even on an A6000 (40GB); with the changes here, it works even for 64x larger data of shape (512, 512, 512).

Functions that benefit from this are:
cucim.skimage.feature.hessian_matrix_det
cucim.skimage.feature.hessian_matrix_eigvals
cucim.skimage.feature.structure_tensor_eigenvalues
cucim.skimage.feature.shape_index
cucim.skimage.feature.multiscale_basic_features
cucim.skimage.filters.meijering
cucim.skimage.filters.sato
cucim.skimage.filters.frangi
cucim.skimage.filters.hessian
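The 3D speedup comes from evaluating the eigenvalues of each voxel's symmetric 3x3 Hessian (or structure tensor) analytically, instead of calling a batched linear-algebra solver. A minimal NumPy sketch of the standard trigonometric closed form (Smith, 1961) is below; this is an illustration of the math, not the PR's actual CUDA kernel, and the function name and argument layout are my own. It assumes the matrix is not already a multiple of the identity (p > 0).

```python
import numpy as np

def eigvals_sym3(a, b, c, d, e, f):
    """Eigenvalues of the symmetric 3x3 matrix [[a, b, c], [b, d, e], [c, e, f]].

    Vectorized over arrays of matrix entries (one matrix per pixel).
    Returns the eigenvalues in descending order.
    """
    p1 = b**2 + c**2 + e**2
    q = (a + d + f) / 3.0                       # trace(A) / 3
    p2 = (a - q)**2 + (d - q)**2 + (f - q)**2 + 2.0 * p1
    p = np.sqrt(p2 / 6.0)                       # assumes p > 0 (A != q * I)
    # B = (A - q * I) / p; r = det(B) / 2 lies in [-1, 1] up to rounding
    ba, bd, bf = (a - q) / p, (d - q) / p, (f - q) / p
    bb, bc, be = b / p, c / p, e / p
    r = (ba * (bd * bf - be**2)
         - bb * (bb * bf - be * bc)
         + bc * (bb * be - bd * bc)) / 2.0
    r = np.clip(r, -1.0, 1.0)                   # guard against rounding error
    phi = np.arccos(r) / 3.0
    lam1 = q + 2.0 * p * np.cos(phi)                      # largest
    lam3 = q + 2.0 * p * np.cos(phi + 2.0 * np.pi / 3.0)  # smallest
    lam2 = 3.0 * q - lam1 - lam3                          # via trace identity
    return lam1, lam2, lam3
```

Because each voxel's eigenvalues depend only on that voxel's six unique matrix entries, the formula maps directly onto an elementwise GPU kernel with no intermediate (..., 3, 3) array, which is where the memory savings come from.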
Independently of the above, the function cucim.skimage.measure.inertia_tensor_eigvals was updated with custom kernels so it can operate purely on the GPU for the 2D and 3D cases (formerly these used copies to/from the host). These operate on tiny arrays, so they use only a single GPU thread. Despite the lack of parallelism, this has lower overhead than a round-trip host/device transfer. This will also speed up region properties that make use of these eigenvalues (e.g. the axis_major_length and axis_minor_length properties for regionprops_table).