Faster hessian_matrix_* and structure_tensor_eigvals via analytical eigenvalues for the 3D case #434
Conversation
Thank you!
I can't go over this PR in detail due to a lack of knowledge of this algorithm, but it looks good to me.
Benchmark result for
If we instead benchmark the full process of getting the eigenvalues from the original image rather than its hessian (i.e. benchmarking
More than 100x improvement in runtime in 3D and more than 10x reduction in memory used
…ernel avoids some instances of nan occurring in 3d ridge filter test cases
…obtaining the eigenvalues sorted by magnitude
Force-pushed from 87f6dab to 26982a2
I had to update the eigenvalue kernels to always use
@gpucibot merge
closes #354
This PR implements faster 2D and 3D pixelwise eigenvalue computations for hessian_matrix_eigvals, structure_tensor_eigvals and hessian_matrix_det. The 2D case already had a fairly fast code path, but it is further improved here by switching from a fused kernel to an elementwise kernel, which removes the need for a separate call to cupy.stack. In 3D, runtime is reduced by ~30x for float32 and by more than 100x for float64. The 3D case also uses far less memory than before (more than a 20x reduction). For example, computing the eigenvalues for a (128, 128, 128) float32 array would previously run out of memory even on an A6000 (40GB); with the changes here, it works even for 64x larger data of shape (512, 512, 512).

Functions that benefit from this are:
cucim.skimage.feature.hessian_matrix_det
cucim.skimage.feature.hessian_matrix_eigvals
cucim.skimage.feature.structure_tensor_eigenvalues
cucim.skimage.feature.shape_index
cucim.skimage.feature.multiscale_basic_features
cucim.skimage.filters.meijering
cucim.skimage.filters.sato
cucim.skimage.filters.frangi
cucim.skimage.filters.hessian
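The 3D speedup comes from evaluating the eigenvalues of each voxel's symmetric 3x3 Hessian (or structure tensor) analytically, instead of calling a batched linear-algebra solver. A minimal NumPy sketch of the standard trigonometric closed form (Smith, 1961) is below; this is an illustration of the math, not the PR's actual CUDA kernel, and the function name and argument layout are my own. It assumes the matrix is not already a multiple of the identity (p > 0).

```python
import numpy as np

def eigvals_sym3(a, b, c, d, e, f):
    """Eigenvalues of the symmetric 3x3 matrix [[a, b, c], [b, d, e], [c, e, f]].

    Vectorized over arrays of matrix entries (one matrix per pixel).
    Returns the eigenvalues in descending order.
    """
    p1 = b**2 + c**2 + e**2
    q = (a + d + f) / 3.0                       # trace(A) / 3
    p2 = (a - q)**2 + (d - q)**2 + (f - q)**2 + 2.0 * p1
    p = np.sqrt(p2 / 6.0)                       # assumes p > 0 (A != q * I)
    # B = (A - q * I) / p; r = det(B) / 2 lies in [-1, 1] up to rounding
    ba, bd, bf = (a - q) / p, (d - q) / p, (f - q) / p
    bb, bc, be = b / p, c / p, e / p
    r = (ba * (bd * bf - be**2)
         - bb * (bb * bf - be * bc)
         + bc * (bb * be - bd * bc)) / 2.0
    r = np.clip(r, -1.0, 1.0)                   # guard against rounding error
    phi = np.arccos(r) / 3.0
    lam1 = q + 2.0 * p * np.cos(phi)                      # largest
    lam3 = q + 2.0 * p * np.cos(phi + 2.0 * np.pi / 3.0)  # smallest
    lam2 = 3.0 * q - lam1 - lam3                          # via trace identity
    return lam1, lam2, lam3
```

Because each voxel's eigenvalues depend only on that voxel's six unique matrix entries, the formula maps directly onto an elementwise GPU kernel with no intermediate (..., 3, 3) array, which is where the memory savings come from.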
Independently of the above, the function cucim.skimage.measure.inertia_tensor_eigvals was updated with custom kernels so it can operate purely on the GPU for the 2D and 3D cases (formerly these used copies to/from the host). These operate on tiny arrays, so they use only a single GPU thread. Despite the lack of parallelism, this has lower overhead than a round-trip host/device transfer. This will also speed up region properties that make use of these eigenvalues (e.g. the axis_major_length and axis_minor_length properties for regionprops_table).