-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Serialize+parallelize 4D apply()
into 3D+t and add 'low memory' loading
#215
Conversation
0b2302f
to
b922fa5
Compare
b922fa5
to
86b3d11
Compare
…reenable-parallelization-apply-214
3D transform chains resulting of composing several transformations (e.g., affine and deformation fields in spatial normalization) should not be split into its components. This is in contrast to lists of 3D transforms such as head-motion correcting affines, where each applies to one timepoint. These should be considered 4D and in some future they may integrate slice timing correction in them.
apply()
into 3D+t including parallelizationapply()
into 3D+t
de81e23
to
8dd883d
Compare
|
…llelization-apply-214
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #215 +/- ##
==========================================
+ Coverage 94.39% 94.76% +0.36%
==========================================
Files 15 15
Lines 1713 1756 +43
Branches 323 328 +5
==========================================
+ Hits 1617 1664 +47
+ Misses 79 76 -3
+ Partials 17 16 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
…214-parallel ENH: Parallelize serialized 3D+t transforms
ed1ffc8
to
026a10a
Compare
a659355
to
7dcc78d
Compare
Resolves: #218. Co-authored-by: Chris Markiewicz <[email protected]>
…214-dtypes ENH: Implement a memory limitation mechanism in loading data
apply()
into 3D+tapply()
into 3D+t and add 'low memory' loading
@effigies I'm not planning on adding any more features within this PR, it is safe to start reviewing :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the code looks like it should work. I think we're pushing a surprising amount of process control logic into this library, and would suggest seeing if it can be avoided. In particular, I don't think we should choose the number of threads, and I would try not to use multiprocessing.
The tests, I'm assuming, are mostly copied from their original locations? It's a lot to look over in detail. If there are new bits among large blocks of old, could you highlight them for review?
nitransforms/resampling.py
Outdated
cval: float = 0.0, | ||
prefilter: bool = True, | ||
output_dtype: np.dtype = None, | ||
serialize_nvols: int = SERIALIZE_VOLUME_WINDOW_WIDTH, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no parameter in the docstring, and the whole concept of serialization here is non-obvious. To the best of my understanding, the idea is that you want to parallelize over volumes but only if you hit a threshold that justifies the overhead of multiprocessing.
Have you compared the performance of multiprocessing versus async workers? If you're able to async or thread, you will have much less overhead and might not need this additional concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it is a great interpretation.
There's something not so explicit: if you want real 4D interpolation (e.g., head motion + slice timing corrections) then you want to set serialize_nvols = np.inf
because you can't split by volume (we've talked about doing this in a moving window, and then this parameter would map onto the width of that window).
Happy to document better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added this parameter's description in the docstring, as it was missing. LMKWYT
Yes, and added parameters to exercise new branches.
Indeed, thanks for taking the time. I'll first see how to remove the process pool and then make a final pass to identify areas that may require focused attention. Thanks a lot :) |
436ae1d
to
b42b172
Compare
Co-authored-by: Chris Markiewicz <[email protected]>
b42b172
to
063e1f0
Compare
# Number of data volumes | ||
data_nvols = 1 if spatialimage.ndim < 4 else spatialimage.shape[-1] | ||
# Number of transforms: transforms chains (e.g., affine + field, are a single transform) | ||
xfm_nvols = 1 if transform.ndim < 4 else len(transform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is new in this PR. Julien introduced the concept of transform.ndim
to differentiate 3D transforms that require four parameters (e.g., displacements fields are typically stored as 4D NIfTIs, but are still 3D transforms) from 4D transformations (e.g., head-motion correction affines).
Following the logic of the previous version, the output is a 4D array if:
- The input image is 4D and the transform is 3D (e.g., BOLD series and coregistration to T1w image), the same transform is applied to the series.
- The input image is 3D and the transform is 4D (e.g., mapping a fieldmap onto all volumes of BOLD series), the same moving image is transformed through the transforms series.
- The input image is 4D and the transform is 4D (e.g., head motion correction).
All these 4D transforms can be called "3D+t" if the coordinates in time are not moving (e.g., no slice-timing correction). If the transform is 3D+t, then we can serialize (i.e., apply one-by-one) and use an embarrassingly parallel approach for concurrence.
Julien also implemented 4D transforms -- this was new in his PR and would require more thorough testing (esp. testing moving coordinates in time). This code is executed if serialize_4d is not True
.
Seems to be working and the issues have been addressed - merging. Happy to get back to this in future post-merge reviews. |
Re-implements serialization (i.e., splitting a 4D transformation into 3D transformations). For time-dependent transforms (e.g., when also slice-time correcting), parallelization will require a different partition (spatial and/or temporal windowing).