Add support for efficiently reprojecting multiple images with the same wcs #332
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main     #332      +/-   ##
==========================================
+ Coverage   93.58%   94.15%   +0.57%
==========================================
  Files          24       24
  Lines         717      787      +70
==========================================
+ Hits          671      741      +70
  Misses         46       46
This is great @svank, thank you!
I agree with putting the onus on the user to construct the stacked array; otherwise we would indeed need to somehow compare WCSes, etc. However, could you put an example in the docs showing how to do this?
Once this is done I would be happy to merge. I'll wait a few days anyway in case @entaylor wants to try this out.
@svank - please ping me once you've added the docs!
@svank - when you work on this next, could you rebase? (the CI issues should be fixed)
Force-pushed 7ad360d to 93064af
@astrofrog Done! Thanks for reviewing!
Thanks! After a quick discussion with @Cadair, I wanted to bring up a point related to the API. At the moment, this PR might tie us into having the leading dimensions be the ones to ignore, but there are some other use cases we might want to consider. For example, one might want to pass a 3D spectral+celestial cube and reproject just the celestial or spectral part. In that case, we might want a way to specify the input WCS as being 3D, give the part of the WCS we want to reproject as the target WCS, and have reproject figure out internally which dimensions to ignore/loop over.

This might not be in conflict with what is done in this PR, because here I think you are saying that leading dimensions are ignored compared to the dimensionality of the input WCS. However, I wonder if we should allow the dimensions to be ignored to be configurable (defaulting to extra leading dimensions). Then again, as long as we know we can add that option in the future, maybe what is here is fine. At the least, I think we should make it clear that what matters here is the dimensionality of the data compared to the dimensionality of the input WCS.
Glad the API choice is getting some review. I think you're right that if we're comfortable defaulting to taking the leading dimensions as the extra loop dimensions (and that feels sensible to me), this PR doesn't preclude adding additional flexibility later. Right now, the input and output WCS are required to have the same number of dimensions; this was already explicitly checked for.

What I'm interpreting as the future API you're envisioning is matching up the output WCS dimensions to input WCS dimensions to determine the transformation, and looping over any other input dimensions, whether 'extra' dimensions not in the input WCS or input WCS dimensions not present in the output WCS, maybe with some sort of option to configure that.

So if I make sure the docs and code refer specifically to the input WCS dimensionality, do you think this is in a good current state?
I agree with your assessment and indeed the PR will be ready to go after those tweaks. |
@svank: Thank you for this! I will be very excited to give this a spin when I come back from leave early next week.

The API question is worth considering carefully. Indeed, IFS data is one potentially very interesting use case for this functionality. From my perspective, for what it's worth: typically, IFS data will have 3 WCS dimensions, two spatial and one spectral. If one wanted to reproject IFS data, then I think the way this could be done with the current structure would be to pass the 3D IFS data array along with only two of the three WCS dimensions. And if I wanted to reproject both IFS data and weight/variance arrays, then I would pass in a 4D array, e.g. of shape (2, nwl, nx, ny).

Does that make sense from your point of view? And/or does it impact at all the structure that you are envisaging?
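To make the 4D case above concrete, here is a minimal sketch of how such a stack could be assembled; the shapes and the `flux`/`variance` names are purely illustrative, and the result would then be passed to a reproject call along with a WCS covering the trailing dimensions:

```python
import numpy as np

# Hypothetical IFS flux cube and matching variance cube, each (nwl, nx, ny)
nwl, nx, ny = 5, 16, 16
flux = np.zeros((nwl, nx, ny))
variance = np.ones((nwl, nx, ny))

# Stack along a new leading axis so both arrays share one reprojection,
# giving the (2, nwl, nx, ny) shape described above.
stacked = np.stack([flux, variance])
print(stacked.shape)  # (2, 5, 16, 16)
```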
As part of this, move the logic for updating shape_out to utils.py
Force-pushed 93064af to 1650c89
Docs should now specifically refer to the input WCS
Thanks, happy to approve this now!
However I will wait for @entaylor to try this out early next week before going ahead and merging in case there are any issues.
@entaylor - did you get a chance to look at this?
I am going to go ahead and merge this, as I will be preparing a new release shortly. But @entaylor, if you get to try this out and run into any issues, please post a comment here!
This addresses #296. @entaylor, want to take a look?
This PR adds support for reprojecting multiple images that share the exact same coordinate information---e.g. a data array and a corresponding error array, or an L1 and L2 image pair. By providing multiple images in one reproject call, the coordinate transformation work can be done only once before reprojecting each image, offering a speedup.
I chose a syntax where the multiple images are passed as an extra leading dimension of the input array. There's no support for doing this by passing some sort of multi-HDU FITS file directly to the reproject function; it seems like it would be too much work to verify that each HDU has the same WCS.
I tried to mimic the way numpy handles broadcasting operations between two arrays: the trailing dimensions are matched up, while extra leading dimensions are looped over. For reproject, I'm taking the trailing dimensions of the input and output arrays to correspond to the WCS dimensions, and any extra leading dimensions beyond what the WCS contains are looped over. Any number of extra leading dimensions is supported; maybe a user with data and uncertainty arrays from both L1 and L2 wants to pass a 2 × 2 × nx × ny input array.

When `shape_out` is provided, it can contain the extra dimensions that the input array contains (and they must match exactly), or `shape_out` can be just the shape of a single output image, in which case the extra dimensions are prepended automatically.

The implementation is a bit different for each of the interpolation, adaptive, and exact algorithms:
For interpolation, I'm computing the pixel-to-pixel mapping once, and then looping over the extra dimensions, calling `map_coordinates` once for each image. To support the new blocked mode (which is only in this algorithm), I did change how that mode approaches 3+ dimensional WCSes. It breaks up the data in only two dimensions: those were originally the first two dimensions, but I changed it to be the last two dimensions, since those will always be WCS dimensions. (The alternatives would be "the first two WCS dimensions, which are not necessarily the first two array dimensions" or "the first two array dimensions, which are not necessarily WCS dimensions", which would both complicate the logic a bit.)

For adaptive, I made the changes at the Cython level, since that's where some of the coordinate work occurs. The few spots that update the output array (the innermost part of the loop) are modified to do everything along the extra dimension.
For exact, the changes are at the Python level, similar to interpolation with a loop over the extra dimensions after calculating the coordinates once. This algorithm has a parallel-processing mode which could in principle parallelize across the extra dimensions, but then there would have to be separate parallel modes depending on whether there are extra dimensions, so I've left the parallelization inside the loop over extra dimensions.
My speed test results with two PSP/WISPR images, each about 1k x 1k:
tl;dr A 2x speedup for two images in interpolating or adaptive reprojection
(There's one pre-existing test failure that will tank CI.)