
Generalize reproject_and_coadd for N-dimensional data, and add option to specify blank pixel value and progress bar #351

Merged
merged 13 commits into from
Jun 5, 2024

Conversation

@keflavich (Contributor) commented Mar 13, 2023:

This enables memmap'd output, which will be needed for very large (greater-than-memory) output cubes.

EDIT: it also extends reproject_and_coadd into three dimensions.
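As context for the greater-than-memory use case, a minimal sketch of a disk-backed output array using numpy's `np.memmap` (the shape, dtype, and file path here are illustrative, not the reproject API):

```python
import os
import tempfile

import numpy as np

# Allocate a disk-backed output cube so it need not fit in memory.
# A real mosaic's shape would come from the target WCS.
shape = (4, 174, 173)
path = os.path.join(tempfile.mkdtemp(), "mosaic.dat")
output = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)

# Writes go through to the file; flush() makes them durable on disk.
output[0, :10, :10] = 1.0
output.flush()

# Re-open read-only to confirm the data round-trips through the file.
check = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
print(float(check[0, 5, 5]))  # 1.0
```

Passing such an array as the reprojection target is what lets the coadd loop flush each input to disk instead of holding the whole cube in memory.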

codecov bot commented Mar 13, 2023:

Codecov Report

Attention: Patch coverage is 69.72477% with 33 lines in your changes missing coverage. Please review.

Project coverage is 90.60%. Comparing base (b42f7c6) to head (0054941).
Report is 16 commits behind head on main.

Current head 0054941 differs from pull request most recent head f5e3016

Please upload reports for the commit f5e3016 to get more accurate results.

Files Patch % Lines
reproject/mosaicking/coadd.py 73.33% 20 Missing ⚠️
reproject/mosaicking/subset_array.py 63.33% 11 Missing ⚠️
reproject/utils.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #351      +/-   ##
==========================================
- Coverage   93.63%   90.60%   -3.03%     
==========================================
  Files          25       25              
  Lines         895      947      +52     
==========================================
+ Hits          838      858      +20     
- Misses         57       89      +32     


@astrofrog (Member) left a comment:

This is useful, thanks! Just some comments below; please be sure to include some tests.

reproject/mosaicking/coadd.py — four review threads (outdated, resolved)
@keflavich (Contributor, Author):

I'm a little stymied on this:
ValueError: Chunks do not add up to shape. Got chunks=((4,), (4, 100, 101), (4, 100, 101)), shape=(4, 174, 173)
It looks like the block_size isn't playing nice with the different-shaped output; I think that relies on your WIP PR, @astrofrog?
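The error is dask's invariant that, along every axis, the chunk sizes must sum to the array's extent on that axis. A tiny pure-Python check (a hypothetical helper, not dask code) makes the failure above concrete:

```python
def chunks_match_shape(chunks, shape):
    """Return True if, along each axis, the chunk sizes sum to the extent."""
    return all(sum(axis_chunks) == extent
               for axis_chunks, extent in zip(chunks, shape))

# The values from the traceback: axis 0 is fine (4 == 4), but on the
# spatial axes 4 + 100 + 101 = 205, which is neither 174 nor 173,
# so dask raises ValueError.
print(chunks_match_shape(((4,), (4, 100, 101), (4, 100, 101)), (4, 174, 173)))  # False

# Chunks derived from the actual output shape would satisfy the invariant:
print(chunks_match_shape(((4,), (74, 100), (73, 100)), (4, 174, 173)))  # True
```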

@keflavich (Contributor, Author):

@astrofrog I have some questions now:

The loop for reproject_and_coadd is presently not lazy, nor possible to make lazy, correct? We need your daskified PR to make it lazy?

I think I'm going to move the write-to-big-array step into the parent loop, because otherwise everything still has to be held in memory.

@keflavich (Contributor, Author):

Ahhh, there's a good reason for that - background matching! Yikes.

@keflavich (Contributor, Author):

So, as is, this does coadding on a dataset-by-dataset basis - which, for cubes, means a cube-by-cube basis. It may be more efficient to do it on a plane-by-plane basis, but that seems like a lot of additional refactoring work.

@keflavich (Contributor, Author):

There is a huge efficiency boost to be gained if we can reproject directly into the output array, but that results in overwriting the individual values instead of adding to them.

@keflavich (Contributor, Author):

@astrofrog I don't see any way to do this, but do you have ideas? Basically, instead of map_coordinates(..., output=array) writing directly into memory, it would do output += map_coordinates(...). That looks like it is problematic at the lowest levels, and I don't know if it would work at all for other reprojection methods.

This sacrifices flexibility even further to improve speed and robustness and reduce memory use. In the case where you don't want to match backgrounds, this approach just loads the unweighted data into an array and the footprint into another array and divides them at the end. The robustness bit is that we can flush to disk granularly.

I'm not sure this is worth pursuing further, though, because of the depth of refactoring that would be needed. Maybe this is something dask just solves well enough.
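For the no-background-matching case described above, the running-sum approach can be sketched with plain numpy: accumulate weighted data in one array and the footprint in another, dividing only at the end. The function below is a toy stand-in (hypothetical name; the `(data, footprint)` pairs would come from a `reproject_*` call onto the output grid):

```python
import numpy as np

def coadd_running_sum(inputs, shape):
    """Accumulate data and footprint in place, dividing only at the end.

    Each input is a (data, footprint) pair already on the output grid.
    """
    total = np.zeros(shape)
    footprint = np.zeros(shape)
    for data, fp in inputs:
        # += is the key: we add into the (possibly memmap'd) output
        # rather than overwriting it, so inputs can be flushed one by one.
        total += np.where(fp > 0, data * fp, 0)
        footprint += fp
    with np.errstate(invalid="ignore"):  # 0/0 where nothing contributed
        result = total / footprint
    result[footprint == 0] = np.nan
    return result, footprint

# Two inputs overlapping in one pixel: the coadd is the weighted mean there.
a = (np.full((2, 2), 2.0), np.array([[1, 1], [0, 0]], float))
b = (np.full((2, 2), 4.0), np.array([[1, 0], [0, 0]], float))
result, fp = coadd_running_sum([a, b], (2, 2))
print(result[0, 0])  # 3.0 - mean of 2 and 4 where both overlap
```

This matches the trade-off described above: it drops the background-matching flexibility, but each input touches the output exactly once and can be flushed to disk granularly.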

@astrofrog (Member):

I'll have a think about all this! Should be able to work on it a bit this week.

@keflavich (Contributor, Author):

@astrofrog I've been using this PR in practice for a while now, and I just had a look into rebasing, but the codebase has diverged a lot, making this a rather challenging rebase. I'd like to go ahead with it, but this time make sure the changes can get merged. What's your feeling - is there anything blocking this if it's rebased?

@keflavich (Contributor, Author):

OK, the rebase wasn't as bad as I thought, but there were some confusing items that I'm not sure I've resolved yet.

@astrofrog (Member):

I can try and prioritise this!

@keflavich (Contributor, Author):

With the new modes (first, last, min, max), we again face duplicating code: we need one version that tries to background-match before the final combination step, and another that does not. The current approach simply breaks if match_background is off and one uses first/last/min/max. Is there a more elegant approach?
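Whatever happens with background matching, the final combination step for these modes reduces to a per-pixel reduction over the inputs that cover that pixel. A pure-Python sketch (a hypothetical helper, not the PR's code):

```python
import math

def combine(values, mode):
    """Reduce the stack of values covering one output pixel.

    `values` lists the input values at this pixel, in input order,
    already filtered to inputs whose footprint covers the pixel.
    """
    if not values:
        return math.nan  # no coverage: blank pixel
    if mode == "first":
        return values[0]
    if mode == "last":
        return values[-1]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown combine mode: {mode}")

stack = [3.0, 1.0, 2.0]
print([combine(stack, m) for m in ("first", "last", "min", "max", "mean")])
# [3.0, 2.0, 1.0, 3.0, 2.0]
```

Framing the modes as one reduction like this is one way to avoid duplicating the loop for the matched and unmatched code paths: only the values fed into the reduction differ.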

@astrofrog (Member):

Will investigate!

@astrofrog (Member):

To try and keep things a bit simpler I've extracted out the changes related to specifying output arrays/footprints into a separate PR: #387

@astrofrog (Member):

In terms of supporting 3D arrays (and also the median combine mode), I think things are getting complicated enough that it's worth thinking about a different approach for doing the co-adding. I'm working on an experimental re-write of reproject_and_coadd which uses dask internally and will open an alternative PR soon so that we can compare and see what works best in practice.

@astrofrog (Member) commented Sep 11, 2023:

See #388 for an alternative approach.

@keflavich (Contributor, Author):

@astrofrog I'm still using this branch in production. It's apparently working (verification in progress). I clearly ran out of time to test #388. Any further thoughts on which direction to take? Both PRs are big, and I don't immediately remember what the reasons are for one versus the other.

@astrofrog (Member):

Sorry for dropping the ball on this, I'll try and see if I can wrap things up in the next week or so.

@keflavich (Contributor, Author):

My last commit adds a hack to solve this issue: radio-astro-tools/spectral-cube#900 (perhaps I should've posted that here, but it isn't obvious to me where it came from).

@keflavich (Contributor, Author):

@astrofrog I'm at a workshop with @e-koch, and I'd like to show off that we can do the mosaicking in radio-astro-tools/spectral-cube#868. Any chance we can make a little progress on this shortly?

@astrofrog
Copy link
Member

@keflavich yes sorry for the delay, I will be back at work on Monday and will try and prioritise wrapping this up.

Jun 5, 2024: @astrofrog changed the title from "Add some more flexibility to coadd: output array specification" to "Generalize reproject_and_coadd for N-dimensional data, and add option to specify blank pixel value and progress bar".
@astrofrog (Member) left a comment:

I pushed a couple of commits cleaning this up - I've generalized it to N dimensions which simplifies things, and changed the default for blank_pixel_value to 0 for backward-compatibility. Thanks @keflavich, and sorry for taking so long to wrap this up!
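The blank-pixel behavior amounts to one masked assignment at the end of the coadd; a toy numpy sketch (hypothetical helper and variable names, assuming footprint 0 means "covered by no input", with 0 as the backward-compatible default fill):

```python
import numpy as np

def fill_blank(mosaic, footprint, blank_pixel_value=0.0):
    """Replace pixels covered by no input with blank_pixel_value."""
    out = mosaic.copy()
    out[footprint == 0] = blank_pixel_value
    return out

mosaic = np.array([[1.0, np.nan], [3.0, np.nan]])
footprint = np.array([[1.0, 0.0], [2.0, 0.0]])
print(fill_blank(mosaic, footprint))          # blanks become 0.0 (the default)
print(fill_blank(mosaic, footprint, np.nan))  # or NaN, if requested
```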

@astrofrog astrofrog merged commit 8ab8a78 into astropy:main Jun 5, 2024
11 of 15 checks passed