-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple Identification of Discrete Piles #560
Conversation
Conflicts: src/kbmod/region_search.py tests/test_region_search.py tests/utils/mock_butler.py
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Included @ColinOrionChandler since there is a fair bit of streamlining from his previous approach and would be interested in his feedback on the notebook. So rather than the whole PR rather just a focus on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok to me. I am mostly worried about scalability of that inner loop and introducing yet another term ("piles", "pointing groups" etc.),. Minor complain regarding URI handling.
Otherwise the notebook is great and runs pretty good, there's a lot of very big images that do kind of pushed my kernel memory, maybe we can opt to plot fewer. Interface is nice and I like the dataset_type_in_collection_frequency
functionality.
overlapping_sets = [] | ||
for i in range(len(all_ra_dec) - 1): | ||
coord = all_ra_dec[i] | ||
if i not in processed_data_ids: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a note since Im not sure it's obvious: this is a for loop in the backend, so we're double looping for datasets with a lot of ids when we get towards the end.
Yeah, I'm still a bit unsure about terminology here myself (I initially pulled "discrete piles" from @ColinOrionChandler for reference) and am personally open to changing it. Maybe a discussion to bring up at one of our meetings because adding even more terminology confusion would be a pain. I definitely agree that this sort of brute force approach won't be that scalable, and I'm interested in clustering approaches. In addition to options from sk-learn, I wonder if an approach taking more direct advantage of lsst/sphgeom might be helpful here. |
This provides a simple approach of identifying discrete piles of images that are all within a given arcsecond radius. While too costly for a large dataset, this sort of approach is nice for a simplified end-to-end test on small test data. This is a streamlined version the discrete piles method added by @ColinOrionChandler in Region Searching Workbook
This PR also provides an example notebook which uses a butler repo with inserted fakes constructed by @DinoBektesevic and gives an example of choosing some discrete piles and then performing reprojection and kbmod search using a butler repo.