-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch VDR Data from Butler For Region Search #531
Conversation
@DinoBektesevic I was hoping to spend a bit more time on the butler mocking, but wanted to go ahead and mark this for review so you could take a pass at the interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My general comment on the PR is that it's ok and I want to approve. There is one major issue of the build-system configuration that will require redoing how importing works, and then sorting out a few doc-strings.
My general comment about the implementation is that I worry about wrapping the butler functionality because there's so much of it.
In struggled with the same in the ButlerStandardizer
. The user could have executed a near-to-sql-query by using the butler.registry
and get back lists of tuples or whatever data structure they Rubin gives back to them. There's so many ways in which those can be derived and they change all the time that it's hard to standardize on a particular way.
I wonder about the utility of wrapping butler.queryCollections
with additional checks on whether a collection was given or wasn't - because butler.queryCollections
will already perform those checks anyhow. The purpose here looks more educational than functional, and that's ok if these are the tools you need in order to make progress atm, but I wonder how far that would scale as we start adding more. Ultimately all you're doing is executing:
# all of these are just database queries really
butler = Butler(repo_root, without_datastore=True)
# changeable
collections = ("DECam/imdiff/20210318/20240130T225831Z",)
# safe to assume, but could be changed
datasets=("goodSeeingDiff_differenceExp", )
# basically constant
dimensions = ["visit", "physical_filter", "exposure", "visit", "detector", "patch", "tract",]
# DataCoordinate iterator
datcoord = butler.registry.queryDataIds(dimensions, collections=collections, dimensions=dimensions)
# This would require using a datastore
uris = butler.getURis(datasets[0], data_id=list(datcoord)[0], collections=collections)
# but this gets a dataset ref and it doesn't even need the datastore
refs = b.find_dataset(datasets[0], data_id=list(datcoord)[0], collections=collections)
# butler.get would work I think (with datastore), butler.find, butler.getDataset etc...
so perhaps we should, long term, be looking to take collections of data coordinates, unwinding them for ra, dec, and then actually performing some sort of grouping based on spatial and temporal constraints. As a side-note, if butler was instantiated with a datastore (leve out without_datastore
) the refs can directly be passed to the Standardizer
from which KBMOD can run searches.
Yeah I think these are fair concerns. There's definitely some danger with wrapping too much of the butler functionality, and we'll have to revisit this approach for scaling. In particular I was thinking we would need our own datastore for caching butler query results and the spatial computations we're doing that would replace the astropy table, but I wasn't as aware of the options around instantiating a butler with a datastore and should probably revisit that. I think the main goal here was to start the process of moving current region search work into modules with tests but still have a nice interface for interfacing with notebooks. The next steps are taking the various approaches, like find overlapping images and generating discrete piles, and move them into this module as well. So after the next PR the envisioned interface for generating ImageCollections for a discrete pile would be something like:
But I agree we'll probably soon need to reassess how this approach scales |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, can you just in-indent the class TestRegionSearch
and cleanup the unused improrts and feel free to merge.
Initial PR for Region Search module that creates a representation of VDR (Visit, Detector, Region) data fetched from a Butler repository for analysis of which images to use for KBMOD.
Note that we have broken out butler mocking to its own utils file.
The RegionSearch object also provides a few static methods which are wrappers for initial exploration of how collections datatypes are organized in the butler.