Mosviz Retrieve SourceID from meta #1851
Since this needs glue-viz/glue-jupyter#336, should we convert this back to draft until that patch is merged and released? Then this PR would also need to bump glue-jupyter minversion. Also since the original bug was found in private, how should we really review this?
This is the first PR I've authored that requires an upstream fix. I opened this for review because I figured we could review it while the upstream minversion bump is being prepared. But I'm happy to move this back to draft if that's the policy!
The data was from our Viz Stress Test notes. See JDAT-2736 for the internal data.
Moving this to draft PR status would prevent accidental merge even after someone approves it.
Got it; moved back to draft.
The glue-jupyter patch has been merged. Can this be resumed? @duytnguyendtn
Remove unnecessary book keeping
fix hdu parser
Consolidate meta/hdu sourceid methods
Switch meta parser to common sourceid finder
Initialize table at config load
Generic parser: parse image data before image metadata to make image metadata available for metadata parser
Explicitly define which data type is being parsed
Parse IDs from proper data type
Fix meta parser detection
Make meta parser language generic to data type
Rely on 1D spectra entirely for identifier info
Parse 1D spectra metadata first to put Identifier as first column
Rewrite sourceid meta finder to be generic for any keyword
Meta parser use generic metadata searcher rather than hdu
Refactor NIRISS parser to start using generic metadata finder
Codestyle Cleanup
Remove expected warning due to increased robustness
Modify tests to expect mos table as first data object
codestyle
Change test to expect mos table to be first
Modify linking assumption to use first real data for Mosviz (accommodate mos table being dc[0])
Change test to expect mos table to be first
Simplify iterable check
Change test to expect mos table to be first
Change nirspec tests to expect new load order
Codestyle
Increase sourceid fallback robustness
Add Docstrings
Revive and move hdu parsing to source id by hdu method
Almost ready for review; a note for @rosteen: this new strategy seems to work out of the box for the level 2 and nircam test case you included in #1835 because it actually searches the data.meta loaded for all entries loaded for the sourceid; hence, no need to manually specify the I did my best to gracefully remove it from this PR, but you may want to double check that I didn't misinterpret something or remove something that's actually necessary.
Codecov Report
Base: 91.81% // Head: 91.78% // Decreases project coverage by 0.03%
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1851      +/-   ##
==========================================
- Coverage   91.81%   91.78%   -0.03%
==========================================
  Files         140      140
  Lines       15045    15066      +21
==========================================
+ Hits        13814    13829      +15
- Misses       1231     1237       +6
Okay, I think I've cleared this PR as ready for review! Most of the changes are infrastructure behind-the-scenes; the coverage dropped a tiny bit, but weirdly seems unrelated to this PR (the patch coverage is passing)
Code looks good and the application runs well, nice work! I did have some difficulty testing all aspects of mosviz because of existing bugs/recent changes in main (slit overlay not appearing, images switching to a black screen after row select), but everything else works well and I think this is definitely an improvement over how the metaparser operated previously.
Left a couple comments, still need to actually test it. Looks like a good improvement overall.
_add_to_table(app, filters_gratings, "Filter/Grating")
elif spectra and sp1d:
_add_to_table(app, names, "Identifier")
# Search all given keys to see if they exist. Return the first hit
So is the use case for this "the metadata I want might be under one of multiple keys" rather than "give me the metadata for every key in this list"?
I think that's correct? Definitely the first part is correct. The "Identifier" field is the easiest to trace the logic path through; we previously pulled the Identifier through either "SOURCEID" or in some cases "OBJECT". Rather than manually hardcoding that loop, you can just provide this method a list of ['SOURCEID', 'OBJECT'] and it will search those keys for a value.
On top of that, the order you provide the list also specifies the priority order to return. In the above example, if both SOURCEID and OBJECT are present, then it will return SOURCEID.
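To make the first-hit, priority-ordered lookup concrete, here is a minimal sketch that uses plain dictionaries in place of the `.meta` attached to each data entry. The helper name `first_matching_value`, the sample values, and the `"Unspecified"` fallback string are assumptions for illustration only; they are not the PR's actual function or its `FALLBACK_NAME` value.

```python
def first_matching_value(meta, keys, fallback="Unspecified"):
    """Return the value of the first key in `keys` that exists in `meta`.

    Hypothetical helper: `meta` is a plain dict standing in for a data
    entry's `.meta`. The order of `keys` sets the lookup priority.
    """
    if isinstance(keys, str):
        # Accept a single keyword as well as a list of keywords.
        keys = [keys]
    for key in keys:
        if key in meta:
            return meta[key]
    # None of the requested keys were present.
    return fallback


# Illustrative metadata only, not taken from any real dataset.
metas = [
    {"SOURCEID": 2315, "OBJECT": "target-a"},  # both present: SOURCEID wins
    {"OBJECT": "target-b"},                    # only OBJECT present
    {},                                        # neither present: fallback
]
identifiers = [first_matching_value(m, ["SOURCEID", "OBJECT"]) for m in metas]
print(identifiers)
```

Passing `["OBJECT", "SOURCEID"]` instead would reverse the priority, so the first entry would yield its `OBJECT` value.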
Cool, just checking that my understanding was correct.
filters = query_metadata_by_component(app, "FILTER", "2D Spectra", FALLBACK_NAME)
gratings = query_metadata_by_component(app, "GRATING", "2D Spectra", FALLBACK_NAME)

if np.all([isinstance(x, fits.HDUList) for x in data_obj]):
filters_gratings = [(f+'/'+g) for f, g in zip(filters, gratings)]
I noticed when working on getting NIRCam to load that there is also some of this information in the "PUPIL" header keyword. It seems like the different instruments put different information in FILTER/GRATING/PUPIL - maybe this would be a good opportunity to get all three and display whichever ones are populated in their own separate columns. I think it should be as simple as adding pupil = query_metadata... and having three separate add_to_table later.
This is a point I've been thinking about as well. We have historically combined Filter/Grating, but I've been wondering whether it would be useful to split them into separate columns in the table. (AKA, why are we even combining them in the first place?) This would also give room to include a Pupil field, and yes, it would be as simple as what you describe above.
I hesitated on doing that here, as I wanted this PR to mainly keep the same functionality while demonstrating the new infrastructure. I'd advocate a separate discussion with a PO/scientist to confirm whether this is what we want to do, and a separate PR to make those changes, so that this infrastructure PR doesn't become even more bloated than it already is.
Alright, that would be a small follow-up PR anyway. Let me test this and make sure it looks good in-notebook, hopefully I'll approve shortly.
If I actually coded this even half as elegantly as I architected it, it should hopefully be a very small change.
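The separate-column idea discussed in this thread (fetch FILTER, GRATING, and PUPIL, then show whichever are populated) could be sketched roughly as below. This is not the follow-up PR's code: `columns_for_keys` is a hypothetical helper, plain dictionaries stand in for each entry's `.meta`, the values are made up, and the `"Unspecified"` fallback is an assumed placeholder.

```python
FALLBACK = "Unspecified"  # assumed placeholder, not the PR's FALLBACK_NAME value


def columns_for_keys(metas, keys, fallback=FALLBACK):
    """Build one table column per key, skipping keys that no entry populates.

    Hypothetical helper: `metas` is a list of plain dicts standing in for
    the `.meta` of each loaded data entry.
    """
    columns = {}
    for key in keys:
        values = [meta.get(key, fallback) for meta in metas]
        # Keep the column only if at least one entry has a real value.
        if any(value != fallback for value in values):
            columns[key] = values
    return columns


# Illustrative metadata only; these are not real instrument element names.
metas = [
    {"FILTER": "filter-a", "PUPIL": "pupil-a"},
    {"FILTER": "filter-b"},
]
columns = columns_for_keys(metas, ["FILTER", "GRATING", "PUPIL"])
for name, values in columns.items():
    print(name, values)
```

In this sketch the GRATING column is dropped because no entry provides it, while PUPIL is kept with the fallback filling the missing row.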
meta_filters = query_metadata_by_component(app, 'FILTER', "2D Spectra")
_add_to_table(app, meta_filters, "Filter/Grating")
I think I'm not understanding the context that these would be called in vs the add_to_table calls up in mos_meta_parser - do they not conflict?
Aha isn't that interesting... 😉 From the description:
As a stretch goal, I also switch both the NIRISS and Generic/NIRspec parsers to using this new metadata parsing strategy. Notice how the metadata block at the bottom looks almost identical to the mos_metadata_parser used for the generic/nirspec strategy.... how interesting.... 😉 (Future PR)
The NIRISS parser here never used the mos_meta_parser and still doesn't here. It used to have its own parsing logic. In this PR, I gutted out the NIRISS meta logic and replaced it with code that looks STRANGELY close to the mos_meta_parser as you discovered here. It's almost like we might be able to combine them soon~!
Got it 😄
I confirmed that the table populates for NIRCam/NIRISS/NIRSpec, approving.
Thanks for the eyes Jesse and Ricky!
Description
This PR fixes a bug found by a user with a private dataset where mosviz fails to find and populate the SOURCEID column, despite SOURCEID being registered in the metadata plugin. The main fix does this by introducing a new, preferred method of finding the SOURCEID by searching through the .meta attached to the data object rather than re-reading the file and searching through the header. This requires glue-viz/glue-jupyter#336 to ensure the .meta is available before the metadata is parsed.

This new strategy requires the data to be parsed and loaded first, to populate the data collection and associated .meta entries. Only after all the data is loaded do we search for the metadata. Because the data already has its metadata attached to it, there is no need to reopen or pass around hdus anymore.

This PR also cleans up the metadata detection by explicitly requiring the parsers to specify the data_type of the data's metadata being parsed. We were kind of already requiring this, but in a convoluted way (relying on a spectra and sp1d flag, where if spectra and sp1d were both true, it's 1D spectra, if spectra was true, but not sp1d, it's 2D spectra, and if neither were true, it was an image 😖)

As a stretch goal, I also switch both the NIRISS and Generic/NIRspec parsers to using this new metadata parsing strategy. Notice how the metadata block at the bottom looks almost identical to the mos_metadata_parser used for the generic/nirspec strategy.... how interesting.... 😉 (Future PR)

Tests are failing because they need this upstream patch. Please install glue-viz/glue-jupyter#336 when doing your local testing!
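For readers tracing the old flag logic, the mapping from the `spectra`/`sp1d` pair to an explicit data type can be written out as a small sketch. The function name `data_type_from_flags` is hypothetical, and of the three labels only "2D Spectra" appears in the snippets in this PR; "1D Spectra" and "Image" are assumed spellings.

```python
def data_type_from_flags(spectra=False, sp1d=False):
    """Translate the old (spectra, sp1d) flag pair into an explicit label.

    Hypothetical helper mirroring the rules in the description. The
    combination spectra=False, sp1d=True is not covered there; this
    sketch treats it as an image.
    """
    if spectra and sp1d:
        return "1D Spectra"
    if spectra:
        return "2D Spectra"
    return "Image"


for flags in [(True, True), (True, False), (False, False)]:
    print(flags, "->", data_type_from_flags(*flags))
```

Passing the label directly, as the PR does with data_type, removes the need for callers to remember which flag combination means what.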
Blocked by
Data to be loaded into the Table Viewer glue-viz/glue-jupyter#336
glue-jupyter minversion in setup.cfg after upstream fix is merged and released.
Change log entry
CHANGES.rst? If you want to avoid merge conflicts, list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer should add a no-changelog-entry-needed label.
Checklist for package maintainer(s)
This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.
trivial label.