Mosviz Retrieve SourceID from meta #1851

Merged
merged 4 commits into from
Jan 13, 2023

Conversation

duytnguyendtn
Collaborator

@duytnguyendtn duytnguyendtn commented Nov 16, 2022

Description

This PR fixes a bug found by a user with a private dataset where Mosviz fails to find and populate the SOURCEID column, despite SOURCEID being registered in the metadata plugin. The main fix introduces a new, preferred method of finding the SOURCEID: searching through the .meta attached to the data object rather than re-reading the file and searching through the header. This requires glue-viz/glue-jupyter#336 to ensure the .meta is available before the metadata is parsed.

This new strategy requires the data to be parsed and loaded first, to populate the data collection and the associated .meta entries. Only after all the data is loaded do we search for the metadata. Because the data already has its metadata attached, there is no need to reopen or pass around HDUs anymore.
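To make the "load first, query metadata afterwards" order concrete, here is a minimal sketch. It does not use the real glue or jdaviz objects: `LoadedEntry`, `collect_keyword`, the `data_type` labels, and the value of `FALLBACK_NAME` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumed placeholder value; the PR only shows the name FALLBACK_NAME.
FALLBACK_NAME = "Unspecified"


@dataclass
class LoadedEntry:
    """Hypothetical stand-in for one item in the data collection."""
    label: str
    data_type: str
    meta: dict = field(default_factory=dict)


def collect_keyword(entries, keyword, data_type, fallback=FALLBACK_NAME):
    """Return one value per entry of the given data_type, read from its .meta."""
    return [
        entry.meta.get(keyword, fallback)
        for entry in entries
        if entry.data_type == data_type
    ]


# Step 1: "load" everything first, so each entry carries its own metadata.
collection = [
    LoadedEntry("spec1d_0", "1D Spectra", {"SOURCEID": 101}),
    LoadedEntry("spec2d_0", "2D Spectra", {"FILTER": "F170LP"}),
    LoadedEntry("spec1d_1", "1D Spectra", {}),
]

# Step 2: query the metadata afterwards; no file is reopened.
source_ids = collect_keyword(collection, "SOURCEID", "1D Spectra")
```

The point of the sketch is only the ordering: once every entry carries its own `meta`, the lookup needs nothing but the collection itself.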

This PR also cleans up the metadata detection by explicitly requiring the parsers to specify the data_type of the data's metadata being parsed. We were kind of already requiring this, but in a convoluted way (relying on a spectra and sp1d flag, where if spectra and sp1d were both true, it's 1D spectra, if spectra was true, but not sp1d, it's 2D spectra, and if neither were true, it was an image 😖)
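As a rough illustration of why the flag pair was convoluted, the sketch below spells out the old implicit mapping next to an explicit `data_type` argument. The function names and the exact label strings ("1D Spectra", "Images") are assumptions for illustration; only "2D Spectra" appears verbatim in this PR's diff.

```python
# Assumed label strings; only "2D Spectra" is confirmed by the diff in this PR.
VALID_DATA_TYPES = ("1D Spectra", "2D Spectra", "Images")


def legacy_flags_to_data_type(spectra: bool, sp1d: bool) -> str:
    """Hypothetical helper: the implicit mapping the two flags used to encode."""
    if spectra and sp1d:
        return "1D Spectra"
    if spectra:
        return "2D Spectra"
    # The description only covers "neither flag set"; this sketch also lets
    # the undefined combination (sp1d without spectra) fall through to here.
    return "Images"


def check_data_type(data_type: str) -> str:
    """Hypothetical helper: the explicit form, where the caller states the type."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"Unknown data_type: {data_type!r}")
    return data_type
```

With the explicit form, an unsupported combination fails loudly instead of being silently treated as an image.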

As a stretch goal, I also switched both the NIRISS and Generic/NIRSpec parsers to this new metadata parsing strategy. Notice how the metadata block at the bottom looks almost identical to the mos_meta_parser used for the generic/NIRSpec strategy.... how interesting.... 😉 (Future PR)

Tests are failing because they need this upstream patch. Please install glue-viz/glue-jupyter#336 when doing your local testing!

Blocked by

  • Need to bump glue-jupyter minversion in setup.cfg after upstream fix is merged and released.

Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone.
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)?

@pllim
Contributor

pllim commented Nov 17, 2022

Since this needs glue-viz/glue-jupyter#336 , should we convert this back to draft until that patch is merged and released? Then this PR would also need to bump glue-jupyter minversion.

Also since the original bug was found in private, how should we really review this?

@duytnguyendtn
Collaborator Author

should we convert this back to draft until that patch is merged and released?

This is the first PR I've authored that requires an upstream fix. I opened this for review because I figured we could review it while the upstream minversion bump is being prepared. But I'm happy to move this back to draft if that's the policy!

original bug was found in private, how should we really review this?

The data was from our Viz Stress Test notes. See JDAT-2736 for the internal data

@pllim
Contributor

pllim commented Nov 17, 2022

Moving this to draft PR status would prevent accidental merge even after someone approves it.

@duytnguyendtn duytnguyendtn marked this pull request as draft November 17, 2022 16:45
@duytnguyendtn
Collaborator Author

Got it; moved back to draft

@rosteen rosteen modified the milestones: 3.2, 3.3 Jan 4, 2023
@camipacifici
Contributor

The glue-jupyter patch has been merged. Can this be resumed? @duytnguyendtn

Remove unnecessary book keeping

fix hdu parser

Consolidate meta/hdu sourceid methods

Switch meta parser to common sourceid finder

Initialize table at config load

Generic parser: parse image data before image metadata to make image metadata available for metadata parser

Explicitly define which data type is being parsed

Parse IDs from proper data type

Fix meta parser detection

Make meta parser language generic to data type

Rely on 1D spectra entirely for identifier info

Parse 1D spectra metadata first to put Identifier as first column

Rewrite sourceid meta finder to be generic for any keyword

Meta parser use generic metadata searcher rather than hdu

Refactor NIRISS parser to start using generic metadata finder

Codestyle Cleanup

Remove expected warning due to increased robustness

Modify tests to expect mos table as first data object

codestyle

Change test to expect mos table to be first

Modify linking assumption to use first real data for Mosviz (accommodate mos table being dc[0])

Change test to expect mos table to be first

Simplify iterable check

Change test to expect mos table to be first

Change nirspec tests to expect new load order

Codestyle

Increase sourceid fallback robustness

Add Docstrings

Revive and move hdu parsing to source id by hdu method
@duytnguyendtn
Collaborator Author

duytnguyendtn commented Jan 10, 2023

Almost ready for review; a note for @rosteen: this new strategy seems to work out of the box for the level 2 and NIRCam test case you included in #1835, because it searches the data.meta of every loaded entry for the sourceid; hence, there is no need to manually specify the repeat number. Specifically:

https://github.com/rosteen/jdaviz/blob/30452e47d19bf8b06b509c52d5b45fc2fa631ce2/jdaviz/configs/mosviz/plugins/parsers.py#L191-L198

I did my best to gracefully remove it from this PR, but you may want to double check that I didn't misinterpret something or remove something that's actually necessary.

@codecov

codecov bot commented Jan 10, 2023

Codecov Report

Base: 91.81% // Head: 91.78% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (9186dab) compared to base (ffd343e).
Patch coverage: 92.85% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1851      +/-   ##
==========================================
- Coverage   91.81%   91.78%   -0.03%     
==========================================
  Files         140      140              
  Lines       15045    15066      +21     
==========================================
+ Hits        13814    13829      +15     
- Misses       1231     1237       +6     
Impacted Files                                      Coverage Δ
jdaviz/configs/mosviz/helper.py                     87.38% <81.81%> (+0.08%) ⬆️
jdaviz/configs/mosviz/plugins/parsers.py            89.83% <90.66%> (-0.85%) ⬇️
jdaviz/app.py                                       94.18% <100.00%> (-0.12%) ⬇️
jdaviz/configs/mosviz/tests/test_data_loading.py    100.00% <100.00%> (ø)
jdaviz/configs/mosviz/tests/test_parsers.py         99.07% <100.00%> (ø)


@duytnguyendtn duytnguyendtn marked this pull request as ready for review January 10, 2023 20:21
@duytnguyendtn
Collaborator Author

duytnguyendtn commented Jan 10, 2023

Okay, I think I've cleared this PR as ready for review! Most of the changes are infrastructure behind-the-scenes; the coverage dropped a tiny bit, but weirdly seems unrelated to this PR (the patch coverage is passing)

Contributor

@javerbukh javerbukh left a comment

Code looks good and the application runs well, nice work! I did have some difficulty testing all aspects of mosviz because of existing bugs/recent changes in main (slit overlay not appearing, images switching to a black screen after row select), but everything else works well and I think this is definitely an improvement over how the metaparser operated previously.

Collaborator

@rosteen rosteen left a comment

Left a couple comments, still need to actually test it. Looks like a good improvement overall.

_add_to_table(app, filters_gratings, "Filter/Grating")
elif spectra and sp1d:
_add_to_table(app, names, "Identifier")
# Search all given keys to see if they exist. Return the first hit
Collaborator

So is the use case for this "the metadata I want might be under one of multiple keys" rather than "give me the metadata for every key in this list"?

Collaborator Author

I think that's correct? Definitely the first part is correct. The "Identifier" field is the easiest to trace the logic path through; we previously pulled the Identifier through either "SOURCEID" or in some cases "OBJECT". Rather than manually hardcoding that loop, you can just provide this method a list of ['SOURCEID', 'OBJECT'] and it will search those keys for a value.

On top of that, the order you provide the list also specifies the priority order to return. In the above example, if both SOURCEID and OBJECT are present, then it will return SOURCEID
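The priority behavior described above can be sketched in a few lines. This is an illustrative stand-in, not the PR's actual implementation: `first_matching_value` and its signature are hypothetical, and `meta` is assumed to be a plain dict.

```python
def first_matching_value(meta, keys, fallback=None):
    """Return the value of the first key in `keys` that exists in `meta`.

    `keys` may be a single string or a list; list order is the priority order.
    """
    if isinstance(keys, str):
        keys = [keys]
    for key in keys:
        if key in meta:
            return meta[key]
    return fallback


# SOURCEID is listed first, so it wins when both keys are present.
both = first_matching_value({"OBJECT": "galaxy", "SOURCEID": 42},
                            ["SOURCEID", "OBJECT"])
# Only OBJECT is present, so the search falls through to it.
only_object = first_matching_value({"OBJECT": "galaxy"},
                                   ["SOURCEID", "OBJECT"])
```

Swapping the list to `["OBJECT", "SOURCEID"]` would reverse the priority without touching the function.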

Collaborator

Cool, just checking that my understanding was correct.

Comment on lines +486 to +489
filters = query_metadata_by_component(app, "FILTER", "2D Spectra", FALLBACK_NAME)
gratings = query_metadata_by_component(app, "GRATING", "2D Spectra", FALLBACK_NAME)

if np.all([isinstance(x, fits.HDUList) for x in data_obj]):
filters_gratings = [(f+'/'+g) for f, g in zip(filters, gratings)]
Collaborator

I noticed when working on getting NIRCam to load that there is also some of this information in the "PUPIL" header keyword. It seems like the different instruments put different information in FILTER/GRATING/PUPIL - maybe this would be a good opportunity to get all three and display whichever ones are populated in their own separate columns. I think it should be as simple as adding pupil = query_metadata... and having three separate add_to_table later.

Collaborator Author

This is a point I've been thinking of as well. We have historically combined Filter/Grating, but I've been thinking of whether it would be useful to split this out to separate columns in the table. (AKA, why are we even combining them in the first place?). This would also give room to include another Pupil field as well, and yes it would be as simple as what you describe above.

I hesitated to do that here, as I wanted this PR to mainly keep the same functionality while demonstrating the new infrastructure. I'd advocate a separate discussion with a PO/scientist to confirm whether this is what we want to do, and a separate PR to make those changes, so that this infrastructure PR doesn't become even more bloated than it already is.
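For reference in that follow-up discussion, here is a rough sketch of the two table layouts being compared: the current combined "Filter/Grating" column (the `zip` expression mirrors the diff above) and one possible split into separate columns. `build_separate_columns`, the column names "Filter", "Grating", and "Pupil", the sample values, and the fallback string are assumptions for illustration only.

```python
FALLBACK = "Unspecified"  # assumed placeholder for a missing keyword


def build_separate_columns(filters, gratings, pupils, fallback=FALLBACK):
    """Hypothetical helper: keep only the columns with at least one real value."""
    candidates = {"Filter": filters, "Grating": gratings, "Pupil": pupils}
    return {
        name: values
        for name, values in candidates.items()
        if any(value != fallback for value in values)
    }


filters = ["F170LP", "F100LP"]
gratings = ["G235M", "G140M"]
pupils = [FALLBACK, FALLBACK]  # e.g. an instrument that does not set PUPIL

# Current behavior: a single combined column.
combined = [f + "/" + g for f, g in zip(filters, gratings)]

# Possible follow-up: one column per keyword, dropping unpopulated ones.
columns = build_separate_columns(filters, gratings, pupils)
```

Whether unpopulated columns should be hidden or shown with the fallback value is exactly the kind of question the PO/scientist discussion would need to settle.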

Collaborator

Alright, that would be a small follow-up PR anyway. Let me test this and make sure it looks good in-notebook, hopefully I'll approve shortly.

Collaborator Author

If I actually coded this half as elegantly as I architected it, it should hopefully be a very small change.

Comment on lines +957 to +958
meta_filters = query_metadata_by_component(app, 'FILTER', "2D Spectra")
_add_to_table(app, meta_filters, "Filter/Grating")
Collaborator

I think I'm not understanding the context that these would be called in vs the add_to_table calls up in mos_meta_parser - do they not conflict?

Collaborator Author

Aha isn't that interesting... 😉 From the description:

As a stretch goal, I also switched both the NIRISS and Generic/NIRSpec parsers to this new metadata parsing strategy. Notice how the metadata block at the bottom looks almost identical to the mos_meta_parser used for the generic/NIRSpec strategy.... how interesting.... 😉 (Future PR)

The NIRISS parser here never used the mos_meta_parser and still doesn't here. It used to have its own parsing logic. In this PR, I gutted out the NIRISS meta logic and replaced it with code that looks STRANGELY close to the mos_meta_parser as you discovered here. It's almost like we might be able to combine them soon~!

Collaborator

Got it 😄

Collaborator

@rosteen rosteen left a comment

I confirmed that the table populates for NIRCam/NIRISS/NIRSpec, approving.

@duytnguyendtn
Collaborator Author

Thanks for the eyes Jesse and Ricky!

@duytnguyendtn duytnguyendtn merged commit 10085eb into spacetelescope:main Jan 13, 2023

5 participants