Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Outlier Improvement for FS/MOS outlier rejection #7828

Closed
stscijgbot-jp opened this issue Aug 16, 2023 · 25 comments
Closed

Outlier Improvement for FS/MOS outlier rejection #7828

stscijgbot-jp opened this issue Aug 16, 2023 · 25 comments

Comments

@stscijgbot-jp
Copy link
Collaborator

Issue JP-3347 was created on JIRA by Elena Manjavacas:

FS/MOS: many significant outliers and NaNs appear in the 2D and 1D extracted spectra. The solution proposed is to apply the IFU algorithm for outliers rejection.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Nimisha Kumari on JIRA:

This item came up in the JWST CAL WG meeting on 2023-08-01: https://outerspace.stsci.edu/display/JWSTCC/2023-08-01+Meeting+notes

 
This ticket is intended as the next step.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Anton Koekemoer on JIRA:

thanks for filing this ticket - to help with prioritizing it, could we ask NIRSpec also please to indicate rating numbers in the "Criticality Rationale" field for the "impact" and "urgency", along with additional text to describe the reason for the rating numbers that will be assigned? (see the figure further down the page at https://outerspace.stsci.edu/display/JWSTPWG/INS+Ticket+Prioritization])

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Jane Morrison on JIRA:

For an NIRSpec MOS science project David Law adapted the IFU outlier detection to work with some MSA data. It seems to work well.

Anton Koekemoer  how should be proceed on this work ? Elena Manjavacas would like to review what we have coded or who on the NIRSpec team/MSA should we run this by ?

@stscijgbot-jp
Copy link
Collaborator Author

stscijgbot-jp commented Nov 7, 2023

Comment by Jane Morrison on JIRA:

For an NIRSpec MOS science project David Law adapted the IFU outlier detection to work with some MSA data. It seems to work well.

Anton Koekemoer  how should be proceed on this work ? Elena Manjavacas would like to review what we have coded or who on the NIRSpec team/MSA should we run this by ?

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Jane Morrison on JIRA:

I took the routine that was developed for a NIRSpec MSA project David and I are working on and tried it on another MSA data set. It does need some refinement. It seems to flag the center of sources has bad. In the picture the green pixels are pixels flagged as bad using this test MSA outlier flagging algorithm and the bottom is the rate file before this algorithm was run on it. 

!image-2023-11-07-10-53-20-147.png|width=1113,height=573!

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Elena Manjavacas on JIRA:

Jane Morrison I am the FS pipeline lead, for MOS-related tests you should contact Susan Kassin .

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Anton Koekemoer on JIRA:

thanks Jane Morrison  for the updates and for your question above, indeed I'd suggest that you include in the discussion (and on this ticket) the relevant NIRSpec MOS staff, and have them review the results so far, along with any suggestions they might provide you in terms of additional datasets or other tests that might necessitate changes in the algorithms or parameters (or both). Thanks again!

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

David Law - I was just about to start looking at this ticket, for the next build.  Does it need more discussion first?

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

Melanie Clarke I think it requires some figuring out of what the problems are, and whether we need a new algorithm in order to be able to deal with it better.  I'm perfectly happy for you to look at it, but since it's a little vague I didn't want you to have to dig in alone.  I'd reassigned it to myself to set up some discussion with Christian Hayes and the NIRSpec team and see if we can converge on a clearer path forward.  If we're looking at JP-3682 this may also be related if we're potentially making some algorithms more similar between different modes.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

Okay, sounds good.  I also expected that this would need some exploration.  I can start by looking at what's going on in the current algorithm, and at the proposal to use something similar to the IFU algorithm.  Please loop me in when you and the NIRSpec team are ready to discuss.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

Looking at the current algorithm, there are two things that I think would help NIRSpec data right away.

One, the default weight type for the resampling is ivm, which is a poor match for many NIRSpec exposures.  With the default value of maskpt=0.7, much of the array is thrown away by the weight threshold check after resampling, so outlier values are not even assessed in those regions. Defaulting to weight_type=exptime would help.

Two, outlier rejection works by comparing the difference between the input data and a median image (resampled from all input and blotted back to match the input data) to the input error value.  Large outliers in the input data tend to also have large input errors, so this comparison fails for some of the worst outliers in the data.  It would be better to compare the input data to a median error value instead, computed across the dithered data.

Testing a quick implementation of these two changes with no further modification on some fixed slit and MOS data from the INS regression test set helps a lot to clean up the worst of the outliers. 

David Law Christian Hayes - Do you want me to start by making this change available for testing, and we can decide from there what else might be needed?

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Christian Hayes on JIRA:

Melanie Clarke thanks for taking a look, and if you could make that change available to test, we'd be happy to take a look and see if that plus some modification of the default options might work well for general cases.

@stscijgbot-jp
Copy link
Collaborator Author

stscijgbot-jp commented Sep 24, 2024

Comment by Melanie Clarke on JIRA:

Thanks Christian Hayes!  I have a rough draft here:

#8828

It's building on some other in-progress refactoring work for outlier detection, so it's not ready for merge, but it should be okay to test.  Let me know if you see problems!  My changes are only in the last commit.

For now, the median error thresholding change is implemented only for spectral data, but we might consider whether the same change should be done for imaging. 

While testing, I was running with:

--steps.outlier_detection.weight_type=exptime --steps.outlier_detection.fillval=NAN

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

Sounds good to me!  If the existing algorithm can be made to work that avoids the drawbacks of the IFU algorithm (which really looks for bad pixels rather than outliers per se).

A few considerations:

  • Presumably you mean comparing to the median error of a given resampled pixel across exposures, rather than a median across some spatial area?
  • I'd focus attention on whether or not this produces any unexpected outlier flagging in the regions around bright point sources where undersampling effects will be most important

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

Yes, median error across exposures.  Currently I'm doing this:

  • drizzle the error array the same way as the science, to approximate a resampled error for each exposure
  • median combine the drizzled error arrays across exposures, in the same way the drizzled science arrays are median combined
  • blot the median error back to the input data coordinates, in the same way the median science image is handled

Then, (input data - blotted median) is compared to blotted median error for thresholding purposes.

Do you have MIRI LRS examples with outlier detection problems?  I'm looking at the data in the INS regression test set, but it looks like outliers are already detected fine in that one, and these changes make no significant difference.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

I'm not aware of any particular issues with LRS outlier detection at the moment.  Tagging Greg Sloan though in case he knows of some.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

A further note on parameters, for NIRSpec testing:

I noticed that sometimes, it looks like weight_type=ivm is better for resampling in outlier detection, when there's not a lot of nods/dithers, since it tends to let fewer outliers into the resampled median image, making them easier to identify in the input.  To avoid the issue with blocking large regions of the array when using ivm weights, you can instead set the maskpt parameter to something low, or even 0.0.

So, we should consider whether this parameter set:

  weight_type=exptime, maskpt=0.7

or this parameter set:

  weight_type=ivm, maskpt=0.0

does a better job by default.

For a specific example from the INS regression test data set, I'm looking at PID 1810, obs 1, source 18139, G140M (ASN file in MAST is jw01810-o001_20240913t223253_spec3_00003_asn.json). For this 2 nod example, weight_type=ivm, maskpt=0.0 does a better job at catching the extreme outliers than weight_type=exptime, maskpt=0.7.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Christian Hayes on JIRA:

Thanks Melanie Clarke, I've done a bit of initial testing and I think that this change will do what we want, though we will need to do some investigation work on our end to set reasonable defaults for the outlier_detection parameters.

So far I've found that exptime/maspt=0.7 and ivm/maskpt=0.0 perform fairly similarly for catching outliers.  I haven't tried a 2-pt nod yet so I'll take a look at the example you mentioned.  

The one thing I have found is that with either of these settings the outlier detection can be a bit over-agressive when 1) flagging low S/N data, 2) flagging the trace of negative nods (though this shouldn't be used anyway), and 3) we still run into a few cases where some of the trace of high S/N point sources can get clipped (though that is already an issue with the current outlier detection).  I think all three of these might be addressed by changing some of the outlier detection thresholds.  In particular I found that changing the "scale" to "3.0 2.0" seemed to work reasonably well with the branch Melanie linked.  If it's useful here are some example datasets I saw these in:

  1. PID 2736 obs 7 source 6355, G395M (jw02736-o007_20240629t160903_spec3_00001_asn.json) - I see patchy nans throughout the combined s2d

  2. PID 1764 obs 14 source 1 (fixed slit - combining S200A1 and A2), G395H (jw01764-o014_20240911t092357_spec3_00002_asn.json) - the negative traces are flagged

  3. PID 1538 obs 160 G140M (using a custom asn for flux cal, where we noticed the issue of clipping fluxes) - I've attached an zoom in on one of the dithers in the crf file for this example since the data is public.  The outlier detection tends to clip the central trace of the point source when it concentrates in a single pixel.  Modifying the scale value for the derivative helps.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

Thanks so much for testing Christian Hayes!  The modification to the scale parameter looks very promising.

I can start cleaning up my PR for the code changes to use median errors so we can move it through reviews.  I will assume we want it for the resampled spectral modes only, so other modes are not affected (imaging, coron, ifu, tso). NIRSpec FS and MOS will be affected, and MIRI LRS. David Law - let me know if that's not okay for MIRI, or if you want me to look into doing the same change for imaging.

I will leave any parameter updates to future reference files from the NIRSpec team, except that I will update the default value for 'fillval' to 'NAN' in the software, instead of 'INDEF', to match current defaults in the resampling step.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

David Law Greg Sloan - pinging again to ask if these changes are not desirable for MIRI LRS.  I have currently written them to be the new standard behavior for all slit spectra, but I can instead add a parameter to allow NIRSpec to turn it on and MIRI to turn it off, if necessary.  I have not yet found any MIRI examples for which it makes a significant difference.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

Melanie Clarke I think these code changes make physical sense for MIRI LRS as well, so I don't think it's necessary to add such a parameter.  In the couple quick cases I've tested as well there is minimal or zero impact on the result, with the only couple pixels in which I saw a difference favoring the new code.  Greg Sloan Please feel free to comment as well if you know of any problem cases.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

Great, thanks for following up!  I'll move forward with the PR as is, then.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Tyler Pauly on JIRA:

Fixed by #8828

David Law I believe this has already been tested as part of the development process by both NIRSpec and MIRI reps, but for completeness I've reassigned to you to confirm this, and I've set the ticket to RTT.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

Christian Hayes Since the impact on MIRI seems pretty negligible, I'll leave it to you to test and close this.  One related question- in the figure attached above showing the algorithm performance on an example case it uses EXPTIME weighting rather than (what I think is still the default) IVM weighting.  Does this mean that NIRSpec also needs a param ref file update to go along with this PR?  If so this would presumably be in addition to the update provided recently as a patch for the 11.1 build?

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Christian Hayes on JIRA:

David Law thanks happy to take it over.  Yes we will want to update param ref files and are looking into the changes we want to make.  The defaults are currently sufficiently strict (especially now that we have the files in for the 11.1 build patch) that I don't expect these changes will have a significant impact on the data yet, but this will enable parameter changes to have a meaningful effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant