-
-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concept: pseudocode proposal for wavelength calibration infrastructure #152
Conversation
cal1d = WavelengthCalibration1D(spec1d_lamp, (CalibrationLine(5500, 20), CalibrationLine(6200, 32))) | ||
cal1d = WavelengthCalibration1D(spec1d_lamp, (CalibrationLine.by_name('H_alpha', 20, 'uphill'), CalibrationLine.by_name('Lyman_alpha', 32, 'max', {'range': 10}))) | ||
cal1d = WavelengthCalibration1D(spec1d_lamp, ((5500, 20), ('Lyman_alpha', 32))) | ||
cal1d = WavelengthCalibration1D(spec1d_lamp, ((5500, 20), (6000, 30)), default_refinement_method='gaussian') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to support per-line or global refinement options, or both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind, the user would do a first pass automatically and then refine the lines that need refinement. I wonder if the refinement to the line centroid should happen earlier and not while the wavelength solution is calculated. To follow Tim's process (and if I understand this code correctly), this represents step 3 (fit model), while the centroid refinement happens in step 2 (identify).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, this proposal merges Tim's steps 2 and 3 into a single call, but does allow the user to inspect the results from step 2 and modify the inputs. My reasoning for having these as a single class is:
- I don't expect the wavelength solution to be expensive at all, so having it computed and then modifying the inputs shouldn't be a computational "burden". And although both steps are covered in the same class, the user technically can defer step 3 by asking for
cal1d.refined_lines
and modifying those inputs as necessary before callingcal1d(spec_targ)
. - The 2D case (reidentify, "step 4") requires information about how to refine the lines at any given "row", and I think it's elegant to have similar inputs to the 1D and 2D cases, so I like having refinement information passed to the 1D case as well. Alternatively, we could split the 2D case into two steps: the first that just refines the positions across all the rows and returns those positions and the second that then fits a wavelength solution.
In other words, I would suggest we either merge steps 2 & 3 from Tim's summary below or also split step 4 into "reidentify lines across rows" and "fit 2D model".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind, the user would do a first pass automatically and then refine the lines that need refinement. I wonder if the refinement to the line centroid should happen earlier and not while the wavelength solution is calculated. To follow Tim's process (and if I understand this code correctly), this represents step 3 (fit model), while the centroid refinement happens in step 2 (identify).
this is correct. my step 2 has a step 2a: find lines and centroid them and step 2b: match lines to lines with known wavelengths. centroids shouldn't need to be tweaked. there will be iteration between steps 2b and 3 to find new matches as the wavelength solution is refined and then refining the solution further with more matched lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tepickering - I think the steps you state as "match lines to lines with known wavelengths. centroids shouldn't need to be tweaked. there will be iteration between steps 2b and 3 to find new matches as the wavelength solution is refined and then refining the solution further with more matched lines." would fall within this proposal's autoidentify
classmethod. That is imagined to take a line list instead of a manual list of wavelength-pixel pairs, find the lines in the spectrum, match to the lines in the list, and create that list of wavelength-pixel pairs automatically. At that point they can still be refined when passing on to the 2D step.
So modifying the original example:
cal1d = WavelengthCalibration1D.autoidentify(spec1d_lamp, common_stellar_line_list)
cal1d.lines # list of wavelength-pixel pairs determined automatically by identifying and matching lines between the observations and the line-list. User can plot, inspect, modify, etc.
cal2d = WavelengthCalibration2D(spec2d_lamp, trace_lamp, cal1d.lines)
calibrated_spec2d = cal2d(spec2d_source)
Does that sound reasonable to you or is there something missing or in the wrong order for this workflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tepickering - just wanted to revist this... do you think the modification about would accomplish your use-case? Or if not, do you have suggestions for how to alter the API in order to support your steps better?
# the refined lines from the 1D calibration as the starting point | ||
cal2d = WavelengthCalibration2D(spec2d_lamp, trace_lamp, ((5500, 20), (6000, 30))) | ||
cal2d = WavelengthCalibration2D(spec2d_lamp, trace_lamp, cal1d.refined_lines) | ||
spec2d_targ_cal = cal2d(spec2d_targ, trace_targ?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does 2D calibration need to support different trace for lamp and source? If so, how would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the answer is "no" here in one 2D case, but see the broader comment I'm about to leave for a bit more nuance
in the other case. ("no" means "what you have here is fine")
Codecov Report
@@ Coverage Diff @@
## main #152 +/- ##
=======================================
Coverage 75.46% 75.46%
=======================================
Files 9 9
Lines 693 693
=======================================
Hits 523 523
Misses 170 170 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
let me break down the process at a higher level so that we're all on the same page and can have a clearer picture of how things need to fit together for various use cases:
|
My own answers to @kecnry's open questions. Note I have not yet read @tepickering's comment to try to keep my perspective independent, so I might revise some of this immediately after writing it after I've read his 😉
I think general - i.e. use "spectral_axis" or just "world_coord" or "physical_value", and also, very critically, they must be
I don't see why we can't do both? I.e., dataclass by default, but cache only when needed? (a la https://stackoverflow.com/questions/51079503/dataclasses-and-property-decorator )
I am very against returning a full
I don't completely get this question, but I think the answer is yes - so my comment at the end for how I think about this.
I think it's probably ok to just fit part of the trace that's in-bounds. That is, the model ends up just unconstraned for pixels where the trace fell off the edge (and might fall into the wildly-extrapolated failure mode of polynomials, etc, but that's fine)
Definitely this is an important and highly non-trivial thing. I think it's ok right now to basically say "all the extraction methods don't work with 2D solutions right now", and we can work on that later, as there's a whole pile of literature and experience on this topic we'd need to explore more. But for many science cases one doesn't care because you want to forward model the whole spectrum anyway, so the user doesn't need to extract. |
Thanks for the comments/thoughts @eteq
Ok, I think that is consistent with the proposal here, but correct me if I'm misunderstanding.
Let me try to explain a little better before breaking out a whiteboard. At each "row" (ok, stepping in the cross-dispersion axis away from the trace), the pixel positions of the lines are refined, giving a pixel coordinate at each row for each line. Are these allowed to be entirely independent of each other as just raw data points? Does each row get an independent 1D wavelength solution or does the whole image get a 2D model fitted/applied? Or are the wavelengths smoothed or have a separate 1D model fit to them before the wavelength solution is fitted?
Works for me, and since your answer to the following question is to not handle 2D solutions during extraction yet, we also won't have to handle this scenario... yet. But down the road we may need to think about how extraction will work on a 2D spectrum where some regions have an undefined solution. Either way, as we scope out the work, we at least need to make sure we have the checks to raise an error if passing a 2D wavelength solution into extraction. |
Now some separate high-level thoughts: I found this API's naming a bit un-intuitive because I don't see the calibration as intrinsically 1D or 2D (because it's always 2D) but rather the resulting WCS is 1D or 2D. So I think we want the two cases to be like this: Case 1: follow-the-spectrum-trace:
Case 2: 2D wavelength solution
So the 2D solution isn't a completely separate thing, but is rather an extension of the 1D case. You'll also note that neither require an extracted spectrum, just a trace, and some use cases it's very very important that you not expect an extracted spectrum. Relatedly, but separately, I think too much is happening in the initializer in this API. That is, it looks like you are suggesting that all of the steps above happen on initialization, and the WCS is re-computed when something changes. I think that's not ideal because frequently one wants to apply the same identify steps to a bunch of spectra, and I think it's better to create an object that operates on the spectra, rather than one that contains the spectrum. So instead I'd modify your API to do something like:
that's for my case 1. Case 2 would add this:
I'm less confident that cal2d api there makes sense, but hopefully you get the gist that the one builds from the other, they can't be treated independently. EDIT: I added |
(Oops, sorry, I think our posts collided there, @kecnry - I was writing that before seeing yours) |
To respond to @kecnry last post:
👍 to most of this but I am very very very strongly against "assume default units". History has shown that way leads to madness and sadness.
Sorry, yes you're right, I missed the
Hopefully my post above cleared a lot of this up, but my thinking is that yes, we definitely do need to support this stuff, but it would be the same extraction parameters used to extract the arc in the first place. That is, this class just uses the extraction object to extract from the arc and that would do any smoothing, etc. The simplest version (which would probably be the default?) would be a straight line along the rows, but the user can provide really whatever they want and
👍 |
Ohh, something I just realized: when I was saying And also I agree with all of @tepickering's points/terminology, although I'm not convinced we want step 5, since in my mind "apply" means "put WCS where there wasn't one before", not "update an existing one with small changes" |
I'll need to think carefully through your proposed changes before replying to those, but I just wanted to clarify that I was thinking of having the |
Oh, gotcha, @kecnry . I agree with that in a general sense, but my quibble is: PEP8 (see note 2 under https://peps.python.org/pep-0008/#designing-for-inheritance) says (But I'm not dead set on this - PEP8 also says "However, know when to be inconsistent – sometimes style guide recommendations just aren’t applicable. ") |
Reading more carefully, I like how close #152 (comment) and #152 (comment) in practice even if framed differently. That gives me some confidence this is a good path to go down |
in my parlance,
again, i was conflating a step 5a: apply calibration WCS to target spectra and step 5b: update/tweak target spectra WCS. "apply" could be "put WCS where one wasn't before", but i think it's more accurately update detector WCS to physical WCS, at least in the API we're discussing here. conceptually, step 5b is just using lines within the target spectrum (e.g. sky lines) and doing another iteration of steps 3 and 4 with them and then 5a to update the WCS. so maybe we don't need this separated out, programatically. 5a might just be i do want to note that as a |
@eteq - I'm revisiting this whole conversation and think we're all mostly (?) on the same page, but looks like a few items you brought up were lost in all the threads and not discussed (if there's anything else that you think was not resolved, .
The 1D and 2D in this PR refer to the input spectrum. In the first case the spectrum is either a single row/trace of a 2D spectrum or an already extracted spectrum, and in the 2nd (2D) case, the calibration "walks" up and down a 2D spectral image. Because of the separate inputs and logic under-the-hood, it made sense to me to separate them, but the names themselves could easily be swapped.
To me this feels much more "interactive" and would feel a bit out of place with the current APIs of other steps in specreduce, and maybe more natural as the API within jdaviz's plugin which would wrap the calibration in specreduce. In my mind, the user interacts with specreduce objects by either changing inputs and re-running or possibly by revising the input "parameters" on an already instantiated object and then able to re-run the actual computation/calibration (in other words, no specific methods for interactively plotting/modifying individual lines, etc). But I'm also open to the idea of trying to start heading in this general direction if that is the consensus... in which case we might want to also think about what that means for trace/background/extract. |
I don't have much to add to this discussion, which seems quite nuanced. As long as we capture the following workflow,
|
Is there an expectation here that we will limit this to linear solutions to the dispersion, or do we need to account for non-linear spectral WCS solutions as well? I've been doing a bit of reading up on spectral WCS representations as part of this, and the non-linear case turns out to be more complicated than I expected. As far as I can tell you can't just encode, e.g., a second-degree polynomial instead of a linear solution. Do we want to use GWCS for this, which may be more flexible or have better support in this regard? |
yes, we will need to use GWCS for this. we will need the ability to do distortion corrections of arbitrary order as well as support discontiguous WCS (e.g. IFU slices, echelle orders, multi-slit). the gemini pipeline code, DRAGONS, uses GWCS internally and may be a good source of examples to glean from. |
FOR DISCUSSION: NOT INTENDED TO MERGE
This shows the basic class-infrastructure and syntax that could be designed to support 1d and 2d wavelength calibration, both with manual line input and auto-identifying from a line-list. See also #71 for an older proposal that served as some inspiration here. See the diff for the class architecture itself, and I'll just copy the user-facing proposed syntax here as well:
Open questions:
Other considerations: