
Sss module #473

Merged
merged 50 commits into from
Oct 30, 2023

Conversation

msilvafe
Contributor

@msilvafe msilvafe commented May 23, 2023

This adds the ability to fit out Legendre polynomials in az-signal space for ground/SSS template subtraction. I added the basic fitter and signal-subtraction functions that interface with the axis managers, much like the hwpss module. There are some basic docs and a simple unit test in here as well. However, I think there are a few things worth discussing and/or adding before we move into final PR review:

  1. Do we want the option to bin in az before fitting?
  2. Do we need to include some fit statistics?
  3. Is this the right place in the repository for this to live?
  4. We should add these processes into the preprocess module as well
  5. Do we have a target time this should run in? I haven't done any realistic timing tests, but the toast data I ran on had 860 detectors over 14 minutes of data, and the fitting took 6.5 minutes.
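For reference, the basic approach described above (fit Legendre polynomials to signal vs. normalized azimuth, then subtract the template) can be sketched as follows. This is a minimal illustration on synthetic data with made-up variable names (`az`, `sig`, `nmodes`), not the actual module API:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
nsamp, nmodes = 10000, 4

# Simulated constant-elevation scan: azimuth sweeps back and forth,
# signal = Legendre series in normalized az plus white noise.
sweep = np.abs(np.modf(np.linspace(0, 20, nsamp))[0] * 2 - 1)
az = np.deg2rad(30 * sweep)
x = 2 * (az - az.min()) / (az.max() - az.min()) - 1   # map az span to [-1, 1]
true_coeffs = np.array([0.0, 1.0, 0.5, -0.3])
sig = legendre.legval(x, true_coeffs) + 0.01 * rng.standard_normal(nsamp)

# Fit the az-synchronous template and subtract it from the signal.
coeffs = legendre.legfit(x, sig, deg=nmodes - 1)
cleaned = sig - legendre.legval(x, coeffs)
```

With noise-only plus Legendre input like this, the fit recovers the injected coefficients, consistent with the test described in the comment.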

I have tested that this works on both a toast sim on simons1, found here: /home/msilvafe/mnt/flp_dev/sat_sims/20221213/out_sss, and a dummy axis manager I created that just has az-Legendre polynomials and white noise (the function to generate this is in the unit test script). Some examples of these are below:

[Figures: TOAST Sim Test (TOAST_test_SSS) and Dummy AxisManager Test (Dummy_test_SSS)]
It seems like TOAST's default template for the Az signal is not really well modeled by these Legendre polynomials, but since this is what ABS used, that's what I started with. I'm also fairly certain the TOAST default isn't very physical, given previous conversations with Reijo. The fit recovers the input as expected when the input signal is just noise + Legendre polynomials in az.

Both of these are generated in a notebook on simons1: '/mnt/so1/users/msilvafe/notebooks/flp_dev/20230518_TOD_Processing.ipynb'

@tterasaki
Contributor

Thank you for the great work! I think binning will be helpful.
When we want to assess the SSS itself, for example by making a ground template from coadding the SSS of all observations, the fully sampled time-ordered SSS data will be too much if not binned.

@msilvafe
Contributor Author

msilvafe commented May 23, 2023

Thank you for the great work! I think binning will be helpful. When we want to assess the SSS itself, for example by making a ground template from coadding the SSS of all observations, the fully sampled time-ordered SSS data will be too much if not binned.

Thanks for taking a look @tterasaki. Feel free to commit a binning function and any statistics you want. We could probably implement a linreg function as we did in the hwpss module as well using scipy.special.legendre since they're orthogonal in this -1 to 1 space if we end up needing it to run faster.

Now that you point out that we want to be able to bin over all of the observations in order to make a ground template, I'm realizing that my conversion from az pointing -> x won't work. It maps the az range of the given TOD to +/-1, not the full 360 deg az range. I suspect that's not what we want either: we want x to be az normalized so that +/-180 deg = +/-1 always, with the bins always defined over that full +/-180 range. That should be a relatively simple code change, but without testing, I'm not sure whether the fits will be worse when fitting a subset of the az range instead of the full +/-180.
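The two normalization conventions discussed here could look like the following sketch (the helper name `az_to_x` and its signature are hypothetical, chosen to mirror the `bin_range` option discussed later in the thread):

```python
import numpy as np

def az_to_x(az, bin_range=None):
    """Map azimuth (radians) to the Legendre domain [-1, 1].

    Hypothetical helper: bin_range=None scales the TOD's own az span
    to [-1, 1] (the original per-TOD behavior); passing (-np.pi, np.pi)
    pins +/-180 deg to +/-1 so bins are comparable across observations.
    """
    if bin_range is None:
        lo, hi = az.min(), az.max()   # per-TOD normalization
    else:
        lo, hi = bin_range            # fixed, observation-independent
    return 2 * (az - lo) / (hi - lo) - 1

az = np.deg2rad(np.linspace(-60, 60, 5))
x_tod = az_to_x(az)                      # endpoints land on -1 and +1
x_fix = az_to_x(az, (-np.pi, np.pi))     # same scan only spans +/- 1/3
```

With the fixed convention, a scan covering a subset of azimuth only populates a subset of [-1, 1], which is exactly why the fits may behave differently.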

@kmharrington
Member

An incomplete list of thoughts:

  1. Yes, binning with masking (using the flags) would be a good option. It should also run significantly faster than a fit to the whole time series. Could you and @tterasaki look at making a more generic version of get_binned_signal that we could put in tod_ops and call everywhere we need to bin data? This is the type of operation where C++ acceleration is likely to become helpful, so if all functions call the same binning function from the start, it will be easier to optimize later.

  2. Fit statistics will be helpful in the future if they're saved as a data product so we can see what sort of distributions we get.

  3. Just move this file into tod_ops and pin the .rst into the tod_ops section.

  4. Yes please.

  5. Ya, very slow. Binning will likely help speed things up if the slowness is just the fits. You can try linearizing the fits like the HWP ones, but I'm a bit worried that's not the issue, because the docs for the function you're calling say it's doing basically the same thing. I may also now be asking for a generalized linear fitting function that can be called for many different types of linear fits.
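A generic binning utility of the kind suggested in point 1 might look like this sketch (the name `get_binned_signal` comes from the thread; the signature and NaN convention here are assumptions, not the sotodlib implementation):

```python
import numpy as np

def get_binned_signal(x, sig, nbins=100, bin_range=(-np.pi, np.pi), mask=None):
    """Bin sig (ndet x nsamp) by x (nsamp), ignoring masked samples.

    Hypothetical sketch of a shared binning utility.  Returns bin
    centers, per-bin means, and per-bin counts; bins with no samples
    hold NaN so callers can skip them.
    """
    sig = np.atleast_2d(sig)
    if mask is not None:
        x, sig = x[mask], sig[:, mask]
    edges = np.linspace(*bin_range, nbins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, nbins - 1)
    counts = np.bincount(idx, minlength=nbins)
    sums = np.vstack([np.bincount(idx, weights=s, minlength=nbins)
                      for s in sig])
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts                 # empty bins -> NaN
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, means, counts
```

Because the core is `np.digitize` plus `np.bincount`, it is straightforward to swap the inner loop for an accelerated implementation later without changing callers.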

More thoughts on binning / data products:

  • I'm thinking we could be using this in two different ways, one for instrument characterization of azimuth synched pickup and one for just cleaning low frequency modes from the data. What you have now is great for the second thing but we'd need more dev to use this for the first thing.

  • Binning with the median instead of the mean is more robust to junk in the data but takes longer to calculate.

  • For SSS instrument characterization, we're going to have to deal with more than just az ranges. We'll also be combining as a function of elevation and boresight rotation. It is safe to say the elevation and boresight will be constant over an observation, but I'm mostly pointing out that we'll need to recombine this data into different groups later. For that, I think saving the binned data (not the fits) on a standard set of bins (configurable in the preprocess module?) will be the most useful. If the binned data / errors are saved then we can combine them from different observations and then fit.

  • You are right that if we make bins from -180 -> +180 then the fits will change. In particular, for single observations there will be modes that we have no constraining power for. Low order modes will all look similar to each other. This is another reason for saving the binned data then doing fits across many observations, at least for when we want to build up the actual SSS from things like ground pickup.

  • I would also leave what you have as an option, because it is what we could use for just cleaning "any low frequency mode" from the data.
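Combining saved per-observation binned data on a shared bin grid, as proposed above, could be a simple count-weighted average before fitting. This is a hypothetical sketch for a single detector's binned profile (`means`/`counts` arrays per observation are assumed, matching a binning step that marks empty bins with NaN):

```python
import numpy as np

def coadd_binned(means_list, counts_list):
    """Count-weighted average of binned signals from several observations.

    Hypothetical sketch: each element of means_list is a per-observation
    binned profile (nbin,) on the same fixed bin grid, with NaN in empty
    bins; counts_list holds the matching per-bin sample counts.
    """
    num = np.zeros(len(means_list[0]), dtype=float)
    den = np.zeros(len(counts_list[0]), dtype=float)
    for m, c in zip(means_list, counts_list):
        good = np.asarray(c) > 0
        num[good] += np.asarray(m)[good] * np.asarray(c)[good]
        den[good] += np.asarray(c)[good]
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den, den        # combined profile and total counts
```

Bins never hit by any observation stay NaN, so a downstream fit over many observations naturally uses only the azimuth ranges that were actually scanned.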

@tterasaki
Contributor

I have added the binning (and removed the TOD fitting).
The binning function now lives under tod_ops, and get_hwpss uses the general binning function as well.
After binning, one of two methods can be specified: 'interpolation' or 'fitting'.
With 'interpolation', the binned signal is used directly as the SSS template.
The advantage of 'interpolation' is that it does not depend on fitting, so I think we will usually use it.
The fit method is the same as before.

One note: the default bin range is [-np.pi, np.pi].
In this case binning is done over [-np.pi, np.pi], which means some bins contain no samples.
The full [-np.pi, np.pi] range is mapped to [-1, 1] in x_Legendre, so the fit is applied only to the partial region that contains non-zero samples.
If you want to fit over the range from minimum to maximum az instead, you can use bin_range = None.
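The 'interpolation' method described here, where the binned profile itself serves as the template, might be sketched as follows on synthetic data (variable names are illustrative, not the module's; the `bin_range = None` case of binning over the scanned az span is used):

```python
import numpy as np

rng = np.random.default_rng(1)
az = np.pi / 6 * np.sin(np.linspace(0, 40, 20000))   # scan within +/-30 deg
sss = 0.5 * np.cos(3 * az)                           # az-synchronous signal
sig = sss + 0.02 * rng.standard_normal(az.size)

# Bin the signal in azimuth over the scanned range (bin_range = None case).
nbins = 100
edges = np.linspace(az.min(), az.max(), nbins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
idx = np.clip(np.digitize(az, edges) - 1, 0, nbins - 1)
binned = (np.bincount(idx, weights=sig, minlength=nbins)
          / np.bincount(idx, minlength=nbins))

# 'interpolation': evaluate the binned profile at each sample's az
# and subtract it directly, with no fit involved.
template = np.interp(az, centers, binned)
cleaned = sig - template
```

Since no model is assumed, this removes whatever az-synchronous structure the binning resolves, which is why it can be preferable to the fit when the SSS shape is not well described by low-order polynomials.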

You can see my demonstration in /mnt/so1/users/tterasaki/workspace/sss_dev/230531_debug02_1_bin_and_fit.ipynb

@msilvafe
Contributor Author

I have added the binning (and removed the TOD fitting). The binning function now lives under tod_ops, and get_hwpss uses the general binning function as well. After binning, one of two methods can be specified: 'interpolation' or 'fitting'. With 'interpolation', the binned signal is used directly as the SSS template. The advantage of 'interpolation' is that it does not depend on fitting, so I think we will usually use it. The fit method is the same as before.

One note: the default bin range is [-np.pi, np.pi]. In this case binning is done over [-np.pi, np.pi], which means some bins contain no samples. The full [-np.pi, np.pi] range is mapped to [-1, 1] in x_Legendre, so the fit is applied only to the partial region that contains non-zero samples. If you want to fit over the range from minimum to maximum az instead, you can use bin_range = None.

You can see my demonstration in /mnt/so1/users/tterasaki/workspace/sss_dev/230531_debug02_1_bin_and_fit.ipynb

I like the method = "interpolation" or "fit" solution, and I think defaulting the bin range to [-pi, pi] is reasonable while retaining the option to map min/max az -> -/+1 when None is passed.

Next, I think we want to add the preprocess module process so that the sss_stats axis manager gets passed to proc_aman. I imagine that if this runs automatically with preprocess each time (I assume the automatic thing to run is with the calc argument but not process, so that proc_aman returns the binned SSS with stats but does not subtract the fit/interpolation from the data), then we will build up an ndet x nbin array for each observation. It should then be easy to load back a season's worth of proc_amans, average the binned signal arrays, and run the fit method to estimate a ground template. I see one issue with this, however: since the fit method is buried inside the SSS function, it cannot accept an ndet x nbin array (or an axis manager with ndet and nbin axes) and fit that directly. So I suggest moving everything inside the if method == 'fit': statement to a separate fit_sss function that takes the sss_stats axis manager, bin_range, and nmodes as arguments and returns the fit, or merges/wraps it into the input sss_stats axis manager. I also think the nmodes argument should default to 20 instead of None, since as it is now, running with method='fit' without passing nmodes exits with an error.
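The proposed standalone fit function might have a shape like the sketch below. The name `fit_sss` and the `nmodes=20` default come from the comment; the concrete signature and array conventions are assumptions, not the eventual sotodlib code:

```python
import numpy as np
from numpy.polynomial import legendre

def fit_sss(bin_centers, binned_signal, bin_range=(-np.pi, np.pi), nmodes=20):
    """Fit Legendre polynomials to pre-binned SSS data.

    Hypothetical sketch of the standalone fit step: it accepts binned
    (ndet x nbin) data directly, so it can also run on season-averaged
    binned arrays.  Bins with no samples (NaN) are excluded from the fit.
    """
    lo, hi = bin_range
    x = 2 * (np.asarray(bin_centers) - lo) / (hi - lo) - 1
    binned_signal = np.atleast_2d(binned_signal)
    good = np.all(np.isfinite(binned_signal), axis=0)
    # legfit accepts y of shape (m, k): fit all detectors at once.
    coeffs = legendre.legfit(x[good], binned_signal[:, good].T,
                             deg=nmodes - 1)
    model = legendre.legval(x, coeffs)     # (ndet, nbin) fitted template
    return coeffs.T, model
```

Because the fit operates on bin centers rather than raw samples, the same function serves both single-observation fits and the many-observation ground-template case described above.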

2 related questions for @kmharrington:

  1. Is it possible to load back a season's worth of proc_aman without loading the data as well through the current preprocess framework or does that currently involve some SQL code that's not wrapped nicely in sotodlib functions yet?
  2. Say we have the season-averaged sss/ground template calculated as described in the paragraph above. Can we update the preprocess configs/saved metadata so that when you load the data with process but not calc or save, the fitted sss estimated from the full season can be subtracted from loaded data in the correct order in the processing?

@tterasaki
Contributor

@msilvafe Can we merge this PR into master? Although there are still some things to update around preprocessing, the core of this PR appears to be working with the site data. This functionality is useful for current site data with SSS, so I'd like to merge it if possible and work on the preprocessing in another branch.

@msilvafe
Contributor Author

@msilvafe Can we merge this PR into master? Although there are still some things to update around preprocessing, the core of this PR appears to be working with the site data. This functionality is useful for current site data with SSS, so I'd like to merge it if possible and work on the preprocessing in another branch.

I have one last set of local changes to this to pull the fitting part out of the main sss function into its own function. I think that's the last thing I wanted to do before converting to a real PR from draft. Can you wait for that?

@msilvafe msilvafe marked this pull request as ready for review October 14, 2023 16:23
@msilvafe
Contributor Author

@mhasself and @tterasaki I think this is ready for review so we can merge and start using it on the commissioning data.

I moved the fitting function out of the main get_sss function to fit_sss so it can be called independently. My thinking on this follows my comment above about using get_sss to output a binned version of the sss over many observations which can be averaged into a new sss_stats-like axis manager and then passed into fit_sss for studying ground template stability over time, el ranges, etc.

One detail we can decide to update later is whether we want the merged version to use the binned_signal_sigma as weights in the Legendre fitting. The function we are using now, numpy.polynomial.legendre.legfit, takes x of size m and y of size (m, k) but does not allow weights of size (m, k). I tried fitting a single detector (i.e. x of size m, y of size m, and weights of size m) and it did seem to improve the fits (by eye), but to run that way we would have to loop over detectors, which sacrifices some of the speedup we got from this function. However, if the fit part is not run too often (only on binned averages over many observations), maybe this slowdown is fine.
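The weighted per-detector fallback described here could be sketched as below. `legfit` does accept a 1-D weight vector `w` of length m, which is what forces the loop; the function name and array conventions are hypothetical:

```python
import numpy as np
from numpy.polynomial import legendre

def fit_sss_weighted(x, binned_signal, binned_sigma, nmodes=20):
    """Per-detector weighted Legendre fits.

    Hypothetical sketch: legfit only accepts a weight vector of size m,
    not an (m, k) array, so using per-bin sigmas as weights means
    looping over detectors and giving up the vectorized speedup.
    """
    binned_signal = np.atleast_2d(binned_signal)
    binned_sigma = np.atleast_2d(binned_sigma)
    coeffs = np.full((binned_signal.shape[0], nmodes), np.nan)
    for i, (y, s) in enumerate(zip(binned_signal, binned_sigma)):
        good = np.isfinite(y) & (s > 0)     # skip empty/bad bins
        coeffs[i] = legendre.legfit(x[good], y[good], deg=nmodes - 1,
                                    w=1.0 / s[good])
    return coeffs
```

If this loop only runs on binned averages of many observations (small nbin, occasional execution), the per-detector cost is likely acceptable, which matches the suggestion above.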

@tterasaki
Contributor

Looks nice. Thank you Max!

@tterasaki
Contributor

@msilvafe @mhasself
OK, I have addressed all of Matthew's comments. Can you check?

@msilvafe
Contributor Author

@msilvafe @mhasself OK, I have addressed all of Matthew's comments. Can you check?

I looked over this, and it all looks good! Thanks, @tterasaki. I ran sphinx-build on the docs to see how they looked and fixed some minor things: typos, missing return parameters, formatting, etc.

Member

@mhasself mhasself left a comment


Looks great! I think class name "EstimateAzSS" would be more consistent with our other applications of PascalCase; I don't really like "EstimateAZSS". But merging is in your hands now.

@@ -245,22 +245,25 @@ def process(self, aman, proc_aman):
hwp.demod_tod(aman, **self.process_cfgs)


class EstimateSSS(_Preprocess):
"""Fits the amplitude of legendre polynomials in sig vs Az space.
class EstimateAZSS(_Preprocess):
Member


Did you mean "AZSS"? I would have thought "AzSS".

@@ -0,0 +1,225 @@
"""Module for estimating Aziumth Synchronous Signal (azss)"""
Member


Aziumth -> Azimuth.

@msilvafe msilvafe merged commit 38cffc2 into master Oct 30, 2023
4 checks passed
@msilvafe msilvafe deleted the sss_module branch October 30, 2023 17:01
@msilvafe msilvafe mentioned this pull request Feb 15, 2024