
Add multilayer preproc #1026
Merged: 23 commits from 20241104_multilayer_preproc into master, Dec 14, 2024

Conversation

@msilvafe (Contributor) commented Nov 12, 2024

Addresses issue #1003.

@msilvafe requested a review from @Wuhyun, November 22, 2024 15:51
@msilvafe marked this pull request as ready for review, November 22, 2024 15:52
@Wuhyun commented Dec 3, 2024

I ran it successfully on two-step preprocessing, split at demodulation. Looks functional to me - thank you for implementing this!

Test performed (found under NERSC /global/cfs/cdirs/sobs/users/wuhyun/multilayer_preproc_tests for anyone interested):

# 1) Initial preproc
preprocess_tod(obs_id, config_1, group_list=[["ws0", "f150"]],...)
# 2) Dependent preproc
multilayer_preprocess_tod(obs_id, config_1, config_2, group_list=[["ws1", "f150"]],...)
# 3) Load from saved results
aman = multilayer_load_and_preprocess(obs_id, config_1, config_2, dets={"wafer_slot": "ws0", "wafer.bandpass": "f150"})

I have one general question. Currently, the initial preprocessing (preprocess_tod) needs to be run before the second (multilayer_preprocess_tod). Would it be worth combining the two processes (e.g. having multilayer run preprocess_tod inside it), so that we can save some time reloading the first preprocess?

For context, I found that on a NERSC login node it takes ~120s to run and ~60s to load a preprocess config up to demodulation (single wafer & bandpass). Maybe these run fast enough that we don't need to worry about it much?

@mmccrackan (Contributor):

> I have one general question. Currently, the initial preprocessing (preprocess_tod) needs to be run before the second (multilayer_preprocess_tod). Would it be worth combining the two processes (e.g. having multilayer run preprocess_tod inside it), so that we can save some time reloading the first preprocess?

Thanks for testing!

I had considered incorporating preprocess_tod into multilayer_preprocess, and it certainly is possible to do this, though there are a few considerations. There is some complexity due to the desire to run many obs in parallel and the need to write the outputs at the end. Since multilayer_preprocess_tod needs to check the pre-existing database from preprocess_tod, that database would probably need to be accessed in parallel and populated for all entries that multilayer_preprocess needs.

We have plans to integrate preprocess_tod, multilayer_preprocess_tod, and preprocess_obs into a single main function interface to make these functions simpler to use, so maybe we can revisit this after that is done.

@Wuhyun commented Dec 4, 2024

Thank you for your answer, that sounds good to me.
Please add that one import statement and I'll be happy to approve this if others are.

@msilvafe (Contributor, Author) commented Dec 4, 2024

> Thank you for your answer, that sounds good to me. Please add that one import statement and I'll be happy to approve this if others are.

Hey @Wuhyun thanks for reviewing and testing! What is the import statement that you're referring to?

Also, after chatting with @mmccrackan, we're going to go ahead and implement your suggestion to allow the multilayer function to run through both database builds, without requiring that the first one be prebuilt via preprocess_tod.

@Wuhyun left a comment:

Sorry, forgot to submit this earlier.

@mmccrackan (Contributor):

Okay, I have made a bunch of changes to the functions here. I've tested them a fair bit, but there are a lot of logic branches, so I probably missed a few cases. The main differences are:

  1. multilayer_preprocess_tod.py will now call preproc_or_load_group.py, meaning that it can run the pipeline on the first config if the db is not found or overwrite is true.
  2. preproc_or_load_group.py has been refactored to allow for one or two config files. If running one, it will operate like it did before. If two, it will either load both, or it will load or run the first config and then run the second config (see the sketch after this list).
  3. both preprocess_tod.py and multilayer_preprocess_tod.py now write out temp files for each group instead of just a single file for all groups. The former writes to "temp/" as before, whereas the latter writes to "temp_proc/".
  4. cleanup_mandb is now used for merging all the temp files in both cases.
  5. Some helper functions like find_db and save_group have been added and are used throughout.
  6. Since we're now matching the per-group files of preproc_or_load_group.py in preprocess_tod.py and multilayer_preprocess_tod.py, the outputs have changed. The destination files are no longer returned separately; they are included in the output lists. So the outputs for preprocess_tod.py are error and outputs, whereas for multilayer_preprocess_tod.py they are error, outputs_init, and outputs_proc. The outputs may be empty depending on the runtime configuration.
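
For anyone skimming, the one-vs-two config dispatch in point 2 looks roughly like this (an illustrative sketch only, not the actual implementation; find_db, load_group, and run_pipeline are hypothetical stand-ins for the real helpers):

# Hypothetical stand-ins so the sketch is self-contained; the real helpers
# in this PR have richer signatures.
def find_db(obs_id, configs): ...
def load_group(obs_id, *configs): ...
def run_pipeline(obs_id, configs, aman=None): ...

def preproc_or_load_group_sketch(obs_id, configs_init, configs_proc=None,
                                 overwrite=False):
    if configs_proc is None:
        # One config: operate like the original preproc_or_load_group.
        if find_db(obs_id, configs_init) and not overwrite:
            return load_group(obs_id, configs_init)
        return run_pipeline(obs_id, configs_init)
    # Two configs: load both if both dbs already exist; otherwise
    # load-or-run the first config, then run the second on top of it.
    if (find_db(obs_id, configs_init) and find_db(obs_id, configs_proc)
            and not overwrite):
        return load_group(obs_id, configs_init, configs_proc)
    if find_db(obs_id, configs_init) and not overwrite:
        aman = load_group(obs_id, configs_init)
    else:
        aman = run_pipeline(obs_id, configs_init)
    return run_pipeline(obs_id, configs_proc, aman=aman)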

outputs_proc = save_group(proc_aman, obs_id, configs_proc, dets, context_proc, subdir='temp_proc/', overwrite=overwrite)
outputs_proc = save_group(obs_id, configs_proc, dets, context_proc, subdir='temp_proc')
if overwrite or not os.path.exists(outputs_proc['temp_file']):
@msilvafe (Contributor, Author):

This logic should go before pipe.run if we want to catch the case where the code exited after writing some of the groups (but not all) to the temp directory.
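
Something like this ordering (illustrative only, reusing the names from the hunk above):

outputs_proc = save_group(obs_id, configs_proc, dets, context_proc, subdir='temp_proc')
if overwrite or not os.path.exists(outputs_proc['temp_file']):
    # Only run the pipeline for groups whose temp file is missing (or when
    # overwriting), so groups already written by a failed partial run are skipped.
    proc_aman, success = pipe.run(aman)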


    return aman
else:
    return None
@msilvafe (Contributor, Author):

In check_cfg_match, if the dependency check fails it just raises a logger.warning() and returns False. Let's raise an exception or print something to stdout here in the else statement stating that the dependency check failed.
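
For example, in place of the bare return None (a sketch; the check_cfg_match call here is a placeholder and its real signature may differ):

if check_cfg_match(configs_init, configs_ref):
    return aman
else:
    # fail loudly instead of silently returning None
    raise ValueError(f'Dependency check failed for {obs_id} {dets}: the init '
                     'config does not match the config the dependent db was '
                     'built against.')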

dbix = {'obs:obs_id':obs_id}
for gb, g in zip(group_by, cur_groups[0]):
    dbix[f'dets:{gb}'] = g
print(dbix)
@msilvafe (Contributor, Author):

This should probably go to a logger instead of stdout (if we actually need it at all; it may only be needed for debugging).
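
E.g. (sketch):

logger.debug(f'db index for group: {dbix}')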

Comment on lines 472 to 473
overwrite: bool
    Optional. Whether or not to overwrite existing entries in the preprocess manifest db.
@msilvafe (Contributor, Author):

Not an argument of this function.

@@ -335,35 +337,213 @@ def load_and_preprocess(obs_id, configs, context=None, dets=None, meta=None,
return aman


def preproc_or_load_group(obs_id, configs, dets, logger=None,
                          context=None, overwrite=False):
def multilayer_load_and_preprocess(obs_id, configs_init, configs_proc,
@msilvafe (Contributor, Author):

The context_<> arguments are not used.

an initial and a dependent pipeline stage. If the preprocess database entry for
this obsid-dets group already exists then this function will just load back the
processed tod calling either the ``load_and_preprocess`` or
''multilayer_load_and_preprocess`` functions. If the db entry does not exist of
@msilvafe (Contributor, Author):

Wrong type of quotes: it should be `` at the beginning of multilayer_load_and_preprocess. Also, "If the db entry does not exist of" should read "If the db entry does not exist or not of".

if type(configs_init) == str:
    configs_init = yaml.safe_load(open(configs_init, "r"))

if context_init is None:
@msilvafe (Contributor, Author):

Same comment as at line 484 applies here and at line 583.

if db_init_exist and (not overwrite):
    if db_proc_exist:
        logger.info(f"both db and dependent db exist for {obs_id} {dets} loading data and applying preprocessing.")
        aman = multilayer_load_and_preprocess(obs_id=obs_id, dets=dets, configs_init=configs_init,
@msilvafe (Contributor, Author):

Write this (and line 618) into a try-except similar to lines 624/648 to prevent crashing @chervias's mapmaker script and to write a more useful error message to his error log.
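
I.e., roughly this shape (a sketch mirroring the existing except blocks; the configs_proc kwarg and the logged message are just examples):

try:
    aman = multilayer_load_and_preprocess(obs_id=obs_id, dets=dets,
                                          configs_init=configs_init,
                                          configs_proc=configs_proc)
except Exception as e:
    error = f'Failed to load: {obs_id} {dets}'
    logger.info(f'{error}\n{e}')
    return error, [obs_id, dets], [obs_id, dets], None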

aman.wrap('tags', tags)
proc_aman, success = pipe.run(aman)
proc_aman, success = pipe_init.run(aman)
aman.wrap('preprocess', proc_aman)
except Exception as e:
    error = f'Failed to load: {obs_id} {dets}'
@msilvafe (Contributor, Author):

These error messages should now include a reference to whether we're erroring out in the init or proc section, to aid in debugging where in this mess of nested if-else statements we're exiting.
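
E.g. (sketch; exact wording is up to you):

# in the init branch:
error = f'Failed to load: {obs_id} {dets} (init)'
# in the proc branch:
error = f'Failed to load: {obs_id} {dets} (proc)'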

return success, [obs_id, dets], [obs_id, dets], None

outputs_init = save_group(obs_id, configs_init, dets, context_init, subdir='temp')
if overwrite or not os.path.exists(outputs_init['temp_file']):
@msilvafe (Contributor, Author):

This check should probably happen before getting to this step, as referenced above at line 642.

@mmccrackan (Contributor):

I've added some things to address the existing comments. There are likely still bugs or logic paths that might fail so I'll keep testing, but I wanted to get the updates out so everyone could take a look.

The biggest change is the addition of a new function, save_group_and_cleanup, which will look through all the input groups, find any for which a temporary file exists, call cleanup_mandb (creating a db if it doesn't exist), and delete the file afterward. If a file exists but cannot be opened, it will also be deleted (in case of file corruption). This is intended to let us use the temp files from a run that failed before creating and populating the db and destination file for all groups. I pass through an overwrite option as remove (I renamed it to remove, since overwrite would imply the opposite behavior within save_group_and_cleanup), which will simply delete the files. This was added so that overwrite=True works in preprocess_tod.py.

It is probably a good idea to do this before running preproc_or_load_group followed by cleanup_mandb. To do the same for one group, you can just do:

dets = {gb: gg for gb, gg in zip(group_by, group)}
outputs_grp = save_group(obs_id, configs, dets, context, subdir)

if os.path.exists(outputs_grp['temp_file']):
    try:
        if not remove:  # overwrite
            cleanup_mandb(None, outputs_grp, configs, logger)
        else:
            # if we're overwriting, remove the file so the group will re-run
            os.remove(outputs_grp['temp_file'])
    except OSError as e:
        # remove the file if it can't be opened
        os.remove(outputs_grp['temp_file'])

I've also wrapped most of the commands in preproc_or_load_group in try/except blocks, so hopefully nothing should result in an uncaught exception.

@msilvafe merged commit 8c74d3b into master, Dec 14, 2024
4 checks passed
@msilvafe deleted the 20241104_multilayer_preproc branch, December 14, 2024 19:07