Add multilayer preproc #1026
Conversation
I ran it successfully on two-step preprocessing, split at demodulation. Looks functional to me - thank you for implementing this! Test performed (found under NERSC /global/cfs/cdirs/sobs/users/wuhyun/multilayer_preproc_tests for anyone interested):
I have one general question. Currently, the initial preprocessing (preprocess_tod) needs to be run before the second (multilayer_preprocess_tod). Would it be worth combining the two processes (e.g. have multilayer run preprocess_tod inside it), so that we can save some time reloading the first preprocess? For context, I found that on a NERSC login node it takes ~120s to run and ~60s to load a preprocess config up to demodulation (single wafer & bandpass). Maybe these run fast enough that we don't need to worry about it much?
Thanks for testing! I had considered incorporating that. We have plans to integrate a single main function interface for both steps.
Thank you for your answer, that sounds good to me.
Hey @Wuhyun thanks for reviewing and testing! What is the import statement that you're referring to? Also after chatting with @mmccrackan we're going to go ahead and implement your suggestion to allow the multilayer function to run through both database builds and not require that the first one is prebuilt via preprocess_tod. |
Sorry, forgot to submit this earlier.
Okay, I have made a bunch of changes to the functions here. I've tested it a fair bit, but there are a lot of logic branches here, so I probably missed a few cases. The main differences are:
outputs_proc = save_group(proc_aman, obs_id, configs_proc, dets, context_proc,
                          subdir='temp_proc/', overwrite=overwrite)
outputs_proc = save_group(obs_id, configs_proc, dets, context_proc, subdir='temp_proc')
if overwrite or not os.path.exists(outputs_proc['temp_file']):
This logic should go before pipe.run if we want to catch the case where the code exited after writing some of the groups (but not all) to the temp directory.
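The reordering suggested above can be sketched as follows. This is a minimal, hypothetical stand-in (process_group and run_pipeline are illustrative names, not the actual sotodlib API): the temp-file existence check happens before the expensive pipeline call, so groups already written by a partially completed previous run are skipped rather than recomputed.

```python
import os

def process_group(obs_id, temp_file, run_pipeline, overwrite=False):
    # Check for an existing temp file *before* running the pipeline,
    # so a rerun after a partial crash skips already-written groups.
    if (not overwrite) and os.path.exists(temp_file):
        return 'skipped'  # group already on disk; nothing to do
    result = run_pipeline(obs_id)  # stands in for pipe.run(aman)
    # ... the real code would write `result` to temp_file here ...
    return result
```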
    return aman
else:
    return None
In check_cfg_match, if the dependency check fails it just issues a logger.warning() and returns False. Let's raise an exception or print something to stdout in the else statement stating that the dependency check failed.
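A sketch of what this could look like, assuming a simplified check_cfg_match that compares two config dicts (not the actual sotodlib implementation): a strict mode raises an exception on mismatch instead of only warning and returning False.

```python
import logging

logger = logging.getLogger(__name__)

def check_cfg_match(cfg, dep_cfg, strict=True):
    # Hypothetical sketch: when the dependency check fails, raise an
    # exception (strict=True) so the failure is loud, instead of only
    # logging a warning and returning False.
    if cfg == dep_cfg:
        return True
    msg = ('Dependency check failed: stored config does not match '
           'the dependency pipeline config.')
    if strict:
        raise ValueError(msg)
    logger.warning(msg)
    return False
```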
dbix = {'obs:obs_id':obs_id}
for gb, g in zip(group_by, cur_groups[0]):
    dbix[f'dets:{gb}'] = g
print(dbix)
This should probably go to a logger instead of to stdout (if we actually need this, maybe only needed for debugging)
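A minimal sketch of the suggested change (build_db_index is a hypothetical helper name wrapping the snippet above): the print() is routed to a debug-level logger call, so the index only appears when debug logging is enabled.

```python
import logging

logger = logging.getLogger(__name__)

def build_db_index(obs_id, group_by, group):
    # Build the manifest db index for one obsid-dets group, logging it
    # at debug level instead of printing to stdout.
    dbix = {'obs:obs_id': obs_id}
    for gb, g in zip(group_by, group):
        dbix[f'dets:{gb}'] = g
    logger.debug(dbix)  # was: print(dbix)
    return dbix
```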
overwrite: bool
    Optional. Whether or not to overwrite existing entries in the preprocess manifest db.
Not an argument of this function; remove it from the docstring.
@@ -335,35 +337,213 @@ def load_and_preprocess(obs_id, configs, context=None, dets=None, meta=None,
    return aman


def preproc_or_load_group(obs_id, configs, dets, logger=None,
                          context=None, overwrite=False):
def multilayer_load_and_preprocess(obs_id, configs_init, configs_proc,
The context_<> arguments are not used.
an initial and a dependent pipeline stage. If the preprocess database entry for
this obsid-dets group already exists then this function will just load back the
processed tod calling either the ``load_and_preprocess`` or
''multilayer_load_and_preprocess`` functions. If the db entry does not exist of
Wrong type of quotes: it should be `` at the beginning of multilayer_load_and_preprocess. Also, "does not exist of" should read "does not exist or is not of".
if type(configs_init) == str:
    configs_init = yaml.safe_load(open(configs_init, "r"))

if context_init is None:
Same comment as line 484 applies here and at 583.
if db_init_exist and (not overwrite):
    if db_proc_exist:
        logger.info(f"both db and dependent db exist for {obs_id} {dets} loading data and applying preprocessing.")
        aman = multilayer_load_and_preprocess(obs_id=obs_id, dets=dets, configs_init=configs_init,
Wrap this (and line 618) in a try-except similar to lines 624/648 to prevent crashing @chervias's mapmaker script and to write a more useful error message to his error log.
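The suggested try-except guard could look like this. Purely illustrative (safe_load and load_fn are stand-in names, not sotodlib API): the loader runs inside try/except, so one failing group writes a useful message to the error log instead of crashing the calling script.

```python
def safe_load(load_fn, obs_id, dets, logger=None):
    # Run the loader for one obsid-dets group; on failure, return the
    # error string instead of letting the exception propagate and kill
    # the caller (e.g. a mapmaker script looping over many groups).
    try:
        return load_fn(obs_id, dets), None
    except Exception as e:
        error = f'Multilayer load failed for {obs_id} {dets}: {e}'
        if logger is not None:
            logger.error(error)
        return None, error
```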
aman.wrap('tags', tags)
proc_aman, success = pipe.run(aman)
proc_aman, success = pipe_init.run(aman)
aman.wrap('preprocess', proc_aman)
except Exception as e:
    error = f'Failed to load: {obs_id} {dets}'
These error messages should now include a reference to whether we're erroring out in the init or proc section, to aid debugging where in this mess of nested if-else statements we're exiting.
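One way to do this, sketched with a hypothetical helper (run_stage is not an existing sotodlib function): each pipeline call passes its stage name, and any exception is re-raised with that name prefixed so the failing branch is obvious in the error log.

```python
def run_stage(pipe_run, aman, stage, obs_id, dets):
    # Tag pipeline errors with the stage name ('init' or 'proc') so it
    # is clear which branch of the nested if/else structure failed.
    try:
        return pipe_run(aman)
    except Exception as e:
        raise RuntimeError(f"[{stage}] Failed for {obs_id} {dets}: {e}") from e
```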
return success, [obs_id, dets], [obs_id, dets], None

outputs_init = save_group(obs_id, configs_init, dets, context_init, subdir='temp')
if overwrite or not os.path.exists(outputs_init['temp_file']):
This check should probably happen earlier, before getting to this step, as referenced above in L642.
I've added some things to address the existing comments. There are likely still bugs or logic paths that might fail, so I'll keep testing, but I wanted to get the updates out so everyone could take a look. The biggest change is the addition of a new function. It is probably a good idea to do this before running.
I've also wrapped up most commands in try-except blocks.
…yer preprocessing
Addresses issue #1003.