Richardson Lucy Parallelization V2 #274
base: develop
Conversation
…keleton from RichardsonLucy.py
New files: RichardsonLucyParallel.py and RLparallelscript.py
Potentially modified files: dataIF_COSI_DC2.py, deconvolution_algorithm_base.py, image_deconvolution_data_interface_base.py, image_deconvolution.py, model_base.py
RichardsonLucySimple.py and RichardsonLucy.py were modified to propagate the config file from the user-facing image_deconvolution object to the respective deconvolution algorithms.
…ed to subsequently overwrite remote. Merge remote-tracking branch 'refs/remotes/origin/develop' into develop
Switching to histpy.Histogram()
Can also work with eps-to-Em mapping. Need to generalize
Interpolated scheme in get_point_source_response() tested and works as intended.
Feature/general response
… custom data types
…interface and main script to test the implementation
Create new RLparallelscript.py with MPI porting capabilities. Update dataIFWithParallelSupport.py to cull unnecessary for loops.
Fixed bugs with summed_exposure_map (needs to be summed across processes) and dict_bkg_norm (was only being updated in the MASTER node).
…pports parallel execution with a simple change to DataIF. Next task is to generalize DataIF
…as been removed. Bug fixed.
Currently, the DC2 (existing) and Parallel (new) data interfaces can be used interchangeably for serial code, and they produce the same output. However, the latter must be used for parallel code.
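The summed_exposure_map fix mentioned in the commit messages above (the map must be summed across processes) is a standard MPI reduction. A minimal sketch, assuming an mpi4py-style communicator; the function name `combine_exposure_maps` is illustrative, not the actual DataIF API:

```python
import numpy as np

def combine_exposure_maps(local_exposure_map, comm=None):
    """Sum per-process exposure maps into the global map.

    `comm` is an optional mpi4py communicator; with comm=None the
    function degrades gracefully to serial behaviour. The name and
    signature are illustrative, not the actual DataIF API.
    """
    if comm is None:
        # Serial run: the local map already is the global map.
        return local_exposure_map
    # Element-wise sum across all ranks; every rank receives the
    # result (mpi4py's allreduce defaults to op=MPI.SUM).
    return comm.allreduce(local_exposure_map)
```

In a real run every rank passes its partial map and receives the global sum; any communicator-like object exposing `allreduce` works, which keeps the serial path testable.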
Thanks @avalluvan! I think it's a great improvement with respect to V1. I still need to look at the code in detail, but I read your description and checked the files changed. A few first impressions:
It’s good that you open the PR so we can start the review, but I’d wait for these two limitations to be resolved before merging.
…llel. Three instances (all pertaining to saving results) remain in the RichardsonLucy class.
On point 1, I have updated the code to migrate most parallelization features. I have resolved the merge conflicts in the response handling code. I added a few comments to parts of the imaging code that took me a while to figure out, for easier reading. Do you want me to remove those? The tutorial notebooks were probably modified greatly, and I would not want to commit those changes to the develop branch. Do you think we should wait until the dataIF code is modified for DC3 and handles FullDetectorResponse objects properly? The current pull request adds a feature for parallel execution on top of the existing DC2 imaging code, and I think it could be merged as an iterative update.
Thanks, @avalluvan. I added some comments on your RL codes directly.
By the way, I noticed that you changed some classes which are probably not related to the RL parallelization itself, for example FullDetectorResponse, SpacecraftFile, and PointSourceResponse. I am concerned that reviewing these different issues simultaneously may easily lead to mistakes. So, is it possible to separate them from this PR? Then we can review this PR more easily.
# expected count histograms
self.expectation_list = self.calc_expectation_list(model = self.initial_model, dict_bkg_norm = self.dict_bkg_norm)
logger.info("The expected count histograms were calculated with the initial model map.")
Is it possible to keep these lines? To use the updated model for the likelihood calculation, I wanted to perform the expected count calculation at the post-processing and initialization steps and skip it in the Estep.
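A minimal sketch of the flow described here: the expectation list is computed once at initialization and again at post-processing, so the E-step can simply reuse it. Class and method names mirror the quoted snippet, but the bodies are illustrative, not the actual deconvolution code:

```python
class RLIterationSketch:
    """Illustrative skeleton, not the actual RichardsonLucy class."""

    def initialization(self):
        # Expected counts from the initial model, computed once up front.
        self.expectation_list = self.calc_expectation_list(
            model=self.initial_model, dict_bkg_norm=self.dict_bkg_norm)

    def Estep(self):
        # Nothing to do: reuse the expectation_list computed at
        # initialization or by the previous post-processing step.
        pass

    def post_processing(self):
        # Recompute with the updated model so the likelihood
        # calculation sees the current expectation.
        self.expectation_list = self.calc_expectation_list(
            model=self.model, dict_bkg_norm=self.dict_bkg_norm)
```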
I can undo these changes. Do you plan on moving this to Estep() in the future, or removing Estep() altogether?
@@ -66,16 +70,26 @@ def __init__(self, initial_model, dataset, mask, parameter):
        else:
            os.makedirs(self.save_results_directory)
I understand that RL needs to know if it is performed on the master node and needs this kind of parameter. I would suggest instead preparing two parameters, something like
- self.parallel_computation = True / False
- self.master_node = True / False
I want to prepare a parameter that explicitly tells if the computation is in parallel or not. I will add some suggestions regarding these changes at other lines.
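The two suggested parameters could be derived once in the constructor from an optional communicator. A hedged sketch; the attribute names come from the suggestion above, everything else is illustrative:

```python
class RichardsonLucySketch:
    """Illustrative constructor fragment, not the actual class."""

    def __init__(self, comm=None):
        # comm is an optional mpi4py-style communicator.
        self.comm = comm
        # Explicit flag: is the computation parallel at all?
        self.parallel_computation = comm is not None and comm.Get_size() > 1
        # Explicit flag: is this process the master (rank 0) node?
        self.master_node = comm is None or comm.Get_rank() == 0
```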
One of the ideas we discussed in a previous meeting was to let the program directly infer whether it was being run in serial or parallel mode. In fact, the suggested flag variables were what I used in the initial V2 pull request code. Do you recommend making this modification, i.e., inferring self.parallel_computation in image_deconvolution.py or in RichardsonLucy.py? The issue with inferring this in the image deconvolution class is: what happens when we have multiple input datasets ([dataset1, dataset2, ...])? Each dataset will have its own "sub_comm" object.
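For the multi-dataset concern, one way to infer the mode is to scan the datasets for parallel sub-communicators. A sketch under the assumption that each dataset may carry a `sub_comm` attribute (the attribute name comes from the discussion above; the helper itself is hypothetical, not project code):

```python
def infer_parallel_mode(datasets):
    """True if any dataset carries a sub-communicator spanning more
    than one rank. Hypothetical helper, not actual project code."""
    for dataset in datasets:
        sub_comm = getattr(dataset, "sub_comm", None)
        if sub_comm is not None and sub_comm.Get_size() > 1:
            return True
    return False
```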
I do not understand why this file is showing up in this pull request.
Reviewed all changes. All files except point_source_injector.ipynb are intact.
It looks like the unit tests are failing because
Thanks @avalluvan. I haven't checked all of this yet, but about this:
However, mpi4py is a special case because it needs the backend MPI installed, which I don't think you can do with pip (I used conda). One option is
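Because mpi4py may be absent (or lack an MPI backend) in a pip-only environment, code can guard the import so serial installs still work. A minimal sketch:

```python
def get_world_comm():
    """Return mpi4py's COMM_WORLD, or None when mpi4py (or its MPI
    backend) is unavailable, so serial runs keep working."""
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    return MPI.COMM_WORLD
```

Downstream code then treats `None` as "serial mode" instead of hard-failing at import time.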
Based on feedback that I received on version 1 of RL parallelization, I have incorporated a new setup.
- RichardsonLucy.py instantiates a comm object that handles all MPI communication if an MPI descriptor is passed as an argument during initialization.
- DataInterfaceWithParallelSupport.py also works with the comm object. The dataset returned by this new module works exactly the same way as the DataInterfaceDC2 module; pass it to image_deconvolution through ImageDeconvolution.set_dataset([dataset]). Objects are reconstructed as histpy.Histogram if they exist; multiple instances of object reconstruction were required.
- RLparallelscript.py is run with mpiexec -n <number of processes> python RLparallelscript.py
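Under mpiexec, each rank must know which slice of the work it owns. A generic even-split helper, purely illustrative (not the actual RLparallelscript.py logic):

```python
def local_row_range(n_rows, rank, n_ranks):
    """Half-open row range [start, stop) owned by `rank` when n_rows
    are split as evenly as possible; low ranks absorb the remainder."""
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

For example, 10 rows over 3 ranks split as (0, 4), (4, 7), (7, 10), covering every row exactly once.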