
Fixing docker build for 1.0 #72

Merged: sarthakpati merged 44 commits into master from fixing_docker_build_for_1.0 on Oct 13, 2023


@sarthakpati (Member) commented Aug 21, 2023

Fixes issue #(Enter issue number here)

Proposed Changes

  • A fully Docker-compatible build of the FeTS Tool 1.0
  • The Docker recipe tries to replicate the Linux installer script

@sarthakpati sarthakpati marked this pull request as draft August 21, 2023 13:50
@hasan7n (Contributor) commented Aug 24, 2023

@sarthakpati Error when running CLI Segment on sample data:

Traceback (most recent call last):
  File "/Front-End/bin/install/appdir/usr/bin//OpenFederatedLearning//submodules/fets_ai/Algorithms/fets/bin/brainmage_validation_scores_to_disk.py", line 20, in <module>
    from fets.models.pytorch.brainmage.brainmage import BrainMaGeModel
  File "/Front-End/bin/install/appdir/usr/bin/OpenFederatedLearning/venv/lib/python3.7/site-packages/fets/models/pytorch/brainmage/__init__.py", line 1, in <module>
    from .brainmage import BrainMaGeModel
  File "/Front-End/bin/install/appdir/usr/bin/OpenFederatedLearning/venv/lib/python3.7/site-packages/fets/models/pytorch/brainmage/brainmage.py", line 41, in <module>
    from openfl.models.pytorch import PyTorchFLModel
ModuleNotFoundError: No module named 'openfl.models.pytorch'
Using the following directory as logging directory: /data_renamed/logs
Starting subject directory iteration...
= Starting inference for subject: BraTS-GLI-00015-000
== Starting inference using FeTS Singlet Consensus model...
WARNING: The singlet model '52' did not run, please contact [email protected] with this error.
(I removed the rest of the output because it is repetitive.)

In case it helps: inside the container, the following folder exists:

/Front-End/bin/install/appdir/usr/bin/OpenFederatedLearning/openfl/models/pytorch.

However, this directory

/Front-End/bin/install/appdir/usr/bin/OpenFederatedLearning/venv/lib/python3.7/site-packages/openfl/models

does not contain a pytorch folder. It contains a folder named dummy and three files: __init__.py, flmodel.py, and inference_only_model_wrapper.py
I wanted to dig further, but you are probably more familiar with this setup. Thanks!
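To confirm which copy of openfl the venv actually imports (the source tree under OpenFederatedLearning/openfl or the installed copy under site-packages), a check along these lines could be run inside the container with the venv's interpreter. This is a minimal sketch using only the standard library's importlib.util.find_spec; locate_module is a hypothetical helper name, and the interpreter path venv/bin/python is an assumption based on the usual venv layout.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file a module would be imported from, or None if it is not importable.

    For a dotted name such as "openfl.models.pytorch", find_spec imports the
    parent packages first, so a missing parent raises ModuleNotFoundError,
    which is also reported here as None.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        return None
    if spec is None:
        return None
    # For a package, origin points at its __init__.py, which shows whether the
    # copy comes from the venv's site-packages or from the source checkout.
    return spec.origin


if __name__ == "__main__":
    for mod in ("openfl", "openfl.models", "openfl.models.pytorch"):
        print(f"{mod:25} -> {locate_module(mod)}")
```

If openfl.models resolves to site-packages while openfl.models.pytorch comes back as None, the installed package was built without the pytorch subpackage, which matches the directory listing above.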

@hasan7n (Contributor) commented Aug 24, 2023

@sarthakpati By the way, here are the exact commands I use to build the Docker image and run the tool (please let me know if I am missing something):

git clone --branch fixing_docker_build_for_1.0 https://github.com/FeTS-AI/Front-End
cd Front-End
git submodule sync
git -c protocol.version=2 submodule update --init --force --depth=1
docker build -t local/fets-tool:1.0.3 .

Then, inside the container, I am running:
FeTS_CLI_Segment -d /data_renamed -a fets_singlet,fets_triplet -lF STAPLE,ITKVoting,SIMPLE,MajorityVoting -g 0 -t 0

where /data_renamed contains sample data I copied into the container:

root@8d1baf9a64c9:/Front-End# tree /data_renamed/
/data_renamed/
|-- BraTS-GLI-00001-001
|   |-- BraTS-GLI-00001-001_flair.nii.gz
|   |-- BraTS-GLI-00001-001_t1.nii.gz
|   |-- BraTS-GLI-00001-001_t1ce.nii.gz
|   `-- BraTS-GLI-00001-001_t2.nii.gz
`-- BraTS-GLI-00015-000
    |-- BraTS-GLI-00015-000_flair.nii.gz
    |-- BraTS-GLI-00015-000_t1.nii.gz
    |-- BraTS-GLI-00015-000_t1ce.nii.gz
    `-- BraTS-GLI-00015-000_t2.nii.gz

2 directories, 8 files
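Before ruling out the input data, the layout above can be checked mechanically. This is a sketch that assumes each subject folder must contain <subject>_flair/t1/t1ce/t2.nii.gz exactly as in the tree; whether FeTS_CLI_Segment requires precisely this naming is an assumption, and missing_modalities is a hypothetical helper. It skips a logs folder because the run above writes its logs to /data_renamed/logs.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Modality suffixes taken from the tree above; that the CLI requires exactly
# this naming is an assumption of this sketch.
MODALITIES = ("flair", "t1", "t1ce", "t2")


def missing_modalities(data_dir: str, ignore: Iterable[str] = ("logs",)) -> Dict[str, List[str]]:
    """Map each subject folder to the expected <subject>_<modality>.nii.gz files it lacks.

    Folders named in `ignore` are skipped; the CLI writes its logs to
    <data_dir>/logs, which is not a subject.
    """
    skip = set(ignore)
    problems: Dict[str, List[str]] = {}
    for subject in sorted(p for p in Path(data_dir).iterdir() if p.is_dir()):
        if subject.name in skip:
            continue
        expected = [subject / f"{subject.name}_{m}.nii.gz" for m in MODALITIES]
        missing = [p.name for p in expected if not p.is_file()]
        if missing:
            problems[subject.name] = missing
    return problems
```

Calling missing_modalities("/data_renamed") on the data shown above should return an empty dict; any entry points at a subject whose files do not follow the assumed naming.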

@sarthakpati (Member, Author)

/Front-End/bin/install/appdir/usr/bin/OpenFederatedLearning/venv/lib/python3.7/site-packages/openfl/models

The fact that this does not contain the pytorch folder confounds me. As far as I can tell, this command [ref] should have ensured that the PyTorch models in the Algorithms submodule were installed.

Perhaps @brandon-edwards can help?

@brandon-edwards

I would have expected the openfl.pytorch module to be installed here or on the line below. I will stay with this discussion to help where I can, but I am not the one to get this resolved quickly. @msheller, do you have any insight here?

@sarthakpati (Member, Author)

@hasan7n, can you try with the latest commit?

@hasan7n (Contributor) commented Aug 24, 2023

@sarthakpati Thanks, this resolves the problem. Now I am getting this:

Using the following directory as logging directory: /data_renamed/logs
Starting subject directory iteration...
= Starting inference for subject: BraTS-GLI-00015-000
== Starting inference using FeTS Singlet Consensus model...
usage: brainmage_validation_scores_to_disk.py [-h] --data_path DATA_PATH
                                              --plan_path PLAN_PATH
                                              --model_weights_path_wt
                                              MODEL_WEIGHTS_PATH_WT
                                              --model_weights_path_et
                                              MODEL_WEIGHTS_PATH_ET
                                              --model_weights_path_tc
                                              MODEL_WEIGHTS_PATH_TC
                                              --output_pardir OUTPUT_PARDIR
                                              [--model_output_tag MODEL_OUTPUT_TAG]
                                              [--device DEVICE]
                                              [--process_training_data]
brainmage_validation_scores_to_disk.py: error: unrecognized arguments: -md cpu
WARNING: The singlet model '52' did not run, please contact [email protected] with this error.
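The "unrecognized arguments: -md cpu" line is standard argparse behaviour: according to the usage text the script declares --device, so a caller passing -md is rejected before any model code runs. Below is a minimal reproduction, assuming the script uses argparse (the usage/error format suggests so); build_parser is a hypothetical, trimmed-down parser, not the script's real one.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser modeled on the usage text above: only --device is
    # reproduced; the required path arguments are left out.
    parser = argparse.ArgumentParser(prog="brainmage_validation_scores_to_disk.py")
    parser.add_argument("--device", default="cpu")
    return parser


def unrecognized(argv: List[str]) -> List[str]:
    """Return the arguments that match no declared option.

    parse_args() turns a non-empty list here into the
    "error: unrecognized arguments: ..." message and exits with status 2.
    """
    _, extras = build_parser().parse_known_args(argv)
    return extras


print(unrecognized(["-md", "cpu"]))       # ['-md', 'cpu']
print(unrecognized(["--device", "cpu"]))  # []
```

So either the caller has to pass --device, or the script has to declare -md as an alias; the error alone does not say which side changed.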

@sarthakpati (Member, Author)

Thanks. For some reason, the image on my local machine is picking up a different version of the script. Anyway, I pushed a fix.

@hasan7n (Contributor) commented Aug 24, 2023

@sarthakpati The fix will resolve this error, but I am sure a chain of further errors will follow (I applied some hotfixes inside the container, and more errors kept appearing). I suggest we first understand why it works for you; perhaps I am initializing the repository incorrectly?

One thing I want to know: is your local machine picking up a brainmage_validation_scores_to_disk.py that accepts -md, or a FeTS_CLI_Segment that passes --device when calling brainmage_validation_scores_to_disk.py?

If it is the former, could you point me to the commit hash where brainmage_validation_scores_to_disk.py accepts -md? I searched the submodules' git history several times and could not find such a version.

Thanks!
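One way to settle which script each image actually runs is to hash every copy of it in both images and compare the digests. This is a sketch, not part of the FeTS tooling; sha256_of and find_copies are hypothetical helpers, and searching from /Front-End is an assumption based on the paths in the traceback above.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_copies(root: str, filename: str) -> Dict[str, str]:
    """Map every file called `filename` under `root` to its SHA-256 digest."""
    return {
        str(p): sha256_of(p)
        for p in sorted(Path(root).rglob(filename))
        if p.is_file()
    }
```

Running find_copies("/Front-End", "brainmage_validation_scores_to_disk.py") in each image lists every copy (for example, the one under the submodule checkout versus any installed copy) with its digest; a digest that differs between the two images identifies the stale file.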

@sarthakpati (Member, Author)

Hmm, that's what I am trying to figure out right now. I'll get back to you.

sarthakpati and others added 6 commits August 24, 2023 14:20
- remove `-ptd` to load inference_loader instead of trainval_loader
- rename paths for output files as expected by fusion code
@sarthakpati sarthakpati marked this pull request as ready for review October 12, 2023 16:02
@sarthakpati (Member, Author)

Hey @hasan7n, is this operational and ready to merge?

@sarthakpati sarthakpati merged commit 70d94d8 into master Oct 13, 2023
@sarthakpati sarthakpati deleted the fixing_docker_build_for_1.0 branch October 13, 2023 12:35