Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple fixes on benchmark ensembling problems #6414

Merged
merged 5 commits into from
Apr 22, 2023

Conversation

heyufan1995
Copy link
Member

Fixes # .

Description

Fixed the problem with import_bundle_history with algo trained outside autorunner. If outside autorunner, the algo_object.pkl will not have score meta, and import_bundle_history will not recognize the algo as trained. Changed that to read progress.yaml.

Fixed the OOM problem during ensembling. Move to CPU if OOM. Also do not append prediction tensors to list and return. Save each predictions separately and return the save path.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@wyli
Copy link
Contributor

wyli commented Apr 21, 2023

/build

@wyli
Copy link
Contributor

wyli commented Apr 21, 2023

/integration-test
/black

@mingxin-zheng
Copy link
Contributor

mingxin-zheng commented Apr 22, 2023

Hi @heyufan1995 , the original behavior was (somewhat) looking for "best_metrics" in the first version of design, and then changed in the skip algo train PR:
#6290
Please double check if it won't make conflicts. Thanks!

@wyli
Copy link
Contributor

wyli commented Apr 22, 2023

Fyi the current pr doesn't pass the integration test https://github.com/Project-MONAI/MONAI/actions/runs/4766813105/jobs/8474307100

@heyufan1995
Copy link
Member Author

@mingxin-zheng I checked the current monai dev branch, the 'trained" algo should have AlgoKeys.SCORE value in the algorithm pickle file, or else it's untrained. I added a logic here, if AlgoKeys.SCORE is not in algorithm pickle (which is the case if the training is done outside autorunner), use algo.get_score to read from progress.yaml. If there is progress.yaml with a score, then consider it as trained. So I don't think there is a conflict with skipping algo. But the risk is if an algo is trained for some epoch and had validation score, but the training somehow failed, this algo will still be considered "trained". So I think one way is to write a FINISH flag file directly from algo.train, not by setting a score in the pickle file after training in autorunner._train_algo_in_sequence

@heyufan1995
Copy link
Member Author

@wyli I looked at the test results, it says "

File "/__w/MONAI/MONAI/tests/test_auto3dseg_hpo.py", line 182, in test_get_history
assert len(history) == 1 error

But in this PR I changed this line to "assert len(history) == 3" to avoid this assertion error.

Signed-off-by: monai-bot <[email protected]>
@mingxin-zheng
Copy link
Contributor

mingxin-zheng commented Apr 22, 2023

@mingxin-zheng I checked the current monai dev branch, the 'trained" algo should have AlgoKeys.SCORE value in the algorithm pickle file, or else it's untrained. I added a logic here, if AlgoKeys.SCORE is not in algorithm pickle (which is the case if the training is done outside autorunner), use algo.get_score to read from progress.yaml. If there is progress.yaml with a score, then consider it as trained. So I don't think there is a conflict with skipping algo. But the risk is if an algo is trained for some epoch and had validation score, but the training somehow failed, this algo will still be considered "trained". So I think one way is to write a FINISH flag file directly from algo.train, not by setting a score in the pickle file after training in autorunner._train_algo_in_sequence

Do you suggest writing it to progress.yaml file @heyufan1995 ?

@wyli
Copy link
Contributor

wyli commented Apr 22, 2023

/build

Copy link
Contributor

@wyli wyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wyli wyli merged commit 3d16a6e into Project-MONAI:dev Apr 22, 2023
@mingxin-zheng mingxin-zheng mentioned this pull request Mar 27, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants