Multiple fixes on benchmark ensembling problems #6414

heyufan1995 · 2023-04-21T14:33:30Z

Fixes # .

Description

Fixed import_bundle_history for algos trained outside autorunner. When training is done outside autorunner, algo_object.pkl has no score metadata, so import_bundle_history does not recognize the algo as trained. It now falls back to reading progress.yaml.
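The fallback described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SCORE_KEY`, `is_trained`, and the `read_progress_score` callable are hypothetical names standing in for AlgoKeys.SCORE, the check inside import_bundle_history, and whatever reads progress.yaml.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical stand-in for AlgoKeys.SCORE; the real key lives in MONAI.
SCORE_KEY = "score"


def is_trained(
    algo_meta: Dict[str, Any],
    read_progress_score: Callable[[], Optional[float]],
) -> bool:
    """Decide whether an algo counts as trained.

    The pickle metadata is checked first. If it carries no score, the
    caller-supplied reader (assumed to parse progress.yaml) is consulted.
    """
    if SCORE_KEY in algo_meta:
        return True
    try:
        score = read_progress_score()
    except (FileNotFoundError, KeyError, IndexError):
        # No progress file, or a file without a usable score entry.
        return False
    return score is not None
```

Passing the reader as a callable keeps the sketch independent of the progress.yaml layout, which is not described in this thread.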

Fixed the out-of-memory (OOM) problem during ensembling: computation moves to the CPU if an OOM occurs. In addition, prediction tensors are no longer appended to a list and returned; each prediction is saved separately and its save path is returned.
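A minimal sketch of the two ideas above (retry on the CPU after an OOM, and return save paths instead of accumulated predictions). It is not the PR's implementation: `run_with_cpu_fallback`, `ensemble_to_disk`, the `fn_gpu`/`fn_cpu` callables, and the pickle file naming are all hypothetical. It assumes the OOM surfaces as a RuntimeError whose message contains "out of memory", which is how PyTorch typically reports CUDA OOM.

```python
import os
import pickle
from typing import Any, Callable, Iterable, List


def run_with_cpu_fallback(
    fn_gpu: Callable[[Any], Any], fn_cpu: Callable[[Any], Any], item: Any
) -> Any:
    """Try the GPU path first; retry on the CPU only for out-of-memory errors."""
    try:
        return fn_gpu(item)
    except RuntimeError as err:
        # Assumption: OOM is a RuntimeError mentioning "out of memory".
        if "out of memory" not in str(err).lower():
            raise
        return fn_cpu(item)


def ensemble_to_disk(
    items: Iterable[Any],
    fn_gpu: Callable[[Any], Any],
    fn_cpu: Callable[[Any], Any],
    out_dir: str,
) -> List[str]:
    """Save each prediction to its own file and return the list of paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    for i, item in enumerate(items):
        pred = run_with_cpu_fallback(fn_gpu, fn_cpu, item)
        path = os.path.join(out_dir, f"pred_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(pred, f)
        # Only the path is kept, so memory use does not grow with the dataset.
        paths.append(path)
    return paths
```

Returning paths keeps at most one prediction in memory at a time, at the cost of callers having to load results from disk.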

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
In-line docstrings updated.
Documentation updated, tested make html command in the docs/ folder.

wyli · 2023-04-21T16:16:30Z

/build

wyli · 2023-04-21T16:16:56Z

/integration-test
/black

Signed-off-by: heyufan1995 <[email protected]>

mingxin-zheng · 2023-04-22T06:16:05Z

Hi @heyufan1995, in the first version of the design the original behavior was (somewhat) to look for "best_metrics"; that was later changed in the skip algo train PR:
#6290
Please double-check that this change does not conflict with it. Thanks!

wyli · 2023-04-22T06:49:09Z

FYI, the current PR doesn't pass the integration test: https://github.com/Project-MONAI/MONAI/actions/runs/4766813105/jobs/8474307100

heyufan1995 · 2023-04-22T12:32:33Z

@mingxin-zheng I checked the current MONAI dev branch: a "trained" algo should have an AlgoKeys.SCORE value in the algorithm pickle file; otherwise it is treated as untrained. I added logic here so that if AlgoKeys.SCORE is not in the algorithm pickle (which is the case when training is done outside autorunner), algo.get_score is used to read the score from progress.yaml. If progress.yaml exists and contains a score, the algo is considered trained. So I don't think there is a conflict with skipping algos. The risk is that an algo that trained for some epochs and recorded a validation score, but whose training then failed, will still be considered "trained". One way to address this is to write a FINISH flag file directly from algo.train, rather than setting a score in the pickle file after training in autorunner._train_algo_in_sequence.
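The FINISH flag idea could look something like the sketch below. It only illustrates the proposal and is not part of this PR: the flag file name and the helpers `train_and_mark` and `is_finished` are hypothetical, and `train_fn` stands in for algo.train.

```python
from pathlib import Path
from typing import Callable

# Hypothetical flag file name; the thread does not fix one.
FINISH_FLAG = "FINISH"


def train_and_mark(train_fn: Callable[[], None], output_dir: str) -> None:
    """Run training, then write the flag file only if training returned normally."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    train_fn()  # if this raises, the flag below is never written
    (out / FINISH_FLAG).touch()


def is_finished(output_dir: str) -> bool:
    """An algo counts as trained only when its flag file exists."""
    return (Path(output_dir) / FINISH_FLAG).is_file()
```

With this approach a run that recorded validation scores but crashed midway leaves no flag, so it would not be mistaken for a completed training.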

heyufan1995 · 2023-04-22T12:40:22Z

@wyli I looked at the test results. They report:

File "/__w/MONAI/MONAI/tests/test_auto3dseg_hpo.py", line 182, in test_get_history
assert len(history) == 1

However, in this PR I changed this line to "assert len(history) == 3" to avoid this assertion error.

Signed-off-by: monai-bot <[email protected]>

mingxin-zheng · 2023-04-22T13:45:17Z

@heyufan1995 Regarding the FINISH flag you proposed above: do you suggest writing it to the progress.yaml file?

wyli · 2023-04-22T14:30:27Z

/build

wyli

Integration test: https://github.com/Project-MONAI/MONAI/actions/runs/4772828611

heyufan1995 added this to the Auto3DSeg enhancement [P0 v1.2] milestone Apr 21, 2023

heyufan1995 requested review from wyli and mingxin-zheng April 21, 2023 14:33

wyli approved these changes Apr 21, 2023

heyufan1995 added 4 commits April 21, 2023 23:09

Ensemble issue fix

efc48a6

Signed-off-by: heyufan1995 <[email protected]>

Change Algo ensemble call to return save path

587d0db

Signed-off-by: heyufan1995 <[email protected]>

Add even_divisible to partition dataset to avoid hang

fff5294

Signed-off-by: heyufan1995 <[email protected]>

Change unit test cases

b643c82

Signed-off-by: heyufan1995 <[email protected]>

heyufan1995 force-pushed the local-dev branch from d40fb30 to b643c82 Compare April 22, 2023 03:10

[MONAI] code formatting

fe135b5

Signed-off-by: monai-bot <[email protected]>

wyli approved these changes Apr 22, 2023

wyli merged commit 3d16a6e into Project-MONAI:dev Apr 22, 2023

mingxin-zheng mentioned this pull request Mar 27, 2024

Fix bundle_root for NNIGen #7586

Merged

1 task
