Auto3DSeg AutoRunner HPO training would save model checkpoint to wrong locations #7585

mingxin-zheng · 2024-03-27T03:20:19Z

Describe the bug

When the NNI HPO process starts, it will first use generate to generate templated algo with the new hyper-parameters and then run_algo with NNI to do algo.train(). However, because the generated algo doesn't update bundle_root, and the model checkpoints path is saved relative to the bundle_root config, the model checkpoints will appear in wrong locations.

To Reproduce

import os
import tempfile

from monai.bundle.config_parser import ConfigParser
from monai.apps import download_and_extract

from monai.apps.auto3dseg import AutoRunner

if __name__ == '__main__':
    directory = "./build"
    root_dir = tempfile.mkdtemp() if directory is None else directory
    print(root_dir)

    msd_task = "Task04_Hippocampus"
    resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/" + msd_task + ".tar"

    compressed_file = os.path.join(root_dir, msd_task + ".tar")


    dataroot = os.path.join(root_dir, msd_task)
    if not os.path.exists(dataroot):
        download_and_extract(resource, compressed_file, root_dir)

    datalist_file = os.path.join("../tutorials/auto3dseg/tasks", "msd", msd_task, "msd_" + msd_task.lower() + "_folds.json")

    input_cfg = {
        "name": msd_task,  # optional, it is only for your own record
        "task": "segmentation",  # optional, it is only for your own record
        "modality": "MRI",  # required
        "datalist": datalist_file,  # required
        "dataroot": dataroot,  # required
    }
    input = "./input.yaml"
    ConfigParser.export_config_file(input_cfg, input)


    runner = AutoRunner(work_dir="./work_dir", algos=('swinunetr', 'segresnet'), input=input, analyze=False, hpo=True, ensemble=False)
    num_epoch = 2
    hpo_params = {
        "maxTrialNumber": 20,
        "maxExperimentDuration": "30m",
        "num_epochs_per_validation": 1,
        "num_images_per_batch": 1,
        "num_epochs": 2,
        "num_warmup_epochs": 1,
        "training#num_epochs": 2,
        "training#num_epochs_per_validation": 1,
        "searching#num_epochs": 2,
        "searching#num_epochs_per_validation": 1,
        "searching#num_warmup_epochs": 1,
        "training#auto_scale_allowed": False,  # new
        "auto_scale_allowed": False,  # new
    }
    search_space = {"learning_rate": {"_type": "choice", "_value": [0.0001, 0.01]}}
    runner.set_num_fold(num_fold=1)
    runner.set_hpo_params(params=hpo_params)
    runner.set_nni_search_space(search_space)
    runner.run()

Expected behavior

Model checkpoint saved correctly.

The text was updated successfully, but these errors were encountered:

Fixes #7585 . ### Description Because the NNI test takes too much time, the previous behavior did not get caught with the dry-run mode of HPO Gen ### Types of changes  - [x] Non-breaking change (fix or new feature that would not break existing functionality). --------- Signed-off-by: Mingxin Zheng <[email protected]> Co-authored-by: YunLiu <[email protected]>

Fixes Project-MONAI#7585 . ### Description Because the NNI test takes too much time, the previous behavior did not get caught with the dry-run mode of HPO Gen ### Types of changes  - [x] Non-breaking change (fix or new feature that would not break existing functionality). --------- Signed-off-by: Mingxin Zheng <[email protected]> Co-authored-by: YunLiu <[email protected]> Signed-off-by: Yu0610 <[email protected]>

mingxin-zheng self-assigned this Mar 27, 2024

mingxin-zheng mentioned this issue Mar 27, 2024

Fix bundle_root for NNIGen #7586

Merged

1 task

KumoLiu closed this as completed in #7586 Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Auto3DSeg AutoRunner HPO training would save model checkpoint to wrong locations #7585

Auto3DSeg AutoRunner HPO training would save model checkpoint to wrong locations #7585

mingxin-zheng commented Mar 27, 2024

Auto3DSeg AutoRunner HPO training would save model checkpoint to wrong locations #7585

Auto3DSeg AutoRunner HPO training would save model checkpoint to wrong locations #7585

Comments

mingxin-zheng commented Mar 27, 2024