Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging/overriding configs with same format in monai bundle run #5899

Closed
surajpaib opened this issue Jan 26, 2023 · 13 comments · Fixed by #7109
Closed

Merging/overriding configs with same format in monai bundle run #5899

surajpaib opened this issue Jan 26, 2023 · 13 comments · Fixed by #7109

Comments

@surajpaib
Copy link
Contributor

Is your feature request related to a problem? Please describe.
Currently, overriding config can be done through an args_file argument pointing to a json/yaml provided to the run script. However, it only merges configs at the base level, and to override nested structures, the # format must be used.

Example,

monai.bundle.run("", **{
    "args_file": "./config2.json",
    "config_file": "./config1.json", 
    
})
```
config1.json
```
{
    "trainer":
    {
        "max_epochs": 5

    }
}
```
config2.json
```
{
    "trainer#devices": 1
}

This gives the output,

[{'_meta_': {}, 'trainer': {'max_epochs': 5, 'devices': 1}}]

However, if the config provided to args_file is in the same format (shown below) as the config_file, it does not do the merge,

```
config2.json
```
{
    "trainer":
    {
        "devices": 1

    }
}

Describe the solution you'd like
It would be beneficial to have the ability to merge these two configs when they are in the same format to allow for use cases where a default config is set to have good parameter definitions, and the user only needs to define/override specific fields.

Describe alternatives you've considered
I tried specifying two of these configs as a list in this form as well,

monai.bundle.run("", **{
    "config_file": ["./config1.json", "./config2.json"], 
    
})

But it doesn't seem to handle overriding keys, although it merges non-conflicting keys.

My temporary fix
To achieve the functionality I needed, I've changed the following,

def update(self, pairs: dict[str, Any]):

to

    def update(self, pairs: dict[str, Any]):
        """
        Set the ``id`` and the corresponding config content in pairs, see also :py:meth:`__setitem__`.
        For example, ``parser.update({"train#epoch": 100, "train#lr": 0.02})``

        Args:
            pairs: dictionary of `id` and config pairs.

        """
        for k, v in pairs.items():
            if isinstance(v, dict):
                self.update({f'{k}#{_k}': _v for _k, _v in v.items()})
                return 
                
            self[k] = v

Additional context
Ideally, it would be nice to have the ability to merge configs in the config_file list. Is there a way to do this, and am potentially missing something?

@surajpaib surajpaib changed the title Merging/overriding config in monai bundle run Merging/overriding configs with same format in monai bundle run Jan 26, 2023
@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jan 27, 2023

Hi @surajpaib ,

Thanks for the feature request, we already have some recursive overriding for the args:
https://github.com/Project-MONAI/MONAI/blob/dev/monai/bundle/scripts.py#L52
Do you mean to also apply it in the config overriding?

Thanks.

@surajpaib
Copy link
Contributor Author

@Nic-Ma Yes exactly.

Among one use case, we might want to have a base config defining a run for a particular dataset and have a separate config for a new dataset. This separate config could have several changes such as a larger batch size, different patch size, more workers etc. It would be inconvenient to define all of these as args to the command and becomes less readable when using the # syntax to define it in the config

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jan 27, 2023

Hi @surajpaib ,

If I understand correctly, for your use case, you can pass both the "initial config" and the "override config" to the "config_file" arg, it will automatically override it, refer to the example command in this README about how we run multi-gpu training: https://github.com/Project-MONAI/model-zoo/tree/dev/models/spleen_ct_segmentation.
It's also based on the "#" path for nested items, source code is: https://github.com/Project-MONAI/MONAI/blob/dev/monai/bundle/config_parser.py#L406.
The override arg in the "monai.bundle.run" was designed to quickly change config parameter in CLI command at runtime, not to support complicated (and in JSON file) overriding in the config.

Thanks.

@surajpaib
Copy link
Contributor Author

Hi @Nic-Ma,

Yes, having both the configs in config_file is precisely how I want to override it, but it seems like I have to change my config structure to include # format.

Consider the following basic config,

{
    "model1": {
        "network": {
            "_target_": "UNet",
            "spatial_dims": 3,
            "in_channels": 1,
            "out_channels": 2,
            "channels": [
                16,
                32,
                64,
                128,
                256
            ],
            "strides": [
                2,
                2,
                2,
                2             
            ],
            "num_res_units": 2,
            "norm": "batch"
        },
        "loss": {
            "_target_": "DiceCELoss",
            "to_onehot_y": true,
            "softmax": true,
            "squared_pred": true,
            "batch": true
        },
        "optimizer": {
            "_target_": "torch.optim.Adam",
            "params": "[email protected]()",
            "lr": 0.0001
        }
    },
    "model2": {
        "network": {
            "_target_": "UNet",
            "spatial_dims": 3,
            "in_channels": 2,
            "out_channels": 1,
            "channels": [
                16,
                32,
                64,
                128,
                256
            ],
            "strides": [
                2,
                2,
                2,
                2             
            ],
            "num_res_units": 2,
            "norm": "batch"
        },
        "loss": {
            "_target_": "BCELoss",
        },
        "optimizer": {
            "_target_": "torch.optim.SGD",
            "params": "[email protected]()",
            "lr": 0.001
        }
    }
}

Overriding it now would be done something like this,

{
    "model1#network#channels": [
                16,
                32,
                64,
                128,
                512
            ],
    "model2#network#channels":  [
                16,
                32,
                64,
                128,
                512
            ],
    "model1#optimizer": {
        "_target_": "torch.optim.SGD",
        "params": "[email protected]()",
        "lr": 0.01
    } ,
    "model2#optimizer": {
        "_target_": "torch.optim.Adam",
        "params": "[email protected]()",
        "lr": 0.0001
    }
}

I think this makes it more difficult for the user to read through the configs and make changes, especially when the configs get large. model1 and model2 attributes can be defined separately here, one in the first line and one in the last, making less readable. If we allow configs in the standard json structure without the # tags, it would look like this,

{
    "model1": {
        "network": {
            "channels": [
                16,
                32,
                64,
                128,
                512
            ]
        },
        "optimizer": {
            "_target_": "torch.optim.SGD",
            "params": "[email protected]()",
            "lr": 0.001
        }
    },
    "model2": {
        "network": {
            "channels": [
                16,
                32,
                64,
                128,
                512
            ]
        },
        "optimizer": {
            "_target_": "torch.optim.Adam",
            "params": "[email protected]()",
            "lr": 0.0001
        }
    }
}

Very easy to go through and edit.
Another important use-case is I want to copy over an optimizer or training config from another monai bundle configuration and override it. I have to manually change every key to ensure it follows the # structure.

Do you think it makes sense to add this capability to allow for these use-cases? Thanks in advance!

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jan 30, 2023

Hi @surajpaib ,

Thanks for the detailed description of the fearure request.
I have 1 question about your proposal:
For example, in the current design of MONAI bundle config, if we define a model in the config A:

"network": {
    "_target_": "UNet",
    "spatial_dims": 3,
    "in_channels": 1,
    "out_channels": 2,
    "channels": [16, 32, 64, 128, 256],
    "strides": [2, 2, 2, 2],
    "num_res_units": 2,
    "norm": "batch"
}

There can be 2 kinds of cases to override it:
(1) Only override 1 parameter, and use other parameters as the original config, like:

"network#in_channels": 3

(2) Override the whole model definition with a new set of parameters, like:

"network": {
    "_target_": "UNet",
    "spatial_dims": 2,
    "in_channels": 3,
    "out_channels": 1,
    "channels": [16, 32, 64, 128, 256]
}

Then all the "network" config item will be overridden, no need to care about the previous parameters anymore.
So in summary, the "#" path can help to identify the "start position" to override.
How can we distinguish these 2 cases in your proposal?

Thanks in advance.

@surajpaib
Copy link
Contributor Author

Hi @Nic-Ma ,

For cases (1) and (2), the format would be very similar indeed.
So for 1,

"network": {
        "in_channels": 3
}

However, only parameters that are defined should be overridden, both for cases (1) and (2). Having (1) and (2) consistent for the user with the logic that what is defined is overridden makes it easier to understand, in my opinion. Functionally, they both operate very similarly, so I wonder if there is a need to define a different syntax for them?

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Feb 2, 2023

Hi @surajpaib ,

I still feel it's necessary to have "#" path to help define the "start point" of overriding in the nested structure, let me give you another example:
If we have the below transform chain list in the config:

{
  "train": {
      "transforms": [
          {
              "_target_": "LoadImaged",
              "keys": [
                  "image",
                  "label"
              ]
          },
          {
              "_target_": "EnsureChannelFirstd",
              "keys": [
                  "image",
                  "label"
              ]
          },
          {
              "_target_": "Orientationd",
              "keys": [
                  "image",
                  "label"
              ],
              "axcodes": "RAS"
          },
          {
              "_target_": "Spacingd",
              "keys": [
                  "image",
                  "label"
              ],
              "pixdim": [
                  1.5,
                  1.5,
                  2.0
              ],
              "mode": [
                  "bilinear",
                  "nearest"
              ]
          },
          {
              "_target_": "ScaleIntensityRanged",
              "keys": "image",
              "a_min": -57,
              "a_max": 164,
              "b_min": 0,
              "b_max": 1,
              "clip": true
          },
          {
              "_target_": "EnsureTyped",
              "keys": [
                  "image",
                  "label"
              ]
          },
          {
              "_target_": "RandCropByPosNegLabeld",
              "keys": [
                  "image",
                  "label"
              ],
              "label_key": "label",
              "spatial_size": [
                  96,
                  96,
                  96
              ],
              "pos": 1,
              "neg": 1,
              "num_samples": 4,
              "image_key": "image",
              "image_threshold": 0
          }
      ]
   }
}

If I only want to override the pixdim of Spacingd transform from "[1.5, 1.5, 2.0]" to "[1.5, 1.5, 1.5]", in the current design, we only need 1 line: "train#transforms#3#pixdim#2": 1.5
How to achieve it in your proposal?

Thanks in advance.

@surajpaib
Copy link
Contributor Author

Hi @Nic-Ma,

What you pointed out makes perfect sense. There is no need to remove the current behavior, which is very useful for precisely the case you outlined above. But adding support to have it in a standard config format identical to the base format can also be very useful too.

I think the # format is far superior when it comes to overriding a few elements but the standard format becomes useful when overriding a large number of elements. Also very useful when we want to copy elements from different base configs and put together a new config.

Is there a possibility of supporting both?

Thanks!

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Feb 3, 2023

Hi @wyli @ericspod ,

Based on the above discussion, do you think it's necessary to support both current "#" based overriding and this nested overriding requirement? I personally feel it would be better to keep the user experience simple only in 1 way.
Anyway, open for discussion.

Thanks.

@ericspod
Copy link
Member

ericspod commented Feb 6, 2023

Would I summarise the issue as that overriding values in a second config file does work as needed if using the # operator and some nesting where needed, but that for large or deep replacements this can be unwieldy? The solution here is to allow a sparse structure to be defined which is merged with the original, replacing or adding values in places. My issue with that is this sparse structure looks a lot like any other that represents a full definition rather than an aspect of one.

wyli added a commit to wyli/MONAI that referenced this issue Oct 10, 2023
@wyli
Copy link
Contributor

wyli commented Oct 10, 2023

this feature could be easily added, so I'm including it f2ba40c

wyli added a commit that referenced this issue Oct 10, 2023
Fixes #6387
Fixes #5899

### Description
- add api for update_kwargs
- add support of merging multiple configs files and dictionaries
- remove warning message of directory in `runtests.sh`

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: Wenqi Li <[email protected]>
@surajpaib
Copy link
Contributor Author

This is great! Thanks @wyli

@wyli
Copy link
Contributor

wyli commented Oct 11, 2023

for the record it's implemented at the args_file input via monai.bundle.update_kwargs:

given config2.json

{
    "trainer":
    {
        "devices": 1

    }
}

config1.json

{
    "trainer":
    {
        "max_epochs": 5

    }
}
python -m monai.bundle run trainer --args_file="['config1.json','config2.json']" --config_file="config1.json"
2023-10-11 18:04:08,097 - INFO - --- input summary of monai.bundle.scripts.run ---
2023-10-11 18:04:08,097 - INFO - > trainer: {'devices': 1, 'max_epochs': 5}
2023-10-11 18:04:08,097 - INFO - > config_file: 'config1.json'
2023-10-11 18:04:08,097 - INFO - > run_id: 'trainer'
2023-10-11 18:04:08,097 - INFO - ---

the merged config will be trainer: {'devices': 1, 'max_epochs': 5}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants