
[Question] Dynamic parsing without .parse_known_args() #133

Closed
Schinkikami opened this issue Mar 21, 2022 · 8 comments
Labels
question Further information is requested

Comments

@Schinkikami

Hello, I love your library. However, I struggle with your design decision of removing .parse_known_args() as a function from the parser.

My current setup (in simplified pseudocode form) is like this:


import argparse

class A:  # base class omitted for brevity
    @staticmethod
    def add_args(parent_parser):
        parent_parser.add_argument("--sub_op0")
        parent_parser.add_argument("--sub_op1")

    def __init__(self, op0, op1):
        ...

class B:  # base class omitted for brevity
    @staticmethod
    def add_args(parent_parser):
        parent_parser.add_argument("--sub_op0")
        parent_parser.add_argument("--sub_op2")
        parent_parser.add_argument("--sub_op3")

    def __init__(self, op0, op2, op3):
        ...

def main():
    parser = argparse.ArgumentParser()

    parser.add_argument("--op4")
    parser.add_argument("--sub_select", choices=["A", "B"])

    # First pass: parse only the arguments known so far.
    args, _ = parser.parse_known_args()

    # Add class-specific arguments based on the value of --sub_select.
    if args.sub_select == "A":
        A.add_args(parser)
    else:
        B.add_args(parser)

    # Second pass: parse everything, including the dynamically added arguments.
    args = parser.parse_args()
     
    

This could now be called with python bla.py --op4 --sub_select A --sub_op0 --sub_op1.
As the fix with the underscores was really ugly, I was hoping that I could restructure this so that sub_opX could be specified as --sub.opX (with nested dicts after parse_args()).
A call should then look like this:
python bla.py --sub_select A --sub.op0 --sub.op1
or with something like subcommands
python bla.py --sub_select A --op0 --op1

How can I achieve such behavior without having access to .parse_known_args()?
As far as I understand, with subcommands all I could achieve would be something like this:
python bla.py --op4 A --op0 --op1

However, I would like this dynamic parsing to depend on the evaluated value of the --sub_select argument and not on a lone positional argument A, which could sit somewhere in a long list of arguments and be easily overlooked.

My inspiration for this is the PyTorch Lightning Guideline.

Thanks

@mauvilsa
Member

My inspiration for this is the PyTorch Lightning Guideline.

It would be nicer if you took inspiration from LightningCLI rather than from the old way that used plain argparse. In particular, I don't like the def add_args(parent_parser): methods. With jsonargparse the idea is that parsers are easily created from the class signature, not by implementing a method in the respective class to add its arguments.

However, I would like this dynamic parsing to depend on the evaluated value of the --sub_select argument and not on a lone positional argument A, which could sit somewhere in a long list of arguments and be easily overlooked.

This is what subcommands are for, and I don't really get your concern. Subcommands are a common approach for many command line tools, so much so that your CLI's users will likely need less effort to learn it if you follow the subcommand approach than if you do something different. The concern that the subcommand will be overlooked seems unfounded to me; it mostly depends on your design and on the names you give the subcommands. Take for example git, which works like git [global_options] subcommand_name [subcommand_options]. If you see a command such as git --git-dir=../git commit -a -m "My commit message", it is quite clear which part is the subcommand and why --git-dir goes before the subcommand while the other options go after. Being required to write git ... --select commit ... would make the design worse.

If you do use subcommands, it would be convenient to structure your example with jsonargparse as follows:

from jsonargparse import ArgumentParser

class A:
    def __init__(self, op0: int = 1, op1: str = 'op1 default'):
        ...

class B:
    def __init__(self, op0: float = 0.2, op2: int = 3, op3: bool = False):
        ...

def main():
    # One parser per class, populated automatically from the class signature.
    parser_a = ArgumentParser()
    parser_a.add_class_arguments(A)

    parser_b = ArgumentParser()
    parser_b.add_class_arguments(B)

    parser = ArgumentParser()
    parser.add_argument("--gop1")
    subcommands = parser.add_subcommands()
    subcommands.add_subcommand('select_A', parser_a)
    subcommands.add_subcommand('select_B', parser_b)

    config = parser.parse_args()

Doing this, you would run it for example as python cli.py --gop1=x select_B --op2=5 .... Note that by adding type hints to the parameters, the values are validated accordingly. If you additionally add docstrings with good descriptions of the parameters, you get a nice help output for free, e.g. python cli.py --help and python cli.py select_A --help.
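A hedged sketch (not part of the comment above) of how the end of main() might then instantiate the selected class; it assumes add_subcommands()'s default dest of "subcommand" and the as_dict() method of jsonargparse's Namespace:

    # Map subcommand names to the classes they configure.
    classes = {'select_A': A, 'select_B': B}
    # The sub-namespace for the chosen subcommand holds its options.
    sub_config = getattr(config, config.subcommand)
    instance = classes[config.subcommand](**sub_config.as_dict())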

@mauvilsa added the question label Mar 21, 2022
@Schinkikami
Author

Thank you for your very extensive reply! You are completely right about the add_args() methods; I also planned to remove them.

In my case, the classes are two nn.Modules (let's call them ModelA and ModelB). I would like to select the correct model with --model ModelA or --model ModelB, as I think this makes the calls clearer than just writing "ModelA" as a lone positional. In your example with the git commands you are definitely right, but in my case I would prefer to select the correct model dynamically with --model ModelA and then be able to further configure the model with --model.layers X --model.ensemble_size Y.

Looking through LightningCLI, it seems possible there. So is my best bet to switch directly to LightningCLI, or is this also possible in pure jsonargparse?

Thanks

@mauvilsa
Member

Without subcommands, what you want is not possible. If which arguments are available depends on the value another argument has, then cli.py --help cannot show all the possible arguments. That is why the way to do this is different.

If you are using pytorch-lightning, it is certainly better to use LightningCLI. There the selection of models is not done through subcommands. Achieving this with jsonargparse is a bit more complex, but with LightningCLI it is already implemented, so it will be less effort for you.
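For reference, a minimal sketch of one way to do this in pure jsonargparse, using add_subclass_arguments (the module path mymodels and the class name are hypothetical):

from jsonargparse import ArgumentParser
from torch import nn

parser = ArgumentParser()
# Accept any nn.Module subclass under the "model" key.
parser.add_subclass_arguments(nn.Module, "model")
config = parser.parse_args()
# e.g. python cli.py --model=mymodels.ModelA --model.init_args.layers=4
init = parser.instantiate_classes(config)
model = init.model  # the instantiated model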

@psirenny
Contributor

psirenny commented Jul 23, 2022

I know this issue is closed, but I have a related question. I'm creating a CLI application for a reinforcement learning model that accepts an arbitrary list of policies/models. I'd like the script to be compatible with AWS SageMaker hyperparameter tuning.

I considered a script that accepted arguments like this:

train.py \
  --policy_ids='["red", "blue"]' \
  --policies.hidden_layer_size='{"red": 128, "blue": 512}'  # tunable parameter

train.py \
  --policy_ids='["alien", "predator", "marine"]' \
  --policies.hidden_layer_size='{"alien": 128, "predator": 512, "marine": 256}'  # tunable parameter

But the above doesn't work because SageMaker tunable parameters can't output JSON. It can only output integers, floats, and categorical strings. Given that constraint, I considered a script like this instead:

train.py \
  --policy_ids='["red", "blue"]' \
  --red_policy.hidden_layer_size=128  \
  --blue_policy.hidden_layer_size=512

train.py \
  --policy_ids='["alien", "predator", "marine"]' \
  --alien_policy.hidden_layer_size=128 \
  --predator_policy.hidden_layer_size=512 \
  --marine_policy.hidden_layer_size=256

But this requires an initial parser.parse_known_args() to add the arguments dynamically. Finally, I considered creating 10 static policy arguments like this:

train.py \
  --policy_ids='["red", "blue"]' \
  --policy_0.hidden_layer_size=128 \
  --policy_1.hidden_layer_size=512

train.py \
  --policy_ids='["alien", "predator", "marine"]' \
  --policy_0.hidden_layer_size=128 \
  --policy_1.hidden_layer_size=512 \
  --policy_2.hidden_layer_size=256

But this is suboptimal. It requires allocating a fixed number of arguments. It also pollutes the help text and requires a lot of boilerplate to convert the hardcoded arguments into a list.

I was wondering if you have any suggestions (other than abandoning SageMaker 😆).

@mauvilsa
Member

@psirenny I haven't used AWS SageMaker hyperparameter tuning. Can you please explain a bit how it works? Do you give it a template of the command it should run?

@psirenny
Contributor

It takes configuration for a tuning job like this:

sagemaker_client.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName = "red_blue_tuning_job",
    HyperParameterTuningJobConfig = {
        "HyperParameterTuningJobObjective": {
            "MetricName": "episode_reward_mean",
            "Type": "Maximize"
        },
        "ParameterRanges": {
            "IntegerParameterRanges": [
                "Name": "red_policy.hidden_layer_size",
                "MinValue": 32,
                "MaxValue": 2048,
            ],
            "IntegerParameterRanges": [
                "Name": "blue_policy.hidden_layer_size",
                "MinValue": 32,
                "MaxValue": 2048,
            ],
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 100,
            "MaxParallelTrainingJobs": 5,
        },
        "Strategy": "Bayesian",
    },
    TrainingJobDefinition = {
        "AlgorithmSpecification": {
            "TrainingImage": docker_image,
            "TrainingInputMode": "File"
        },
        "ResourceConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.c4.2xlarge",
        },
        "StaticHyperParameters": {
            "batch_size": 2000,
            "learning_rate": 0.00005,
            "policy_ids": "'[\"red\", \"blue\"]'",
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": 60 * 60
        },
        # …
    },
)

and spawns multiple SageMaker training jobs from that configuration. Each training job invokes a dockerized CLI program with arguments serialized from the above parameters. Kind of like this:

docker run <docker_image> train \
  --batch_size=2000 \
  --learning_rate=0.00005 \
  --policy_ids='["red", "blue"]' \
  --red_policy.hidden_layer_size=49 \
  --blue_policy.hidden_layer_size=63  # the two sizes are the parameters being tuned

I suppose the real issue is in the way that SageMaker training jobs serialize parameters to CLI arguments.

As I type this, it occurs to me that SageMaker also writes parameters to a JSON file. That JSON file is available to the docker container. Perhaps the best way forward is to create a CLI that loads the JSON file instead.
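A minimal sketch of that idea, assuming jsonargparse's ActionConfigFile and SageMaker's documented config location (the argument set here is illustrative; note that SageMaker writes all hyperparameter values as strings, so the declared types need to tolerate that):

from jsonargparse import ActionConfigFile, ArgumentParser

parser = ArgumentParser()
# --config accepts a path to a JSON/YAML config file.
parser.add_argument("--config", action=ActionConfigFile)
parser.add_argument("--batch_size", type=int)
parser.add_argument("--learning_rate", type=float)
config = parser.parse_args(
    ["--config", "/opt/ml/input/config/hyperparameters.json"]
)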

@mauvilsa
Member

Several features have been added to allow configuring complex type hints with individual command line arguments, for example List append and Class type and sub-classes#command-line. There is a similar pending feature: adding individual items to an argument with type dict or Dict[<key-type>, <value-type>]. This would resolve your issue by letting you run the command like:

train.py \
  --policy_ids='["red", "blue"]' \
  --policy_hidden_layer_size.red=128 \
  --policy_hidden_layer_size.blue=512
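The parser side for that command could look like this minimal sketch, assuming the pending dict-items feature described above:

from typing import Dict, List
from jsonargparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--policy_ids", type=List[str])
parser.add_argument("--policy_hidden_layer_size", type=Dict[str, int])
config = parser.parse_args()
# --policy_hidden_layer_size.red=128 would then set a single dict item.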

@mauvilsa
Member

I have implemented (commit 2397f90) support for individual dict items from the command line; see docs-latest#dict-items. This is not yet released, but you can try it out if you install from source.
