
[Question] Dynamic parsing without .parse_known_args() #133

Closed
Schinkikami opened this issue Mar 21, 2022 · 8 comments
Labels
question Further information is requested

Comments

@Schinkikami

Hello, I love your library. However, I struggle with your design decision of removing .parse_known_args() as a function from the parser.

My current setup (in simplified pseudocode form) is like this:


import argparse

class A:  # base class omitted for brevity
    @staticmethod
    def add_args(parent_parser):
        parent_parser.add_argument("--sub_op0")
        parent_parser.add_argument("--sub_op1")

    def __init__(self, op0, op1):
        ...

class B:  # base class omitted for brevity
    @staticmethod
    def add_args(parent_parser):
        parent_parser.add_argument("--sub_op0")
        parent_parser.add_argument("--sub_op2")
        parent_parser.add_argument("--sub_op3")

    def __init__(self, op0, op2, op3):
        ...

def main():
    parser = argparse.ArgumentParser()

    parser.add_argument("--op4")
    parser.add_argument("--sub_select", choices=["A", "B"])

    # First pass: parse only the arguments known so far.
    args, _ = parser.parse_known_args()

    # Add class-specific arguments based on the value of --sub_select.
    if args.sub_select == "A":
        A.add_args(parser)
    else:
        B.add_args(parser)

    # Second pass: parse everything, including the dynamically added arguments.
    args = parser.parse_args()
     
    

This could now be called with python bla.py --op4 --sub_select A --sub_op0 --sub_op1.
As the fix with the underscores was really ugly, I was hoping that I could restructure this so that sub_opX could be specified as --sub.opX (with nested dicts after parse_args()).
A call should then look like this:
python bla.py --sub_select A --sub.op0 --sub.op1
or with something like subcommands
python bla.py --sub_select A --op0 --op1

How can I achieve such behavior without having access to .parse_known_args()?
As far as I understand, with subcommands all I could achieve would be something like this:
python bla.py --op4 A --op0 --op1

However, I would like this dynamic parsing to depend on the evaluated value of the --sub_select argument and not on a lone positional argument A, which could sit somewhere in a long list of arguments and be easily overlooked.

My inspiration for this is the PyTorch Lightning Guideline.

Thanks

@mauvilsa
Member

My inspiration for this is the PyTorch Lightning Guideline.

It would be nicer if you took inspiration from LightningCLI rather than from the old way that used plain argparse. In particular, I don't like the def add_args(parent_parser): methods. With jsonargparse the idea is that parsers are easily created from the class signature, not by implementing a method in the respective class to add its arguments.

However, I would like this dynamic parsing to depend on the evaluated value of the --sub_select argument and not on a lone positional argument A, which could sit somewhere in a long list of arguments and be easily overlooked.

This is what subcommands are for, and I don't really get your concern. Subcommands are a common approach for many command line tools, so much so that your CLI's users will likely need less effort to learn it if you follow the subcommand approach than if you do something different. The concern that the subcommand will be overlooked seems unfounded to me; it mostly depends on your design and on the names you give the subcommands. Take for example git, which works like git [global_options] subcommand_name [subcommand_options]. If you see a command such as git --git-dir=../git commit -a -m "My commit message", it is quite clear which part is the subcommand and why --git-dir goes before the subcommand while the other options go after. Being required to write git ... --select commit ... would make the design worse.

If you do use subcommands, it would be convenient to structure your example with jsonargparse as follows:

from jsonargparse import ArgumentParser

class A:
    def __init__(self, op0: int = 1, op1: str = 'op1 default'):
        ...

class B:
    def __init__(self, op0: float = 0.2, op2: int = 3, op3: bool = False):
        ...

def main():
    # One parser per class, populated automatically from the class signature.
    parser_a = ArgumentParser()
    parser_a.add_class_arguments(A)

    parser_b = ArgumentParser()
    parser_b.add_class_arguments(B)

    parser = ArgumentParser()
    parser.add_argument("--gop1")
    subcommands = parser.add_subcommands()
    subcommands.add_subcommand('select_A', parser_a)
    subcommands.add_subcommand('select_B', parser_b)

    config = parser.parse_args()

Doing this, you would run it for example as python cli.py --gop1=x select_B --op2=5 .... Note that by adding type hints to the parameters, the values are validated accordingly. If you additionally add docstrings with good descriptions of the parameters, you get a nice help output for free, e.g. python cli.py --help and python cli.py select_A --help.
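A hedged sketch (not part of the comment above) of how the end of main() might then instantiate the selected class; it assumes add_subcommands()'s default dest of "subcommand" and the as_dict() method of jsonargparse's Namespace:

    # Map subcommand names to the classes they configure.
    classes = {'select_A': A, 'select_B': B}
    # The sub-namespace for the chosen subcommand holds its options.
    sub_config = getattr(config, config.subcommand)
    instance = classes[config.subcommand](**sub_config.as_dict())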

@mauvilsa added the question label Mar 21, 2022
@Schinkikami
Author

Thank you for your very extensive reply! You are completely right about the add_args() methods; I also planned to remove them.

In my case, the classes are two nn.Modules (let's call them ModelA and ModelB). I would like to select the correct model with --model ModelA or --model ModelB, as I think this makes the calls clearer than just writing "ModelA" as a lone positional. In your example with the git commands you are definitely right, but in my case I would prefer to select the correct model dynamically with --model ModelA and then be able to further configure the model with --model.layers X --model.ensemble_size Y.

Looking through LightningCLI, it seems possible there. So is my best bet to switch directly to LightningCLI, or is this also possible in pure jsonargparse?

Thanks

@mauvilsa
Member

Without subcommands, what you want is not possible. If which arguments are available depends on the value another argument has, then cli.py --help cannot show all the possible arguments. That is why the way to do this is different.

If you are using pytorch-lightning, it is certainly better to use LightningCLI. There the selection of models is not done through subcommands. Achieving this with jsonargparse is a bit more complex, but with LightningCLI it is already implemented, so it will be less effort for you.
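For reference, a minimal sketch of one way to do this in pure jsonargparse, using add_subclass_arguments (the module path mymodels and the class name are hypothetical):

from jsonargparse import ArgumentParser
from torch import nn

parser = ArgumentParser()
# Accept any nn.Module subclass under the "model" key.
parser.add_subclass_arguments(nn.Module, "model")
config = parser.parse_args()
# e.g. python cli.py --model=mymodels.ModelA --model.init_args.layers=4
init = parser.instantiate_classes(config)
model = init.model  # the instantiated model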

@psirenny
Contributor

psirenny commented Jul 23, 2022

I know this issue is closed, but I have a related question. I'm creating a CLI application for a reinforcement learning model that accepts an arbitrary list of policies/models. I'd like the script to be compatible with AWS SageMaker hyperparameter tuning.

I considered a script that accepted arguments like this:

train.py \
  --policy_ids='["red", "blue"]' \
  --policies.hidden_layer_size='{"red": 128, "blue": 512}'  # tunable parameter

train.py \
  --policy_ids='["alien", "predator", "marine"]' \
  --policies.hidden_layer_size='{"alien": 128, "predator": 512, "marine": 256}'  # tunable parameter

But the above doesn't work because SageMaker tunable parameters can't output JSON. It can only output integers, floats, and categorical strings. Given that constraint, I considered a script like this instead:

train.py \
  --policy_ids='["red", "blue"]' \
  --red_policy.hidden_layer_size=128  \
  --blue_policy.hidden_layer_size=512

train.py \
  --policy_ids='["alien", "predator", "marine"]' \
  --alien_policy.hidden_layer_size=128 \
  --predator_policy.hidden_layer_size=512 \
  --marine_policy.hidden_layer_size=256

But this requires an initial parser.parse_known_args() to add the arguments dynamically. Finally, I considered creating 10 static policy arguments like this:

train.py \
  --policy_ids='["red", "blue"]' \
  --policy_0.hidden_layer_size=128 \
  --policy_1.hidden_layer_size=512

train.py \
  --policy_ids='["alien", "predator", "marine"]' \
  --policy_0.hidden_layer_size=128 \
  --policy_1.hidden_layer_size=512 \
  --policy_2.hidden_layer_size=256

But this is suboptimal. It requires allocating a fixed number of arguments. It also pollutes the help text and requires a lot of boilerplate to convert the hardcoded arguments into a list.

I was wondering if you have any suggestions (other than abandoning SageMaker 😆).

@mauvilsa
Member

@psirenny I haven't used AWS SageMaker hyperparameter tuning. Can you please explain a bit how it works? Do you give it a template of the command it should run?

@psirenny
Contributor

It takes configuration for a tuning job like this:

sagemaker_client.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName = "red_blue_tuning_job",
    HyperParameterTuningJobConfig = {
        "HyperParameterTuningJobObjective": {
            "MetricName": "episode_reward_mean",
            "Type": "Maximize"
        },
        "ParameterRanges": {
            "IntegerParameterRanges": [
                "Name": "red_policy.hidden_layer_size",
                "MinValue": 32,
                "MaxValue": 2048,
            ],
            "IntegerParameterRanges": [
                "Name": "blue_policy.hidden_layer_size",
                "MinValue": 32,
                "MaxValue": 2048,
            ],
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 100,
            "MaxParallelTrainingJobs": 5,
        },
        "Strategy": "Bayesian",
    },
    TrainingJobDefinition = {
        "AlgorithmSpecification": {
            "TrainingImage": docker_image,
            "TrainingInputMode": "File"
        },
        "ResourceConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.c4.2xlarge",
        },
        "StaticHyperParameters": {
            "batch_size": 2000,
            "learning_rate": 0.00005,
            "policy_ids": "'[\"red\", \"blue\"]'",
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": 60 * 60
        },
        # …
    },
)

and spawns multiple SageMaker training jobs from that configuration. Each training job invokes a dockerized CLI program with arguments serialized from the above parameters. Kind of like this:

docker run <docker_image> train \
  --batch_size=2000 \
  --learning_rate=0.00005 \
  --policy_ids='["red", "blue"]' \
  --red_policy.hidden_layer_size=49 \
  --blue_policy.hidden_layer_size=63  # the two sizes are the parameters being tuned

I suppose the real issue is in the way that SageMaker training jobs serialize parameters to CLI arguments.

As I type this, it occurs to me that SageMaker also writes parameters to a JSON file. That JSON file is available to the docker container. Perhaps the best way forward is to create a CLI that loads the JSON file instead.
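A minimal sketch of that idea, assuming jsonargparse's ActionConfigFile and SageMaker's documented config location (the argument set here is illustrative; note that SageMaker writes all hyperparameter values as strings, so the declared types need to tolerate that):

from jsonargparse import ActionConfigFile, ArgumentParser

parser = ArgumentParser()
# --config accepts a path to a JSON/YAML config file.
parser.add_argument("--config", action=ActionConfigFile)
parser.add_argument("--batch_size", type=int)
parser.add_argument("--learning_rate", type=float)
config = parser.parse_args(
    ["--config", "/opt/ml/input/config/hyperparameters.json"]
)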

@mauvilsa
Member

Several features have been added to allow configuring complex type hints with individual command line arguments, for example List append and Class type and sub-classes#command-line. There is a similar pending feature: adding individual items to an argument with type dict or Dict[<key-type>, <value-type>]. This would resolve your issue by letting you run the command like:

train.py \
  --policy_ids='["red", "blue"]' \
  --policy_hidden_layer_size.red=128 \
  --policy_hidden_layer_size.blue=512
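The parser side for that command could look like this minimal sketch, assuming the pending dict-items feature described above:

from typing import Dict, List
from jsonargparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--policy_ids", type=List[str])
parser.add_argument("--policy_hidden_layer_size", type=Dict[str, int])
config = parser.parse_args()
# --policy_hidden_layer_size.red=128 would then set a single dict item.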

@mauvilsa
Member

I have implemented (commit 2397f90) support for individual dict items from the command line; see docs-latest#dict-items. This is not yet released, but you can try it out if you install from source.
