Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically parse missing as None in clickify_parameters? #329

Open
landmanbester opened this issue Aug 29, 2024 · 15 comments
Open

Automatically parse missing as None in clickify_parameters? #329

landmanbester opened this issue Aug 29, 2024 · 15 comments

Comments

@landmanbester
Copy link
Collaborator

When defining inputs to be parsed by clickify_parameters such as

inputs:
  foo:
    dtype: int

this will get parsed as an UNSET value. In pfb I get around this like so and I now realise you can also get around this by making parameters optional with no default (eg. Optional[int] above) but should it perhaps be the default behaviour?

@landmanbester
Copy link
Collaborator Author

A related question is how to UNSET a value from the CLI when it has a default but you want it evaluated to None. Is there a way to do that? One use case I have run into is when you want to use the FLAG column by default but ignore it some of the time (eg. spectral line imaging)

@o-smirnov
Copy link
Member

Would flag: Optional[str] = "FLAG", and --flag None not work? Hmm, maybe it'll get parsed to a "None" string. It's a bit of a tricky problem in CLIs in general.

@landmanbester
Copy link
Collaborator Author

Yeah --flag None gets parsed as a string

@o-smirnov
Copy link
Member

o-smirnov commented Aug 29, 2024

Well this issue is really in the domain of CLIs, as far as I'm concerned. So let's think of what makes sense from a CLI point of view. A --no-flag option? With a corresponding policies field of none_negates_option to tell Stimela to use this when the option is a None? That said, I'm not even sure this is something that click can express easily in the first place...

If this is not a common use case, it might be simpler to explicitly check for a "" value for the parameter inside the Python code, and treat "" as no flag column. Then you have a clean way to disable the flag column from both YaML and the CLI, no changes to the Stimela codebase necessary.

@landmanbester
Copy link
Collaborator Author

It is not that common and I have indeed gone with the latter option but that only works for string inputs. I was just wondering is there is a cleaner way to do this eg. something like --parameter None or --parameter UNSET where the string conversion gets handled in clickify_parameters. But this is a bit orthogonal to the current issue.

I think it makes sense for clickify_parameters to evaluate unset parameters to None. Surely that is the intended behaviour for parameters defined without defaults? I also noted that, and this may just be something I am doing wrong, even with the Optional[dtype] definition in my yaml files I still end up with the _UNSET_DEFAULT values getting parsed. Am I doing something wrong here? Here is sample of the defaults that get parsed for a worker if I remove the last few lines handling the conversion

<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
I
<UNSET DEFAULT VALUE>
True
True
<UNSET DEFAULT VALUE>
1
<UNSET DEFAULT VALUE>
True
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
False
<UNSET DEFAULT VALUE>
DATA
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
FLAG
<UNSET DEFAULT VALUE>
-1
32
double
1.0
3.0
<UNSET DEFAULT VALUE>
1
<UNSET DEFAULT VALUE>
True
False

@landmanbester
Copy link
Collaborator Author

I can get this working like so but I am not sure if it will have unintended consequences elsewhere. Also not sure if this makes sense for positional arguments. What do you think @o-smirnov ?

@JSKenyon
Copy link
Collaborator

Random question, but does click let you do something like --parameter1 --parameter2 value where parameter1 should take a value? I think that that currently results in a None in QC.

@landmanbester
Copy link
Collaborator Author

Nope. I get eg. Error: Option '--flag-column' requires an argument. Not sure if it works for positional arguments but not for Options

@o-smirnov
Copy link
Member

o-smirnov commented Aug 30, 2024

In principle, parameters without a specified default are supposed to be caught here: https://github.com/caracal-pipeline/stimela/blob/master/scabha/schema_utils.py#L276. The idea is that the click option is then created without a default keyword, so when a CLI option is omitted and click invokes the Python function, that option is not passed to the function call at all. So I don't understand how "" comes through. Maybe the in should be a string comparison rather?

I think we should just hash out the desired logic (maybe on Tuesday) and write it down clearly so we can test against it. Here I'll just list the subtle aspects of the problem:

  • Although Python code often employs None in the sense of "missing", this is just a convention and nowhere near universal. So I did not want to use None to mark missing parameters within Stimela, as that would create difficulties with tools that do not employ this convention. Hence the special UNSET value (which ultimately results in the unset parameter being removed from the final dict that gets passed to the cab).

  • When calling a Python function, there is a difference between "pass an explicit None value" and "pass nothing and rely on the function definition to provide its own default". When @landmanbester catches everything in the pfb workers via **kw (which I find a bit naughty, as it means the true function signature is obscured), this becomes the difference between putting a key=None in kw, or not putting the key into kw.

  • I attempted to address this via the pass_missing_as_none policy -- I'm open to arguments about whether this policy should be false or true by default, I never had a clear preference.

  • At CLI level, this difference manifests itself in whether @click.option is created with or without a default argument.

  • Likewise for defaults in cab definitions -- I wanted to maintain a clear distinction between default: None and no default at all. Hence the internal UNSET_DEFAULT gymnastics.

  • As a final wrinkle, when we write a cab definition for a Python function, we can potentially define a default value that could be different from the default value defined in the Pythoin signature. This could be used for both good and evil...

  • You'll notice there is also a suppress_cli_default policy. This is meant to address the wrinkle above. What it means is that the cab-definition defaults are in effect when calling the tool from stimela, but when a CLI is constructed, the defaults are not passed in, causing click to rely on the function signature defaults.

@landmanbester
Copy link
Collaborator Author

In principle, parameters without a specified default are supposed to be caught here: https://github.com/caracal-pipeline/stimela/blob/master/scabha/schema_utils.py#L276. The idea is that the click option is then created without a default keyword, so when a CLI option is omitted and click invokes the Python function, that option is not passed to the function call at all. So I don't understand how "" comes through. Maybe the in should be a string comparison rather?

"" doesn't come through. I have to catch it like this

https://github.com/ratt-ru/pfb-imaging/blob/ce9890ea914a7629227f5a3f6d2a4c0696491ad3/pfb/workers/init.py#L189

Did you see what I did here?

if policies.pass_missing_as_none:

Is that what you meant? I now just have this policy in the the config.yaml eg.

https://github.com/ratt-ru/pfb-imaging/blob/ce9890ea914a7629227f5a3f6d2a4c0696491ad3/pfb/parser/init.yaml#L137

When calling a Python function, there is a difference between "pass an explicit None value" and "pass nothing and rely on the function definition to provide its own default". When @landmanbester catches everything in the pfb workers via **kw (which I find a bit naughty, as it means the true function signature is obscured), this becomes the difference between putting a key=None in kw, or not putting the key into kw.

This was an attempt at defining parameters in a single place. I agree that it's not best practice but the alternative is a lot more work and has proven to be more error prone given the number of rapidly evolving workers I am attempting to maintain. There is an obvious downside to this viz. the function doc string is not informative and has to be viewed by running the CLI command with --help. This is an issue I need to address, open to suggestions. But I do want to avoid defining the function signature in multiple places.

@o-smirnov
Copy link
Member

o-smirnov commented Aug 30, 2024

https://github.com/ratt-ru/pfb-imaging/blob/ce9890ea914a7629227f5a3f6d2a4c0696491ad3/pfb/workers/init.py#L189

Did you see what I did here?

Ah yes. I think this is the right thing to do!

This was an attempt at defining parameters in a single place. I agree that it's not best practice but the alternative is a lot more work and has proven to be more error prone given the number of rapidly evolving workers I am attempting to maintain. There is an obvious downside to this viz. the function doc string is not informative and has to be viewed by running the CLI command with --help. This is an issue I need to address, open to suggestions. But I do want to avoid defining the function signature in multiple places.

Yeah I hear you... it's a pain keeping the YaML and the Python signature in sync with this many parameters. Well, I suppose we can just proclaim the YaML cab definition to be part of the function documentation and leave it at that?

@landmanbester
Copy link
Collaborator Author

Yeah I hear you... it's a pain keeping the YaML and the Python signature in sync with this many parameters. Well, I suppose we can just proclaim the YaML cab definition to be part of the function documentation and leave it at that?

Yeah, good idea. Let me see if I can render that to the docstring without breaking something elsewhere. Thanks for the suggestion.

As yes. I think this is the right thing to do!

Actually, I think my logic was flawed there (I missed the not in). I am running into something that confuses me though. In this line

policies = OmegaConf.merge(default_policies, schema.policies)

the default policies will get overwritten by whatever is in schema.policies correct? But at this stage schema.policies refers to the per parameter policies so even if default_policies has pass_missing_as_none=True it will get overwritten by the policy for that parameter, which seems to be coming in as None. Is this the intended behaviour? That would mean that the per parameter policy takes precedence even when it is None. Or am I missing something?

@o-smirnov
Copy link
Member

Good catch, I think this genuinely a bug! The corresponding cab invocation logic has something more sophisticated than a simple merge, see here: https://github.com/caracal-pipeline/stimela/blob/master/stimela/kitchen/cab.py#L168

I guess the solution is to use that same logic here.

@landmanbester
Copy link
Collaborator Author

I had a go in the clickify_missing_as_none branch but no cigar. For some reason parameters with dtype List[URI] do not get parsed properly, will keep digging

@landmanbester
Copy link
Collaborator Author

So the issue was that the pass_missing_as_none policy wasn't propagated to the list and tuple validators. I added a check for this here

def _validate_list(text: str, element_type, schema, sep=",", brackets=True, pass_missing_as_none=False):

and similarly for _validate_tuple. I think this now works as expected but need to test it more thoroughly. I've also merged master and profiling-hacks into this branch as I need them for my experiments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants