Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imp: Make clap_derive call FromStr::from_str only once per argument. #2206

Closed
wants to merge 1 commit into from

Conversation

sunfishcode
Copy link

Currently the way clap_derive uses a from_str function is to call
it once as a validator, discarding the parsed value, and then to call
it again to recompute the parsed value. This is unfortunate in
cases where from_str is expensive or has side effects.

This PR changes clap_derive to not register from_str as a validator
so that it doesn't do the first of these two calls. Then, instead of
doing unwrap() on the other call, it handles the error. This eliminates
the redundancy, and also avoids the small performance hit mentioned in
the documentation about validators.

This PR doesn't yet use colorized messages for errors generated during
parsing because the ColorWhen setting isn't currently available.
That's fixable with some refactoring, but I'm interested in getting
feedback on the overall approach here first.

Copy link
Member

@pksunkara pksunkara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like you are re-implementing quite a bit of parsing logic in the derive.

IMO the derive should only build the App (with validators), pass it arguments, and structure the result.

But I think, the issue here is happening because we are passing the arguments to the app by calling get_matches_from instead of try_get_matches_from which is what you probably want to focus on for the fix (and you did to certain extent).

@sunfishcode
Copy link
Author

But I think, the issue here is happening because we are passing the arguments to the app by calling get_matches_from instead of try_get_matches_from which is what you probably want to focus on for the fix (and you did to certain extent).

Could you elaborate on this? As I understand it, if the program uses Opt::try_parse, then it does use try_get_matches_from, but the problem is still present. It still ends up calling from_arg_matches which uses unwrap()s on its from_str calls, so it still needs separate from_str calls to do the validation.

@pksunkara pksunkara added this to the 3.0 milestone Nov 9, 2020
@pksunkara
Copy link
Member

But we have a validator that should have already failed before even from_arg_matches need to run.

@sunfishcode
Copy link
Author

I'm hoping to avoid calling from_str twice on the same argument. To achieve that, I think the validator phase needs to not get involved in validating from_str arguments.

@pksunkara
Copy link
Member

I think I understand the context now. But this is not a derive issue. This issue exists even in normal app parsing.

As you can see here, we disregard the validated value. And when we send the ArgMatches to the user, they would need to validate again if they want it in a certain format.

I do see how this can be an issue. But I don't feel comfortable with the approach you are proposing here. As I said, derive should simply be a wrapper.

My gut says we can attack this in derive using parse. Do you think you can experiment with that?

@sunfishcode
Copy link
Author

I think I understand the context now. But this is not a derive issue. This issue exists even in normal app parsing.

As far as I can tell, it does seem to be a derive issue. from_str is only called once per arg in a simple non-derive example:

use std::str::FromStr;
use clap::{App, Arg};

enum Colors {
    Red,
    Green,
    Blue,
}

impl FromStr for Colors {
    type Err = &'static str;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        eprintln!("called Colors::from_str({})", s);
        match s {
            "Red" => Ok(Colors::Red),
            "Green" => Ok(Colors::Green),
            "Blue" => Ok(Colors::Blue),
            _ => Err("no match"),
        }
    }
}

fn main() {
    let m = App::new("myapp")
        .arg(Arg::from("<color> 'The color to use'"))
        .get_matches();

    let t = m.value_of_t("color").unwrap_or_else(|e| e.exit());

    match t {
        Colors::Red => println!("Selected red"),
        Colors::Green => println!("Selected green"),
        Colors::Blue => println!("Selected blue"),
    }
}

As you can see here, we disregard the validated value. And when we send the ArgMatches to the user, they would need to validate again if they want it in a certain format.

For regular non-derive users, this only comes up if a user registers an explicit validator function. And if they do that, then the validator function is called once and the parse function is called once, which unsurprising.

For derive users, the derive effectively registers the from_str function as both the validate function and the parse function, which is why it gets called twice. My patch here removes the validate registration for it, and replaces newly exposed unwrap()s with error handling.

I do see how this can be an issue. But I don't feel comfortable with the approach you are proposing here. As I said, derive should simply be a wrapper.

My gut says we can attack this in derive using parse. Do you think you can experiment with that?

Yes, I'm happy to experiment with different approaches. However, I don't yet see a way to do it differently within clap. Derive is registering from_str as a validate function, so the validator pass calls it and discards the value. Once that happens, we have no choice but to call from_str again.

One option would be to have the validator remember the parsed values, but that looks like it'd be pretty awkward.

Another option would be to avoid registering from_str as a validator, which is what my PR does. This would make the derive case closer to the normal case, because I imagine most normal users don't register validation functions for types that just parse with FromStr.

@pksunkara
Copy link
Member

So, that is what I was saying. Instead of implementing impl FromStr for Effectful in the example, you have to use parse on your field as shown here.

  • impl FromStr is used for both parse and validator.
  • parse is used for parse.

@sunfishcode
Copy link
Author

So, that is what I was saying. Instead of implementing impl FromStr for Effectful in the example, you have to use parse on your field as shown here.

This does work, however the reason I'm interested in this issue is that I'm writing a library, and I'd like to be able to tell my users they can plug the library's types into clap. It's nicer if that "just works" and I don't have to tell them to also remember to write parse(try_from_str = T::from_str).

* `impl FromStr` is used for both `parse` and `validator`.

This is counterintuitive to me, and makes derive code default to being different from typical non-derive code, since as far as I can tell validator functions are uncommon in non-derive code.

I did some more experimenting, and I found way to restructure the patch to move more of the logic out of the derive, so it's now a net reduction in the lines of code in the derive directory. It refactors the ArgMatches functions value_of_t and friends to expose slightly lower-level interfaces, and uses that functionality from the derive code. That makes the derive code simpler, and conceptually closer to typical non-derive code. It also happens to tighten up the type checking for custom parse functions, which caught a type error in clap_derive/tests/custom-string-parsers.rs.

Does this look like a feasible approach?

@pksunkara
Copy link
Member

This is definitely a much better approach than the previous one, but I am still not happy about removing the #validator. It plays a role in argument parsing IIRC and affects the behaviour. It's just not something that's checked after the parsing is done. It is one of the reasons why we allow 2 different ways of defining the parsing of the string.

Do correct me if I am wrong, I haven't touched it since a long time.

@sunfishcode
Copy link
Author

This is definitely a much better approach than the previous one, but I am still not happy about removing the #validator. It plays a role in argument parsing IIRC and affects the behaviour. It's just not something that's checked after the parsing is done. It is one of the reasons why we allow 2 different ways of defining the parsing of the string.

The #validator removal is just it for the TryFromStr and TryFromOsStr cases, which in this patch are redundant because it's the same function as the parse function. Custom user validators are still supported -- I didn't see an existing test for them, so I've now added one to the PR to confirm that they continue to work.

@sunfishcode
Copy link
Author

In fact, if the user specifies a custom validator with clap_derive, it ovewrites the from_str validator, so on master right now if the custom validator accepts things that from_str doesn't, it panics trying to unwrap() the from_str result. With the PR here, it reports the parse error and doesn't panic, which seems nicer. I've now added an assert to the testcase to test this.

@sunfishcode
Copy link
Author

I've now fixed the merge conflicts and rebased this on master. The test failures here also fail for me on master and don't appear to be related to this PR.

@sunfishcode
Copy link
Author

Rebased, merge conflicts fixed, and now all the tests pass!

@pksunkara
Copy link
Member

Hey, want to let you know that we will definitely get this fixed before v3 is out. But unfortunately, I am still hesitant about the current design. I would like to take a crack at this but can only do it after some of other things are solved. Please don't think that I have forgotten this or anything.

@sunfishcode
Copy link
Author

Would you be able to say more about your concern here? With what I know right now, removing the #validator

and doesn't appear to have significant downsides, so it's not clear to me whether the concern here is about the removal of the #validator itself, or the specifics of the current patch here.

I am considering doing the work in #2298 to split out type inference, which will involve some reorganization, and it would help me to understand how you envision this code eventually being organized.

@pksunkara
Copy link
Member

I want to see if we can improve the ergonomics for people building libraries on top of clap using traits for the following stuff:

  • impl FromStr is used for both parse and validator.
  • parse is used for parse.

I would like to personally take a crack at this because my instinct says that we can do it better that way.

Why would doing #2298 need expanding the code twice? Once for parsing and once for validating? Won't both just use the same auto parser?

@sunfishcode
Copy link
Author

Why would doing #2298 need expanding the code twice? Once for parsing and once for validating? Won't both just use the same auto parser?

The autoref specialization technique that #2298 uses doesn't work in generic contexts. That is, we can't put it in a function and call it twice; we have to macro-expand it inline every time we need it.

It's doable, but it's difficult for me to work on without understanding the purpose of organizing the code this way.

@pksunkara
Copy link
Member

Understood. In what cases would that increase compilation time and will there be any peformance hit (shouldn't be, I think)?

@sunfishcode
Copy link
Author

I might be missing something fundamental here.

Why does it make sense to register FromStr::from_str as a validator?

After working on the patch here, and seeing how structopt/clap_derive work on the inside, the most likely explanation that I've come up with is that structopt likely started registering FromStr::from_str as a validator function to work around clap's main API not being flexible enough in how it handled Result values.

The PR here seems to confirm this theory. It makes clap's main API more flexible, with try_from_arg_matches and related things. With these minor API additions, the API is flexible enough to let clap_derive do everything that structopt does, without registering FromStr::from_str as a validator.

On top of that, many other arrows point in this direction. This is how all hand-written clap code that I've seen works. This is how clap's own examples work. This lets users specify their own validator function without overriding clap_derive's implicit validator. This is more efficient. This is easier to explain to people who don't know how clap_derive works -- it just parses all the arguments once and reports any errors it finds. And now, it's easier to implement the "auto" feature.

And I can't find any significant downsides.

So if there's something I'm missing here, it feels like I need to figure that out first.

@sunfishcode sunfishcode force-pushed the from-str-validation branch from b8b112b to da2d630 Compare April 15, 2021 14:40
@sunfishcode sunfishcode force-pushed the from-str-validation branch from da2d630 to 6c6264a Compare May 27, 2021 23:03
Currently the way `clap_derive` uses a `from_str` function is to call
it once as a validator, discarding the parsed value, and then to call
it again to recompute the parsed value. This is unfortunate in
cases where `from_str` is expensive or has side effects.

This PR changes `clap_derive` to not register `from_str` as a validator
so that it doesn't do the first of these two calls. Then, instead of
doing `unwrap()` on the other call, it handles the error. This eliminates
the redundancy, and also avoids the small performance hit mentioned in
[the documentation about validators].

[the documentation about validators]: https://docs.rs/clap-v3/3.0.0-beta.1/clap_v3/struct.Arg.html#method.validator

This PR doesn't yet use colorized messages for errors generated during
parsing because the `ColorWhen` setting isn't currently available.
That's fixable with some refactoring, but I'm interested in getting
feedback on the overall approach here first.
@sunfishcode sunfishcode force-pushed the from-str-validation branch from 6c6264a to db8c060 Compare July 13, 2021 21:50
@epage epage modified the milestones: 3.0, 4.0 Oct 16, 2021
@pksunkara pksunkara removed this from the 4.0 milestone Oct 16, 2021
@sunfishcode sunfishcode deleted the from-str-validation branch October 17, 2021 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants