imp: Make `clap_derive` call `FromStr::from_str` only once per argument. #2206

sunfishcode · 2020-11-09T14:53:07Z

Currently the way clap_derive uses a from_str function is to call
it once as a validator, discarding the parsed value, and then to call
it again to recompute the parsed value. This is unfortunate in
cases where from_str is expensive or has side effects.

This PR changes clap_derive to not register from_str as a validator
so that it doesn't do the first of these two calls. Then, instead of
doing unwrap() on the other call, it handles the error. This eliminates
the redundancy, and also avoids the small performance hit mentioned in
the documentation about validators
(https://docs.rs/clap-v3/3.0.0-beta.1/clap_v3/struct.Arg.html#method.validator).

This PR doesn't yet use colorized messages for errors generated during
parsing because the ColorWhen setting isn't currently available.
That's fixable with some refactoring, but I'm interested in getting
feedback on the overall approach here first.
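
A minimal, std-only sketch of the double call described above. It does not use clap itself; `Effectful`, `validate_then_parse`, and `parse_once` are hypothetical names that only model the two flows, with a thread-local counter standing in for an expensive or side-effecting `from_str`.

```rust
use std::cell::Cell;
use std::str::FromStr;

thread_local! {
    // Counts how many times `Effectful::from_str` has run on this thread.
    static CALLS: Cell<u32> = Cell::new(0);
}

// Hypothetical argument type whose parser has an observable side effect.
#[derive(Debug, PartialEq)]
struct Effectful(u32);

impl FromStr for Effectful {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        CALLS.with(|c| c.set(c.get() + 1));
        s.parse::<u32>().map(Effectful).map_err(|e| e.to_string())
    }
}

// Models the current flow: validate (discarding the value), then parse again.
fn validate_then_parse(raw: &str) -> Result<Effectful, String> {
    Effectful::from_str(raw).map(|_| ())?; // validator pass, value dropped
    Ok(Effectful::from_str(raw).unwrap()) // second call on the same input
}

// Models the proposed flow: parse once and propagate the error.
fn parse_once(raw: &str) -> Result<Effectful, String> {
    Effectful::from_str(raw)
}

// Reads the current call count.
fn calls() -> u32 {
    CALLS.with(|c| c.get())
}
```

Under these assumptions, `validate_then_parse` increments the counter twice per argument, while `parse_once` increments it once and still reports bad input as an `Err`.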

pksunkara

It feels like you are re-implementing quite a bit of parsing logic in the derive.

IMO the derive should only build the App (with validators), pass it arguments, and structure the result.

But I think the issue here is happening because we are passing the arguments to the app by calling get_matches_from instead of try_get_matches_from, which is what you probably want to focus on for the fix (and you did, to a certain extent).

sunfishcode · 2020-11-09T17:10:38Z

> But I think the issue here is happening because we are passing the arguments to the app by calling get_matches_from instead of try_get_matches_from, which is what you probably want to focus on for the fix (and you did, to a certain extent).

Could you elaborate on this? As I understand it, if the program uses Opt::try_parse, then it does use try_get_matches_from, but the problem is still present. It still ends up calling from_arg_matches which uses unwrap()s on its from_str calls, so it still needs separate from_str calls to do the validation.

pksunkara · 2020-11-09T17:27:25Z

But we have a validator that should have already failed before from_arg_matches even needs to run.

sunfishcode · 2020-11-09T17:40:25Z

I'm hoping to avoid calling from_str twice on the same argument. To achieve that, I think the validator phase needs to not get involved in validating from_str arguments.

pksunkara · 2020-11-09T20:58:07Z

I think I understand the context now. But this is not a derive issue. This issue exists even in normal app parsing.

As you can see here, we disregard the validated value. And when we send the ArgMatches to the user, they would need to validate again if they want it in a certain format.

I do see how this can be an issue. But I don't feel comfortable with the approach you are proposing here. As I said, derive should simply be a wrapper.

My gut says we can attack this in derive using parse. Do you think you can experiment with that?

sunfishcode · 2020-11-09T23:20:21Z

> I think I understand the context now. But this is not a derive issue. This issue exists even in normal app parsing.

As far as I can tell, it does seem to be a derive issue. from_str is only called once per arg in a simple non-derive example:

use std::str::FromStr;
use clap::{App, Arg};

enum Colors {
    Red,
    Green,
    Blue,
}

impl FromStr for Colors {
    type Err = &'static str;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        eprintln!("called Colors::from_str({})", s);
        match s {
            "Red" => Ok(Colors::Red),
            "Green" => Ok(Colors::Green),
            "Blue" => Ok(Colors::Blue),
            _ => Err("no match"),
        }
    }
}

fn main() {
    let m = App::new("myapp")
        .arg(Arg::from("<color> 'The color to use'"))
        .get_matches();

    let t = m.value_of_t("color").unwrap_or_else(|e| e.exit());

    match t {
        Colors::Red => println!("Selected red"),
        Colors::Green => println!("Selected green"),
        Colors::Blue => println!("Selected blue"),
    }
}

> As you can see here, we disregard the validated value. And when we send the ArgMatches to the user, they would need to validate again if they want it in a certain format.

For regular non-derive users, this only comes up if a user registers an explicit validator function. And if they do that, then the validator function is called once and the parse function is called once, which is unsurprising.

For derive users, the derive effectively registers the from_str function as both the validate function and the parse function, which is why it gets called twice. My patch here removes the validate registration for it, and replaces newly exposed unwrap()s with error handling.

> I do see how this can be an issue. But I don't feel comfortable with the approach you are proposing here. As I said, derive should simply be a wrapper.
>
> My gut says we can attack this in derive using parse. Do you think you can experiment with that?

Yes, I'm happy to experiment with different approaches. However, I don't yet see a way to do it differently within clap. Derive is registering from_str as a validate function, so the validator pass calls it and discards the value. Once that happens, we have no choice but to call from_str again.

One option would be to have the validator remember the parsed values, but that looks like it'd be pretty awkward.

Another option would be to avoid registering from_str as a validator, which is what my PR does. This would make the derive case closer to the normal case, because I imagine most normal users don't register validation functions for types that just parse with FromStr.
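
To illustrate why the first option (a validator that remembers parsed values) looks awkward, here is a rough std-only sketch. `ParsedCache`, `validate`, and `take` are hypothetical and are not part of clap; the point is only that the stored values have to be type-erased and the type has to be named again when they are read back.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::fmt::Display;
use std::str::FromStr;

// Hypothetical store that a validator pass could fill with parsed values.
#[derive(Default)]
struct ParsedCache {
    values: HashMap<String, Box<dyn Any>>,
}

impl ParsedCache {
    // Validator step: parse once and keep the value, type-erased.
    fn validate<T>(&mut self, name: &str, raw: &str) -> Result<(), String>
    where
        T: FromStr + 'static,
        T::Err: Display,
    {
        let value = raw.parse::<T>().map_err(|e| e.to_string())?;
        self.values.insert(name.to_string(), Box::new(value));
        Ok(())
    }

    // Later step: take the value back out. The caller must name the same
    // type again; a mismatch yields `None` and the stored value is dropped.
    fn take<T: 'static>(&mut self, name: &str) -> Option<T> {
        let boxed = self.values.remove(name)?;
        boxed.downcast::<T>().ok().map(|b| *b)
    }
}
```

For example, `cache.validate::<u32>("count", "3")` followed by `cache.take::<u32>("count")` returns `Some(3)`, but nothing at compile time ties the two type parameters together.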

pksunkara · 2020-11-10T09:56:33Z

So, that is what I was saying. Instead of implementing impl FromStr for Effectful in the example, you have to use parse on your field as shown here.

* `impl FromStr` is used for both `parse` and `validator`.
* `parse` is used for `parse`.

sunfishcode · 2020-11-10T19:26:21Z

> So, that is what I was saying. Instead of implementing impl FromStr for Effectful in the example, you have to use parse on your field as shown here.

This does work, however the reason I'm interested in this issue is that I'm writing a library, and I'd like to be able to tell my users they can plug the library's types into clap. It's nicer if that "just works" and I don't have to tell them to also remember to write parse(try_from_str = T::from_str).

> * `impl FromStr` is used for both `parse` and `validator`.

This is counterintuitive to me, and makes derive code default to being different from typical non-derive code, since as far as I can tell validator functions are uncommon in non-derive code.

I did some more experimenting, and I found a way to restructure the patch to move more of the logic out of the derive, so it's now a net reduction in the lines of code in the derive directory. It refactors the ArgMatches functions value_of_t and friends to expose slightly lower-level interfaces, and uses that functionality from the derive code. That makes the derive code simpler, and conceptually closer to typical non-derive code. It also happens to tighten up the type checking for custom parse functions, which caught a type error in clap_derive/tests/custom-string-parsers.rs.

Does this look like a feasible approach?

clap_derive/src/derives/from_arg_matches.rs

pksunkara · 2020-11-10T20:03:04Z

This is definitely a much better approach than the previous one, but I am still not happy about removing the #validator. It plays a role in argument parsing IIRC and affects the behaviour. It's just not something that's checked after the parsing is done. It is one of the reasons why we allow 2 different ways of defining the parsing of the string.

Do correct me if I am wrong; I haven't touched it in a long time.

sunfishcode · 2020-11-10T20:39:46Z

> This is definitely a much better approach than the previous one, but I am still not happy about removing the #validator. It plays a role in argument parsing IIRC and affects the behaviour. It's just not something that's checked after the parsing is done. It is one of the reasons why we allow 2 different ways of defining the parsing of the string.

The #validator removal is just for the TryFromStr and TryFromOsStr cases, which in this patch are redundant because it's the same function as the parse function. Custom user validators are still supported -- I didn't see an existing test for them, so I've now added one to the PR to confirm that they continue to work.

sunfishcode · 2020-11-10T21:35:06Z

In fact, if the user specifies a custom validator with clap_derive, it overwrites the from_str validator, so on master right now, if the custom validator accepts things that from_str doesn't, it panics trying to unwrap() the from_str result. With the PR here, it reports the parse error and doesn't panic, which seems nicer. I've now added an assert to the testcase to test this.
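
The panic described above can be modeled without clap. In this sketch, `lenient_validator`, `parse_with_unwrap`, and `parse_with_error` are hypothetical stand-ins: the first plays the role of a user validator that replaces the implicit `from_str` check, and the other two contrast unwrapping the parse result with reporting the error.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum Color {
    Red,
    Green,
    Blue,
}

impl FromStr for Color {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Red" => Ok(Color::Red),
            "Green" => Ok(Color::Green),
            "Blue" => Ok(Color::Blue),
            other => Err(format!("unknown color '{}'", other)),
        }
    }
}

// Hypothetical user validator that accepts more than `from_str` does.
fn lenient_validator(raw: &str) -> Result<(), String> {
    if raw.is_empty() {
        Err("value must not be empty".to_string())
    } else {
        Ok(())
    }
}

// Models the behaviour on master: only the custom validator runs, then the
// parse result is unwrapped, which panics on input the validator let through.
fn parse_with_unwrap(raw: &str) -> Result<Color, String> {
    lenient_validator(raw)?;
    Ok(Color::from_str(raw).unwrap())
}

// Models the behaviour with this PR: the parse error is returned instead.
fn parse_with_error(raw: &str) -> Result<Color, String> {
    lenient_validator(raw)?;
    Color::from_str(raw)
}
```

With an input such as `"Purple"`, the validator passes, so `parse_with_unwrap` panics while `parse_with_error` returns an `Err` that can be shown to the user.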

sunfishcode · 2020-11-25T01:14:04Z

I've now fixed the merge conflicts and rebased this on master. The test failures here also fail for me on master and don't appear to be related to this PR.

sunfishcode · 2020-12-02T19:42:24Z

Rebased, merge conflicts fixed, and now all the tests pass!

pksunkara · 2020-12-14T02:41:41Z

Hey, want to let you know that we will definitely get this fixed before v3 is out. But unfortunately, I am still hesitant about the current design. I would like to take a crack at this but can only do it after some other things are solved. Please don't think that I have forgotten this or anything.

sunfishcode · 2021-02-06T15:41:36Z

Would you be able to say more about your concern here? With what I know right now, removing the #validator

and doesn't appear to have significant downsides, so it's not clear to me whether the concern here is about the removal of the #validator itself, or the specifics of the current patch here.

I am considering doing the work in #2298 to split out type inference, which will involve some reorganization, and it would help me to understand how you envision this code eventually being organized.

pksunkara · 2021-02-07T00:45:41Z

I want to see if we can improve the ergonomics for people building libraries on top of clap using traits for the following stuff:

* `impl FromStr` is used for both `parse` and `validator`.
* `parse` is used for `parse`.

I would like to personally take a crack at this because my instinct says that we can do it better that way.

Why would doing #2298 need expanding the code twice? Once for parsing and once for validating? Won't both just use the same auto parser?

sunfishcode · 2021-02-07T14:23:28Z

> Why would doing #2298 need expanding the code twice? Once for parsing and once for validating? Won't both just use the same auto parser?

The autoref specialization technique that #2298 uses doesn't work in generic contexts. That is, we can't put it in a function and call it twice; we have to macro-expand it inline every time we need it.

It's doable, but it's difficult for me to work on without understanding the purpose of organizing the code this way.
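
For readers unfamiliar with autoref specialization, here is a small std-only sketch of the general technique. It is not the code from #2298; `Probe`, `ViaFromStr`, `ViaFrom`, `Label`, and the `auto_parse!` macro are all hypothetical. The preferred impl lives on `Probe<T>` and the fallback on `&Probe<T>`, so method resolution picks the preferred one when its bounds hold and otherwise reaches the fallback through one extra auto-reference.

```rust
use std::fmt::Display;
use std::marker::PhantomData;
use std::str::FromStr;

// Zero-sized tag carrying the target type.
struct Probe<T>(PhantomData<T>);

// Preferred path: implemented on `Probe<T>`, chosen first when `T: FromStr`.
trait ViaFromStr {
    type Out;
    fn auto_parse(&self, raw: &str) -> Result<Self::Out, String>;
}

impl<T> ViaFromStr for Probe<T>
where
    T: FromStr,
    T::Err: Display,
{
    type Out = T;
    fn auto_parse(&self, raw: &str) -> Result<T, String> {
        raw.parse::<T>().map_err(|e| e.to_string())
    }
}

// Fallback path: implemented on `&Probe<T>`, reached only via an extra autoref.
trait ViaFrom {
    type Out;
    fn auto_parse(&self, raw: &str) -> Result<Self::Out, String>;
}

impl<T> ViaFrom for &Probe<T>
where
    T: for<'a> From<&'a str>,
{
    type Out = T;
    fn auto_parse(&self, raw: &str) -> Result<T, String> {
        Ok(T::from(raw))
    }
}

// The dispatch is decided where this expands, with a concrete type.
macro_rules! auto_parse {
    ($ty:ty, $raw:expr) => {
        (&Probe::<$ty>(PhantomData)).auto_parse($raw)
    };
}

// Example type that implements `From<&str>` but not `FromStr`.
#[derive(Debug, PartialEq)]
struct Label(String);

impl From<&str> for Label {
    fn from(s: &str) -> Self {
        Label(s.to_string())
    }
}
```

Here `auto_parse!(u32, "42")` goes through `FromStr`, while `auto_parse!(Label, "hi")` falls back to `From<&str>`. Because the choice depends on which bounds the compiler can prove at the call site, wrapping the call in a function generic over an unconstrained `T` would not give the same per-type dispatch, which is consistent with the need to expand it inline.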

pksunkara · 2021-02-07T14:26:31Z

Understood. In what cases would that increase compilation time, and will there be any performance hit (there shouldn't be, I think)?

sunfishcode · 2021-02-07T22:03:46Z

I might be missing something fundamental here.

Why does it make sense to register FromStr::from_str as a validator?

After working on the patch here, and seeing how structopt/clap_derive work on the inside, the most likely explanation that I've come up with is that structopt likely started registering FromStr::from_str as a validator function to work around clap's main API not being flexible enough in how it handled Result values.

The PR here seems to confirm this theory. It makes clap's main API more flexible, with try_from_arg_matches and related things. With these minor API additions, the API is flexible enough to let clap_derive do everything that structopt does, without registering FromStr::from_str as a validator.
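
As a rough illustration of what a fallible, struct-level conversion looks like, here is a std-only sketch. It is not clap's API: `RawMatches`, `parse_value`, and `Opt::try_from_matches` are hypothetical stand-ins for the matches object, a `value_of_t`-style helper, and a `try_from_arg_matches`-style constructor. Each argument is parsed exactly once and any error is propagated rather than unwrapped.

```rust
use std::collections::HashMap;
use std::fmt::Display;
use std::str::FromStr;

// Hypothetical stand-in for parsed command-line matches: name -> raw string.
type RawMatches = HashMap<String, String>;

#[derive(Debug, PartialEq)]
struct Opt {
    count: u32,
    ratio: f64,
}

// Looks up one argument and parses it once, returning a readable error.
fn parse_value<T>(matches: &RawMatches, name: &str) -> Result<T, String>
where
    T: FromStr,
    T::Err: Display,
{
    let raw = matches
        .get(name)
        .ok_or_else(|| format!("missing argument '{}'", name))?;
    raw.parse::<T>()
        .map_err(|e| format!("invalid value '{}' for '{}': {}", raw, name, e))
}

impl Opt {
    // Builds the struct, stopping at the first argument that fails to parse.
    fn try_from_matches(matches: &RawMatches) -> Result<Self, String> {
        Ok(Opt {
            count: parse_value(matches, "count")?,
            ratio: parse_value(matches, "ratio")?,
        })
    }
}
```

With this shape, no separate validator pass is needed for types that simply parse with `FromStr`; a bad value surfaces as an `Err` from the single parse.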

On top of that, many other arrows point in this direction. This is how all hand-written clap code that I've seen works. This is how clap's own examples work. This lets users specify their own validator function without overriding clap_derive's implicit validator. This is more efficient. This is easier to explain to people who don't know how clap_derive works -- it just parses all the arguments once and reports any errors it finds. And now, it's easier to implement the "auto" feature.

And I can't find any significant downsides.

So if there's something I'm missing here, it feels like I need to figure that out first.


pksunkara requested changes Nov 9, 2020

pksunkara added this to the 3.0 milestone Nov 9, 2020

sunfishcode force-pushed the from-str-validation branch from d892f6d to ff7c77d Compare November 9, 2020 22:35

pksunkara reviewed Nov 10, 2020

sunfishcode force-pushed the from-str-validation branch from 434cfbb to 8b49bb0 Compare November 25, 2020 00:57

sunfishcode force-pushed the from-str-validation branch from 8b49bb0 to 1b0088e Compare November 28, 2020 21:04

sunfishcode requested a review from pksunkara December 10, 2020 01:11

sunfishcode force-pushed the from-str-validation branch from 1b0088e to b8b112b Compare January 17, 2021 13:49

sunfishcode mentioned this pull request Jan 18, 2021

feat: Auto-detect FromStr, TryFrom<&OsStr>, etc., in clap_derive #2298

Closed

sunfishcode force-pushed the from-str-validation branch from b8b112b to da2d630 Compare April 15, 2021 14:40

sunfishcode force-pushed the from-str-validation branch from da2d630 to 6c6264a Compare May 27, 2021 23:03

sunfishcode force-pushed the from-str-validation branch from 6c6264a to db8c060 Compare July 13, 2021 21:50

pksunkara mentioned this pull request Aug 13, 2021

Provide data in the application domain rather than the CLI domain #2683

Closed

5 tasks

epage modified the milestones: 3.0, 4.0 Oct 16, 2021

pksunkara removed this from the 4.0 milestone Oct 16, 2021

sunfishcode closed this Oct 17, 2021

sunfishcode deleted the from-str-validation branch October 17, 2021 19:00
