Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split: better handle numeric and hex suffixes, short and long, with and without values #5198

Merged
merged 9 commits into from
Aug 28, 2023

Conversation

zhitkoff
Copy link
Contributor

Short variants of numeric and hex suffixes now act as flags with value forced to 0
Long variants work without value (default 0) as well as with value (equal sign is enforced)
Any combination of numeric and/or hex suffixes in any order is allowed with last one winning
Short variants can be combined together like split -dxen 4 file
Tests added to cover changes and verified against GNU split to match behavior

Fixes #5171

@github-actions
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail-2/inotify-dir-recreate

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think this can be cleaned up a bit more by offloading the overriding of the arguments to clap.

.default_missing_value("0")
.num_args(0..=1)
.overrides_with(OPT_NUMERIC_SUFFIXES)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each argument should also override with the corresponding option right? Maybe we could also add some tests for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, tests are actually covering that, but will refactor with .overrides_with_all() instead

Ok("0".to_string()) // force value to "0"
})
.overrides_with(OPT_NUMERIC_SUFFIXES_SHORT)
.help("use numeric suffixes instead of alphabetic"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's nice to say something like "like --numeric-suffixes but does not accept an argument" to explain why there are two arguments that do the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understood the suggestion, could you please exand on that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion is to put a bit of text in the help string saying that it's the same as --numeric-suffixes. If I wouldn't know about these arguments and I would see two arguments with the same description in --help, I'd be a bit confused. Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that makes sense, thanks. Will update it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed exact same order and wording as in GNU split help:
-d use numeric suffixes starting at 0, not alphabetic

--numeric-suffixes[=FROM]
same as -d, but allow setting the start value

-x use hex suffixes starting at 0, not alphabetic

--hex-suffixes[=FROM]
same as -x, but allow setting the start value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

Comment on lines 160 to 162
.value_parser(|_: &str| -> Result<String,clap::Error> {
Ok("0".to_string()) // force value to "0"
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels convoluted. We have have to combine the arguments anyway, so it would suffice for this to just be a flag without any value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well ... it is a flag as far as behaviour goes (and as far as clap works), but it does produce a value. If -d or -x is used , the value must be specifically set to 0 for numeric or hex suffix correspondingly. It can be set here with value_parser() or harcoded later on when suffixes are processed. Would it be better to have all defaults in the same place as arg definitions?

Copy link
Member

@tertsdiepraam tertsdiepraam Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you're coming from. It's an interesting idea to try to make all the flags sort of the same in the code. However, I think I still prefer hardcoding them in the processing of the flags. The thing that I find confusing is that if it is a flag, we should use the functions that are appropriate for flags if that makes sense. value_parser is usually used for parsing user generated values, which we don't have here, so hardcoding the 0 gives us the ability to skip the parsing.

So, for instance, I'd write the match expression to combine the flags something more like this:

 match (
    matches.get_one::<String>(OPT_NUMERIC_SUFFIXES),
    matches.get_one::<String>(OPT_HEX_SUFFIXES),
    matches.get_flag(OPT_NUMERIC_SUFFIXES_SHORT),
    matches.get_flag(OPT_HEX_SUFFIXES_SHORT),
) {
    (Some(v), _, _, _) => parse_num_suffix(matches, v),
    (_, Some(v), _, _) => parse_hex_suffix(matches, v),
    (_, _, true, _) => Ok((SuffixType::Decimal, 0)),
    (_, _, _, true) => Ok((SuffixType::Hexadecimal, 0)),
    _ => Ok((SuffixType::Alphabetic, 0)), // no numeric/hex suffix, using default alphabetic
}

(I forgot the exact semantics of those clap functions. Maybe the value source check is necessary as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored to remove arg .value_parser() and hardcoded default 0 in the fn suffix_type_from()

Comment on lines 453 to 454
// Compare suffixes by the order specified in command line
// last one (highest index) is effective, all others are ignored
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's much easier to make all the arguments override each other :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, refactoring with overrides_with_all() in args definitions

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It's looking pretty nice already. I just have a few more small suggestions/questions.

.default_missing_value("0")
.num_args(0..=1)
.help("use numeric suffixes instead of alphabetic"),
.overrides_with_all([OPT_NUMERIC_SUFFIXES,OPT_NUMERIC_SUFFIXES_SHORT,OPT_HEX_SUFFIXES,OPT_HEX_SUFFIXES_SHORT])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why rustfmt doesn't pick this up? There's probably some string in this builder that makes rustfmt give up on the whole expression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep if you break up this line:

"write to shell COMMAND; file name is $FILE (Currently not implemented for Windows)",

into

                .help(
                    "write to shell COMMAND; file name is $FILE \
                    (Currently not implemented for Windows)",
                ),

rustfmt will kick in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting ... unfortunately for this specific line rustfmt does not do anything regardless how you break it up, so I will format it by hand

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is indeed what I meant. If you break it up by hand rustfmt will start working again on the other lines.

Comment on lines 438 to 441
matches.value_source(OPT_NUMERIC_SUFFIXES) == Some(ValueSource::CommandLine),
matches.value_source(OPT_NUMERIC_SUFFIXES_SHORT) == Some(ValueSource::CommandLine),
matches.value_source(OPT_HEX_SUFFIXES) == Some(ValueSource::CommandLine),
matches.value_source(OPT_HEX_SUFFIXES_SHORT) == Some(ValueSource::CommandLine),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid value_source? If I recall correctly, we only have to use that if there is some default value, but I don't think these arguments have one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah ... considering they are all set to override each other and no default_value, can refactor

Comment on lines 434 to 436
// Note: right now, this exact behavior cannot be handled by
// `ArgGroup` since `ArgGroup` considers a default value `Arg`
// as "defined".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this. What do you mean with that last part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, my bad - copy/pasted from other section to start with and did not remove. will take it out

#[test]
fn test_hex_suffix_no_value() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&["-n", "4", "--hex-suffixes", "threebytes.txt"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test with a larger file here so that we can verify that hex is used (same for the numeric suffix tests).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, will do

@zhitkoff
Copy link
Contributor Author

Nice! It's looking pretty nice already. I just have a few more small suggestions/questions.

Thanks! One question : considering that changes for this pull request addressing issue #5171 modify the same two files as the changes for fixing issue #5033 (handling obsolete lines call as in split -1 in) and both dealing with processing arguments - would you prefer that I merge those changes into this PR or have it separate?

@tertsdiepraam
Copy link
Member

Separate would be best I think. This PR is fairly simple. The other one will be a lot more complicated. So let's get this merged and then move on to the other one.

@zhitkoff
Copy link
Contributor Author

anything else that you think should be included and/or changed to finalize this PR?

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! I love it! Thanks!

@tertsdiepraam tertsdiepraam merged commit 1eae064 into uutils:main Aug 28, 2023
@zhitkoff zhitkoff deleted the zhitkoff/issue5171 branch November 18, 2023 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

split should ignore values for -d and -x and support --numeric-suffixes and --hex-suffixes without values
2 participants