Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split: better handle numeric and hex suffixes, short and long, with and without values #5198

Merged
merged 9 commits into from
Aug 28, 2023
85 changes: 71 additions & 14 deletions src/uu/split/src/split.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
// * For the full copyright and license information, please view the LICENSE
// * file that was distributed with this source code.

// spell-checker:ignore nbbbb ncccc
// spell-checker:ignore nbbbb ncccc hexdigit

mod filenames;
mod number;
Expand Down Expand Up @@ -34,7 +34,9 @@ static OPT_ADDITIONAL_SUFFIX: &str = "additional-suffix";
static OPT_FILTER: &str = "filter";
static OPT_NUMBER: &str = "number";
static OPT_NUMERIC_SUFFIXES: &str = "numeric-suffixes";
static OPT_NUMERIC_SUFFIXES_SHORT: &str = "-d";
static OPT_HEX_SUFFIXES: &str = "hex-suffixes";
static OPT_HEX_SUFFIXES_SHORT: &str = "-x";
static OPT_SUFFIX_LENGTH: &str = "suffix-length";
static OPT_DEFAULT_SUFFIX_LENGTH: &str = "0";
static OPT_VERBOSE: &str = "verbose";
Expand Down Expand Up @@ -125,28 +127,50 @@ pub fn uu_app() -> Command {
)
.arg(
Arg::new(OPT_NUMERIC_SUFFIXES)
.short('d')
.long(OPT_NUMERIC_SUFFIXES)
.require_equals(true)
.default_missing_value("0")
.num_args(0..=1)
.overrides_with(OPT_NUMERIC_SUFFIXES)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each argument should also override with the corresponding option right? Maybe we could also add some tests for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, tests are actually covering that, but will refactor with .overrides_with_all() instead

.help("use numeric suffixes instead of alphabetic"),
)
.arg(
Arg::new(OPT_SUFFIX_LENGTH)
.short('a')
.long(OPT_SUFFIX_LENGTH)
.value_name("N")
.default_value(OPT_DEFAULT_SUFFIX_LENGTH)
.help("use suffixes of fixed length N. 0 implies dynamic length."),
Arg::new(OPT_NUMERIC_SUFFIXES_SHORT)
.short('d')
.action(clap::ArgAction::SetTrue)
.value_parser(|_: &str| -> Result<String,clap::Error> {
Ok("0".to_string()) // force value to "0"
})
.overrides_with(OPT_NUMERIC_SUFFIXES_SHORT)
.help("use numeric suffixes instead of alphabetic"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's nice to say something like "like --numeric-suffixes but does not accept an argument" to explain why there are two arguments that do the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understood the suggestion, could you please exand on that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion is to put a bit of text in the help string saying that it's the same as --numeric-suffixes. If I wouldn't know about these arguments and I would see two arguments with the same description in --help, I'd be a bit confused. Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that makes sense, thanks. Will update it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed exact same order and wording as in GNU split help:
-d use numeric suffixes starting at 0, not alphabetic

--numeric-suffixes[=FROM]
same as -d, but allow setting the start value

-x use hex suffixes starting at 0, not alphabetic

--hex-suffixes[=FROM]
same as -x, but allow setting the start value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

)
.arg(
Arg::new(OPT_HEX_SUFFIXES)
.short('x')
.long(OPT_HEX_SUFFIXES)
.default_missing_value("0")
.require_equals(true)
.num_args(0..=1)
.overrides_with(OPT_HEX_SUFFIXES)
.help("use hex suffixes instead of alphabetic"),
)
.arg(
Arg::new(OPT_HEX_SUFFIXES_SHORT)
.short('x')
.action(clap::ArgAction::SetTrue)
.value_parser(|_: &str| -> Result<String,clap::Error> {
Ok("0".to_string()) // force value to "0"
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels convoluted. We have have to combine the arguments anyway, so it would suffice for this to just be a flag without any value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well ... it is a flag as far as behaviour goes (and as far as clap works), but it does produce a value. If -d or -x is used , the value must be specifically set to 0 for numeric or hex suffix correspondingly. It can be set here with value_parser() or harcoded later on when suffixes are processed. Would it be better to have all defaults in the same place as arg definitions?

Copy link
Member

@tertsdiepraam tertsdiepraam Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you're coming from. It's an interesting idea to try to make all the flags sort of the same in the code. However, I think I still prefer hardcoding them in the processing of the flags. The thing that I find confusing is that if it is a flag, we should use the functions that are appropriate for flags if that makes sense. value_parser is usually used for parsing user generated values, which we don't have here, so hardcoding the 0 gives us the ability to skip the parsing.

So, for instance, I'd write the match expression to combine the flags something more like this:

 match (
    matches.get_one::<String>(OPT_NUMERIC_SUFFIXES),
    matches.get_one::<String>(OPT_HEX_SUFFIXES),
    matches.get_flag(OPT_NUMERIC_SUFFIXES_SHORT),
    matches.get_flag(OPT_HEX_SUFFIXES_SHORT),
) {
    (Some(v), _, _, _) => parse_num_suffix(matches, v),
    (_, Some(v), _, _) => parse_hex_suffix(matches, v),
    (_, _, true, _) => Ok((SuffixType::Decimal, 0)),
    (_, _, _, true) => Ok((SuffixType::Hexadecimal, 0)),
    _ => Ok((SuffixType::Alphabetic, 0)), // no numeric/hex suffix, using default alphabetic
}

(I forgot the exact semantics of those clap functions. Maybe the value source check is necessary as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored to remove arg .value_parser() and hardcoded default 0 in the fn suffix_type_from()

.overrides_with(OPT_HEX_SUFFIXES_SHORT)
.help("use hex suffixes instead of alphabetic"),
)
.arg(
Arg::new(OPT_SUFFIX_LENGTH)
.short('a')
.long(OPT_SUFFIX_LENGTH)
.value_name("N")
.default_value(OPT_DEFAULT_SUFFIX_LENGTH)
.help("use suffixes of fixed length N. 0 implies dynamic length."),
)
.arg(
Arg::new(OPT_VERBOSE)
.long(OPT_VERBOSE)
Expand Down Expand Up @@ -411,15 +435,48 @@ impl Strategy {

/// Parse the suffix type from the command-line arguments.
fn suffix_type_from(matches: &ArgMatches) -> Result<(SuffixType, usize), SettingsError> {
if matches.value_source(OPT_NUMERIC_SUFFIXES) == Some(ValueSource::CommandLine) {
let suffix_start = matches.get_one::<String>(OPT_NUMERIC_SUFFIXES);
// Collect provided suffixes (if any)
// Any combination is allowed
let mut suffixes = vec![];
for suffix_id in [
OPT_NUMERIC_SUFFIXES,
OPT_NUMERIC_SUFFIXES_SHORT,
OPT_HEX_SUFFIXES,
OPT_HEX_SUFFIXES_SHORT,
]
.into_iter()
{
if matches.value_source(suffix_id) == Some(ValueSource::CommandLine) {
suffixes.push((matches.index_of(suffix_id), suffix_id));
}
}
// Compare suffixes by the order specified in command line
// last one (highest index) is effective, all others are ignored
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's much easier to make all the arguments override each other :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, refactoring with overrides_with_all() in args definitions

let mut effective_suffix_index = 0;
let mut effective_suffix_id = "";
for suffix in suffixes.into_iter() {
let (suffix_index, suffix_id) = suffix;
if let Some(i) = suffix_index {
if i >= effective_suffix_index {
effective_suffix_index = i;
effective_suffix_id = suffix_id;
};
}
}

if effective_suffix_id == OPT_NUMERIC_SUFFIXES
|| effective_suffix_id == OPT_NUMERIC_SUFFIXES_SHORT
{
let suffix_start = matches.get_one::<String>(effective_suffix_id);
let suffix_start = suffix_start.ok_or(SettingsError::SuffixNotParsable(String::new()))?;
let suffix_start = suffix_start
.parse()
.parse::<usize>()
.map_err(|_| SettingsError::SuffixNotParsable(suffix_start.to_string()))?;
Ok((SuffixType::Decimal, suffix_start))
} else if matches.value_source(OPT_HEX_SUFFIXES) == Some(ValueSource::CommandLine) {
let suffix_start = matches.get_one::<String>(OPT_HEX_SUFFIXES);
} else if effective_suffix_id == OPT_HEX_SUFFIXES
|| effective_suffix_id == OPT_HEX_SUFFIXES_SHORT
{
let suffix_start = matches.get_one::<String>(effective_suffix_id);
let suffix_start = suffix_start.ok_or(SettingsError::SuffixNotParsable(String::new()))?;
let suffix_start = usize::from_str_radix(suffix_start, 16)
.map_err(|_| SettingsError::SuffixNotParsable(suffix_start.to_string()))?;
Expand Down
158 changes: 155 additions & 3 deletions tests/by-util/test_split.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
// *
// * For the full copyright and license information, please view the LICENSE
// * file that was distributed with this source code.
// spell-checker:ignore xzaaa sixhundredfiftyonebytes ninetyonebytes threebytes asciilowercase fghij klmno pqrst uvwxyz fivelines twohundredfortyonebytes onehundredlines nbbbb
// spell-checker:ignore xzaaa sixhundredfiftyonebytes ninetyonebytes threebytes asciilowercase fghij klmno pqrst uvwxyz fivelines twohundredfortyonebytes onehundredlines nbbbb dxen

use crate::common::util::{AtPath, TestScenario};
use rand::{thread_rng, Rng, SeedableRng};
Expand Down Expand Up @@ -715,7 +715,7 @@ fn test_multiple_of_input_chunk() {
#[test]
fn test_numeric_suffix() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&["-n", "4", "--numeric-suffixes", "9", "threebytes.txt"])
ucmd.args(&["-n", "4", "--numeric-suffixes=9", "threebytes.txt"])
.succeeds()
.no_stdout()
.no_stderr();
Expand All @@ -728,7 +728,7 @@ fn test_numeric_suffix() {
#[test]
fn test_hex_suffix() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&["-n", "4", "--hex-suffixes", "9", "threebytes.txt"])
ucmd.args(&["-n", "4", "--hex-suffixes=9", "threebytes.txt"])
.succeeds()
.no_stdout()
.no_stderr();
Expand All @@ -738,6 +738,158 @@ fn test_hex_suffix() {
assert_eq!(at.read("x0c"), "");
}

#[test]
fn test_numeric_suffix_no_equal() {
new_ucmd!()
.args(&["-n", "4", "--numeric-suffixes", "9", "threebytes.txt"])
.fails()
.stderr_contains("split: cannot open '9' for reading: No such file or directory");
}

#[test]
fn test_hex_suffix_no_equal() {
new_ucmd!()
.args(&["-n", "4", "--hex-suffixes", "9", "threebytes.txt"])
.fails()
.stderr_contains("split: cannot open '9' for reading: No such file or directory");
}

/// Test for short numeric suffix not having any value
#[test]
fn test_short_numeric_suffix_no_value() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&["-n", "4", "-d", "threebytes.txt"])
.succeeds()
.no_stdout()
.no_stderr();
assert_eq!(at.read("x00"), "a");
assert_eq!(at.read("x01"), "b");
assert_eq!(at.read("x02"), "c");
assert_eq!(at.read("x03"), "");
}

/// Test for long numeric suffix not having any value
#[test]
fn test_numeric_suffix_no_value() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&["-n", "4", "--numeric-suffixes", "threebytes.txt"])
.succeeds()
.no_stdout()
.no_stderr();
assert_eq!(at.read("x00"), "a");
assert_eq!(at.read("x01"), "b");
assert_eq!(at.read("x02"), "c");
assert_eq!(at.read("x03"), "");
}

/// Test for short hex suffix not having any value
#[test]
fn test_short_hex_suffix_no_value() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&["-n", "4", "-x", "threebytes.txt"])
.succeeds()
.no_stdout()
.no_stderr();
assert_eq!(at.read("x00"), "a");
assert_eq!(at.read("x01"), "b");
assert_eq!(at.read("x02"), "c");
assert_eq!(at.read("x03"), "");
}

/// Test for long hex suffix not having any value
#[test]
fn test_hex_suffix_no_value() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&["-n", "4", "--hex-suffixes", "threebytes.txt"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test with a larger file here so that we can verify that hex is used (same for the numeric suffix tests).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, will do

.succeeds()
.no_stdout()
.no_stderr();
assert_eq!(at.read("x00"), "a");
assert_eq!(at.read("x01"), "b");
assert_eq!(at.read("x02"), "c");
assert_eq!(at.read("x03"), "");
}

/// Test for short numeric suffix having value provided after space - should fail
#[test]
fn test_short_numeric_suffix_with_value_spaced() {
new_ucmd!()
.args(&["-n", "4", "-d", "9", "threebytes.txt"])
.fails()
.stderr_contains("split: cannot open '9' for reading: No such file or directory");
}

/// Test for short numeric suffix having value provided after space - should fail
#[test]
fn test_short_hex_suffix_with_value_spaced() {
new_ucmd!()
.args(&["-n", "4", "-x", "9", "threebytes.txt"])
.fails()
.stderr_contains("split: cannot open '9' for reading: No such file or directory");
}

/// Test for some combined short options
#[test]
fn test_short_combination() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&["-dxen", "4", "threebytes.txt"])
.succeeds()
.no_stdout()
.no_stderr();
assert_eq!(at.read("x00"), "a");
assert_eq!(at.read("x01"), "b");
assert_eq!(at.read("x02"), "c");
assert_eq!(at.file_exists("x03"), false);
}

/// Test for the last effective suffix, ignoring all others - numeric long last
/// Any combination of short and long (as well as duplicates) should be allowed
#[test]
fn test_effective_suffix_numeric_last() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&[
"-n",
"4",
"--numeric-suffixes=7",
"--hex-suffixes=4",
"-d",
"-x",
"--numeric-suffixes=9",
"threebytes.txt",
])
.succeeds()
.no_stdout()
.no_stderr();
assert_eq!(at.read("x09"), "a");
assert_eq!(at.read("x10"), "b");
assert_eq!(at.read("x11"), "c");
assert_eq!(at.read("x12"), "");
}

/// Test for the last effective suffix, ignoring all others - hex long last
/// Any combination of short and long (as well as duplicates) should be allowed
#[test]
fn test_effective_suffix_hex_last() {
let (at, mut ucmd) = at_and_ucmd!();
ucmd.args(&[
"-n",
"4",
"--hex-suffixes=7",
"--numeric-suffixes=4",
"-x",
"-d",
"--hex-suffixes=9",
"threebytes.txt",
])
.succeeds()
.no_stdout()
.no_stderr();
assert_eq!(at.read("x09"), "a");
assert_eq!(at.read("x0a"), "b");
assert_eq!(at.read("x0b"), "c");
assert_eq!(at.read("x0c"), "");
}

#[test]
fn test_round_robin() {
let (at, mut ucmd) = at_and_ucmd!();
Expand Down