Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for starting suffix numbers #3976

Merged
merged 1 commit into from
Oct 7, 2022

Conversation

andrewbaptist
Copy link
Contributor

This commit now allows split to pass split/numeric.sh

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! One nit about the for loop. I agree that it might not be slow, but it still feels wrong. It's too bad that Number does not already have some convenient method of this.

Number::DynamicWidth(DynamicWidthNumber::new(radix))
} else {
Number::FixedWidth(FixedWidthNumber::new(radix, suffix_length))
};
FilenameIterator {
// Not very efficient, but typically the suffixes are small.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is a bit too hacky for my taste :) Maybe you could implement a set_number/set_digits/something_else method on Number? Or add an add method alongside increment that takes an argument?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another alternative is to add a function like DynamicWidthNumber::starting_at(radix, suffix_start) (and FixedWidthNumber::starting_at(radix, suffix_length, suffix_start)) to encapsulate the functionality in this loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - a little more refactoring than expected, but the numeric class is much nicer now.

Copy link
Collaborator

@jfinkels jfinkels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

assert_eq!(it.nth(10 * 9 - 1).unwrap(), "chunk_89.txt");
assert_eq!(it.next().unwrap(), "chunk_9000.txt");
assert_eq!(it.next().unwrap(), "chunk_9001.txt");
}

#[test]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good tests. Could you also write one in the tests/by-util/test_split.rs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Number::DynamicWidth(DynamicWidthNumber::new(radix))
} else {
Number::FixedWidth(FixedWidthNumber::new(radix, suffix_length))
};
FilenameIterator {
// Not very efficient, but typically the suffixes are small.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another alternative is to add a function like DynamicWidthNumber::starting_at(radix, suffix_start) (and FixedWidthNumber::starting_at(radix, suffix_length, suffix_start)) to encapsulate the functionality in this loop

@andrewbaptist andrewbaptist force-pushed the implement_suffix_start branch 3 times, most recently from 608289d to bbf3fc4 Compare September 26, 2022 15:54
Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, turns out --suffix-length always uses fixed width. So we can simplify this a bit, to only have the initial number for fixed width.

Comment on lines 437 to 440
assert_eq!(format!("{}", num(10 * 9)), "90");
assert_eq!(format!("{}", num(10 * 9 + 1)), "91");
assert_eq!(format!("{}", num(10 * 99 - 1)), "989");
assert_eq!(format!("{}", num(10 * 99)), "990");
assert_eq!(format!("{}", num(10 * 99 + 1)), "991");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually very intentional behaviour (and matched GNU), implemented in #2859. Consider what happens to the filename sorting in your implementation:

some_file_98.txt
some_file_99.txt
some_file_100.txt
# gets sorted into this (due to lexical order of strings):
some_file_100.txt
some_file_98.txt
some_file_99.txt

The current algorithm, however, keeps 9s at the start to ensure being sorted after earlier files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so looking at this again I made an important discovery for this PR. With the --numeric-suffix option, the dynamic number is never used. It's always fixed width with a default width of 2. Here is an example:

$ split Cargo.lock Foo_ -l 1 --numeric-suffixes=80
split: output file suffixes exhausted
[ creates Foo_80 to Foo_99 ]
$ split Cargo.lock Foo_ -l 1 --numeric-suffixes=80 --suffix-length=4 
[ creates Foo_80 to Foo_3376 ]
$ split Cargo.lock Foo_ -l 1 -d          # warning creates a lot of files
[ creates Foo_00 up to Foo_992124 (which is 3307 files) ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I'll take another pass at this later this week. It makes more sense now how it works, but it is a little crazy!

@andrewbaptist andrewbaptist force-pushed the implement_suffix_start branch 2 times, most recently from c0738a5 to 802035d Compare September 27, 2022 22:07
@sylvestre
Copy link
Contributor

sorry but needs a small rebase

@andrewbaptist andrewbaptist force-pushed the implement_suffix_start branch from 802035d to ca165bf Compare October 4, 2022 00:23
@sylvestre
Copy link
Contributor

fails on:


---- test_split::test_numeric_dynamic_suffix_length stderr ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', tests/common/util.rs:542:37
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test_split::test_alphabetic_dynamic_suffix_length
    test_split::test_hex_dynamic_suffix_length
    test_split::test_numeric_dynamic_suffix_length

@sylvestre sylvestre force-pushed the implement_suffix_start branch from ca165bf to ddc8609 Compare October 5, 2022 11:31
@andrewbaptist
Copy link
Contributor Author

I plan to finish this today. I have a easy way to compute the names without counting.

@andrewbaptist andrewbaptist force-pushed the implement_suffix_start branch 2 times, most recently from 911235e to e2b3a3e Compare October 5, 2022 13:38
This commit now allows split to pass split/numeric.sh
@andrewbaptist andrewbaptist force-pushed the implement_suffix_start branch from e2b3a3e to 49e1cc6 Compare October 5, 2022 13:52
@andrewbaptist
Copy link
Contributor Author

OK - I think it is ready for review again. I haven't compared the output with the output from gnu split completely yet, but I think the current behavior is much better. The "initial" value for dynamic is a little confusing and could possibly be improved upon.

@andrewbaptist
Copy link
Contributor Author

Let me know if you need anything more changes on this PR.

@sylvestre
Copy link
Contributor

terrific:
Warning: Congrats! The gnu test tests/split/numeric is no longer failing!

@sylvestre sylvestre merged commit 97dd482 into uutils:main Oct 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants