Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split: use iterator to produce filenames #2868

Merged
merged 1 commit into from
Jan 30, 2022

Conversation

jfinkels
Copy link
Collaborator

This pull request replaces the FilenameFactory introduced in pull request #2859 with FilenameIterator and calls to FilenameFactory::make() with calls to FilenameIterator::next(). We did not need the fully generality of being able to produce the filename for an arbitrary chunk index. Instead we need only iterate over filenames one after another. This allows for a less mathematically dense algorithm that is easier to understand and maintain. Furthermore, it can be connected to some familiar concepts from the representation of numbers as a sequence of digits.

The most important part of this code is in the FixedWidthNumber::increment() and DynamicWidthNumber::increment() methods, and in the Display implementation for each of those structs.

This does not change the behavior of the split program, just the implementation of how filenames are produced.

I've adapted the algorithm suggested by @tertsdiepraam here: #2859 (comment) and set the Co-authored-by: line in the commit message accordingly. Thanks a lot for the suggestion!

Comment on lines 122 to 124
self.number.increment().ok()?;
}
Some(format!("{}{}{}", prefix, self.number, suffix))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FilenameIterator repeatedly increments a counter stored in self.number and formats it into a string with a specified prefix and suffix.

/// number would require more digits than are available with the
/// specified width, then this method returns [`Err(Overflow)`].
fn increment(&mut self) -> Result<(), Overflow> {
for i in (0..self.digits.len()).rev() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To increment the number, add 1 to each digit from least significant digit to most significant digit, carrying the 1 if necessary. An overflow results in an error for FixedWidthNumber.

}
}

// If the most significant digit is at its maximum value, then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DynamicWidthNumber, an overflow results in resetting the digits to zero and increasing the number of digits by one.

let digits: String = self.digits.iter().map(|d| (b'a' + d) as char).collect();
write!(
f,
"{empty:z<num_fill_chars$}{digits}",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the number whose digits are vec![1, 2, 3] gets displayed as "zbcd".

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You went above and beyond with this. If only the entire codebase was this well made, tested and documented. It's fantastic! Thank you!

@sylvestre
Copy link
Contributor

indeed but any idea why we are regressing with the gnu testsuite ? :)

@tertsdiepraam
Copy link
Member

tertsdiepraam commented Jan 15, 2022

That is indeed strange, it seems to be the rm2 test, which looks to have nothing to do with split:

GNU test failed: tests/rm/rm2. tests/rm/rm2 is passing on 'master'.

And this branch is not behind master, so I'm not sure what's going wrong. It might have something to do with the fact that the CI upgraded to Rust 1.58.

Edit: I really have no clue why this happened. The output for 1.58 and 1.57 for rm is identical for the cases that rm2 tests. Even more strange, it is different from the expected output from the test in both, so it shouldn't have passed in the first place?

@tertsdiepraam
Copy link
Member

Don't worry about the clippy lints, I fixed them in #2869

@jfinkels
Copy link
Collaborator Author

The same issue with tests/rm/rm2.sh in the GNU test comparison appears in other recent pull requests. I'm afraid I don't know why that's happening. If it helps you to diagnose, I ran bash util/run-gnu-test.sh tests/rm/rm2.sh on the master branch and I got a test failure.

@jfinkels jfinkels force-pushed the split-filename-iterator branch 2 times, most recently from faf85df to 98d142b Compare January 17, 2022 14:01
@jfinkels jfinkels force-pushed the split-filename-iterator branch 2 times, most recently from 7b560bc to d1875e6 Compare January 29, 2022 16:31
@tertsdiepraam
Copy link
Member

Hi! Could you fix the conflicts?

Replace the `FilenameFactory` with `FilenameIterator` and calls to
`FilenameFactory::make()` with calls to `FilenameIterator::next()`. We
did not need the fully generality of being able to produce the
filename for an arbitrary chunk index. Instead we need only iterate
over filenames one after another. This allows for a less
mathematically dense algorithm that is easier to understand and
maintain. Furthermore, it can be connected to some familiar concepts
from the representation of numbers as a sequence of digits.

This does not change the behavior of the `split` program, just the
implementation of how filenames are produced.

Co-authored-by: Terts Diepraam <[email protected]>
@jfinkels jfinkels force-pushed the split-filename-iterator branch from d1875e6 to a5b435d Compare January 30, 2022 16:19
@tertsdiepraam tertsdiepraam merged commit 7b3cfcf into uutils:main Jan 30, 2022
@tertsdiepraam
Copy link
Member

Thanks!

@jfinkels jfinkels deleted the split-filename-iterator branch February 12, 2022 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants