split: use iterator to produce filenames #2868

jfinkels · 2022-01-15T00:03:55Z

This pull request replaces the FilenameFactory introduced in pull request #2859 with FilenameIterator and calls to FilenameFactory::make() with calls to FilenameIterator::next(). We did not need the full generality of being able to produce the filename for an arbitrary chunk index. Instead we need only iterate over filenames one after another. This allows for a less mathematically dense algorithm that is easier to understand and maintain. Furthermore, it can be connected to some familiar concepts from the representation of numbers as a sequence of digits.

The most important part of this code is in the FixedWidthNumber::increment() and DynamicWidthNumber::increment() methods, and in the Display implementation for each of those structs.

This does not change the behavior of the split program, just the implementation of how filenames are produced.

I've adapted the algorithm suggested by @tertsdiepraam here: #2859 (comment) and set the Co-authored-by: line in the commit message accordingly. Thanks a lot for the suggestion!

jfinkels · 2022-01-15T00:05:11Z

src/uu/split/src/filenames.rs

+            self.number.increment().ok()?;
+        }
+        Some(format!("{}{}{}", prefix, self.number, suffix))


The FilenameIterator repeatedly increments a counter stored in self.number and formats it into a string with a specified prefix and suffix.
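As a rough illustration of that pattern, here is a minimal sketch. `FilenameSketch` and `TwoDigitCounter` are hypothetical names made up for this example, and the decimal counter is only a simple stand-in for the number types in this pull request, not the actual implementation.

```rust
use std::fmt;

/// Hypothetical stand-in for the PR's number types: counts 00..=99.
struct TwoDigitCounter(u8);

impl TwoDigitCounter {
    /// Add one, or report an error once the two digits are exhausted.
    fn increment(&mut self) -> Result<(), ()> {
        if self.0 >= 99 {
            return Err(());
        }
        self.0 += 1;
        Ok(())
    }
}

impl fmt::Display for TwoDigitCounter {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Zero-pad to two digits, e.g. 7 is written as "07".
        write!(f, "{:02}", self.0)
    }
}

/// Illustrative iterator (an assumption, not the PR's `FilenameIterator`).
struct FilenameSketch {
    prefix: String,
    suffix: String,
    number: TwoDigitCounter,
    first_iteration: bool,
}

impl FilenameSketch {
    fn new(prefix: &str, suffix: &str) -> Self {
        FilenameSketch {
            prefix: prefix.to_string(),
            suffix: suffix.to_string(),
            number: TwoDigitCounter(0),
            first_iteration: true,
        }
    }
}

impl Iterator for FilenameSketch {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        if self.first_iteration {
            // The first filename uses the initial counter value unchanged.
            self.first_iteration = false;
        } else {
            // `.ok()?` turns an overflow error into `None`, ending iteration.
            self.number.increment().ok()?;
        }
        Some(format!("{}{}{}", self.prefix, self.number, self.suffix))
    }
}
```

With this sketch, `FilenameSketch::new("x", ".txt")` yields `x00.txt`, `x01.txt`, and so on, and stops after `x99.txt`.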

jfinkels · 2022-01-15T00:06:46Z

src/uu/split/src/number.rs

+    /// number would require more digits than are available with the
+    /// specified width, then this method returns [`Err(Overflow)`].
+    fn increment(&mut self) -> Result<(), Overflow> {
+        for i in (0..self.digits.len()).rev() {


To increment the number, add 1 to the least significant digit and carry the 1 toward the most significant digit as necessary. If the carry runs past the most significant digit, that is an overflow, which results in an error for FixedWidthNumber.
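The carry logic can be sketched roughly as follows. `FixedWidthSketch` is a hypothetical type for illustration only; the signature of `increment` and the reverse loop follow the hunk above, but the loop body and the state left behind on overflow are assumptions.

```rust
/// Error returned when the number no longer fits in the available digits.
#[derive(Debug, PartialEq)]
struct Overflow;

/// Hypothetical fixed-width number: `digits[0]` is the most significant digit.
struct FixedWidthSketch {
    radix: u8,
    digits: Vec<u8>,
}

impl FixedWidthSketch {
    /// All-zero number with `width` digits in base `radix` (assumes 2 <= radix <= 255).
    fn new(radix: u8, width: usize) -> Self {
        FixedWidthSketch {
            radix,
            digits: vec![0; width],
        }
    }

    fn increment(&mut self) -> Result<(), Overflow> {
        // Walk from the least significant digit to the most significant one.
        for i in (0..self.digits.len()).rev() {
            if self.digits[i] + 1 < self.radix {
                // No carry needed: bump this digit and stop.
                self.digits[i] += 1;
                return Ok(());
            }
            // This digit was at its maximum: reset it and carry the 1 leftward.
            self.digits[i] = 0;
        }
        // The carry ran past the most significant digit. In this sketch the
        // digits have wrapped around to all zeros at this point.
        Err(Overflow)
    }
}
```

For instance, with radix 3 and width 2 the digits go `[0, 2]`, then `[1, 0]`, and incrementing `[2, 2]` returns `Err(Overflow)`.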

jfinkels · 2022-01-15T00:07:49Z

src/uu/split/src/number.rs

+            }
+        }
+
+        // If the most significant digit is at its maximum value, then


For DynamicWidthNumber, an overflow results in resetting the digits to zero and increasing the number of digits by one.
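A minimal sketch of that widening step is below. `DynamicWidthSketch` is a hypothetical type, and the trigger condition (the most significant digit reaching its maximum value) is an assumption inferred from the comment fragment in the hunk above, not a copy of the real code.

```rust
/// Hypothetical dynamic-width number: `digits[0]` is the most significant digit.
struct DynamicWidthSketch {
    radix: u8,
    digits: Vec<u8>,
}

impl DynamicWidthSketch {
    /// Start as a two-digit zero in base `radix` (assumes 2 <= radix <= 255).
    fn new(radix: u8) -> Self {
        DynamicWidthSketch {
            radix,
            digits: vec![0, 0],
        }
    }

    fn increment(&mut self) {
        let radix = self.radix;

        // Ordinary increment with carry, least significant digit first.
        for d in self.digits.iter_mut().rev() {
            *d += 1;
            if *d == radix {
                *d = 0; // wrapped: carry into the next digit
            } else {
                break;
            }
        }

        // Assumed overflow rule: once the most significant digit reaches its
        // maximum value, reset every digit to zero and add one more digit.
        if self.digits[0] == radix - 1 {
            let new_width = self.digits.len() + 1;
            self.digits = vec![0; new_width];
        }
    }
}
```

With radix 3, five increments from `[0, 0]` give `[1, 2]`, and the next increment widens the number to `[0, 0, 0]` instead of failing.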

jfinkels · 2022-01-15T00:08:43Z

src/uu/split/src/number.rs

+                let digits: String = self.digits.iter().map(|d| (b'a' + d) as char).collect();
+                write!(
+                    f,
+                    "{empty:z<num_fill_chars$}{digits}",


For example, the number whose digits are vec![1, 2, 3] gets displayed as "zbcd".
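The formatting trick in that hunk can be tried in isolation with a sketch like the one below. `display_alphabetic` is a hypothetical helper, and the rule that the number of `z` fill characters is the digit count minus two is an assumption inferred from the "zbcd" example, not taken from the code.

```rust
/// Hypothetical helper: render digits as letters with a run of 'z' in front.
fn display_alphabetic(digits: &[u8]) -> String {
    // Map each digit to a letter: 0 -> 'a', 1 -> 'b', and so on.
    let letters: String = digits.iter().map(|d| (b'a' + d) as char).collect();

    // Assumed rule: one 'z' for every digit beyond the first two.
    let num_fill_chars = digits.len().saturating_sub(2);

    // `{empty:z<num_fill_chars$}` pads the empty string on the right with 'z'
    // up to `num_fill_chars` characters, which produces the run of 'z'.
    format!(
        "{empty:z<num_fill_chars$}{letters}",
        empty = "",
        num_fill_chars = num_fill_chars,
        letters = letters
    )
}
```

Under that assumption, `display_alphabetic(&[1, 2, 3])` returns "zbcd" and `display_alphabetic(&[0, 0])` returns "aa".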

tertsdiepraam

You went above and beyond with this. If only the entire codebase was this well made, tested and documented. It's fantastic! Thank you!

sylvestre · 2022-01-15T09:57:10Z

indeed but any idea why we are regressing with the gnu testsuite ? :)

tertsdiepraam · 2022-01-15T10:36:56Z

That is indeed strange, it seems to be the rm2 test, which looks to have nothing to do with split:

GNU test failed: tests/rm/rm2. tests/rm/rm2 is passing on 'master'.

And this branch is not behind master, so I'm not sure what's going wrong. It might have something to do with the fact that the CI upgraded to Rust 1.58.

Edit: I really have no clue why this happened. The output for 1.58 and 1.57 for rm is identical for the cases that rm2 tests. Even more strange, it is different from the expected output from the test in both, so it shouldn't have passed in the first place?

tertsdiepraam · 2022-01-15T11:11:27Z

Don't worry about the clippy lints, I fixed them in #2869

jfinkels · 2022-01-15T16:00:15Z

The same issue with tests/rm/rm2.sh in the GNU test comparison appears in other recent pull requests. I'm afraid I don't know why that's happening. If it helps you to diagnose, I ran bash util/run-gnu-test.sh tests/rm/rm2.sh on the master branch and I got a test failure.

tertsdiepraam · 2022-01-30T12:43:24Z

Hi! Could you fix the conflicts?

Replace the `FilenameFactory` with `FilenameIterator` and calls to `FilenameFactory::make()` with calls to `FilenameIterator::next()`. We did not need the full generality of being able to produce the filename for an arbitrary chunk index. Instead we need only iterate over filenames one after another. This allows for a less mathematically dense algorithm that is easier to understand and maintain. Furthermore, it can be connected to some familiar concepts from the representation of numbers as a sequence of digits.

This does not change the behavior of the `split` program, just the implementation of how filenames are produced.

Co-authored-by: Terts Diepraam <[email protected]>

tertsdiepraam · 2022-01-30T21:37:43Z

Thanks!

tertsdiepraam approved these changes Jan 15, 2022

jfinkels force-pushed the split-filename-iterator branch 2 times, most recently from faf85df to 98d142b · January 17, 2022 14:01

jfinkels force-pushed the split-filename-iterator branch 2 times, most recently from 7b560bc to d1875e6 · January 29, 2022 16:31

jfinkels force-pushed the split-filename-iterator branch from d1875e6 to a5b435d · January 30, 2022 16:19

tertsdiepraam merged commit 7b3cfcf into uutils:main Jan 30, 2022

jfinkels deleted the split-filename-iterator branch February 12, 2022 19:18
