Small performance enhancements for cut: bytes cutting #357

Merged
merged 1 commit into from
Jul 15, 2014

rolfmorel
Contributor

Several small performance enhancements.

For very large or unbounded ranges, uutils cut's byte cutting is now roughly 3 times faster than GNU cut.

$ ranges=1-
$ perf stat -r 10 build/cut -b $ranges `find /boot -type f -readable` >/tmp/uucut
...
0.095898887 seconds time elapsed ( +-  0.14% )
$ perf stat -r 10 cut -b $ranges `find /boot -type f -readable` >/tmp/gcut
...
0.308146576 seconds time elapsed ( +-  0.24% )

$ ranges=4-6,3-8,20-30,15-25,23,6
$ perf stat -r 10 build/cut -b $ranges `find /boot -type f -readable` >/tmp/uucut
...
0.102587877 seconds time elapsed ( +-  0.07% )
$ perf stat -r 10 cut -b $ranges `find /boot -type f -readable` >/tmp/gcut
...
0.179558780 seconds time elapsed ( +-  0.06% )

$ ranges=-4097
$ perf stat -r 10 build/cut -b $ranges `find /boot -type f -readable` >/tmp/uucut
...
0.095640648 seconds time elapsed ( +-  0.13% )
$ perf stat -r 10 cut -b $ranges `find /boot -type f -readable` >/tmp/gcut
...
0.305291892 seconds time elapsed ( +-  0.24% )

For the relative performance improvement, see issue #345.
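
The ratios behind the "3 times faster" figure can be recomputed from the timings above. The following is only a small sketch in present-day Rust; the names `speedup`, `RUNS` and `report` are made up for illustration, and the numbers are copied verbatim from the perf stat output. The two open-ended ranges come out a little above 3x, while the mixed range list comes out at about 1.75x.

```rust
/// Ratio of a baseline wall-clock time to a candidate wall-clock time.
fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

/// (range spec, uutils cut seconds, GNU cut seconds), copied from the runs above.
const RUNS: [(&str, f64, f64); 3] = [
    ("1-", 0.095898887, 0.308146576),
    ("4-6,3-8,20-30,15-25,23,6", 0.102587877, 0.179558780),
    ("-4097", 0.095640648, 0.305291892),
];

/// One human-readable line per benchmark run (hypothetical helper).
fn report() -> Vec<String> {
    RUNS.iter()
        .map(|&(ranges, uu, gnu)| format!("{}: {:.2}x", ranges, speedup(gnu, uu)))
        .collect()
}
```

Calling `report()` yields one line per run, pairing the range spec with GNU cut's time divided by uutils cut's time.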

Get rid of half filled heuristic and use unsafe array indexing because
the indices should always be correct.
@rolfmorel changed the title from "Small performance enhancements for cut, bytes cutting" to "Small performance enhancements for cut: bytes cutting" on Jul 15, 2014
@@ -73,7 +75,8 @@ impl<R: Reader> BufReader<R> {
if buffer_used == 0 { return bytes_consumed; }

for idx in range(self.start, self.end) {
Contributor

Can this loop be for byte in self.buffer.slice(self.start, self.end).iter(), and thus be safe?

Contributor Author

I just did a benchmark of the different solutions (with -b 1):

  • 0.073622189 seconds for the current pull, which uses unsafe indexing.
  • 0.096472123 seconds for the current uutils/coreutils master, which uses safe indexing.
  • 0.093891360 seconds for the iterator version: for (idx, byte) in self.buffer.slice(self.start, self.end).iter().enumerate()

Forgoing the index of the newline byte is not possible as it is needed to update the start of the filled buffer.

The choice is between "safety" and all-out performance. The safety guaranteed by the compiler is only that the program fails on invalid or out-of-bounds indices. This means it performs a start <= end && end <= len check for every byte and just crashes if the check fails. This does nothing to guarantee that the indexing in your implementation of the algorithm isn't off by one, but it comes with a large runtime cost.

If you would rather not have unsafe blocks in the utilities, then buffer.slice().iter().enumerate() is a good solution.
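
The three variants compared above can be sketched in present-day Rust as follows. This is an illustration only, not the code from this pull request: the function names and the free-standing `buffer`, `start` and `end` parameters are hypothetical stand-ins for the fields of BufReader, and the 2014-era range(..) and .slice(..) calls are replaced by their modern equivalents. Each function returns the index of the first newline in `buffer[start..end]`, since that index is what the reader needs in order to advance the start of the filled buffer.

```rust
/// Safe indexing: each `buffer[idx]` carries a bounds check
/// (unless the optimizer manages to remove it).
fn newline_checked(buffer: &[u8], start: usize, end: usize) -> Option<usize> {
    for idx in start..end {
        if buffer[idx] == b'\n' {
            return Some(idx);
        }
    }
    None
}

/// Unchecked indexing, as in the variant that benchmarked fastest.
///
/// # Safety
/// The caller must guarantee `start <= end && end <= buffer.len()`.
unsafe fn newline_unchecked(buffer: &[u8], start: usize, end: usize) -> Option<usize> {
    for idx in start..end {
        // SAFETY: idx < end <= buffer.len() by the caller's contract.
        if unsafe { *buffer.get_unchecked(idx) } == b'\n' {
            return Some(idx);
        }
    }
    None
}

/// Iterator version: one range check when the sub-slice is taken
/// (it panics if the range is invalid), none inside the loop.
fn newline_iter(buffer: &[u8], start: usize, end: usize) -> Option<usize> {
    for (offset, &byte) in buffer[start..end].iter().enumerate() {
        if byte == b'\n' {
            // `offset` is relative to the sub-slice, so shift it back.
            return Some(start + offset);
        }
    }
    None
}
```

Note that the iterator version has to add `start` back to the enumerated offset, which is exactly the kind of off-by-one risk that no bounds check catches.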

Collaborator

I think it's alright to use unsafe code so long as:

  1. The code is in actuality safe (e.g. won't go out of bounds).
  2. The code is noticeably faster.
  3. unsafe is used sparingly.

This seems to fit those requirements.

Contributor

Sure, however

> 1. The code is in actuality safe (e.g. won't go out of bounds).

actually requires verification, and can easily be verified incorrectly: human error is killer.

For this code, the correct way to verify is to ensure that self.start and self.end are always in bounds. That's not obvious: you have to track down and follow all uses of self.start/self.end throughout this whole module, as they're not local to this function. (And it is even worse below, where you need to track max_segment_len too.)

Furthermore, there's a pile of rules that need to be upheld inside unsafe: breaking any of them is undefined behaviour and can lead to an arbitrarily broken program. unsafe should be your absolute last resort when optimising a program.
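
One way to make the bounds argument local, rather than something a reviewer has to reconstruct from every use of self.start and self.end, is to take the sub-slice once and let the iterator do the rest. The sketch below is written in present-day Rust and uses a hypothetical `Window` struct standing in for the reader's buffer state; it is an assumption-laden illustration, not the code under review.

```rust
/// Hypothetical stand-in for the reader state discussed above.
struct Window {
    buffer: Vec<u8>,
    start: usize,
    end: usize,
}

impl Window {
    /// Returns the filled part of the buffer, or `None` if the
    /// `start <= end <= buffer.len()` invariant has been broken.
    fn filled(&self) -> Option<&[u8]> {
        self.buffer.get(self.start..self.end)
    }

    /// Consumes bytes up to and including the next newline (or the rest
    /// of the filled region if there is none) and returns how many bytes
    /// were consumed. No `unsafe` is needed.
    fn consume_line(&mut self) -> Option<usize> {
        let filled = self.filled()?;
        let consumed = match filled.iter().position(|&b| b == b'\n') {
            Some(offset) => offset + 1,
            None => filled.len(),
        };
        self.start += consumed;
        Some(consumed)
    }
}
```

Here a broken invariant surfaces as `None` at a single, easily audited call site instead of as undefined behaviour somewhere inside the loop.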

Contributor

In particular, I think rust-lang/rust#11751 may be relevant and will likely make my suggested iterator-based code faster.

Collaborator

> human error is killer

Yep, that's why we have Rust. :)

> unsafe should be your absolute last resort when optimising a program.

I realize that. However, sometimes it is necessary to drop down to unsafe code to get as much speed as possible.

> In particular, I think rust-lang/rust#11751 may be relevant and will likely make my suggested iterator-based code faster.

Hopefully this is the case. We can then switch to using the iterator-based code.

Contributor

I think there's a risk of optimising prematurely for issues that should really be fixed upstream, leaving coreutils with a significant amount of unsafe code even after Rust has fixed those issues and made equally or more performant safe code possible.

Collaborator

At some point (probably soon after Rust hits 1.0) I'm planning to check all of the unsafes in the codebase to make sure that they're really necessary.

Collaborator

I do see your point, though.

Contributor Author

@huonw I agree that using unsafe here may be premature. I will keep an eye on rust-lang/llvm#14 and when it is integrated into rustc I will check the performance again, and hopefully switch to the iterator.

Arcterus added a commit that referenced this pull request Jul 15, 2014
Small performance enhancements for cut: bytes cutting
@Arcterus Arcterus merged commit e8780b8 into uutils:master Jul 15, 2014
jbcrail pushed a commit to jbcrail/coreutils that referenced this pull request Apr 29, 2015
Small performance enhancements for cut: bytes cutting