Small performance enhancements for cut: bytes cutting #357

rolfmorel · 2014-07-15T09:21:03Z

Several small performance enhancements.

For very large or open-ended ranges, uutils cut's byte cutting is now roughly 3 times faster than GNU cut.

$ ranges=1-
$ perf stat -r 10 build/cut -b $ranges `find /boot -type f -readable` >/tmp/uucut
...
0.095898887 seconds time elapsed ( +-  0.14% )
$ perf stat -r 10 cut -b $ranges `find /boot -type f -readable` >/tmp/gcut
...
0.308146576 seconds time elapsed ( +-  0.24% )

$ ranges=4-6,3-8,20-30,15-25,23,6
$ perf stat -r 10 build/cut -b $ranges `find /boot -type f -readable` >/tmp/uucut
...
0.102587877 seconds time elapsed ( +-  0.07% )
$ perf stat -r 10 cut -b $ranges `find /boot -type f -readable` >/tmp/gcut
...
0.179558780 seconds time elapsed ( +-  0.06% )

$ ranges=-4097
$ perf stat -r 10 build/cut -b $ranges `find /boot -type f -readable` >/tmp/uucut
...
0.095640648 seconds time elapsed ( +-  0.13% )
$ perf stat -r 10 cut -b $ranges `find /boot -type f -readable` >/tmp/gcut
...
0.305291892 seconds time elapsed ( +-  0.24% )

For relative performance improvement see issue #345.

Get rid of the half-filled heuristic and use unsafe array indexing, because the indices should always be correct.

huonw · 2014-07-15T09:38:07Z

cut/buffer.rs

@@ -73,7 +75,8 @@ impl<R: Reader> BufReader<R> {
            if buffer_used == 0 { return bytes_consumed; }

            for idx in range(self.start, self.end) {


Can this loop be for byte in self.buffer.slice(self.start, self.end).iter(), and thus be safe?

I just did a benchmark of the different solutions (with -b 1):

0.073622189 seconds for the current pull, which uses unsafe indexing.

0.096472123 seconds for the current uutils/coreutils master, which uses safe indexing.

0.093891360 seconds for the loop for (idx, byte) in self.buffer.slice(self.start, self.end).iter().enumerate()

Forgoing the index of the newline byte is not possible as it is needed to update the start of the filled buffer.
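As a rough illustration of the iterator variant and of why the newline index is still needed, here is a minimal sketch in current Rust. It is not the PR's code: `FilledBuffer` and `consume_line` are hypothetical names, and the 2014 `slice(start, end)` call is written as a range index.

```rust
/// Hypothetical stand-in for the buffered reader in the thread.
/// Only the field names (`buffer`, `start`, `end`) come from the discussion.
struct FilledBuffer {
    buffer: Vec<u8>,
    start: usize, // first unconsumed byte
    end: usize,   // one past the last filled byte
}

impl FilledBuffer {
    /// Returns the next line without its newline and moves `start` past it.
    /// Returns `None` (and leaves `start` alone) if no newline is buffered.
    fn consume_line(&mut self) -> Option<&[u8]> {
        let line_start = self.start;
        let mut newline_at = None;
        // The range is bounds-checked once when the sub-slice is created;
        // the iterator then walks it without per-byte index checks.
        for (idx, &byte) in self.buffer[self.start..self.end].iter().enumerate() {
            if byte == b'\n' {
                newline_at = Some(idx);
                break;
            }
        }
        let idx = newline_at?;
        // `idx` is relative to `line_start`; it is what lets us update `start`.
        self.start = line_start + idx + 1;
        Some(&self.buffer[line_start..line_start + idx])
    }
}
```

The index produced by `enumerate()` is relative to the sub-slice, so it has to be added to the old `start` before it is stored back.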

The choice is between "safety" and all-out performance. The only safety the compiler guarantees here is that invalid or out-of-bounds indices fail. This means it performs a start <= end && end <= len assert for every byte and simply crashes if the assert fails. That does nothing to guarantee that the indexing in your implementation of the algorithm isn't off by one, yet it comes with a large runtime cost.
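The two indexing styles being weighed here can be sketched side by side. This is an illustration in current Rust, not the code from the PR; both function names are hypothetical. In current Rust, `buf[idx]` is checked against `buf.len()` on each access unless the optimizer can prove the check redundant, while `get_unchecked` skips the check entirely and is undefined behaviour if the index is out of range.

```rust
/// Safe indexing: an out-of-bounds `idx` panics instead of reading
/// past the buffer.
fn find_newline_checked(buf: &[u8], start: usize, end: usize) -> Option<usize> {
    for idx in start..end {
        if buf[idx] == b'\n' {
            return Some(idx);
        }
    }
    None
}

/// Unchecked indexing: no per-byte check at all.
///
/// # Safety
/// The caller must guarantee `start <= end && end <= buf.len()`.
unsafe fn find_newline_unchecked(buf: &[u8], start: usize, end: usize) -> Option<usize> {
    for idx in start..end {
        // SAFETY: `idx < end <= buf.len()` by the caller's contract.
        if unsafe { *buf.get_unchecked(idx) } == b'\n' {
            return Some(idx);
        }
    }
    None
}
```

Both functions return the same result for in-range arguments. They differ only in what happens when the caller gets `start` or `end` wrong: a panic in the first case, undefined behaviour in the second.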

If you would rather not have unsafe blocks in the utilities, then buffer.slice().iter().enumerate() is a good solution.

I think it's alright to use unsafe code so long as:

The code is in actuality safe (e.g. won't go out of bounds).

The code is noticeably faster.

unsafe is used sparingly.

This seems to fit those requirements.

Sure, however "The code is in actuality safe (e.g. won't go out of bounds)" actually requires verification, and can easily be verified incorrectly: human error is killer.

For this code, the correct way to verify is to ensure that self.start and self.end are always in bounds. That's not obvious: you have to track down and follow all uses of self.start/self.end throughout this whole module, as they're not local to this function. (And it is even worse below, where you need to track max_segment_len too.)

Furthermore, there's a pile of rules that need to be upheld inside unsafe: breaking any of them is undefined behaviour and can lead to an arbitrarily broken program. unsafe should be your absolute last resort when optimising a program.
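One way to make that verification more local, sketched here purely as an illustration and not as something proposed in this PR, is to keep the indices private and funnel every mutation through methods that preserve `start <= end <= buffer.len()`. The `Window` type and its methods are hypothetical names, written in current Rust.

```rust
/// Hypothetical wrapper whose methods are the only code able to change
/// `start` and `end`, so the bounds invariant can be audited in one place.
pub struct Window {
    buffer: Vec<u8>,
    start: usize,
    end: usize,
}

impl Window {
    /// Starts with the whole buffer filled and nothing consumed.
    pub fn new(buffer: Vec<u8>) -> Self {
        let end = buffer.len();
        Window { buffer, start: 0, end }
    }

    /// The invariant every method must preserve.
    fn invariant_holds(&self) -> bool {
        self.start <= self.end && self.end <= self.buffer.len()
    }

    /// Consumes up to `n` bytes, clamping so `start` never passes `end`.
    pub fn advance(&mut self, n: usize) {
        self.start = self.start.saturating_add(n).min(self.end);
        debug_assert!(self.invariant_holds());
    }

    /// The unconsumed bytes; the range is checked once here.
    pub fn filled(&self) -> &[u8] {
        &self.buffer[self.start..self.end]
    }
}
```

With this shape, a reviewer only has to read `new` and `advance` to confirm the invariant, rather than every use of the fields across the module.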

In particular, I think rust-lang/rust#11751 may be relevant and will likely make my suggested iterator-based code faster.

"human error is killer"

Yep, that's why we have Rust. :)

"unsafe should be your absolute last resort when optimising a program."

I realize that. However, sometimes it is necessary to drop down to unsafe code to get as much speed as possible.

"In particular, I think rust-lang/rust#11751 may be relevant and will likely make my suggested iterator-based code faster."

Hopefully this is the case. We can then switch to using the iterator-based code.

I think there's a risk of optimising prematurely for issues that should really be fixed upstream, leaving coreutils with a significant amount of unsafe code even after Rust has fixed those issues and safe code has become equally or more performant.

At some point (probably soon after Rust hits 1.0) I'm planning to check all of the unsafes in the codebase to make sure that they're really necessary.

I do see your point, though.

@huonw I agree that using unsafe here may be premature. I will keep an eye on rust-lang/llvm#14 and when it is integrated into rustc I will check the performance again, and hopefully switch to the iterator.

Small performance enhancements

67a1631

Get rid of half filled heuristic and use unsafe array indexing because the indices should always be correct.

rolfmorel changed the title from "Small performance enhancements for cut, bytes cutting" to "Small performance enhancements for cut: bytes cutting" on Jul 15, 2014

huonw reviewed Jul 15, 2014

Arcterus added a commit that referenced this pull request Jul 15, 2014

Merge pull request #357 from polyphemus/cut-bytes-rewrite

e8780b8

Small performance enhancements for cut: bytes cutting

Arcterus merged commit e8780b8 into uutils:master Jul 15, 2014

jbcrail pushed a commit to jbcrail/coreutils that referenced this pull request Apr 29, 2015

Merge pull request uutils#357 from polyphemus/cut-bytes-rewrite

311f647

Small performance enhancements for cut: bytes cutting

rolfmorel commented Jul 15, 2014

huonw Jul 15, 2014

rolfmorel Jul 15, 2014

Arcterus Jul 15, 2014

huonw Jul 15, 2014

huonw Jul 15, 2014

Arcterus Jul 15, 2014

huonw Jul 15, 2014

Arcterus Jul 15, 2014

Arcterus Jul 15, 2014

rolfmorel Jul 16, 2014
