Improve siphash performance for longer data #27280

Use `ptr::copy_nonoverlapping` (aka memcpy) to load an u64 from the byte stream. This is correct for any alignment, and the compiler will use the appropriate instruction to load the data. Use unchecked indexing. This results in a large improvement of throughput (hashed bytes / second) for long data. Maximum improvement benches at a 70% increase in throughput for large values (> 256 bytes) but already values of 16 bytes or larger improve. Introducing unchecked indexing is motivated to reach as good throughput as possible. Using ptr::copy_nonoverlapping without unchecked indexing would land the improvement some 20-30 pct units lower. We use a debug assertion so that the test suite checks our use of unchecked indexing.

Without this temporary variable, codegen improves slightly and less registers are spilled to the stack in SipHash::write.

If they are ordered v0, v2, v1, v3, the compiler can find just a few simd optimizations itself. The new optimization I could observe on x86-64 was using 128 bit registers for the v = key ^ constant operations in new / reset.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve siphash performance for longer data #27280

Improve siphash performance for longer data #27280

Commits on Jul 25, 2015