Improve siphash performance for longer data #27280

Merged: 4 commits, Jul 28, 2015

Commits on Jul 25, 2015

  1. siphash: Add more benchmarks

    bluss committed Jul 25, 2015
    381d2ed
  2. siphash: Use ptr::copy_nonoverlapping for efficient data loading

    Use `ptr::copy_nonoverlapping` (aka memcpy) to load a u64 from the
    byte stream. This is correct for any alignment, and the compiler will
    use the appropriate instruction to load the data.
    
    Use unchecked indexing.
    
    This results in a large improvement in throughput (hashed bytes
    / second) for long data. The maximum improvement benchmarks at a 70%
    increase in throughput for large values (> 256 bytes), but values of
    16 bytes or larger already improve.
    
    Unchecked indexing is introduced to reach the best possible
    throughput. Using ptr::copy_nonoverlapping without unchecked indexing
    would land the improvement some 20-30 percentage points lower.
    
    We use a debug assertion so that the test suite checks our use of
    unchecked indexing.
    bluss committed Jul 25, 2015
    f910d27
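
The loading technique described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `load_u64_le` and its signature are assumptions chosen for the example, and the little-endian conversion reflects how SipHash reads its input words.

```rust
use std::ptr;

/// Hypothetical helper (name and signature are assumptions for this sketch):
/// reads the 8 bytes starting at `buf[i]` as a little-endian u64.
///
/// Safety: the caller must guarantee that `i + 8 <= buf.len()`.
unsafe fn load_u64_le(buf: &[u8], i: usize) -> u64 {
    // Checked only in debug builds, so a test suite still catches
    // out-of-bounds use while release builds pay no bounds-check cost.
    debug_assert!(i + 8 <= buf.len());
    let mut data = 0u64;
    unsafe {
        // Unchecked indexing: no bounds check is emitted for this range.
        let src = buf.get_unchecked(i..i + 8).as_ptr();
        // A byte-wise copy is valid for any alignment of `src`; the compiler
        // is free to lower it to a single load instruction.
        ptr::copy_nonoverlapping(src, &mut data as *mut u64 as *mut u8, 8);
    }
    // The bytes were copied in memory order; interpret them as little-endian.
    u64::from_le(data)
}
```

Called as `unsafe { load_u64_le(&bytes, offset) }`, the helper works at odd offsets as well, since the copy makes no alignment assumption about the source.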
  3. siphash: Remove one variable

    Without this temporary variable, codegen improves slightly and fewer
    registers are spilled to the stack in SipHash::write.
    bluss committed Jul 25, 2015
    5f6a61e
  4. siphash: Reorder hash state in the struct

    If they are ordered v0, v2, v1, v3, the compiler can find a few
    SIMD optimizations on its own.
    
    The new optimization I could observe on x86-64 was using 128-bit
    registers for the v = key ^ constant operations in new / reset.
    bluss committed Jul 25, 2015
    27c44ce
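
The field reordering from the last commit can be illustrated with a small sketch. The struct name `State`, the constructor, and the `#[repr(C)]` attribute are assumptions made for this example (the attribute only pins the declaration order, which current Rust does not otherwise guarantee). The four constants are the standard SipHash initialization constants. In SipHash, v0 and v2 are both derived from the first key word and v1 and v3 from the second, so this ordering places values that share a key next to each other, which is plausibly what lets the compiler use one 128-bit XOR per pair.

```rust
/// Hypothetical stand-in for the hasher state (name is an assumption).
/// Fields are declared in the order v0, v2, v1, v3.
#[repr(C)] // assumption: fixes the in-memory order to the declaration order
#[derive(Debug, Clone, Copy, PartialEq)]
struct State {
    v0: u64,
    v2: u64,
    v1: u64,
    v3: u64,
}

impl State {
    /// Initializes the state as `v = key ^ constant` for each word.
    fn new(k0: u64, k1: u64) -> State {
        State {
            // Adjacent pair derived from k0.
            v0: k0 ^ 0x736f6d6570736575,
            v2: k0 ^ 0x6c7967656e657261,
            // Adjacent pair derived from k1.
            v1: k1 ^ 0x646f72616e646f6d,
            v3: k1 ^ 0x7465646279746573,
        }
    }
}
```

Whether the compiler actually vectorizes these XORs depends on the target and optimization level, so the generated assembly is worth inspecting before relying on it.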