-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Take advantage of LLVM funnel shift intrinsics (llvm.fshl/llvm.fshr) for fast rotation of integer types #11841
Comments
To sum it up, my suggestion is to implement some kind of "rotate left" and "rotate right" methods for basic integer types. Which may take advantage of LLVM's funnel shift intrinsics under the hood. Then even the existing hash function can use them instead of https://github.com/crystal-lang/crystal/blob/1.3.2/src/crystal/hasher.cr#L91-L93 (which doesn't seem to compile to efficient code by the current Crystal version 1.3.2 either). |
What happens in C if you right shift a value too much, like shift by 65? And what happens in Crystal? |
This is because the C and Crystal versions aren't the same; @[NoInline]
def unsafe_rotl(x : UInt64, rot : Int32) : UInt64
x.unsafe_shl(rot) | x.unsafe_shr(64 &- rot)
end "*unsafe_rotl<UInt64, Int32>:UInt64":
movl %esi, %ecx
movq %rdi, %rax
rolq %cl, %rax
retq |
Oh, thank you very much. This seems to be a "missing documentation" bug then. Because I can't see the PS. I did try |
A faster "safe" version, i.e. defined over all shift counts, is: struct UInt64
def rotate_left(count : Int) : self
unsafe_rotl(count.to_i! & 63)
end
end
struct Int64
def unsafe_rotl(count : Int32) : self
to_u64!.unsafe_rotl(count).to_i64!
end
def rotate_left(count : Int) : self
to_u64!.rotate_left(count).to_i64!
end
end Ditto for the other primitive integer types.
It is undefined behaviour (6.5.7):
|
Intrinsic still could be useful because without it
|
That's a very good question. C doesn't matter, but it would be great if Crystal language designers could provide the answers in the documentation about how Crystal shift operators are supposed to behave. Another issue can be probably reported about this. As for the proposed new rotate methods for the integer types, I suggest to just follow the LLVM documentation, which says:
The Intel documentation also says the following about ROR/ROR instructions:
I'm trying to convert the following Rust code to Crystal: fn finish(&self) -> u64 {
let rot = (self.pad & 63) as u32;
self.buffer.rotate_left(rot)
} Ideally all of this should be translated to just a single x86-64 instruction ROL without relying on any kind of undefined behaviour in the Crystal source code. Is I'm currently using the |
First please consider the following C code:
It gets compiled by Clang with -O3 optimizations on an x86-64 system into pretty much a single ROL instruction:
Now let's look at what happens to the equivalent Crystal code:
It gets compiled by crystal build --release --emit obj test.cr && objdump -d test.o into something really horrible
I'm not sure what's going on and why can't Crystal do the same optimization as Clang, but right now using
llvm.fshl
intrinsic appears to be a possible workaround/solution:Which compiles into
I've been experimenting with this simple benchmark (with and without NoInline): https://gist.github.com/ssvb/49ee46ed5c9e481dc6c6ecf8b05b4d6b
As for the practical usage, I stumbled upon performance problems when trying to experiment with changing the Crystal's hash function to AHash. Incidentally, @funny-falcon (the author of the current hash function) mentioned the same performance problem too.
The text was updated successfully, but these errors were encountered: