Implements ByteBuffer support for all LZ4 methods #49
Conversation
Achieves the same performance as byte array methods for the JNI and Unsafe versions.
Fixes build rule for java-unsafe. Removes leftover code in threaded bench.
@jpountz any chance this might be integrated some time soon? we're keen to use it in Apache Cassandra
Hi @blambov, thanks for the pull request, it looks great in general and I know how much work such changes require! I have some questions/comments:
The reason I added a separate version with offsets is to permit thread-safe concurrent access using the same buffer. If you use the position/limit in a buffer, the buffer immediately becomes a mutable resource that needs exclusive access. To use such a buffer from multiple threads (which we need, e.g., to compress multiple sections in the same large memory space concurrently) usually one duplicates the buffer, which is an extra overhead we could do without.
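The duplication overhead mentioned above can be sketched with plain JDK code (an illustrative note, not part of the patch): to read a shared buffer through its position/limit, each thread needs its own `duplicate()`, whereas an offset-based API leaves the shared buffer's state untouched.

```java
import java.nio.ByteBuffer;

public class DuplicateDemo {
    // Reading via position/limit mutates the buffer, so a buffer shared
    // across threads must first be duplicated; offset-based methods avoid
    // allocating that extra view per access.
    static byte readAt(ByteBuffer shared, int index) {
        ByteBuffer view = shared.duplicate(); // independent position/limit, shared content
        view.position(index);
        return view.get(); // moves only the duplicate's position
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.wrap(new byte[]{10, 20, 30, 40});
        System.out.println(readAt(shared, 2));  // 30
        System.out.println(shared.position());  // 0 -- the original is untouched
    }
}
```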
There are "Neither buffer's position is moved."/"The positions in both buffers are moved to reflect the bytes read/written." sentences in the JavaDoc in
The int/long versions are just in the unsafe implementations. Whenever they access memory they do an offset from a pointer base (where the pointer base can change at any point in the execution due to GC). The pointer base changes size on 32-bit vs. 64-bit JVMs, but the offset does not as Java does not have a ptr-sized integer type. That normally isn't a problem as arrays don't go above 32-bit size. Once direct buffers come into play, however, things change as now the pointer base has to be null and the offset has to accommodate the whole address of the buffer in memory, thus it has to be long on a 64-bit JVM. The code itself does a lot of calculations with these offsets, and I think that's what ends up making it almost twice slower with longs on a 32-bit JVM (~220MB/s vs ~400 with the included test). There are other alternatives to fixing this (e.g. ptr base + long base + int offset addressing, or using long offsets only for direct buffers), but this seems to be the fastest option.
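The base+offset addressing described above can be illustrated with a small sketch (assumptions: `sun.misc.Unsafe` is reachable via reflection, which holds on mainstream HotSpot JDKs, and the `address` field of `java.nio.Buffer` is a JDK-internal detail used here purely for demonstration):

```java
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

public class AddressingDemo {
    static final Unsafe U;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Heap array: non-null base object plus a small offset that fits in an int,
    // because array sizes never exceed 32-bit indices.
    static byte readHeap(byte[] array, int index) {
        return U.getByte(array, U.arrayBaseOffset(byte[].class) + index);
    }

    // Direct buffer: null base plus an absolute native address, which requires
    // a long offset on a 64-bit JVM.
    static byte readDirect(ByteBuffer direct, int index) throws Exception {
        long addr = U.getLong(direct,
                U.objectFieldOffset(Buffer.class.getDeclaredField("address")));
        return U.getByte(null, addr + index);
    }

    public static void main(String[] args) throws Exception {
        byte[] heap = {1, 2, 3};
        System.out.println(readHeap(heap, 1)); // 2
        ByteBuffer direct = ByteBuffer.allocateDirect(3);
        direct.put(0, (byte) 7);
        System.out.println(readDirect(direct, 0)); // 7
    }
}
```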
I'll see if I can improve this today.
byte[] and ByteBuffers.
This is now done.
No thanks, that's great. I don't know how I missed it! Thanks for the changes to the tests, it looks great.
I don't know how important it is for Cassandra, but if you don't care too much about performance on 32-bit systems, I am personally ok with not even trying to optimize for them. I tend to be more worried about the potential bugs it could create that I would not find (since I almost always run a 64-bit JVM) than tempted by the performance improvements. It could also make the code easier to maintain. I just played with your changes but I am getting some test failures because of LZ4HCJavaUnsafeLongCompressor:
I think the cast to an
Changes LZ4ByteBufferUtils to be generated from lz4 utils template.
Thanks for catching this. Apparently Windows will not give you large addresses until you use that much memory. This is now fixed, plus a new "large" test is added to make sure this and similar problems are also caught on Windows. I moved this to a separate test target as it requires 6 gigs of memory (to test both negative ints and truncation).
Well, the int offset versions of the templates will be pretty well tested on a 64-bit JVM as they will be used for the Safe implementations. These will always require 32-bit array indices. Even without special 32-bit unsafe code there will still have to be 32-bit (for Safe) and 64-bit (for Unsafe with direct buffers) instances of the templates; all the template complexity and maintenance pain will still be there, we will only instantiate it one less time. I personally don't think that's enough reason to accept a 2x hit. I don't think that is a rare target either, as people still often use 32-bit JVMs to save memory.
I merged the change, thanks!
Is it still true with the compressed oops optimization on?
Thank you for resyncing and merging this. One small thing: in unsafe_utils the new
Probably not as much, but as an example Google was using 32-bit JVMs by default until fairly recently.
Oops, I merged too quickly with the xxh64 change, thanks for the catch!
Interesting.
@blambov FYI I was playing with compressing ByteBuffers and it seems possible to make the JVM segfault by providing a read-only heap ByteBuffer. The problem seems to be that we assume that buffers that don't have an array are direct:

```java
if (srcBuf.hasArray()) {
  src = srcBuf.array();
  srcOff += srcBuf.arrayOffset();
} else {
  src = null;
  srcOff += getBufferOffsetFromNull(srcBuf);
}
```

Here is a patch that makes the tests segfault:

```diff
diff --git a/src/java/net/jpountz/lz4/LZ4Factory.java b/src/java/net/jpountz/lz4/LZ4Factory.java
index 7478c41..d8d7487 100644
--- a/src/java/net/jpountz/lz4/LZ4Factory.java
+++ b/src/java/net/jpountz/lz4/LZ4Factory.java
@@ -15,6 +15,7 @@ package net.jpountz.lz4;
  */
 import java.lang.reflect.Field;
+import java.nio.ByteBuffer;
 import java.util.Arrays;
 
 import net.jpountz.util.Native;
diff --git a/src/test/net/jpountz/lz4/AbstractLZ4RoundtripTest.java b/src/test/net/jpountz/lz4/AbstractLZ4RoundtripTest.java
index 1728cb4..8f2e6a3 100644
--- a/src/test/net/jpountz/lz4/AbstractLZ4RoundtripTest.java
+++ b/src/test/net/jpountz/lz4/AbstractLZ4RoundtripTest.java
@@ -90,7 +90,11 @@ public abstract class AbstractLZ4RoundtripTest extends AbstractLZ4Test {
   abstract ByteBuffer allocate(int len);
 
   ByteBuffer copy(byte[] src) {
-    return allocate(src.length).put(src);
+    ByteBuffer copy = allocate(src.length).put(src);
+    if (randomBoolean()) {
+      copy = copy.asReadOnlyBuffer();
+    }
+    return copy;
   }
 
   ByteBuffer slice(ByteBuffer src, int start, int end) {
```
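The segfault comes from equating "no accessible array" with "direct". A small JDK-only check (illustrative, not part of the patch) shows that a read-only heap buffer falls into neither category:

```java
import java.nio.ByteBuffer;

public class BufferKindDemo {
    // A read-only view of a heap buffer hides its backing array but is not
    // direct, so treating "no array" as "direct" dereferences a bogus address.
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer readOnly = heap.asReadOnlyBuffer();
        ByteBuffer direct = ByteBuffer.allocateDirect(16);

        System.out.println(heap.hasArray() + " " + heap.isDirect());         // true false
        System.out.println(readOnly.hasArray() + " " + readOnly.isDirect()); // false false
        System.out.println(direct.hasArray() + " " + direct.isDirect());     // false true
    }
}
```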
Sorry that I didn't clear this up. I will make a separate patch for read-only buffers. How would you prefer to handle them:
Good questions... I wanted to add ByteBuffer support to the xxhash APIs yesterday and faced a similar issue. I ended up falling back to the plain Java impl when the buffer is not direct and doesn't expose an array (see XXHash32JNI for instance) so that it uses the ByteBuffer APIs. Maybe we should do the same here? I'm also wondering whether we should add checks that the destination buffer is not read-only before starting to (de)compress?
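The fallback strategy described above can be sketched as follows (a hypothetical dispatch for illustration; the real code in XXHash32JNI and this PR differs in detail):

```java
import java.nio.ByteBuffer;

public class DispatchSketch {
    // Hypothetical dispatch mirroring the strategy described above: take the
    // fast native/unsafe path only when a real backing array or a real native
    // address is available; otherwise fall back to the plain-Java ByteBuffer path.
    static String choosePath(ByteBuffer buf) {
        if (buf.isDirect()) {
            return "native/unsafe (direct address)";
        } else if (buf.hasArray()) {
            return "native/unsafe (backing array)";
        } else {
            // e.g. a read-only heap buffer: no array and no native address.
            return "plain-Java fallback (ByteBuffer API)";
        }
    }

    public static void main(String[] args) {
        System.out.println(choosePath(ByteBuffer.allocateDirect(8)));
        System.out.println(choosePath(ByteBuffer.allocate(8)));
        System.out.println(choosePath(ByteBuffer.allocate(8).asReadOnlyBuffer()));
    }
}
```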
I just ran tests on a 32-bit VM and got test failures because of an assertion error:
Will try to understand what happens here. It seems to be due to an overflow of the current address in the byte buffer.
@jpountz: any news on whether this is stable enough to get into a release? I'm interested in using this with ByteBuffers in Druid's CompressedObjectStrategy.
Addresses Issue #9.
Covers all implementations:
Additional changes applied:
The efficiency of the JNI and Unsafe methods on byte buffers is on par with those on arrays, with some benefits in the case of JNI, direct buffers and active garbage collection. Safe methods are slower due to the ByteBuffer access overhead.
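As a usage note, the offset-based ByteBuffer methods this PR adds can be exercised as below (a sketch assuming lz4-java with ByteBuffer support, i.e. the version containing this change, is on the classpath; class and method names are from `net.jpountz.lz4`):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

public class ByteBufferRoundtrip {
    public static void main(String[] args) {
        LZ4Factory factory = LZ4Factory.fastestInstance();
        LZ4Compressor compressor = factory.fastCompressor();
        LZ4FastDecompressor decompressor = factory.fastDecompressor();

        byte[] data = "some data to compress, some data to compress"
                .getBytes(StandardCharsets.UTF_8);
        ByteBuffer src = ByteBuffer.allocateDirect(data.length);
        src.put(data).flip();

        // Offset-based variant: the positions of src and dest are not moved,
        // so the same buffers can be shared across threads without duplicate().
        ByteBuffer dest = ByteBuffer.allocateDirect(
                compressor.maxCompressedLength(data.length));
        int compressedLen = compressor.compress(src, 0, data.length, dest, 0, dest.capacity());

        ByteBuffer restored = ByteBuffer.allocateDirect(data.length);
        decompressor.decompress(dest, 0, restored, 0, data.length);
        System.out.println("compressed " + data.length + " -> " + compressedLen + " bytes");
    }
}
```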