Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ByteBuffer in zero copy APIs (instead of DirectBuffer) #42

Open
phraktle opened this issue Sep 6, 2015 · 21 comments
Open

Use ByteBuffer in zero copy APIs (instead of DirectBuffer) #42

phraktle opened this issue Sep 6, 2015 · 21 comments

Comments

@phraktle
Copy link

phraktle commented Sep 6, 2015

Hi,

I have started playing a bit with the zero-copy APIs and my initial impression is that it should use straight ByteBuffers, rather than the custom wrapper DirectBuffer. A library like this should not be opinionated on which convenience wrapper one uses for buffers.

(There are actually several such options and one should be free to make a choice – or use none of these:
https://github.com/OpenHFT/Chronicle-Bytes#comparison-of-access-to-native-memory)

Regards,
Viktor

@phraktle
Copy link
Author

phraktle commented Sep 6, 2015

Btw, this would also eliminate the need to use Unsafe (which I see has been causing problems for some), since you can return ByteBuffers wrappers for a native address: http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#NewDirectByteBuffer

@krisskross
Copy link
Member

I agree that first resort should always be the JDK.

But ByteBuffer is well known to have both api bugs and internal performance bugs that are not prioritized by Oracle. ByteBuffers is simply not the default anymore and that's why people create new implementations. Until Oracle start taking these issues seriously i'm very much against using ByteBuffers! I agree the situation is not ideal. But this is the state of Java today and not specific to lmdbjni.

I'm not sure how much users would gain from getting freedom to choose their own buffer implementation? What problems specifically is it that you see? If we have bugs we should fix them.

As you probably know there is a lot of ongoing debate on Oracle deprecating Unsafe in Java 9. Some of this will be replaced by VarHandles (among others) http://openjdk.java.net/jeps/193. And I hear rumors that new "high-performance" JSRs might be created to solve the Unsafe situation.

Everything is in a state of flux at the moment and it's still a bit early to know how a standard Java zero copy solution would look like in the future.

@phraktle
Copy link
Author

phraktle commented Sep 7, 2015

Agree with the problems of ByteBuffer, however, the convenience APIs mentioned above are also just wrappers around these in general (and/or use Unsafe, which is also not the most portable/future-proof approach). So providing the base LMDB API using ByteBuffers would introduce no new issues and would still allow one to use any further buffer APIs.

@krisskross
Copy link
Member

Can you please elaborate exactly what your proposal is? If understand correctly you want to deprecate and replace Entry with ByteBuffer on most methods on Database in order to abstract away direct vs heap access to LMDB? There may be problems with this but maybe I don't understand your proposal?

@phraktle
Copy link
Author

phraktle commented Sep 7, 2015

My proposal is simply to use ByteBuffer in the LMDB API, where you currently use DirectBuffer (and move DirectBuffer outside the LMDB API)

@krisskross
Copy link
Member

What are the advantages from doing that?

@krisskross
Copy link
Member

If ByteBuffer isn't your buffer of choice then you would be penalized with two buffer conversions.

@phraktle
Copy link
Author

phraktle commented Sep 7, 2015

I would ask the same for DirectBuffer ;) Many of our existing serialization etc is implemented against ByteBuffers. ByteBuffers work with all other Java APIs, e.g. NIO, and the aforementioned buffer libraries. And you can actually wrap a ByteBuffer with any of these without copying (or extra overhead on operations), but you can't always get a ByteBuffer from the current DirectBuffer in lmdbjni (when it just accesses a memory address directly with Unsafe, you can't get a ByteBuffer w/o copying).

So, I would rather not (re)write code against an lmdbjni-specifc buffer API, when there's one in the JDK that's widely supported (I agree about your comments about how it's not a great design – but that's besides the point).

@krisskross
Copy link
Member

Yes. I'm not saying that we shouldn't do it. It's an interesting use case. My main concern is that performance will get worse and use cases for DirectBuffer harder to implement. Maybe a prototype is appropriate that proves otherwise?

@krisskross
Copy link
Member

Is it enough for DirectBuffer.byteBuffer() to wrap the ByteBuffer around the memory address? This is required anyway. You would only pay for the construction of a DirectBuffer object.

@phraktle
Copy link
Author

If you can guarantee the DirectBuffer to always have a wrapped ByteBuffer, then why not just have the lmdbjni API use ByteBuffers everywhere?

@krisskross
Copy link
Member

Doing this lazily impose no penalty when only using DirectBuffer. The penalty for using ByteBuffer include hacking into internals of ByteBuffer using reflection.

@kk00ss
Copy link

kk00ss commented Mar 16, 2016

Hello @phraktle , how do we wrap mmap memory from LMDB with ByteBuffer ? The only way ByteBuffer works with off heap memory is ByteBuffer.allocateDirect. Or we probably can create new mmap using MappedByteBuffer - which will require having two MMaps for one file. But it's api requires us to load whole buffer into memory, or otherwise exception will happen An attempt to access an inaccessible region of a mapped byte buffer will not change the buffer's content and will cause an unspecified exception to be thrown either at the time of the access or at some later time. (I especially like this "some later time" 👍 ) The beauty of working with MMaped data is that pages required will be loaded automatically when needed. I'm new to JVM world, so maybe I'm missing something. Looking forward to read your opinion.

@phraktle
Copy link
Author

@kk00ss by using NewDirectByteBuffer in the native code, which will wrap an LMDB region.

@kk00ss
Copy link

kk00ss commented Mar 18, 2016

Then I support @phraktle request, there are various APIs that know how to work with ByteBuffer and it's often hard to change to work with custom buffer implementation. I think this change is rather trivial, unfortunately I cannot configure environment to build lmdbjni locally (windows 10 and sdk for it). And since @krisskross wrote that build for win64 was performed on virtual machine. It would be awesome to have shared windows image with all required stuff. Not sure about licencing for that virtual machine however.

@krisskross
Copy link
Member

I agree that ByteBuffer is a more portable solution and worth comparing numbers with DirectBuffer.

I'm not sure how you would hook into hawtjni to make that JNI call though? Does hawtjni support it or do we need a fully custom JNI implementation?

@phraktle
Copy link
Author

On first glance, I see no evidence that HawtJNI supports this. As a matter of fact, HawtJNI looks like an unmaintained project (no commits in a year, no reactions on tickets/pull requests).

@krisskross
Copy link
Member

Yeah that's my impression as well. Maybe it's possible to take the generated code and do it manually. But I think the best bet is to wait for Project Panama and JEP 191: Foreign Function Interface which is probably both safer and higher performance.

@krisskross
Copy link
Member

FYI: We are almost finished with an implementation using JNR-FFI and benchmarks are looking very good. Give it a spin. Any comments/opinions are greatly appreciated.

https://github.com/lmdbjava/lmdbjava

@krisskross
Copy link
Member

Have a look at the benchmarks.

https://github.com/lmdbjava/benchmarks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants