make dtype-next available via Java API? #52
Regarding libjulia-clj and libpython-clj: I guess giving them a dtype-next object will already do an efficient transfer into Julia arrays or numpy arrays, since that is anyway part of the Clojure bindings?
Semi-efficient, meaning it will still allocate the native numpy/julia type, but it will only need to do a memcpy, as opposed to now where it constructs a Java tensor, then allocates the numpy/julia type, then does the memcpy - assuming there is no transpose performed on either side. If there is a transpose then it has to go through a parallelized index-by-index transformation. Dataset makes more sense, as the arrow integration does allow very fast transfer - serializing to arrow is quite quick and loading is pretty much instantaneous.
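The two copy paths described above can be sketched in plain Java. This is a hypothetical illustration with ordinary arrays, not the dtype-next API: when source and destination layouts match, one bulk copy (effectively a memcpy) suffices; a transpose forces per-element index arithmetic.

```java
import java.util.Arrays;

// Hypothetical sketch (not the dtype-next API): bulk copy vs the
// index-by-index copy that a transpose forces.
public class CopyPaths {
    // Fast path: contiguous layouts match, so one bulk copy suffices.
    static void bulkCopy(double[] src, double[] dst) {
        System.arraycopy(src, 0, dst, 0, src.length);
    }

    // Slow path: a transposed destination forces per-element index math
    // (the real thing would parallelize this loop).
    static void transposeCopy(double[] src, double[] dst, int rows, int cols) {
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dst[c * rows + r] = src[r * cols + c];
    }

    public static void main(String[] args) {
        double[] src = {1, 2, 3, 4, 5, 6}; // 2x3 row-major
        double[] a = new double[6];
        double[] b = new double[6];
        bulkCopy(src, a);
        transposeCopy(src, b, 2, 3);
        System.out.println(Arrays.toString(a)); // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
        System.out.println(Arrays.toString(b)); // [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
    }
}
```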
But to use dataset you need parts of dtype-next, so it would start here. I am currently rebuilding the arrow bindings for dataset to support JDK-17 and will loop back to this soon.
I have now made some significant progress with the clojure integration here: https://github.com/invesdwin/invesdwin-context/blob/master/invesdwin-context-parent/invesdwin-context-clojure/src/main/java/de/invesdwin/context/clojure/pool/ It seems single threaded tests run fine now, but when using more than 1 thread the bindings don't seem to work. From what I read, Clojure should use thread-locals for namespace variables. But maybe I am missing some system property to isolate everything properly (similar to JRuby's)? Am I doing something wrong, or is Clojure not capable of being called from multiple threads in parallel without namespace conflicts? => Upstream ticket here: https://ask.clojure.org/index.php/11482/unable-resolve-symbol-after-pushthreadbindings-second-thread
I honestly have no idea about the above issue. The Clojure compiler/runtime system itself isn't something I have spent any more time than necessary getting into, and especially the interaction between defining vars in namespaces and threads is totally opaque to me. I think, for example, that ns-unmap is a global function, so if a var is unmapped in one thread it is unmapped in all threads. I would, in a threadsafe way, define the var if undefined and set it dynamic like you did. Then when you assign a value, that is when you pushThread. I would never undefine it, but you can pop the var's thread stack, which should reset its value to null (or nil in Clojure land - the LISP heritage is nil, not null).
Thanks, the hint about ns-unmap working on globals is indeed a problem. I have now worked around it by checking if the variable really is in the thread's local binding. The "binding not available" issue was that the bindings leaked to a different thread. I made it purely thread local now, so that the bindings instance really invokes the constructor in a given thread. The test cases now work in a second thread. Though there were still issues with global variables in the scripts: the correct solution is to not use one "user" namespace for all threads (in-ns switches to it), but instead a unique namespace per thread. That way the bindings can be applied globally but stay separated. Both def and ns-unmap work properly in that scheme. My testcases are green now. So thanks a lot for the hint, this made me understand the problem a lot better. Clojure just does not have the "threadsafe" isolation available out of the box that other languages provide, but one can build it via namespaces.
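The "unique namespace per thread" idea can be sketched roughly in plain Java. Everything here is a hypothetical stand-in (the class name, the var map; real code would call in-ns/def through the Clojure runtime): each thread resolves against its own namespace name, so a def in one thread cannot clobber another's.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of per-thread namespace isolation; the var map is a
// stand-in for what in-ns/def would do against the Clojure runtime.
public class PerThreadNamespace {
    private static final AtomicLong COUNTER = new AtomicLong();
    // Each thread lazily gets a unique namespace name.
    private static final ThreadLocal<String> NS =
        ThreadLocal.withInitial(() -> "user-thread-" + COUNTER.incrementAndGet());
    // namespace name -> var name -> value
    private static final Map<String, Map<String, Object>> VARS = new ConcurrentHashMap<>();

    static void def(String name, Object value) {
        VARS.computeIfAbsent(NS.get(), ns -> new ConcurrentHashMap<>()).put(name, value);
    }

    static Object resolve(String name) {
        return VARS.getOrDefault(NS.get(), Map.of()).get(name);
    }

    public static void main(String[] args) throws InterruptedException {
        def("x", 1);
        Thread t = new Thread(() -> def("x", 2)); // writes into its own namespace
        t.start();
        t.join();
        System.out.println(resolve("x")); // prints 1: the other thread's def is invisible
    }
}
```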
Got it - yep. I am working on providing a Java layer for dtype-next right now.
I also got a cache working that reuses Clojure compilations between invocations. Existing scripting integrations for Clojure are a bit lacking:
Here is a benchmark for the reuse of compilation (100 iterations of the testcases in 1 thread with disabled logging):
So it might make a bigger difference for tight loops. I have also added a JSR223 wrapper for the improved binding: https://github.com/invesdwin/invesdwin-context/tree/master/invesdwin-context-parent/invesdwin-context-clojure/src/main/java/de/invesdwin/context/clojure/jsr223 Docs and a full list of supported JVM languages (each having JSR-223 support in addition to the invesdwin scripting interfaces): https://github.com/invesdwin/invesdwin-context#scripting-modules-for-jvm-languages
Java API for dtype-next '8.057' is up - https://cnuernber.github.io/dtype-next/javadoc/tech/v3/package-summary.html. Note that neither libpython-clj nor libjulia-clj have been updated, so you will need to override this in the dependencies for now. QuickNDirty test - https://github.com/cnuernber/dtype-next/blob/master/java_test/java/jtest/Main.java
And just like that, the test discovered that I hadn't included nio buffer support. |
I looked at the docs a bit; it makes a very good impression! Regarding Readers, maybe we should also support the other less common types?
I also tried using the new types with libjulia-clj:
I guess we don't yet have these types considered in the put pathway for buffers and buffer readers (though dunno if it makes sense to support these types). Though I guess only the tensor variant will be supported (not yet complete). Or will we need the abstractions of tech.ml.dataset to support this, as you suggested previously?
Glad you got into it so quickly! That is encouraging. A few things I see - wrapping a double array in a reader like above means we disable the high performance copy pathway. When dtype wraps double arrays it does so in a way that allows the system to get back to the original buffer for high perf loops and such:

```java
double[] testData = toDoubleArray(range(100));
Buffer wrappedData = toBuffer(testData);
ArrayBufferData origData = asArrayBuffer(wrappedData);
System.out.println(String.valueOf(System.identityHashCode(testData))
    + " " + String.valueOf(System.identityHashCode(origData.arrayData)));
// Outputs:
// 748921347 748921347
```

There are DoubleWriters and such, but one thing to note is that `toBuffer` creates a buffer implementation that allows writing. For efficiently putting data into existing containers, there is `DType.copy`, which falls back to the lower level arrayCopy or copyMemory fastpaths. One thing you can do now is, after creating the Julia array, get a tensor from it with `Tensor.asTensor`. So you can manipulate the data in the tensor from the Java side for the case where you are doing something other than just bulk uploads. Or you can copy new data in with `DType.copy`. You can also transpose it so you get the same representation on the Java side as Julia is seeing.

I didn't create the other integer or float reader base interfaces because if, for instance, you create a LongReader and override elemwiseDatatype to return int16, the system will interpret that as a short buffer. You need to make sure you return longs in the range of shorts, but casting from a long to a short and back I think is nearly instantaneous, and that is true for all integer types. I am not sure about the time of casting from a double to a float, but because the virtual call times are much longer than the cast times I didn't create the various subtype overloads. I think they do help code clarity if you are dealing with signed integer types, but as I said I don't believe there is an efficiency argument there. The one perf test I think would be interesting is the time difference of Julia createArray between a double[][] and a single double[] reshaped to be a tensor of shape [1000 10].
The reader was an idea to maybe transparently transpose during the copy into Julia. But yes, it could be more efficient to just transpose on the Java side, then copy to Julia. Or copy to Julia, then transpose there. To test the double[][], I guess first this pathway needs to work:
I checked it in more detail:
The output is:
Thus the putGlobalFunction.invoke(...) fails to put the value; instead "nothing" arrives in Julia.
The most efficient transpose is definitely going to be on the Julia side. What I meant was something more like:

```java
Object juliaAry = createArray(data, "float64", [10]);
NDBuffer tens = asTensor(juliaAry);
```

Then you can do whatever you like to it in Java land as well as in Julia land. Reshape is the best way to take a flat buffer of data and end up with a tensor.
In Julia, when transposing, you have the choice of either creating a view, which will incur indexing cost down the road, or paying all the indexing cost up front and transposing directly, which for 2D square matrices has an in-place option. For sure Julia's implementation will be far more heavily optimized than mine :-).
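For reference, the in-place option for square matrices mentioned above amounts to swapping elements across the diagonal. A minimal plain-Java sketch (again just an illustration; Julia's version is far more optimized):

```java
import java.util.Arrays;

// Minimal in-place transpose of a square row-major matrix stored flat.
public class InPlaceTranspose {
    static void transpose(double[] m, int n) {
        for (int r = 0; r < n; r++)
            for (int c = r + 1; c < n; c++) { // swap across the diagonal
                double tmp = m[r * n + c];
                m[r * n + c] = m[c * n + r];
                m[c * n + r] = tmp;
            }
    }

    public static void main(String[] args) {
        double[] m = {1, 2, 3, 4}; // [[1 2] [3 4]]
        transpose(m, 2);
        System.out.println(Arrays.toString(m)); // [1.0, 3.0, 2.0, 4.0]
    }
}
```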
Ah, ok great. With the example you provided I understand how this is supposed to work. ^^ And tech.ml.dataset might provide a put/get by utilizing arrow, I guess?
Yes, the python bindings work the same way. I hadn't thought of that really. For tmd there is a fast serialization path to an output stream or a block of memory, so you can kind of choose how intense you want things to be. It serializes into Arrow stream format, which it can then load via mmap if you want to lower your Java heap usage. And Python, R, etc. can mmap the same file themselves.

Arrow has a lot of options. It also has a C interchange format, which I haven't looked into, so you can directly share columns or entire datasets. For very tight coupling the straight C structure-based interchange pathway is probably best, but as I said it is simpler to save the file to Arrow streaming format and then mmap the result, so I haven't invested time in the C pathway. Arrow also introduced compression, and it has support for structs, so you could potentially save a dataset to an array of structs and compress it, which would make it somewhat similar to your existing serialization. I don't yet support serializing a dataset to Arrow in row-major form, nor their compression, but compression is next on the list as I think it is useful.
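The mmap mechanism referred to above can be shown with stock java.nio (this sketch omits all Arrow-specific framing and only demonstrates the mapping itself, which lets reads come straight from the page cache instead of copying onto the Java heap):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of the mmap idea: write bytes to a file, then map it read-only so
// access does not copy the data onto the Java heap. Arrow framing omitted.
public class MmapSketch {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("arrow-sketch", ".bin");
        Files.write(file, new byte[]{1, 2, 3, 4});
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            System.out.println(buf.get(2)); // prints 3, read through the mapping
        }
        Files.deleteIfExists(file);
    }
}
```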
Of course mmap doesn't make a ton of sense with compression, but as orthogonal options for various scenarios I think it all makes a lot of sense.
I did a benchmark with the new path; it is indeed faster for matrices (when doing the index-by-index transpose during the index-by-index copy):
So really good job. :)
Correction, boolean does not work:
Right, and it links to that other issue. Boolean isn't a numeric datatype; it is usually implemented at the C level by signed or unsigned characters with the value 0 or 1. The tensor interface only supports reading and writing using longs or doubles. Tensor storage can be any of the C datatypes themselves:

```clojure
user> (Tensor/makeTensor (range 9) [3 3] :uint8)
#tech.v3.tensor<uint8>[3 3]
[[0 1 2]
 [3 4 5]
 [6 7 8]]
```

Or similarly:

```clojure
user> (Tensor/reshape (DType/makeContainer :uint8 (range 9)) [3 3])
#tech.v3.tensor<uint8>[3 3]
[[0 1 2]
 [3 4 5]
 [6 7 8]]
```

Just the reading/writing only supports those datatypes, which I argue have fast cast semantics to the rest; at least fast enough that the vcall overhead overwhelms the additional cast time. You can definitely create float32 and uint16, etc. type tensors and read/write to them using the long or double interfaces. I just didn't want to type out 24 functions for reading data and 24 functions for writing data.
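Reading a uint8-backed store through the long interface can be illustrated in plain Java (a hypothetical sketch, not dtype-next's buffer code): Java bytes are signed, so an unsigned widening recovers the 0..255 storage value.

```java
// Sketch of reading uint8 storage through a long-based read interface:
// Byte.toUnsignedLong undoes Java's signed interpretation of the byte.
public class Uint8ReadLong {
    static long readLong(byte[] storage, int idx) {
        return Byte.toUnsignedLong(storage[idx]);
    }

    public static void main(String[] args) {
        byte[] storage = {0, (byte) 200, (byte) 255};
        System.out.println(readLong(storage, 1)); // prints 200, not -56
        System.out.println(readLong(storage, 2)); // prints 255
    }
}
```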
Or, for a much more technical argument - https://cnuernber.github.io/dtype-next/datatype-to-dtype-next.html. The synopsis was that the code generation required to create code for each primitive type wasn't faster in actual execution time and unnecessarily increased the compile time and binary size of the system.
Release 8.060 adds a simple POD descriptor type for arraybuffers and nativebuffers and fixes an issue with makeTensor.
Yup, you are right. Going through a bigger type is not an issue. Performance results show the same relations even for smaller types that go through read/write with larger types. I will migrate all of my put/get matrix calls. I guess we can thus close this issue. I would still be interested in examples for (maybe you have them):
I guess both neanderthal and tech.ml.dataset could be worked with inside of Clojure scripts that way. And I guess with similar paths scicloj should also work. I looked at the tech.ml.dataset docs and dtype-next docs but did not come to conclusions about these use cases.
I am glad you tested that pathway :-). I appreciate the reality check :-). I will add a neanderthal dense matrix pathway to the Java Main sample and let you know. I can also show a dataset constructor using the columns of the matrix.

Datasets can be constructed from a map of column name to column, so once you have a 2d tensor you can take either the rows or the columns, name them, and get a dataset. It will share the memory regardless. And if you just want to pass in a map of columns of primitive arrays, nio buffers, or readers, those all work too. The most inefficient way to construct them, which is still surprisingly fast, is to pass in a sequence of maps in row-major form. Clojure has clever ways to create objects that look like maps but have a fixed minimum number of keys -- those are Clojure records, and this can help a lot with the row-major pathway -- but it is still much faster of course to use columns, as the dataset system then simply ensures there is a conversion to a buffer.

Then a scicloj example of a classifier and a regressor would be interesting - I agree. All that will also prove out my minimal Clojure layer, such that it is enough raw Clojure bindings to make using these systems fairly straightforward even if there is no bespoke Java API. All great thoughts, keep em coming!
Here is the most extreme example of put/get byte through a long:
I agree, I think we can achieve a lot with a minimal layer. I was also positively impressed by the Clj class you provided. ^^
That is great! I updated the sample with neanderthal and a bit of dataset and arrow demonstration - https://github.com/cnuernber/dtype-next/blob/master/java_test/java/jtest/Main.java. The bootup time of neanderthal is intense for two reasons. First, it includes clojure.core.async, and this library has a somewhat sophisticated compile-time system for async programming; it alone takes I think 5 or 6 seconds. Next, neanderthal unpacks MKL because I don't have the javacpp MKL bindings on my classpath. All that aside, it all works great from Java and is very nearly as condensed as the Clojure version, so that is encouraging.
This might be material for another Clojure talk ^^
I would love to get an InfoQ talk on this. Within the Clojure community -- I am not sure how well received it would be, but it sure is fun. No one is writing Java code like this, which is of course both good and bad. I added a high performance parallelism primitive, the one used most frequently throughout the system - demo. One huge benefit of the Java API is I am only exposing the very key concepts and things. I think this really helps focus someone new on exactly what is different, and then they can explore more of the Clojure API as they see fit. I have trouble with new people getting overwhelmed as it is. I am going to move to implementing the compression specification in Arrow and then do a pass to give
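An indexed parallelism primitive of the kind mentioned above can be approximated with stock Java streams (this is a generic stand-in to show the shape of the operation, not dtype-next's implementation): split an index range across cores, apply a function per index, and reduce the results.

```java
import java.util.stream.IntStream;

// Rough stand-in for an indexed parallelism primitive: fork the index range
// across cores, map each index, and combine with a reduction.
public class ParallelIndexed {
    static long parallelSum(int n) {
        return IntStream.range(0, n)
                .parallel()              // split the range across worker threads
                .mapToLong(i -> (long) i) // per-index work goes here
                .sum();                   // combine partial results
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(100)); // prints 4950
    }
}
```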
Also, by exposing it as a Java API, it becomes available to JRuby, Scala, Kotlin, ... on the JVM. They all provide convenience methods to integrate Java. There are also Julia and Python bindings that allow embedding Java inside them. This API then allows calling Clojure code from Julia and Python. Java is the least common denominator here.
Previously I thought that merely by publishing a jar people could use this system from other JVM languages, and I guess that is true to some extent, but not without being extremely knowledgeable with Clojure to begin with, which completely defeats the purpose.
Yes, the Clojure compiler/binding is anything but intuitive; maybe that is also the reason why the existing JSR-223 bindings were only sort of working.
I will thus close this issue; we can then move on to the follow-up story: techascent/tech.ml.dataset#281 Again, I will be happy to try it!
As discussed here: cnuernber/libjulia-clj#4 (comment)
Though I guess dtype-next is a low-level API, and higher-level functionality is available when using neanderthal, tech.ml.dataset or scicloj on top of it or with it.
But having a Java API for dtype-next would still allow creating abstractions to transport data from Java into Scala efficiently, to then work with other Scala libraries without copying data back and forth. I would integrate that here: https://github.com/invesdwin/invesdwin-context/tree/master/invesdwin-context-parent/invesdwin-context-clojure
The arrow integration of tech.ml.dataset could also be used to transport datasets between the JVM and languages like R/Python/Julia. Though dunno if for that case tech.ml.dataset should also be made available as a Java API later, or if arrow integration should be implemented on the outside for a Java abstraction that uses dtype-next.
The same applies to a possible abstraction over other interfaces like nd4j, tablesaw, smile, ojalgo, ...