make dtype-next available via Java API? #52

Closed
subes opened this issue Jan 12, 2022 · 35 comments
Comments

@subes

subes commented Jan 12, 2022

As discussed here: cnuernber/libjulia-clj#4 (comment)

Though I guess dtype-next is a low level API and higher level functionality is available when using neanderthal, tech.ml.dataset or scicloj on top of or with it.

But having a java api for dtype-next would still allow to create abstractions to transport data from java into scala efficiently to then work with other scala libraries without copying data back and forth. I would integrate that here: https://github.com/invesdwin/invesdwin-context/tree/master/invesdwin-context-parent/invesdwin-context-clojure

The arrow integration of tech.ml.dataset could also be used to transport datasets between the jvm and languages like r/python/julia. Though dunno if for that case tech.ml.dataset should also be made available as a java api later or if arrow integration should be implemented on the outside for a java abstraction that uses dtype-next.

Same applies to a possible abstraction to other interfaces like nd4j, tablesaw, smile, ojalgo, ...

@subes
Author

subes commented Jan 12, 2022

Regarding libjulia-clj and libpython-clj I guess giving it a dtype-next object will already do an efficient transfer into julia arrays or numpy arrays since that is anyway part of the clojure bindings?

@cnuernber
Owner

Semi efficient meaning it will still allocate the native numpy/julia type but it will only need to do a memcpy as opposed to now where it is constructing a java tensor, then allocating the numpy/julia type, then doing the memcpy - assuming there is no transpose performed on either side. If there is a transpose then it has to go in a parallelized index-by-index transformation.

Dataset makes more sense as the arrow integration does allow very fast transfer - serializing to arrow is quite quick and loading is pretty much instantaneous.

@cnuernber
Owner

But to use dataset you need parts of dtype-next so it would start here. I am currently rebuilding the arrow bindings for dataset to support JDK-17 and will loop back to this soon.

@subes
Author

subes commented Jan 15, 2022

I have now made some significant progress with the clojure integration here: https://github.com/invesdwin/invesdwin-context/blob/master/invesdwin-context-parent/invesdwin-context-clojure/src/main/java/de/invesdwin/context/clojure/pool/

It seems single threaded tests run fine now, but when using more than 1 thread the bindings don't seem to work. From what I read clojure should use threadlocals for namespace variables. But maybe I am missing some system property to isolate everything properly? (similar to jruby's org.jruby.embed.localcontext.scope=threadsafe)

Am I doing something wrong or is clojure not capable of being called from multiple threads in parallel without namespace conflicts?
It seems clojure binds itself to the first thread that initialized it and any other thread runs into binding errors:

[screenshot: binding errors in the second thread]

=> Upstream ticket here: https://ask.clojure.org/index.php/11482/unable-resolve-symbol-after-pushthreadbindings-second-thread

@cnuernber
Owner

I honestly have no idea with the above issue. The Clojure compiler/runtime system itself isn't something I have spent any more time than necessary getting into and especially the interaction between defining vars in namespaces and threads is totally opaque to me.

I think, for example, that ns-unmap is a global function so if a var is unmapped in one thread it is unmapped in all threads.

I would, in a threadsafe way, define the var if undefined and set dynamic like you did. Then when you assign a value that is when you pushThread. I would never undefine it but you can pop the var's thread stack which should reset its value to null (or nil in clojure land - LISP heritage is nil, not null).
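
The push/pop advice above can be pictured with a small plain-Java model. This is only a sketch of the idea (a shared root value plus a per-thread binding stack); the class `DynamicVarModel` is hypothetical and is not how Clojure's `Var` is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a dynamic var: shared root value plus a per-thread binding stack. */
final class DynamicVarModel<T> {
    // Shared by all threads; null plays the role of nil.
    private volatile T root;
    // Each thread lazily gets its own, initially empty, stack of bindings.
    private final ThreadLocal<Deque<T>> bindings = ThreadLocal.withInitial(ArrayDeque::new);

    /** Sets the value every thread sees when it has no binding of its own. */
    void setRoot(T value) {
        root = value;
    }

    /** Pushes a value visible only to the calling thread (ArrayDeque rejects null values). */
    void push(T value) {
        bindings.get().push(value);
    }

    /** Pops the calling thread's innermost binding; other threads are unaffected. */
    void pop() {
        bindings.get().pop();
    }

    /** Returns the calling thread's innermost binding, or the shared root if it has none. */
    T get() {
        Deque<T> stack = bindings.get();
        return stack.isEmpty() ? root : stack.peek();
    }
}
```

In this model a value pushed on one thread is never visible on another, and popping the last binding falls back to the root (null if it was never set), which matches the behaviour described in the comment.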

@subes
Author

subes commented Jan 15, 2022

Thanks, the hint about ns-unmap working on globals is indeed a problem. I have now worked around it by checking if the variable really is in the thread's local bindings. The "binding not available" issue was that the bindings leaked to a different thread. I made it purely thread local now so that the bindings instance really is invoking the constructor in a given thread. So the test cases now work in a second thread.

Though there are still issues with global variables in the scripts:
A similar problem to ns-unmap happens when one uses def for global variables in the scripts. I tried using set! (https://clojure.org/reference/vars#set) instead to bind it thread local as well in the script, but it seems the "var" has to be declared in some other way before set! works, because it throws an error. Similar problems will occur when defining functions with the same name in different threads. Since that did not work, I looked for a different solution.


The correct solution is to not use one "user" namespace for all threads and (in-ns switches to it) but instead a unique namespace per thread. In that regard the bindings can be applied globally but separated. Both def and ns-unmap work properly that way. This way my testcases are green now.
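
A minimal sketch of the namespace-per-thread idea, in plain Java: each thread lazily receives its own namespace name, and scripts are prefixed with a switch into it. The class name, the `user-N` naming scheme and the prologue string are assumptions for illustration, not the code in invesdwin-context-clojure.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that hands every thread its own Clojure namespace name. */
final class PerThreadNamespace {
    private static final AtomicLong COUNTER = new AtomicLong();
    // Computed once per thread on first access, then reused by that thread.
    private static final ThreadLocal<String> NAME =
            ThreadLocal.withInitial(() -> "user-" + COUNTER.incrementAndGet());

    private PerThreadNamespace() {
    }

    /** Namespace name reserved for the calling thread; stable across calls on that thread. */
    static String current() {
        return NAME.get();
    }

    /**
     * Script prologue that switches into the calling thread's namespace.
     * in-ns alone does not refer clojure.core, so the prologue refers it explicitly.
     */
    static String prologue() {
        return "(in-ns '" + current() + ") (clojure.core/refer 'clojure.core)";
    }
}
```

With such a scheme, def and ns-unmap in one thread's scripts only touch that thread's namespace, which is the isolation described above.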

So thanks a lot for the hint, this made me understand the problem a lot better. Clojure just does not have something like "threadsafe" isolation available that other languages provide, but one can build it via namespaces.

@cnuernber
Owner

Got it - yep. I am working on providing a java layer for dtype-next right now.

@subes
Author

subes commented Jan 15, 2022

I also got a cache working that reuses clojure compilations between invocations.

Existing scripting integrations for clojure are a bit lacking:

Here a benchmark for the reuse of compilation (100 iterations of the testcases in 1 thread with disabled logging):

  • always parsing: PT13.233.921.654S
  • always reusing: PT12.676.846.056S

So it might make a bigger difference for tight loops.
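
For readers wondering what "reusing compilations" could look like, here is a generic sketch: compile each distinct script source once and hand back the cached result afterwards. `CompiledScriptCache` and its `compiler` function are hypothetical; the actual cache in invesdwin-context-clojure may be keyed and bounded differently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache that compiles each distinct script source at most once. */
final class CompiledScriptCache<C> {
    private final ConcurrentMap<String, C> cache = new ConcurrentHashMap<>();
    private final Function<String, C> compiler;

    /** @param compiler turns script source into a reusable compiled form */
    CompiledScriptCache(Function<String, C> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached compilation for this source, compiling it on first use. */
    C get(String source) {
        return cache.computeIfAbsent(source, compiler);
    }

    /** Number of distinct sources compiled so far. */
    int size() {
        return cache.size();
    }
}
```

The cache is unbounded in this sketch; a real integration would probably want eviction when scripts are generated dynamically.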

I have also added a JSR223 wrapper for the improved binding: https://github.com/invesdwin/invesdwin-context/tree/master/invesdwin-context-parent/invesdwin-context-clojure/src/main/java/de/invesdwin/context/clojure/jsr223

Docs and a full list of supported JVM languages (each having JSR-223 support in addition to the invesdwin scripting interfaces): https://github.com/invesdwin/invesdwin-context#scripting-modules-for-jvm-languages


@cnuernber
Owner

cnuernber commented Jan 17, 2022

Java API for dtype-next '8.057' is up - https://cnuernber.github.io/dtype-next/javadoc/tech/v3/package-summary.html.

Note that neither libpython-clj nor libjulia-clj have been updated so you will need to override this in the dependencies for now.

QuickNDirty test - https://github.com/cnuernber/dtype-next/blob/master/java_test/java/jtest/Main.java

@cnuernber
Owner

And just like that, the test discovered that I hadn't included nio buffer support.
'8.058' includes nio buffer support by default.

@subes
Author

subes commented Jan 18, 2022

I looked at the docs a bit, makes a very good impression!

Regarding Readers, maybe we should also support the other less common types?

//DType.bool => BooleanReader exists with variants
  DType.int8 => ByteReader not available
  DType.uint8 => not needed or ShortReader? or can we treat a ByteReader that way?
  DType.int16 => ShortReader not available
  DType.uint16 => not needed or IntegerReader? or can we treat a ShortReader that way?
  DType.int32 => IntegerReader not available
  DType.uint32 => not needed or LongReader? or can we treat a IntegerReader that way?
//DType.int64 => LongReader exists with variants
  DType.uint64 => not needed or DoubleReader? or can we treat a LongReader that way?
  DType.float32 => FloatReader not available
//DType.float64 => DoubleReader exists with variants
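
As background to the unsigned questions in the list above: Java has no unsigned primitives, so an unsigned value is normally read by reinterpreting the signed bits and widening to the next larger type. The sketch below uses only JDK methods; the class name is hypothetical and nothing here claims to describe how dtype-next handles these types internally.

```java
/** Hypothetical helper: reading unsigned values out of Java's signed primitives. */
final class UnsignedWidening {
    private UnsignedWidening() {
    }

    /** uint8 stored in a byte, widened to int (0..255). */
    static int uint8(byte b) {
        return Byte.toUnsignedInt(b);
    }

    /** uint16 stored in a short, widened to int (0..65535). */
    static int uint16(short s) {
        return Short.toUnsignedInt(s);
    }

    /** uint32 stored in an int, widened to long (0..4294967295). */
    static long uint32(int i) {
        return Integer.toUnsignedLong(i);
    }

    /** uint64 has no larger primitive; only its textual form is shown here. */
    static String uint64(long l) {
        return Long.toUnsignedString(l);
    }
}
```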

@subes
Author

subes commented Jan 18, 2022

I also tried using the new types with libjulia-clj:

for (final int count : new int[] { 10, 100, 250, 500, 1000, 10000 }) {
    System.out.println("\nArray Size: " + count);
    final double[] array = new double[count];
    for (int i = 0; i < count; i++) {
        array[i] = i / 10000D;
    }
    final int iterations = 100;
    final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
    for (int t = 0; t < 2; t++) {
        Instant start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("createArray: " + start);
        }
        //not convertible
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVectorBuffer("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("buffer: " + start);
        }
        //Cannot serialize type Ptr{Nothing}
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVectorBufferReader("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("bufferReader: " + start);
        }
        //put works, get returns "nothing"
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVectorTensor("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("tensor: " + start);
        }
    }
}

@Override
public void putDoubleVector(final String variable, final double[] vector) {
    final Object array = libjulia_clj.java_api.createArray("float64", new int[] { 1, vector.length }, vector);
    putGlobalFunction.invoke(variable, array);
}

public void putDoubleVectorTensor(final String variable, final double[] vector) {
    final NDBuffer tensor = Tensor.asTensor(vector);
    putGlobalFunction.invoke(variable, tensor);
}

public void putDoubleVectorBuffer(final String variable, final double[] vector) {
    final DoubleBuffer tensor = DoubleBuffer.wrap(vector);
    putGlobalFunction.invoke(variable, tensor);
}

public void putDoubleVectorBufferReader(final String variable, final double[] vector) {
    final DoubleReader tensor = new DoubleReader() {
        @Override
        public long lsize() {
            return vector.length;
        }

        @Override
        public double readDouble(final long idx) {
            return vector[(int) idx];
        }
    };
    putGlobalFunction.invoke(variable, tensor);
}

I guess we don't yet have the types considered in the put pathway for buffers and buffer readers (though dunno if it makes sense to support these types). Though I guess only the tensor variant will be supported (not yet complete). Or will we need the abstractions of tech.ml.dataset to support this, as you suggested previously?

@cnuernber
Owner

Glad you got into it so quickly! That is encouraging. A few things I see -

Wrapping a double array in a reader like above means we disable the high performance copy pathway. When dtype wraps double arrays it does so in a way that allows the system to get back to the original buffer for high perf loops and such:

    double[] testData = toDoubleArray(range(100));
    Buffer wrappedData = toBuffer(testData);
    ArrayBufferData origData = asArrayBuffer(wrappedData);

    System.out.println(String.valueOf(System.identityHashCode(testData))
		       + " " + String.valueOf(System.identityHashCode(origData.arrayData)));

// Outputs:
// 748921347 748921347

There are DoubleWriters and such but one thing to note is that 'toBuffer' creates a buffer implementation that will allow writing.

For efficiently putting data into existing containers, there is DType.copy which falls back to the lower level arrayCopy or copyMemory fastpaths.
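
The same distinction can be seen with JDK classes alone. In the sketch below, `DoubleBuffer.wrap` keeps a handle on the original array (analogous to the identity check above), `System.arraycopy` stands for the bulk fast path, and copying through an opaque per-element function stands for the slow path that a hand-written reader forces. The class and method names are hypothetical and this is not the dtype-next implementation.

```java
import java.nio.DoubleBuffer;
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration of wrapping versus copying with JDK classes only. */
final class WrapVersusCopy {
    private WrapVersusCopy() {
    }

    /** Wrapping shares storage: the buffer's backing array is the very same object. */
    static boolean wrapSharesStorage(double[] data) {
        DoubleBuffer wrapped = DoubleBuffer.wrap(data);
        return wrapped.hasArray() && wrapped.array() == data;
    }

    /** Bulk copy: one call that the JVM can turn into a block memory copy. */
    static double[] bulkCopy(double[] src) {
        double[] dst = new double[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    /** Element-wise copy: the source is opaque, so every element costs one call. */
    static double[] elementwiseCopy(IntToDoubleFunction reader, int length) {
        double[] dst = new double[length];
        for (int i = 0; i < length; i++) {
            dst[i] = reader.applyAsDouble(i);
        }
        return dst;
    }
}
```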

One thing you can do now is after creating the Julia array you can get a tensor from it with Tensor.asTensor. So you can manipulate the data in the tensor from the Java side for the case where you are doing something other than just bulk uploads. Or you can copy new data in with DType.copy. You can also transpose it so you get the same representation on the Java side as Julia is seeing.

I didn't create the other integer types of reader base interfaces or float interfaces as if, for instance, you create a LongReader and override elemwiseDatatype to return int16 the system will interpret that as a short buffer. You need to make sure you return longs in the range of shorts but casting from a long to a short and back I think is nearly instantaneous and that is true for all integer types. I am not sure about casting from a double to float in terms of time but because the virtual call times are much longer than the cast times I didn't create the various subtype overloads. I think they do help in code clarity if you are dealing with signed integer types but as I said I don't believe there is an efficiency argument there.
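
A plain-Java sketch of that argument: a long-valued reader that reports a narrower element type, plus a consumer that narrows the values back with a range check. The interface `LongValuedReader` is a hypothetical stand-in whose method names merely echo the ones mentioned in this thread; it is not the dtype-next `LongReader`.

```java
/** Hypothetical stand-in showing int16 data travelling through a long-valued interface. */
final class NarrowThroughLong {
    private NarrowThroughLong() {
    }

    /** Not the dtype-next interface; a minimal model of a long-valued reader. */
    interface LongValuedReader {
        long lsize();

        long readLong(long idx);

        /** Declared element type; defaults to the full long range. */
        default String elemwiseDatatype() {
            return "int64";
        }
    }

    /** Exposes short storage through the long interface while reporting int16. */
    static LongValuedReader overShorts(short[] data) {
        return new LongValuedReader() {
            @Override
            public long lsize() {
                return data.length;
            }

            @Override
            public long readLong(long idx) {
                return data[(int) idx]; // implicit widening short -> long
            }

            @Override
            public String elemwiseDatatype() {
                return "int16";
            }
        };
    }

    /** Narrows the values back to shorts, rejecting anything outside the short range. */
    static short[] toShorts(LongValuedReader reader) {
        short[] out = new short[(int) reader.lsize()];
        for (int i = 0; i < out.length; i++) {
            long v = reader.readLong(i);
            if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
                throw new ArithmeticException("value out of int16 range: " + v);
            }
            out[i] = (short) v;
        }
        return out;
    }
}
```

The range check is the "make sure you return longs in the range of shorts" obligation made explicit; the casts themselves are single instructions, so the per-element call is what dominates.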

The one perf test I think would be the time difference of Julia createArray between a double[][] and a single double[] reshaped to be a tensor of shape [1000 10].

@subes
Author

subes commented Jan 18, 2022

The reader was an idea to maybe transparently transpose during the copy into julia. But yes, it could be more efficient to just transpose on the java side, then copy to julia. Or copy to julia, then transpose there.

To test the double[][] I guess first this pathway needs to work:

 //put works, get returns "nothing"
start = new Instant();
for (int i = 0; i < iterations; i++) {
    putDoubleVectorTensor("asdf", array);
    final double[] out = getDoubleVector("asdf");
    Assertions.checkEquals(array, out);
}
if (t == 1) {
    System.out.println("tensor: " + start);
}

I checked it in more detail:

final double[] array = new double[10];
final NDBuffer tensor = Tensor.asTensor(array);
putGlobalFunction.invoke("asdf", tensor);
libjulia_clj.java_api.runString("println(asdf)");
final Object arrayOut = libjulia_clj.java_api.runString("asdf");
System.out.println(arrayOut);
final Map<?, ?> map = libjulia_clj.java_api.arrayToJVM(array);

The output is:

nothing
nothing
2022-01-18 16:21:34.821 [ |7-1:InitializingJul] ERROR de.invesdwin.ERROR.process                                   - processing #00000002
de.invesdwin.context.log.error.LoggedRuntimeException: #00000002 java.lang.Exception: nil value passed into ->array
        ... 13 omitted, see following cause or error.log
Caused by - java.lang.Exception: nil value passed into ->array
        at tech.v3.datatype.copy_make_container$__GT_array.invokeStatic(copy_make_container.clj:160)
        at tech.v3.datatype.copy_make_container$__GT_array.invoke(copy_make_container.clj:153)
        at tech.v3.datatype.copy_make_container$__GT_int_array.invokeStatic(copy_make_container.clj:195)
        at tech.v3.datatype.copy_make_container$__GT_int_array.invoke(copy_make_container.clj:192)
        at tech.v3.datatype$__GT_int_array.invokeStatic(datatype.clj:89)
        at tech.v3.datatype$__GT_int_array.invoke(datatype.clj:86)
        at clojure.lang.Var.invoke(Var.java:384)
        at libjulia_clj.java_api$_arrayToJVM.invokeStatic(java_api.clj:126)
        at libjulia_clj.java_api$_arrayToJVM.invoke(java_api.clj:120)
        at libjulia_clj.java_api.arrayToJVM(Unknown Source)
      * at de.invesdwin.context.julia.runtime.libjuliaclj.internal.UncheckedJuliaEngineWrapper.init(UncheckedJuliaEngineWrapper.java:82) *

Thus the putGlobalFunction.invoke(...) fails to put the value. Instead "nothing" arrives in julia.

@cnuernber
Owner

The most efficient transpose is definitely going to be on the Julia side.

What I meant was something more like:

Object juliaAry = createArray("float64", new int[] { 10 }, data);
NDBuffer tens = asTensor(juliaAry);

Then you can do whatever you like to it in Java land as well as in Julia land. Reshape is the best way to take a flat buffer of data and end up with a tensor.

@cnuernber
Owner

In Julia you have the choice when transposing of either creating a view which will incur indexing cost down the road or paying all the indexing cost and transposing directly which for 2D square matrixes has an inplace option. For sure Julia's implementation will be far more heavily optimized than mine :-).
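
The three options mentioned here (lazy view, materialized copy, in-place for square matrices) can be sketched in plain Java over a flat row-major array. This only illustrates the trade-off; it says nothing about how Julia or dtype-next implement transposition, and all names are hypothetical.

```java
/** Hypothetical sketch of transpose strategies over a flat row-major double[]. */
final class TransposeSketch {
    private TransposeSketch() {
    }

    /** Lazy view: no copy up front, but index arithmetic is paid on every read. */
    static final class TransposedView {
        private final double[] data;
        private final int cols;

        /** @param cols number of columns of the original (untransposed) matrix */
        TransposedView(double[] data, int cols) {
            this.data = data;
            this.cols = cols;
        }

        /** Element (r, c) of the transpose is element (c, r) of the original. */
        double get(int r, int c) {
            return data[c * cols + r];
        }
    }

    /** Materialized transpose: pays the indexing cost once, result is row-major cols x rows. */
    static double[] materialize(double[] data, int rows, int cols) {
        double[] out = new double[data.length];
        for (int r = 0; r < cols; r++) {
            for (int c = 0; c < rows; c++) {
                out[r * rows + c] = data[c * cols + r];
            }
        }
        return out;
    }

    /** In-place variant, valid only for square n x n matrices: swap across the diagonal. */
    static void transposeSquareInPlace(double[] data, int n) {
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double tmp = data[i * n + j];
                data[i * n + j] = data[j * n + i];
                data[j * n + i] = tmp;
            }
        }
    }
}
```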

@subes
Author

subes commented Jan 18, 2022

Ah, ok great. With the example you provided I understand how this is supposed to work. ^^
I guess this is the same approach that is taken with libpython-clj right now.

And tech.ml.dataset might provide a put/get by utilizing arrow I guess?

@cnuernber
Owner

Yes, the python bindings work the same way.

I hadn't thought of that really. For tmd there is a fast serialization path to an output stream or a block of memory so you can kind of choose how intense you want things to be. It serializes into Arrow stream format which it can then load via mmap if you want to lower your java heap usage. And python, R, etc. can mmap the same file themselves.
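
To make the mmap point concrete, here is a JDK-only sketch that writes doubles to a file and then maps the file back without reading it onto the Java heap. It uses a raw big-endian layout, not the Arrow stream format, and the class and method names are hypothetical; loading Arrow files through tech.ml.dataset goes through that library's own API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: share a block of doubles through a memory-mapped file. */
final class MmapSketch {
    private MmapSketch() {
    }

    /** Writes the values as raw big-endian doubles (ByteBuffer's default byte order). */
    static void writeDoubles(Path file, double[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        buf.asDoubleBuffer().put(values);
        Files.write(file, buf.array());
    }

    /** Maps the file read-only; the data stays off-heap and is paged in on demand. */
    static DoubleBuffer mapDoubles(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // The mapping stays valid after the channel is closed.
            return mapped.asDoubleBuffer();
        }
    }
}
```

Another process can map the same file, which is the sharing pattern described above for python or R; with a raw layout like this one, both sides would additionally have to agree on byte order and shape.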

Arrow has a lot of options. It also has a C interchange format that I haven't looked into so you can directly share columns or entire datasets. For very tight coupling the straight C structure-based interchange pathway is probably best but as I said it is simpler to save the file to arrow streaming format and then mmap the result so I haven't invested time in the C pathway.

Arrow also introduced compression and it has support for structs so you could potentially save a dataset to an array of structs and compress it which would make it somewhat similar to your existing serialization. I don't yet support serializing a dataset to arrow in row-major form nor their compression but their compression is next on the list as I think it is useful.

@cnuernber
Owner

Of course mmap doesn't make a ton of sense with compression but as orthogonal options for various scenarios I think it all makes a lot of sense.

@subes
Author

subes commented Jan 18, 2022

I did a benchmark with the new path, it is indeed faster for matrix (when doing the index-by-index transpose during the index by index copy):

Array Size: 1*10=10
array->array: PT0.072.557.115S
tensor->tensor: PT0.050.459.582S
array->tensor: PT0.064.713.044S
tensor->array: PT0.055.351.334S

Array Size: 10*10=100
array->array: PT0.060.970.385S
tensor->tensor: PT0.047.616.097S
array->tensor: PT0.056.382.343S
tensor->array: PT0.051.879.886S

Array Size: 25*10=250
array->array: PT0.054.135.615S
tensor->tensor: PT0.046.517.671S
array->tensor: PT0.056.972.308S
tensor->array: PT0.056.526.428S

Array Size: 50*10=500
array->array: PT0.049.437.031S
tensor->tensor: PT0.041.438.917S
array->tensor: PT0.048.155.145S
tensor->array: PT0.044.329.402S

Array Size: 100*10=1000
array->array: PT0.048.341.229S
tensor->tensor: PT0.041.696.545S
array->tensor: PT0.046.851.216S
tensor->array: PT0.044.586.612S

Array Size: 1000*10=10000
array->array: PT0.076.822.862S
tensor->tensor: PT0.061.204.679S
array->tensor: PT0.059.972.084S
tensor->array: PT0.056.766.389S

@Override
public void putDoubleMatrix(final String variable, final double[][] matrix) {
    //        IScriptTaskRunnerJulia.LOG.debug("> put %s", variable);
    final int cols = matrix[0].length;
    final int rows = matrix.length;
    final double[] vector = new double[rows * cols];
    int i = 0;
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            vector[i] = matrix[r][c];
            i++;
        }
    }
    final Object array = libjulia_clj.java_api.createArray("float64", new int[] { cols, rows }, vector);
    putGlobalFunction.invoke(variable, array);
}

public void putDoubleMatrixTensor(final String variable, final double[][] matrix) {
    //        IScriptTaskRunnerJulia.LOG.debug("> put %s", variable);
    final int cols = matrix[0].length;
    final int rows = matrix.length;
    final Object array = libjulia_clj.java_api
            .runString(variable + " = Array{Float64}(undef, " + rows + ", " + cols + "); " + variable);
    final NDBuffer tensor = Tensor.asTensor(array);
    int i = 0;
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            tensor.ndWriteDouble(i, matrix[r][c]);
            i++;
        }
    }
}

@Override
public double[][] getDoubleMatrix(final String variable) {
    //        IScriptTaskRunnerJulia.LOG.debug("> get %s", variable);
    final Object array = libjulia_clj.java_api.runString("__ans__=" + variable + ";\n__ans__");
    if (array == null) {
        return null;
    }
    if (!isJuliaArray(array)) {
        return getDoubleMatrixAsJson("__ans__");
    }
    final Map<?, ?> map = libjulia_clj.java_api.arrayToJVM(array);
    final double[] vector = Doubles.checkedCastVector(map.get("data"));
    final int[] dims = (int[]) map.get("shape");
    if (dims.length != 2) {
        throw new IllegalArgumentException("Not a matrix: " + Arrays.toString(dims));
    }
    final int cols = dims[0];
    final int rows = dims[1];
    final double[][] matrix = new double[rows][];
    for (int r = 0; r < rows; r++) {
        matrix[r] = new double[cols];
    }
    int i = 0;
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            matrix[r][c] = vector[i];
            i++;
        }
    }
    return matrix;
}

public double[][] getDoubleMatrixTensor(final String variable) {
    //        IScriptTaskRunnerJulia.LOG.debug("> get %s", variable);
    final Object array = libjulia_clj.java_api.runString("__ans__=" + variable + ";\n__ans__");
    if (array == null) {
        return null;
    }
    if (!isJuliaArray(array)) {
        return getDoubleMatrixAsJson("__ans__");
    }
    final NDBuffer tensor = Tensor.asTensor(array);
    final Iterable<Object> shape = tensor.shape();
    final Iterator<Object> dims = shape.iterator();
    final int cols = Integers.checkedCast(dims.next());
    final int rows = Integers.checkedCast(dims.next());
    if (dims.hasNext()) {
        throw new IllegalArgumentException("Not a matrix: " + Lists.toList(shape));
    }
    final double[][] matrix = new double[rows][];
    for (int r = 0; r < rows; r++) {
        matrix[r] = new double[cols];
    }
    int i = 0;
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            matrix[r][c] = tensor.ndReadDouble(i);
            i++;
        }
    }
    return matrix;
}

So really good job. :)
Though also that api only supports long/double/boolean. But these are anyway the most important types.

@subes
Author

subes commented Jan 18, 2022

Correction, boolean does not work:

@Override
public void putBooleanMatrix(final String variable, final boolean[][] matrix) {
    IScriptTaskRunnerJulia.LOG.debug("> put %s", variable);
    final int cols = matrix[0].length;
    final int rows = matrix.length;
    final Object array = libjulia_clj.java_api
            .runString(variable + " = Array{Bool}(undef, " + rows + ", " + cols + "); " + variable);
    final NDBuffer tensor = Tensor.asTensor(array);
    int i = 0;
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            tensor.ndWriteBoolean(i, matrix[r][c]);
            i++;
        }
    }
}

Caused by - clojure.lang.ExceptionInfo: datatype is not numeric: :boolean
        at tech.v3.datatype.casting$numeric_byte_width.invokeStatic(casting.clj:174)
        at tech.v3.datatype.casting$numeric_byte_width.invokePrim(casting.clj)
        at libjulia_clj.impl.base$julia_array__GT_nd_descriptor.invokeStatic(base.clj:661)
        at libjulia_clj.impl.base$julia_array__GT_nd_descriptor.invoke(base.clj:652)
        at libjulia_clj.impl.base.JuliaArray.as_tensor(base.clj:692)
        at tech.v3.tensor_api$as_tensor.invokeStatic(tensor_api.clj:578)
        at tech.v3.tensor_api$as_tensor.invoke(tensor_api.clj:572)
        at tech.v3.tensor$as_tensor.invokeStatic(tensor.clj:63)
        at tech.v3.tensor$as_tensor.invoke(tensor.clj:59)
        at clojure.lang.Var.invoke(Var.java:384)
        at tech.v3.Clj.call(Clj.java:145)
        at tech.v3.Tensor.asTensor(Tensor.java:127)
      * at de.invesdwin.context.julia.runtime.libjuliaclj.internal.UncheckedJuliaEngineWrapper.putBooleanMatrix(UncheckedJuliaEngineWrapper.java:638) *

@cnuernber
Owner

Right, and it links to that other issue. Boolean isn't a numeric datatype, it is usually implemented at the C level by signed or unsigned characters with the value of 0 or 1.

The tensor interface only supports reading and writing using longs or doubles. Tensors storage can be any of the C datatypes themselves:

user> (Tensor/makeTensor (range 9) [3 3] :uint8)
#tech.v3.tensor<uint8>[3 3]
[[0 1 2]
 [3 4 5]
 [6 7 8]]

Or similarly:

user> (Tensor/reshape (DType/makeContainer :uint8 (range 9)) [3 3])
#tech.v3.tensor<uint8>[3 3]
[[0 1 2]
 [3 4 5]
 [6 7 8]]

Just the reading/writing only supports those datatypes which I argue have fast cast semantics to the rest; at least fast enough that the vcall overhead overwhelms the concrete additional time. You can definitely create float32 and uint16, etc type tensors and read/write to them using the long or double interfaces. I just didn't want to type out 24 functions for reading data and 24 functions for writing data.
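
Following the remark that booleans are usually 0/1 bytes at the C level, one way to move a boolean matrix through a numeric pathway is to encode it into bytes first. The sketch below is plain Java with hypothetical names; it uses the same column-major loop order as the put/get methods earlier in the thread and does not call libjulia-clj or dtype-next.

```java
/** Hypothetical helper: boolean matrices encoded as column-major 0/1 bytes. */
final class BoolAsByte {
    private BoolAsByte() {
    }

    /** Flattens a rectangular boolean matrix column by column into 0/1 bytes. */
    static byte[] encodeColumnMajor(boolean[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        byte[] flat = new byte[rows * cols];
        int i = 0;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                flat[i++] = (byte) (matrix[r][c] ? 1 : 0);
            }
        }
        return flat;
    }

    /** Rebuilds the matrix; any non-zero byte is read as true. */
    static boolean[][] decodeColumnMajor(byte[] flat, int rows, int cols) {
        boolean[][] matrix = new boolean[rows][cols];
        int i = 0;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                matrix[r][c] = flat[i++] != 0;
            }
        }
        return matrix;
    }
}
```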

@cnuernber
Owner

cnuernber commented Jan 18, 2022

Or, for a much more technical argument - https://cnuernber.github.io/dtype-next/datatype-to-dtype-next.html

The synopsis was that I found that the code generation required to create code for each primitive type first off wasn't faster in actual execution time and unnecessarily increased the compile time and binary size of the system.

@cnuernber
Owner

Release 8.060 adds a simple POD descriptor type for arraybuffers and nativebuffers and fixes an issue with makeTensor.

@subes
Author

subes commented Jan 18, 2022

Yup, you are right. Going through a bigger type is not an issue. Performance results show the same relations even for smaller types that go through read/write with larger types. I will migrate all of my put/get matrix calls. I guess we can thus close this issue.

I would still be interested in maybe examples for (maybe you have them):

  • converting a tensor created in java into something that neanderthal understands
  • filling a tech.ml.dataset from a tensor created in java and how to go back to java

I guess both neanderthal and tech.ml.dataset could be worked with inside of clojure scripts that way. And I guess with similar paths scicloj should also work. I looked at the tech.ml.dataset docs and dtype-next docs but did not come to conclusions about these use cases.

@cnuernber
Owner

I am glad you tested that pathway :-). I appreciate the reality check :-).

I will add in neanderthal dense matrix pathway to the Java Main sample and let you know.

I can also show a dataset constructor using the columns of the matrix. Datasets can be constructed from a map of column name to column so once you have a 2d tensor you can either use the rows or columns and name them to get a dataset. It will share the memory regardless. And if you just want to pass in a map of columns of primitive arrays, nio buffers, or readers those all work too. The most inefficient way to construct them which is still surprisingly fast is to pass in a sequence of maps in row-major form. Clojure has clever ways to create objects that look like maps but have a fixed minimum number of keys -- those are clojure records and this can help a lot with the row-major pathway but it is still much faster of course to use columns as the dataset system simply ensures there is a conversion to a buffer.
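
The "map of column name to column" input shape can be prepared on the Java side with nothing but the JDK. The sketch below turns a row-major `double[][]` into an ordered map of named primitive columns; the class name is hypothetical, and unlike the tensor pathway described above it copies the data rather than sharing memory. Handing the map to tech.ml.dataset is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: named primitive columns built from a row-major matrix. */
final class ColumnsFromMatrix {
    private ColumnsFromMatrix() {
    }

    /** Copies column j of the matrix into an array stored under names[j], keeping column order. */
    static Map<String, double[]> toColumns(double[][] rows, String[] names) {
        Map<String, double[]> columns = new LinkedHashMap<>();
        for (int j = 0; j < names.length; j++) {
            double[] column = new double[rows.length];
            for (int i = 0; i < rows.length; i++) {
                column[i] = rows[i][j];
            }
            columns.put(names[j], column);
        }
        return columns;
    }
}
```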

Then a scicloj example of a classifier and a regressor would be interesting - I agree.

All that will also prove out my minimal clojure layer such that it is enough raw clojure bindings to make using these systems fairly straightforward even if there is no bespoke Java api.

All great thoughts, keep em coming!

@subes
Author

subes commented Jan 18, 2022

Here the most extreme example of put/get byte through a long:

Array Size: 1*10=10
array->array: PT0.084.700.160S
tensor->tensor: PT0.053.167.573S
array->tensor: PT0.074.474.954S
tensor->array: PT0.057.273.157S

Array Size: 10*10=100
array->array: PT0.062.787.909S
tensor->tensor: PT0.048.445.076S
array->tensor: PT0.055.572.510S
tensor->array: PT0.049.938.167S

Array Size: 25*10=250
array->array: PT0.056.490.228S
tensor->tensor: PT0.045.131.411S
array->tensor: PT0.059.297.107S
tensor->array: PT0.047.697.818S

Array Size: 50*10=500
array->array: PT0.051.574.095S
tensor->tensor: PT0.042.963.530S
array->tensor: PT0.048.996.814S
tensor->array: PT0.047.061.821S

Array Size: 100*10=1000
array->array: PT0.054.053.037S
tensor->tensor: PT0.043.719.201S
array->tensor: PT0.050.506.487S
tensor->array: PT0.045.817.824S

Array Size: 1000*10=10000
array->array: PT0.087.889.003S
tensor->tensor: PT0.065.609.443S
array->tensor: PT0.068.957.770S
tensor->array: PT0.068.516.036S

final int cols = 10;
for (final int rows : new int[] { 1, 10, 25, 50, 100, 1000 }) {
    System.out.println("\nArray Size: " + rows + "*" + cols + "=" + (rows * cols));
    final byte[][] matrix = new byte[rows][];
    int element = 0;
    for (int i = 0; i < rows; i++) {
        final byte[] row = new byte[cols];
        matrix[i] = row;
        for (int j = 0; j < cols; j++) {
            row[j] = (byte) element++;
        }
    }
    final int iterations = 100;
    final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
    for (int t = 0; t < 2; t++) {
        Instant start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putByteMatrix("asdf", matrix);
            final byte[][] out = getByteMatrix("asdf");
            Assertions.checkEquals(matrix, out);
        }
        if (t == 1) {
            System.out.println("array->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putByteMatrixTensor("asdf", matrix);
            final byte[][] out = getByteMatrixTensor("asdf");
            Assertions.checkEquals(matrix, out);
        }
        if (t == 1) {
            System.out.println("tensor->tensor: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putByteMatrix("asdf", matrix);
            final byte[][] out = getByteMatrixTensor("asdf");
            Assertions.checkEquals(matrix, out);
        }
        if (t == 1) {
            System.out.println("array->tensor: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putByteMatrixTensor("asdf", matrix);
            final byte[][] out = getByteMatrix("asdf");
            Assertions.checkEquals(matrix, out);
        }
        if (t == 1) {
            System.out.println("tensor->array: " + start);
        }
    }
}

@subes
Author

subes commented Jan 18, 2022

I agree, I think we can achieve a lot with a minimal layer. I was also positively impressed by the Clj class you provided. ^^

@cnuernber
Owner

That is great! I discovered import static and with that I can make a file in Java look a lot like a file in Clojure so I went with it. I honestly am a bit surprised by how little effort the Clojure team has made to make Clojure easy to use from Java. I think before Java 8 it was probably not worth it but Java 8 has a lot of typing efficiency details and Java 10 has var. So import static along with var gets you a long way in terms of making very compact code if that is your game.
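
A tiny JDK-only illustration of the style being described, assuming Java 10 or newer for `var`: static imports let functions be called by bare name and `var` removes the repeated type declarations. The class is hypothetical and uses JDK functions only; the dtype-next sample linked in this thread applies the same idea to the `tech.v3` classes.

```java
import static java.lang.Math.sqrt;
import static java.util.stream.Collectors.toList;

import java.util.List;
import java.util.stream.IntStream;

/** Hypothetical example of compact Java using import static and var. */
final class CompactJava {
    private CompactJava() {
    }

    /** Square roots of 0..n-1; sqrt and toList are called unqualified via import static. */
    static List<Double> roots(int n) {
        var range = IntStream.range(0, n);
        return range.mapToObj(i -> Double.valueOf(sqrt(i))).collect(toList());
    }
}
```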

I updated the sample with neanderthal and bit of dataset and arrow demonstration - https://github.com/cnuernber/dtype-next/blob/master/java_test/java/jtest/Main.java.

The bootup time of neanderthal is intense for two reasons. First it includes clojure.core.async and this library has a somewhat sophisticated compile-time system for async programming. It alone takes I think 5 or 6 seconds. Next Neanderthal unpacks MKL because I don't have the javacpp MKL bindings on my classpath.

All that aside it all works great from Java and is very nearly as condensed as the Clojure version so that is encouraging.

@subes
Author

subes commented Jan 19, 2022

This might be material for another clojure talk ^^
Dunno if people used or saw an integration like this before. Invoking neanderthal and tmd seems rather fluent. Besides the lack of typed variables, which indeed could just be written as “var”s.

@cnuernber
Owner

I would love to get an infoq talk on this. Within the Clojure community -- I am not sure how well received it would be but it sure is fun. No one is writing Java code like this which is of course both good and bad.

I added a high performance parallelism primitive, the one used most frequently throughout the system - demo.

One huge benefit of the Java API is I am only exposing the very key concepts and things. I think this really helps focus someone new on exactly what is different and then they can explore more of the Clojure API as they see fit. I have trouble with new people getting overwhelmed as it is.

I am going to move to implementing the compression specification in Arrow and then do a pass to give tmd a Java API. I need to think about how to talk about this system because I think it is unique in JVM-land or at the very least the most usable of its type by far.

@subes
Author

subes commented Jan 19, 2022

Also by exposing it as a Java API, it becomes available to JRuby, Scala, Kotlin, ... on the JVM. They all provide convenience methods to integrate Java. Also there are Julia and Python bindings that allow to embed Java inside them. This API then allows to call clojure code from Julia and Python. Java is here the least common denominator.

@cnuernber
Owner

Previously I thought that merely by publishing a jar people could use this system from other JVM languages and I guess that is true to some extent but not without being extremely knowledgeable with Clojure to begin with which completely defeats the purpose.

@subes
Author

subes commented Jan 19, 2022

Yes, the Clojure compiler/binding is anything but intuitive, maybe also the reason why the existing JSR-223 bindings were working only sort of.

@subes
Author

subes commented Jan 19, 2022

I will thus close this issue, we can then move on to the follow up story: techascent/tech.ml.dataset#281

Again, I will be happy to try it!

@subes subes closed this as completed Jan 19, 2022