create tensor of float16 as input using the Java API #7003
Comments
There is no way to do that in the Java API at the moment. It supports the outbound transformation (i.e. a model can produce an fp16 output and it will be converted into a float output on the way out). What's your fp16 input type in Java? Are you storing them in shorts?
The input type in Java is a `FloatBuffer` of float values.
So you'd like to be able to pass in a `FloatBuffer` and have it converted to fp16 on the native side?
Yes. Since the fp16 model expects fp16 input, I'm currently getting a type mismatch error when I pass in float input.
OK. If you don't need to persist fp16 values in Java and are fine with storing floats on the Java side, then it's easier to implement, but it will still require a bunch of changes to the Java side of ORT. I'll put it on the list of things to do.
Hi, any updates on this?
The support is still not available in the ORT Java API, though Java 20 will likely have fp32 <-> fp16 conversion methods which will make it easier to implement (https://download.java.net/java/early_access/jdk20/docs/api/java.base/java/lang/Float.html#floatToFloat16(float)). For the time being you can take an ONNX model in fp16 and add an fp32 -> fp16 cast node to the start of it using the ONNX Python tooling (or edit the protobuf in Java).
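For reference, the Java 20 conversion methods linked above look like this (a minimal sketch, assuming a JDK 20+ runtime):

```java
// Java 20+ only: scalar fp32 <-> fp16 conversions on java.lang.Float.
short half = Float.floatToFloat16(0.5f);  // fp16 bit pattern stored in a short
float back = Float.float16ToFloat(half);  // back to fp32
System.out.println(back);                 // prints 0.5
```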
Hey @Craigacp, thanks for the quick reply. I have a float16 model and would like to add a node to convert the float32 input to float16. Is there any sample code available? My model has multiple named input nodes and I would like to preserve the naming.
On a side note, is it possible to implement a method that just takes a byte buffer as input and lets the user specify what type it is? The conversion would take place on the C/C++ side in the JNI layer.
I don't have sample code, but you can load the protobuf and add cast nodes. Preserving the names will be trickier, as you'll need to rename the existing input layer and that tends to ripple. We could implement something that accepted a byte buffer and a type, but you'd still have to prepare the fp16 values in Java to put into the byte buffer, and it would be significantly easier to mess things up by accidentally specifying the wrong type or getting the endianness wrong.
Ended up running this script in reverse to convert the whole model to float32. Needed that anyway, because most ONNX-to-other-format model converters don't support the Cast operator.
### Description
The Java API currently only supports fp16 output tensors, which it automatically casts to floats on the way out. This PR adds support for creating fp16 and bf16 tensors (from `java.nio.Buffer` objects or as the output of models; creation from Java short arrays is not supported), along with efficient methods for casting `FloatBuffer` into `ShortBuffer` filled with fp16 or bf16 values and vice versa. The fp16 conversions use a trick to pull in the efficient conversion methods added to Java 20, falling back to ports of the MLAS methods otherwise. The Java 20 methods can be special-cased by the C2 JIT compiler to emit the single instruction on x86 and ARM which converts fp32<->fp16, or the vectorized versions thereof, so they should be quite a bit faster than the MLAS ports.

### Motivation and Context
fp16 and bf16 are increasingly popular formats and we've had several requests for this functionality. Fixes #7003.

cc @yuslepukhin @cassiebreviu

Co-authored-by: Scott McKay <[email protected]>
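As a rough sketch of the conversion side of this (illustrative only; the helper names below are hypothetical rather than the API added by the PR, and the PR's built-in methods also have a pre-Java-20 fallback, whereas this assumes Java 20's `Float.floatToFloat16`/`float16ToFloat`):

```java
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

// Hand-rolled equivalents of the FloatBuffer <-> fp16 ShortBuffer casts the
// PR describes; class and method names here are illustrative.
public final class Fp16BufferCasts {
  // fp32 values -> fp16 bit patterns stored in shorts.
  public static ShortBuffer floatsToFp16(FloatBuffer input) {
    ShortBuffer output = ShortBuffer.allocate(input.remaining());
    while (input.hasRemaining()) {
      output.put(Float.floatToFloat16(input.get())); // Java 20+
    }
    output.rewind();
    return output;
  }

  // fp16 bit patterns stored in shorts -> fp32 values.
  public static FloatBuffer fp16ToFloats(ShortBuffer input) {
    FloatBuffer output = FloatBuffer.allocate(input.remaining());
    while (input.hasRemaining()) {
      output.put(Float.float16ToFloat(input.get())); // Java 20+
    }
    output.rewind();
    return output;
  }
}
```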
Sorry, I noticed that this issue has been closed, but how do I create a tensor of float16 as input? I haven't found any interfaces I can use in onnxruntime_gpu 1.17.3. An exception (java.lang.ClassCastException: class java.nio.HeapByteBuffer cannot be cast to class java.nio.ShortBuffer) occurs when I try to use OnnxTensor.createTensor(this.env, inputBuffer, INPUT_SHAPE, OnnxJavaType.FLOAT16)
There's an example in the tests - https://github.com/microsoft/onnxruntime/blob/main/java/src/test/java/ai/onnxruntime/OnnxTensorTest.java#L298 - which shows creating a direct byte buffer, taking a short buffer view of it and then writing fp16 values into it. Alternatively you can use this method to prepare a suitable `ShortBuffer` from your float data.
Thank you for your reply. I tried to create an OnnxTensor from a ByteBuffer as in https://github.com/microsoft/onnxruntime/blob/main/java/src/test/java/ai/onnxruntime/OnnxTensorTest.java#L298.
You need to pass in a direct ByteBuffer allocated with `ByteBuffer.allocateDirect`, rather than a heap buffer.
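Putting the advice above together, a minimal sketch of building an fp16 input tensor (the shape and values are hypothetical; it assumes the four-argument `OnnxTensor.createTensor(env, buffer, shape, OnnxJavaType.FLOAT16)` overload referenced above accepts the `ShortBuffer` view, and Java 20's `Float.floatToFloat16`):

```java
import ai.onnxruntime.OnnxJavaType;
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

public class Fp16TensorExample {
  public static OnnxTensor createFp16Tensor(OrtEnvironment env, float[] pixels, long[] shape)
      throws OrtException {
    // A direct buffer (not a heap buffer) in native byte order avoids the
    // HeapByteBuffer ClassCastException seen above.
    ByteBuffer bytes = ByteBuffer.allocateDirect(pixels.length * 2)
        .order(ByteOrder.nativeOrder());
    ShortBuffer fp16 = bytes.asShortBuffer();
    for (float f : pixels) {
      fp16.put(Float.floatToFloat16(f)); // Java 20+
    }
    fp16.rewind();
    return OnnxTensor.createTensor(env, fp16, shape, OnnxJavaType.FLOAT16);
  }
}
```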
Hi team,
In order to run an fp16 model, do we have a way to create a tensor of float16 as input using the Java API? e.g.,
OnnxTensor.createTensor(env, FloatBuffer.wrap(pixels), shape, ***);
Thanks,