create tensor of float16 as input using the Java API #7003

Closed
andyguo007 opened this issue Mar 13, 2021 · 15 comments · Fixed by #16703
Labels: api:Java, core runtime

Comments

@andyguo007

Hi team,

In order to run an fp16 model, is there a way to create a tensor of float16 as input using the Java API? e.g.,
OnnxTensor.createTensor(env, FloatBuffer.wrap(pixels), shape, ***);

Thanks,

@Craigacp
Contributor

There is no way to do that in the Java API at the moment. It supports the outbound transformation (i.e. a model can produce a fp16 output and it will be converted into a FloatBuffer or float array), but the inbound transformation is not allowed.

What's your fp16 input type in Java? Are you storing them in short or do you want to downcast a float into fp16?

@askhade added the api:Java, core runtime, and type:support labels on Mar 15, 2021
@andyguo007
Author

> There is no way to do that in the Java API at the moment. It supports the outbound transformation (i.e. a model can produce a fp16 output and it will be converted into a FloatBuffer or float array), but the inbound transformation is not allowed.
>
> What's your fp16 input type in Java? Are you storing them in short or do you want to downcast a float into fp16?

The input type in Java is float. I converted the model from fp32 to fp16, and I'm wondering if there is a way to use the accelerated (fp16) model.

@Craigacp
Contributor

So you'd like to be able to pass in a float[] or FloatBuffer and have ORT downcast the floats into fp16 before storing them in the OnnxTensor?

@andyguo007
Author

> So you'd like to be able to pass in a float[] or FloatBuffer and have ORT downcast the floats into fp16 before storing them in the OnnxTensor?

Yes.

Since the fp16 model expects fp16 input, currently I am getting this error:
ai.onnxruntime.OrtException: Error code - ORT_INVALID_ARGUMENT - message: Unexpected input data type. Actual: (N11onnxruntime17PrimitiveDataTypeIfEE) , expected: (N11onnxruntime17PrimitiveDataTypeINS_9MLFloat16EEE)

@Craigacp
Contributor

Ok. If you don't need to persist fp16 values in Java and are fine with storing floats on the Java side, then it's easier to implement, but it will still require a bunch of changes to the Java side of ORT. I'll put it on the list of things to do.

@faxu removed the type:support label on Aug 18, 2021
@MankaranSingh

Hi, any updates on this?

@Craigacp
Contributor

The support is still not available in the ORT Java API, though Java 20 will likely have fp32 <-> fp16 conversion methods which will make it easier to implement (https://download.java.net/java/early_access/jdk20/docs/api/java.base/java/lang/Float.html#floatToFloat16(float)).
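For illustration, here is a minimal sketch of those Java 20 conversion methods used on their own (this is just the JDK API; the ORT-side support discussed in this thread did not exist yet at the time of this comment):

```java
public class Fp16BitsDemo {
    public static void main(String[] args) {
        float[] values = {0.5f, 1.0f, -2.25f};
        short[] fp16Bits = new short[values.length];
        // Requires Java 20+: rounds each fp32 value to the nearest fp16 bit pattern.
        for (int i = 0; i < values.length; i++) {
            fp16Bits[i] = Float.floatToFloat16(values[i]);
        }
        // Widen the fp16 bit patterns back to fp32 to inspect the round-trip.
        for (short bits : fp16Bits) {
            System.out.println(Float.float16ToFloat(bits));
        }
    }
}
```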

For the time being you can take an ONNX model in fp16 and add an fp32 -> fp16 Cast node to the start of it using the ONNX Python tooling (or edit the protobuf in Java).

@MankaranSingh

Hey @Craigacp, thanks for the quick reply. I have a float16 model and would like to add a node to convert the float32 input to float16. Is there any sample code available? My model has multiple named input nodes and I would like to preserve the naming.
Thanks!

@MankaranSingh

On a side note, is it possible to implement a method that just takes a byte buffer as input and lets the user specify its type? The conversion would take place on the C/C++ side in the JNI layer.

@Craigacp
Contributor

I don't have sample code, but you can load the protobuf and add cast nodes. Preserving the names will be trickier as you'll need to rename the existing input layer and that tends to ripple.

We could implement something that accepted a byte buffer and a type, but you'd still have to prepare the fp16 values in Java to put into the byte buffer, and it would be significantly easier to mess things up by accidentally specifying the wrong type or getting the endianness wrong.

@MankaranSingh

MankaranSingh commented Jan 29, 2023

I ended up running this script in reverse to convert the whole model to float32. I needed that anyway, because most ONNX to 'other format' model converters don't support the Cast operator.

skottmckay added a commit that referenced this issue Jul 21, 2023
### Description
The Java API currently only supports fp16 output tensors which it
automatically casts to floats on the way out. This PR adds support for
creating fp16 and bf16 tensors (from `java.nio.Buffer` objects or as the
output of models, creation from Java short arrays is not supported),
along with efficient methods for casting `FloatBuffer` into
`ShortBuffer` filled with fp16 or bf16 values and vice versa.

The fp16 conversions use a trick to pull in the efficient conversion
methods added to Java 20, falling back to ports of the MLAS methods
otherwise. The Java 20 methods can be special cased by the C2 JIT
compiler to emit the single instruction on x86 and ARM which converts
fp32<->fp16, or the vectorized versions thereof, so they should be quite
a bit faster than the MLAS ported one.

### Motivation and Context
fp16 and bf16 are increasingly popular formats and we've had several
requests for this functionality. Fixes #7003.

cc @yuslepukhin  @cassiebreviu

---------

Co-authored-by: Scott McKay <[email protected]>
jchen351 pushed a commit that referenced this issue Aug 12, 2023
@Xiao-Nine

Sorry, I noticed that this issue has been closed, but how do I create a tensor of float16 as input? I haven't found any interface I can use in onnxruntime_gpu 1.17.3. An exception (java.lang.ClassCastException: class java.nio.HeapByteBuffer cannot be cast to class java.nio.ShortBuffer) occurs when I try to use OnnxTensor.createTensor(this.env, inputBuffer, INPUT_SHAPE, OnnxJavaType.FLOAT16)

@Craigacp
Contributor

Craigacp commented May 6, 2024

There's an example in the tests - https://github.com/microsoft/onnxruntime/blob/main/java/src/test/java/ai/onnxruntime/OnnxTensorTest.java#L298 which shows creating a direct byte buffer, taking a short buffer view of it and then writing fp16 values into it. Alternatively you can use this method to prepare a suitable ShortBuffer from a FloatBuffer - https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java#L67. What's the stack trace of the exception?
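For anyone landing here later, a rough sketch of that FloatBuffer route (the helper method name is taken from the linked Fp16Conversions file and may differ between ORT versions; the data and shape are illustrative):

```java
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

import ai.onnxruntime.OnnxJavaType;
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.platform.Fp16Conversions;

public class Fp16FromFloatBuffer {
    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        float[] pixels = {0.25f, 0.5f, 0.75f, 1.0f}; // illustrative data
        long[] shape = {1, 4};

        // Convert the fp32 values into a ShortBuffer holding fp16 bit patterns.
        ShortBuffer fp16 = Fp16Conversions.convertFloatBufferToFp16Buffer(FloatBuffer.wrap(pixels));

        // The OnnxJavaType argument tells ORT to interpret the shorts as fp16 rather than int16.
        try (OnnxTensor tensor = OnnxTensor.createTensor(env, fp16, shape, OnnxJavaType.FLOAT16)) {
            System.out.println(tensor.getInfo());
        }
    }
}
```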

@Xiao-Nine

Thank you for your reply.
My envs:

  • JDK17
  • onnxruntime_gpu 1.17.3

I tried to create an OnnxTensor from a ByteBuffer as in https://github.com/microsoft/onnxruntime/blob/main/java/src/test/java/ai/onnxruntime/OnnxTensorTest.java#L298, but an exception occurred:

java.lang.ClassCastException: class java.nio.HeapByteBuffer cannot be cast to class java.nio.ShortBuffer (java.nio.HeapByteBuffer and java.nio.ShortBuffer are in module java.base of loader 'bootstrap')

	at ai.onnxruntime.OrtUtil.prepareBuffer(OrtUtil.java:524)
	at ai.onnxruntime.OnnxTensor.createTensor(OnnxTensor.java:754)
	at ai.onnxruntime.OnnxTensor.createTensor(OnnxTensor.java:610)
	at ai.onnxruntime.OnnxTensor.createTensor(OnnxTensor.java:589)
	at com.example.demo.DemoApplicationTests.modelTest(DemoApplicationTests.java:58)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

@Craigacp
Contributor

Craigacp commented May 7, 2024

You need to pass in a direct ByteBuffer allocated with ByteBuffer.allocateDirect and then call asShortBuffer() on it. Otherwise it tries to cast the buffer you pass in back to a ShortBuffer to copy it into a direct ByteBuffer and fails. I'll work on improving the checks so it accepts byte buffers of any kind and does the copy, but for the moment you need to prepare the buffer appropriately.
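A minimal sketch of that fix, using hard-coded fp16 bit patterns so it doesn't depend on any conversion helper (shape and values are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

import ai.onnxruntime.OnnxJavaType;
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;

public class DirectBufferFp16 {
    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        long[] inputShape = {1, 3};

        // A heap buffer (ByteBuffer.allocate) triggers the ClassCastException above.
        // Allocate a direct buffer in native byte order and work through its ShortBuffer view instead.
        ByteBuffer direct = ByteBuffer.allocateDirect(3 * 2).order(ByteOrder.nativeOrder());
        ShortBuffer fp16View = direct.asShortBuffer();
        fp16View.put((short) 0x3800); // 0.5 in fp16
        fp16View.put((short) 0x3C00); // 1.0 in fp16
        fp16View.put((short) 0xC000); // -2.0 in fp16
        fp16View.rewind();

        try (OnnxTensor tensor = OnnxTensor.createTensor(env, fp16View, inputShape, OnnxJavaType.FLOAT16)) {
            System.out.println(tensor.getInfo());
        }
    }
}
```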
