FP16 gemm on cpu not implemented! #15780
Replies: 15 comments
-
Hey, this is the MXNet Label Bot.
-
@KhurramPirov This is expected behavior on CPU, where FP16 GEMM is not supported.
-
Hi @pengzhao-intel , yes, I have a memory-leak problem during the predict steps in a loop. All the answers I have found refer to asynchronous operations, which is why I decided to insert something like .asnumpy(), .asscalar(), or just a print to wait for all operations and make them synchronous. It is a well-known problem, but I don't know how to work around it given the FP16 bug.
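For reference, a minimal sketch of what such a synchronization point looks like with the MXNet NDArray API (the values here are illustrative):

    import mxnet as mx

    x = mx.nd.ones((2, 2))
    y = x * 2        # enqueued asynchronously on the MXNet engine
    y.asnumpy()      # blocks until y is actually computed
    mx.nd.waitall()  # or: block on everything queued so far

Any of these calls forces the engine to drain its queue, which is why adding one inside the loop also surfaces the FP16 GEMM error.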
-
My personal suggestion is that if you want to use FP16, please use a GPU that supports FP16. Currently we don't have a native FP16 data type or FP16 instructions on CPU.
-
@TaoLv my GPU supports FP16. As the message "FP16 gemm on cpu" says, the error is about CPU support.
-
Would it be possible to use FP32 GEMM to emulate FP16 GEMM temporarily? Although the performance would be slow, users could at least test FP16 models on CPU.
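At the NDArray level, the proposed emulation amounts to something like this sketch (just the idea, not the actual linalg_impl.h change):

    import mxnet as mx

    def gemm_fp16_via_fp32(a, b):
        # upcast FP16 inputs, do the GEMM in FP32, downcast the result
        out = mx.nd.dot(a.astype('float32'), b.astype('float32'))
        return out.astype('float16')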
-
I tried to use float32 everywhere, but got the same FP16-related error when running the predict step.
-
Is your model trained with FP16? Can you double-check if there are any cast operators in the saved model?
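One way to check, assuming the model was saved as a symbol JSON plus params (the filename below is hypothetical):

    import json
    import mxnet as mx

    sym = mx.sym.load('model-symbol.json')
    # list any Cast operators baked into the graph
    nodes = json.loads(sym.tojson())['nodes']
    print([n['name'] for n in nodes if n['op'] == 'Cast'])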
-
Yeah, for the inputs to my net I have: … and then for predictions I use: … I tried making the inputs np.float32 but got the same FP16 error))
-
@eric-haibin-lin @zhreshold Do you have any experience with this? Training the model with FP16 and then running inference with FP32? I'm afraid the weight parameters are saved with the FP16 data type.
-
I don't think the problem is the 16-to-32 change. I noticed that any command that makes the in-loop actions synchronous leads to the same error I posted above.
The recommendations are taken from …
-
@TaoLv I would appreciate any new advice, because my research is stuck on this.
-
If you want to use a GPU, please make sure the context of your model is GPU. If you want to use a CPU, you possibly need to re-train your model with FP32.
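A minimal sketch of setting the context with the Module API (the symbol and variable names are placeholders):

    import mxnet as mx

    ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
    mod_fc = mx.mod.Module(symbol=sym, context=ctx)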
-
@KhurramPirov is your model trained using fp16? Are you using any pre-trained model? It's possible that some weights are stored in fp16 and you need to cast the checkpoint to fp32.
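Casting a saved checkpoint to fp32 could look roughly like this (the prefix and epoch are hypothetical; adjust to your files):

    import mxnet as mx

    sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
    # cast every saved weight from fp16 to fp32
    arg_params = {k: v.astype('float32') for k, v in arg_params.items()}
    aux_params = {k: v.astype('float32') for k, v in aux_params.items()}
    mx.model.save_checkpoint('model-fp32', 0, sym, arg_params, aux_params)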
-
Description
Tried to add a synchronization call to avoid a memory leak, but got this error.
Environment info (Required)
Package used (Python)
Minimum reproducible example
import mxnet as mx
from mxnet import nd

for k, batch in enumerate(train_dataiter2):
    temp = mod_fc.predict(batch.data[0])
    out = mx.ndarray.cast(data=temp, dtype='float32')
    print(f"out_norm: {nd.norm(out)}")  # force synchronization to avoid the memory leak
mxnet.base.MXNetError: [13:56:17] src/operator/nn/./../linalg_impl.h:166: FP16 gemm on cpu not implemented!