FP16 gemm on cpu not implemented! #15780
Replies: 15 comments
-
Hey, this is the MXNet Label Bot.
-
@KhurramPirov This is expected behavior on CPU, where FP16 GEMM is not supported.
-
Hi @pengzhao-intel , yes, I have a memory-leak problem during the predict steps in a loop. All the answers I have found refer to asynchronous operations, which is why I decided to insert something like .asnumpy(), .asscalar(), or just a print to wait for all operations and make them synchronous. It is a well-known problem, but I don't know how to work around it given the FP16 bug.
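For reference, a minimal sketch of what such a synchronization point looks like with the MXNet NDArray API (the values here are illustrative):

    import mxnet as mx

    x = mx.nd.ones((2, 2))
    y = x * 2        # enqueued asynchronously on the MXNet engine
    y.asnumpy()      # blocks until y is actually computed
    mx.nd.waitall()  # or: block on everything queued so far

Any of these calls forces the engine to drain its queue, which is why adding one inside the loop also surfaces the FP16 GEMM error.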
-
My personal suggestion is that if you want to use FP16, please use a GPU that supports FP16. Currently we don't have a native FP16 data type or FP16 instructions on CPU.
-
@TaoLv my GPU supports FP16. As the message "FP16 gemm on cpu" says, the error is about CPU support.
-
Would it be possible to use FP32 GEMM to emulate FP16 GEMM temporarily? Although the performance would be slow, users could at least test FP16 models on CPU.
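At the NDArray level, the proposed emulation amounts to something like this sketch (just the idea, not the actual linalg_impl.h change):

    import mxnet as mx

    def gemm_fp16_via_fp32(a, b):
        # upcast FP16 inputs, do the GEMM in FP32, downcast the result
        out = mx.nd.dot(a.astype('float32'), b.astype('float32'))
        return out.astype('float16')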
-
I tried to use float32 everywhere, but got the same FP16-related error when running the predict step.
-
Is your model trained with FP16? Can you double-check if there are any cast operators in the saved model?
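One way to check, assuming the model was saved as a symbol JSON plus params (the filename below is hypothetical):

    import json
    import mxnet as mx

    sym = mx.sym.load('model-symbol.json')
    # list any Cast operators baked into the graph
    nodes = json.loads(sym.tojson())['nodes']
    print([n['name'] for n in nodes if n['op'] == 'Cast'])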
-
Yeah, for the inputs to my net I have: … and then for predictions I use: … I tried making the inputs np.float32 but got the same FP16 error))
-
@eric-haibin-lin @zhreshold Do you have any experience with this? Training the model with FP16 and then running inference with FP32? I'm afraid the weight parameters are saved with the FP16 data type.
-
I don't think the problem is the 16-to-32 change. I noticed that any command that makes the in-loop actions synchronous leads to the same error I posted above.
The recommendations are taken from …
-
@TaoLv I would appreciate any new advice, because my research is stuck on this.
-
If you want to use a GPU, please make sure the context of your model is GPU. If you want to use a CPU, you possibly need to re-train your model with FP32.
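A minimal sketch of setting the context with the Module API (the symbol and variable names are placeholders):

    import mxnet as mx

    ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
    mod_fc = mx.mod.Module(symbol=sym, context=ctx)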
-
@KhurramPirov is your model trained using fp16? Are you using any pre-trained model? It's possible that some weights are stored in fp16 and you need to cast the checkpoint to fp32.
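Casting a saved checkpoint to fp32 could look roughly like this (the prefix and epoch are hypothetical; adjust to your files):

    import mxnet as mx

    sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
    # cast every saved weight from fp16 to fp32
    arg_params = {k: v.astype('float32') for k, v in arg_params.items()}
    aux_params = {k: v.astype('float32') for k, v in aux_params.items()}
    mx.model.save_checkpoint('model-fp32', 0, sym, arg_params, aux_params)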
-
Description
Tried to add a synchronization call to avoid a memory leak, but got this error.
Environment info (Required)
Package used (Python)
Minimum reproducible example
import mxnet as mx
from mxnet import nd

for k, batch in enumerate(train_dataiter2):
    temp = mod_fc.predict(batch.data[0])
    out = mx.ndarray.cast(data=temp, dtype='float32')
    print(f"out_norm: {nd.norm(out)}")  # force synchronization to avoid the memory leak
mxnet.base.MXNetError: [13:56:17] src/operator/nn/./../linalg_impl.h:166: FP16 gemm on cpu not implemented!