Is there an easy way to convert ONNX or PB from (NCHW) to (NHWC)? #15
Thank you for raising this with a hobbyist like me; I do deep learning as a hobby and am not an engineer or a researcher.
No. By the way, I've already successfully converted NCHW to NHWC, but in a very primitive way. Since TensorFlow's Conv2D and several other OPs do not support NCHW, I accomplished this by inserting Transpose OPs before and after each OP. While this method can be made to infer correctly, the inserted Transpose OPs result in unnecessary overhead and a significant loss of the original performance. For the NHWC to NCHW direction I used a combination of Keras and OpenVINO's model_optimizer. (Converting in that direction is easy.)
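For illustration, here is a minimal sketch of that primitive approach (my own reconstruction, not the actual converter code; the op wrapper is hypothetical):

```python
import tensorflow as tf

def run_nhwc_op_in_nchw_graph(x_nchw, nhwc_op):
    # TensorFlow's CPU Conv2D only supports NHWC, so every such OP gets
    # sandwiched between two Transpose OPs; these transposes are exactly
    # the overhead that destroys the original performance.
    x_nhwc = tf.transpose(x_nchw, perm=[0, 2, 3, 1])  # NCHW -> NHWC
    y_nhwc = nhwc_op(x_nhwc)
    return tf.transpose(y_nhwc, perm=[0, 3, 1, 2])    # NHWC -> NCHW
```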
Yes. I've described how to do the conversion in some of my blog posts below.
I'm not sure. I'm not really interested in using a high-performance GPU; I only benchmark with low-performance CPUs and edge accelerators. That said, I've seen blog posts in the past where Japanese engineers ran comparative benchmarks between NCHW and NHWC. Note that those articles measured training speed rather than inference performance; the reported training speedup ranged from a few percent to a few dozen percent.
- RaspberryPi4 + CPU only + INT8 + Tensorflow Lite (4 threads) + 256x256: 88 ms/inference
- RaspberryPi4 + CPU only + INT8 + Tensorflow Lite (4 threads) + 416x416: 243 ms/inference
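(For reference, that is roughly 1000/88 ≈ 11.4 FPS at 256x256 and 1000/243 ≈ 4.1 FPS at 416x416.)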
### tensorflow-gpu==1.15.2

```python
from nets.yolo4_tiny import yolo_body  # from the yolov4-tiny-keras repository
from keras.layers import Input

image_input = Input(shape=(416, 416, 3))  # NHWC input
model = yolo_body(image_input, 3, 20)     # 3 anchors per scale, 20 VOC classes
model.summary()

# Dump the architecture to JSON
json_string = model.to_json()
open('yolov4_tiny_voc.json', 'w').write(json_string)
```
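Presumably the matching yolov4_tiny_voc.h5 was produced from the same model object; a one-line sketch (the exact call is my assumption):

```python
model.save_weights('yolov4_tiny_voc.h5')  # assumed counterpart to the JSON dump
```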
@PINTO0309 Thank you for your huge work as a hobby!
It seems it doesn't work fast on RPi4.
Do you know if there is a plan to fix this?
Thanks, it helps a lot.
@AlexeyAB Thank you for your reply.
No. I have posted similar issues, but so far I haven't received a definitive answer. I don't know whether Keras' implementation of YoloV4-tiny correctly replicates the original implementation, but I share your concern. I'm going to try building OpenCV / NCNN myself for the first time in a long time, and then try it on the Pi4.
@AlexeyAB
@PINTO0309 Thanks!
It seems yolov4-tiny speed is the same as mobilenet_yolo on RPi4. Can you try to quantize yolov4-tiny to int8 and test it on RPi4? https://github.com/Tencent/ncnn/tree/master/tools/quantize#user-guide If it does not help a lot, it seems we should try to implement yolov4-tiny with Depthwise/Grouped-convolution.
@AlexeyAB RaspberryPi4 + Ubuntu 19.10 aarch64 + ncnn + CPU only + 4 threads + YoloV4-tiny int8 416x416: 326 ms/inference
@PINTO0309 Thanks!
So int8 isn't faster on RPi4 + NCNN.
So TensorFlow-Lite is 1.25x faster than NCNN.
Thanks! Maybe one day, when I can't stand the speed of int8 anymore, I will try optimizing it 😃
@nihui Are you using Vulkan or self-written functions for int8 inference?
@AlexeyAB
@PINTO0309
@AlexeyAB
@PINTO0309
```python
import torch

# Load EfficientNet-Lite3 from rwightman's repo in an export-friendly form
model = torch.hub.load(
    "rwightman/gen-efficientnet-pytorch",
    "tf_efficientnet_lite3",
    pretrained=True,
    exportable=True
)
rand_example = torch.rand(1, 3, 256, 256)  # NCHW dummy input
output1 = model(rand_example)

traced_model = torch.jit.trace(model, rand_example)
scripted_model = torch.jit.script(model)
torch.onnx.export(model, rand_example, 'model.onnx', opset_version=10)
```

When I tried to do such conversion of …
@AlexeyAB I'm debugging a few things, so please be patient for a moment.
The midasnet group convolution will probably need to be split with tf.keras.layers.SeparableConv2D or tf.nn.separable_conv2d. It looks like it needs a bit of a tricky implementation.
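For reference, grouped convolution is often emulated in plain TensorFlow by splitting the channel axis; a minimal sketch of that idea (my own illustration with made-up shapes and group count, not the midasnet code):

```python
import tensorflow as tf

def grouped_conv2d(x, filters, kernel_size, groups):
    # Emulate grouped convolution: split the NHWC channel axis,
    # convolve each group independently, then concatenate.
    splits = tf.split(x, num_or_size_splits=groups, axis=-1)
    outputs = [
        tf.keras.layers.Conv2D(filters // groups, kernel_size,
                               padding='same', use_bias=False)(s)
        for s in splits
    ]
    return tf.concat(outputs, axis=-1)
```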
Is … Just maybe there is a different layout?
Oh... I'll try it when I get home! 😄
@AlexeyAB Convert Log
The conversion to TFLite was successful. This is Float32, so it is a huge 416 MB. I haven't checked the operation, so I don't know whether it infers correctly.
Do you mean that the Keras h5 model can't be saved, but the TFLite was saved successfully? It seems there is something wrong: https://colab.research.google.com/gist/AlexeyAB/c72d1c1ccb85e59c580725ada26072eb/tflite_midas_1.ipynb
@AlexeyAB
Yes. It fails to save saved_model and h5.
Hmmm. It's not easy. Btw, I also tried converting EfficientNet-lite3, but it seems that the processing after the last ReLU6 is not compatible with TFLite. I haven't confirmed the operation of this one yet, either.
I made such a conversion of pt-weights to tflite-weights for EfficientNet-Lite3 successfully, and the TFLite model works well: https://colab.research.google.com/gist/AlexeyAB/cc05f2690c3707d5e0f66d1e749b5546/weights_torch_to_tf_effnet_lite0.ipynb#scrollTo=WFnIID6iBlsq I only converted the weights; the structure is taken from the official repository, and there is such a ReLU6 implementation, the same as in your repo.
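The elided ReLU6 implementation is presumably the standard min/max formulation; a sketch for reference:

```python
import tensorflow as tf

def relu6(x):
    # ReLU6(x) = min(max(x, 0), 6)
    return tf.minimum(tf.maximum(x, 0.0), 6.0)
```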
@AlexeyAB Unfortunately, it seems this is not currently supported.
```python
GroupedConv2D(
    filters=out_filters,
    kernel_size=[(3, 3), (3, 3), (3, 3), (3, 3)],  # one kernel per group
    strides=[1, 1],
    padding='same',
    use_bias=False,
    use_keras=True)
```

perhaps there will be layout …
@AlexeyAB Midasnet - Float32 - GroupConvolution - TFLite(.tflite)
@PINTO0309
I still think I'm transcribing the weights the wrong way. It's 1 AM in Japan, so I'll try again tomorrow. 😄
@AlexeyAB Please correct just one line below:

```python
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # / 255.0  (the normalization is commented out)
```
@PINTO0309 Also, did you try to convert the default classifier EfficientNet-lite?
@AlexeyAB
@AlexeyAB
I did a crazy implementation, but the conversion to tflite appears to have succeeded. I have not checked the operation. What are the benefits of successfully completing this conversion process?
So your converter can be very useful: https://github.com/PINTO0309/openvino2tensorflow.git
Great! It seems your model works well. I compared 3 EfficientNet-Lite3 models with the same …
I added Softmax at the end of 1 and 3, because 2 uses Softmax. There are some differences here, possibly due to different normalization and different network resolutions:
@AlexeyAB
@PINTO0309 Sorry, it seems there is no error, my mistake ) Great work!
@PINTO0309 Hi, https://twitter.com/PINTO03091/status/1322723345838731265
What is the problem with 'split'? As I understand, you successfully used split for Grouped Convolution. Which YoloV4 PyTorch repository do you mean? https://github.com/WongKinYiu/PyTorch_YOLOv4 or https://github.com/Tianxiaomo/pytorch-YOLOv4 or https://github.com/maudzung/Complex-YOLOv4-Pytorch or https://github.com/maudzung/YOLO3D-YOLOv4-PyTorch ?
@AlexeyAB
@AlexeyAB Unfortunately, there is still a bug in the Reshape operation for 5D tensors that causes the YoloV4 and ShuffleNet conversions to fail.
@PINTO0309 Hi, Thanks, Great!
Are you talking about YOLOv4 or the CSP-P5-P7 models? https://github.com/WongKinYiu/PyTorch_YOLOv4#pretrained-models--comparison
I am testing using the models in the following repositories
This is a bug in my openvino2tensorflow.
Of course. I'm trying every night, but it's hard to solve the problem. If you combine Reshape and Transpose, and the tensor to be transformed is 5D or 6D, the transposition operation is difficult. So far, I haven't come up with any good ideas. For example, I feel that converting [1,256,13,13] to [1,256,13,1,13,1] would be a very complex operation in TensorFlow, as shown below.
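To make the difficulty concrete, here is a rough sketch of what such a step looks like once the surrounding graph is NHWC (my own illustration; the final 6D axis order is an assumption, not the converter's actual output):

```python
import tensorflow as tf

# In the converted graph the tensor is NHWC: [1, 13, 13, 256]
x_nhwc = tf.random.uniform([1, 13, 13, 256])

# Reproducing the original NCHW Reshape [1,256,13,13] -> [1,256,13,1,13,1]
# means transposing back to NCHW, reshaping, and then deciding where the
# channel axis of the 6D result should live in a channels-last layout.
x_nchw = tf.transpose(x_nhwc, perm=[0, 3, 1, 2])      # [1, 256, 13, 13]
x_6d = tf.reshape(x_nchw, [1, 256, 13, 1, 13, 1])     # the problematic 6D shape
x_back = tf.transpose(x_6d, perm=[0, 2, 3, 4, 5, 1])  # hypothetical NHWC-like 6D
```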
Converting ONNX generated by the old master branch to .pb is successful, but converting it to tflite seems to cause an error. https://github.com/WongKinYiu/PyTorch_YOLOv4/tree/master
Yes, there are quite complex transformations here when objects from different branches are merged.
It seems that there is also an issue: TFLite doesn't support all TF operations. Do you get the same issue with …
I first tried to generate onnx from the u5 branch, but couldn't export to onnx in the first place. I'll try a few more things with the u5 branch.
I fixed a bug in openvino2tensorflow and succeeded in converting YOLOv4 to tflite, although I have not yet verified that the conversion is correct. I used the onnx YOLOv4 below.
In anticipation of the conversion to the EdgeTPU model, the PReLU is deliberately changed to a combination of Maximum and Minimum.
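For reference, that substitution follows the standard PReLU identity; a minimal sketch (my illustration, not the converter's exact code):

```python
import tensorflow as tf

def prelu_via_max_min(x, alpha):
    # PReLU(x) = max(x, 0) + alpha * min(x, 0),
    # built from Maximum, Minimum, Mul and Add only.
    return tf.maximum(x, 0.0) + alpha * tf.minimum(x, 0.0)
```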
@PINTO0309
Is it because EdgeTPU doesn't support PReLU? Can you try to convert …
Yes. PReLU was not present in the supported OPs listed at the following URL. The Transpose at the end was in the way, so I edited OpenVINO's .xml to remove it and then converted to .tflite. It looks structurally sound, but I'm not sure if it works correctly.
It was very hard work, but it looks like I was able to refurbish openvino2tensorflow to generate the EdgeTPU model of YOLOv4-tiny. I found that there is a bug in the Resize OP conversion in either edgetpu_compiler or TFLiteConverter.
@PINTO0309 Great!
How did you solve or avoid it?
Did you check it: does it produce approximately the same result as the source yolov4-tiny model? What source model do you use; is it yolov4-tiny?
I used the TensorFlow v2.x or later converters to pass the full-integer quantization model equivalent to … So I combined Lambda OP and … I have been playing with converting models that are committed to various repositories, so in this case I converted the models in the following repositories. The work I carry out is always fickle.
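A rough guess at what the Lambda workaround might look like (purely an assumption on my part; the target size and method are made up):

```python
import tensorflow as tf

# Hypothetical: routing the upsampling through a Lambda layer so that
# TFLiteConverter emits a plain resize OP instead of the pattern that
# triggers the Resize conversion bug.
upsample = tf.keras.layers.Lambda(
    lambda t: tf.image.resize(t, (26, 26), method='nearest'))
```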
Hi @PINTO0309, I tried converting the keras model from https://github.com/bubbliiiing/yolov4-tiny-keras.git but had no luck in the end, as I received a "model not quantized" error when passing the model to the edgetpu_compiler. Here is the process I followed:
1- convert the keras model to frozen graph (.pb)
Is there anything I'm missing here? Here is a glance at the output model file.
Since …

```
$ openvino2tensorflow \
    --model_path={model_path} \
    --output_saved_model True
```

```python
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

def representative_dataset_gen():
    # Feed a handful of real VOC images so the converter can calibrate
    # the quantization ranges.
    for data in raw_test_data.take(10):
        image = data['image'].numpy()
        image = tf.image.resize(image, (416, 416))
        image = image[np.newaxis, :, :, :]
        image = image / 255.
        yield [image]

raw_test_data, info = tfds.load(name="voc/2007", with_info=True,
                                split="validation", data_dir="~/TFDS",
                                download=True)

# Full Integer Quantization - Input/Output=uint8
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
converter.representative_dataset = representative_dataset_gen
converter.inference_input_type = tf.uint8   # EdgeTPU requires quantized I/O
converter.inference_output_type = tf.uint8
tflite_quant_model = converter.convert()
with open('yolov4_416_full_integer_quant.tflite', 'wb') as w:
    w.write(tflite_quant_model)
print("Full Integer Quantization complete! - yolov4_416_full_integer_quant.tflite")
```

```
$ python3 quantization.py
$ edgetpu_compiler -s yolov4_416_full_integer_quant.tflite
```

The normalization process of …
Hi @PINTO0309, thanks for your response. I am having issues converting the model; wondering if you can help? This time I:
The error I get is:
I tried a different tensorflow version but can't get past this error. I would appreciate your help.
@itsmasabdi
Just info for someone who does not like the openvino path:
@PINTO0309 Hi,
Nice work with YOLOv4 / tiny!
As I see you use:
- NCHW for: OpenVINO (xml / bin), Darknet (cfg / weights)
- NHWC for: TFLite, Keras (yolov4_tiny_voc.json / yolov4_tiny_voc.h5), TF1 (pb), TF2 (saved_models.json / saved_models.pb)

I have several questions:
1. Is there an easy way to convert ONNX or PB from (NCHW) to (NHWC)? I've seen converters that add a transpose before and after each layer, but this seems to slow things down a lot. Is it possible to do this transformation without slowing down the inference?
2. Is there an easy way to convert TF1-pb to TF2-saved_models.pb?
3. Is NHWC slowing down execution on the GPU?
4. How many FPS do you get on Google Coral TPU-Edge and RaspberryPi4 for yolov4-tiny (int8)?
5. What script did you use to get yolov4_tiny_voc.json?