Problems with inference in PYNQ-Z1 and emulation stitched IP #855
Comments
Hi @ISPRrj, could you also provide an example input .npy file with the corresponding output reference .npy file?
Hi @auphelia, in this zip file you can find input.npy and the corresponding output.npy produced by the Brevitas model. To obtain the Brevitas output, I first normalized the input by dividing by 255, because I trained the Brevitas network on normalized tensors (via the ToTensor() transformation). I also reshaped the input to (1,3,32,32) to match the dimensions Brevitas expects. When running inference in FINN, I do not normalize the data, because normalization is already part of the model once the preprocessing transformation has been applied. As for the dataset question, I have not performed any quantization. I was trying to use the first MultiThreshold for quantization.
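The two input paths described above can be sketched with NumPy alone. This is only an illustration under stated assumptions: the helper names are hypothetical, the random image stands in for input.npy, and `graph_preprocessing` is a stand-in for the division by 255 that is assumed to be merged into the FINN model.

```python
import numpy as np

def to_brevitas_input(img_u8):
    """Scale a raw UINT8 image to float [0, 1], as ToTensor() would."""
    return img_u8.astype(np.float32) / 255.0

def to_finn_input(img_u8):
    """Keep the raw 0..255 values; the graph is assumed to divide by 255 itself."""
    return img_u8.astype(np.float32)

def graph_preprocessing(x):
    """Stand-in for the division-by-255 node merged into the exported model."""
    return x / 255.0

rng = np.random.default_rng(0)
# Hypothetical stand-in for input.npy, already in (1, 3, 32, 32) layout.
img = rng.integers(0, 256, size=(1, 3, 32, 32), dtype=np.uint8)

ref = to_brevitas_input(img)
finn_seen = graph_preprocessing(to_finn_input(img))
double_scaled = graph_preprocessing(to_brevitas_input(img))

# Both paths should hand the same values to the first quantized layer.
print(np.allclose(ref, finn_seen))
# Normalizing in software as well as in the graph squashes the input range.
print(float(double_scaled.max()))
```

If the two paths disagree at this point, the mismatch is in the input handling rather than in the network itself.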
Hi @ISPRrj, what do you mean by the "application of the preprocessing transformation"? The division by 255 that normalizes UINT8 inputs to FLOAT [0,1]? Does that mean the primary input datatype (going into the first MultiThreshold) is annotated as UINT8? I have two suggestions:
In general (for FLOAT inputs), I will use a QuantIdentity or similar as the input quantization node in Brevitas. Since FINN's MultiThreshold node does not support FLOAT inputs, I remove this node manually from the graph and do the input quantization somewhere else (in software). I know the exact quantization range by reading it from the node's properties (if it was dynamically determined during training) or by setting a fixed range (using
Hi @fpjentzsch,
Thank you very much for your time!
Hi, I would not change the QuantReLU within your model, but you may try to set a fixed quantization range for the input quant node like this (maybe adjust it to use UINT8 instead of INT8):
then you could remove the initial MultiThreshold manually like this (after converting from QONNX to FINN-ONNX):
In any case, the input data type should be properly annotated (see the last line). I'm a bit confused that you didn't run into other issues down the line while your input is set to FLOAT32...
Hi @fpjentzsch! I have made the changes and now I get the following error during inference with FINN:
It looks like the actual tensor datatype is UINT8 now, so ONNXRuntime complains. The tensor datatype should still be float throughout the whole model; FINN uses it as a container datatype to hold integers. The true datatype from FINN's perspective is just annotated in the "quantization: finn_datatype" attribute, which is set using
What if you cast the input tensor to a float datatype before feeding it to the model?
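A minimal sketch of the container-datatype idea, using NumPy only: the array keeps its integer values but is stored as float32 before being fed to the model. The helper name `as_float_container` is hypothetical and not part of FINN.

```python
import numpy as np

def as_float_container(x):
    """Cast an integer-valued array to float32 without changing its values."""
    out = np.asarray(x).astype(np.float32)
    # Guard: every value must still be an exact integer after the cast.
    if not np.array_equal(out, np.round(out)):
        raise ValueError("input holds non-integer values")
    return out

raw = np.array([[0, 17, 128, 255]], dtype=np.uint8)
x = as_float_container(raw)
print(x.dtype, x.tolist())  # float32 [[0.0, 17.0, 128.0, 255.0]]
```

The values are unchanged; only the storage type differs, which is what the runtime checks.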
If I cast the input to a float datatype before feeding it to the model, inference runs. But the problem I see now is that, out of the 100 test cases I am running, the FINN and Brevitas results match in only 40.
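One way to quantify such a mismatch, and to pick out the failing samples for closer inspection, is sketched below. It assumes both models return per-class scores of shape (samples, classes); if the outputs are already class indices, the `argmax` step can be dropped. The function name and the toy arrays are illustrative only.

```python
import numpy as np

def compare_predictions(ref_scores, test_scores):
    """Return the top-1 agreement rate and the indices of mismatching samples."""
    ref_top1 = np.argmax(ref_scores, axis=1)
    test_top1 = np.argmax(test_scores, axis=1)
    mismatches = np.flatnonzero(ref_top1 != test_top1)
    agreement = 1.0 - len(mismatches) / len(ref_top1)
    return agreement, mismatches

# Toy data standing in for Brevitas and FINN outputs (4 samples, 3 classes).
brevitas_out = np.array([[0.1, 0.9, 0.0],
                         [0.8, 0.1, 0.1],
                         [0.2, 0.3, 0.5],
                         [0.6, 0.3, 0.1]])
finn_out = np.array([[0.0, 1.0, 0.0],
                     [0.1, 0.7, 0.2],
                     [0.1, 0.2, 0.7],
                     [0.9, 0.0, 0.1]])

agreement, mismatches = compare_predictions(brevitas_out, finn_out)
print(agreement, mismatches.tolist())  # 0.75 [1]
```

Feeding one of the mismatching samples through the model at each intermediate build step can then help narrow down where the outputs start to diverge.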
Hello @ISPRrj @fpjentzsch, any developments regarding this issue? I have reproduced all the steps so far with quite a similar NN, which reaches only 10% accuracy on MNIST in hardware (it always infers 0).
Hi @rassB, did you also run verification/simulation at different build steps to narrow down where the error is introduced? If the problem is related to input/output shaping and quantization, maybe our new tutorial notebook could be a useful resource, as it covers a custom build step to deal with 8-bit RGB inputs.
Hi @fpjentzsch @rassB @ISPRrj @maltanar @auphelia,
Initial Tidyup Transformation:
I am seeking assistance to identify and resolve this issue. Here are some details about my environment:
ZCU102: PYNQ Linux, based on Ubuntu 18.04 (GNU/Linux 4.19.0-xilinx-v2019.1 a)
I have been working with the cnv_end2end_example and successfully modified it to build the accelerator for a different dataset. The Brevitas model was trained on a dataset with shape 1x1x14x14 and dtype torch.float32. Following the cnv_end2end_example, the first layer performs the quantization, and the ONNX conversion includes pre-processing (ToTensor(), i.e., division by 255 to normalize UINT8 inputs to FLOAT [0,1]) and post-processing (TopK=1).
After create_dataflow_partition, all blocks of the ONNX model are converted to HLS layers except the initial Transpose. Because this first Transpose was not converted to an HLS layer, and the accelerator expects data of shape 1x14x14x1 and dtype UINT8, I reshaped the dataset accordingly for inference on the ZCU102: shape 1x14x14x1 and dtype np.uint8, via (dataset*255).astype(np.uint8). runtime_writeable_weights is enabled (set to 1) in the .json file for the MVAU of the CNV and linear layers, following the guidelines in 4_advanced_builder_settings and cnv-w1a1_folding_config.
I would appreciate any assistance in debugging this issue. @fpjentzsch, you mentioned in your previous reply that reshaping alone might not be sufficient. Could you please provide further guidance, considering my specific setup, to achieve the desired accuracy on the accelerator? Thank you in advance for your help.
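Two details of the conversion described above can be checked in isolation with NumPy. The sketch below is not a diagnosis of this setup; it only shows that (a) for a single channel, reshaping NCHW to NHWC is equivalent to transposing, whereas for several channels it is not, and (b) `astype(np.uint8)` truncates, so rounding first avoids off-by-one pixel values. The helper name is hypothetical.

```python
import numpy as np

def nchw_float_to_nhwc_uint8(x):
    """Convert float [0, 1] NCHW data to UINT8 NHWC data."""
    scaled = np.rint(x * 255.0)        # round to nearest instead of truncating
    scaled = np.clip(scaled, 0, 255)   # guard against values slightly outside [0, 1]
    return scaled.transpose(0, 2, 3, 1).astype(np.uint8)

rng = np.random.default_rng(0)

# Single channel: transpose and reshape give the same NHWC result.
gray = rng.random((1, 1, 14, 14), dtype=np.float32)
same_for_c1 = np.array_equal(gray.transpose(0, 2, 3, 1), gray.reshape(1, 14, 14, 1))

# Three channels: reshape scrambles the pixel order, transpose does not.
rgb = rng.random((1, 3, 14, 14), dtype=np.float32)
same_for_c3 = np.array_equal(rgb.transpose(0, 2, 3, 1), rgb.reshape(1, 14, 14, 3))

# Truncation vs rounding: 0.999 * 255 = 254.745
truncated = (np.array([0.999]) * 255).astype(np.uint8)
rounded = np.rint(np.array([0.999]) * 255).astype(np.uint8)

print(same_for_c1, same_for_c3, truncated.tolist(), rounded.tolist())
```

For the 1x1x14x14 data in this comment the reshape itself is therefore harmless; the rounding behaviour is the part worth double-checking.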
Hi @fpjentzsch @rassB @ISPRrj @maltanar @auphelia @heborras @Tobi-Alonso @quetric @mmrahorovic @preusser,
Initially, during ONNX execution, I achieved an accuracy of 86% after applying the tidy-up, pre-processing, and post-processing transformations. However, after applying the streamline transformations, the accuracy dropped to 68%. This drop persisted when deploying the model onto an FPGA. To give you a clearer picture, here are the streamline transformations I've implemented:
I also tried finn.builder.build_dataflow, which showed the same issue, i.e. the accuracy drops once the streamline transformations are applied. Only when I remove the transformation "model = model.transform(LowerConvsToMatMul())" do I get the same 86% accuracy. I know that this transformation is needed, because the convolutions have to be lowered to matmuls before the model can be converted to HLS-compatible nodes.
The only other difference I see is in the finn_datatype of multithreshold_1 and multithreshold_2: they are Binary with LowerConvsToMatMul (accuracy of 68%) and Bipolar without it (accuracy of 86%).
I'm at a loss as to why this transformation causes such a significant accuracy drop. Is it due to the even kernel size, i.e. 6x6, that I am using in QuantConv2d? Any insights or suggestions you could offer would be greatly appreciated. Thank you for your time and assistance.
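For context on the Binary versus Bipolar annotations, the sketch below shows the standard arithmetic relation between the two encodings: a dot product of bipolar {-1, +1} vectors can be recovered from their binary {0, 1} encodings via XNOR and popcount. It is written with NumPy only, the function names are hypothetical, and it does not explain the accuracy drop by itself; it only illustrates why the two annotations are not interchangeable without the matching arithmetic.

```python
import numpy as np

def bipolar_to_binary(x):
    """Map bipolar values {-1, +1} to binary values {0, 1}."""
    return (x + 1) // 2

def xnor_popcount_dot(a_bin, b_bin):
    """Dot product of two bipolar vectors, computed from their binary encodings."""
    n = a_bin.size
    matches = int(np.sum(a_bin == b_bin))  # XNOR followed by popcount
    return 2 * matches - n

rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=36)  # e.g. one flattened 6x6 kernel
b = rng.choice([-1, 1], size=36)

bipolar_dot = int(a @ b)
recovered = xnor_popcount_dot(bipolar_to_binary(a), bipolar_to_binary(b))
# Multiplying the binary encodings directly computes a different quantity.
naive_binary_dot = int(bipolar_to_binary(a) @ bipolar_to_binary(b))

print(bipolar_dot, recovered, naive_binary_dot)
```

If values annotated as Binary are consumed by arithmetic that expects Bipolar (or the other way round), the results differ in the way `naive_binary_dot` differs from `bipolar_dot`.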
I had the same issue, and I think the reason is the bias in the convolution layers. During the export to QONNX format, the bias quant initializer was not exported, so the ExtractConvBias transformation that runs during the conversion from QONNX to FINN-ONNX failed to add an "Add" node in front of the "Conv" node. Check the logs to see if you encounter a "Could not extract bias from node" warning, and remove the biases from the network if so.
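A small helper for the log check suggested above might look like the following. Only the warning text comes from this thread; the log excerpt and its format are hypothetical, so the matching may need to be adapted to the real build output.

```python
WARNING_TEXT = "Could not extract bias from node"

def find_bias_warnings(log_text):
    """Return every log line that contains the bias-extraction warning."""
    return [line.strip() for line in log_text.splitlines() if WARNING_TEXT in line]

# Hypothetical log excerpt; the real format of the build log may differ.
sample_log = """\
INFO: converting QONNX to FINN-ONNX
WARNING: Could not extract bias from node Conv_0
INFO: tidy-up finished
"""

hits = find_bias_warnings(sample_log)
print(len(hits), hits)
```

In practice the text would be read from the captured console output or log file of the build, for example with `open(path).read()`.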
Versions
Commit hash
commit e76f20d (HEAD -> dev, origin/dev)
Merge: a3b6a7f 3873325
Author: auphelia
Date: Tue Jul 11 10:21:26 2023 +0100
commit 3873325
Author: auphelia
Date: Tue Jul 11 09:44:30 2023 +0100
commit a3b6a7f
Merge: e56e813 96fc4f5
Author: auphelia
Date: Mon Jul 10 09:14:01 2023 +0100
commit 96fc4f5
Author: auphelia
Date: Fri Jul 7 15:54:13 2023 +0100
commit 7924bf7
Author: auphelia
Date: Fri Jul 7 14:31:14 2023 +0100
commit 391cd76
Author: auphelia
Date: Fri Jul 7 12:07:42 2023 +0100
commit a48b503
Author: auphelia
Date: Thu Jul 6 16:50:29 2023 +0100
commit 0cd757f
Author: auphelia
Date: Thu Jul 6 15:50:01 2023 +0100
Quick summary
I am trying to implement the LeNet-5 network on the PYNQ-Z1 board. I created the network using Brevitas and obtained an accuracy of about 55% after training.
I followed all the steps of the FINN end-to-end flow, including all the intermediate checks (and emulation via PyVerilator).
At first I thought that all the intermediate checks were working correctly, so I deployed to the PYNQ board, where I got only 8% accuracy, well below the 55% obtained with Brevitas.
But the other day I realised that the stitched IP emulation always produces the same output value, regardless of the input value.
Details
I'm using the following dataset: https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
Steps to Reproduce
Network training
Brevitas export
6. Conversion to HLS layers
7. Dataflow partitioning
8. Folding
Simulation cppsim: works correctly
Emulation node by node PyVerilator: works correctly
Emulation stitched IP PyVerilator: PROBLEM: always get the same output value, regardless of the input value
Deployment on PYNQ: PROBLEM: 8% inference accuracy