Problems with inference in PYNQ-Z1 and emulation stitched IP #855

ISPRrj · 2023-07-15T11:43:13Z

Versions

PYNQ Z1: v3.0.1
FINN: v0.9
Xilinx tools: 2022.2
Ubuntu: 20.04

Commit hash

commit e76f20d (HEAD -> dev, origin/dev)
Merge: a3b6a7f 3873325
Author: auphelia [email protected]
Date: Tue Jul 11 10:21:26 2023 +0100

Merge pull request #852 from Xilinx/fix/alveo_build

Set axilite address range to a minimum of 4K

commit 3873325
Author: auphelia [email protected]
Date: Tue Jul 11 09:44:30 2023 +0100

[AlveoBuild] Set axilite address range to a minimum of 4K

commit a3b6a7f
Merge: e56e813 96fc4f5
Author: auphelia [email protected]
Date: Mon Jul 10 09:14:01 2023 +0100

Merge pull request #844 from Xilinx/feature/2022_2

Dev PR to update Docker environment to Ubuntu 22, Python 3.10 and Xilinx tool version

commit 96fc4f5
Author: auphelia [email protected]
Date: Fri Jul 7 15:54:13 2023 +0100

[Deps] Update qonnx version

commit 7924bf7
Author: auphelia [email protected]
Date: Fri Jul 7 14:31:14 2023 +0100

[NBs] Update notebooks to only use QONNX export

commit 391cd76
Author: auphelia [email protected]
Date: Fri Jul 7 12:07:42 2023 +0100

[deps] Bump clize to 5.0.1 and sigtools to 4.0.1

commit a48b503
Author: auphelia [email protected]
Date: Thu Jul 6 16:50:29 2023 +0100

[Tests] Update tests to only use qonnx export

commit 0cd757f
Author: auphelia [email protected]
Date: Thu Jul 6 15:50:01 2023 +0100

Quick summary

I am trying to implement the LeNet-5 network on the PYNQ-Z1 board. I built the network with Brevitas and reached about 55% accuracy after training.

I have followed all the steps of the FINN end-to-end flow and even all the intermediate checks (including emulation via PyVerilator).

At first I thought that all the intermediate checks were passing, so I deployed to the PYNQ board, but I got only 8% accuracy, well below the 55% obtained with Brevitas.

I then realised that the stitched IP emulation always returns the same output value, regardless of the input value.

Details

I'm using the following dataset: https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz

Steps to Reproduce


1. I create the LeNet network in Brevitas. (Note that I use a QuantIdentity with a bit width of 8 at the input and biases in every layer except the last one, where I leave the bias out to avoid problems in the later conversion to HLS layers.)

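import torch
import torch.nn.functional as F
from torch.nn import Module
import brevitas.nn as qnn

BIT_WIDTH = 2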

class QuantWeightActBiasLeNet(Module):
    def __init__(self):
        super(QuantWeightActBiasLeNet, self).__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=8, return_quant_tensor=True)
        self.conv1 = qnn.QuantConv2d(3, 6, 5, bias=True, weight_bit_width=BIT_WIDTH)
        self.relu1 = qnn.QuantReLU(bit_width=BIT_WIDTH, return_quant_tensor=True)
        self.conv2 = qnn.QuantConv2d(6, 16, 5, bias=True, weight_bit_width=BIT_WIDTH)
        self.relu2 = qnn.QuantReLU(bit_width=BIT_WIDTH, return_quant_tensor=True)
        self.fc1   = qnn.QuantLinear(16*5*5, 120, bias=True, weight_bit_width=BIT_WIDTH)
        self.relu3 = qnn.QuantReLU(bit_width=BIT_WIDTH, return_quant_tensor=True)
        self.fc2   = qnn.QuantLinear(120, 84, bias=True, weight_bit_width=BIT_WIDTH)
        self.relu4 = qnn.QuantReLU(bit_width=BIT_WIDTH, return_quant_tensor=True)
        self.fc3   = qnn.QuantLinear(84, 5, bias=False, weight_bit_width=BIT_WIDTH)

    def forward(self, x):
        out = self.quant_inp(x)
        out = self.relu1(self.conv1(out))
        out = F.max_pool2d(out, 2)
        out = self.relu2(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = torch.flatten(out,1)
        out = self.relu3(self.fc1(out))
        out = self.relu4(self.fc2(out))
        out = self.fc3(out)
       
        return out
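
As a quick sanity check on the 16*5*5 input size of fc1, the feature-map sizes can be traced by hand. This is a minimal sketch assuming a 32x32 input and the PyTorch defaults for the layers shown (stride 1 and no padding for the convolutions, stride equal to the kernel size for max_pool2d). The helper names conv_out and pool_out are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution (floor division as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output width/height of a max pool whose stride equals its kernel size."""
    return (size - kernel) // kernel + 1


size = 32                  # assumed input height/width
size = conv_out(size, 5)   # conv1: 32 -> 28
size = pool_out(size, 2)   # pool:  28 -> 14
size = conv_out(size, 5)   # conv2: 14 -> 10
size = pool_out(size, 2)   # pool:  10 -> 5
flat = 16 * size * size    # 16 output channels of conv2, flattened
print(size, flat)
```

With these assumptions the flattened size matches the 16*5*5 used for fc1, so a different input resolution would require changing that layer.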

2. Network training
3. Brevitas export

ready_model_filename = "Lenet_quantized.onnx"
export_qonnx(model,torch.randn(1,3,32,32), ready_model_filename)
qonnx_cleanup(ready_model_filename, out_file=ready_model_filename)

4. Tidy up, pre and post processing

5. Lowering and streamlining transformations

model = ModelWrapper("lenet_quantized_pre_post.onnx")
model = model.transform(MoveScalarLinearPastInvariants())
model = model.transform(Streamline())
model = model.transform(LowerConvsToMatMul())
model = model.transform(MakeMaxPoolNHWC())
model = model.transform(absorb.AbsorbTransposeIntoMultiThreshold())

model = model.transform(MakeMaxPoolNHWC())
model = model.transform(absorb.AbsorbConsecutiveTransposes())

model = model.transform(Streamline())

model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())
model = model.transform(InferDataLayouts())
model = model.transform(RemoveUnusedTensors())
model.save("lenet_quantized_streamlined.onnx")

6. Conversion to HLS layers

mem_mode = "decoupled"

model = model.transform(to_hls.InferBinaryMatrixVectorActivation(mem_mode))
model = model.transform(to_hls.InferQuantizedMatrixVectorActivation(mem_mode))

model = model.transform(to_hls.InferLabelSelectLayer())
model = model.transform(to_hls.InferThresholdingLayer())
model = model.transform(GiveUniqueNodeNames())

model = model.transform(to_hls.InferThresholdingLayer())
model = model.transform(to_hls.InferConvInpGen())
model = model.transform(to_hls.InferStreamingMaxPool())

model = model.transform(RemoveCNVtoFCFlatten())

model = model.transform(absorb.AbsorbConsecutiveTransposes())

model = model.transform(InferDataLayouts())

model.save("lenet_hls_layers.onnx")

7. Dataflow partitioning

8. Folding

model = ModelWrapper("flowers_dataflow_model.onnx")
fc_layers = model.get_nodes_by_op_type("MatrixVectorActivation")
# each tuple is (PE, SIMD) for a layer
folding = [
    (6, 3),
    (2, 6),
    (2, 2),
    (1, 1),
    (1, 1),
]
for fcl, (pe, simd) in zip(fc_layers, folding):
    fcl_inst = getCustomOp(fcl)
    fcl_inst.set_nodeattr("PE", pe)
    fcl_inst.set_nodeattr("SIMD", simd)
    

# use same SIMD values for the sliding window operators
swg_layers = model.get_nodes_by_op_type("ConvolutionInputGenerator")
for i in range(len(swg_layers)):
    swg_inst = getCustomOp(swg_layers[i])
    simd = folding[i][1]
    swg_inst.set_nodeattr("SIMD", simd)
    

model = model.transform(GiveUniqueNodeNames())
model.save("flowers_lenet_folded.onnx")
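
The folding values above can be checked by hand before running the build. To my understanding, FINN expects PE to divide the number of matrix rows (MH, the output channels or features) and SIMD to divide the number of matrix columns (MW), and the sliding window generator's SIMD to divide its input channel count. The sketch below assumes the five MatrixVectorActivation nodes are returned in network order (conv1, conv2, fc1, fc2, fc3) and derives MW and MH from the LeNet definition in this report. The layer table and the check_folding helper are hypothetical, not part of FINN.

```python
# (name, MW, MH): MW = in_channels * k * k for lowered convs, in_features for linear layers
layers = [
    ("conv1", 3 * 5 * 5, 6),
    ("conv2", 6 * 5 * 5, 16),
    ("fc1", 16 * 5 * 5, 120),
    ("fc2", 120, 84),
    ("fc3", 84, 5),
]
folding = [(6, 3), (2, 6), (2, 2), (1, 1), (1, 1)]  # (PE, SIMD), as in the report
swg_channels = [3, 6]  # input channels seen by the two sliding window generators


def check_folding(layers, folding, swg_channels):
    """Return a list of human-readable divisibility violations (empty if none)."""
    problems = []
    for (name, mw, mh), (pe, simd) in zip(layers, folding):
        if mh % pe != 0:
            problems.append(f"{name}: PE={pe} does not divide MH={mh}")
        if mw % simd != 0:
            problems.append(f"{name}: SIMD={simd} does not divide MW={mw}")
    for i, channels in enumerate(swg_channels):
        simd = folding[i][1]  # the report reuses the MVAU SIMD for the SWG
        if channels % simd != 0:
            problems.append(f"swg{i}: SIMD={simd} does not divide channels={channels}")
    return problems


problems = check_folding(layers, folding, swg_channels)
print(problems or "folding is consistent")
```

Under these assumptions the reported folding passes the divisibility checks, which suggests the folding itself is not the source of the mismatch.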

9. Simulation cppsim: works correctly
10. Emulation node by node PyVerilator: works correctly
11. Emulation stitched IP PyVerilator: PROBLEM: always get the same output value, regardless of the input value
12. Deployment on PYNQ: PROBLEM: 8% inference accuracy


auphelia · 2023-07-18T09:13:40Z

Hi @ISPRrj ,

Could you also provide an example input .npy file with corresponding output reference .npy file?
FINN expects integer values for all components, is your data set quantized or are you trying to use the first MultiThreshold for quantization?

ISPRrj · 2023-07-18T12:34:17Z

Hi @auphelia,

inputoutput.zip

In this zip file you can find the input.npy and the corresponding output.npy after applying the brevitas model.

To obtain the output of the Brevitas model, I first normalized the input by dividing by 255, because I trained the Brevitas network on normalized tensors (via the ToTensor() transformation). I also reshaped the input to (1,3,32,32) to get the dimensions expected by Brevitas.

For inference in FINN, on the other hand, I do not normalize the data, because the normalization is already part of the model once the preprocessing transformation has been applied.

As for the data set question, I have not performed any quantization. I was trying to use the first MultiThreshold for quantization.

fpjentzsch · 2023-07-21T15:12:03Z

Hi @ISPRrj,

what do you mean by the "application of the preprocessing transformation"? The division by 255 that normalizes UINT8 inputs to FLOAT [0,1]? Does that mean the primary input datatype (going into the first MultiThreshold) is annotated as UINT8?

I have two suggestions:

Do you transpose the input data before feeding it to the stitched-ip and hardware accelerator? The initial "Transpose" node after HLS conversion will not be handled by the accelerator. FINN's HLS layers all operate on NHWC data layout, while you train on NCHW. Reshaping will not be enough.
There might be something wrong with the way you normalize/quantize the input. Could you try to train directly on UINT8 inputs, so that you do not need the initial MultiThreshold and the input pixels (0-255) can be consumed without normalization by the first ConvolutionInputGenerator?
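
To illustrate the layout point in the first suggestion: a reshape from NCHW to NHWC produces the right shape but the wrong element order, while a transpose moves the channel axis correctly. This is a small NumPy sketch on a toy 3-channel tensor; the array names are illustrative only.

```python
import numpy as np

# Toy NCHW tensor: batch 1, 3 channels, 4x4 pixels
x_nchw = np.arange(1 * 3 * 4 * 4, dtype=np.uint8).reshape(1, 3, 4, 4)

x_nhwc = x_nchw.transpose(0, 2, 3, 1)   # moves the channel axis to the end
x_wrong = x_nchw.reshape(1, 4, 4, 3)    # same shape, but elements are scrambled

same_shape = x_nhwc.shape == x_wrong.shape
same_values = np.array_equal(x_nhwc, x_wrong)
print(same_shape, same_values)

# With a single channel, reshape and transpose happen to agree
g = np.arange(16, dtype=np.uint8).reshape(1, 1, 4, 4)
single_channel_equal = np.array_equal(g.transpose(0, 2, 3, 1), g.reshape(1, 4, 4, 1))
print(single_channel_equal)
```

The single-channel case is worth keeping in mind: there a reshape gives the same result as a transpose, so a layout bug of this kind would only show up with multi-channel (e.g. RGB) inputs.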

In general (for FLOAT inputs), I will use a QuantIdentity or similar as the input quantization node in Brevitas. Since FINN's MultiThreshold node does not support FLOAT inputs, I remove this node manually from the graph and do the input quantization somewhere else (in software). I know the exact quantization range by reading it from the node's properties (if it was dynamically determined during training) or by setting a fixed range (using min_val, max_val, and scaling_impl_type = ScalingImplType.CONST).
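
Doing the input quantization in software could look roughly like the following. This is only a sketch under stated assumptions: an unsigned quantizer with a fixed range [0, max_val], a bit width of at most 8, and a scale of max_val / (2^bit_width - 1). The real scale (and any zero point) should be read from the trained or exported model, and quantize_input is a hypothetical helper name.

```python
import numpy as np


def quantize_input(x, max_val=1.0, bit_width=8):
    """Map floats in [0, max_val] to unsigned integers in [0, 2**bit_width - 1]."""
    levels = 2 ** bit_width - 1
    scale = max_val / levels                      # assumed fixed scale
    q = np.clip(np.round(x / scale), 0, levels)   # round, then saturate
    return q.astype(np.uint8), scale


x = np.array([0.0, 1.0, 1.2], dtype=np.float32)
q, scale = quantize_input(x)
print(q, scale)
```

Values outside the assumed range saturate at the largest level instead of wrapping around, which is why the clip comes before the cast.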

ISPRrj · 2023-07-28T12:11:58Z

Hi @fpjentzsch,
thank you very much for your feedback!

> what do you mean by the "application of the preprocessing transformation"? The division by 255 that normalizes UINT8 inputs to FLOAT [0,1]? Does that mean the primary input datatype (going into the first MultiThreshold) is annotated as UINT8?

Yes, I mean the division by 255 that normalizes UINT8 inputs to FLOAT [0,1]. But that does not mean that the input datatype going into the first MultiThreshold is annotated as UINT8. In fact it is annotated as float32 as you can see in the image below.

> Do you transpose the input data before feeding it to the stitched-ip and hardware accelerator?

Yes, I was aware of that and transposed the data before passing it to the accelerator.

> There might be something wrong with the way you normalize/quantize the input. Could you try to train directly on UINT8 inputs, so that you do not need the initial MultiThreshold and the input pixels (0-255) can be consumed without normalization by the first ConvolutionInputGenerator?

I have tried to perform the training directly with UINT8 but when I try to perform the inference with FINN I get the following error. Is this because I have to manually remove the initial MultiThreshold? If so, how can I do it?

> In general (for FLOAT inputs), I will use a QuantIdentity or similar as the input quantization node in Brevitas. Since FINN's MultiThreshold node does not support FLOAT inputs, I remove this node manually from the graph and do the input quantization somewhere else (in software). I know the exact quantization range by reading it from the nodes properties (if it was dynamically determined during training) or by setting a fixed range (using min_val, max_val, and scaling_impl_type = ScalingImplType.CONST).

So in my case, would you change the QuantReLU for QuantIdentity?

Thank you very much for your time!

fpjentzsch · 2023-07-31T17:33:41Z

Hi,

I would not change the QuantReLU within your model, but you may try to set a fixed quantization range for the input quant node like this (maybe adjust it to use UINT8 instead of INT8):

from brevitas.inject.defaults import Int8ActPerTensorFloatMinMaxInit
from brevitas.inject.enum import ScalingImplType

# config: your own settings dict (not shown in this thread)
class InputQuantizer(Int8ActPerTensorFloatMinMaxInit):
    min_val = -config["in_quant_range"]
    max_val = config["in_quant_range"]
    scaling_impl_type = ScalingImplType.CONST
    bit_width = config["in_quant_bits"]

# inside the model's __init__:
self.quant_inp = qnn.QuantHardTanh(act_quant=InputQuantizer, return_quant_tensor=True)

then you could remove the initial MultiThreshold manually like this (after converting from QONNX to FINN-ONNX):

from qonnx.core.datatype import DataType

first_node = model.graph.node[0]
if first_node.op_type == "MultiThreshold":
    quantized_input_dtype = model.get_tensor_datatype(first_node.output[0])
    # remove nodes up to first Mul (= MT + Add used for input quant)
    new_input_node = model.get_nodes_by_op_type("Mul")[0]
    new_input_tensor = model.get_tensor_valueinfo(new_input_node.input[0])
    old_input_tensor = model.graph.input[0]
    model.graph.input.remove(old_input_tensor)
    model.graph.input.append(new_input_tensor)
    model.graph.value_info.remove(new_input_tensor) # remove redundant value_info
    new_input_index = model.get_node_index(new_input_node)
    del model.graph.node[0:new_input_index]
    # make sure input datatype is set correctly
    model.set_tensor_datatype(model.graph.input[0].name, quantized_input_dtype)
else:
    model.set_tensor_datatype(model.graph.input[0].name, DataType["UINT8"])

In any case, the input data type should be properly annotated (see the last line). I'm a bit confused that you didn't run into other issues down the line while your input is set to FLOAT32...

ISPRrj · 2023-08-03T10:05:15Z

Hi @fpjentzsch !

I have made the changes and now I get the following error in the inference with finn:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(uint8)) , expected: (tensor(float))

fpjentzsch · 2023-08-03T10:16:52Z

It looks like the actual tensor datatype is UINT8 now, so ONNXRuntime complains. The tensor datatype should still be float throughout the whole model; FINN uses it as a container datatype to hold integers. The true datatype from FINN's perspective is just annotated in the "quantization: finn_datatype" attribute, which is set using model.set_tensor_datatype(model.graph.input[0].name, DataType["UINT8"]) for example.

What if you cast the input tensor to float datatype before feeding it to the model?
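
The container convention can be illustrated with plain NumPy: the values stay integers in the UINT8 range and only the storage type changes to float32. This sketch does not touch the FINN model or its annotation; the input shape is assumed from the export call earlier in the thread, and the random data is a stand-in for a real image.

```python
import numpy as np

# Stand-in for one 8-bit RGB image (shape assumed: 1x3x32x32)
x_uint8 = np.random.randint(0, 256, size=(1, 3, 32, 32), dtype=np.uint8)

# Cast to the float32 container type expected by ONNX Runtime
x_container = x_uint8.astype(np.float32)

# The values are still integers within the UINT8 range
is_integer_valued = np.array_equal(x_container, np.round(x_container))
in_range = x_container.min() >= 0 and x_container.max() <= 255
print(x_container.dtype, is_integer_valued, in_range)
```

Note that the cast must not be combined with a division by 255 when the model already contains the normalization as a preprocessing step.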

ISPRrj · 2023-08-03T10:51:00Z

If I cast the input to float before feeding it to the model, inference runs. But now the problem is that, out of the 100 test cases I am running, the FINN and Brevitas predictions match in only 40.
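
A comparison like the one described here can be made reproducible with a small helper that reports the agreement rate and the indices of the disagreeing samples, so that individual mismatches can be inspected. This is a sketch that assumes both models return a score vector per sample. If the FINN model already ends in a TopK node and returns a label, the labels should be compared directly instead. The agreement helper and the toy data are illustrative.

```python
import numpy as np


def agreement(ref_outputs, test_outputs):
    """Return (fraction of matching argmax labels, indices of mismatching samples)."""
    ref_labels = [int(np.argmax(o)) for o in ref_outputs]
    test_labels = [int(np.argmax(o)) for o in test_outputs]
    mismatches = [i for i, (r, t) in enumerate(zip(ref_labels, test_labels)) if r != t]
    rate = 1.0 - len(mismatches) / len(ref_labels)
    return rate, mismatches


# Toy score vectors standing in for Brevitas (reference) and FINN outputs
ref = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
test = [[0.2, 0.8], [0.4, 0.6], [0.1, 0.9]]
rate, mismatches = agreement(ref, test)
print(rate, mismatches)
```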

rassB · 2023-09-28T20:49:52Z

Hello @ISPRrj @fpjentzsch, any developments regarding this issue? I have reproduced all the steps so far with quite a similar NN that reaches only 10% accuracy on MNIST in hardware (it always infers 0).

fpjentzsch · 2023-10-16T16:09:44Z

Hi @rassB, did you also run verification/simulation at different build steps to narrow down where the error is introduced? If the problem is related to input/output shaping and quantization, maybe our new tutorial notebook could be a useful resource, as it covers a custom build step to deal with 8-bit RGB inputs.
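
One way to organise verification at different build steps is to run the same inputs through each intermediate model in order and report the first step whose outputs differ from the reference. The sketch below shows only that bookkeeping. The stage names and the lambda "models" are placeholders, and in practice each callable would wrap executing the model saved at that step.

```python
def first_divergent_stage(stages, inputs, reference, equal):
    """Return the name of the first stage whose outputs differ from the reference, or None."""
    for name, run in stages:
        outputs = [run(x) for x in inputs]
        if not all(equal(o, r) for o, r in zip(outputs, reference)):
            return name
    return None


# Placeholder stages: the last one ignores its input, like the stitched IP in this report
stages = [
    ("exported", lambda x: 2 * x),
    ("streamlined", lambda x: 2 * x),
    ("stitched_ip", lambda x: 0),
]
inputs = [1, 2, 3]
reference = [2, 4, 6]
culprit = first_divergent_stage(stages, inputs, reference, lambda a, b: a == b)
print(culprit)
```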

shakeelakram00 · 2024-02-22T13:06:27Z

Hi @fpjentzsch @rassB @ISPRrj @maltanar @auphelia ,
I have been closely following the ongoing discussion and have run into a similar accuracy discrepancy with my setup. Specifically, the accuracy of the Brevitas model is 86%, while the accelerator on the ZCU102 reaches only 66%. After the initial tidy-up transformations shown below, the accuracy of the ONNX-converted model remains consistent with the Brevitas model. However, applying the remaining transformations and building the accelerator results in a drop to 66% accuracy on the board.

Initial Tidyup Transformation:
bo.export_finn_onnx(brevitas_model, (1, 1, 14, 14), "export.onnx");
model = ModelWrapper("export.onnx")
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
...
output_dict = oxe.execute_onnx(model_t, input_dict)

I am seeking assistance to identify and resolve this issue. Here are some details about my environment:

ZCU102: PYNQ Linux, based on Ubuntu 18.04 (GNU/Linux 4.19.0-xilinx-v2019.1 a)
FINN: v0.9
Xilinx tools: 2022.2
Ubuntu: 22.04.1 LTS

I have been working with the cnv_end2end_example and successfully modified it to build the Accelerator on a different dataset. The brevitas model was trained on a dataset with a shape of 1x1x14x14 and dtype torch.float32.

Following the cnv_end2end_example, the first layer performs the quantization, and the ONNX conversion includes pre-processing (ToTensor(), i.e., division by 255 to normalize UINT8 inputs to FLOAT [0,1]) and post-processing (TopK=1). After create_dataflow_partition, all blocks of the ONNX model are converted into HLS layers, except the initial Transpose.

Given that the first Transpose was not converted to an HLS layer, and the accelerator works with a dataset of shape 1x14x14x1 and dtype UINT8, I reshaped the dataset accordingly for inference on ZCU102 (1x14x14x1 and dtype np.uint8, i.e. (dataset*255).astype(np.uint8)).
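
One detail of this conversion that may be worth checking: astype(np.uint8) truncates toward zero, so if a normalized float32 value times 255 lands slightly below the original integer, the result is off by one. Rounding before the cast avoids this. The sketch below counts how many of the 256 possible pixel values survive the round trip with each approach. It makes no claim about how many truncation errors occur on a given platform, only that rounding recovers every value.

```python
import numpy as np

pixels = np.arange(256, dtype=np.uint8)
normalized = pixels.astype(np.float32) / 255.0   # ToTensor()-style scaling to [0, 1]

truncated = (normalized * 255).astype(np.uint8)          # cast truncates toward zero
rounded = np.round(normalized * 255).astype(np.uint8)    # round first, then cast

n_truncation_errors = int(np.count_nonzero(truncated != pixels))
n_rounding_errors = int(np.count_nonzero(rounded != pixels))
print(n_truncation_errors, n_rounding_errors)
```

If the first count is non-zero, some inputs reach the accelerator one level too low, which is a possible source of small accuracy differences (though not necessarily of a 20-point drop).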

Runtime_writeable_weights are enabled (set to 1) in the .json file for MVAU of CNV and Linear Layers, following the guidelines in 4_advanced_builder_settings and cnv-w1a1_folding_config.

I would appreciate any assistance in debugging this issue.

@fpjentzsch, you mentioned in your previous reply that reshaping alone might not be sufficient. Could you please provide further guidance, considering my specific setup, to achieve the desired accuracy on the accelerator?

Thank you in advance for your help.


shakeelakram00 · 2024-03-13T14:51:47Z

Hi @fpjentzsch @rassB @ISPRrj @maltanar @auphelia @heborras @Tobi-Alonso @quetric @mmrahorovic @preusser,
I've been diligently verifying each stage of FINN Flow for the above query, and I've run into a perplexing issue that I could use some guidance on.

Initially, during the ONNX execution, I achieved a commendable accuracy of 86% after applying tidy-up transformations, pre and post-processing transformations. However, upon proceeding with the streamline transformations, I encountered a significant drop in accuracy to 68%. This drop persisted when deploying the model onto an FPGA.

To give you a clearer picture, here are the streamline transformations I've implemented:
model = model.transform(MoveScalarLinearPastInvariants())
model = model.transform(Streamline())
model = model.transform(LowerConvsToMatMul())
model = model.transform(MakeMaxPoolNHWC())
model = model.transform(Streamline())
model = model.transform(absorb.AbsorbTransposeIntoMultiThreshold())
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(Streamline())
model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())
model = model.transform(InferDataLayouts())
model = model.transform(RemoveUnusedTensors())

I also tried finn.builder.build_dataflow; it showed the same issue, i.e. the accuracy drops once the streamlining transformations are applied.

Only when I remove the "model = model.transform(LowerConvsToMatMul())" transformation do I get the same 86% accuracy. I know this transformation is needed, because the convolutions have to be lowered to MatMul before the model can be converted to HLS-compatible nodes. The only other difference I see is the finn_datatype of multithreshold_1 and multithreshold_2: it is BINARY with LowerConvsToMatMul (68% accuracy) and BIPOLAR without it (86% accuracy).
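
For context on the BINARY versus BIPOLAR annotations: a bipolar value x in {-1, +1} and its binary encoding b in {0, 1} are related by x = 2b - 1, and a dot product of two bipolar vectors of length n equals 2 * popcount(xnor(a, b)) - n. This is the standard identity behind XNOR-popcount layers. The NumPy sketch below only verifies the identity on random data. Whether a mismatch in this encoding explains the accuracy drop reported here is not established.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
a_bin = rng.integers(0, 2, size=n)   # binary encoding, values in {0, 1}
b_bin = rng.integers(0, 2, size=n)

a_bip = 2 * a_bin - 1                # bipolar values in {-1, +1}
b_bip = 2 * b_bin - 1

dot_bipolar = int(np.dot(a_bip, b_bip))

xnor = 1 - np.bitwise_xor(a_bin, b_bin)   # 1 where the bits agree
popcount = int(xnor.sum())
dot_from_popcount = 2 * popcount - n      # agreements minus disagreements

print(dot_bipolar, dot_from_popcount)
```

If a tensor annotated as BINARY is interpreted as {0, 1} where the trained network expected {-1, +1}, the dot products differ by exactly this affine correction, so comparing the intermediate tensors before and after the lowering step may help locate the discrepancy.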

I'm at a loss as to why this transformation is causing such a significant accuracy drop. Is it due to the even kernel size (6x6) I am using in QuantConv2d? Any insights or suggestions you could offer would be greatly appreciated.

Thank you for your time and assistance.

joannapng · 2024-04-23T12:00:53Z

I had the same issue and I think the reason is the bias in the convolution layers. During the export to qonnx format, the bias quant initializer was not exported so the ExtractConvBias transformation that happens during the conversion from qonnx to finn-onnx failed to add an "Add" node in front of the "Conv" node. Check the logs to see if you encounter a "Could not extract bias from node" warning and remove the biases from the network if so.
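
To see which layers would be affected before removing anything, the modules that carry a bias can be listed. The helper below is a hypothetical sketch that accepts any iterable of (name, module) pairs, such as the result of model.named_modules() on a PyTorch or Brevitas model. The toy objects stand in for real layers so that the example runs without PyTorch.

```python
from types import SimpleNamespace


def layers_with_bias(named_modules):
    """Return the names of modules that have a non-None `bias` attribute."""
    return [
        name
        for name, module in named_modules
        if getattr(module, "bias", None) is not None
    ]


# Toy stand-ins for layers; with a real model: layers_with_bias(model.named_modules())
toy_modules = [
    ("conv1", SimpleNamespace(bias=[0.1, 0.2])),
    ("relu1", SimpleNamespace()),
    ("fc3", SimpleNamespace(bias=None)),
]
biased = layers_with_bias(toy_modules)
print(biased)
```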

ISPRrj added the bug Something isn't working label Jul 15, 2023

auphelia self-assigned this Jul 18, 2023

auphelia assigned fpjentzsch Jul 18, 2023
