Feature/generalized-data-width-converter #144

lstasytis · 2024-09-18T09:35:34Z

This companion PR to the https://github.com/lstasytis/finn/tree/feature/generalized-datawidthconverter branch (soon to be opened as a PR against FINN) introduces a new variant of StreamingDataWidthConverter_Batch (DWC), called StreamingDataWidthConverterGeneralized_Batch, which should eventually replace the old StreamingDataWidthConverter_Batch function from finn-hlslib entirely.

The new DWC has two key improvements over the previous HLS version:

a.) Cases where the input and output stream widths are incompatible with the RTL variant (neither width is a multiple of the other) no longer produce an intermediate buffer whose size equals the least common multiple (LCM) of the two widths.

Instead, a single intermediate buffer of size input width + output width is always generated.

This prevents the intermediate buffer from ever becoming enormous because of an extremely large LCM between the widths, and it limits the node to a single module instantiation instead of three. As a result, a potentially >4K-bit-wide intermediate AXI-Stream data bus, which would otherwise have broken HLS, is no longer generated (unless the input and output stream widths together exceed 4K bits).
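To make the size difference concrete, here is a small C++ sketch comparing the two buffer sizes as described above. The helper names `old_dwc_buffer_bits` and `generalized_dwc_buffer_bits` are hypothetical and do not exist in finn-hlslib; they only encode the LCM rule and the input + output rule from this description.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>  // std::lcm (C++17)

// Hypothetical helper: buffer size of the old DWC, which per the description
// above is the LCM of the input and output widths.
std::uint64_t old_dwc_buffer_bits(std::uint64_t in_w, std::uint64_t out_w) {
    assert(in_w > 0 && out_w > 0);
    return std::lcm(in_w, out_w);
}

// Hypothetical helper: buffer size of the generalized DWC, which is always
// input width + output width.
std::uint64_t generalized_dwc_buffer_bits(std::uint64_t in_w, std::uint64_t out_w) {
    assert(in_w > 0 && out_w > 0);
    return in_w + out_w;
}
```

For example, 510-bit input and 512-bit output words give an LCM-sized buffer of 130560 bits, whereas the generalized DWC needs only 1022 bits.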

b.) The node supports padding the tail end of each transaction passing through it with zeroes, as well as cropping that tail end. This allows arbitrary padding of nodes in FINN to relax folding factor constraints.
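A bit-level golden model helps pin down what tail padding and cropping mean, and could serve as the reference in a sanity-check testbench. The sketch below is hypothetical (`reference_dwc` is not part of finn-hlslib). It assumes words of at most 64 bits that are serialized LSB-first, with the first input word entering the stream first. The real node's bit ordering should be checked against the HLS source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model of a padding/cropping width conversion.
// Assumption: bits are serialized LSB-first, first input word first.
std::vector<std::uint64_t> reference_dwc(const std::vector<std::uint64_t>& in_words,
                                         unsigned in_w, unsigned out_w,
                                         std::size_t num_out) {
    assert(in_w >= 1 && in_w <= 64 && out_w >= 1 && out_w <= 64);

    // Flatten the input words into one bit stream.
    std::vector<bool> bits;
    for (std::uint64_t w : in_words)
        for (unsigned b = 0; b < in_w; ++b)
            bits.push_back(((w >> b) & 1u) != 0);

    // Padding: append zero bits if the input is shorter than the output.
    // Cropping: drop tail bits if the input is longer than the output.
    bits.resize(num_out * out_w, false);

    // Regroup the bit stream into out_w-bit output words.
    std::vector<std::uint64_t> out(num_out, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i])
            out[i / out_w] |= std::uint64_t{1} << (i % out_w);
    return out;
}
```

With 4-bit inputs {0x1, 0x2, 0x3} and two 8-bit outputs, the 12 input bits are padded with four zero bits, giving {0x21, 0x03}. Requesting only one output word crops the tail and yields {0x21}.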

Architecture:
The node uses an intermediate shift-register-based buffer of size input width + output width. Each input word is written into the intermediate buffer at an offset given by a variable that tracks how many elements the buffer currently holds. The output stream is tied to the rightmost output-width bits of the intermediate buffer. We also track the total number of input and output words the DWC has to process in a single transaction, as fixed at compile time. Once the input words run out, zeroes are shifted in (padding). Once the output words run out, writing to the output stream stops (cropping).
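The following C++ model sketches the described architecture in software. It uses a buffer of in_w + out_w bits and a fill `offset`. Output is taken from the rightmost out_w bits, and the word counts drive padding and cropping. This is an assumption-laden illustration, not the finn-hlslib implementation: `generalized_dwc_model` is a hypothetical name, streams are replaced by `std::vector`, and in_w + out_w is limited to 64 bits so the buffer fits in a `uint64_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software model of the generalized DWC architecture (not the
// finn-hlslib source). Assumes in_w + out_w <= 64 so the buffer fits in 64 bits.
std::vector<std::uint64_t> generalized_dwc_model(const std::vector<std::uint64_t>& in_words,
                                                 unsigned in_w, unsigned out_w,
                                                 std::size_t num_in, std::size_t num_out) {
    assert(in_w >= 1 && out_w >= 1 && in_w + out_w <= 64);
    assert(num_in <= in_words.size());
    const std::uint64_t in_mask = (std::uint64_t{1} << in_w) - 1;
    const std::uint64_t out_mask = (std::uint64_t{1} << out_w) - 1;

    std::uint64_t buf = 0;     // intermediate buffer of in_w + out_w bits
    unsigned offset = 0;       // number of valid bits currently held in buf
    std::size_t consumed = 0;  // input words read so far
    std::vector<std::uint64_t> out;

    while (out.size() < num_out) {
        if (offset >= out_w) {
            // The output word is the rightmost out_w bits of the buffer.
            out.push_back(buf & out_mask);
            buf >>= out_w;     // shift the remaining bits down
            offset -= out_w;
        } else if (consumed < num_in) {
            // Write the next input word just above the valid bits (variable offset).
            buf |= (in_words[consumed++] & in_mask) << offset;
            offset += in_w;
        } else {
            // Padding: input exhausted, so the zero bits above offset are
            // treated as valid to complete the last output word.
            offset = out_w;
        }
    }
    // Cropping: input words beyond what num_out needs are never emitted
    // (presumably the hardware still reads and discards them).
    return out;
}
```

Writing each input word at a variable `offset` is the kind of multiplexing the Downsides section below blames for the high LUT cost, since every input bit can land at many buffer positions. For non-divisible widths such as 3-bit inputs {0b101, 0b011, 0b110} into two 5-bit outputs, the model yields {29, 12}, with one padded zero bit.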

Downsides:
The architecture does not produce efficient HLS code. Multiplexing the input stream into the intermediate buffer at a variable offset leads HLS to instantiate a general IP core for the task, which results in massive LUT usage. The node is only more LUT-efficient than the old HLS variant when the intermediate buffer produced by the old DWC would be 3-4x larger than the sum of the input and output stream widths.
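The 3-4x rule of thumb can be expressed as a simple selection check. This is a hypothetical helper: `prefer_generalized_dwc` is not in FINN or finn-hlslib. Buffer size is only a proxy for LUT usage, and the threshold defaults to 3, the lower end of the range quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>  // std::lcm (C++17)

// Hypothetical rule of thumb, not part of FINN or finn-hlslib: prefer the
// generalized DWC when the old DWC's LCM-sized buffer would be at least
// `ratio` times the generalized DWC's (in_w + out_w)-bit buffer.
bool prefer_generalized_dwc(std::uint64_t in_w, std::uint64_t out_w, double ratio = 3.0) {
    assert(in_w > 0 && out_w > 0 && ratio > 0.0);
    const double old_bits = static_cast<double>(std::lcm(in_w, out_w));
    const double new_bits = static_cast<double>(in_w + out_w);
    return old_bits >= ratio * new_bits;
}
```

For 510- and 512-bit streams the LCM buffer is more than 100x larger, so the check favors the generalized DWC. For 24- and 32-bit streams (96 vs. 56 bits), it keeps the old variant.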

Improvements to be made:
An RTL variant of the DWC should eventually be pushed to finn-rtllib, at which point the old DWC can be retired entirely in favor of this architecture.

Use of padding functionality:
Introducing padding to FINN nodes is extremely error-prone and should be done carefully. The recommended approach is to use the new generalized folding optimizer from https://github.com/lstasytis/finn/tree/feature/set-folding-optimizer and allow it to use padding by setting the folding_maximum_padding dataflow builder argument to a value greater than 0. The SetFolding() transformation then relaxes the stream shape restrictions on the assumption that DWCs will perform the padding. As a result, the InsertDWC transformation inserts DWCs that may perform padding.

For a breakdown of the padding restrictions of FINN nodes, refer to the code of the new SetFolding() transformation in the aforementioned branch.
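The following sketch illustrates how padding relaxes folding constraints. It rounds a dimension up to the next multiple of the requested parallelism and accepts the result only if the added padding stays within a limit. It is not FINN's SetFolding() code. `padded_dim` is a hypothetical name, and treating folding_maximum_padding as a per-dimension element count is an assumption. The linked branch defines the actual semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>  // C++17

// Hypothetical illustration, not FINN code: pad `dim` up to the next multiple
// of `parallelism` and accept it only if the padding is at most `max_padding`
// elements (assumed interpretation of folding_maximum_padding).
std::optional<std::uint32_t> padded_dim(std::uint32_t dim, std::uint32_t parallelism,
                                        std::uint32_t max_padding) {
    assert(dim > 0 && parallelism > 0);
    const std::uint32_t padded = (dim + parallelism - 1) / parallelism * parallelism;
    if (padded - dim > max_padding)
        return std::nullopt;  // folding would need too much padding
    return padded;
}
```

For example, 30 channels with a parallelism of 4 need 2 padded elements. With a limit of 0 the folding is rejected, while a limit of 2 yields a padded size of 32.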

…g by using a shift register in which either the input or the output word sits at a static position, while the position of the other is tracked dynamically using a variable that counts how many values are currently in the buffer.

preusser

Thanks, @lstasytis!
Please review and validate the commit I appended.
Please also add a testbench for sanity-checking the functionality of the added function.

lstasytis added 10 commits May 6, 2024 10:27

Generalized the StreamingDataWidthConverter_Batch to allow padding or cropping

232473e

Cropping and Padding decoupled into two parameters, affecting either the input or output ports only.

671c3d8

fixed wrong commit

4250e16

Padding and Cropping introduced via tracking of forwarded bits from input stream to output stream

9d845f6

Decreasing bit count used for tracking values, further cleanup

efe52b8

bugfix

239d0fc

support for 4D input

5fb0f96

moved datatype log computations to cpp side

0c75c5e

small refactor

2cd5b20

lstasytis mentioned this pull request Sep 18, 2024

Generalized DataWidthConverter Xilinx/finn#1186

Draft

Add credits and align coding style.

2e7b259

preusser reviewed Oct 30, 2024
