-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/generalized-data-width-converter #144
Open
lstasytis
wants to merge
11
commits into
Xilinx:dev
Choose a base branch
from
lstasytis:feature/generalized_dwc
base: dev
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…e input or output ports only.
…nput stream to output stream
…g by using a shift register where either the input or the output word are in a static position while the other are dynamically tracked as to where the write the bits using a variable for tracking how many values are currently in the buffer.
preusser
reviewed
Oct 30, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @lstasytis!
Please, review and validate the commit I appended.
Please, also add a testbench for sanity checking the functionality of the added function.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This companion PR to the 'soon-to-be-PR'd' https://github.com/lstasytis/finn/tree/feature/generalized-datawidthconverter branch introduces a new variant for the StreamingDataWidthConverter_Batch (DWC), called StreamingDataWidthConverterGeneralized_Batch which should eventually completely replace the old StreamingDataWidthConverter_Batch function from finn-hlslib.
The new DWC has two key improvements over the previous HLS version:
a.) Cases where the input and output streams have widths which are incompatible for use with the RTL variant (one cannot be divided by the other) will no longer result in an intermediate buffer of the size equaling the lowest common multiple (LCM) of the two widths.
Instead, a single intermediate buffer of size input width + output width is always generated.
This leads to the intermediate buffer never having an enormous size due to an extremely large LCM between widths, while also limiting the node to a single module instantiation, instead of 3. Thus, a potential >4K bit width intermediate axis data bus will not be generated (unless the input+output streams widths are >4K bit wide in total) which would have otherwise broke HLS.
b.) The node supports padding and cropping of the tail-end of the transactions being passed through the node with zeroes. This allows arbitrary padding of nodes in finn for relaxing folding factor constraints.
Architecture:
The node functions by using an intermediate shift-register-based buffer of size in width + out width. The input stream is addressed to the intermediate buffer using an offset variable which tracks how many elements are currently in the intermediate buffer. The output stream is tied to the right-most output stream width bits of the intermediate buffer. We also track the total number of input and output words which need to be processed by the DWC in a single transaction and either shift in zeroes (padding) or stop writing to the output stream (cropping) whenever we run out of either input words or output words relative to how many are assigned during compile time.
Downsides:
The architecture does not produce efficient HLS code due to the multiplexing of the input stream to the intermediate buffer leading to massive LUT use because a general IP core is instantiated by HLS for the task. The node is only more LUT-efficient versus the old HLS variant in cases where the intermediate buffer produced by the old DWC is 3-4x larger than the sum size of the input width and output width streams.
Improvements to be made:
An RTL variant for the DWC should eventually be pushed to finn-rtllib, at which point the old DWC can be retired entirely in favor of this current architecture.
Use of padding functionality:
Introducing padding to FINN nodes is extremely error-prone and so should be done carefully. The recommendation is to use the new generalized folding optimizer from the following branch: https://github.com/lstasytis/finn/tree/feature/set-folding-optimizer and allow it to use padding by setting the folding_maximum_padding dataflow builder argument to more than 0. The InsertDWC transformation will then insert DWCs which will potentially perform padding since the SetFolding() transformation will relax the stream shape restrictions with the assumption of DWCs performing the padding.
For a breakdown of padding restrictions in FINN nodes, refer to the code in the new SetFolding() transformation in the aformentioned branch.