
Possible ways to improve FPS #433

Answered by fpjentzsch
rpitonak asked this question in Q&A

Hi,

Indeed, you have exhausted the parallelism that is (currently) available via PE & SIMD. There are 2 bottlenecks you would need to overcome for the input layer:

First bottleneck:
The current sliding window generator (SWG) implementation (aka "ConvolutionInputGenerator", https://github.com/Xilinx/finn-hlslib/blob/master/slidingwindow.h#L172) outputs each window element (5x5 = 25 elements in your case) in a separate clock cycle rather than in parallel. This limits you to 512 x 512 x 5 x 5 = 6,553,600 cycles, which matches the ~6,556,180 cycles you are seeing. Theoretically, this IP core could be modified to run in 512 x 512 x 1 ≈ 262k cycles, but the following layers would then also have to be parallelized accordingly.
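To make the arithmetic above concrete, here is a rough back-of-the-envelope cycle model. It is only a sketch under stated assumptions (stride 1, a 512x512 output, no pipeline fill/drain overhead); the function names `swg_cycles` and `pipeline_cycles` and the `elems_per_cycle` parameter are hypothetical and not part of finn-hlslib. It also shows why a faster SWG alone does not help: in a dataflow pipeline, steady-state throughput is set by the slowest stage.

```python
# Hypothetical cycle model, NOT taken from finn-hlslib.
# Assumptions: stride 1, 512x512 output pixels, no fill/drain overhead.

def swg_cycles(out_h: int, out_w: int, k: int, elems_per_cycle: int = 1) -> int:
    """Cycles for a sliding window generator that emits `elems_per_cycle`
    of the k*k window elements per clock cycle (1 = serial output)."""
    window = k * k
    if window % elems_per_cycle != 0:
        raise ValueError("elems_per_cycle must divide k*k")
    return out_h * out_w * (window // elems_per_cycle)


def pipeline_cycles(stage_cycles: list[int]) -> int:
    """Steady-state cycles per frame of a dataflow pipeline: the slowest stage wins."""
    return max(stage_cycles)


serial = swg_cycles(512, 512, 5)                        # one window element per cycle
parallel = swg_cycles(512, 512, 5, elems_per_cycle=25)  # whole 5x5 window per cycle
print(serial, parallel)

# A parallel SWG feeding a downstream layer that still consumes one
# window element per cycle leaves the pipeline at the serial rate.
print(pipeline_cycles([parallel, serial]))
```

The second `print` illustrates the caveat in the answer: the downstream layers must be parallelized by the same factor before the faster SWG pays off.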

For 1D convolutions, we already have …
