Discussion - Inlined Conv slows down latency significantly (up to x15 - x20) #800

Merged · 1 commit merged into fastmachinelearning:main on Jun 7, 2023

Conversation

@bo3z (Contributor) commented Jun 1, 2023

Description

  • While testing code unrolling for the hls4ml Optimisation API, I noticed that inlining in Conv2D can allocate unnecessary RAM.
  • When tested on the current version of Conv2D (line buffer, streaming, Resource strategy, RF > 1), there is a significant difference in latency (between 3x and nearly 20x).
  • It is still unclear what causes this bug and whether it is also present for (i) the Latency strategy, (ii) RF = 1 and (iii) encoded convolution, but it certainly appears to be a bug for RF > 1 with the Resource strategy. Opening this as a discussion until further synthesis results are obtained. A sketch of the kind of change involved is shown after this list.
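For context, here is a minimal sketch of the kind of change this PR makes. This is not the actual hls4ml source: the function, type, and config names (compute_window, config_example, data_t) are illustrative, the II value is hard-coded for the RF = 9 case, and it assumes the Xilinx HLS headers (ap_fixed.h) are available.

```cpp
#include <ap_fixed.h>

// Hypothetical config mirroring the failing case: streaming line-buffer
// Conv2D, Resource strategy, RF = 9.
struct config_example {
    static const unsigned kernel_size = 9; // 3x3 window
    static const unsigned n_filt = 16;
    static const unsigned reuse_factor = 9;
};

typedef ap_fixed<16, 6> data_t;
typedef ap_fixed<16, 6> accum_t;

// Per-pixel compute function applied to one window from the line buffer.
template <typename CONFIG_T>
void compute_window(const data_t window[CONFIG_T::kernel_size],
                    const data_t weights[CONFIG_T::kernel_size * CONFIG_T::n_filt],
                    accum_t res[CONFIG_T::n_filt]) {
    // Before this PR the body started with an inline directive:
    //     #pragma HLS INLINE
    // which dissolved this function into the caller's pipeline. The change
    // discussed here simply removes it, so HLS schedules this function
    // (and its initiation interval) independently of the surrounding
    // streaming loop.
FiltLoop:
    for (unsigned f = 0; f < CONFIG_T::n_filt; f++) {
        #pragma HLS PIPELINE II=9 // II tied to RF = 9 (Resource strategy)
        accum_t acc = 0;
    MultLoop:
        for (unsigned k = 0; k < CONFIG_T::kernel_size; k++) {
            acc += window[k] * weights[k * CONFIG_T::n_filt + f];
        }
        res[f] = acc;
    }
}
```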

Type of change

  • Bug fix
  • Breaking change (potentially)
  • Discussion

Tests

  • Below are report files from a full Vivado synthesis and CoSim analysis of the SVHN paper model, with RF = 9.
  • Files suffixed _master correspond to the current implementation of the line-buffer, Resource-strategy, streaming Conv2D.
  • Files suffixed _no_pragma correspond to implementations with the inline directive removed, as per this PR.
  • Inspecting the report files, the models are clearly equivalent (in terms of HLS config and architecture): they use the same number of DSPs and BRAMs and have similar LUT & FF utilisation. However, latencies differ by up to 20x.
  • Source of report files: https://cernbox.cern.ch/s/DK4v2KUTiBmFvYN

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@bo3z bo3z added the bug label Jun 1, 2023
@jmitrevs (Contributor) commented Jun 6, 2023

The changes are pretty minimal and should do no harm. If they fix an issue, I am not opposed to merging them, even if we don't fully understand why.

@jmitrevs (Contributor) commented Jun 7, 2023

I think I'll go ahead and merge this since it causes no problems and seems to help. We can still try to understand why this is, though.

@jmitrevs jmitrevs merged commit 0599cca into fastmachinelearning:main Jun 7, 2023
@bo3z (Contributor, Author) commented Jun 7, 2023

I spoke briefly with @vloncar about this. I think that when the RTL is inlined, the two pipelined designs conflict with each other and the compiler gets confused (sketched below). I'm still not sure whether this change slows down the Latency strategy (if it does, it would be by one clock cycle); worth investigating further.
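A hedged illustration of that hypothesis, not the real hls4ml code: two stages with their own PIPELINE regions, deliberately given different IIs. If both carried `#pragma HLS INLINE`, their pipeline regions would be flattened into the caller and the scheduler would have to reconcile the two; kept as separate functions, each is scheduled on its own.

```cpp
// stage_a keeps its own pipeline region (no INLINE pragma in the body).
void stage_a(const int in[8], int out[8]) {
A:  for (int i = 0; i < 8; i++) {
        #pragma HLS PIPELINE II=1
        out[i] = in[i] * 3;
    }
}

// stage_b is pipelined with a different II.
void stage_b(const int in[8], int out[8]) {
B:  for (int i = 0; i < 8; i++) {
        #pragma HLS PIPELINE II=2
        out[i] = in[i] + 1;
    }
}

void top(const int in[8], int out[8]) {
    int tmp[8];
    // As separate functions, the two pipelines are scheduled independently
    // and simply chained here. With both bodies inlined, the two PIPELINE
    // regions would be merged into this function and could interact badly.
    stage_a(in, tmp);
    stage_b(tmp, out);
}
```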

calad0i pushed a commit to calad0i/hls4ml referencing this pull request on Jul 1, 2023: "Discussion - Inlined Conv slows down latency significantly (up to x15 - x20)"