Discussion - Inlined Conv slows down latency significantly (up to x15 - x20) #800

Merged · 1 commit merged into fastmachinelearning:main on Jun 7, 2023

Conversation

@bo3z (Contributor) commented Jun 1, 2023

Description

  • While testing code unrolling for the hls4ml Optimisation API, I noticed that inlining in Conv2D can allocate unnecessary RAM.
  • When tested on the current version of Conv2D (line buffer, streaming, Resource strategy, RF > 1), there is a significant difference in latency (between 3x and nearly 20x).
  • It is still unclear what causes this bug and whether it is also present for (i) the Latency strategy, (ii) RF = 1 and (iii) encoded convolution, but it certainly appears to be a bug for RF > 1 with the Resource strategy. Opening this as a discussion until further synthesis results are obtained. A sketch of the kind of change involved is shown after this list.
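For context, here is a minimal sketch of the kind of change this PR makes. This is not the actual hls4ml source: the function, type, and config names (compute_window, config_example, data_t) are illustrative, the II value is hard-coded for the RF = 9 case, and it assumes the Xilinx HLS headers (ap_fixed.h) are available.

```cpp
#include <ap_fixed.h>

// Hypothetical config mirroring the failing case: streaming line-buffer
// Conv2D, Resource strategy, RF = 9.
struct config_example {
    static const unsigned kernel_size = 9; // 3x3 window
    static const unsigned n_filt = 16;
    static const unsigned reuse_factor = 9;
};

typedef ap_fixed<16, 6> data_t;
typedef ap_fixed<16, 6> accum_t;

// Per-pixel compute function applied to one window from the line buffer.
template <typename CONFIG_T>
void compute_window(const data_t window[CONFIG_T::kernel_size],
                    const data_t weights[CONFIG_T::kernel_size * CONFIG_T::n_filt],
                    accum_t res[CONFIG_T::n_filt]) {
    // Before this PR the body started with an inline directive:
    //     #pragma HLS INLINE
    // which dissolved this function into the caller's pipeline. The change
    // discussed here simply removes it, so HLS schedules this function
    // (and its initiation interval) independently of the surrounding
    // streaming loop.
FiltLoop:
    for (unsigned f = 0; f < CONFIG_T::n_filt; f++) {
        #pragma HLS PIPELINE II=9 // II tied to RF = 9 (Resource strategy)
        accum_t acc = 0;
    MultLoop:
        for (unsigned k = 0; k < CONFIG_T::kernel_size; k++) {
            acc += window[k] * weights[k * CONFIG_T::n_filt + f];
        }
        res[f] = acc;
    }
}
```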

Type of change

  • Bug fix
  • Breaking change (potentially)
  • Discussion

Tests

  • Below are report files from a full Vivado synthesis and CoSim analysis of the SVHN paper model, with RF = 9.
  • Files suffixed _master correspond to the current implementation of the line-buffer, Resource-strategy, streaming Conv2D.
  • Files suffixed _no_pragma correspond to implementations with the inline directive removed, as per this PR.
  • Inspecting the report files, the models are clearly equivalent (in terms of HLS config and architecture): they use the same number of DSPs and BRAMs and have similar LUT & FF utilisation. However, latencies differ by up to 20x.
  • Source of report files: https://cernbox.cern.ch/s/DK4v2KUTiBmFvYN

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@bo3z bo3z added the bug label Jun 1, 2023
@jmitrevs (Contributor) commented Jun 6, 2023

The changes are pretty minimal and should do no harm. If they fix an issue, I am not opposed to merging them, even if we don't fully understand why.

@jmitrevs (Contributor) commented Jun 7, 2023

I think I'll go ahead and merge this since it causes no problems and seems to help. We can still try to understand why this is, though.

@jmitrevs jmitrevs merged commit 0599cca into fastmachinelearning:main Jun 7, 2023
@bo3z (Contributor, Author) commented Jun 7, 2023

I spoke briefly with @vloncar about this. I think that when the RTL is inlined, the two pipelined designs conflict with each other and the compiler gets confused (sketched below). I'm still not sure whether this change slows down the Latency strategy (if it does, it would be by one clock cycle); worth investigating further.
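A hedged illustration of that hypothesis, not the real hls4ml code: two stages with their own PIPELINE regions, deliberately given different IIs. If both carried `#pragma HLS INLINE`, their pipeline regions would be flattened into the caller and the scheduler would have to reconcile the two; kept as separate functions, each is scheduled on its own.

```cpp
// stage_a keeps its own pipeline region (no INLINE pragma in the body).
void stage_a(const int in[8], int out[8]) {
A:  for (int i = 0; i < 8; i++) {
        #pragma HLS PIPELINE II=1
        out[i] = in[i] * 3;
    }
}

// stage_b is pipelined with a different II.
void stage_b(const int in[8], int out[8]) {
B:  for (int i = 0; i < 8; i++) {
        #pragma HLS PIPELINE II=2
        out[i] = in[i] + 1;
    }
}

void top(const int in[8], int out[8]) {
    int tmp[8];
    // As separate functions, the two pipelines are scheduled independently
    // and simply chained here. With both bodies inlined, the two PIPELINE
    // regions would be merged into this function and could interact badly.
    stage_a(in, tmp);
    stage_b(tmp, out);
}
```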

calad0i pushed a commit to calad0i/hls4ml referencing this pull request on Jul 1, 2023: "Discussion - Inlined Conv slows down latency significantly (up to x15 - x20)"