Unrolled CNN implementation #600
Conversation
Generally this is looking good to me, but there are pytests, including on convolutions, that failed.

The qkeras pytest failure has been occurring recently, so it's not related to this pull request. I am curious, though, about the conv1d and sepconv2d failures.

I addressed some issues, but the conv1d test still fails. I need to investigate more.

Conv1D failed because the model used is rather big, so the generated code ends up being huge, taking a lot of time to compile, and this causes the test to time out. I've replaced it with a smaller model from the example-models repo.
Description
This is the refined version of a Conv1D/2D implementation that unrolls the input feature matrix of the im2col algorithm for the `io_parallel` implementation. The general idea is to generate code for the im2col transformation with exact instructions for each layer, instead of synthesizing a generic C++ function, because the HLS compiler has issues with the latter. With this implementation, I was able to synthesize layers with <= 4096 elements (the usual partitioning limit). The old implementations had trouble with far smaller layers.

Based on the unrolled im2col step, the implementation further uses an adapted matrix-vector multiplication for the `Resource` or `Latency` strategy. Note that using the overall `Latency` strategy won't work, as that will pipeline the entire design and cause all the loops to be unrolled, which breaks the synthesis. Therefore, using the `Latency` strategy for the model will issue a warning and switch to the `Resource` strategy (aka "dataflow"). Individual layers may still use the `Latency` strategy.

A new tuning knob, `ParallelizationFactor`, is introduced to be combined with the `ReuseFactor` to control the amount of parallelism. It controls the number of output pixels processed in parallel and defaults to 1, implying no parallelization. Valid values are divisors of `out_height * out_width`, though hls4ml will warn if an incorrect `ParallelizationFactor` is used.

One feature of this implementation that wasn't part of the original implementation from last year is the predictable II. In general, for the `Resource` strategy, `II = (ReuseFactor + C) * out_height * out_width / ParallelizationFactor + 1`, where `C` is ~4. For the `Latency` strategy, `C` is 1-2. The +1 is for the function call itself.

This only touches the base Conv1D/2D layers; SeparableConv1D/2D will come in a later PR. PointwiseConv1D/2D needs investigation into whether it should be a special case at all with this implementation.
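As a quick sanity check on the II formula, here is a small Python sketch that evaluates it for a made-up layer (the output dimensions and knob values below are illustrative, not from the PR):

```python
def estimate_ii(reuse_factor, out_height, out_width, parallelization_factor, c=4):
    """II = (ReuseFactor + C) * out_height * out_width / ParallelizationFactor + 1.

    C is ~4 for the Resource strategy and 1-2 for Latency; the +1 accounts
    for the function call itself.
    """
    # ParallelizationFactor must be a divisor of out_height * out_width
    assert (out_height * out_width) % parallelization_factor == 0, \
        "ParallelizationFactor must divide out_height * out_width"
    return (reuse_factor + c) * out_height * out_width // parallelization_factor + 1

# A made-up 8x8 output feature map with ReuseFactor = 4, Resource strategy:
print(estimate_ii(4, 8, 8, 1))  # -> 513 (no parallelization)
print(estimate_ii(4, 8, 8, 8))  # -> 65  (8 output pixels in parallel)
```

Increasing `ParallelizationFactor` divides the pixel loop across parallel compute, which is why it enters the formula as a divisor.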
Limitations:
In order to wire all this up, the core of the layers had to be extended. A new type of attribute is introduced, `Source`, representing generated source code. Layers can have any number of generated sources, and the writer can pick up this information.

Type of change
Breaking, in the sense that it replaces the previous implementations and slightly changes the mechanics of how strategy is handled.
Tests
The existing tests confirm the accuracy of the implementation.
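For reference, a hypothetical hls4ml configuration exercising the new knobs might look like the sketch below. The layer name `conv2d_1` and all values are made up, and the exact config layout may differ between hls4ml versions:

```python
# Hypothetical per-layer hls4ml config; 'conv2d_1' is an illustrative layer name.
config = {
    'Model': {
        'Strategy': 'Resource',  # a model-level Latency strategy would be
                                 # switched to Resource with a warning anyway
        'ReuseFactor': 4,
    },
    'LayerName': {
        'conv2d_1': {
            'Strategy': 'Latency',       # individual layers may still use Latency
            'ParallelizationFactor': 8,  # must divide out_height * out_width
        },
    },
}

# io_parallel is selected at conversion time, e.g. (not executed here):
# hls_model = hls4ml.converters.convert_from_keras_model(
#     keras_model, hls_config=config, io_type='io_parallel')
```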
Test Configuration:
Run any Conv1D/2D tests, just ensure `io_parallel` is used. Play with `ParallelizationFactor` and `ReuseFactor` as desired. Don't forget the limitations above!

Checklist