
[fpga] Manually place bufh cell to ease congestion #8138

Merged
merged 1 commit on Sep 10, 2021

Conversation

tjaychen

@tjaychen tjaychen commented Sep 9, 2021

Manually place the aes bufh cell to ensure it is not placed
into the same clock region as otbn / kmac / hmac.

For whatever reason, Vivado attempts to cram all the blocks
that utilize bufh into the same clocking region (even though
it can easily relocate them). This causes congestion, as there
are over 15000 flip-flops between aes / otbn / kmac / hmac.

This PR follows the example shown here:
https://www.xilinx.com/support/answers/66386.html.
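The linked answer record boils down to pinning the manually instantiated buffer to a specific site so Vivado cannot pack it next to the others. A minimal XDC sketch of that idea is below; the cell path and the BUFHCE site name are hypothetical placeholders, not the actual names used in this PR:

```tcl
# Hypothetical sketch of the approach from the linked Xilinx answer record:
# pin the manually instantiated AES BUFHCE to a fixed site so it lands in a
# different clock region than the otbn/kmac/hmac buffers.
# Both the hierarchical cell name and the site name are assumptions.
set_property LOC BUFHCE_X1Y2 \
    [get_cells -hier -filter {NAME =~ *u_aes*bufh* && REF_NAME == BUFHCE}]
```

With the buffer's site fixed, the flops it drives are forced into that clock region, which is what breaks up the congestion described above.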

Also make the clock gate (cg) in spi_device local, so instead of the
situation of global -> local -> global, we have global -> local -> local.
The former scenario seems to cause hold violations sometimes.

Signed-off-by: Timothy Chen [email protected]

@tjaychen
Author

tjaychen commented Sep 9, 2021

Can you guys give this a try and see if it addresses the congestion in your local environments?

@vogelpi
Contributor

vogelpi commented Sep 10, 2021

Thanks a lot @tjaychen for investigating this! It seems to solve the issue also locally. Well done!

The behavior of Vivado here is really interesting - also with your newly created placement rule. I've had a look at the resulting implementation. Below you can see the resulting FPGA utilization. I've marked where the different modules are implemented.

The local clock buffers are placed in the horizontal center of the device (marked by the red circle). The clock regions start from this point. One region goes to the left, the other one (holding AES) goes to the right.

[Image: impl_01 — FPGA utilization overview with module placements marked]

Interestingly, KMAC, and partially also HMAC, expand both to the left and right, meaning they use logic in both clock regions. Zooming in, one sees the following:

[Image: impl_21 — zoomed view of the BUFHCEs and the two adjacent clock regions]

In the middle you again see the BUFHCEs (same color coding as above). The AES BUFHCE is on the right-hand side (right clock region), whereas all other BUFHCEs are on the left-hand side (left clock region). Since there is no KMAC clock available in the right clock region, all KMAC FFs are placed in the left clock region (right column of quads inside slices). However, KMAC can still use the logic LUTs in the right clock region (left column of rectangles inside slices).

It's worth mentioning that we actually use only very few of the available clock buffers in these two regions. As shown below, there are 12 BUFHCEs per region, but most of them aren't utilized. So I don't think the routing of the clock itself was the issue. Instead, and as you suggested, Vivado happened to place all manually instantiated BUFHCEs into the same clock region, and hence all FFs had to be in that region. Having all FFs in one clock region while spanning the logic over multiple clock regions then caused a lot of routing work and unpredictable implementation times.
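This kind of check can be reproduced in the Vivado Tcl console after implementation. The commands below are a generic sketch, not the exact commands or hierarchy used for this design:

```tcl
# Per-region clock buffer utilization report; this is what shows that most
# BUFHCEs in the two center regions are unused.
report_clock_utilization -file clock_util.rpt

# List every BUFHCE in the design and the site/clock region it ended up in.
# The REF_NAME filter is generic; the actual hierarchy may differ.
foreach c [get_cells -hier -filter {REF_NAME == BUFHCE}] {
    set site [get_property LOC $c]
    puts "$c placed at $site (region [get_property CLOCK_REGION [get_sites $site]])"
}
```

If the loop prints the same clock region for all buffers, that matches the pathological placement this PR works around.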

impl_3

I think this will also help increase the reproducibility of SCA results; I will do a couple of measurements. But I suggest merging this ASAP to get rid of the CI issues.

@vogelpi
Contributor

vogelpi commented Sep 10, 2021

I've now done 3 runs in total locally using the fix proposed in this PR. None of these runs had excessive implementation time. I am thus merging this PR now to stop the CI issues.

@tjaychen
Author

Thanks for such a thorough check @vogelpi! I want to preserve this thought process in the documentation, so I pushed #8148 to link to your comment. Can you let me know what you think?
