
[fpga] Manually place bufh cell to ease congestion #8138

Merged
merged 1 commit on Sep 10, 2021

Conversation

tjaychen

@tjaychen tjaychen commented Sep 9, 2021

Manually place the aes bufh cell to ensure it is not placed
into the same clock region as otbn / kmac / hmac.

For whatever reason, Vivado attempts to cram all the blocks
that utilize bufh into the same clocking region (even though
it can easily relocate them). This causes congestion, as there
are over 15000 flip-flops between aes / otbn / kmac / hmac.

This PR follows the example shown here:
https://www.xilinx.com/support/answers/66386.html.
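The linked answer record boils down to pinning the manually instantiated buffer to a specific site so Vivado cannot pack it next to the others. A minimal XDC sketch of that idea is below; the cell path and the BUFHCE site name are hypothetical placeholders, not the actual names used in this PR:

```tcl
# Hypothetical sketch of the approach from the linked Xilinx answer record:
# pin the manually instantiated AES BUFHCE to a fixed site so it lands in a
# different clock region than the otbn/kmac/hmac buffers.
# Both the hierarchical cell name and the site name are assumptions.
set_property LOC BUFHCE_X1Y2 \
    [get_cells -hier -filter {NAME =~ *u_aes*bufh* && REF_NAME == BUFHCE}]
```

With the buffer's site fixed, the flops it drives are forced into that clock region, which is what breaks up the congestion described above.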

Also make the clock gate (cg) in spi_device local, so instead of the
situation of global -> local -> global, we have global -> local -> local.
The former scenario seems to cause hold violations sometimes.

Signed-off-by: Timothy Chen [email protected]

@tjaychen
Author

tjaychen commented Sep 9, 2021

Can you guys give this a try and see if it addresses the congestion in your local environments?

@vogelpi
Contributor

vogelpi commented Sep 10, 2021

Thanks a lot @tjaychen for investigating this! It seems to solve the issue also locally. Well done!

The behavior of Vivado here is really interesting - also with your newly created placement rule. I've had a look at the resulting implementation. Below you can see the resulting FPGA utilization. I've marked where the different modules are implemented.

The local clock buffers are placed in the horizontal center of the device (marked by the red circle). The clock regions start from this point. One region goes to the left, the other one (holding AES) goes to the right.

[Image: impl_01 — FPGA utilization overview with module placements marked]

Interestingly, KMAC, and partially also HMAC, expand both to the left and right, meaning they use logic in both clock regions. Zooming in, one sees the following:

[Image: impl_21 — zoomed view of the BUFHCEs and the two adjacent clock regions]

In the middle you again see the BUFHCEs (same color coding as above). The AES BUFHCE is on the right-hand side (right clock region), whereas all other BUFHCEs are on the left-hand side (left clock region). Since there is no KMAC clock available in the right clock region, all KMAC FFs are placed in the left clock region (right column of quads inside slices). However, KMAC can still use the logic LUTs in the right clock region (left column of rectangles inside slices).

It's worth mentioning that we actually use only very few of the available clock buffers in these two regions. As shown below, there are 12 BUFHCEs per region, but most of them aren't utilized. So I don't think the routing of the clock itself was the issue. Instead, and as you suggested, Vivado happened to place all manually instantiated BUFHCEs into the same clock region, and hence all FFs had to be in that region. Having all FFs in one clock region while spanning the logic over multiple clock regions then caused a lot of routing work and unpredictable implementation times.
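This kind of check can be reproduced in the Vivado Tcl console after implementation. The commands below are a generic sketch, not the exact commands or hierarchy used for this design:

```tcl
# Per-region clock buffer utilization report; this is what shows that most
# BUFHCEs in the two center regions are unused.
report_clock_utilization -file clock_util.rpt

# List every BUFHCE in the design and the site/clock region it ended up in.
# The REF_NAME filter is generic; the actual hierarchy may differ.
foreach c [get_cells -hier -filter {REF_NAME == BUFHCE}] {
    set site [get_property LOC $c]
    puts "$c placed at $site (region [get_property CLOCK_REGION [get_sites $site]])"
}
```

If the loop prints the same clock region for all buffers, that matches the pathological placement this PR works around.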

impl_3

I think this will also help increase the reproducibility of SCA results; I will do a couple of measurements. But I suggest merging this ASAP to get rid of the CI issues.

@vogelpi
Contributor

vogelpi commented Sep 10, 2021

I've now done 3 runs in total locally using the fix proposed in this PR. None of these runs had excessive implementation time. I am thus merging this PR now to stop the CI issues.

@tjaychen
Author

Thanks for such a thorough check @vogelpi! I want to preserve this thought process in the documentation, so I pushed #8148 to link to your comment. Can you let me know what you think?
