
Huge certificate parsing speed regression between mbedtls 2.16.9 and 2.16.10 (constant time base64). #4814

Closed
Faless opened this issue Jul 26, 2021 · 5 comments · Fixed by #4835, #4819 or #5108
Assignees
Labels
bug size-m Estimated task size: medium (~1w)

Comments

@Faless
Contributor

Faless commented Jul 26, 2021

Summary

We recently noticed a huge performance regression in the Godot Engine's TLS certificate parsing code.

System information

Mbed TLS version (number or commit id): v2.16.11 (since 2.16.10)
Operating system and version: Linux (Ubuntu 20.04)/Any.

Steps to reproduce

`mbedtls_x509_crt_parse` became much slower between versions v2.16.9 and v2.16.10.

It seems from our flamegraph (also attached below, with highlights) that this is due to the constant-flow table access introduced in 738d231.

I understand this was introduced to fix a potential side-channel attack on PEM key decoding:

> Guard against strong local side channel attack against base64 tables by
> making access to them use constant flow code. (CVE-2021-24119)

The commit, though, does that for all base64 decoding, which means it also applies to certificate decoding. I don't have the full flamegraph for the 2.16.9 release (I can provide it if you wish), but from raw timing prints I've seen a two-orders-of-magnitude increase in the time to parse our CA list (from ~3 ms to ~500 ms).

I am not sure whether certificate parsing needs to be protected against this side-channel attack, but since this has quite an impact on our editor's startup time, I felt it was worth reporting.

flamegraph

@gilles-peskine-arm
Contributor

The base64 module doesn't know whether it's working on sensitive data or not. So it always uses a constant-time method.

I see four possibilities.

  • We live with fast code that has a timing side channel. That's not good.
  • We live with slow constant-time code. That's not good either.
  • We duplicate the base64 code and the PEM code, calling the slow constant-time functions for private keys and the fast leaky functions for public keys. That increases the code size significantly, which is also not good.
  • We manage to write a base64 implementation that is both fast (or at least fast enough) and constant-time. Can we do that?

The original base64 implementation in Mbed TLS was table-based. To make it constant-time, we went the conceptually simple route of replacing table lookups by constant-time table lookups. This is fairly slow since it means 128 table lookups when decoding. The table is likely to be in cache, but even so that turns out to make for a significant slowdown.

A while ago I started working on a constant-time base64 implementation that uses a different approach, based on value ranges. I've completed the work (as far as coding is concerned; some bits of documentation and maybe some code polish are missing), and I've also written a dedicated benchmark program. It's currently based on an older version of Mbed TLS, since that's what my patch was for; if this approach is considered acceptable, I'll port the patch to current versions of Mbed TLS. https://github.com/gilles-peskine-arm/mbedtls/tree/base64-no-table-2.16.9

Some timings for `build-O2/programs/x509/load_roots /usr/share/ca-certificates/mozilla/{*,*,*,*}` on my Ubuntu 20.04 PC (Core i7-7600U):

| 2.16.9 | 2.16.11 | my patch |
|--------|---------|----------|
| 11 ms  | 428 ms  | 29 ms    |

So there's still a significant slowdown, but it's an order of magnitude better than the current version.

@mpg
Contributor

mpg commented Jul 27, 2021

I think your patch looks very promising, @gilles-peskine-arm! @Faless would the new performance be acceptable?

@Faless
Contributor Author

Faless commented Jul 27, 2021

@gilles-peskine-arm that's some really great work!

I've run some in-engine tests, here are the results (Core i7-7700K, Ubuntu 20.04):

Debug

| 2.16.9 | 2.16.11 | your patch |
|--------|---------|------------|
| 3 ms   | 506 ms  | 19 ms      |

Optimized (-O3)

| 2.16.9 | 2.16.11 | your patch |
|--------|---------|------------|
| 1.3 ms | 82.5 ms | 4.4 ms     |

I believe some of the difference is due to the fact that your test program reads from files, while in Godot we read directly from memory (packed into a compressed array by the build system), so my timings do not include file-system I/O.

This looks much better. I understand that a dedicated code path for unsafe base64 would be complex to maintain properly, and I think this performance is quite acceptable for us @mpg .

Many thanks for your work @gilles-peskine-arm !

@gilles-peskine-arm
Contributor

gilles-peskine-arm commented Jul 28, 2021

Pull request for 2.16: #4819. Please note that this has not been reviewed yet. It passes the unit tests but no one other than me has checked that the code is constant time.

Performance is slightly better than yesterday's patch because I noticed a possible simplification. It's still well above the non-constant-time code, but I think the difference is reasonable now.

@Faless
Contributor Author

Faless commented Jul 28, 2021

> Performance is slightly better than yesterday's patch because I noticed a possible simplification.

I've tested the current PR. In debug mode it's twice as fast; optimized builds saw less of a gain, but it's still quite noticeable:

Debug

| 2.16.9 | old patch | new patch |
|--------|-----------|-----------|
| 3 ms   | 19 ms     | 10 ms     |

Optimized (-O3)

| 2.16.9 | old patch | new patch |
|--------|-----------|-----------|
| 1.3 ms | 4.4 ms    | 3.0 ms    |
