Huge certificate parsing speed regression between mbedtls 2.16.9 and 2.16.10 (constant time base64). #4814
Comments
The base64 module doesn't know whether it's working on sensitive data or not. So it always uses a constant-time method. I see four possibilities.
The original base64 implementation in Mbed TLS was table-based. To make it constant-time, we went the conceptually simple route of replacing the table lookups with constant-time table lookups. This is fairly slow, since it means 128 table lookups per decoded character. The table is likely to be in cache, but even so that turns out to cause a significant slowdown.

A while ago I started working on a constant-time base64 implementation that uses a different approach, based on value ranges. I've completed my work (as far as coding is concerned; some bits of documentation and maybe code polish are missing), and I also wrote a dedicated benchmark program. This is currently based on an older version of Mbed TLS because that's what my patch was for. If this approach is considered acceptable, I'll port the patch to current versions of Mbed TLS.

https://github.com/gilles-peskine-arm/mbedtls/tree/base64-no-table-2.16.9

Some timings for
So there's still a significant slowdown, but it's an order of magnitude better than the current version.
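To make the range-based idea concrete, here is a minimal sketch in C. This is my own illustration, not the code from the branch linked above (the helper names ct_in_range and base64_digit_value are made up): each input byte is classified with branch-free range comparisons, so decoding costs a handful of arithmetic operations per character instead of a pass over a lookup table.

```c
/* Return 0xFF if low <= c <= high and 0 otherwise, without branching on c.
 * The unsigned subtractions wrap for out-of-range inputs, which the shift
 * then exposes as a nonzero high part. */
static unsigned char ct_in_range(unsigned c, unsigned low, unsigned high)
{
    unsigned below = (c - low) >> 8;   /* nonzero iff c < low  */
    unsigned above = (high - c) >> 8;  /* nonzero iff c > high */
    return (unsigned char) (~(below | above) & 0xFF);
}

/* Map one base64 character to its 6-bit value, or -1 for any other byte.
 * Every input goes through exactly the same sequence of operations. */
static int base64_digit_value(unsigned char ch)
{
    unsigned c = ch;
    unsigned char value = 0, valid = 0, m;

    m = ct_in_range(c, 'A', 'Z'); valid |= m; value |= (unsigned char) (m & (c - 'A'));
    m = ct_in_range(c, 'a', 'z'); valid |= m; value |= (unsigned char) (m & (c - 'a' + 26));
    m = ct_in_range(c, '0', '9'); valid |= m; value |= (unsigned char) (m & (c - '0' + 52));
    m = ct_in_range(c, '+', '+'); valid |= m; value |= (unsigned char) (m & 62);
    m = ct_in_range(c, '/', '/'); valid |= m; value |= (unsigned char) (m & 63);

    /* Blend value and -1 without a branch: mask is -1 for a valid digit, 0 otherwise. */
    int mask = -(int) (valid & 1);
    return ((int) value & mask) | ~mask;
}
```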
I think your patch looks very promising, @gilles-peskine-arm! @Faless would the new performance be acceptable?
@gilles-peskine-arm that's some really great work! I've run some in-engine tests; here are the results (Core i7-7700K, Ubuntu 20.04): Debug
Optimized (
I believe some of the difference is due to the fact that your test program reads from a file, while in Godot we read directly from memory (packed in a compressed array by the build system), so my timings do not include file system I/O. This looks much better. I understand that having a dedicated code path for unsafe base64 would be complex to maintain properly, and I think this is quite acceptable performance for us @mpg . Many thanks for your work @gilles-peskine-arm !
Pull request for 2.16: #4819. Please note that this has not been reviewed yet. It passes the unit tests, but no one other than me has checked that the code is constant time. Performance is slightly better than yesterday's patch because I noticed a possible simplification. It's still way above the non-constant-time code, but I think the difference is reasonable now.
I've tested the current PR. In debug mode it's twice as fast; optimized builds saw less gain, but the improvement is still pretty noticeable: Debug
Optimized (
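For reference, raw base64 decoding speed can be compared across Mbed TLS versions with a small standalone harness along the following lines. This is only an illustrative sketch, not the dedicated benchmark program mentioned earlier or Godot's in-engine test; the buffer size and iteration count are arbitrary.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#include "mbedtls/base64.h"

int main(void)
{
    /* Encode an arbitrary 48 KiB buffer once, then time repeated decodes. */
    static unsigned char raw[48 * 1024];
    static unsigned char b64[96 * 1024];   /* comfortably larger than 4/3 * sizeof(raw) */
    static unsigned char out[48 * 1024];
    size_t b64_len = 0, out_len = 0;

    memset(raw, 0x5a, sizeof(raw));
    if (mbedtls_base64_encode(b64, sizeof(b64), &b64_len, raw, sizeof(raw)) != 0)
        return 1;

    clock_t start = clock();
    for (int i = 0; i < 100; i++) {
        if (mbedtls_base64_decode(out, sizeof(out), &out_len, b64, b64_len) != 0)
            return 1;
    }
    double ms = 1000.0 * (double) (clock() - start) / CLOCKS_PER_SEC;
    printf("100 decodes of %zu base64 bytes took %.1f ms\n", b64_len, ms);
    return 0;
}
```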
mbedtls_x509_crt_parse_file() is slow. This affected startup time by a lot. See Mbed-TLS/mbedtls#4814
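As an illustration of how such a startup-time regression might be measured in isolation, a sketch like the following times a single call to mbedtls_x509_crt_parse_file(). The bundle path is a placeholder, and this is not code from Godot or Mbed TLS; it requires MBEDTLS_FS_IO.

```c
#include <stdio.h>
#include <time.h>

#include "mbedtls/x509_crt.h"

int main(void)
{
    mbedtls_x509_crt chain;
    mbedtls_x509_crt_init(&chain);

    clock_t start = clock();
    /* "ca-certificates.crt" is a placeholder path for a PEM CA bundle. */
    int ret = mbedtls_x509_crt_parse_file(&chain, "ca-certificates.crt");
    double ms = 1000.0 * (double) (clock() - start) / CLOCKS_PER_SEC;

    /* ret > 0 means some certificates in the bundle could not be parsed;
     * ret < 0 means a fatal error. */
    printf("parse returned %d in %.1f ms\n", ret, ms);

    mbedtls_x509_crt_free(&chain);
    return ret < 0;
}
```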
Summary
We recently started noticing a huge performance regression in the Godot Engine's TLS certificate parsing code.
System information
Mbed TLS version (number or commit id): v2.16.11 (since 2.16.10)
Operating system and version: Linux (Ubuntu 20.04)/Any.
Steps to reproduce
The mbedtls_x509_crt_parse function became much slower between version v2.16.9 and v2.16.10. It seems from our flamegraph (also attached below with highlight) that this is due to the constant-flow table access introduced in 738d231.
I understand this was introduced to fix a potential side-channel attack on PEM key decoding:
The commit, though, does this for all base64 decoding, which means it also applies to certificate decoding. I don't have the full flamegraph for the 2.16.9 release (I can provide it if you wish), but from printing raw timings I've seen a two-orders-of-magnitude increase in the time to parse our CA list (from ~3 ms to ~500 ms).
I am not sure whether certificate parsing needs to be protected against this side-channel attack, but it is having quite an impact on our editor startup time, so I felt it was worth reporting.
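For context, a constant-flow table lookup generally has the following shape (an illustrative sketch, not the actual code from 738d231): every call reads every table entry and masks out all but the wanted one, so each decoded character costs a full pass over the table instead of a single indexed load, which is consistent with the slowdown above.

```c
/* Illustrative sketch of a constant-flow table lookup: reads all 128 entries
 * on every query so the memory access pattern does not depend on the
 * (potentially secret) index. */
static unsigned char ct_table_lookup(const unsigned char table[128],
                                     unsigned char index)
{
    unsigned char result = 0;

    for (unsigned i = 0; i < 128; i++) {
        /* Equality mask: 0xFF when i == index, 0 otherwise, with no branch. */
        unsigned diff = i ^ (unsigned) index;
        unsigned char mask = (unsigned char) (((diff - 1u) >> 8) & 0xFF);
        result |= (unsigned char) (table[i] & mask);
    }
    return result;
}
```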