-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for client-side QUIC 0-RTT #16260
Conversation
This PR introduces EnvoyQuicSessionCache, which is responsible for saving QUIC+TLS session tickets which are required for 0-RTT. I've confirmed that the new integration test from this PR would reproduce the 0-RTT crash that we saw previouslybut had since been fixed. Signed-off-by: David Schinazi <[email protected]>
/assign @danzh2010 |
Signed-off-by: David Schinazi <[email protected]>
Signed-off-by: David Schinazi <[email protected]>
/retest |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this should also address #16146.
constexpr size_t kMaxSessionCacheEntries = 1024; | ||
|
||
// Returns false if the SSL |session| doesn't exist or it is expired at |system_time|. | ||
bool isSessionValid(SSL_SESSION* session, SystemTime system_time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: if you make this a private member function, it can directly access system_time_ and no need to pass in system_time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no system_time_
member. I think you might be confusing it with time_source_
which is different - we want to avoid computing the time from the time source too often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I meant to access time_source_ directly. time_source_.systemTime()
is always called before isSessionValid(). For readability, I think either renaming system_time
to now
here or making this function a class member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I've renamed system_time
to now
.
|
||
SSL_SESSION* peekSession(); | ||
|
||
bssl::UniquePtr<SSL_SESSION> sessions[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why size of 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that's sufficient in practice, it's the value Chrome uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind commenting about the reason? Otherwise a queue-like data structure might be more preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind adding a bit more detail for those of with less QUIC-savvy? is it that session information changes infrequently? If so why do we need two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
void EnvoyQuicSessionCache::createAndInsertEntry(const quic::QuicServerId& server_id, | ||
bssl::UniquePtr<SSL_SESSION> session, | ||
const quic::TransportParameters& params, | ||
const quic::ApplicationState* application_state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit confusing about the life time of application_state. Can you comment that it's now owned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input parameter isn't owned. The fact that it's a const pointer makes the ownership model clear to me. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, typo "not" for "now". If we create ApplicationState instance for each entry here, why does doApplicationStatesMatch() compare pointers?
And Lookup() also create ApplicationState for return value. Probably QUICHE somehow requires duplication this object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, typo "not" for "now".
I don't think we need to comment that a const pointer is not owned, as that's very common usage.
If we create ApplicationState instance for each entry here, why does doApplicationStatesMatch() compare pointers?
doApplicationStatesMatch() dereferences the pointers and compares the values. It looks at pointers because they can be nullptr.
And Lookup() also create ApplicationState for return value. Probably QUICHE somehow requires duplication this object.
I'm not sure what you're asking? QUICHE passes us a const pointer and we need to make a copy that we own.
ASSERT(cache_.size() < kMaxSessionCacheEntries); | ||
Entry entry; | ||
entry.pushSession(std::move(session)); | ||
entry.params = std::make_unique<quic::TransportParameters>(params); | ||
if (application_state != nullptr) { | ||
entry.application_state = std::make_unique<quic::ApplicationState>(*application_state); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: consider wrapping these into Entry constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is a struct which needs a default constructor, I think this is overall simpler?
namespace Quic { | ||
namespace { | ||
|
||
constexpr size_t kMaxSessionCacheEntries = 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Envoy doesn't use kCamelCaseConstantName convention for const. MAX_SESSION_CACHE_SIZE instead?
Do this needs to be configuration? @alyssawilk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with MaxSessionCacheEntries
which is preferred by the Envoy style guide because it doesn't conflict with macros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll want config eventually but that can be a TODO (either for David, against QUIC upstream, I vote for the former =P) with a comment that the hard-coded value is arbitrary
Also @RyanTheOptimist look, another bounded cache! now that I look they're everywhere =P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can file an issue to make this configurable once this lands. Though I don't think it's a priority
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking more that having configurable 0-rtt config and integration tests is worthwhile for envoy mobile :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's a configurable 0-RTT config?
This PR adds an integration test.
Signed-off-by: David Schinazi <[email protected]>
void EnvoyQuicSessionCache::createAndInsertEntry(const quic::QuicServerId& server_id, | ||
bssl::UniquePtr<SSL_SESSION> session, | ||
const quic::TransportParameters& params, | ||
const quic::ApplicationState* application_state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, typo "not" for "now". If we create ApplicationState instance for each entry here, why does doApplicationStatesMatch() compare pointers?
And Lookup() also create ApplicationState for return value. Probably QUICHE somehow requires duplication this object.
constexpr size_t kMaxSessionCacheEntries = 1024; | ||
|
||
// Returns false if the SSL |session| doesn't exist or it is expired at |system_time|. | ||
bool isSessionValid(SSL_SESSION* session, SystemTime system_time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I meant to access time_source_ directly. time_source_.systemTime()
is always called before isSessionValid(). For readability, I think either renaming system_time
to now
here or making this function a class member.
|
||
SSL_SESSION* peekSession(); | ||
|
||
bssl::UniquePtr<SSL_SESSION> sessions[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind commenting about the reason? Otherwise a queue-like data structure might be more preferred.
Signed-off-by: David Schinazi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks fantastic - it'll be great to have 0-rtt working and even better to regression-proof that crash :-)
|
||
constexpr size_t MaxSessionCacheEntries = 1024; | ||
|
||
// Returns false if the SSL |session| doesn't exist or it is expired at |now|. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Envoy doesn't usually do ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
EnvoyQuicSessionCache::~EnvoyQuicSessionCache() = default; | ||
|
||
void EnvoyQuicSessionCache::Insert(const quic::QuicServerId& server_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duplication between this and QuicClientSessionCache is a bit of a bummer.
Optionally, as a follow-up, maybe we can implement more of the base logic in the quiche class and either template away time or have pure virtuals for time bits. but that'd be a follow-up no matter what and should wait until we've sorted out any upstream scalability problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's QuicClientSessionCache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, yes we're going to have duplication between Chrome and Envoy :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, but modulo "can we refactor away the time abstraction" seems reasonable to stick this one in quiche at some point.
} | ||
|
||
std::unique_ptr<quic::QuicResumptionState> | ||
EnvoyQuicSessionCache::Lookup(const quic::QuicServerId& server_id, const SSL_CTX* /*ctx*/) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive by question - why is SSL_CTX unused here and in quiche? Should we clean up upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some clients may want to use different CCL_CTX'es. We could remove it but it isn't causing any issues. Either way that would be a QUICHE CL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, sorry, quiche == upstream in this context. Not something you'd do in situ but if it's not used anywhere might be worth adding our quiche clean up list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG
bssl::UniquePtr<SSL_SESSION> session, | ||
const quic::TransportParameters& params, | ||
const quic::ApplicationState* application_state) { | ||
prune(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we walk the entire list on insertion? Maybe make that clear in prune comments()
Also definitely worth a comment up by that 1024 that it should stay small and why. eek!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(1024) is negligible compared to the cryptographic operations required to create a new connection.
Added comments though
|
||
SSL_SESSION* peekSession(); | ||
|
||
bssl::UniquePtr<SSL_SESSION> sessions[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind adding a bit more detail for those of with less QUIC-savvy? is it that session information changes infrequently? If so why do we need two?
// already stored. | ||
void pushSession(bssl::UniquePtr<SSL_SESSION> session); | ||
|
||
// Retrieves the latest session from the entry, meanwhile removing it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retrieves and removes the latest session...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
std::unique_ptr<quic::ApplicationState> application_state; | ||
}; | ||
|
||
// Remove all entries that are no longer valid. If after that we are still at the size limit, also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optionally
If after that we are still at the size limit, also remove ->
If all entries were valid but the cache is at its size limit, instead removes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
public: | ||
// From TimeSource. | ||
SystemTime systemTime() override { return fake_time_; } | ||
MonotonicTime monotonicTime() override { return MonotonicTime(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind commenting why you can't just use simtime's monotonicTime too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: David Schinazi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG modulo one more | nit!
|
||
EnvoyQuicSessionCache::~EnvoyQuicSessionCache() = default; | ||
|
||
void EnvoyQuicSessionCache::Insert(const quic::QuicServerId& server_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
std::unique_ptr<quic::QuicResumptionState> | ||
EnvoyQuicSessionCache::Lookup(const quic::QuicServerId& server_id, const SSL_CTX* /*ctx*/) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, sorry, quiche == upstream in this context. Not something you'd do in situ but if it's not used anywhere might be worth adding our quiche clean up list.
Entry(Entry&&) noexcept; | ||
~Entry(); | ||
|
||
// Adds a new |session| onto sessions, dropping the oldest one if two are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto | here and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: David Schinazi <[email protected]>
if (state == other) { | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering why we compare the pointer of two different objects. But it seems to be only possible when both state
and other
are nullptr? I slightly prefer to make it explicit instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is needed to avoid a dereference when they're both nullptr. I personally prefer this style of comparison because it is correct for all inputs. The fact that callers won't ever send the same non-null pointer as both parameters is an implementation details that doesn't need to be visible in this function.
return session_time <= now_u64 + 1 && now_u64 < session_expiration; | ||
} | ||
|
||
bool doApplicationStatesMatch(const quic::ApplicationState* state, quic::ApplicationState* other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: making other
const as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: David Schinazi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo Dan's sign-off
constexpr uint32_t kTimeout = 1000; | ||
const quic::QuicVersionLabel kFakeVersionLabel = 0x01234567; | ||
const quic::QuicVersionLabel kFakeVersionLabel2 = 0x89ABCDEF; | ||
const uint64_t kFakeIdleTimeoutMilliseconds = 12012; | ||
const uint8_t kFakeStatelessResetTokenData[16] = {0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, | ||
0x98, 0x99, 0x9A, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F}; | ||
const uint64_t kFakeMaxPacketSize = 9001; | ||
const uint64_t kFakeInitialMaxData = 101; | ||
const bool kFakeDisableMigration = true; | ||
const auto kCustomParameter1 = static_cast<quic::TransportParameters::TransportParameterId>(0xffcd); | ||
const char* kCustomParameter1Value = "foo"; | ||
const auto kCustomParameter2 = static_cast<quic::TransportParameters::TransportParameterId>(0xff34); | ||
const char* kCustomParameter2Value = "bar"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: renaming these const according to Envoy naming convention. (some of them are only used once, plz consider inlining them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the constants to the Envoy naming convention. I kept the ones used only once because they make the code more readable.
Signed-off-by: David Schinazi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for adding this!
This PR introduces EnvoyQuicSessionCache, which is responsible for saving QUIC+TLS session tickets which are required for 0-RTT. Risk Level: Low Testing: unit tests, integration test Docs Changes: None Release Notes: None Platform Specific Features: None Closes envoyproxy#16234 Signed-off-by: David Schinazi <[email protected]> Signed-off-by: Gokul Nair <[email protected]>
This PR introduces EnvoyQuicSessionCache, which is responsible for saving QUIC+TLS session tickets which are required for 0-RTT. Risk Level: Low Testing: unit tests, integration test Docs Changes: None Release Notes: None Platform Specific Features: None Closes envoyproxy#16234 Signed-off-by: David Schinazi <[email protected]> Signed-off-by: Gokul Nair <[email protected]>
This PR introduces EnvoyQuicSessionCache, which is responsible
for saving QUIC+TLS session tickets which are required for 0-RTT.
I've confirmed that the new integration test from this PR would
reproduce the 0-RTT crash that we saw previouslybut had since
been fixed.
Risk Level: Low
Testing: Local
Docs Changes: None
Release Notes: None
Platform Specific Features: None
Closes #16234