Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for client-side QUIC 0-RTT #16260

Merged
merged 9 commits into from
May 5, 2021

Conversation

DavidSchinazi
Copy link
Contributor

@DavidSchinazi DavidSchinazi commented Apr 30, 2021

This PR introduces EnvoyQuicSessionCache, which is responsible
for saving QUIC+TLS session tickets which are required for 0-RTT.

I've confirmed that the new integration test from this PR would
reproduce the 0-RTT crash that we saw previouslybut had since
been fixed.

Risk Level: Low
Testing: Local
Docs Changes: None
Release Notes: None
Platform Specific Features: None

Closes #16234

This PR introduces EnvoyQuicSessionCache, which is responsible
for saving QUIC+TLS session tickets which are required for 0-RTT.

I've confirmed that the new integration test from this PR would
reproduce the 0-RTT crash that we saw previouslybut had since
been fixed.

Signed-off-by: David Schinazi <[email protected]>
@DavidSchinazi
Copy link
Contributor Author

/assign @danzh2010

Signed-off-by: David Schinazi <[email protected]>
Signed-off-by: David Schinazi <[email protected]>
@DavidSchinazi
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16260 (comment) was created by @DavidSchinazi.

see: more, trace.

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this should also address #16146.

constexpr size_t kMaxSessionCacheEntries = 1024;

// Returns false if the SSL |session| doesn't exist or it is expired at |system_time|.
bool isSessionValid(SSL_SESSION* session, SystemTime system_time) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: if you make this a private member function, it can directly access system_time_ and no need to pass in system_time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no system_time_ member. I think you might be confusing it with time_source_ which is different - we want to avoid computing the time from the time source too often.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I meant to access time_source_ directly. time_source_.systemTime() is always called before isSessionValid(). For readability, I think either renaming system_time to now here or making this function a class member.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I've renamed system_time to now.


SSL_SESSION* peekSession();

bssl::UniquePtr<SSL_SESSION> sessions[2];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why size of 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that's sufficient in practice, it's the value Chrome uses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind commenting about the reason? Otherwise a queue-like data structure might be more preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind adding a bit more detail for those of with less QUIC-savvy? is it that session information changes infrequently? If so why do we need two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

void EnvoyQuicSessionCache::createAndInsertEntry(const quic::QuicServerId& server_id,
bssl::UniquePtr<SSL_SESSION> session,
const quic::TransportParameters& params,
const quic::ApplicationState* application_state) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit confusing about the life time of application_state. Can you comment that it's now owned?

Copy link
Contributor Author

@DavidSchinazi DavidSchinazi May 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input parameter isn't owned. The fact that it's a const pointer makes the ownership model clear to me. Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, typo "not" for "now". If we create ApplicationState instance for each entry here, why does doApplicationStatesMatch() compare pointers?
And Lookup() also create ApplicationState for return value. Probably QUICHE somehow requires duplication this object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, typo "not" for "now".

I don't think we need to comment that a const pointer is not owned, as that's very common usage.

If we create ApplicationState instance for each entry here, why does doApplicationStatesMatch() compare pointers?

doApplicationStatesMatch() dereferences the pointers and compares the values. It looks at pointers because they can be nullptr.

And Lookup() also create ApplicationState for return value. Probably QUICHE somehow requires duplication this object.

I'm not sure what you're asking? QUICHE passes us a const pointer and we need to make a copy that we own.

Comment on lines 132 to 138
ASSERT(cache_.size() < kMaxSessionCacheEntries);
Entry entry;
entry.pushSession(std::move(session));
entry.params = std::make_unique<quic::TransportParameters>(params);
if (application_state != nullptr) {
entry.application_state = std::make_unique<quic::ApplicationState>(*application_state);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: consider wrapping these into Entry constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a struct which needs a default constructor, I think this is overall simpler?

namespace Quic {
namespace {

constexpr size_t kMaxSessionCacheEntries = 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Envoy doesn't use kCamelCaseConstantName convention for const. MAX_SESSION_CACHE_SIZE instead?
Do this needs to be configuration? @alyssawilk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with MaxSessionCacheEntries which is preferred by the Envoy style guide because it doesn't conflict with macros

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll want config eventually but that can be a TODO (either for David, against QUIC upstream, I vote for the former =P) with a comment that the hard-coded value is arbitrary

Also @RyanTheOptimist look, another bounded cache! now that I look they're everywhere =P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can file an issue to make this configurable once this lands. Though I don't think it's a priority

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking more that having configurable 0-rtt config and integration tests is worthwhile for envoy mobile :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a configurable 0-RTT config?
This PR adds an integration test.

@alyssawilk alyssawilk self-assigned this May 3, 2021
Signed-off-by: David Schinazi <[email protected]>
void EnvoyQuicSessionCache::createAndInsertEntry(const quic::QuicServerId& server_id,
bssl::UniquePtr<SSL_SESSION> session,
const quic::TransportParameters& params,
const quic::ApplicationState* application_state) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, typo "not" for "now". If we create ApplicationState instance for each entry here, why does doApplicationStatesMatch() compare pointers?
And Lookup() also create ApplicationState for return value. Probably QUICHE somehow requires duplication this object.

constexpr size_t kMaxSessionCacheEntries = 1024;

// Returns false if the SSL |session| doesn't exist or it is expired at |system_time|.
bool isSessionValid(SSL_SESSION* session, SystemTime system_time) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I meant to access time_source_ directly. time_source_.systemTime() is always called before isSessionValid(). For readability, I think either renaming system_time to now here or making this function a class member.


SSL_SESSION* peekSession();

bssl::UniquePtr<SSL_SESSION> sessions[2];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind commenting about the reason? Otherwise a queue-like data structure might be more preferred.

Signed-off-by: David Schinazi <[email protected]>
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks fantastic - it'll be great to have 0-rtt working and even better to regression-proof that crash :-)


constexpr size_t MaxSessionCacheEntries = 1024;

// Returns false if the SSL |session| doesn't exist or it is expired at |now|.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Envoy doesn't usually do ||

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


EnvoyQuicSessionCache::~EnvoyQuicSessionCache() = default;

void EnvoyQuicSessionCache::Insert(const quic::QuicServerId& server_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication between this and QuicClientSessionCache is a bit of a bummer.

Optionally, as a follow-up, maybe we can implement more of the base logic in the quiche class and either template away time or have pure virtuals for time bits. but that'd be a follow-up no matter what and should wait until we've sorted out any upstream scalability problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's QuicClientSessionCache?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, yes we're going to have duplication between Chrome and Envoy :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but modulo "can we refactor away the time abstraction" seems reasonable to stick this one in quiche at some point.

}

std::unique_ptr<quic::QuicResumptionState>
EnvoyQuicSessionCache::Lookup(const quic::QuicServerId& server_id, const SSL_CTX* /*ctx*/) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive by question - why is SSL_CTX unused here and in quiche? Should we clean up upstream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some clients may want to use different CCL_CTX'es. We could remove it but it isn't causing any issues. Either way that would be a QUICHE CL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, sorry, quiche == upstream in this context. Not something you'd do in situ but if it's not used anywhere might be worth adding our quiche clean up list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

bssl::UniquePtr<SSL_SESSION> session,
const quic::TransportParameters& params,
const quic::ApplicationState* application_state) {
prune();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we walk the entire list on insertion? Maybe make that clear in prune comments()
Also definitely worth a comment up by that 1024 that it should stay small and why. eek!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O(1024) is negligible compared to the cryptographic operations required to create a new connection.

Added comments though


SSL_SESSION* peekSession();

bssl::UniquePtr<SSL_SESSION> sessions[2];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind adding a bit more detail for those of with less QUIC-savvy? is it that session information changes infrequently? If so why do we need two?

// already stored.
void pushSession(bssl::UniquePtr<SSL_SESSION> session);

// Retrieves the latest session from the entry, meanwhile removing it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retrieves and removes the latest session...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::unique_ptr<quic::ApplicationState> application_state;
};

// Remove all entries that are no longer valid. If after that we are still at the size limit, also
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optionally
If after that we are still at the size limit, also remove ->
If all entries were valid but the cache is at its size limit, instead removes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public:
// From TimeSource.
SystemTime systemTime() override { return fake_time_; }
MonotonicTime monotonicTime() override { return MonotonicTime(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind commenting why you can't just use simtime's monotonicTime too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: David Schinazi <[email protected]>
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG modulo one more | nit!


EnvoyQuicSessionCache::~EnvoyQuicSessionCache() = default;

void EnvoyQuicSessionCache::Insert(const quic::QuicServerId& server_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

std::unique_ptr<quic::QuicResumptionState>
EnvoyQuicSessionCache::Lookup(const quic::QuicServerId& server_id, const SSL_CTX* /*ctx*/) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, sorry, quiche == upstream in this context. Not something you'd do in situ but if it's not used anywhere might be worth adding our quiche clean up list.

Entry(Entry&&) noexcept;
~Entry();

// Adds a new |session| onto sessions, dropping the oldest one if two are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto | here and below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: David Schinazi <[email protected]>
Comment on lines +27 to +29
if (state == other) {
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why we compare the pointer of two different objects. But it seems to be only possible when both state and other are nullptr? I slightly prefer to make it explicit instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is needed to avoid a dereference when they're both nullptr. I personally prefer this style of comparison because it is correct for all inputs. The fact that callers won't ever send the same non-null pointer as both parameters is an implementation details that doesn't need to be visible in this function.

return session_time <= now_u64 + 1 && now_u64 < session_expiration;
}

bool doApplicationStatesMatch(const quic::ApplicationState* state, quic::ApplicationState* other) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: making other const as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: David Schinazi <[email protected]>
alyssawilk
alyssawilk previously approved these changes May 4, 2021
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo Dan's sign-off

Comment on lines 11 to 23
constexpr uint32_t kTimeout = 1000;
const quic::QuicVersionLabel kFakeVersionLabel = 0x01234567;
const quic::QuicVersionLabel kFakeVersionLabel2 = 0x89ABCDEF;
const uint64_t kFakeIdleTimeoutMilliseconds = 12012;
const uint8_t kFakeStatelessResetTokenData[16] = {0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
0x98, 0x99, 0x9A, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F};
const uint64_t kFakeMaxPacketSize = 9001;
const uint64_t kFakeInitialMaxData = 101;
const bool kFakeDisableMigration = true;
const auto kCustomParameter1 = static_cast<quic::TransportParameters::TransportParameterId>(0xffcd);
const char* kCustomParameter1Value = "foo";
const auto kCustomParameter2 = static_cast<quic::TransportParameters::TransportParameterId>(0xff34);
const char* kCustomParameter2Value = "bar";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: renaming these const according to Envoy naming convention. (some of them are only used once, plz consider inlining them)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the constants to the Envoy naming convention. I kept the ones used only once because they make the code more readable.

Signed-off-by: David Schinazi <[email protected]>
Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding this!

@alyssawilk alyssawilk merged commit 691487d into envoyproxy:main May 5, 2021
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 6, 2021
This PR introduces EnvoyQuicSessionCache, which is responsible
for saving QUIC+TLS session tickets which are required for 0-RTT.

Risk Level: Low
Testing: unit tests, integration test
Docs Changes: None
Release Notes: None
Platform Specific Features: None

Closes envoyproxy#16234

Signed-off-by: David Schinazi <[email protected]>
Signed-off-by: Gokul Nair <[email protected]>
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 6, 2021
This PR introduces EnvoyQuicSessionCache, which is responsible
for saving QUIC+TLS session tickets which are required for 0-RTT.

Risk Level: Low
Testing: unit tests, integration test
Docs Changes: None
Release Notes: None
Platform Specific Features: None

Closes envoyproxy#16234

Signed-off-by: David Schinazi <[email protected]>
Signed-off-by: Gokul Nair <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

regression test QUIC crash caused by the import
4 participants