Wip: Add multi-connection support in H2 conn pool. #7852

Closed

conqerAtapple
Contributor

Signed-off-by: Jojy G Varghese [email protected]

Description: Currently the H2 connection pool can create only ONE upstream connection. There are use cases that demand a policy-based approach to connections in the H2 pool. More background can be found at:
https://github.com/envoyproxy/envoy/issues/7403

The policy could be described as in the example below:

    clusters:
      - name: cluster-name
        connect_timeout: 0.25s
        connection_policy:
          max_requests_per_connection: 1
          drain_on_overflow: true  # Should drain or keep in overflow list.
          idle_timeout: 500ms      # Time after which the connection can be closed.
        ...
        ...
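
As a rough illustration of how such a policy might drive pool decisions, here is a minimal C++ sketch. The `ConnectionPolicy` struct, the `Action` enum and `onNewRequest` are hypothetical names invented for this example, not part of Envoy, and the interpretation of `drain_on_overflow` is an assumption based on the inline comments in the config above.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical mirror of the proposed `connection_policy` block. These names
// are illustrative only and are not part of Envoy's API.
struct ConnectionPolicy {
  uint32_t max_requests_per_connection{1};
  bool drain_on_overflow{true};
  std::chrono::milliseconds idle_timeout{500};
};

// What the pool could do when a new request arrives.
enum class Action { UseExisting, DrainAndCreateNew, KeepInOverflow };

// Decide how to handle a new request for a connection that has already been
// assigned `assigned_requests` requests (assumed semantics of the limit).
inline Action onNewRequest(const ConnectionPolicy& policy, uint32_t assigned_requests) {
  if (assigned_requests < policy.max_requests_per_connection) {
    return Action::UseExisting;
  }
  // The connection is full: either drain it and open a new one, or keep it
  // on an overflow list, depending on the policy.
  return policy.drain_on_overflow ? Action::DrainAndCreateNew : Action::KeepInOverflow;
}
```

With the default values, the first request reuses the connection and the second one triggers a drain plus a new connection.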

Risk Level: High
Testing: Unit tests, Integration tests to be added
Docs Changes: This change will require new configuration for describing the policy.
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]

@conqerAtapple
Contributor Author

@mattklein123 @alyssawilk This is a first take on the issue #7403.

Member

@mattklein123 mattklein123 left a comment

Thanks for taking this on. In order to aid in review, can you potentially add some high level comments for all the interfaces and functions in the header files? That will help me give some initial feedback on the implementation. Thank you!

/wait

@@ -646,6 +646,25 @@ using ProtocolOptionsConfigConstSharedPtr = std::shared_ptr<const ProtocolOption
*/
class ClusterTypedMetadataFactory : public Envoy::Config::TypedMetadataFactory {};

class ConnectionRequestPolicySubscriber {
Member

Can you add interface comments for these classes and functions? It's not immediately clear to me why these policies need to live in upstream.h. Can we potentially move them to a new file? I'm not sure how they are going to get configured.

Contributor Author

I have added comments and TODOs hoping to make the intent clearer. There is no strong reason for having the policy interfaces in upstream.h. The only reason I went with it for now is its proximity to ClusterInfo: since the policies map 1:1 to ClusterInfo, they could be declared together. I am open to suggestions.

Member

OK thanks, this is more clear. IMO I would simplify this for right now and not worry about a public interface, cluster info, etc. I would probably just build this config directly into a new http2 connection pool options config:

core.Http2ProtocolOptions http2_protocol_options = 14;

You can then plumb that config directly from cluster info -> the http2 connection pool, much like we do for the existing http2 protocol options.

In terms of these interfaces and implementation, I would either just define them right now directly in the http2 conn pool code, or make new files in that area to contain them. We can always make this more generic later but IMO it's fine to define this where we are going to use this for now.

Contributor Author

@mattklein123 Thanks! How do you visualize the Http2ConnectionPolicy config? I suppose something like

    clusters:
      - name: cluster-name
        connect_timeout: 0.25s
        connection_policy:
          max_requests_per_connection: 1
          drain_on_overflow: true  # Should drain or keep in overflow list.
          idle_timeout: 500ms      # Time after which the connection can be closed.
        http2_protocol_options: {...}
        ...

Member

Something like this yes, but per @alyssawilk we should see if we can also unify with HTTP/1.

// list when drain clients does not have any more requests being served.
// Connections remain in this list till:
// - Connection is closed.
std::list<ActiveClientPtr> to_close_clients_;
ActiveClientPtr primary_client_;
Member

Presumably primary/draining clients are no longer needed?

Contributor Author

That is correct. I will remove them.

@@ -651,6 +651,14 @@ ClusterInfoImpl::extensionProtocolOptions(const std::string& name) const {
return nullptr;
}

const ConnectionRequestPolicy& ClusterInfoImpl::connectionPolicy() const {
if (!connection_policy_) {
Member

I doubt this is thread safe? But it's not clear to me yet where this is used and whether it needs to live here.

Contributor Author

Not in its current form. The idea is that the Policy implementation would be injected but I am not sure how that would be implemented. I might need help with that.
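
One possible shape for the injection idea, sketched under assumptions: the policy is passed in at construction and stored as an immutable pointer, so the accessor has no check-then-create step that could race between threads. `ClusterInfoSketch` and this trimmed-down `ConnectionRequestPolicy` are hypothetical stand-ins, not the PR's or Envoy's actual classes.

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for the PR's policy type; not Envoy's real class.
struct ConnectionRequestPolicy {
  uint32_t max_requests_per_connection{1};
};

class ClusterInfoSketch {
public:
  // The policy is injected at construction. A null argument falls back to a
  // default-constructed policy so the accessor never dereferences null.
  explicit ClusterInfoSketch(std::shared_ptr<const ConnectionRequestPolicy> policy)
      : connection_policy_(policy ? std::move(policy)
                                  : std::make_shared<const ConnectionRequestPolicy>()) {}

  // Read-only accessor: nothing is created lazily here, so concurrent callers
  // only ever read a pointer that was fixed before the object was shared.
  const ConnectionRequestPolicy& connectionPolicy() const { return *connection_policy_; }

private:
  const std::shared_ptr<const ConnectionRequestPolicy> connection_policy_;
};
```

The design choice here is to move all mutation into the constructor; whether cluster info is the right owner for the policy is a separate question raised in this thread.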

@conqerAtapple
Contributor Author

@mattklein123 PTAL. Thank you for the initial feedback. Looking forward to more feedback.

@conqerAtapple
Contributor Author

@mattklein123 I just rebased with master. I am getting link errors in protobuf:

bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function _GLOBAL__sub_I__ZN6google8protobuf8compiler12ProtobufMainEiPPc: error: undefined reference to 'std::ios_base::Init::~Init()'
bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function google::protobuf::compiler::ProtobufMain(int, char**): error: undefined reference to 'operator delete(void*)'
bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function google::protobuf::compiler::ProtobufMain(int, char**): error: undefined reference to 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)'
bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function google::protobuf::compiler::ProtobufMain(int, char**): error: undefined reference to 'operator delete(void*)'
bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function google::protobuf::compiler::ProtobufMain(int, char**): error: undefined reference to 'operator delete(void*)'
bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function google::protobuf::compiler::ProtobufMain(int, char**): error: undefined reference to 'operator delete(void*)'
bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function google::protobuf::compiler::ProtobufMain(int, char**): error: undefined reference to 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)'
bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function google::protobuf::compiler::ProtobufMain(int, char**): error: undefined reference to 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)'
bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.o:main.cc:function google::protobuf::compiler::ProtobufMain(int, char**): error: undefined reference to 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)'

Wondering what I am missing.

Member

@mattklein123 mattklein123 left a comment

Thanks for working on this. This is going to need very detailed review, but in general this seems like it's on the right track. I think modulo my high level comments that I added, it's probably fine to start actually getting tests passing, add new tests, etc. Thank you!

/wait

@@ -81,19 +94,63 @@ class ConnPoolImpl : public ConnectionPool::Instance, public ConnPoolImplBase {

virtual CodecClientPtr createCodecClient(Upstream::Host::CreateConnectionData& data) PURE;
virtual uint32_t maxTotalStreams() PURE;
/*
Member

nit: please clean out the commented out code here and elsewhere.

// Connections that are waiting to be closed. Connections are moved to this
// list when drain clients does not have any more requests being served.
// Connections remain in this list till:
// - Connection is closed.
Member

Why is this state needed? Can't we just close them directly vs. putting them on a new list?

Contributor Author

The idea is that a busy/active client could be transitioned to drained if the policy dictates a) to not accept any more requests on that client b) close the client once all current requests are finished.
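
The transition described above could be sketched roughly as the small state machine below. `ClientSketch` and `ClientState` are hypothetical names for illustration; the sketch marks a drained client closed in place and does not model a separate to-close list.

```cpp
#include <cstdint>

// Hypothetical per-connection state; names are illustrative only.
enum class ClientState { Active, Draining, Closed };

class ClientSketch {
public:
  // (a) A request can only be attached while the client is Active.
  bool tryAttachRequest() {
    if (state_ != ClientState::Active) {
      return false;
    }
    ++active_requests_;
    return true;
  }

  // Policy decision: stop accepting new requests on this client.
  void startDrain() {
    if (state_ == ClientState::Active) {
      state_ = ClientState::Draining;
    }
    closeIfIdle();
  }

  // (b) When the last in-flight request finishes, a draining client closes.
  void onRequestComplete() {
    if (active_requests_ > 0) {
      --active_requests_;
    }
    closeIfIdle();
  }

  ClientState state() const { return state_; }

private:
  void closeIfIdle() {
    if (state_ == ClientState::Draining && active_requests_ == 0) {
      state_ = ClientState::Closed;
    }
  }

  ClientState state_{ClientState::Active};
  uint32_t active_requests_{0};
};
```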

Member

Sorry I don't follow. Once a client can be closed why does it need to be on a list?


// Make sure all clients are destroyed before we are destroyed.
ConnPoolImpl::~ConnPoolImpl() {
drainConnections();
Member

All connections need to be closed here in this case like happened previously, not just put in the draining state.

Contributor Author

It happens in the method drainConnections, which in turn calls checkForDrained, which then calls close on all closable clients. I thought that once we move all the clients to the drain state, it implies that we don't accept any new requests and just wait for existing requests to finish.

@@ -259,17 +388,60 @@ void ConnPoolImpl::onStreamReset(ActiveClient& client, Http::StreamResetReason r
}
}

void ConnPoolImpl::onUpstreamReady() {
// Establishes new codec streams for each pending request.
while (!pending_requests_.empty()) {
Member

In the HTTP/2 case I think we can still have multiple pending requests per connection created, right? Don't we still need a loop to deal with them once we get a new connection?
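
A minimal sketch of the kind of loop in question, assuming a connection with a fixed stream capacity and a FIFO queue of pending requests. `ConnSketch` and this free-function `onUpstreamReady` are hypothetical simplifications, not the pool's real types.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical connection with a fixed number of concurrent streams.
struct ConnSketch {
  uint32_t capacity;
  std::vector<std::string> streams;
  bool hasCapacity() const { return streams.size() < capacity; }
};

// Called when `conn` becomes ready: keep attaching queued requests until
// either the queue is empty or the connection has no stream capacity left.
inline void onUpstreamReady(ConnSketch& conn, std::deque<std::string>& pending) {
  while (!pending.empty() && conn.hasCapacity()) {
    conn.streams.push_back(std::move(pending.front()));
    pending.pop_front();
  }
}
```

With a capacity of 2 and three queued requests, one call attaches the first two and leaves the third pending for the next connection.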

@conqerAtapple
Contributor Author

conqerAtapple commented Aug 8, 2019

@mattklein123 Thanks for reviewing. I wanted to check once again what I might be missing. After rebasing with master, I am unable to build:

ERROR: /home/jojy/.cache/bazel/_bazel_jojy/d11de52fa5a5d3ec50a154631da518da/external/com_github_grpc_grpc/BUILD:470:1: Linking of rule '@com_github_grpc_grpc//:grpc_cpp_plugin' failed (Exit 1) gcc failed: error executing command /usr/bin/gcc -o bazel-out/host/bin/external/com_github_grpc_grpc/grpc_cpp_plugin -pthread
 -pthread -Wl,-S '-fuse-ld=gold' -Wl,-no-as-needed -Wl,-z,relro,-z,now -B/usr/bin -pass-exit-codes ... (remaining 3 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function ProtoBufService::method(int) const: error: undefined reference to 'operator new(unsigned long)'
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function ProtoBufFile::service(int) const: error: undefined reference to 'operator new(unsigned long)'
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function google::protobuf::io::StringOutputStream::~StringOutputStream(): error: undefined reference to 'operator delete(void*)'
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function ProtoBufMethod::~ProtoBufMethod(): error: undefined reference to 'operator delete(void*)'
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function ProtoBufService::~ProtoBufService(): error: undefined reference to 'operator delete(void*)'
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function ProtoBufFile::~ProtoBufFile(): error: undefined reference to 'operator delete(void*)'
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function ProtoBufFile::CreatePrinter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const: error: undefined reference to 'operator new(unsigned long)'
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.92]: error: undefined reference to 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)'
bazel-out/host/bin/external/com_github_grpc_grpc/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.92]: error: undefined reference to 'std::__throw_logic_error(char const*)'


It looks like a missing ldflag (libstdc++)?

@mattklein123
Member

@conqerAtapple sorry, not sure. I would hop into #envoy-dev and ask there. cc @lizan @PiotrSikora

@lizan
Member

lizan commented Aug 9, 2019

@conqerAtapple make sure you have Bazel >= 0.28.0, ideally what .bazelversion file says.

- removed old commented residue
- added override keyword.

Signed-off-by: Jojy G Varghese <[email protected]>
@conqerAtapple
Contributor Author

@lizan thanks! that worked.

* this interface as a contract between the policy implementation and policy
* user(subscriber).
*/
class ConnectionRequestPolicySubscriber {
Contributor

I took a super cursory pass at this and while I'm really happy to get this work into HTTP/2 I'm worried about the code overlap between the HTTP/1 multiple-connection management and HTTP/2.

I think we might do better here if we tried to factor out / enhance the connection management we already have, and then use it in the H2 connection pool. WDYT?

Member

+1 to this. This was going to be my suggestion as well once we get the code a bit more stabilized. I think there will be significant duplication. Up to you whether you want to try to do it in HTTP/2 first and then figure out the overlap or try to factor out first... (There already is a common base class you can work from.)

- Added ConnectionPolicy interface in Mock.
- Fixed conn pool cleanup.

Signed-off-by: Jojy G Varghese <[email protected]>
@stale

stale bot commented Aug 21, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 21, 2019
@mattklein123 mattklein123 added the no stalebot Disables stalebot from closing an issue label Aug 23, 2019
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Aug 23, 2019
@stale

stale bot commented Oct 10, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Oct 10, 2019
@stale

stale bot commented Oct 17, 2019

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Labels
stale stalebot believes this issue/PR has not been touched recently waiting