-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System Design Doc #16
Conversation
Signed-off-by: Arko Dasgupta <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest following Envoy's dir structure by putting docs in ./docs instead of ./design
design/SYSTEM_DESIGN.md
Outdated
#### User Config | ||
This configuration is based on the [Gateway API](https://gateway-api.sigs.k8s.io) and will provide: | ||
* Infrastructure Management capabilities for the Infrastructure Administrator to provision the infrastructure required to run EnvoyProxy. | ||
This is expressed using the [Gateway resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gateway). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Envoy proxy config is expressed using GC and GW resources. The GC optionally contains Envoy config parameters. If not parameterRefs
are provided, then a matching GW instance will cause EG to provision an Envoy fleet with its default config (we should specify what Envoy's default config is).
Signed-off-by: Arko Dasgupta <[email protected]>
Signed-off-by: Arko Dasgupta <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely in agreement with the direction of this design, added a few comments. Thanks for your work on this @arkodg!
docs/design/SYSTEM_DESIGN.md
Outdated
|
||
#### Workflow | ||
1. The Infrastructure Administrator spawns an Envoy Gateway process and provides a Bootstrap Configuration as part of the commandline argument. | ||
2. They will configure a [GatewayClass resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass), and optionally modify default values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make use of GatewayClass parameters for customizing details of provisioned Gateways - we could do this using a ConfigMap, or by defining a custom CRD that can be referenced from a GatewayClass.
As a reference Contour has a CRD called ContourDeployment
(https://github.com/projectcontour/contour/blob/main/apis/projectcontour/v1alpha1/contourdeployment.go) that can be used as GatewayClass parameters, and allows controlling things like whether Envoy is deployed as a DaemonSet or Deployment, the type of Service to provision for Envoy, etc.
Agree that some global EG settings will still be controlled in the bootstrap config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm'd the modify line to keep the common case workflow easy to understand, please lmk if you prefer the doc to be explicit about it and I'll add it back in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or by defining a custom CRD that can be referenced from a GatewayClass
+1 on defining a config API instead of using a configmap, xref.
rm'd the modify line to keep the common case workflow easy to understand, please lmk if you prefer the doc to be explicit about it and I'll add it back in
+1 on keeping the language high-level in this doc.
docs/design/SYSTEM_DESIGN.md
Outdated
|
||
#### Workflow | ||
1. The Infrastructure Administrator spawns an Envoy Gateway process and provides a Bootstrap Configuration as part of the commandline argument. | ||
2. They will configure a [GatewayClass resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass), and optionally modify default values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest "one or more GatewayClass resources"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made it explicit in the last section / Design Decisions, wanted to keep it simple here, highlight the common case
|
||
### Components | ||
|
||
#### Config Sources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have questions about how we'll handle merging config across different sources - seems like it can be a thorny issue if not everything is in etcd. OK with deferring the details for now, but we'll have to figure it out sooner rather than later if we do want to support multiple sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are all opt-in, based on the bootstrap spec, so imho the platform admin knows what they are signing up for if they include more config sources.
I added a line message service
- It can also aggregate resources from multiple publishers allowing configuration from individual config sources to be aggregated before being processed by the translation layers.
. But Im thinking of removing it since it adds details not needed in this generic document. We can delay this decision when we build out the config source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref #30 to capture the details of config merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also related to multi-cluster. I realize MC is P1, but how does that fit into all of this? Should we at least mention where we see MC fitting into this? Will there be a meta controller? Will each bootstrap have to manually point at multiple config sources with distributed policy? Etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we havent brainstormed on MC yet 🙈 , mainly because we have not discussed the requirements yet - is this MultiCluster Ingress use case (purely north south) with external clients sending traffic to Tier1/Edge/Front Proxies which route to Tier2/BU Proxies or east-west with internal services routing to other internal services in another cluster using the Tier2/BU in each cluster or both 😄 . Once we know the WHAT, it will require us to answer the following for the HOW
- Architecture - is it a peer-to-peer arch or a hub-spoke architecture
- Config Source - What is the source of truth for intent
- Service Resolver - Which service registry does this resolver use to resolve endpoints
- Service Registry - How is this service registry computed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we transition the MC discussion to #9 where I define
... Gateway should support multi-cluster service networking methods defined by sig-multicluster..."
@mattklein123 @arkodg I have provided additional information to #9, so PTAL.
Signed-off-by: Arko Dasgupta <[email protected]>
Signed-off-by: Arko Dasgupta <[email protected]>
* Create a detailed design and interface specification for each system component. | ||
|
||
### Architecture | ||
![Architecture](../images/architecture.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we all agree on this arch, will work with getting a better illustration out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LukeShu @skriss @youngnick please comment here so @arkodg can nail down the diagram.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to do this @arkodg, but I looked over this a bit, and I think we should consider if we want the xds server and provisioner to be running inside the same container or be separate processes.
This diagram implies they're running inside the same container, but I think that the separate-process model has some advantages, in that it unlocks the ability to easily have one provisioner instance manage many xds control planes (since they are 1:1 with an Envoy fleet). I think it might be possible to have a single xds server process serve different config to different Envoy fleets, but that seems very risky to me - getting that wrong could very easily lead to big security problems.
I think a better representation will have two boxes, talking to the apiserver, one that is the provisioner, that has a "creates" relationship with the one (or more) xDS server boxes, which contain the architecture outlined above.
This probably isn't a blocker, for this PR merging, but I think that since we want Envoy Gateway as a whole to be able to handle multiple GatewayClasses, we'll end up with something like what I describe here anyway.,
Signed-off-by: Arko Dasgupta <[email protected]>
* Create a detailed design and interface specification for each system component. | ||
|
||
### Architecture | ||
![Architecture](../images/architecture.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LukeShu @skriss @youngnick please comment here so @arkodg can nail down the diagram.
|
||
### Components | ||
|
||
#### Config Sources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref #30 to capture the details of config merging.
docs/design/SYSTEM_DESIGN.md
Outdated
|
||
#### Config Manager | ||
This component consumes the Bootstrap Config, and spawns the appropriate internal services in Envoy Gateway based on the config spec. For e.g. if the platform field | ||
in the Bootsrap Config is set to `kubernetes`, the config manager will instantiate kubernetes controller services that implement the Config Source, Service Resolver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
I suggest removing field
since it implies that the config API is already spec'd.
s/config manager/Config Manager/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding field
makes the example more concrete. Can revisit/edit this statement if it doesn't reflect the actual spec
Signed-off-by: Arko Dasgupta <[email protected]>
Signed-off-by: Arko Dasgupta <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry to be the downer here, but I think that this design should call out that the provisioner and the xDS server will need to be separate binaries to be able to achieve the goals we want around being able to reconcile multiple GatewayClasses.
However, I think that we should merge this first and then iterate rather than holding this up. This is what we all want in broad strokes, let's record that we agree, and improve afterwards.
* Create a detailed design and interface specification for each system component. | ||
|
||
### Architecture | ||
![Architecture](../images/architecture.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to do this @arkodg, but I looked over this a bit, and I think we should consider if we want the xds server and provisioner to be running inside the same container or be separate processes.
This diagram implies they're running inside the same container, but I think that the separate-process model has some advantages, in that it unlocks the ability to easily have one provisioner instance manage many xds control planes (since they are 1:1 with an Envoy fleet). I think it might be possible to have a single xds server process serve different config to different Envoy fleets, but that seems very risky to me - getting that wrong could very easily lead to big security problems.
I think a better representation will have two boxes, talking to the apiserver, one that is the provisioner, that has a "creates" relationship with the one (or more) xDS server boxes, which contain the architecture outlined above.
This probably isn't a blocker, for this PR merging, but I think that since we want Envoy Gateway as a whole to be able to handle multiple GatewayClasses, we'll end up with something like what I describe here anyway.,
hey @youngnick regarding #16 (comment)
|
Signed-off-by: Arko Dasgupta <[email protected]>
@mattklein123 the design doc is close to being merged. When you have a moment, PTAL. |
Let's talk more about provisioner and xds-server separation later, this LGTM until we can have that conversation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with getting this merged and resolving the remaining feedback in subsequent PRs. @arkodg can you create tracker issues for any remaining feedback so it's not lost? Let's wait to merge until @mattklein123 has a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks at a high level this LGTM. A flushed out some comments that I think will help for people coming to this doc without as much context. Thank you!
docs/design/SYSTEM_DESIGN.md
Outdated
This is expressed using [HTTPRoute](https://gateway-api.sigs.k8s.io/concepts/api-overview/#httproute) and [TLSRoute](https://gateway-api.sigs.k8s.io/concepts/api-overview/#tlsroute). | ||
|
||
#### Workflow | ||
1. The Infrastructure Administrator spawns an Envoy Gateway process using a Bootstrap Configuration to manage a fleet of Envoy Proxies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per above can you clarify and example of what this means in Kubernetes in terms of what it would do and not do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkodg maybe my example in #16 (comment) is better suited here.
|
||
### Components | ||
|
||
#### Config Sources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also related to multi-cluster. I realize MC is P1, but how does that fit into all of this? Should we at least mention where we see MC fitting into this? Will there be a meta controller? Will each bootstrap have to manually point at multiple config sources with distributed policy? Etc.
Signed-off-by: Arko Dasgupta <[email protected]>
thanks for the review @mattklein123 ! added more examples, and linked the md sections to reduce confusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM to ship and iterate, thanks!
Fixes: #1
Signed-off-by: Arko Dasgupta [email protected]