[request]: Provide support for Envoy's zone aware routing #94

Open
rlafferty opened this issue Aug 22, 2019 · 13 comments
Labels
feature Roadmap: Accepted We are planning on doing this work.

Comments

@rlafferty

Tell us about your request
Support the ability to use Envoy's zone-aware routing.

Which integration(s) is this request for?
App Mesh and potentially Cloud Map

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
It would be beneficial to be able to use Envoy's zone-aware routing to prefer backends in the same AZ. This would reduce cross-AZ latency where possible. I believe the zone information is expected to be provided by the service discovery tool, so in this case I'm unsure whether Cloud Map already provides or retains that information.

Are you currently working around this issue?
No. We currently can't, so we're going "without it".

@rlafferty rlafferty added the Roadmap: Proposed We are considering this for inclusion in the roadmap. label Aug 22, 2019
@rlafferty rlafferty changed the title [request]: describe request here [request]: Provide support for Envoy's zone aware routing Aug 22, 2019
@lavignes

Hi @rlafferty. Thanks for opening this issue. This is definitely a worthwhile feature to add.
I know that ECS already publishes Cloud Map attributes such as AVAILABILITY_ZONE:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html

So hypothetically, you could already create a virtual node for each AZ that matches on the AVAILABILITY_ZONE Cloud Map attribute as a workaround. But that is obviously not ideal.
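To make the per-AZ workaround concrete, here is a minimal Python sketch that builds one virtual node spec per AZ. It assumes the App Mesh virtual node spec shape in which `serviceDiscovery.awsCloudMap.attributes` filters the Cloud Map instances returned; the namespace `internal.local`, the service `orders`, port 8080 and the zone names are hypothetical placeholders.

```python
def zonal_virtual_node_specs(namespace, service, port, zones):
    """Build one App Mesh virtual node spec per AZ, each one filtered on the
    AVAILABILITY_ZONE attribute that ECS registers in Cloud Map."""
    specs = {}
    for zone in zones:
        specs[f"{service}-{zone}"] = {
            "listeners": [{"portMapping": {"port": port, "protocol": "http"}}],
            "serviceDiscovery": {
                "awsCloudMap": {
                    "namespaceName": namespace,
                    "serviceName": service,
                    # Only Cloud Map instances whose attributes match are used.
                    "attributes": [{"key": "AVAILABILITY_ZONE", "value": zone}],
                }
            },
        }
    return specs


# Hypothetical names; replace with your own namespace, service and zones.
specs = zonal_virtual_node_specs(
    "internal.local", "orders", 8080, ["us-east-1a", "us-east-1b"]
)
```

Each spec could then be passed to boto3's `appmesh.create_virtual_node(meshName=..., virtualNodeName=name, spec=spec)`. Note that one node per AZ per service is exactly the duplication that makes this workaround unattractive.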

We could specify the AZ by querying the EC2 metadata endpoint and pass it along to Envoy's bootstrap locality: https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/base.proto#envoy-api-msg-core-node
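A rough sketch of the bootstrap idea, assuming the code runs on an EC2 instance where the instance metadata service (IMDSv2) is reachable. The `placement/region` and `placement/availability-zone` metadata paths and Envoy's `node.locality` fields (`region`, `zone`) exist; the node id and cluster name are hypothetical, and this is an illustration, not the App Mesh Envoy image's own bootstrap logic.

```python
import json
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def imds_get(path: str, timeout: float = 2.0) -> str:
    """Read one EC2 instance metadata value using an IMDSv2 session token."""
    token_req = urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()


def bootstrap_node(node_id: str, cluster: str, region: str, zone: str) -> dict:
    """Envoy bootstrap `node` section with its locality filled in."""
    return {
        "node": {
            "id": node_id,
            "cluster": cluster,
            "locality": {"region": region, "zone": zone},
        }
    }


def local_bootstrap_json(node_id: str, cluster: str) -> str:
    """Query IMDS for this instance's placement and render the node section."""
    region = imds_get("placement/region")
    zone = imds_get("placement/availability-zone")
    return json.dumps(bootstrap_node(node_id, cluster, region, zone), indent=2)
```

On Fargate there is no EC2 metadata endpoint; the ECS task metadata endpoint reports the task's `AvailabilityZone` instead. AZ names are also mapped per account, so `placement/availability-zone-id` may be the safer key if a mesh spans accounts.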

Then we could query for zonal instances via an opt-in parameter in VirtualNode service discovery. Not sure exactly how this should look yet, but we can start thinking about it.
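One hypothetical shape for such an opt-in zonal query is "prefer same-AZ instances, otherwise use everything". The sketch below assumes instances shaped like the `Instances` list returned by Cloud Map's `DiscoverInstances` API (`InstanceId`, `HealthStatus`, `Attributes`); the function name and the fall-back policy are illustrative only.

```python
def prefer_local_zone(instances, local_zone):
    """Naive zonal preference: return the non-UNHEALTHY instances whose
    AVAILABILITY_ZONE attribute equals `local_zone`; if there are none, fall
    back to every non-UNHEALTHY instance. No capacity weighting is applied.

    `instances` mimics the "Instances" list of a Cloud Map DiscoverInstances
    response (dicts with "InstanceId", "HealthStatus" and "Attributes").
    """
    usable = [i for i in instances if i.get("HealthStatus") != "UNHEALTHY"]
    local = [
        i for i in usable
        if i.get("Attributes", {}).get("AVAILABILITY_ZONE") == local_zone
    ]
    return local or usable
```

In practice the list could come from `boto3.client("servicediscovery").discover_instances(NamespaceName=..., ServiceName=...)["Instances"]`. This all-or-nothing preference ignores how much upstream capacity each AZ has.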

@shubharao shubharao added Roadmap: Accepted We are planning on doing this work. and removed Roadmap: Proposed We are considering this for inclusion in the roadmap. labels Sep 26, 2019
@shubharao shubharao self-assigned this Sep 27, 2019
@joeykhashab

@lavignes I am currently experimenting with the workaround that you suggested. I am still a little unsure how the zone-specific logic would work. When I set up my zone-specific virtual nodes, I still need to set up zone-specific virtual services, virtual routers, and routes to them, correct?

Once I do that, I guess I need to create a custom Envoy Docker image to handle the logic of "if in zone A, route the request using the zone A virtual router", right?

@lavignes

Hi @joeykhashab

I still need to set up zone-specific virtual services, virtual routers, and routes to them, correct?

For now, yes, unfortunately. That's why this issue exists :(

In the route, you could specify two virtual nodes as weighted targets: one filtered on the AVAILABILITY_ZONE Cloud Map attribute and another without that filter. Then you could weight the traffic primarily towards the first, which gives some fail-over capability if the preferred AZ is seeing an outage.
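The weighted-target idea could look like the following App Mesh HTTP route spec, sketched in Python. It assumes the route spec shape with `httpRoute.action.weightedTargets` (integer weights from 0 to 100); the node names and the 90/10 split are hypothetical.

```python
def zonal_route_spec(zonal_node, fallback_node, zonal_weight=90):
    """App Mesh HTTP route that sends most traffic to the AZ-filtered virtual
    node and the remainder to an unfiltered one as a crude fail-over path."""
    if not 0 <= zonal_weight <= 100:
        raise ValueError("zonal_weight must be between 0 and 100")
    return {
        "httpRoute": {
            "match": {"prefix": "/"},
            "action": {
                "weightedTargets": [
                    {"virtualNode": zonal_node, "weight": zonal_weight},
                    {"virtualNode": fallback_node, "weight": 100 - zonal_weight},
                ]
            },
        }
    }


route = zonal_route_spec("orders-us-east-1a", "orders-all")
```

The weights are static: they do not shift traffic away from the preferred AZ on their own, so this only gives coarse fail-over.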

Once I do that, I guess I need to create a custom Envoy Docker image to handle the logic of "if in zone A, route the request using the zone A virtual router", right?

That would not be necessary. You'd simply configure the backends of your virtual nodes to use the zone-specific virtual service. The Envoy image only contains the information needed to reach our management server and download configuration, so you can't target specific virtual routers from the image itself. All of that is fetched at runtime.
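As an illustration of pointing a client virtual node at zone-specific virtual services, here is a sketch of its `backends` list, assuming the App Mesh `backends[].virtualService.virtualServiceName` field. The `<service>-<zone>.<domain>` naming convention is hypothetical. Because the zone is baked into the names, each client would need a separate virtual node per AZ.

```python
def zonal_backends(services, zone, domain):
    """`backends` list for a client virtual node running in `zone`, pointing
    it at hypothetical zone-specific virtual services for each service."""
    return [
        {"virtualService": {"virtualServiceName": f"{name}-{zone}.{domain}"}}
        for name in services
    ]


backends = zonal_backends(["orders", "payments"], "us-east-1a", "internal.local")
```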

All that said, I wouldn't recommend doing this manually. It would require creating a lot of duplicate mesh resources. My example above is mostly hypothetical: it does allow routing to a specific AZ via service discovery, but there is no good mechanism for handling fail-over.

@joeykhashab

I see. OK, I think I understand the workaround better now. Thanks for your response @lavignes.

@claydanford

Any update on this after 8 months?

@inakianduaga

How realistic is it that this will be implemented in the next 6 to 12 months? We are currently considering App Mesh as a way to remove our ALB LCU costs by dropping ALBs for all internal traffic, but it feels like that won't help if we get hit with cross-AZ costs for the traffic.

@Rob-Johnson

Just adding a +1 for wanting to see this functionality. From what I can see, the method involving Cloud Map attributes won't provide the same capabilities that Envoy has natively, particularly making sure that the amount of traffic sent within the same AZ is kept proportionate to the capacity of the upstream cluster.
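For comparison, the capacity-proportional behavior Envoy applies natively can be modelled roughly as follows. This is a simplified sketch of the direct/residual routing described in Envoy's zone-aware routing documentation; it works on host counts per zone and ignores Envoy's preconditions (minimum cluster size, healthy-host and panic thresholds, runtime overrides).

```python
def zone_aware_split(origin_hosts, upstream_hosts, local_zone):
    """Simplified model of Envoy's zone-aware routing for one caller.

    origin_hosts / upstream_hosts map zone -> host count for the calling and
    the called cluster. Returns zone -> fraction of this caller's requests.
    """
    if origin_hosts.get(local_zone, 0) == 0:
        raise ValueError("the caller's own zone must appear in origin_hosts")
    o_total = sum(origin_hosts.values())
    u_total = sum(upstream_hosts.values())
    o_local = origin_hosts[local_zone] / o_total
    u_local = upstream_hosts.get(local_zone, 0) / u_total
    if u_local >= o_local:
        # Direct routing: local upstream capacity keeps up with local callers.
        return {local_zone: 1.0}
    # Residual routing: keep only the share the local upstreams can absorb.
    split = {local_zone: u_local / o_local}
    # Spare capacity in other zones = upstream share minus caller share.
    residual = {
        z: max(0.0, upstream_hosts[z] / u_total - origin_hosts.get(z, 0) / o_total)
        for z in upstream_hosts
        if z != local_zone
    }
    total_residual = sum(residual.values())  # > 0 whenever u_local < o_local
    for z, r in residual.items():
        if r > 0:
            split[z] = (1 - split[local_zone]) * r / total_residual
    return split
```

With callers spread 2/1/1 across zones a/b/c and upstreams spread 1/1/2, a caller in zone a keeps half of its requests local and sends the other half to zone c, the only zone with spare upstream capacity. A fixed attribute filter cannot express this kind of split.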

@herrhound herrhound assigned herrhound and unassigned shubharao Apr 30, 2021
@james-skinner-deltatre

Any update on this one?

We are also looking to replace an internal ALB, and if I understand the billing docs correctly (which is not always easy):

  1. If you use an ALB to communicate between AZs, you do not pay for cross-AZ data transfer, only the ALB LCU cost ($0.008/GB).
  2. If you use App Mesh to make the same call, you pay the cross-AZ data transfer cost ($0.02/GB), plus the CPU to run the Envoy sidecars.

If this is correct (I am trying to get support to confirm), switching to App Mesh will cost us a lot more unless we can do some zone-aware routing.

@james-skinner-deltatre

james-skinner-deltatre commented Jan 5, 2022

For anyone interested, I got confirmation from support on pricing:

  1. If I transfer 1GB of data from EC2 instance A to EC2 instance B, which are in the same VPC and the same region but different AZs, then I pay 0.1 + 0.1 = $0.2 for data transfer.
  2. If I then create an internal ALB in the same VPC, put EC2 instance B behind it, and transfer another 1GB from EC2 instance A to EC2 instance B, this time via the ALB, I do not pay the same $0.2 data transfer cost, only the ALB pricing as per https://aws.amazon.com/elasticloadbalancing/pricing/
  • Yes, that is correct.

Doing some rough calculations on an existing ALB of ours: if we replaced it with App Mesh, we would go from paying $40/week in ALB costs to $630/week in cross-AZ data transfer costs, plus the added cost of running the Envoy sidecars.

A clear deal-breaker for us, which I hope this feature would mitigate.

UPDATE: The data transfer cost is $0.02/GB, not $0.2, so it works out at ~$60 for App Mesh vs. ~$40 for the ALB.

@herrhound
Contributor

@james-skinner-deltatre, I feel your calculations are not exactly correct. First, the cross-AZ data transfer cost in most of the AWS Regions in the US is $0.01/GB in each direction, ten times less than you assumed. Second, in most cases, you send no more than two-thirds of the traffic to other AZs. Assuming your traffic is ~450 GB per day, that would result in ~$42 of data transfer charges per week, comparable with your current ALB cost.
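The arithmetic behind this estimate can be written out as a small Python function. It assumes requests are spread evenly over upstreams in three AZs (so (n-1)/n = 2/3 of the bytes cross an AZ boundary) and that cross-AZ transfer is billed at $0.01/GB on both the sending and the receiving side; both are parameters.

```python
def weekly_cross_az_cost(gb_per_day, num_azs=3, usd_per_gb_each_way=0.01):
    """Rough weekly cross-AZ data transfer cost without zone-aware routing.

    Assumes callers spread requests evenly over upstreams in `num_azs` zones,
    so (num_azs - 1) / num_azs of the bytes cross an AZ boundary, and that
    cross-AZ transfer is charged on both the sending and receiving side.
    """
    cross_az_fraction = (num_azs - 1) / num_azs
    cross_az_gb_per_week = gb_per_day * 7 * cross_az_fraction
    return round(cross_az_gb_per_week * usd_per_gb_each_way * 2, 2)
```

For 450 GB per day this comes to about $42 per week: 3,150 GB per week, of which 2,100 GB crosses an AZ, at $0.02/GB in total.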

The added cost of running Envoys is probably a larger concern. We recommend allocating 512 CPU units (0.5 vCPU) and 64 MiB of memory to the Envoy container, which results in additional cost of $0.02 per hour per Envoy on Fargate, or less on ECS. As always, your decision to use a service mesh should be guided by the benefits that you can get from using it.
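Similarly, the per-Envoy figure can be reproduced from Fargate's per-vCPU-hour and per-GB-hour rates. The default rates below are assumptions (us-east-1 Linux/x86 on-demand list prices; check the Fargate pricing page), they vary by region and architecture, and Fargate bills whole tasks, so this is only the sidecar's share of a task.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def envoy_fargate_cost_per_hour(vcpu=0.5, memory_gb=64 / 1024,
                                usd_per_vcpu_hour=0.04048,
                                usd_per_gb_hour=0.004445):
    """Hourly Fargate cost of the CPU and memory reserved for one Envoy.

    Defaults: 512 CPU units (0.5 vCPU) and 64 MiB (0.0625 GB) of memory,
    priced at assumed us-east-1 rates.
    """
    return vcpu * usd_per_vcpu_hour + memory_gb * usd_per_gb_hour


def envoy_fleet_cost_per_month(num_tasks, **rates):
    """Monthly sidecar overhead when every task runs its own Envoy."""
    return num_tasks * HOURS_PER_MONTH * envoy_fargate_cost_per_hour(**rates)
```

Because the overhead is per task, many small tasks multiply it, which is why the sidecar cost can outweigh the data transfer savings.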

Speaking of the AZ-aware routing capability, it is high on our roadmap. However, at the moment we can't share a specific timeframe for the release.

@james-skinner-deltatre

@herrhound you're absolutely right, I dropped a 0 at some point. In this case it works out at $63/week, which is much more acceptable (I did in fact factor in the two-thirds).

The Envoy cost is a concern as we tend to run many small tasks, so it adds up.

This is the wrong thread for it, but at the moment I am also seeing request latency increase on App Mesh compared with an ALB, which is not what I expected. I need to check I'm not off on the maths here too :-)

@herrhound
Contributor

Hey James @james-skinner-deltatre, please ping me on our Slack channel with the details on increased App Mesh latency: https://awsappmesh.slack.com/archives/D011Z89UH1B

@kgns

kgns commented Oct 11, 2023

Speaking of the AZ-aware routing capability, it is high on our roadmap. However, at the moment we can't share a specific timeframe for the release.

Hi @herrhound, are there any updates on this feature request? Is it still in the research stage?
