负载性能 (Performance Under Load) #5297

Merged
merged 5 commits into from
Mar 6, 2019
61 changes: 32 additions & 29 deletions TODO1/performance-under-load.md
> * Original author: [Netflix Technology Blog](https://medium.com/@NetflixTechBlog)
> * Translated by: [掘金翻译计划](https://github.com/xitu/gold-miner)
> * Permanent link to this article: [https://github.com/xitu/gold-miner/blob/master/TODO1/performance-under-load.md](https://github.com/xitu/gold-miner/blob/master/TODO1/performance-under-load.md)
> * Translator: [WangLeto](https://github.com/WangLeto)
> * Proofreaders: [sunui](https://github.com/sunui), [xionglong58](https://github.com/xionglong58)

# Performance Under Load

*Adaptive Concurrency Limits @ Netflix*

> by Eran Landau, William Thurston, Tim Bozarth

At Netflix we are obsessed with service availability, and we’ve written several blog posts over the years about how we achieve our goals. These techniques include [circuit breakers](https://zh.m.wikipedia.org/wiki/斷路器設計模式), concurrency limits, chaos testing and more. Today we’re announcing our most recent innovation: adaptive concurrency limits. Adaptive concurrency limits fundamentally improve how an application behaves under extreme load and allow us to avoid cascading service failures. We have eliminated the arduous task of trying to determine the concurrency limit of a system while ensuring that latencies remain low. With this announcement, we’re also open-sourcing a simple Java library with integrations for servlets, executors and GRPC.

## A little background first


Concurrency is nothing more than the number of requests a system can service at any given time, and is normally driven by a fixed resource such as CPU. A system’s concurrency is normally calculated using Little’s law, which states: for a system at steady state, concurrency is the product of the average service time and the average service rate (L = 𝛌W). Any requests in excess of this concurrency cannot immediately be serviced and must be queued or rejected. That said, some queueing is necessary, as it enables full system utilization in spite of non-uniform request arrival and service times.

![](https://cdn-images-1.medium.com/max/2468/1*XurJ5f2Hjf4lO-GspmCRIw.png)
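As a quick worked example of Little’s law, using hypothetical numbers: a service averaging 50 ms per request at 200 requests per second sustains about 0.05 × 200 = 10 requests in flight, and anything beyond that must queue or be rejected. A minimal sketch:

```java
public class LittlesLaw {
    public static void main(String[] args) {
        double w = 0.05;       // W: average service time in seconds (50 ms, hypothetical)
        double lambda = 200;   // 𝛌: average service rate in requests per second (hypothetical)
        double l = lambda * w; // L = 𝛌W: sustainable in-flight requests
        System.out.printf("Steady-state concurrency L = %.0f requests%n", l);
    }
}
```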

Systems fail when no limit is enforced on this queue, such as during prolonged periods where the arrival rate exceeds the exit rate. As the queue grows, so does latency, until all requests start timing out and the system ultimately runs out of memory and crashes. If left unchecked, these latency increases start adversely affecting the system’s callers, leading to cascading failures through the system.

![](https://cdn-images-1.medium.com/max/2432/1*HuSIJZzGk7RSeJbnINF-DQ.png)

Enforcing concurrency limits is nothing new; the hard part is figuring out this limit in a large dynamic distributed system where concurrency and latency characteristics are constantly changing. The main purpose of our solution is to dynamically identify this concurrency limit. The limit can be seen as the maximum number of inflight requests (concurrency + queue) allowed before performance (i.e. latency) starts to degrade.

## The solution

Historically, at Netflix we manually configured fixed concurrency limits, measured via an arduous process of performance testing and profiling. While this provided an accurate value at that moment in time, the measured limit would quickly become stale as the system’s topology changed due to partial outages, auto-scaling, or code pushes that impact latency characteristics.

We knew we could do better than static concurrency limits, so we sought to automatically identify a system’s inherent concurrency limit in a way that would:

1. Require no manual work

2. Require no centralized coordination

3. Infer the limit without any knowledge of the hardware or system topology

4. Adapt to changes in system topology

5. Be easy to compute and enforce

To solve this problem we turned to tried-and-true TCP congestion control algorithms, which seek to determine how many packets may be transmitted concurrently (i.e. the congestion window size) without incurring timeouts or increased latency. These algorithms keep track of various metrics to estimate the system’s concurrency limit and constantly adjust the congestion window size.

![](https://cdn-images-1.medium.com/max/2496/1*rWdqQuqi50OJNLnGeDgo1w.png)

Here we see a system with an unknown concurrency limit, shown in blue. The client starts by sending requests at a low concurrency while frequently probing for higher concurrency, increasing the congestion window as long as latencies don’t go up. When latencies do increase, the sender assumes it has reached the limit and backs off to a smaller congestion window size. The limit is continually probed, resulting in the common saw-tooth pattern.
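This probe-and-back-off behavior is essentially the classic additive-increase/multiplicative-decrease (AIMD) scheme from TCP, which is what produces the saw-tooth. A minimal sketch of that probing loop; the tolerance and back-off factor are illustrative assumptions, and this is not Netflix’s exact algorithm:

```java
public class AimdProbe {
    private double window = 1;                  // congestion window: allowed concurrent requests
    private static final double BACKOFF = 0.9;  // multiplicative decrease factor (assumption)

    // Called once per measurement interval with the latest latency sample.
    public void onSample(double rtt, double rttNoLoad) {
        if (rtt <= rttNoLoad * 1.1) {           // latency still flat (10% tolerance, assumption)
            window += 1;                        // additive increase: probe for more capacity
        } else {
            window = Math.max(1, window * BACKOFF); // latency rose: back off
        }
    }

    public int getWindow() {
        return (int) window;
    }
}
```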

Our algorithm builds on latency-based TCP congestion control algorithms, which look at the ratio between the minimum latency (representing the best-case scenario without queuing) and a time-sampled latency measurement as a proxy for identifying that a queue has formed and is causing latencies to rise. This ratio gives us the gradient, or magnitude, of latency change: **gradient = RTTnoload / RTTactual**. (Translator’s note: [RTT](https://en.wikipedia.org/wiki/Round-trip_delay_time) is the round-trip time of a packet on a TCP connection; RTTnoload is the minimum RTT observed without queuing, and RTTactual is the currently measured RTT.) A value of 1 indicates no queueing and that the limit can be increased. A value less than 1 indicates that an excessive queue has formed and that the limit should be decreased. With each new sample, the limit adjusts using this ratio, plus an allowable queue size, via the simple formula:

```
newLimit = currentLimit × gradient + queueSize
```

After several iterations the algorithm converges on a limit that keeps latencies low while allowing some queuing to absorb bursts. The allowable queue size is tunable and determines how quickly the limit may grow. We settled on a good default of the square root of the current limit. We chose the square root mostly because it has the useful property of being large relative to the current limit at low values, allowing faster growth, while being small relative to the limit at high values, for better stability.
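Putting the formula and the square-root queue size together, a minimal sketch of the update rule might look like the following. The starting limit and the smoothing step are illustrative assumptions, not the library’s actual implementation; the open-sourced [Netflix/concurrency-limits](http://github.com/Netflix/concurrency-limits) library contains the production algorithms.

```java
public class GradientLimit {
    private double limit = 20;                   // current concurrency limit (illustrative start)
    private double rttNoLoad = Double.MAX_VALUE; // minimum latency observed so far

    // Feed one latency sample (seconds per request) and adjust the limit.
    public void onSample(double rttActual) {
        rttNoLoad = Math.min(rttNoLoad, rttActual);
        double gradient = rttNoLoad / rttActual; // 1 = no queuing, < 1 = queue forming
        double queueSize = Math.sqrt(limit);     // allowed burst headroom
        double newLimit = limit * gradient + queueSize;
        limit = 0.8 * limit + 0.2 * newLimit;    // smooth updates to reduce jitter (assumption)
    }

    public int getLimit() {
        return (int) limit;
    }
}
```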

## Adaptive Limits in Action

When enabled, adaptive server-side limits reject excess RPS (requests per second) and keep latency low, allowing the instance to protect itself and the services it depends on. Without rejecting excess traffic, any sustained increase in RPS or latency previously translated into even worse latencies and, ultimately, system failure. Services are now able to shed excess load and keep latencies low while other mitigating actions, such as auto-scaling, kick in.

![](https://cdn-images-1.medium.com/max/2452/1*sfDL_PVx-lCAs3W4z_S0cQ.png)
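For concreteness, server-side enforcement can be as simple as an in-flight counter checked against the adaptive limit, rejecting requests that exceed it. A hedged sketch reusing the hypothetical `GradientLimit` above; the real library ships servlet, executor and GRPC integrations instead:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AdaptiveLimiter {
    private final GradientLimit gradient = new GradientLimit(); // hypothetical, from the sketch above
    private final AtomicInteger inFlight = new AtomicInteger();

    // Returns true if the request is admitted; the caller must later call release().
    public boolean tryAcquire() {
        if (inFlight.incrementAndGet() > gradient.getLimit()) {
            inFlight.decrementAndGet();
            return false; // shed load, e.g. respond with HTTP 429 or gRPC UNAVAILABLE
        }
        return true;
    }

    public void release(double observedRttSeconds) {
        inFlight.decrementAndGet();
        gradient.onSample(observedRttSeconds); // feed the observed latency back into the limit
    }
}
```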

It’s important to note that limits are enforced at the server level (with no coordination) and that traffic to each server can be fairly bursty. The discovered limit and number of concurrent requests can therefore vary from server to server, especially in a multi-tenant cloud environment. This can result in shedding by one server when there is enough capacity elsewhere. That said, with client-side load balancing, a single client retry is nearly 100% successful at reaching an instance with available capacity. Better yet, there’s no longer a concern about retries causing DDOS and retry storms (translator’s note: presumably referring to many clients repeatedly retrying requests after being rejected), as services are able to shed traffic quickly, in sub-millisecond time, with minimal impact to performance.

## Conclusion

As we roll out adaptive concurrency limits, we’re eliminating the need to babysit and manually tune how our services shed load. Better still, they simultaneously improve the overall reliability and availability of our whole microservice-based ecosystem.

We’re excited to share our implementation and common integrations in a small open-source library, available at [http://github.com/Netflix/concurrency-limits](http://github.com/Netflix/concurrency-limits). Our hope is that anyone interested in shielding their services from cascading failures and load-related latency degradation can take advantage of our code to achieve better availability. We look forward to feedback from the community and are happy to accept pull requests with new algorithms or integrations.

> If you find any mistakes in this translation or other areas that could be improved, you are welcome to revise the translation and open a PR at [掘金翻译计划](https://github.com/xitu/gold-miner) to earn bonus points. The **permanent link to this article** at the top is the Markdown link to this article on GitHub.
