http server occasionally experiences high latency #3419
There are a few likely possibilities here:
@bdarnell First of all, thank you very much for your reply! I debugged according to the points above. I used Apache JMeter to stress test the Tornado interface, with 500 concurrent requests for a duration of 180 seconds.

2. Poor load balancing: I also noticed the DefaultHostMatches logic in tornado/routing.py, which seems to be used for routing decisions? I don't know what Tornado does when xheaders=True; when I tested with JMeter, enabling this parameter increased the response time. Maybe that is related to something I am doing?

3. Tornado uses single-process mode: I tried setting the number of CPU cores to 1 and scaled the pod replicas out accordingly. Under the same JMeter load, the latency was still not good.

4. Resources are saturated.
These messages that say "poll %.3f ms took %.3f ms" are harmless - they're reporting on internal state of the asyncio event loop (basically, how long the loop sat idle waiting for network activity). This message was removed in Python 3.8. The message you're looking for is "Executing %s took %.3f seconds". If you see any of those, you have a problem with a callback that's running too long.
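In case it helps anyone hitting this, a minimal sketch of how those slow-callback warnings can be surfaced, assuming a toy handler and port; the 0.1-second threshold is an arbitrary illustration (setting PYTHONASYNCIODEBUG=1 in the environment has the same effect as set_debug(True)):

```python
# Sketch only: enable asyncio debug mode so callbacks that block the event
# loop are logged as "Executing <Handle ...> took N.NNN seconds".
import asyncio
import logging

import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("ok")


def main():
    logging.basicConfig(level=logging.DEBUG)

    app = tornado.web.Application([(r"/", MainHandler)])
    app.listen(8888)

    loop = asyncio.get_event_loop()
    loop.set_debug(True)                # required for the slow-callback warning
    loop.slow_callback_duration = 0.1   # warn on anything blocking > 100 ms

    tornado.ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()
```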
This mainly affects logging: With xheaders=True, logs will show the IP address from the X-Real-IP header instead of the IP from the TCP connection (which is the load balancer). There's a tiny amount of extra work involved when xheaders=True but I would be surprised if it were a measurable difference.
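For illustration, a minimal sketch of the behaviour described above; the handler and port are assumptions:

```python
# Sketch only: with xheaders=True, Tornado trusts X-Real-Ip / X-Forwarded-For
# from the proxy (Envoy here), so request.remote_ip and the access log show
# the original client address instead of the load balancer's address.
import tornado.httpserver
import tornado.ioloop
import tornado.web


class EchoIPHandler(tornado.web.RequestHandler):
    def get(self):
        self.write({"remote_ip": self.request.remote_ip})


def main():
    app = tornado.web.Application([(r"/ip", EchoIPHandler)])
    server = tornado.httpserver.HTTPServer(app, xheaders=True)
    server.listen(8888)
    tornado.ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()
```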
Hmm, this sounds like the configuration I would recommend so it's worth investigating it further to see what's going on.
OK. The question you need to answer is whether the CPU usage is balanced across all the cores, or whether some cores are at 100% while others are much lower. Or, when you run one core per pod with more pods, whether the CPU usage is balanced across the pods.
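One way to check this inside the pod while the JMeter load is running, as a sketch (psutil is an assumption, not something from this thread; `mpstat -P ALL 1` or `top` with the per-core view gives the same information):

```python
# Sketch only: print per-core CPU usage once per second during the load test.
import psutil

if __name__ == "__main__":
    for _ in range(30):
        # interval=1 blocks for one second and returns one percentage per core.
        per_core = psutil.cpu_percent(interval=1, percpu=True)
        print("per-core CPU %:", per_core)
```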
Can you give a complete code example?
1. Environment Information
Tornado Version: TornadoServer/6.1
Python Version: 3.7
Docker Version: 20.10.7
My Tornado web server runs in Kubernetes, so traffic is forwarded by Envoy. The client uses HTTP/1.1 by default, so HTTP keep-alive is used between Envoy and the Tornado pod:
Envoy--->Tornado Pod
Tornado startup sample code is as follows:
Tornado Pod cpu limit=4
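As a rough stand-in for the startup snippet (the handler, port, and worker count below are assumptions, not the code actually used in this service), a multi-process Tornado 6.1 startup behind Envoy might look like this:

```python
# Sketch only: a typical multi-process Tornado startup behind a proxy.
# The handler, port, and worker count are assumptions for illustration.
import tornado.httpserver
import tornado.ioloop
import tornado.web


class HealthHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("ok")


def main():
    app = tornado.web.Application([(r"/health", HealthHandler)])
    # xheaders=True so logs show the client IP forwarded by Envoy.
    server = tornado.httpserver.HTTPServer(app, xheaders=True)
    server.bind(8000)
    # Fork 4 worker processes to match the pod's CPU limit of 4.
    server.start(4)
    tornado.ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()
```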
2. Problems encountered
When we stress tested the Tornado interface, the response latency was originally within 100 ms, but after 1 minute of load it rose to more than 300 ms, which is intolerable for our business.
We used tcpdump to capture packets and check the communication, and found the following situation:
172.29.222.1 is the envoy ip, 172.29.86.34 is the tornado web pod ip
Serial number 20498: envoy forwards the request to tornado pod (time: 11:17:57.3055)
Serial number 20499: tornado pod responds with ACK (time: 11:17:57.3056)
Serial number 22235: tornado pod responds with http data (time: 11:17:57.799)
The entire HTTP response took 494 ms.
The request reached the Tornado pod at 11:17:57.3056, but the Tornado access log shows that Tornado only started processing the request at 11:17:57.751 and finished at 11:17:57.799, at which point it sent the HTTP response.
We don't know where that time is being spent. Is there any tool we can use to observe where this part of the time goes?
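One way to narrow this down from inside Tornado, as a sketch (the threshold, handler, and port below are assumptions): override the application's `log_function` and log `request.request_time()`. Note that `request_time()` only starts once Tornado has parsed the request headers, so a large gap between the tcpdump timestamp and this number points at time spent in the accept queue or waiting for the event loop rather than in the handler itself.

```python
# Sketch only: log slow requests via Tornado's log_function hook.
import logging

import tornado.ioloop
import tornado.web

access_log = logging.getLogger("tornado.access")


def log_slow_requests(handler):
    # request_time() measures from when Tornado parsed the request headers
    # to when the response finished; queueing before that is not included.
    elapsed = handler.request.request_time()
    if elapsed > 0.1:  # 100 ms threshold is an arbitrary choice
        access_log.warning("slow request: %s %s took %.3fs",
                           handler.request.method, handler.request.uri, elapsed)


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("ok")


def main():
    logging.basicConfig(level=logging.INFO)
    app = tornado.web.Application(
        [(r"/", MainHandler)],
        log_function=log_slow_requests,
    )
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()
```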
I hope to get your reply. This problem has troubled us for a long time. Thank you~