Big memory consumption when using pipeline #203
nutcracker does its own memory management for mbufs. To make sure it doesn't, change this part of the code:
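For context, the mbuf pooling being referred to works roughly as follows. The sketch below is a simplified, self-contained illustration (not the actual nc_mbuf.c source); the names free_mbufq, nfree_mbufq and mbuf_put mirror the ones in the patch further down this thread, while mbuf_get and the fixed 512-byte payload are simplifications.

#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct mbuf {
    STAILQ_ENTRY(mbuf) next;      /* link in the free queue */
    uint8_t            data[512]; /* payload; real size comes from -m / --mbuf-size */
};

static STAILQ_HEAD(, mbuf) free_mbufq = STAILQ_HEAD_INITIALIZER(free_mbufq);
static uint32_t nfree_mbufq;

/* allocate: reuse a cached mbuf if one exists, otherwise ask the OS
 * (error handling omitted for brevity) */
static struct mbuf *
mbuf_get(void)
{
    struct mbuf *mbuf;

    if (!STAILQ_EMPTY(&free_mbufq)) {
        mbuf = STAILQ_FIRST(&free_mbufq);
        STAILQ_REMOVE_HEAD(&free_mbufq, next);
        nfree_mbufq--;
        return mbuf;
    }
    return malloc(sizeof(*mbuf));
}

/* release: park the mbuf on the free queue; nothing is ever returned to the
 * OS, which is why resident memory stays at its peak after a burst */
static void
mbuf_put(struct mbuf *mbuf)
{
    STAILQ_INSERT_HEAD(&free_mbufq, mbuf, next);
    nfree_mbufq++;
}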
I met the same problem after running this test:
I found that twemproxy consumed 1.6G of memory, and the memory was not freed after the client shut down.
100 * 1000 * 52 (msg length) ≈ 5MB. If we deploy twemproxy and redis on the same machine, they will be killed by the OOM killer. I'm trying to fix it.
Hi manjuraj:
Using valgrind, I found that the memory is used by mbufs, because the default mbuf size is 16K, which is too large for your requests. I have 3 pieces of advice:
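For reference on the mbuf size point: nutcracker's mbuf chunk size can be lowered at startup with the -m / --mbuf-size flag (the same flag behind the "-m 512" mentioned later in this thread). The config path below is just an example:

nutcracker -c conf/nutcracker.yml -m 512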
@liu21yd I think your nutcracker is running at 100% of one CPU core; you may need more nutcracker instances.
@idning thanks for your response, but why does it need more memory when the CPU is busy? To store the pipelined commands?
apply this patch and see if it helps (warning, I haven't tested it):

diff --git a/src/nc_mbuf.c b/src/nc_mbuf.c
index 50aa59a..42c28d6 100644
--- a/src/nc_mbuf.c
+++ b/src/nc_mbuf.c
@@ -118,13 +118,7 @@ mbuf_free(struct mbuf *mbuf)
void
mbuf_put(struct mbuf *mbuf)
{
- log_debug(LOG_VVERB, "put mbuf %p len %d", mbuf, mbuf->last - mbuf->pos);
-
- ASSERT(STAILQ_NEXT(mbuf, next) == NULL);
- ASSERT(mbuf->magic == MBUF_MAGIC);
-
- nfree_mbufq++;
- STAILQ_INSERT_HEAD(&free_mbufq, mbuf, next);
+ mbuf_free(mbuf);
}
on cpu consumption, see this: #158 (comment)
@manjuraj I think this patch might hurt twemproxy's performance in other cases.
@charsyam the patch stops twemproxy from using a free queue for mbufs. It uses the operating system's malloc and free to manage mbuf memory instead.
@manjuraj Yes, you're right. But I think one of twemproxy's strengths is managing mbufs itself, like a memory pool.
The problem that @liu21yd is encountering is that he/she subjects twemproxy to a burst of pipelined requests. This leads to an increased allocation of mbufs, and after the burst dies down, the previously allocated mbufs stay in the free pool and are not returned to the OS. As a result he/she sees resident-memory values of 10G (or something like that). The way to ensure that doesn't happen is to a) disable the mbuf freeq in twemproxy, or b) provide a threshold parameter that would ensure that twemproxy only manages mbuf memory up to a certain limit (like 1G), with any allocations above that managed by the OS. (b) lets you get the best of both worlds.
@manjuraj I also think (b) is best. How about setting the limit value in the conf?
yeah, I think (b) is a good middle ground
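For illustration, option (b) could look roughly like this as a change to the mbuf_put shown in the patch above. mbuf_free_limit is a hypothetical knob, not an existing twemproxy option; only the put path changes.

static uint32_t mbuf_free_limit = 65536;   /* hypothetical cap; 65536 * 16KB = 1GB */

void
mbuf_put(struct mbuf *mbuf)
{
    log_debug(LOG_VVERB, "put mbuf %p len %d", mbuf, mbuf->last - mbuf->pos);

    ASSERT(STAILQ_NEXT(mbuf, next) == NULL);
    ASSERT(mbuf->magic == MBUF_MAGIC);

    if (nfree_mbufq >= mbuf_free_limit) {
        /* free pool is at its cap: hand this chunk back to the OS */
        mbuf_free(mbuf);
        return;
    }

    nfree_mbufq++;
    STAILQ_INSERT_HEAD(&free_mbufq, mbuf, next);
}

Since nfree_mbufq already tracks the size of the free queue, the check is cheap; the limit value itself could come from the conf, as suggested just above.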
Unfortunately, we applied the patch you suggested and ran our tests, but it didn't help.

void
mbuf_put(struct mbuf *mbuf)
{
    // log_debug(LOG_VVERB, "put mbuf %p len %d", mbuf, mbuf->last - mbuf->pos);
    //
    // ASSERT(STAILQ_NEXT(mbuf, next) == NULL);
    // ASSERT(mbuf->magic == MBUF_MAGIC);
    //
    // nfree_mbufq++;
    // STAILQ_INSERT_HEAD(&free_mbufq, mbuf, next);
    mbuf_free(mbuf);
}

Besides, based on the above patch, we applied another patch, like this:

void
msg_put(struct msg *msg)
{
    log_debug(LOG_VVERB, "put msg %p id %"PRIu64"", msg, msg->id);

    while (!STAILQ_EMPTY(&msg->mhdr)) {
        struct mbuf *mbuf = STAILQ_FIRST(&msg->mhdr);
        mbuf_remove(&msg->mhdr, mbuf);
        mbuf_put(mbuf);
    }

    // nfree_msgq++;
    // TAILQ_INSERT_HEAD(&free_msgq, msg, m_tqe);
    msg_free(msg);
}

It also didn't work. But this time we checked and recorded the stats of twemproxy during the test; maybe this can help you analyze and fix this issue and give us some advice.

Fri Mar 7 19:05:51 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":60, "timestamp":1394190351, "writer": {"client_eof":0, "client_err":0, "client_connections":0, "server_ejects":0, "forward_error":0, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":0, "server_ejected_at":0, "requests":0, "request_bytes":0, "responses":0, "response_bytes":0, "in_queue":0, "in_queue_bytes":0, "out_queue":0, "out_queue_bytes":0},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":0, "server_ejected_at":0, "requests":0, "request_bytes":0, "responses":0, "response_bytes":0, "in_queue":0, "in_queue_bytes":0, "out_queue":0, "out_queue_bytes":0}}}

Fri Mar 7 19:05:54 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":63, "timestamp":1394190354, "writer": {"client_eof":0, "client_err":0, "client_connections":240, "server_ejects":0, "forward_error":0, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":290165, "request_bytes":26812704, "responses":275245, "response_bytes":1133134, "in_queue":16, "in_queue_bytes":2736, "out_queue":14904, "out_queue_bytes":1389370},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":236117, "request_bytes":22729182, "responses":235997, "response_bytes":975032, "in_queue":31, "in_queue_bytes":5189, "out_queue":89, "out_queue_bytes":9454}}}

Fri Mar 7 19:06:30 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":99, "timestamp":1394190390, "writer": {"client_eof":0, "client_err":0, "client_connections":400, "server_ejects":0, "forward_error":0, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":13205991, "request_bytes":1211720915, "responses":3418198, "response_bytes":14063533, "in_queue":9758660, "in_queue_bytes":895414027, "out_queue":29133, "out_queue_bytes":2270239},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":6862631, "request_bytes":751595146, "responses":3256944, "response_bytes":13656769, "in_queue":3580334, "in_queue_bytes":394467044, "out_queue":25353, "out_queue_bytes":2776957}}}

Fri Mar 7 19:06:33 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":102, "timestamp":1394190393, "writer": {"client_eof":0, "client_err":0, "client_connections":400, "server_ejects":1, "forward_error":106460, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":1, "server_timedout":0, "server_connections":0, "server_ejected_at":1394190392234820, "requests":13209106, "request_bytes":1211992601, "responses":3418198, "response_bytes":14063533, "in_queue":0, "in_queue_bytes":0, "out_queue":0, "out_queue_bytes":0},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":7463043, "request_bytes":810447031, "responses":3421994, "response_bytes":14351538, "in_queue":4015739, "in_queue_bytes":434603174, "out_queue":25310, "out_queue_bytes":2988179}}}

Fri Mar 7 19:08:38 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":227, "timestamp":1394190518, "writer": {"client_eof":0, "client_err":0, "client_connections":400, "server_ejects":53, "forward_error":126961, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":53, "server_timedout":0, "server_connections":0, "server_ejected_at":1394190510571647, "requests":14035498, "request_bytes":1287899296, "responses":3418198, "response_bytes":14063533, "in_queue":286718, "in_queue_bytes":26249247, "out_queue":0, "out_queue_bytes":0},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":21767898, "request_bytes":2215822059, "responses":21031903, "response_bytes":87507329, "in_queue":702210, "in_queue_bytes":71225864, "out_queue":33785, "out_queue_bytes":3300524}}}
I ran into this same problem a year ago. Twemproxy used up to 10GB of RAM. At the time I think I was using... unfortunately I also don't remember the circumstances that triggered the issue... :(
@liu21yd are you using -m 512? With the new patch, did the resident memory values drop after the test? Also use the patch that I gave you.
We see that at uptime 99 your clients have sent 13,205,991 + 6,862,631 requests to one nutcracker instance:
it's absolutely overloaded, and the response count is 3,418,198 + 3,256,944, which means you have about 13 million requests queued; your in_queue is:
but actually the memory used is:
and at uptime 227, the messages in in_queue are still:
and your memory usage now is:
So you can see why your nutcracker instance consumes this much memory; you are abusing nutcracker. My advice:
Maybe you need a deploy manager like this: https://github.com/idning/redis-mgr
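To put rough numbers on the analysis above (this assumes each queued message holds at least one mbuf, which is only a lower bound, and uses the -m 512 mbuf size mentioned elsewhere in this thread):

at uptime 99: in_queue = 9,758,660 + 3,580,334 ≈ 13.3 million queued messages
13.3 million messages * 512 bytes per mbuf ≈ 6.8 GB of resident memory
while in_queue_bytes = 895,414,027 + 394,467,044 ≈ 1.3 GB of actual request payload

In other words, each 512-byte mbuf holds on average only about 100 bytes of request data, and none of that memory is returned to the OS.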
@manjuraj I am using -m 512. The resident memory values didn't drop when I stopped the test, and the same thing happened when using the patch you gave.
@idning, thank you so much for your analysis and advice. These may help us optimize our system and fix this issue.
@idning nice analysis. Good job :)
nutcracker always tries to receive all data at the client side:
if the client writes to the socket, the write will always succeed (something like ...). The problem is that the client does not know when to stop sending requests; I think we can add a config like max-queue.
@idning you would just be introducing a max-queue config option to solve a benchmark use case, which is not ideal. The right thing to do is to fix the benchmark or work around it using a timeout.
I think that twemproxy should have some overload protection. Under an attack or a client-side bug, twemproxy will use a lot of memory, which is a risk if we deploy redis and twemproxy on the same machine. We should close the connection when the quota is over-run.
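A self-contained sketch of the kind of quota check being suggested here; none of this is existing twemproxy code, and the names (client_quota_bytes, check_client_quota) are invented for illustration.

#include <stdbool.h>
#include <stddef.h>

struct client {
    size_t in_queue_bytes;   /* request bytes accepted but not yet answered */
    bool   done;             /* marked for close */
};

/* hypothetical per-client quota, e.g. 64MB of outstanding pipelined requests */
static const size_t client_quota_bytes = 64 * 1024 * 1024;

/* call this each time more request data is read from the client */
static void
check_client_quota(struct client *c, size_t nread)
{
    c->in_queue_bytes += nread;
    if (c->in_queue_bytes > client_quota_bytes) {
        /* over quota: stop buffering an unbounded number of pipelined
         * requests in mbufs and mark the connection for close */
        c->done = true;
    }
}

A gentler variant would stop reading from the socket instead of closing, letting TCP backpressure throttle the client until the queue drains.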
Expecting a max_memory option to be provided.
I think the reason the memory is not freed after...
Hi all: