
Big memory consumed when using pipeline #203

Closed
liu21yd opened this issue Mar 5, 2014 · 26 comments

@liu21yd

liu21yd commented Mar 5, 2014

Hi all:

We are using twemproxy with redis for sharding, with 8 redis instances behind the proxy. When we ran 80 clients, each sending about 100 ~ 200 pipelined commands to the proxy at the same time, the proxy's memory grew very quickly; after about 5 minutes it consumed 10G and then crashed.
We have tried -m 512 and defining NC_IOV_MAX as 512; it didn't help.


Could anyone help me fix this issue? Thanks.
@manjuraj
Collaborator

manjuraj commented Mar 5, 2014

  • how big are the individual commands?
  • what happens if you don't change the NC_IOV_MAX settings?

nutcracker does its own memory management for mbufs. To make sure it doesn't, change this part of the code.
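For context, here is a minimal sketch of the recycling scheme being referred to. This is illustrative rather than the actual nc_mbuf.c source: the struct is stripped down to the free-queue link and the allocation ignores mbuf_chunk_size. Buffers released with mbuf_put are parked on a free queue and handed out again by mbuf_get, so resident memory never shrinks back after a burst:

#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Illustrative sketch only, not the real nc_mbuf.c. */
struct mbuf {
    STAILQ_ENTRY(mbuf) next;                     /* link in the free queue */
    /* the real struct also carries start/pos/last/end pointers and a magic */
};

static STAILQ_HEAD(mhdr, mbuf) free_mbufq = STAILQ_HEAD_INITIALIZER(free_mbufq);
static uint64_t nfree_mbufq;

struct mbuf *
mbuf_get(void)
{
    struct mbuf *mbuf;

    if (!STAILQ_EMPTY(&free_mbufq)) {            /* reuse a cached buffer */
        mbuf = STAILQ_FIRST(&free_mbufq);
        STAILQ_REMOVE_HEAD(&free_mbufq, next);
        nfree_mbufq--;
        return mbuf;
    }
    return calloc(1, sizeof(*mbuf));             /* otherwise go to the OS */
}

void
mbuf_put(struct mbuf *mbuf)
{
    /* cache the buffer for reuse; it is never returned to the OS */
    nfree_mbufq++;
    STAILQ_INSERT_HEAD(&free_mbufq, mbuf, next);
}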

@idning
Contributor

idning commented Mar 6, 2014

I hit the same problem after running this test:

ning@ning-laptop:~/idning-github/redis-mgr$ cat tests/a.py 
import socket
import time

HOST = '127.0.0.5'
PORT = 24000

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))

data = '*3\r\n$3\r\nSET\r\n$13\r\nkkk-100001480\r\n$13\r\nvvv-100001480\r\n'
for i in range(100*1000):
    s.sendall(data)

ning@ning-laptop:~/idning-github/redis-mgr$ time python tests/a.py 

real    0m0.797s
user    0m0.280s
sys 0m0.120s

I found that twemproxy consumed 1.6G of memory, and the memory is not freed after the client shuts down.

ning@ning-laptop:/tmp/r/nutcracker-24000$ ps aux | grep nut
ning      2017  0.5 14.8 1652068 1186692 ?     Sl   09:43   0:00 bin/nutcracker -d -c /tmp/r/nutcracker-24000/conf/nutcracker.conf -o /tmp/r/nutcracker-24000/log/nutcracker.log -p /tmp/r/nutcracker-24000/log/nutcracker.pid -s 25000 -v 4

100 * 1000 * 52 (msg length) = 5MB
send 5MB of requests to twemproxy and it consumes 1.6G of memory (100,000 queued requests, each holding at least one 16K default mbuf, is about 1.6G)...

If we deploy twemproxy and redis on the same machine, they will be killed by the OOM killer.

I'm trying to fix it.

@liu21yd
Author

liu21yd commented Mar 6, 2014

Hi manjuraj:

The commands we use are zrangeByScore, get, hgetall and zadd, mixed in the requests.
"zrangeByScore" is sent one at a time to the proxy, and each call tries to fetch 100 entries, e.g. "zrangeByScore ul:100 (0 (3377649059136274431 0 100"; each returned entry is no more than 32 bytes.
"get" is sent 100 at a time using pipelining; an individual command looks like "get msgs:8839254021692500000", and each reply is no more than 256 bytes.
"hgetall" is sent 100 at a time using pipelining; an individual command looks like "hgetall mc:8839254021692500000", and each reply is no more than 64 bytes.
"zadd" is sent 100 at a time using pipelining; an individual command looks like "ZADD fr:200 100.0 100:user".
So each individual command is less than 64 bytes long.

This happened before we changed the NC_IOV_MAX setting, so we tried changing it, but it didn't help.

We also added some logging to the code and found that many allocations happen in mbuf_split and in msg_get called from msg_parsed. It looks like there is never a free mbuf or msg in the free queues. We think there must be a theoretical maximum; is that right?

@idning
Contributor

idning commented Mar 6, 2014

I used valgrind and found that the memory is used by mbufs: the default mbuf size is 16K, which is too large for your requests.

Three suggestions:

  1. use a smaller mbuf size:

     bin/nutcracker -m 512
    
  2. modify the mbuf_put function to free the buffer directly:

    void
    mbuf_put(struct mbuf *mbuf)
    {
        mbuf_free(mbuf);
    }
    
  3. do not use such large pipelines (a client-side sketch follows below).
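To illustrate suggestion 3, here is a rough C client sketch of bounded pipelining. Assumptions: the proxy address 127.0.0.5:24000 is taken from the test above, every SET reply is the 5-byte "+OK\r\n", and BATCH = 64 is an arbitrary choice. Draining the replies after every batch keeps the number of requests queued inside the proxy bounded:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BATCH 64   /* arbitrary batch size for this example */

int
main(void)
{
    const char *req = "*3\r\n$3\r\nSET\r\n$13\r\nkkk-100001480\r\n$13\r\nvvv-100001480\r\n";
    char buf[4096];
    struct sockaddr_in addr;
    int sd, i;

    sd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(24000);                       /* proxy port from the test above */
    inet_pton(AF_INET, "127.0.0.5", &addr.sin_addr);
    if (connect(sd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        perror("connect");
        return 1;
    }

    for (i = 0; i < 100 * 1000; i++) {
        if (write(sd, req, strlen(req)) < 0) {
            perror("write");
            break;
        }
        if ((i + 1) % BATCH == 0) {
            /* read one "+OK\r\n" (5 bytes) per pipelined SET before sending
             * more, so at most BATCH requests are ever queued in the proxy */
            size_t want = (size_t)BATCH * 5, got = 0;
            while (got < want) {
                ssize_t n = read(sd, buf, sizeof(buf));
                if (n <= 0) {
                    perror("read");
                    close(sd);
                    return 1;
                }
                got += (size_t)n;
            }
        }
    }
    close(sd);
    return 0;
}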

@idning
Contributor

idning commented Mar 6, 2014

@liu21yd I think your nutcracker is running at 100% of one CPU core;
requests and responses will back up inside nutcracker and consume memory.

I think you need more nutcracker instances.

@hniychenwei

@idning thanks for your response, but why does it need more memory when the CPU is busy? To store the pipelined commands?
And what CPU percentage is acceptable for each nutcracker instance in practice?

@manjuraj
Collaborator

manjuraj commented Mar 7, 2014

apply this patch and see if it helps (warning, I haven't tested it)

diff --git a/src/nc_mbuf.c b/src/nc_mbuf.c
index 50aa59a..42c28d6 100644
--- a/src/nc_mbuf.c
+++ b/src/nc_mbuf.c
@@ -118,13 +118,7 @@ mbuf_free(struct mbuf *mbuf)
 void
 mbuf_put(struct mbuf *mbuf)
 {
-    log_debug(LOG_VVERB, "put mbuf %p len %d", mbuf, mbuf->last - mbuf->pos);
-
-    ASSERT(STAILQ_NEXT(mbuf, next) == NULL);
-    ASSERT(mbuf->magic == MBUF_MAGIC);
-
-    nfree_mbufq++;
-    STAILQ_INSERT_HEAD(&free_mbufq, mbuf, next);
+    mbuf_free(mbuf);
 }

@manjuraj
Collaborator

manjuraj commented Mar 7, 2014

on cpu consumption, see this - #158 (comment)

@charsyam
Contributor

charsyam commented Mar 7, 2014

@manjuraj I think this patch might hurt twemproxy's performance in other cases
(because twemproxy normally recycles memory).

@manjuraj
Collaborator

manjuraj commented Mar 7, 2014

@charsyam the patch stops twemproxy from using a free queue for mbufs; it uses the operating system's malloc and free to manage mbuf memory instead.

@charsyam
Contributor

charsyam commented Mar 7, 2014

@manjuraj Yes, you're right, but I think one of twemproxy's strengths is managing mbufs itself, like a memory pool.

@manjuraj
Collaborator

manjuraj commented Mar 7, 2014

The problem @liu21yd is encountering is that they subject twemproxy to a burst of pipelined requests. This leads to an increased allocation of mbufs, and after the burst dies down, the previously allocated mbufs stay in the free pool and are not returned to the OS. As a result they see resident-memory values of 10G (or something like that).

The way to ensure that doesn't happen is to a) disable the mbuf freeq in twemproxy, or b) provide a threshold parameter so that twemproxy only manages mbuf memory up to a certain limit (like 1G), with any allocations above that managed by the OS. Option (b) gets you the best of both worlds.
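A rough, untested sketch of what (b) could look like in nc_mbuf.c. MBUF_MAX_FREEQ is a hypothetical compile-time cap that is not in the current code (it could just as well come from the conf file); everything else is the existing mbuf_put with one extra check:

/* hypothetical cap on cached mbufs, e.g. 1G / mbuf_chunk_size; with -m 512
 * that is 2M buffers. Anything beyond the cap goes straight back to the OS. */
#define MBUF_MAX_FREEQ  (2 * 1024 * 1024)

void
mbuf_put(struct mbuf *mbuf)
{
    log_debug(LOG_VVERB, "put mbuf %p len %d", mbuf, mbuf->last - mbuf->pos);

    ASSERT(STAILQ_NEXT(mbuf, next) == NULL);
    ASSERT(mbuf->magic == MBUF_MAGIC);

    if (nfree_mbufq >= MBUF_MAX_FREEQ) {
        /* the free pool is already at its limit; release this buffer */
        mbuf_free(mbuf);
        return;
    }

    nfree_mbufq++;
    STAILQ_INSERT_HEAD(&free_mbufq, mbuf, next);
}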

@charsyam
Contributor

charsyam commented Mar 7, 2014

@manjuraj I also think (b) is best. How about making the limit a configurable value in the conf file?

@manjuraj
Collaborator

manjuraj commented Mar 7, 2014

Yeah, I think (b) is a good middle ground.
Let's wait for @liu21yd to run the applied patch and see what happens.

@liu21yd
Author

liu21yd commented Mar 7, 2014

@manjuraj, @charsyam:

Unfortunately, we applied the patch you suggested and ran our tests again; it didn't help.

void
mbuf_put(struct mbuf *mbuf)
{
//    log_debug(LOG_VVERB, "put mbuf %p len %d", mbuf, mbuf->last - mbuf->pos);
//
//    ASSERT(STAILQ_NEXT(mbuf, next) == NULL);
//    ASSERT(mbuf->magic == MBUF_MAGIC);
//
//    nfree_mbufq++;
//    STAILQ_INSERT_HEAD(&free_mbufq, mbuf, next);
    mbuf_free(mbuf);
}

In addition, on top of the above patch, we applied another patch, like this:

void
msg_put(struct msg *msg)
{
    log_debug(LOG_VVERB, "put msg %p id %"PRIu64"", msg, msg->id);
    while (!STAILQ_EMPTY(&msg->mhdr)) {
        struct mbuf *mbuf = STAILQ_FIRST(&msg->mhdr);
        mbuf_remove(&msg->mhdr, mbuf);
        mbuf_put(mbuf);
    }
//    nfree_msgq++;
//    TAILQ_INSERT_HEAD(&free_msgq, msg, m_tqe);
    msg_free(msg);
}

It also didn't work.

However, this time we also checked and recorded twemproxy's stats during the test; maybe they can help you analyze and fix this issue and give us some advice.

Fri Mar 7 19:05:51 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":60, "timestamp":1394190351, "writer": {"client_eof":0, "client_err":0, "client_connections":0, "server_ejects":0, "forward_error":0, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":0, "server_ejected_at":0, "requests":0, "request_bytes":0, "responses":0, "response_bytes":0, "in_queue":0, "in_queue_bytes":0, "out_queue":0, "out_queue_bytes":0},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":0, "server_ejected_at":0, "requests":0, "request_bytes":0, "responses":0, "response_bytes":0, "in_queue":0, "in_queue_bytes":0, "out_queue":0, "out_queue_bytes":0}}}
Fri Mar 7 19:05:54 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":63, "timestamp":1394190354, "writer": {"client_eof":0, "client_err":0, "client_connections":240, "server_ejects":0, "forward_error":0, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":290165, "request_bytes":26812704, "responses":275245, "response_bytes":1133134, "in_queue":16, "in_queue_bytes":2736, "out_queue":14904, "out_queue_bytes":1389370},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":236117, "request_bytes":22729182, "responses":235997, "response_bytes":975032, "in_queue":31, "in_queue_bytes":5189, "out_queue":89, "out_queue_bytes":9454}}}
Fri Mar 7 19:06:30 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":99, "timestamp":1394190390, "writer": {"client_eof":0, "client_err":0, "client_connections":400, "server_ejects":0, "forward_error":0, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":13205991, "request_bytes":1211720915, "responses":3418198, "response_bytes":14063533, "in_queue":9758660, "in_queue_bytes":895414027, "out_queue":29133, "out_queue_bytes":2270239},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":6862631, "request_bytes":751595146, "responses":3256944, "response_bytes":13656769, "in_queue":3580334, "in_queue_bytes":394467044, "out_queue":25353, "out_queue_bytes":2776957}}}
Fri Mar 7 19:06:33 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":102, "timestamp":1394190393, "writer": {"client_eof":0, "client_err":0, "client_connections":400, "server_ejects":1, "forward_error":106460, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":1, "server_timedout":0, "server_connections":0, "server_ejected_at":1394190392234820, "requests":13209106, "request_bytes":1211992601, "responses":3418198, "response_bytes":14063533, "in_queue":0, "in_queue_bytes":0, "out_queue":0, "out_queue_bytes":0},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":7463043, "request_bytes":810447031, "responses":3421994, "response_bytes":14351538, "in_queue":4015739, "in_queue_bytes":434603174, "out_queue":25310, "out_queue_bytes":2988179}}}
Fri Mar 7 19:08:38 CST 2014
{"service":"nutcracker", "source":"192.168.1.110", "version":"0.3.0", "uptime":227, "timestamp":1394190518, "writer": {"client_eof":0, "client_err":0, "client_connections":400, "server_ejects":53, "forward_error":126961, "fragments":0, "192.168.1.110:6001": {"server_eof":0, "server_err":53, "server_timedout":0, "server_connections":0, "server_ejected_at":1394190510571647, "requests":14035498, "request_bytes":1287899296, "responses":3418198, "response_bytes":14063533, "in_queue":286718, "in_queue_bytes":26249247, "out_queue":0, "out_queue_bytes":0},"192.168.1.110:6002": {"server_eof":0, "server_err":0, "server_timedout":0, "server_connections":1, "server_ejected_at":0, "requests":21767898, "request_bytes":2215822059, "responses":21031903, "response_bytes":87507329, "in_queue":702210, "in_queue_bytes":71225864, "out_queue":33785, "out_queue_bytes":3300524}}}

@mezzatto
Contributor

mezzatto commented Mar 7, 2014

I ran into this same problem a year ago. Twemproxy used up to 10GB of RAM. At the time I think I was using -m 65535 (I guess, I don't really remember the exact value). I changed to -m 16384 and never had the problem again.

Unfortunately I also don't remember the circumstances that triggered the issue... :(

@manjuraj
Collaborator

manjuraj commented Mar 7, 2014

@liu21yd are you using -m 512? With the new patch, did the resident memory values drop after the test?

Also use the patch that I gave you

@idning
Contributor

idning commented Mar 8, 2014

@liu21yd

We can see that at uptime 99,

your clients had sent 13,205,991 + 6,862,631 requests to one nutcracker instance:

(13205991 + 6862631) / 99 = 202713 requests/s

That is an absolute overload.

The response count is 3,418,198 + 3,256,944, which means you have about 13 million requests sitting in the queue.

The reported in_queue_bytes is:

895414027 + 394467044 = 1.2G bytes

but the memory actually used is about:

13 million * 512 = 6G bytes (each queued request holds at least one 512-byte mbuf, given -m 512).

At uptime 227 the number of messages in the in_queue is still:

(14035498 - 3418198) + (21767898 - 21031903) = 11353295

so your memory usage at that point is:

11353295 * 512 = 5812887040 bytes (about 6G)

So you can see your nutcracker instance consumes 6-8G of memory and uses 100% of a CPU core.

You are overloading nutcracker. Advice:

  1. make sure your request backlog does not grow too large:
    • always read the responses back from nutcracker
    • control your qps
  2. check the timeout config in your nutcracker.conf; these queued messages should have timed out (a config sketch follows below)
  3. add more nutcracker instances (as I said two days ago)
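For point 2, a sketch of what the pool section could look like with a server timeout set. The pool name and server addresses are taken from the stats above; the listen address and all other values are placeholders. timeout is in milliseconds and makes twemproxy fail requests that wait on a server longer than that instead of queueing them indefinitely:

writer:
  listen: 192.168.1.110:24000
  hash: fnv1a_64
  distribution: ketama
  redis: true
  timeout: 400
  server_retry_timeout: 2000
  server_failure_limit: 2
  servers:
   - 192.168.1.110:6001:1
   - 192.168.1.110:6002:1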

Maybe you need a deploy manager like this: https://github.com/idning/redis-mgr

@liu21yd
Author

liu21yd commented Mar 8, 2014

@manjuraj I am using -m 512. The resident memory values didn't drop when I stopped the test, and the same thing happened with the patch you gave.

@liu21yd
Author

liu21yd commented Mar 8, 2014

@idning, thank you so much for your analysis and advice. These may help us optimize our system and fix this issue.

@manjuraj
Collaborator

manjuraj commented Mar 8, 2014

@idning nice analysis. Good job :)

@idning
Contributor

idning commented Mar 9, 2014

@manjuraj

nutcracker always tries to receive all available data on the client side:

rstatus_t
core_core(void *arg, uint32_t events)
{
    /* read takes precedence over write */
    if (events & EVENT_READ) {
        status = core_recv(ctx, conn);      //call conn->recv (msg_recv)
        if (status != NC_OK || conn->done || conn->err) {
            core_close(ctx, conn);
            return NC_ERROR;
        }
    }
    ...
}

If the client writes to the socket, the write always succeeds (something like redis-cli --pipe), so the messages queue up inside nutcracker and eventually time out.

The problem is that the client does not know when to stop sending requests.

I think we can add a config option like max-queue: if nutcracker has too many requests queued, it stops reading from the client side, so the client will block on sending (a sketch follows below).
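A rough sketch of the idea against core_core, untested and not real code: ctx->max_queue and the conn_in_queue_bytes() helper are hypothetical, and event_del_in()/event_add_in() stand for "stop/start watching this fd for reads". The point is just that once a client's pending requests exceed the limit, nutcracker stops reading from that connection and TCP backpressure blocks the sender:

rstatus_t
core_core(void *arg, uint32_t events)
{
    rstatus_t status;
    struct conn *conn = arg;
    struct context *ctx = conn_to_ctx(conn);

    if (events & EVENT_READ) {
        if (conn->client && conn_in_queue_bytes(conn) > ctx->max_queue) {
            /* hypothetical: too much queued for this client; stop reading
             * from it so its sends eventually block, bounding our memory */
            event_del_in(ctx->evb, conn);
        } else {
            status = core_recv(ctx, conn);
            if (status != NC_OK || conn->done || conn->err) {
                core_close(ctx, conn);
                return NC_ERROR;
            }
        }
    }
    /* ... re-enable reads (event_add_in) once enough responses drain ... */
    ...
}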

@manjuraj
Collaborator

manjuraj commented Mar 9, 2014

@idning you would just be introducing a max-queue config option to solve a benchmark use case, which is not ideal. The right thing to do is to fix the benchmark or work around it with a timeout.

@idning
Contributor

idning commented Apr 1, 2014

I think twemproxy should have some overload protection.

Under attack, or with a client-side bug, twemproxy will use a lot of memory; that is a risk if we deploy redis and twemproxy on the same machine.

We should close the connection when the quota is overrun.

A max-memory limit is better than max-queue.

@crazyneo-xx

Hoping a max_memory option will be provided.
I was using one client connection to send messages continuously with no receives at all (a modified hiredis function), and nc got stuck within minutes. The memory it used reached the system maximum and was never freed even after I closed the connection, and I couldn't open any other connection because of OOM issues.
I was wondering: besides a max_memory option to protect against abusive or overloading clients, should we consider a more graceful approach, such as maintaining a tailq per connection, with all tailqs drawing from one memory pool? Then, once a connection is closed, we could simply free all the chunks linked to its tailq.
It might be a little tricky to decide whether the connection was properly closed or is just in a briefly abnormal state.

@yongman

yongman commented Apr 24, 2019

(Quoting @liu21yd's comment of Mar 7, 2014 above, including the mbuf_put and msg_put patches and the recorded stats.)

I think the reason the memory is not freed after mbuf_free(mbuf) is glibc malloc's brk-based allocation, which leaves the heap fragmented. Calling msg_free in msg_put does result in the memory being freed.
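On the brk/fragmentation point: with glibc, pages freed in the middle of the heap often cannot be returned to the kernel until the top of the heap is also free. If someone wants to experiment, a heavy-handed, glibc-specific workaround (sketch only, not something in twemproxy) is to ask the allocator to trim after a burst of frees:

#include <malloc.h>   /* glibc-specific: malloc_trim() */

/* Sketch: after a burst of mbufs/msgs has been freed, ask glibc to hand
 * unused heap pages back to the kernel. A real patch would call this
 * rarely (e.g. from a periodic timer), since malloc_trim() is not cheap. */
static void
release_heap_to_os(void)
{
    malloc_trim(0);   /* 0 = keep no extra padding at the top of the heap */
}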
