Question about how pool.zig handles overflow condition, and possible source of a memory leak #17
Comments
Running with this logic for a while to see what happens (changes to both pool and listener).
Ah … I see the error of my thinking now :) I was expecting symmetry in the alloc/free when reading the code, but that is not necessarily the case. Interesting.
I'm kind of waiting for a generic pool to land first. There is a multi-threaded unit test that tries to fuzz the pool a little. I've looked the code over and over (I've always had some reservations about it), but it seems fine. I like it because it's simple and compact (a small array). I have considered making it blocking. The pools for my sqlite and duckdb libraries are similar, but blocking (https://github.com/karlseguin/zqlite.zig/blob/63e2e3bc29877ed6a72e819e74145cedf5066e14/zqlite.zig#L624) and thus enforce a maximum. Maybe opt-in. If you …
Excellent, thanks. Yeah, what I did for now was extend my http.zig version to pass a flag into the pool telling it to either grow on demand, or refuse the connection if the pool is full (a rough sketch of the idea is below). I guess the library could introduce a range of flags in the config to allow fine-tuning of explicit behaviour.

Local testing, hitting it hard, shows it refusing connections rather than allocating more req/res pairs when it gets very full. This is perfect for what I have in mind for this lib, and effectively blocks the sort of DoS probes that I am seeing on AWS. Running that up with these options for the next week, expecting to get much closer to a completely flat line with no leaks :)

Also played with my app a little bit: shrunk down some of the audio assets, and hand-tuned the buffer sizes to reduce any wasted space. The whole application + webserver consumes just over 24MB, and doesn't grow. Quite amazing really :)
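Purely as an illustration of that flag idea (the names grow_on_demand and error.PoolExhausted are invented here, not http.zig's config or API), a minimal sketch might look like:

```zig
const std = @import("std");

// Hypothetical pool sketch: items[0..available] are the free, reusable
// entries. When the pool is empty, acquire() either allocates an extra item
// or refuses, depending on a flag. Not http.zig's actual pool.zig.
fn FlaggedPool(comptime T: type) type {
    return struct {
        items: []*T,
        available: usize,
        grow_on_demand: bool,
        allocator: std.mem.Allocator,
        mutex: std.Thread.Mutex = .{},

        const Self = @This();

        pub fn acquire(self: *Self) !*T {
            self.mutex.lock();
            defer self.mutex.unlock();
            if (self.available == 0) {
                if (!self.grow_on_demand) {
                    // Caller can turn this into a refused connection / 503.
                    return error.PoolExhausted;
                }
                // Grow on demand: hand out an extra, untracked item.
                return try self.allocator.create(T);
            }
            self.available -= 1;
            return self.items[self.available];
        }
    };
}
```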
I'm testing the pool_limit branch. Rather than a single
If the timeout is exceeded, the server will reply with a 503. I updated the readme to explain this a little better.

To me, the important point is: it's hard to know what to set these to. For apps that aren't waiting on IO, the best performance will come when min == max == # of cores. But with HTTP keepalive, this can easily result in 503s.

I suspect (and hope) most people are running this behind a reverse proxy like NGINX for TLS termination. So this adds another dimension, since the keepalive can now become long-lived. I believe it's better to manage as much of this in NGINX / the reverse proxy as possible. Timeouts, keepalive counts, and limits can all be better tuned in NGINX. When async lands, I'm willing to look at all those things in more detail.

I'll merge into master later this week if it seems to be working ok.
Cool, that sounds well thought out. I will give it a run.

I'm starting to think that what I'm seeing as a slow "memory leak" may actually be memory fragmentation using the GeneralPurposeAllocator + arena. I have tuned my app so that there are no explicit allocations, and it rejects new connections when the pool is full. It climbs a small amount each day ... and dips every now and then as memory is released. The overall trend is to grow at a slow but steady rate.

Measuring max_rss before and after each handleConnection, and logging any growth (yeah, I know, the overhead of instrumenting an app can also be a cause of memory consumption!), what I'm seeing is that every now and then it will grab 12kb or 16kb during a burst of garbage requests that all terminate with a 404. I don't know yet whether those garbage requests contain large payloads ... not measuring that yet. We are only talking about 1-2 MB a day though. Not sure, it's hard to tell exactly.

What I might do about them is auto-ban IP addresses that request a lot of garbage URLs ... just for fun. What's the best way, do you think, to compute the remote IP address of the client?
I merged the branch. I also added a way to get the connection's remote address.

Though, if you have a reverse proxy in front of http.zig, this will only give you the reverse proxy's address, since the TCP connection is between that proxy and the http.zig listener. Normally this is solved by having the reverse proxy forward the client IP in a header, like X-Forwarded-For.
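For the header route, a small std-only sketch of pulling the client address out of an X-Forwarded-For value (the function name is made up for this example; the header is only trustworthy when every proxy in the chain is one you control):

```zig
const std = @import("std");

// X-Forwarded-For can hold a comma-separated list of addresses; the left-most
// entry is the original client, later entries are proxies along the way.
fn clientIpFromForwardedFor(value: []const u8) []const u8 {
    const first = if (std.mem.indexOfScalar(u8, value, ',')) |i| value[0..i] else value;
    return std.mem.trim(u8, first, " ");
}

test "picks the left-most address" {
    try std.testing.expectEqualStrings("203.0.113.7", clientIpFromForwardedFor("203.0.113.7, 10.0.0.1"));
}
```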
Also, with the NGINX front-ending thing, that's not really an option for me in my intended use case for the library. I will be using it as an embedded webserver inside a GUI app for a multi-player system that runs on the user's machine, and serves local players via a web client on the local network. I can live without TLS termination for a while, I think. It would be a nice-to-have to add TLS termination into the http library, though that is non-trivial to achieve from what I have seen.
ziglang/zig#14171 is the issue to watch for. Once that arrives, should be able to add TLS. |
What happens when the system is very busy, and the reqResPool in listener.zig gets full?
Looks like it allocates a new RequestResponsePair in acquire(), but doesn't add that to the pool's slice of re-usable items.

When release() gets called on the newly allocated RequestResponsePair, it uses some logic to say that if the pool is fully available / unused (available == items.len) then deinit() this resource.

Don't know, but maybe that logic should be "if (available == 0)"?
I think what it is trying to do there is say "if this item looks like it was newly allocated, because there were no free slots during the call to acquire(), then assume that case and deinit this item".
I think this would become a problem as soon as you have > 1 item acquired after overflow during a burst, and then have to work out what to do when releasing them. Whilst the release() code can detect that the pool has overflowed, it has no way of knowing how many new resources were spawned, and therefore no safe way of knowing how many to deinit().
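To make the discussion concrete, here is a minimal sketch of the pattern described above, assuming a pool that keeps its free entries in items[0..available]; this is a simplification for illustration, not the library's actual pool.zig:

```zig
const std = @import("std");

// Simplified sketch of the acquire/release pattern under discussion.
// items[0..available] are the free, reusable entries.
fn PoolSketch(comptime T: type) type {
    return struct {
        items: []*T,
        available: usize,
        allocator: std.mem.Allocator,
        mutex: std.Thread.Mutex = .{},

        const Self = @This();

        pub fn acquire(self: *Self) !*T {
            self.mutex.lock();
            defer self.mutex.unlock();
            if (self.available == 0) {
                // Pool is empty: hand out a freshly allocated, untracked item.
                return try self.allocator.create(T);
            }
            self.available -= 1;
            return self.items[self.available];
        }

        pub fn release(self: *Self, item: *T) void {
            self.mutex.lock();
            defer self.mutex.unlock();
            if (self.available == self.items.len) {
                // Every pooled slot is already free, so treat this item as one
                // of the overflow allocations and destroy it rather than
                // putting it back.
                self.allocator.destroy(item);
                return;
            }
            self.items[self.available] = item;
            self.available += 1;
        }
    };
}
```

Because the pooled items are interchangeable, it doesn't matter whether the exact item being released was the overflow one; what matters is whether the number of extra destroys ends up matching the number of extra allocations over a burst, which is exactly the question this issue raises.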
Not sure ... it's a bit mind-bending :)
Maybe, if that is the intention, then doing this might be cleaner:
That would handle the case where there is a large burst of requests during a pool overflow, and safely clean them all up.
If you try this, then you can see that it's definitely leaking acquired resources too:
Create an endpoint that does a short delay before returning, i.e. sleep(1) then return.
Set the pool size to 100, then hit it with 100 parallel requests. Observe that there are no leaks.
Set the pool size to 10, then hit it with 100 parallel requests. Observe that the 90 newly allocated RequestResponsePair resources are never freed.