
serious memory issues since mirage-3.7 update #93

Closed
xaki23 opened this issue Jan 30, 2020 · 10 comments · Fixed by #95

@xaki23
Contributor

xaki23 commented Jan 30, 2020

i am running into some serious memory problems since the update.
afaict some part of the "on new downstream vm" handling tries to allocate 4MB of unfragmented/aligned memory and fails.

before the update, a mirage-fw with 40MB of ram was working "just fine" even with a lot of downstream VMs.
after the update, even a m-fw with a single downstream VM starts failing quickly (after 1-3 vm (re)starts).

on new-vm-start, the failure in the m-fw log looks like this:

2020-01-28 04:24:37 -00:00: INF [client_net] Client 4295 (IP: 10.137.0.26) ready
2020-01-28 04:24:37 -00:00: INF [ethernet] Connected Ethernet interface fe:ff:ff:ff:ff:ff
2020-01-28 04:24:37 -00:00: WRN [client_net] Error with client {domid=4295;device_id=0}: Out of memory

there seems to be some "leakage" too: the 4MB doesn't always get cleaned up when the downstream vm shuts down, but sometimes it gets freed up again after a while.

usecase: "single downstream vm that gets restarted every hour"; i cranked the mem up from 40 to 100 MB to make it more visible:

2020-01-29 04:25:47 -00:00: INF [memory_pressure] Writing meminfo: free 66452 / 98516 kB (67.45 %)
2020-01-29 08:00:01 -00:00: INF [memory_pressure] Writing meminfo: free 62356 / 98516 kB (63.30 %)
2020-01-29 09:25:20 -00:00: INF [memory_pressure] Writing meminfo: free 58260 / 98516 kB (59.14 %)
2020-01-29 10:24:48 -00:00: INF [memory_pressure] Writing meminfo: free 54164 / 98516 kB (54.98 %)
2020-01-29 11:25:45 -00:00: INF [memory_pressure] Writing meminfo: free 74652 / 98516 kB (75.78 %)
2020-01-29 12:25:32 -00:00: INF [memory_pressure] Writing meminfo: free 70548 / 98516 kB (71.61 %)
2020-01-29 13:26:16 -00:00: INF [memory_pressure] Writing meminfo: free 66460 / 98516 kB (67.46 %)
2020-01-29 14:24:48 -00:00: INF [memory_pressure] Writing meminfo: free 62364 / 98516 kB (63.30 %)
2020-01-29 15:25:41 -00:00: INF [memory_pressure] Writing meminfo: free 58264 / 98516 kB (59.14 %)
2020-01-29 16:29:02 -00:00: INF [memory_pressure] Writing meminfo: free 54148 / 98516 kB (54.96 %)
2020-01-29 17:25:12 -00:00: INF [memory_pressure] Writing meminfo: free 45980 / 98516 kB (46.67 %)
2020-01-29 18:26:29 -00:00: INF [memory_pressure] Writing meminfo: free 74632 / 98516 kB (75.76 %)
2020-01-29 19:26:40 -00:00: INF [memory_pressure] Writing meminfo: free 70556 / 98516 kB (71.62 %)
2020-01-29 21:26:51 -00:00: INF [memory_pressure] Writing meminfo: free 66460 / 98516 kB (67.46 %)
2020-01-30 01:26:06 -00:00: INF [memory_pressure] Writing meminfo: free 62364 / 98516 kB (63.30 %)
2020-01-30 08:00:01 -00:00: INF [memory_pressure] Writing meminfo: free 58260 / 98516 kB (59.14 %)
2020-01-30 09:23:58 -00:00: INF [memory_pressure] Writing meminfo: free 54164 / 98516 kB (54.98 %)
2020-01-30 12:27:33 -00:00: INF [memory_pressure] Writing meminfo: free 45980 / 98516 kB (46.67 %)
2020-01-30 13:24:25 -00:00: WRN [client_net] Error with client {domid=4376;device_id=0}: Out of memory
2020-01-30 14:25:01 -00:00: WRN [client_net] Error with client {domid=4377;device_id=0}: Out of memory
2020-01-30 16:23:24 -00:00: INF [memory_pressure] Writing meminfo: free 85900 / 98516 kB (87.19 %)
2020-01-30 17:25:14 -00:00: INF [memory_pressure] Writing meminfo: free 75140 / 98516 kB (76.27 %)
2020-01-30 18:24:16 -00:00: INF [memory_pressure] Writing meminfo: free 70868 / 98516 kB (71.94 %)
2020-01-30 19:27:08 -00:00: INF [memory_pressure] Writing meminfo: free 66764 / 98516 kB (67.77 %)

note:

  • how the free mem gets lower in about 4MB chunks.
  • how after some hours sometimes it bounces back up.

in another usecase (more client vms, fewer restarts) i set memory to 200MB, and it failed after about a day while still reporting more than 65MB free.

@xaki23
Contributor Author

xaki23 commented Jan 30, 2020

... and a ping to @hannesm and @talex5

@xaki23
Contributor Author

xaki23 commented Jan 30, 2020

(some talk with @hannesm on irc later)

the build is currently ocaml 4.08.1 with netchannel and mirage-net-xen both pinned to --dev

i added a call to Gc.compact() in front of the mempressure reporting code, built+restarted, let's see if that makes any difference.
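The workaround amounts to something like the following stdlib-only sketch (not the actual firewall code; `report_meminfo` is a hypothetical stand-in for the real memory_pressure reporting function):

```ocaml
(* Sketch of the workaround, NOT the actual firewall code:
   force a full major GC plus heap compaction before reporting,
   so fragmented free space is coalesced and returned.
   [report_meminfo] is a hypothetical stand-in for the real
   memory_pressure reporting function. *)
let report_with_compaction report_meminfo =
  Gc.compact ();
  report_meminfo ()
```

The point of compacting right before the report is that it also defragments the heap, which matters when a later allocation needs a large contiguous block.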

@xaki23
Contributor Author

xaki23 commented Jan 31, 2020

so, the "forced Gc.compact()" version has been running for 20h now, and it seems to make a difference:

MirageOS booting...
2020-01-30 23:08:23 -00:00: INF [memory_pressure] Writing meminfo: free 85900 / 98516 kB (87.19 %)
2020-01-31 00:02:28 -00:00: INF [memory_pressure] Writing meminfo: free 68932 / 98516 kB (69.97 %)
2020-01-31 01:09:16 -00:00: INF [memory_pressure] Writing meminfo: free 72972 / 98516 kB (74.07 %)
2020-01-31 02:27:58 -00:00: INF [memory_pressure] Writing meminfo: free 72900 / 98516 kB (74.00 %)
2020-01-31 03:27:02 -00:00: INF [memory_pressure] Writing meminfo: free 72908 / 98516 kB (74.01 %)
2020-01-31 04:26:10 -00:00: INF [memory_pressure] Writing meminfo: free 72900 / 98516 kB (74.00 %)
2020-01-31 05:28:58 -00:00: INF [memory_pressure] Writing meminfo: free 72908 / 98516 kB (74.01 %)
2020-01-31 06:25:31 -00:00: INF [memory_pressure] Writing meminfo: free 72908 / 98516 kB (74.01 %)
2020-01-31 07:26:16 -00:00: INF [memory_pressure] Writing meminfo: free 72900 / 98516 kB (74.00 %)
2020-01-31 08:26:12 -00:00: INF [memory_pressure] Writing meminfo: free 72908 / 98516 kB (74.01 %)
2020-01-31 10:12:04 -00:00: INF [memory_pressure] Writing meminfo: free 64716 / 98516 kB (65.69 %)
2020-01-31 11:27:28 -00:00: INF [memory_pressure] Writing meminfo: free 72860 / 98516 kB (73.96 %)
2020-01-31 12:27:09 -00:00: INF [memory_pressure] Writing meminfo: free 72856 / 98516 kB (73.95 %)
2020-01-31 13:26:15 -00:00: INF [memory_pressure] Writing meminfo: free 72868 / 98516 kB (73.97 %)
2020-01-31 14:26:28 -00:00: INF [memory_pressure] Writing meminfo: free 72860 / 98516 kB (73.96 %)
2020-01-31 15:26:30 -00:00: INF [memory_pressure] Writing meminfo: free 72860 / 98516 kB (73.96 %)
2020-01-31 16:24:36 -00:00: INF [memory_pressure] Writing meminfo: free 72868 / 98516 kB (73.97 %)
2020-01-31 17:26:18 -00:00: INF [memory_pressure] Writing meminfo: free 72860 / 98516 kB (73.96 %)
2020-01-31 18:26:39 -00:00: INF [memory_pressure] Writing meminfo: free 72860 / 98516 kB (73.96 %)

the dips to 65 and 69MB seem to have been at times when i was manually messing around with the downstream VM.
besides that, it seems to be hovering pretty solidly at 73MB.

will deploy in a "many clients" role next.

@talex5
Collaborator

talex5 commented Feb 1, 2020

By default, OCaml increases its heap size in 15% increments: https://github.com/ocaml/ocaml/blob/5ad64306d36755b600f2556805effb73627508c8/runtime/caml/config.h#L216. That's about 4 MB for your heap size.
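As a quick sanity check, the growth setting can be read back from the runtime itself (a stdlib-only sketch, independent of the firewall code). Since OCaml 4.03, a `major_heap_increment` below 1000 is interpreted as a percentage of the current heap; the default is 15:

```ocaml
(* Stdlib-only sketch: inspect the runtime's heap-growth setting.
   A major_heap_increment below 1000 is a percentage of the current
   heap size per expansion; 1000 or above is a fixed number of words. *)
let () =
  let cp = Gc.get () in
  let inc = cp.Gc.major_heap_increment in
  if inc < 1000 then
    Printf.printf "heap grows by %d%% per expansion\n" inc
  else
    Printf.printf "heap grows by %d words per expansion\n" inc
```

With a roughly 27 MB live heap, a 15% expansion step is about 4 MB, matching the chunk size in the logs above.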

The main thing that changed recently was the addition of the IP fragments cache:

let fragment_cache = Fragments.Cache.create (256 * 1024) in

256 * 1024 entries per client does seem like a lot. I guess it's a few MB of extra memory per client? What happens if you reduce that number?

@hannesm
Member

hannesm commented Feb 1, 2020

256 * 1024 entries per client does seem like a lot

the 256 * 1024 is the capacity in bytes (see the weight function at https://github.com/mirage/mirage-tcpip/blob/v4.0.0/src/ipv4/fragments.ml#L55) of the fragmentation cache. this is a maximum of 256kB per client, i.e. if your clients aren't doing fragmentation, the overhead will be rather small.
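The intended semantics can be illustrated with a stdlib-only sketch (this is NOT the real Lru/Fragments API): the capacity is a byte budget, each entry's weight is its payload size, and inserting trims the oldest entries until the summed weight fits again:

```ocaml
(* Stdlib-only sketch of a weight-bounded cache, illustrating the
   capacity-in-bytes idea; this is NOT the real Lru.F / Fragments API.
   Entries are kept newest-first; inserting trims from the oldest end
   until the summed weight fits the byte budget again. *)
type entry = { key : string; payload : string }

type cache = {
  capacity : int;               (* byte budget, e.g. 256 * 1024 *)
  mutable total : int;          (* summed weight of live entries *)
  mutable entries : entry list; (* newest first *)
}

let weight e = String.length e.payload

let create capacity = { capacity; total = 0; entries = [] }

(* drop the last (oldest) element of a list *)
let rec drop_last = function
  | [] | [ _ ] -> []
  | x :: tl -> x :: drop_last tl

let rec last = function
  | [] -> invalid_arg "last"
  | [ x ] -> x
  | _ :: tl -> last tl

let add cache e =
  cache.entries <- e :: cache.entries;
  cache.total <- cache.total + weight e;
  (* evict oldest entries until we fit, but always keep the newest *)
  while cache.total > cache.capacity && List.length cache.entries > 1 do
    let oldest = last cache.entries in
    cache.entries <- drop_last cache.entries;
    cache.total <- cache.total - weight oldest
  done
```

Under these semantics an idle client costs almost nothing; the debate below is about whether Lru.M actually behaves this way on create.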

I guess it's a few MB of extra memory per client?

I'm not sure how 256 * 1024 will be "a few MB of extra memory per client", sorry.

to me, it rather looks like previously the io-page allocator https://github.com/mirage/io-page/blob/v2.3.0/lib/io_page.ml#L36-L45 got called from more call sites (which includes the hack to call Gc.compact if allocation raises out of memory). This is consistent with #93 (comment), where adding a call to Gc.compact () to every memory printout avoids the out-of-memory issue.

@talex5
Collaborator

talex5 commented Feb 1, 2020

Hmm. Here's a test program:

#require "tcpip.ipv4";;
#require "fmt";;

let show_heap () =
  let stat = Gc.stat () in
  Fmt.pr "Heap size = %a@." Fmt.byte_size (stat.Gc.live_words * 8)

let n = 1024

let () =
  show_heap ();
  let x = List.init n (fun _ -> Fragments.Cache.create (256*1024)) in
  Gc.full_major ();
  show_heap ();
  ignore (Sys.opaque_identity x)

It creates 1024 caches. It prints:

$ utop test.ml
Heap size = 14MB
Heap size = 2.16GB

That's about 2MB per cache by my calculation, which is roughly what I'd expect, since Fragments.Cache.create uses Lru.M to create a hashtable of that size.

I seem to recall there are some problems with how OCaml calculates when it needs to do a GC when lots of external mallocs are being used. This is why the firewall sometimes does a GC manually, which shouldn't really be necessary. Large hashtables cause problems because the firewall can run out of memory suddenly (either on creation or on resize). We used to have this problem with mirage-nat before it switched to a different representation, IIRC.
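The eager allocation can be reproduced with the stdlib alone, without pulling in the lru library (a sketch along the lines of the test program above): `Hashtbl.create n` allocates its bucket array up front, so an empty table created with n = 256 * 1024 already costs on the order of n words:

```ocaml
(* Stdlib-only sketch: measure what an *empty*
   Hashtbl.create (256 * 1024) costs. The bucket array is allocated
   eagerly, so the table is roughly 2^18 words (~2 MB on 64-bit)
   before a single binding has been added. *)
let live_words () =
  Gc.full_major ();
  (Gc.stat ()).Gc.live_words

let () =
  let before = live_words () in
  let t : (int, int) Hashtbl.t = Hashtbl.create (256 * 1024) in
  let after = live_words () in
  Printf.printf "empty table cost: ~%d kB\n" ((after - before) * 8 / 1024);
  ignore (Sys.opaque_identity t)
```

This matches the ~2 MB-per-cache figure from the 1024-cache measurement above, and it is independent of whether any bindings are ever inserted.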

@hannesm
Member

hannesm commented Feb 1, 2020

@talex5 thanks for your example code.

that's quite surprising behaviour of Lru.M to me -- I had the intuition of Lru.F in mind (which I originally used to implement the Fragments cache), which creates an empty map in create and uses the provided capacity only as an indicator of when to drop elements.

with the given behaviour that a hashtbl of that size is allocated, I can only repeat from #89:

Fragmentation and reassembly: each network interface now has a reassembly cache, a LRU, its size is hardcoded to 256 * 1024 bytes. I am not sure whether this should instead (a) be user-configurable (b) default to a smaller size. Maybe the individual client networks (Client_net) should have a smaller cache, and the Uplink a slightly larger?

(or, alternatively adjust/PR Lru.M to allocate a small hashtbl on create, and use cap as capacity)

@hannesm
Member

hannesm commented Feb 1, 2020

or -- given that rather unfriendly Hashtbl memory behaviour (on grow/resize or create), revert the Fragments implementation to use a Lru.F (this is my current favourite).

@talex5
Collaborator

talex5 commented Feb 5, 2020

Yeah, getting rid of the hash table sounds like the best solution to me!

hannesm added a commit to hannesm/opam-repository that referenced this issue Feb 8, 2020
CHANGES:

* Revert "Ipv4.Fragments use a Lru.M.t instead of Lru.F.t" (mirage/mirage-tcpip#423 by @hannesm)
  A Lru.M.t allocates a Hashtbl.t of size = capacity (= 256 * 1024 in our case),
  this leads to excessive ~2MB memory consumption for each Fragment cache,
  reported by @xaki23 in mirage/qubes-mirage-firewall#93
* use SOCK_RAW for an ICMP socket in the unix sockets API (previously used
  SOCK_DGRAM which did not work)
  reported by @justinc1 in mirage/mirage-tcpip#358, fixed in mirage/mirage-tcpip#424 by @hannesm
* tcp is now compatible with lwt >= 5.0.0 (where Lwt.async requires a function
  of (unit -> unit Lwt.t) (mirage/mirage-tcpip#370 mirage/mirage-tcpip#425 @cfcs @hannesm, issue mirage/mirage-tcpip#392 @emillon)
* Add a dependency on dune-configurator to support dune 2.0.0 (mirage/mirage-tcpip#421 @avsm)
@xaki23
Contributor Author

xaki23 commented Feb 8, 2020

built with the two pins above and the matching q-m-fw branch.
build works,
basic functionality works,
memory situation looks much better (even without the "forced Gc.compact() on every mem-stats print")

so +1 from me for the whole set.
even if there are any non-obvious gremlins lurking, it works better than the builds from the last 2-3 weeks.

mgree pushed a commit to mgree/opam-repository that referenced this issue Feb 19, 2020