Memory measurements - weird behaviour #205
Comments
I haven't had time to look at this yet, I'm sorry. But something that might be affecting the result is the size of the function objects themselves. I cannot explain how the
@michalmuskala 👋 Thanks, and no worries or hurry. It's awesome and amazing any time you manage to make time and help us by looking at stuff, and if you don't get to it that's also cool. You already do a ton of things, and in the end we're all human and do this in our free time. We do what we can, and that's already 🌟 🌟 🌟
Changed milestone; we didn't get to it in time, and from the basics it seems like there might be some optimizations going on underneath. We should still look into it, but I also wanna get memory measurements out there so people can test them and see if they get any errors or problems. So far this weirdness only seems to happen for small/static data. It's unlikely to affect real-world examples and might also be correct. Also, we're still 0.x and I shouldn't cling to my perfectionism as much 😎
I tried looking more into this and I have no idea what is happening. In general, a lot of the overhead we're seeing there is the measurement itself, the calls to
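As an aside on how much the measurement itself can contribute: the two `Process.info/2` reads bracket the measured function, and all the surrounding bookkeeping runs on the measured process's heap. Here's a standalone sketch (my own toy code, not benchee's) that reads the same `:garbage_collection_info` fields around a no-op:

```elixir
# Toy sketch: read :garbage_collection_info before and after calling a
# function that allocates nothing of its own. Sizes are in machine words,
# so multiply by the word size to get bytes.
{:garbage_collection_info, heap_before} = Process.info(self(), :garbage_collection_info)

noop = fn -> :ok end
noop.()

{:garbage_collection_info, heap_after} = Process.info(self(), :garbage_collection_info)

word_size = :erlang.system_info(:wordsize)

# Same summing as benchee's total_memory/1: young generation plus old heap.
total = fn info ->
  Keyword.fetch!(info, :heap_size) + Keyword.fetch!(info, :old_heap_size)
end

delta_bytes = (total.(heap_after) - total.(heap_before)) * word_size
IO.inspect(delta_bytes, label: "delta around a no-op (bytes)")
```

Even though the function itself allocates nothing, the delta here need not be zero, which is consistent with the measurement-overhead observation above.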
I have a feeling maybe this was resolved with #216? Maybe you should look at this one more time now that that's hit master. |
Sadly not; for reasons unknown, the 10 case, for instance, is still weird with less memory etc.:
So I've confirmed that none of these functions trigger even a minor GC (as most of them allocate no data, this was expected). This means all memory usage for these functions comes from
All the 616B cases just optimize to a literal (I decompiled and checked them) - it's still weird to me that
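For anyone wanting to reproduce the "decompiled and checked" step: a static list like `[1, 2, 3]` ends up in the BEAM literal pool, which `:beam_disasm` makes visible. This is an illustrative sketch with made-up names (`LiteralCheck` and its functions are mine, not from the benchmark):

```elixir
# Compile a throwaway module and disassemble its BEAM bytecode. The body of
# static_literal/0 loads a literal from the constant pool instead of
# allocating, while runtime_list/0 has to call out and build its result.
source = """
defmodule LiteralCheck do
  def static_literal, do: [1, 2, 3]
  def runtime_list, do: Enum.to_list(1..3)
end
"""

[{module, beam}] = Code.compile_string(source)

# :beam_disasm.file/1 accepts a filename or a binary BEAM module.
{:beam_file, ^module, _exports, _attrs, _compile_info, functions} =
  :beam_disasm.file(beam)

# Each entry is {:function, name, arity, entry_label, instructions}.
for {:function, name, _arity, _entry, instructions} <- functions do
  IO.inspect({name, length(instructions)})
end
```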
Ok, I have a new weird thing that's going on. I was looking at the "before" and "after" measurements that we're taking with

What jumped out to me was that not only was an

Is it possible we're missing GC events somehow? Here are the modifications I made to check out what's happening:

```elixir
defmodule Benchee.Benchmark.Measure.Memory do
  @moduledoc false

  # Measure memory consumption of a function.
  #
  # Returns `{nil, return_value}` in case the memory measurement went bad.

  @behaviour Benchee.Benchmark.Measure

  def measure(fun) do
    ref = make_ref()
    Process.flag(:trap_exit, true)
    start_runner(fun, ref)

    receive do
      {^ref, memory_usage_info} -> return_memory(memory_usage_info)
      :shutdown -> nil
    end
  end

  defp start_runner(fun, ref) do
    parent = self()

    spawn_link(fn ->
      printer = start_printer()
      tracer = start_tracer(self(), printer)

      try do
        memory_usage_info = measure_memory(fun, tracer, printer)
        send(parent, {ref, memory_usage_info})
      catch
        kind, reason -> graceful_exit(kind, reason, tracer, parent)
      after
        send(tracer, :done)
      end
    end)
  end

  defp start_printer() do
    spawn(fn -> printer_loop() end)
  end

  defp printer_loop() do
    receive do
      info ->
        IO.inspect(info)
        printer_loop()
    end
  end

  defp return_memory({memory_usage, result}) when memory_usage < 0, do: {nil, result}
  defp return_memory({memory_usage, result}), do: {memory_usage, result}

  defp measure_memory(fun, tracer, printer) do
    word_size = :erlang.system_info(:wordsize)
    {:garbage_collection_info, heap_before} = Process.info(self(), :garbage_collection_info)
    send(printer, heap_before)
    result = fun.()
    {:garbage_collection_info, heap_after} = Process.info(self(), :garbage_collection_info)
    send(printer, heap_after)
    mem_collected = get_collected_memory(tracer)

    memory_used =
      (total_memory(heap_after) - total_memory(heap_before) + mem_collected) * word_size

    {memory_used, result}
  end

  @spec graceful_exit(Exception.kind(), any(), pid(), pid()) :: no_return
  defp graceful_exit(kind, reason, tracer, parent) do
    send(tracer, :done)
    send(parent, :shutdown)
    stacktrace = System.stacktrace()
    IO.puts(Exception.format(kind, reason, stacktrace))
    exit(:normal)
  end

  defp get_collected_memory(tracer) do
    ref = Process.monitor(tracer)
    send(tracer, {:get_collected_memory, self(), ref})

    receive do
      {:DOWN, ^ref, _, _, _} -> nil
      {^ref, collected} -> collected
    end
  end

  defp start_tracer(pid, printer) do
    spawn(fn ->
      :erlang.trace(pid, true, [:garbage_collection, tracer: self()])
      tracer_loop(pid, 0, printer)
    end)
  end

  defp tracer_loop(pid, acc, printer) do
    receive do
      {:get_collected_memory, reply_to, ref} ->
        send(reply_to, {ref, acc})

      {:trace, ^pid, :gc_minor_start, info} ->
        send(printer, {:gc_minor_start, info})
        listen_gc_end(pid, :gc_minor_end, acc, total_memory(info), printer)

      {:trace, ^pid, :gc_major_start, info} ->
        send(printer, {:gc_major_start, info})
        listen_gc_end(pid, :gc_major_end, acc, total_memory(info), printer)

      :done ->
        exit(:normal)

      other ->
        send(printer, other)
        tracer_loop(pid, acc, printer)
    end
  end

  defp listen_gc_end(pid, tag, acc, mem_before, printer) do
    receive do
      {:trace, ^pid, ^tag, info} ->
        send(printer, {tag, info})
        mem_after = total_memory(info)
        tracer_loop(pid, acc + mem_before - mem_after, printer)

      other ->
        send(printer, other)
        tracer_loop(pid, acc, printer)
    end
  end

  defp total_memory(info) do
    # `:heap_size` seems to only contain the memory size of the youngest
    # generation, `:old_heap_size` has the old generation. There is also
    # `:recent_size` but that seems to already be accounted for.
    Keyword.fetch!(info, :heap_size) + Keyword.fetch!(info, :old_heap_size)
  end
end
```
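As a sanity check on the tracer side, here's a minimal standalone sketch (my own toy code, not part of the module above) that enables `:garbage_collection` tracing on a process and forces a collection to see an event arrive:

```elixir
# Spawn a process to observe, trace its GC activity (events are delivered
# to self() by default), then force a collection so an event fires.
target = spawn(fn -> receive do: (:stop -> :ok) end)

:erlang.trace(target, true, [:garbage_collection])

# An explicit garbage_collect/1 forces a fullsweep (major) collection.
:erlang.garbage_collect(target)

got =
  receive do
    {:trace, ^target, type, _info} when type in [:gc_major_start, :gc_minor_start] -> type
  after
    1_000 -> :none
  end

IO.inspect(got, label: "first GC trace event")
send(target, :stop)
```

The `info` keyword list on each event carries the same `:heap_size` and `:old_heap_size` fields (in words) that `total_memory/1` above sums up.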
Can't we get a count of how many GCs happened? Other than that... you are right... are we sure that the old heap only comes with a full GC? I think so, just asking... but I think yes, only during a GC are things copied into the old_heap... Unsure. Code looks ok to me... can't we just So yeah, not sure :) Thanks for all the work!
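On the "count of how many GCs happened" question, one hedged possibility: `Process.info/2` with `:garbage_collection` exposes a `:minor_gcs` counter. To my understanding it resets on a major collection (it feeds the `fullsweep_after` logic), so it isn't a complete history, but it can at least confirm whether any collection ran:

```elixir
# Read the per-process GC settings/counters; :minor_gcs counts minor
# collections since the last major one.
{:garbage_collection, gc_info} = Process.info(self(), :garbage_collection)
minor_gcs = Keyword.fetch!(gc_info, :minor_gcs)
IO.inspect(minor_gcs, label: "minor GCs so far")
```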
You're right - it's not that stuff is only moved to the old heap during a major GC, it's that the old heap is only collected during a major GC. But, yeah, the only way (to my knowledge) for something to get moved to the old heap is through a GC run. I didn't want to do any IO in the runner process because it uses memory, and these weird things are only happening (that I know of) in scenarios with very little memory use. I'll see if there's some way to get a count of GC runs before the runner process exits, that's a good idea!
This is resolved by #239 |
(builds on top of #204 for now)
So I took the fast functions example and wanted to see what the general overhead might be like:
Much to my own surprise the results look like this:
WHAT???
Enum.map(10)
sticks out and consumes less memory, although it's the only one that should actually generate any data. (I assume that the others could/should be compiler optimizations, as they are clearly static.) I don't know. It might be that we're failing to collect some memory somewhere, or there might just be weird optimizations I'm unaware of which show up in these very micro examples.
We don't need to solve this (if it is solvable) before we release 0.13 but I at least want to have a look at it.
Pinging @michalmuskala for insights and BEAM knowledge, and @devonestes cause 🐰 + initial memory implementation research etc. etc. :D