Space Leak in deployed nodes #370
Comments
Staging snapshot shared on Slack https://files.slack.com/files-pri/T0N639Z4N-FR64QTEJH/staging-gc-bytes.png 2 Dec 2019
I have looked into this and have found the following.
My conclusion: the memory leak is in the logging/monitoring code, not in network or consensus. I think the right people from the logging/monitoring team should investigate this further. For example, if the leak is inside a tracer, they could start by disabling some tracers until they find the one leaking memory.
Note that the problem DevOps diagnosed is that we're leaking stack space, which should not be confused with memory space (the former is a subset of the latter; the latter also includes heap space). This measurement is too short to say anything about the slow but steady leaking of stack space.
The local `qProc` function in `Cardano.BM.Backend.Switchboard` loops by calling itself recursively, passing in the same `MVar MessageCounter` each time. However, `MessageCounter` was missing a bang on its `mcCountersMap` field, which contains `HM.HashMap Text Word64`. Even though the `HashMap` is a strict one, if you don't force it, you're still accumulating thunks. And as the `MVar` containing the `MessageCounter` was passed recursively, this resulted in a stack overflow instead of running out of (heap) memory. Fix it by adding the missing bang. This should fix IntersectMBO/cardano-node#370.
479: Fix stack overflow by adding a missing bang r=dcoutts a=mrBliss Co-authored-by: Thomas Winant <[email protected]>
479: Fix stack overflow by adding a missing bang r=CodiePP a=mrBliss Co-authored-by: Thomas Winant <[email protected]>
Signed-off-by: Alexander Diemand <[email protected]>
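To illustrate the missing-bang explanation in the commit message, here is a minimal sketch, not the actual `Cardano.BM.Backend.Switchboard` code. The types `LazyCounter` and `StrictCounter` and the helpers `bumpLazy` and `bumpStrict` are hypothetical stand-ins for `MessageCounter`, and `Data.Map.Strict` is swapped in for `HM.HashMap` so that the example only needs packages shipped with GHC. It assumes the update loop forces the record to weak head normal form on every step (here via `$!`), which is when a bang on the field makes a difference.

```haskell
module Main (main) where

import Control.Concurrent.MVar (modifyMVar_, newMVar, readMVar)
import Control.Exception (SomeException, evaluate, try)
import Control.Monad (forM_)
import Data.Either (isLeft)
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins for MessageCounter (names are assumptions).
-- No bang: the field may hold an unevaluated thunk.
data LazyCounter = LazyCounter (M.Map String Int)

-- Bang: the field is forced whenever the constructor is evaluated.
data StrictCounter = StrictCounter !(M.Map String Int)

bumpLazy :: String -> LazyCounter -> LazyCounter
bumpLazy k (LazyCounter m) = LazyCounter (M.insertWith (+) k 1 m)

bumpStrict :: String -> StrictCounter -> StrictCounter
bumpStrict k (StrictCounter m) = StrictCounter (M.insertWith (+) k 1 m)

main :: IO ()
main = do
  -- The difference is observable: a bottom field only throws on
  -- construction when the field has a bang.
  lazyRes <- try (evaluate (LazyCounter undefined))
               :: IO (Either SomeException LazyCounter)
  strictRes <- try (evaluate (StrictCounter undefined))
               :: IO (Either SomeException StrictCounter)
  putStrLn ("lazy field forced on construction:   " ++ show (isLeft lazyRes))
  putStrLn ("strict field forced on construction: " ++ show (isLeft strictRes))

  lazyVar <- newMVar (LazyCounter M.empty)
  strictVar <- newMVar (StrictCounter M.empty)
  forM_ [1 .. 100000 :: Int] $ \_ -> do
    -- `$!` forces only the constructor. Without the bang, the map inside
    -- stays a growing chain of pending insertWith thunks.
    modifyMVar_ lazyVar (\c -> return $! bumpLazy "msg" c)
    -- With the bang, forcing the constructor also forces the map, so each
    -- update is evaluated before it is stored back in the MVar.
    modifyMVar_ strictVar (\c -> return $! bumpStrict "msg" c)

  -- Both variants compute the same counts; only the evaluation time differs.
  LazyCounter lm <- readMVar lazyVar
  StrictCounter sm <- readMVar strictVar
  print (M.lookup "msg" lm, M.lookup "msg" sm)
```

In the lazy variant, the whole chain of deferred updates is only evaluated at the final `M.lookup`, and that evaluation nests one level per pending update. This is the kind of behaviour that can show up as stack growth rather than heap growth, as described above.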
We're seeing a space leak in deployed nodes. The nodes get OOM killed as can be seen in the memory graph here: https://monitoring.awstest.iohkdev.io/grafana/d/Oe0reiHef/cardano-application-metrics-v2?orgId=1&refresh=1m