Memory leak introduced by cold flush #2015
Comments
This is a solid investigation, TY @Betula-L. There's definitely a clear issue here. We're testing your suggested fix, and also investigating whether the series themselves are being released properly in tickAndExpire (which could similarly cause the same leak if the read/write ref count never drops to 0).
Sorry for the late reply. I have run the master version for a week, and this memory leak is fixed by #2037. Comprehensive testing in our system is being planned now. I will report the results once there is progress.
Performance issues
What service is experiencing the performance issue? (M3Coordinator, M3DB, M3Aggregator, etc)
M3DB
Approximately how many datapoints per second is the service handling?
2889767 per minute
What is the hardware configuration (number CPU cores, amount of RAM, disk size and types, etc) that the service is running on? Is the service the only process running on the host or is it colocated with other software?
40 cores, 128G memory. M3DB uses about 100G of memory.
What is the configuration of the service? Please include any YAML files, as well as namespace / placement configuration (with any sensitive information anonymized if necessary).
What should it be?
I observed M3DB memory usage over the last 4 days. It increased slowly but steadily until the machine finally ran out of memory (OOM), even though retentionPeriodDuration is only 48h.
I tried to find the root cause of the memory leak and found something interesting.
PR #1624 was implemented to support cold flushes, which requires series to be kept around longer so that the cold flush function Merge can use them.
Commit a115331 sets the tags to NoFinalize, so series are not garbage collected immediately, whether coldFlush is enabled or not.
m3/src/dbnode/storage/shard.go
Line 1100 in a115331
If coldFlush is enabled, the tags are eventually finalized with Finalize() in function Merge, which is triggered by function ColdFlush. But if coldFlush is disabled, only function WarmFlush is triggered. Thus, Finalize() is never called on the series if coldFlush is disabled.
m3/src/dbnode/persist/fs/merger.go
Line 186 in 0c23eb6
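The mechanism described above can be illustrated with a small, self-contained model. This is only a sketch of the reasoning in this report, not M3 code: the types tags and shard, the helpers expireSeries, forceFinalize and leaked, and the retained slice are all hypothetical names made up for illustration. The model assumes that NoFinalize turns Finalize into a no-op and that only the cold flush merge path releases such tags.

```go
package main

import "fmt"

// tags is a hypothetical stand-in for a pooled tag set (not the M3 type).
type tags struct {
	noFinalize bool
	released   bool
}

// NoFinalize marks the tags so that a later Finalize call does nothing.
func (t *tags) NoFinalize() { t.noFinalize = true }

// Finalize releases the tags unless NoFinalize was set earlier.
func (t *tags) Finalize() {
	if t.noFinalize {
		return
	}
	t.released = true
}

// forceFinalize models the merge step (assumed behavior): it clears the
// flag and then releases the tags.
func (t *tags) forceFinalize() {
	t.noFinalize = false
	t.Finalize()
}

// shard is a hypothetical container for tags that outlived their series.
type shard struct {
	coldFlushEnabled bool
	retained         []*tags
}

// expireSeries sets NoFinalize unconditionally, as the report describes,
// so the Finalize call that follows has no effect and the tags stay alive.
func (s *shard) expireSeries(t *tags) {
	t.NoFinalize()
	t.Finalize()
	if !t.released {
		s.retained = append(s.retained, t)
	}
}

// warmFlush never touches the retained tags in this model.
func (s *shard) warmFlush() {}

// coldFlush releases retained tags, but only runs when cold flush is enabled.
func (s *shard) coldFlush() {
	if !s.coldFlushEnabled {
		return
	}
	for _, t := range s.retained {
		t.forceFinalize()
	}
	s.retained = nil
}

// leaked returns how many tag sets are still retained after one flush cycle.
func leaked(coldFlushEnabled bool, n int) int {
	s := &shard{coldFlushEnabled: coldFlushEnabled}
	for i := 0; i < n; i++ {
		s.expireSeries(&tags{})
	}
	s.warmFlush()
	s.coldFlush()
	return len(s.retained)
}

func main() {
	fmt.Println("cold flush enabled, still retained:", leaked(true, 3))
	fmt.Println("cold flush disabled, still retained:", leaked(false, 3))
}
```

In this model nothing is retained when cold flush is enabled, while every expired series stays retained when it is disabled, which matches the slow, unbounded growth reported here.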
To validate the above conclusion, I commented out this line, since I do not need to keep series around when coldFlush is disabled, and ran the hot-fix version for about 6h. There is no significant memory increase anymore.
Heap Profiles
Comparison at the same load:
Memory usage on m3dbnode v1.14.1:
Memory usage with NoFinalize commented out:
m3/src/dbnode/storage/shard.go
Line 1100 in a115331
The following files contain heap dumps from the last 4 days.
run-50m.zip
run-6h.zip
run-4d.zip
run-1d.zip