determine cause of memory leak on pypy (especially connection) nodes #802

Closed
pjenvey opened this issue Feb 6, 2017 · 1 comment

pjenvey (Member) commented Feb 6, 2017

Connection nodes under heavy load in prod are steadily leaking memory. Endpoint nodes seem to as well, albeit negligibly. We'll try to reproduce it locally, but we should additionally provide an API call (probably on a separate port, only accessible internally) that returns a tarfile dump of memory information useful for diagnosing the leak.

Grabbing this information may be a little taxing on prod -- we'll have to keep that in mind -- so we'll only target a single guinea pig node to query.

It'll include something like the output of objgraph's most common types, pypy's gc.dump_rpy_heap, and maybe resource.getrusage.
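
For illustration, here's a minimal sketch of what the dump helper behind that internal call could look like -- the function name `dump_memory_info`, the paths, and the limit of 25 types are made up for the example, and the actual endpoint/port wiring is omitted:

```python
import gc
import os
import resource
import tarfile
import tempfile

import objgraph  # third-party: pip install objgraph


def dump_memory_info(tar_path="/tmp/memory-dump.tar.gz"):
    """Bundle objgraph counts, a pypy heap dump, and rusage into a tarball."""
    workdir = tempfile.mkdtemp(prefix="memdump-")

    # objgraph: counts of the most common object types in the process
    with open(os.path.join(workdir, "objgraph.txt"), "w") as f:
        for name, count in objgraph.most_common_types(limit=25):
            f.write("%s %d\n" % (name, count))

    # pypy only: raw dump of the RPython-level heap (absent on CPython)
    dump_rpy_heap = getattr(gc, "dump_rpy_heap", None)
    if dump_rpy_heap is not None:
        with open(os.path.join(workdir, "heap.rpy"), "wb") as f:
            dump_rpy_heap(f.fileno())

    # resource usage (e.g. ru_maxrss) for the current process
    with open(os.path.join(workdir, "rusage.txt"), "w") as f:
        f.write(repr(resource.getrusage(resource.RUSAGE_SELF)))

    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(workdir, arcname="memdump")
    return tar_path
```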

pjenvey added this to the PUSHSVC-0: quality milestone Feb 6, 2017
pjenvey self-assigned this Feb 6, 2017
pjenvey added 5 commits that referenced this issue Feb 9, 2017
pjenvey added 6 commits that referenced this issue Feb 10, 2017
pjenvey added 2 commits that referenced this issue Mar 3, 2017
pjenvey added the p2 and p1 labels and removed the p1 label May 3, 2017
pjenvey added 3 commits that referenced this issue Jun 19, 2017
pjenvey (Member, Author) commented Apr 24, 2018

Connection nodes' memory usage is still higher than we'd expect, but the introduction of MALLOC_ARENA_MAX=2 (see mozilla-services/push-service#79 for more information) appears to have leveled off memory growth so far, at a slightly smaller peak than previous deploys without the setting.
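
For reference, the value has to be in the environment before the interpreter starts, since glibc reads it when malloc initializes. A purely illustrative sketch (not how the deploy actually sets it) of a startup guard that re-execs when it's missing:

```python
import os
import sys

# Illustrative only: glibc reads MALLOC_ARENA_MAX when malloc initializes,
# so it must be in the environment before the interpreter starts. If it's
# missing, set it and re-exec so the value actually takes effect.
if os.environ.get("MALLOC_ARENA_MAX") != "2":
    os.environ["MALLOC_ARENA_MAX"] = "2"
    os.execv(sys.executable, [sys.executable] + sys.argv)
```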

With the intent to replace autopush with autopush_rs, I'm closing this out as "good enough" for now. Much of the culprit here and in #917 was glibc's malloc and how it interacts with pypy.

We should continue experimenting with external malloc settings and analyze nodes with the new pypy gc stats API slated for pypy 6.0 (to be released very soon). We'll track that progress in both mozilla-services/push-service#79 and https://bugzilla.mozilla.org/show_bug.cgi?id=1442705.
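
Once that's available, a hedged sketch of what checking it might look like (guarded because gc.get_stats doesn't exist on CPython or older pypy, and the exact API surface in 6.0 may differ):

```python
import gc

# Hedged sketch: pypy's gc.get_stats() (not available on CPython or older
# pypy releases) returns an object summarizing GC memory usage; printing
# it gives a human-readable report we could log per node.
get_stats = getattr(gc, "get_stats", None)
if get_stats is not None:
    print(get_stats())
```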
