Extremely high memory consumption #303
Comments
Patient: "Doc, my knee is more than twice the normal size."
@mity I don't understand your comment. Obviously a processor that doesn't construct an AST (such as md4c) could avoid the overhead of an AST node for each list item. But given that we do construct an AST, the only way to slim down would be to reduce the memory per node. 136 bytes isn't much. We've got to store pointers to parents and siblings, source location, node type, and information about the list item type. Do you have a concrete suggestion for improvement?
Well, it does seem an awful lot to me. But if I am the only one, feel free to close.
If this became a problem in practice, we could look into ways to reduce the bytes per node. But I'd be surprised if we could trim down the overhead for a node by more than 40% or so. I think we just have to accept that if we're building an AST with the structure of our AST, documents with a lot of nodes are going to take a lot of memory. Anyway, this remains pretty theoretical; I've yet to see anyone have a problem in practice with memory usage. A normal-sized book is going to be, what, 500K? (with a much lower density of nodes than your example).
To others: it looks like #326 reduced the size of a node.
#446 reduces the size further to 104 bytes (13 64-bit words). Everything is neatly packed now and all the low-hanging fruit has been picked. There are quite a few optimizations left, but they would require additional, intrusive changes. The absolute minimum required seems to be:
This would mean ~6 words (48 bytes) for simple nodes without line numbers, ~10 words (80 bytes) for larger node types.
Consider the command
It generates a stream of 8004001 bytes (approx. 7.6 MB):
When I measure cmark's memory consumption for processing such input, I can see this:
If I compare the heap peak (1631967532 bytes, or 1556.3 MB) with the input size (8004001 bytes, or 7.6 MB), I can see cmark allocates more than 200 times the size of the input on the heap. Such a factor seems extremely high to me, even taking into account that cmark builds a complete AST.
(EDIT: Tested on 64-bit Linux)