Extremely high memory consumption #303

Closed
mity opened this issue Jun 8, 2019 · 7 comments

@mity (Contributor) commented Jun 8, 2019

Consider the command

$ python3 -c 'N=2000; print(("* " * N + "x\n") * N)'

It generates a stream of 8004001 bytes (2000 lines of 4002 bytes each, i.e. 2000 "* " markers plus "x\n", plus a final newline from print; approx. 7.6 MB):

$ python3 -c 'N=2000; print(("* " * N + "x\n") * N)' | wc -c
8004001

When I measure cmark's memory consumption while processing this input, I see the following:

$ python3 -c 'N=2000; print(("* " * N + "x\n") * N)' | memusage ./cmark >/dev/null

Memory usage summary: heap total: 1727979956, heap peak: 1631967532, stack peak: 8704
         total calls   total memory   failed calls
 malloc|          2           8192              0
realloc|    8000045      543699080              0  (nomove:27, dec:0, free:0)
 calloc|   12002008     1184272684              0
   free|   20002013     1727971764
Histogram for block sizes:
    0-15              2  <1% 
   16-31        4000001  19% ========================
   32-47              1  <1% 
   48-63        8000006  39% =================================================
   80-95              1  <1% 
  128-143       8002005  40% ==================================================
  144-159             1  <1% 
  208-223             1  <1% 
  320-335             1  <1% 
  384-399             1  <1% 
  480-495             1  <1% 
  736-751             1  <1% 
 1104-1119            1  <1% 
 1664-1679            1  <1% 
 2512-2527            1  <1% 
 3776-3791            1  <1% 
 4096-4111            2  <1% 
 5664-5679            1  <1% 
 6000-6015            2  <1% 
 8512-8527            1  <1% 
12768-12783           1  <1% 
19168-19183           1  <1% 
28752-28767           1  <1% 
43136-43151           1  <1% 
64720-64735           1  <1% 
   large             18  <1%

If I compare the heap peak (1631967532 bytes, or 1556.3 MB) with the input size (8004001 bytes, or 7.6 MB), I can see that cmark allocates more than 200 times the size of the input on the heap. Such a factor seems extremely high to me, even taking into account that cmark builds a complete AST.

(EDIT: Tested on 64-bit Linux)

@nwellnhof (Contributor)

A struct cmark_node is 136 bytes on 64-bit and the input generates 8M nodes (each line parses as 2000 nested single-item lists, so roughly 2000 × 2000 × 2 list and item nodes; note the ~8M allocations in the 128-143 byte bucket above), so this seems about right.

@mity (Contributor, Author) commented Jun 8, 2019

Patient: "Doc, my knee is more than twice the normal size."
Doctor: "Yes. But you have half a liter of water there, so it seems about right."

@jgm (Member) commented Jun 8, 2019

@mity I don't understand your comment. Obviously a processor that doesn't construct an AST (such as md4c) could avoid the overhead for an AST node for each list item. But given that we do construct an AST, the only way to slim down would be to reduce the memory per node. 136 bytes isn't much. We've got to store pointers to parents and siblings, source location, node type, and information about the list item type. Do you have a concrete suggestion for improvement?
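
Roughly, the kind of layout this implies on 64-bit (a sketch with illustrative field names and a guessed payload union, not cmark's actual node.h):

#include <stdint.h>

/* Illustrative cmark_node-like layout; names and sizes are approximate. */
typedef struct node_sketch {
  struct node_sketch *next, *prev, *parent;     /* 24 B: siblings + parent */
  struct node_sketch *first_child, *last_child; /* 16 B: children          */
  void *user_data;                              /*  8 B: for bindings      */
  int32_t start_line, start_column;             /*  8 B: source position   */
  int32_t end_line, end_column;                 /*  8 B                    */
  uint16_t type;                                /*  2 B: node type         */
  uint16_t flags;                               /*  2 B: open, tight, ...  */
  union {                                       /* payload, sized by its   */
    struct {                                    /* largest member          */
      uint8_t list_type, delim;
      int32_t start;
      uint8_t tight;
    } list;
    struct { uint8_t *url, *title; } link;      /* 16 B on 64-bit          */
  } as;
} node_sketch;

Tree links, position, type and a 16-byte payload already sum to 88 bytes here (with alignment padding); add a string buffer for block content and a few parser-internal fields, and 136 bytes per node is plausible.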

@mity (Contributor, Author) commented Jun 8, 2019

Well, it does seem like a lot to me. But if I'm the only one, feel free to close.

@jgm (Member) commented Jun 8, 2019

If this became a problem in practice, we could look into ways to reduce the bytes/node.
For example, we could try to consolidate start/end line and column; 32 bits should be enough for a line or column number. Going further, we could just store a byte offset, and have a separate array correlating byte offsets with line and column numbers. Instead of using a union for as, we could perhaps store a void*.
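
For illustration, a sketch of the byte-offset variant (hypothetical types, not anything in cmark's API): each node keeps a single 32-bit byte offset, and a per-document table of line-start offsets recovers line/column on demand:

#include <stddef.h>
#include <stdint.h>

typedef struct {
  const uint32_t *line_starts; /* byte offset of the start of each line */
  size_t num_lines;            /* line_starts[0] == 0, sorted ascending */
} source_map;

/* Recover a 1-based (line, column) from a node's byte offset by
   binary-searching for the largest line start <= offset. */
static void source_map_lookup(const source_map *map, uint32_t offset,
                              uint32_t *line, uint32_t *column) {
  size_t lo = 0, hi = map->num_lines;
  while (hi - lo > 1) {
    size_t mid = lo + (hi - lo) / 2;
    if (map->line_starts[mid] <= offset)
      lo = mid;
    else
      hi = mid;
  }
  *line = (uint32_t)lo + 1;
  *column = offset - map->line_starts[lo] + 1;
}

Each node would then carry 4 bytes of position instead of 16, and the O(log n) lookup cost is only paid when a caller actually asks for a position.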

But I'd be surprised if we could trim down the overhead for a node by more than 40% or so. I think we just have to accept that if we're building an AST with the structure of our AST, documents with a lot of nodes are going to take a lot of memory.

Anyway, this remains pretty theoretical; I've yet to see anyone have a problem in practice with memory usage. A normal-sized book is going to be, what, 500K? (with a much lower density of nodes than your example).

jgm closed this as completed Jun 8, 2019
@ee7 commented Jun 30, 2022

To others: it looks like #326 reduced the size of cmark_node on 64-bit from 136 bytes to 112 bytes (a decrease of 18%). Nice.

@nwellnhof (Contributor) commented Jul 19, 2022

#446 reduces the size further to 104 bytes (13 64-bit words). Everything is neatly packed now and all the low-hanging fruit has been picked. There are quite a few remaining optimizations, but they would require additional, intrusive changes. The absolute minimum required seems to be:

  • parent, next, prev for the tree structure. first_child isn't needed for leaf nodes. There's a trick to remove last_child by making the prev list cyclic (sketched after this list).
  • user_data is required to make language bindings performant.
  • 16 bytes for line/column positions. Maybe this could be made optional somehow.
  • 1-2 bytes for type/flags.
  • Between 0 and 16 bytes for the payload.

This would mean ~6 words (48 bytes) for simple nodes without line numbers, ~10 words (80 bytes) for larger node types.
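
For illustration, a sketch of that ~6-word node and the cyclic-prev trick (hypothetical code, not cmark's):

#include <stddef.h>
#include <stdint.h>

typedef struct min_node {
  struct min_node *parent;      /* word 1 */
  struct min_node *next;        /* word 2 */
  struct min_node *prev;        /* word 3: cyclic among siblings, so
                                   first_child->prev is the last child */
  struct min_node *first_child; /* word 4: would be dropped for leaf
                                   node types in a split-struct design */
  void *user_data;              /* word 5: cheap language bindings     */
  uint16_t type;                /* word 6, shared with flags, padding, */
  uint16_t flags;               /* and the start of the payload        */
  /* + 0-16 bytes payload, + optional 16 bytes of line/column */
} min_node;

/* last_child without storing it: one step back from first_child. */
static inline min_node *last_child(const min_node *n) {
  return n->first_child ? n->first_child->prev : NULL;
}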
