Extremely high memory consumption #303

Closed
mity opened this issue Jun 8, 2019 · 7 comments

@mity (Contributor) commented Jun 8, 2019

Consider the command

$ python3 -c 'N=2000; print(("* " * N + "x\n") * N)'

It generates a stream of 8004001 bytes (2000 lines of 4002 bytes each, i.e. 2000 "* " markers plus "x\n", plus a final newline from print; approx. 7.6 MB):

$ python3 -c 'N=2000; print(("* " * N + "x\n") * N)' | wc -c
8004001

When I measure cmark's memory consumption while processing this input, I see the following:

$ python3 -c 'N=2000; print(("* " * N + "x\n") * N)' | memusage ./cmark >/dev/null

Memory usage summary: heap total: 1727979956, heap peak: 1631967532, stack peak: 8704
         total calls   total memory   failed calls
 malloc|          2           8192              0
realloc|    8000045      543699080              0  (nomove:27, dec:0, free:0)
 calloc|   12002008     1184272684              0
   free|   20002013     1727971764
Histogram for block sizes:
    0-15              2  <1% 
   16-31        4000001  19% ========================
   32-47              1  <1% 
   48-63        8000006  39% =================================================
   80-95              1  <1% 
  128-143       8002005  40% ==================================================
  144-159             1  <1% 
  208-223             1  <1% 
  320-335             1  <1% 
  384-399             1  <1% 
  480-495             1  <1% 
  736-751             1  <1% 
 1104-1119            1  <1% 
 1664-1679            1  <1% 
 2512-2527            1  <1% 
 3776-3791            1  <1% 
 4096-4111            2  <1% 
 5664-5679            1  <1% 
 6000-6015            2  <1% 
 8512-8527            1  <1% 
12768-12783           1  <1% 
19168-19183           1  <1% 
28752-28767           1  <1% 
43136-43151           1  <1% 
64720-64735           1  <1% 
   large             18  <1%

If I compare the heap peak (1631967532 bytes, or 1556.3 MB) with the input size (8004001 bytes, or 7.6 MB), I can see that cmark allocates more than 200 times the size of the input on the heap. Such a factor seems extremely high to me, even taking into account that cmark builds a complete AST.

(EDIT: Tested on 64-bit Linux)

@nwellnhof (Contributor)

A struct cmark_node is 136 bytes on 64-bit and the input generates 8M nodes (each line parses as 2000 nested single-item lists, so roughly 2000 × 2000 × 2 list and item nodes; note the ~8M allocations in the 128-143 byte bucket above), so this seems about right.

@mity (Contributor, Author) commented Jun 8, 2019

Patient: "Doc, my knee is more than twice the normal size."
Doctor: "Yes. But you have half a liter of water there, so it seems about right."

@jgm (Member) commented Jun 8, 2019

@mity I don't understand your comment. Obviously a processor that doesn't construct an AST (such as md4c) could avoid the overhead for an AST node for each list item. But given that we do construct an AST, the only way to slim down would be to reduce the memory per node. 136 bytes isn't much. We've got to store pointers to parents and siblings, source location, node type, and information about the list item type. Do you have a concrete suggestion for improvement?
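
Roughly, the kind of layout this implies on 64-bit (a sketch with illustrative field names and a guessed payload union, not cmark's actual node.h):

#include <stdint.h>

/* Illustrative cmark_node-like layout; names and sizes are approximate. */
typedef struct node_sketch {
  struct node_sketch *next, *prev, *parent;     /* 24 B: siblings + parent */
  struct node_sketch *first_child, *last_child; /* 16 B: children          */
  void *user_data;                              /*  8 B: for bindings      */
  int32_t start_line, start_column;             /*  8 B: source position   */
  int32_t end_line, end_column;                 /*  8 B                    */
  uint16_t type;                                /*  2 B: node type         */
  uint16_t flags;                               /*  2 B: open, tight, ...  */
  union {                                       /* payload, sized by its   */
    struct {                                    /* largest member          */
      uint8_t list_type, delim;
      int32_t start;
      uint8_t tight;
    } list;
    struct { uint8_t *url, *title; } link;      /* 16 B on 64-bit          */
  } as;
} node_sketch;

Tree links, position, type and a 16-byte payload already sum to 88 bytes here (with alignment padding); add a string buffer for block content and a few parser-internal fields, and 136 bytes per node is plausible.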

@mity (Contributor, Author) commented Jun 8, 2019

Well, it does seem like a lot to me. But if I'm the only one, feel free to close.

@jgm (Member) commented Jun 8, 2019

If this became a problem in practice, we could look into ways to reduce the bytes/node.
For example, we could try to consolidate start/end line and column; 32 bits should be enough for a line or column number. Going further, we could just store a byte offset, and have a separate array correlating byte offsets with line and column numbers. Instead of using a union for as, we could perhaps store a void*.
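
For illustration, a sketch of the byte-offset variant (hypothetical types, not anything in cmark's API): each node keeps a single 32-bit byte offset, and a per-document table of line-start offsets recovers line/column on demand:

#include <stddef.h>
#include <stdint.h>

typedef struct {
  const uint32_t *line_starts; /* byte offset of the start of each line */
  size_t num_lines;            /* line_starts[0] == 0, sorted ascending */
} source_map;

/* Recover a 1-based (line, column) from a node's byte offset by
   binary-searching for the largest line start <= offset. */
static void source_map_lookup(const source_map *map, uint32_t offset,
                              uint32_t *line, uint32_t *column) {
  size_t lo = 0, hi = map->num_lines;
  while (hi - lo > 1) {
    size_t mid = lo + (hi - lo) / 2;
    if (map->line_starts[mid] <= offset)
      lo = mid;
    else
      hi = mid;
  }
  *line = (uint32_t)lo + 1;
  *column = offset - map->line_starts[lo] + 1;
}

Each node would then carry 4 bytes of position instead of 16, and the O(log n) lookup cost is only paid when a caller actually asks for a position.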

But I'd be surprised if we could trim down the overhead for a node by more than 40% or so. I think we just have to accept that if we're building an AST with the structure of our AST, documents with a lot of nodes are going to take a lot of memory.

Anyway, this remains pretty theoretical; I've yet to see anyone have a problem in practice with memory usage. A normal-sized book is going to be, what, 500K? (with a much lower density of nodes than your example).

jgm closed this as completed Jun 8, 2019
@ee7 commented Jun 30, 2022

To others: it looks like #326 reduced the size of cmark_node on 64-bit from 136 bytes to 112 bytes (a decrease of 18%). Nice.

@nwellnhof (Contributor) commented Jul 19, 2022

#446 reduces the size further to 104 bytes (13 64-bit words). Everything is neatly packed now and all the low-hanging fruit has been picked. There are quite a few remaining optimizations, but they would require additional, intrusive changes. The absolute minimum required seems to be:

  • parent, next, prev for the tree structure. first_child isn't needed for leaf nodes. There's a trick to remove last_child by making the prev list cyclic (sketched after this list).
  • user_data is required to make language bindings performant.
  • 16 bytes for line/column positions. Maybe this could be made optional somehow.
  • 1-2 bytes for type/flags.
  • Between 0 and 16 bytes for the payload.

This would mean ~6 words (48 bytes) for simple nodes without line numbers, ~10 words (80 bytes) for larger node types.
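
For illustration, a sketch of that ~6-word node and the cyclic-prev trick (hypothetical code, not cmark's):

#include <stddef.h>
#include <stdint.h>

typedef struct min_node {
  struct min_node *parent;      /* word 1 */
  struct min_node *next;        /* word 2 */
  struct min_node *prev;        /* word 3: cyclic among siblings, so
                                   first_child->prev is the last child */
  struct min_node *first_child; /* word 4: would be dropped for leaf
                                   node types in a split-struct design */
  void *user_data;              /* word 5: cheap language bindings     */
  uint16_t type;                /* word 6, shared with flags, padding, */
  uint16_t flags;               /* and the start of the payload        */
  /* + 0-16 bytes payload, + optional 16 bytes of line/column */
} min_node;

/* last_child without storing it: one step back from first_child. */
static inline min_node *last_child(const min_node *n) {
  return n->first_child ? n->first_child->prev : NULL;
}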
