Best way to avoid `(sliced string)` ? #711

alexdima · 2017-07-07T07:39:06Z

Node.js Version: 7.4.0
v8 Version: 5.6.326.50
OS: macOS Sierra
Scope (install, code, runtime, meta, other?): basic string manipulation
Module (and version) (if relevant): Buffer

Hi, I'm working on VS Code (based on Electron), and I'm looking into improving our memory usage when dealing with large files in microsoft/vscode#30180.

Our buffer implementation is basically using an array of lines. I am aware of the advantages and disadvantages of that, but I would still like to push it to its limits. Our file reading involves reading chunks and pushing those through iconv-lite to handle file encoding. Long story short, we have a bunch of ~64KB strings that we need to split into lines.
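Because the file is read in ~64KB chunks, a line can straddle two chunks, and so can a `\r\n` pair. Below is a minimal sketch of one way to split a stream of already-decoded chunks into lines. It assumes the chunks arrive as strings (e.g. after iconv-lite). `createLineSplitter` is a hypothetical helper, not VS Code's actual implementation:

```js
// Hypothetical helper: splits a sequence of decoded text chunks into lines,
// giving the same result as joining all chunks and calling split() once.
const EOL = /\r\n|\r|\n/;

function createLineSplitter() {
  const lines = [];
  let carry = ''; // incomplete last line left over from the previous chunk

  return {
    push(chunk) {
      let text = carry + chunk;
      // A trailing '\r' may be the first half of a '\r\n' that was split
      // across chunks, so hold it back until the next chunk arrives.
      const heldCR = text.endsWith('\r');
      if (heldCR) {
        text = text.slice(0, -1);
      }
      const parts = text.split(EOL);
      // The last piece has no terminator yet; keep it for the next push.
      carry = parts.pop() + (heldCR ? '\r' : '');
      for (const line of parts) {
        lines.push(line);
      }
    },
    end() {
      // Whatever is left (possibly ending in the held '\r') is the tail.
      for (const line of carry.split(EOL)) {
        lines.push(line);
      }
      carry = '';
      return lines;
    },
  };
}
```

Longer elements of the returned array can still be sliced strings that point into the joined chunk text. Any per-line copy discussed in this thread would still have to be applied afterwards.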

The fastest way I've found so far (without writing a native C++ Node module) is a simple `str.split(/\r\n|\r|\n/)`. This works very well, but it creates a `(sliced string)` for each line, and all of them point into the parent chunk. With files of 3 million lines, these objects add up, and eliminating them can save a few tens of MB of memory.

Our current workaround to get rid of the `(sliced string)` objects is:

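```js
// Split the chunk; longer lines come back as sliced strings into largeStr.
var lines = largeStr.split(/\r\n|\r|\n/);
for (var i = 0, len = lines.length; i < len; i++) {
    // Round-trip through a Buffer to get an independent copy of the line.
    lines[i] = Buffer.from(lines[i]).toString();
}
```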

I don't know if the above takes advantage of string interning, or if it is the most efficient way to do this short of writing a native Node module.

Do you have any ideas? Thank you.


bnoordhuis · 2017-07-07T08:35:55Z

Not sure I understand what your question is. A sliced string itself is a small object: a pointer to the parent string plus an offset and a length. In that respect, you shouldn't worry about seeing them show up in heap snapshots.

Slices do, however, prevent the parent string from being reclaimed by the garbage collector. If that is your concern, see if `node --nostring_slices` makes a difference.
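To make the retention point concrete, here is a toy model in plain JavaScript of what a slice holds. It is only an illustration of the description above (parent + offset + length). `ToySlice` is a made-up class, not V8's real `SlicedString` layout:

```js
// Toy model of a sliced string: it stores no characters of its own, only a
// reference to the parent plus an offset and a length.
class ToySlice {
  constructor(parent, offset, length) {
    this.parent = parent; // keeps the whole parent reachable for the GC
    this.offset = offset;
    this.length = length;
  }

  toString() {
    // Characters are read from the parent on demand.
    return this.parent.substring(this.offset, this.offset + this.length);
  }
}

// A ~64KB chunk whose first line is short.
const chunk = 'first line\n' + 'x'.repeat(64 * 1024);
const line = new ToySlice(chunk, 0, 'first line'.length);

// As long as `line` is reachable, so is the entire ~64KB `chunk`.
console.log(line.toString(), line.parent.length);
```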

alexdima · 2017-07-07T09:05:10Z

@bnoordhuis
Thank you for the --nostring_slices tip.

However, I don't want to disable sliced strings for all of VS Code (for 99% of the code base, I fully agree that they are negligible in terms of memory usage).

But I would like to avoid them in a specific place, when constructing a file in VS Code, so I would need a localized solution.

As with any small number multiplied by a large number, the result is impressive: avoiding sliced strings saves 36MB for a file with more than 3 million lines.
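As a quick sanity check on that figure, take MB as $10^6$ bytes and use $3 \times 10^6$ as the line count. The file has more lines than that, so the real per-line figure is somewhat lower:

```math
\frac{36 \times 10^{6}\ \text{bytes}}{3 \times 10^{6}\ \text{lines}} = 12\ \text{bytes per line}
```

That works out to about 12.6 bytes per line if MB means $2^{20}$ bytes. Either way, each avoided slice saves roughly a dozen bytes net. This only becomes significant because it is multiplied by millions of lines.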

I was wondering if there is something more efficient than `Buffer.from(lines[i]).toString()`, since this might end up copying the memory twice?

Thank you!

bnoordhuis · 2017-07-07T09:20:59Z

Right, I see. There are a number of operations that flatten strings (`array.join()`, `string.charAt()`, etc.), but only under specific circumstances, and the specifics change over time.

`node --allow_natives_syntax` gives you access to the `%FlattenString(s)` intrinsic, but that's an implementation detail. `Buffer.from(s).toString()` is probably the most stable approach in the longer term.
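One caveat with `Buffer.from(s).toString()` is that it encodes to UTF-8 and back. A string containing a lone surrogate (an unpaired half of a UTF-16 pair) therefore does not survive the round trip unchanged. The sketch below, assuming Node.js, compares it with a `'utf16le'` round trip, which copies UTF-16 code units as-is at the cost of a 2-bytes-per-character intermediate buffer. `copyViaUtf8` and `copyViaUtf16` are illustrative names only:

```js
// Copy a string by encoding it to UTF-8 bytes and decoding it again.
function copyViaUtf8(s) {
  return Buffer.from(s).toString();
}

// Copy a string by encoding it to raw UTF-16LE code units and decoding it again.
function copyViaUtf16(s) {
  return Buffer.from(s, 'utf16le').toString('utf16le');
}

const normal = 'const x = "héllo 🙂";';
const broken = 'abc\ud800def'; // contains a lone high surrogate

// Well-formed text survives both round trips.
console.log(copyViaUtf8(normal) === normal, copyViaUtf16(normal) === normal);
// The lone surrogate is lost in the UTF-8 round trip but kept in UTF-16LE.
console.log(copyViaUtf8(broken) === broken, copyViaUtf16(broken) === broken);
```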

addaleax · 2017-07-07T09:29:33Z

There’s also https://github.com/davidmarkclements/flatstr, which relies on the side effect that `Number(s)` flattens its argument :)
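Here is a minimal sketch of that trick, written from the description above rather than copied from the flatstr package. `flattenViaNumber` is a hypothetical name. It depends on a V8 implementation detail that can change between versions. Whether flattening also copies a sliced string out of its parent is likewise up to V8, so confirm the effect in a heap snapshot:

```js
// Relies on a V8 side effect (per the comment above): converting a string to
// a number flattens it. The numeric result is discarded; the original string
// is returned unchanged in value.
function flattenViaNumber(s) {
  Number(s);
  return s;
}

// Building a string by repeated concatenation typically produces a rope
// (cons string) in V8; the call below is meant to flatten it.
let rope = '';
for (let i = 0; i < 1000; i++) {
  rope += 'line ' + i + '\n';
}
const flat = flattenViaNumber(rope);
console.log(flat === rope, flat.length);
```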

alexdima · 2017-07-07T10:14:54Z

Thank you! ❤️

alexdima closed this as completed Jul 7, 2017

fengmk2 mentioned this issue Oct 26, 2018 in "Memory leak on sliced string" (node-modules/serialize-json#6)
