
Fastest way to create a huge object with NAPI #1219

Closed
ggreco opened this issue Oct 10, 2022 · 7 comments

ggreco commented Oct 10, 2022

This is not strictly a bug report, more a performance question... point me to a better place to post this if this is not the right one; I could not find one.

I need to parse huge JSON files in my node application, and node has a limit of 512MB on the string that can be passed to JSON.parse(). So I tried to implement JSON parsing over a file (or a remote buffer) using NAPI, but the result is quite slow: more than 10 times slower than JSON.parse() on JSON sizes that JSON.parse() can still handle. I'm using rapidjson to parse in C++, and that part is fast. Benchmarking the code, the bottleneck seems to be the allocation of the napi values I create while walking the parsed JSON, so I'm asking if there is something I can do to speed up the allocation of a lot of small objects (something like a bulk allocation method, or an sql-like begin/end transaction logic for operations like this one). Here is the C++ code I wrote:

#include <napi.h>
#include "rapidjson/document.h"

static Napi::Value parseJSONNode(Napi::Env &env, rapidjson::Value &node, Napi::Object *parent = nullptr, const char *parentKey = nullptr, uint32_t parentIndex = 0) {
    Napi::Value v;
    if (node.IsArray()) {
        Napi::Array a = Napi::Array::New(env, node.Size());
        for (rapidjson::SizeType i = 0; i < node.Size(); ++i) {
            parseJSONNode(env, node[i], &a, nullptr, i);
        }
        v = a;
    } else if (node.IsObject()) {
        Napi::Object o = Napi::Object::New(env);
        for (auto itr = node.MemberBegin(); itr != node.MemberEnd(); ++itr) {
            parseJSONNode(env, itr->value, &o, itr->name.GetString());
        }
        v = o;
    } else if (node.IsString()) {
        v = Napi::String::New(env, node.GetString());
    } else if (node.IsNumber()) {
        v = Napi::Number::New(env, node.GetDouble());
    } else if (node.IsBool()) {
        v = Napi::Boolean::New(env, node.GetBool());
    } else if (node.IsNull()) {
        v = env.Null(); // without this branch, JSON nulls were left as an empty Napi::Value
    }
    if (parent) {
        if (parentKey) // parent is an object
            (*parent)[parentKey] = v;
        else // ... or an array
            (*parent)[parentIndex] = v;
    }
    return v;
}

Napi::Value parseJSON(const Napi::CallbackInfo& info) {
    Napi::Env env = info.Env();
    if (info.Length() < 2) {
        Napi::TypeError::New(env, "Invalid argument count").ThrowAsJavaScriptException();
        return env.Undefined();
    }
    if (!info[1].IsNumber() || !info[0].IsBuffer()) {
        Napi::TypeError::New(env, "Invalid argument type").ThrowAsJavaScriptException();
        return env.Undefined();
    }
  
    try {
        rapidjson::Document doc;    
        char *buffer = info[0].As<Napi::Buffer<char>>().Data();
        size_t buffer_len = (size_t)info[1].ToNumber().Int64Value();
        doc.Parse(buffer, buffer_len);
   
        if (!doc.IsObject() && !doc.IsArray()) {
            Napi::TypeError::New(env, "Unable to parse JSON").ThrowAsJavaScriptException();
            return env.Undefined();
        }
        return parseJSONNode(env, doc);
    } catch (std::string &err) {
        Napi::TypeError::New(env, err).ThrowAsJavaScriptException();
        return env.Undefined();
    }
}

Napi::Object Init(Napi::Env env, Napi::Object exports) {
    exports.Set(Napi::String::New(env, "parseJSON"), Napi::Function::New(env, parseJSON));
    return exports;
}

NODE_API_MODULE(addon, Init)
NickNaso (Member) commented:

Hi @ggreco,
you' re right Node.js ha a limit on the dimension of a Buffer that usually is 512MB. When you need to handle big file you should use the Stream API (https://nodejs.org/dist/latest/docs/api/stream.html) so before to implement a native addon my advice is to find on npm if there is a package that could be helpful to you https://www.npmjs.com/search?q=json%20stream.
In your code you're creating a big object that you pass back to JavaScript and maybe you spent most of the time creating that object. Pass a value from C++ and Javascript has a cost becuase the value will be copied. An idea to make the code faster coul be to use the Stream API that RapidJSON https://rapidjson.org/md_doc_stream.html give to you so your netive addon will bacame a sort of passthrough stream. Here you can find some simple example https://github.com/NickNaso/addon-stream. Let us know if this infomation could help you.


NickNaso commented Nov 3, 2022

@ggreco do you need any other information, or is it possible to close the issue?


ggreco commented Nov 4, 2022

Not really; my point is to be able to build a huge JavaScript object inside the native code with decent performance. At the moment it takes several seconds to create one in the worst-case scenario (an 800MB JSON file), as my example code shows.

I hoped NAPI included some sort of "batch" creator that could create a full structure without having to create the nodes one by one.

The JSON parse was just an example, I have a huge dataset inside my native addon (so streaming is not the issue), and I have to pass pieces of it to the javascript world.

It seems that NAPI is so slow at creating objects that passing a buffer containing a binary "packed" object to JavaScript and parsing it within node using code similar to this one:

buffer.readInt16BE(offset);
offset+=2;
buffer.readInt8(offset);
offset+=1;
buffer.readDoubleBE(offset);
offset+=8;

... is faster than creating the data structure from inside the native module!
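The packed-buffer approach can be made concrete with a small encode/decode pair. The record layout here (int16 id, int8 flags, float64 value, all big-endian) is hypothetical, chosen only to match the read calls in the snippet above:

```javascript
// Hypothetical "packed" record layout (an assumption): int16 id, int8 flags,
// float64 value, all big-endian, 11 bytes per record.
const RECORD_SIZE = 2 + 1 + 8;

// What the native side would do: serialize one record into a Buffer.
function encodeRecord({ id, flags, value }) {
  const buf = Buffer.alloc(RECORD_SIZE);
  buf.writeInt16BE(id, 0);
  buf.writeInt8(flags, 2);
  buf.writeDoubleBE(value, 3);
  return buf;
}

// The JavaScript side: decode one record and return the advanced offset,
// so records can be read back-to-back from one big buffer.
function decodeRecord(buffer, offset) {
  const id = buffer.readInt16BE(offset);     offset += 2;
  const flags = buffer.readInt8(offset);     offset += 1;
  const value = buffer.readDoubleBE(offset); offset += 8;
  return { id, flags, value, offset };
}
```

The native side pays for one big Buffer allocation instead of millions of napi value allocations, which is the trade-off being described.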

Anyway, given the original problem, the fastest JSON streamer I could find in the npm registry is 10 times slower than:

JSON.parse(fs.readFileSync("my.json"));
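For what it's worth, that baseline can be timed with a self-contained sketch (the generated payload here is synthetic and far smaller than the real 800MB case):

```javascript
// Synthetic payload (assumption): 100k small records, standing in for the
// real multi-hundred-MB file, just to show how the baseline is measured.
const payload = JSON.stringify(
  Array.from({ length: 100000 }, (_, i) => ({ i, v: i * 0.5 }))
);

// Time a plain JSON.parse, the number the streaming parsers are compared against.
const t0 = process.hrtime.bigint();
const parsed = JSON.parse(payload);
const elapsedMs = Number(process.hrtime.bigint() - t0) / 1e6;
console.log(`JSON.parse of ${payload.length} bytes took ${elapsedMs.toFixed(1)} ms`);
```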

@NickNaso
Copy link
Member

NickNaso commented Nov 4, 2022

Hi @ggreco,
could the DataView API (MDN refs) be good for your use case?
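For reference, the same hypothetical packed layout used above (big-endian int16, int8, float64, an assumption) read through a DataView instead of a Buffer might look like:

```javascript
// Reading the hypothetical 11-byte record via DataView. The boolean argument
// of getInt16/getFloat64 selects endianness: false means big-endian,
// matching the writeXxxBE calls on the Buffer side.
function readRecordWithDataView(arrayBuffer, byteOffset) {
  const view = new DataView(arrayBuffer, byteOffset);
  return {
    id: view.getInt16(0, false),
    flags: view.getInt8(2),
    value: view.getFloat64(3, false),
  };
}
```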


ggreco commented Nov 4, 2022

> Hi @ggreco, could the DataView api (MDN refs) be good for your use case?

Essentially that's what I'm doing with the node Buffer... my hope was that I was missing some napi utility for batch creation, but I also reviewed the C API and there is nothing for my use case... so you can probably close this; I can file a feature request against the node C API.

If you know in advance, for instance, that you need to create an array of 1000000 objects, it would be nice to have an API to create them in one call, instead of having to do (pseudo-code in C++, not 100% sure about the syntax):

Napi::Array a = Napi::Array::New(env, 1000000);
for (uint32_t i = 0; i < 1000000; ++i)
    a[i] = Napi::Object::New(env);


github-actions bot commented Feb 3, 2023

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions bot added the stale label Feb 3, 2023
mhdawson (Member) commented:

The discussion in the Node-API team meeting concluded that we should close this issue in deference to the discussion going on in node core (nodejs/node#45905). Once that completes, we can re-open this issue or create a new one to cover the node-addon-api side.
