Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add browser support #63

Closed

Conversation

bryanchriswhite
Copy link

@bryanchriswhite bryanchriswhite commented Jun 17, 2022

Summary

This PR re-organizes the work from fetchai/blst-ts#1 in an attempt to make it easier to review and to revert unnecessary changes. This work has also been released on npm as @fetchai/blst-ts.

While all applicable tests are passing in a browser environment, I'm reluctant to say that this "closes" #18 because it is missing bindings to the constants:

  • BLS12_381_G1
  • BLS12_381_NEG_G1
  • BLS12_381_G2
  • BLS12_381_NEG_G2

The rest of the bindings were to functions, which is well documented, but I was neither able to to find documentation on nor come up with a way to bind to C/C++ constants in the time I've spent on this so far. I would be extremely receptive to any help in this area! 😄

The blst submodule is pointed at the current HEAD of supranational/blst#121 but should be updated to the main branch when possible.

Dependencies

@bryanchriswhite bryanchriswhite requested a review from a team as a code owner June 17, 2022 12:58
@CLAassistant
Copy link

CLAassistant commented Jun 17, 2022

CLA assistant check
All committers have signed the CLA.

@dot-asm
Copy link

dot-asm commented Jun 19, 2022

I hope you don't mind if I ask some questions and comment. The goal is to gather some understanding of the platform and corresponding challenges. Maybe there is something that would be appropriate to do slightly differently on the blst side, more specifically in blst.hpp...

// BLS12_381_G1: P1_Affine;
// BLS12_381_NEG_G1: P1_Affine;
// BLS12_381_G2: P2_Affine;
// BLS12_381_NEG_G2: P2_Affine;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest not aiming for providing access to these. There are generator() methods that serve the purpose. Though one can make a case for constant generators. I mean generator() methods create mutable copies... Which can be viewed as an extra burden, because as far as I understand you're responsible for freeing all the objects explicitly. One can reformulate the question as following. Are there cases when constant generator points would be ultimately useful? One common case is the PT constructor, but it should be noted that the underlying subroutine, blst_miller_loop, actually replaces null references with generators...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one has me in a bit of an awkward position. 😅 My original intention was to replace the native binding, leaving the API layer unchanged. I don't know enough to have strong opinion on this but perhaps @dapplion can weigh in here?

Copy link

@dot-asm dot-asm Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, sometimes one has to recognize the limitations. I mean webasm interface is purely functional and one can't share constant data in the same way. As already mentioned, there are generator() methods and they are available even in native bindings. So that it's possible to adapt the existing code to avoid the limitation of one [platform]... But sure, let's first hear how bad is it. In worst case one can define them as constants, and have C++ side recognize them and replace with pointers to constant data...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there are way to execute some code upon module load? There should be... It looks like it would be possible to instantiate the objects by having contructors recognize the constants and returning pointers to constant data... Though problem would be that the data won't be actually constant. I mean unlike the native bindings where attempt to manipulate would trigger a crash, here a buggy program would get away with otherwise erroneous modification, which would incur hard-to-find bugs later on... Again, generator() methods would be preferable... But again, let's first hear...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though problem would be that the data won't be actually constant... hard-to-find bugs later on...

On closer examination maybe it's not that bad. In sense that the constant points in question are affine and the affine class doesn't have methods that would modify the data. In other words an application won't have an easily accessible way to wreck havoc...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(additional lifecycle hooks are also available)

Copy link
Contributor

@dapplion dapplion Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one has me in a bit of an awkward position. sweat_smile My original intention was to replace the native binding, leaving the API layer unchanged. I don't know enough to have strong opinion on this but perhaps @dapplion can weigh in here?

Since this library is eventually wrapped by chainsafe/bls I don't think it necessary that bindings and wasm API are the same. If wasm exports only a functional API that fits consumer ergonomics, then that's fine.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(additional lifecycle hooks are also available)

Could you provide some pointers? Like a real-life usage example and a specification or something...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no. Usage of the Module.onRuntineInitialized hook and the emscripten docs that I referenced above contain the full extent of my understanding at the moment.

case "Uint8Array":
return [ensureUint8(val), val.length];
// TODO:
// case BigInt:
Copy link

@dot-asm dot-asm Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one would go something like this

if (val < 0) {
  throw new Error(`expecting unsigned value`);
}
var temp = [];
while (val != 0) {
    temp.push(Number(val & 255n));
    val >>= 8n;
}
return [ensureUint8(temp), temp.length];

[As you can tell, I don't have a firm grasp on how things get to the wasm memory. I only know that it's separate, that Javascript can read and write it, while wasm can not read, let alone write, the Javascript memory. Right?]

Copy link
Author

@bryanchriswhite bryanchriswhite Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This binary logic seems very on-brand 😜

I don't think webpack will like the bigint literals, so maybe don't be surprised if BigInt(...)s come around later.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In essence the idea would be to fully harmonize with blst.hpp.ts.

var out = ensureCache.alloc(new Uint8Array(32), HEAPU8);
_emscripten_bind_SecretKey_to_lendian_1(self, out);
return new Uint8Array(HEAPU8.subarray(out, out + 32));
// TODO: free `out`?
Copy link

@dot-asm dot-asm Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that Uint8Array(wasm-pointer, ...) provides direct access to wasm memory. My question is [if] HEAP8.subarray() makes a copy of the data from wasm memory to the Javascript pool. (I apologize if the question is stupid, I'm not that familiar with interactions between wasm and Javascript.)

Copy link

@dot-asm dot-asm Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Internet suggests that one might just as well return a pointer to the webasm stack, which you then immediately copy to the Javascript pool. I suppose this means that data sits safely on the stack till the next call from to the corresponding webasm module. This way you don't need to allocate a temporary buffer and figure out when it can be freed...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is [if] HEAP8.subarray() makes a copy of the data from wasm memory to the Javascript pool.

From what I gather .subarray doesn't, but in the context new Uint8Array does the copy. As for freeing out. My understanding is that ensureCache allocates a chunk of wasm memory and reuses it specifically for argument passing. It has no free method.

if (ikm && typeof ikm === 'object') {
if (ikm instanceof Uint8Array) {
ikm_len = ikm.length;
ikm = ensureUint8(ikm, ikm.length);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureUint8 doesn't take length. So passing ikm.length is redundant, right? What might be appropriate is to double-check that the length is not shorter than 32.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureUint8 doesn't take length. So passing ikm.length is redundant, right?

Haha, so it would seem. 🤦

}
} else {
ikm_len = ikm.length;
ikm = ensureString(ikm);
Copy link

@dot-asm dot-asm Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formally speaking the ikm_len can be deceptive in the context, String. The [String's] length method returns amount of characters, while what should reach C is array of bytes. The lengths of the two are not the same if there are non-ASCII characters. BTW, I don't see the intArrayFromString implementation and it doesn't seem to be standard. Where is it? In some dependency? Just in case, expectation is that String would be encoded as UTF-8...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the intArrayFromString implementation

It's provided by emscripten as part for the current module interface, right? As for ikm_len, I gather it should be ikm_len = lengthBytesUTF8(ikm);

Copy link
Author

@bryanchriswhite bryanchriswhite Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intArrayFromString is part of emscripten's preamble.js API. When emcc builds the WASM module, it also dumps the preamble into the accompanying JavaScript (before dead-code elimination).

Just in case, expectation is that String would be encoded as UTF-8...

Correct, is there a potential pitfall there?

Copy link

@dot-asm dot-asm Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, expectation is that String would be encoded as UTF-8...

This came out wrong. Expectation is that all strings, not "String" as in "Javascript String", that are given to blst would be encoded as UTF-8. And the trouble is that "Javascript String"-s are stored as UTF-16. intArrayFromString does the UTF-16->UTF-8 conversion, but lengths' meanings are different. On the Javascript side it's amount of UTF-16 characters, while on blst side it's the amount of bytes it takes to encode the string in question in UTF-8. To give an example. In Javascript "Straße".length is 6, while the amount of bytes corresponding UTF-8 string takes is 7.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks for clarifying 👍

var self = this.ptr;
ensureCache.prepare();
if (typeof input == 'object') { input = ensureInt8(input); }
_emscripten_bind_SecretKey_from_bendian_1(self, input);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who deallocates input? BTW, why is it copied to stack in the bind method? Is it actually required? BTW, I would argue that a buffer length check would be appropriate here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to believing that webasm can't access the Javascript pool. But now I see things about emscripten::val and it looks like one can create a memoryView of webasm memory in C++ and call its "set" method to copy data from the Javascript pool. Do you have any experience with that? Idea would be that instead of allocating input from Javascript one would pass the Uint8Array Javascript object, have C++ copy data to stack and deserialize it into SecretKey. Same technique can be applied to all deserialization methods...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who deallocates input?

No need, it's in ensureCache, it will be reused next time one passes arguments to a method. Length check would still be more than appropriate.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emscripten::val

It doesn't seem to be possible to pass them as arguments.

@dot-asm
Copy link

dot-asm commented Jun 19, 2022

Is it correct understanding that the glue code was first auto-generated and then manually fixed up? If so, maybe it would be appropriate to exercise approach similar to blst/bindings/go/generate.py. I.e. instead of applying same fix-ups to P1 and P2, one would put together P1 and then transliterate it to P2? So the each time P1 is modified, P2 is modified accordingly with zero effort. It's even possible to keep it in one single file. If there is interest, I can make more concrete suggestion...

@dot-asm
Copy link

dot-asm commented Jun 19, 2022

exercise approach similar to blst/bindings/go/generate.py

One can even get more creative and interleave C++ and Javascript in the same file, method by method, for better overview...

@dot-asm
Copy link

dot-asm commented Jun 19, 2022

exercise approach similar to blst/bindings/go/generate.py

One can even get more creative and interleave C++ and Javascript in the same file, method by method, for better overview...

Just to give an idea:

#!/usr/bin/env python3

js = """
P1_Affine.prototype['dup'] = P1_Affine.prototype.dup = /** @suppress {undefinedVars, duplicate} @this{Object} */function() {
  var self = this.ptr;
  return wrapPointer(_emscripten_bind_P1_Affine_dup_0(self), P1_Affine);
};;
"""
cpp = """
blst::P1_Affine* EMSCRIPTEN_KEEPALIVE emscripten_bind_P1_Affine_dup_0(blst::P1_Affine* self) {
  return new blst::P1_Affine(self->dup());
}
"""
js += """
P1_Affine.prototype['to_jacobian'] = P1_Affine.prototype.to_jacobian = /** @suppress {undefinedVars, duplicate} @this{Object} */function() {
  var self = this.ptr;
  return wrapPointer(_emscripten_bind_P1_Affine_to_jacobian_0(self), P1);
};;
"""
cpp += """
blst::P1* EMSCRIPTEN_KEEPALIVE emscripten_bind_P1_Affine_to_jacobian_0(blst::P1_Affine* self) {
  return new blst::P1(self->to_jacobian());
}
"""

print(cpp) # to cpp file, then replace P1 with P2 and emit again
print("8<------" * 10)
print(js)  # to js file, then replace P1 with P2 and emit again

@dot-asm
Copy link

dot-asm commented Jun 20, 2022

My suggestion would be to provide an empty .idl file that simply lists the classes:

[Prefix="blst::"] interface P1_Affine { };
[Prefix="blst::"] interface P1 { };
[Prefix="blst::"] interface P2_Affine { };
[Prefix="blst::"] interface P2 { };
[Prefix="blst::"] interface SecretKey { };
[Prefix="blst::"] interface PT { };
[Prefix="blst::"] interface Pairing { };

This will be used to re-generate empty bindings and all those Javascript helpers [in case it's deemed necessary]. Then one would use a script that interleaves C++/Javascript snippets to append manually written methods to previously generated files. And override the constructors.

As for P1->P2 transliteration. A better way to do it, better than bindings/go/generate.py that is, is to do something along following:

def xchg_1vs2(matchobj):
    if matchobj.group(2) == '1':
        return matchobj.group(1) + '2'
    else:
        return matchobj.group(1) + '1'

...

print(cpp, file=cpp_fd)
print(js, file=js_fd)
print(re.sub(r'((?<!f)[pgPG])([12])', xchg_1vs2, cpp), file=cpp_fd)
print(re.sub(r'((?<!f)[pgPG])([12])', xchg_1vs2, js), file=js_fd)

[Just in case, I'm using this method to generate C# bindings.]

@bryanchriswhite
Copy link
Author

My suggestion would be to provide an empty .idl file that simply lists the classes:

...

This will be used to re-generate empty bindings and all those Javascript helpers [in case it's deemed necessary]. Then one would use a script that interleaves C++/Javascript snippets to append manually written methods to previously generated files. And override the constructors.

I love this idea 😍! It's as if you've done this before 😉

@dapplion
Copy link
Contributor

@bryanchriswhite so happy to see this PR here in a such a great form, I sincerely appreciate your perseverance ❤️

Also thank you @dot-asm for jumping in already with valuable suggestions. Your help was instrumental in delivering blst-ts in the first place.

Will do an in-depth review latter this week

// fs.createReadStream(inputPath);
const data = fs.readFileSync(inputPath);
const js_contents = `
import { Buffer } from "buffer";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the buffer library already used for some other purpose or only imported here to convert to base64?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for base64 decoding. Is there a more browser-idiomatic way you're aware of?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been using this library https://www.npmjs.com/package/uint8arrays

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, my thinking was that Buffer was already available via the webpack polyfill. Looking closer, I'm noticing that maybe all but one usage of Buffer is in tests.

"P1.test.ts",
"P2.test.ts",
"Pairing.test.ts",
"SecretKey.test.ts",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to list test files explicitly here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, probably no reason. My workflow involved commenting and uncommenting various sets of tests while debugging.

describe("bls lib", () => {
const n = 3;
// TODO: more robust wait for `Module` to initialize
setTimeout(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use the maybeWaitForRuntime pattern too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, nice catch!

@@ -66,3 +68,11 @@ export function runInstanceTestCases<InstanceType extends {[key: string]: any}>(
}
}
}

export function maybeWaitForRuntime(callback: () => void): void {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To run some steps before running any test code mocha has some strategies, can you check if these ones play nice with web? Would definitely make the setup simpler.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, I was unaware!

@dapplion
Copy link
Contributor

I'm gonna think more what's the best way to provide both bindings and wasm in the same library. @bryanchriswhite do you think wasm has utility in NodeJS or may only be used in web?

CC @wemeetagain please take a look, specially how this integrates into chainsafe/bls and esm modules

@bryanchriswhite
Copy link
Author

bryanchriswhite commented Jun 21, 2022

do you think wasm has utility in NodeJS or may only be used in web?

@dapplion I believe that node supports the same WebAssembly API as the browser these days and emscripten's preamble is intended to support both. So I'm pretty confident that one could use the wasm binding in node as well.

@dot-asm
Copy link

dot-asm commented Jun 29, 2022

This will be used to re-generate empty bindings and all those Javascript helpers [in case it's deemed necessary]. Then one would use a script that interleaves C++/Javascript snippets to append manually written methods to previously generated files. And override the constructors.

This last part, overriding the constructors, is not possible. Hence https://github.com/dot-asm/blst/tree/emscripten/bindings/emscipten, as a starting point...

@dapplion
Copy link
Contributor

@bryanchriswhite ping, reached out via email from your Github profile

@dot-asm
Copy link

dot-asm commented Jul 4, 2022

There was typo in directory name, fixed. And runnable.js is equivalent of node.js/runnable.js now...

@dot-asm
Copy link

dot-asm commented Jul 8, 2022

https://github.com/dot-asm/blst/tree/emscripten/bindings/emscripten is essentially ready to roll. "Essentially" is an invitation to check it out and provide feedback. Once it's done, it can go to the main blst repo. build.py creates blst.wasm and blst.js in current working directory. And it passes through additional command line flags to emcc, e.g. /some/where/blst/bindings/emscripten/build.py -O2 --closure 1. blst.js interface aims to be fully compatible with the node.js module.

@matthewkeil
Copy link
Member

Closing this PR. Browser support is provided via https://github.com/supranational/blst/tree/master/bindings/emscripten and also via https://github.com/ChainSafe/bls . This repos now hosts napi async bindings for multi-threaded verification and is node specific. There is some discussion about building multi-threaded wasm or emscripten bindings but that is still a future "wish list" thing. If this work becomes relevant again we can open a fresh PR and refer back to this one for the comment history. Thank you for your work on this @bryanchriswhite. If you have any questions or concerns please feel free to reach out to us on our discord and we will be happy to chat 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants