Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not emit 100k data segments, browsers reject it #1350

Merged
merged 10 commits into from
Jan 10, 2018

Conversation

kripken
Copy link
Member

@kripken kripken commented Jan 8, 2018

This was noticed on cib (clang port to wasm).

@xtuc
Copy link

xtuc commented Jan 9, 2018

That might be worth adding to the specification? I can create the issue if you want.

@jfbastien
Copy link
Member

It's in the spec: https://webassembly.github.io/spec/core/appendix/implementation.html?highlight=implementation%20limit
Refinements being discussed: WebAssembly/spec#607

@@ -37,7 +37,8 @@ namespace wasm {

enum {
// the maximum amount of bytes we emit per LEB
MaxLEB32Bytes = 5
MaxLEB32Bytes = 5,
MaxDataSegments = 100 * 1000 // a hard limit wasm VMs impose
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this (collection of limits) should go into abi.h or some similar place; i.e. these are not limitations of the binary format, but web-embedding specific. Maybe even a good way to turn them off and on, although that doesn't necessarily have to be in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, and another limit is the function size issue (#1246 is open for that), so maybe we should collect up these limits in a central place. abi doesn't seem entirely right. Maybe web-limitations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new enum for both those limitations (still inside wasm-binary header, since I realized this is a binary format limitation only).

Copy link
Contributor

@jgravelle-google jgravelle-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach looks good

@@ -37,7 +37,8 @@ namespace wasm {

enum {
// the maximum amount of bytes we emit per LEB
MaxLEB32Bytes = 5
MaxLEB32Bytes = 5,
MaxDataSegments = 100 * 1000 // a hard limit wasm VMs impose
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the limit in VMs 100*1000, or 100*1024?
Is there a reference we can link to for this, because all I see on https://webassembly.github.io/spec/core/appendix/implementation.html?highlight=implementation%20limit is "implementations may impose some limit"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also can't find a proper reference for it. I figured I'd set the limit to the lower of the two interpretations ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an official link for the Web as a whole?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The official discussion is here: WebAssembly/spec#607

if (numConstant + numDynamic >= MaxDataSegments) {
numConstant = MaxDataSegments - numDynamic - 1;
num = numConstant + numDynamic;
assert(num == MaxDataSegments - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there runtime cases where this might not be true?

I secretly like this as-is as an affirmation of the expected postcondition of the previous two lines, just in case the reader doesn't feel like doing the algebra. (Although my first reaction was "wait isn't this always true")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mathematically speaking it should be true anyhow, but I think it's nice for clarity, and to prevent future refactorings from breaking stuff.

Memory::Segment combined(&c);
for (Index j = i; j < segments.size(); j++) {
auto& segment = segments[j];
if (segment.data.size() == 0) continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two conditions are repeated three times here. Maybe extract a lambda (or method), if (isEmptyOrConst(segment)) continue;?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was thinking merge the conditions to reduce line/continue count, i.e. go from

if (foo.isA()) continue;
if (foo.isB()) continue;

to

if (foo.isAOrB()) continue;

There's one condition where which checks both size() == 0 and NOT ->is<Const>, so either leave it standalone or also make isEmptyOrNotConst.

This is also fairly minor, so Nit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I didn't understand you fully before.

I kind of prefer it now - with the nice function names, I think it's clear to read. isAOrB() seems kind of excessive, at least to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the name isAOrB is pretty contrived. I think ideally it would be something like canEmit or isSafeToEmit or canMerge or something, where the name implies why we care about A or B. I'm not sure if A and B are actually related though? So I mentioned it so that if you had an idea for what to unify them as, you could name it well. Failing that, a unifying function is probably overkill yeah.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yeah, actually I'm convinced now. We do only care about one type of segment at that point (and til the end), so why not have a helper function for it. Added.

@kripken kripken merged commit 8f90b65 into master Jan 10, 2018
@kripken kripken deleted the 100k-enough-for-everyone branch January 10, 2018 05:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants