-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not emit 100k data segments, browsers reject it #1350
Conversation
That might be worth adding to the specification? I can create the issue if you want. |
It's in the spec: https://webassembly.github.io/spec/core/appendix/implementation.html?highlight=implementation%20limit |
src/wasm-binary.h
Outdated
@@ -37,7 +37,8 @@ namespace wasm { | |||
|
|||
enum { | |||
// the maximum amount of bytes we emit per LEB | |||
MaxLEB32Bytes = 5 | |||
MaxLEB32Bytes = 5, | |||
MaxDataSegments = 100 * 1000 // a hard limit wasm VMs impose |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this (collection of limits) should go into abi.h or some similar place; i.e. these are not limitations of the binary format, but web-embedding specific. Maybe even a good way to turn them off and on, although that doesn't necessarily have to be in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, and another limit is the function size issue (#1246 is open for that), so maybe we should collect up these limits in a central place. abi
doesn't seem entirely right. Maybe web-limitations
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new enum for both those limitations (still inside wasm-binary header, since I realized this is a binary format limitation only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall approach looks good
src/wasm-binary.h
Outdated
@@ -37,7 +37,8 @@ namespace wasm { | |||
|
|||
enum { | |||
// the maximum amount of bytes we emit per LEB | |||
MaxLEB32Bytes = 5 | |||
MaxLEB32Bytes = 5, | |||
MaxDataSegments = 100 * 1000 // a hard limit wasm VMs impose |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the limit in VMs 100*1000
, or 100*1024
?
Is there a reference we can link to for this, because all I see on https://webassembly.github.io/spec/core/appendix/implementation.html?highlight=implementation%20limit is "implementations may impose some limit"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also can't find a proper reference for it. I figured I'd set the limit to the lower of the two interpretations ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an official link for the Web as a whole?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The official discussion is here: WebAssembly/spec#607
src/wasm/wasm-binary.cpp
Outdated
if (numConstant + numDynamic >= MaxDataSegments) { | ||
numConstant = MaxDataSegments - numDynamic - 1; | ||
num = numConstant + numDynamic; | ||
assert(num == MaxDataSegments - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there runtime cases where this might not be true?
I secretly like this as-is as an affirmation of the expected postcondition of the previous two lines, just in case the reader doesn't feel like doing the algebra. (Although my first reaction was "wait isn't this always true")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, mathematically speaking it should be true anyhow, but I think it's nice for clarity, and to prevent future refactorings from breaking stuff.
src/wasm/wasm-binary.cpp
Outdated
Memory::Segment combined(&c); | ||
for (Index j = i; j < segments.size(); j++) { | ||
auto& segment = segments[j]; | ||
if (segment.data.size() == 0) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two conditions are repeated three times here. Maybe extract a lambda (or method), if (isEmptyOrConst(segment)) continue;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I was thinking merge the conditions to reduce line/continue count, i.e. go from
if (foo.isA()) continue;
if (foo.isB()) continue;
to
if (foo.isAOrB()) continue;
There's one condition where which checks both size() == 0
and NOT ->is<Const>
, so either leave it standalone or also make isEmptyOrNotConst
.
This is also fairly minor, so Nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I didn't understand you fully before.
I kind of prefer it now - with the nice function names, I think it's clear to read. isAOrB()
seems kind of excessive, at least to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the name isAOrB
is pretty contrived. I think ideally it would be something like canEmit
or isSafeToEmit
or canMerge
or something, where the name implies why we care about A or B. I'm not sure if A and B are actually related though? So I mentioned it so that if you had an idea for what to unify them as, you could name it well. Failing that, a unifying function is probably overkill yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well yeah, actually I'm convinced now. We do only care about one type of segment at that point (and til the end), so why not have a helper function for it. Added.
This was noticed on cib (clang port to wasm).