Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasmtime dependency includes large binary file #3545

Closed
willbeason opened this issue Jun 10, 2021 · 16 comments · Fixed by #3708
Closed

wasmtime dependency includes large binary file #3545

willbeason opened this issue Jun 10, 2021 · 16 comments · Fixed by #3708

Comments

@willbeason
Copy link
Member

Sometime between v0.24.0 and v0.29.4, we added a dependency on github.com/bytecodealliance/wasmtime-go. There's some history around why we used this particular version, but it boils down to that it has an active community and has everything we need.

The downside is that the repository includes a large binary file which go mod vendor adds to vendor/ (See: vendor/github.com/bytecodealliance/wasmtime-go/build/linux-x86_64/libwasmtime.a). GitHub complains about this, but it doesn't appear to have any other negative effects.

We'd rather not require all repositories which vendor open-policy-agent/opa to download this (possibly unnecessary for us?) file, but there doesn't seem to be a clean solution on our end. The fix will probably either come down to one of:

  • convincing bytecodealliance/wasmtime to not include the binary (or figuring out how to work with go modules to only download it if we actually depend on it)
  • finding a way to get go mod vendor to ignore the binary on our end
  • using a different implementation

Again, there isn't a pressing need and nothing is blocked or broken because of this.

@srenatus
Copy link
Contributor

I suppose we could also introduce some build automation to push two tags for each release, one having the dependency removed.

@srenatus
Copy link
Contributor

srenatus commented Jun 21, 2021

💭 Would it help to actually use LFS, as advised by Github? I don't know for sure, but maybe it could make downloading the large files optional?

@willbeason
Copy link
Member Author

That is a fantastic idea. I'll see if the project still compiles + tests pass with the file in LFS.

@olivierlemasle
Copy link
Contributor

It might come with limitations: golang/go#39720 (I've not tested myself)

@srenatus
Copy link
Contributor

@olivierlemasle thanks for pointing that out! I had only looked into the github action, which seems to support lfs, so we'd be OK in CI usage. But if using LFS means you cannot "just include github.com/open-policy-agent/opa" in your go code, then it's a non-starter indeed.

@srenatus
Copy link
Contributor

Another thing we could do, I guess, is to stop committing vendor/ to git, and to advise users of OPA as a library to not do it? I don't think there's much of an advantage in comitting vendor/, besides having builds that don't do networking to fetch go modules. But I'd think those won't be a problem, with the proxy infra that is backing them...? 💭

It won't help will full checkouts, since the history can't be changed, but for shallow clones, this could improve things. Of course, if the project depending on OPA does commit vendor/, the same issue comes up again. 🤷

@maxsmythe
Copy link
Contributor

maxsmythe commented Aug 4, 2021

In addition to avoiding fetch for faster build times, committing vendor to git has the advantage of a guaranteed static build that is not subject to libraries disappearing due to deletion.

It seems fairly exceptional to require all consumers of a library not to use a common golang workflow, especially after-the-fact.

@srenatus
Copy link
Contributor

srenatus commented Aug 4, 2021

It seems fairly exceptional to require all consumers of a library not to use a common golang workflow, especially after-the-fact.

@maxsmythe Yup, you're completely right about this. This isn't something we can impose on users, it would just be offering an option to people who are concerned about those large files.

💭 That said, if you don't vendor your dependencies, go will not pull the opa vendor/ dir in. So if you're concerned about the large files, you could do that today already, couldn't you?

@maxsmythe
Copy link
Contributor

maxsmythe commented Aug 4, 2021

True, though how many people who are using OPA as a library are concerned about the WASM runtime? They are still forced to choose between vendoring large blobs of files or being subject to dependencies disappearing due to repo deletions (which does happen).

Gomod automatically prunes unused files ("unused" meaning they are not part of the import tree for the vendoring project). I wonder if there is some way to keep the WASM import fork separate from the golang imports? I'm guessing the shared import point is the compile pathway, which might make it hard to untangle. Perhaps if the WASM stuff was injected at runtime?

@maxsmythe
Copy link
Contributor

Also note that the large WASM binaries came after OPA was made available as a library, so the solution of "don't vendor" is still a regression.

@maxsmythe
Copy link
Contributor

It also occurs to me that making the WASM support injected at runtime would allow it to be maintained in a separate repo, which would lower the cost of cloning the OPA repository as well, which would make it easier on contributors.

@srenatus
Copy link
Contributor

srenatus commented Aug 4, 2021

True, though how many people who are using OPA as a library are concerned about the WASM runtime?

Not trying to be annoying on purpose, but we also have no hard numbers on how many people are concerned about the large binary files. (Let alone how many people would even notice this if github didn't return a warning on git push.)

I wouldn't dismiss Wasm stuff that easily either. Most users might not care today -- but that can change (just imagine we pull through #3631).

That said, I am open to suggestions on how to solve this more neatly. -- We'll have to investigate what "WASM support injected at runtime" means here, and at which cost it comes. Do you have any specific solution in mind, @maxsmythe?

@maxsmythe
Copy link
Contributor

Not trying to be annoying on purpose, but we also have no hard numbers on how many people are concerned about the large binary files. (Let alone how many people would even notice this if github didn't return a warning on git push.)

Fair. The file size is mostly an accessibility concern, particularly for those with limited internet bandwidth.

I wouldn't dismiss Wasm stuff that easily either. Most users might not care today -- but that can change (just imagine we pull through #3631).

Not relevant to the current discussion, but worth asking: How much of the WASM blob is devoted to running WASM vs. explicitly providing rego builtins in a WASM form? I know that opaque binaries bundled in repos are generally a security concern, to the point where more security conscious orgs may require such binaries to be stripped from the code before it can be used. Because of this, it's worth considering packaging the source in the repo, generating WASM after-the-fact if at all possible.

That said, I am open to suggestions on how to solve this more neatly. -- We'll have to investigate what "WASM support injected at runtime" means here, and at which cost it comes. Do you have any specific solution in mind, @maxsmythe?

I haven't dug into where the WASM files get used, but let's assume it's the compiler. A naive solution would be:

compiler code:

type WASMTarget interface {
   GetDependencies()
   CompileToWasm()
   ... whatever other WASM stuff is needed
}

consumer (assuming they want wasm):

import (
  "github.com/open-policy-agent/opa/ast"
  "github.com/open-policy-agent/opa/wasm"
)

compiler := ast.NewCompiler(ast.Options{WASM: wasm.Target{}})

Because the consumer is responsible for importing the WASM code, a consumer that does not need it will simply never import it, which means gomod will prune that file from the vendor/ directory.

It shouldn't add performance overhead. It does result in a slightly more complex client model for those who want WASM, the writing of the interface, and figuring out how to decouple WASM from the rest of the code base.

@maxsmythe
Copy link
Contributor

maxsmythe commented Aug 4, 2021

A less naive solution may try to take into account other possible build targets... compile Rego directly to ARM or x86?

@srenatus
Copy link
Contributor

srenatus commented Aug 5, 2021

Thanks, you made me realise how our situation here is similar to database drivers 😅 The approach commonly taken there are underscore imports... so, I've taken a bit of time to sketch this out here: #3708 From playing with this locally, it should let GK (for example) get rid of the wasmtime-go dep.

There's further questions about this approach raised in the PR, it's definitely something that needs some discussion and care.

@maxsmythe
Copy link
Contributor

Nice! Interesting model, this "import to include" system.

Thank you for taking a look at the approach!

srenatus added a commit that referenced this issue Aug 16, 2021
Users of OPA as a library are concerned about big binary blobs in their vendor/
directories. Even more so if they don't use them. This is the case for anyone
using OPA as library, but not using the wasm-backed evaluation feature.

With this change, importers of any packages other than `server` and `cmd`
will have to explicitly opt-in to using wasm evaluation features by having an
underscore import somewhere:

    import _ "github.com/open-policy-agent/opa/features/wasm"

Fixes #3545.

Signed-off-by: Stephan Renatus <[email protected]>
jgchn pushed a commit to jgchn/opa that referenced this issue Aug 20, 2021
…3708)

Users of OPA as a library are concerned about big binary blobs in their vendor/
directories. Even more so if they don't use them. This is the case for anyone
using OPA as library, but not using the wasm-backed evaluation feature.

With this change, importers of any packages other than `server` and `cmd`
will have to explicitly opt-in to using wasm evaluation features by having an
underscore import somewhere:

    import _ "github.com/open-policy-agent/opa/features/wasm"

Fixes open-policy-agent#3545.

Signed-off-by: Stephan Renatus <[email protected]>
dolevf pushed a commit to dolevf/opa that referenced this issue Nov 4, 2021
…3708)

Users of OPA as a library are concerned about big binary blobs in their vendor/
directories. Even more so if they don't use them. This is the case for anyone
using OPA as library, but not using the wasm-backed evaluation feature.

With this change, importers of any packages other than `server` and `cmd`
will have to explicitly opt-in to using wasm evaluation features by having an
underscore import somewhere:

    import _ "github.com/open-policy-agent/opa/features/wasm"

Fixes open-policy-agent#3545.

Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Dolev Farhi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants