Better integration with conda/conda-forge for building packages #795
Comments
We have cross compilation support in conda-forge. We cross compiled numpy, scipy, matplotlib for |
Thanks for writing this up Roman! 😄 This sounds like a good summary to me. I think going from 3 to 4 has been the step I've always been pretty fuzzy on (though I'm guessing you know this pretty well 😉). How does one turn a Conda package tarball into something that pyodide can use? Where do these get hosted? As for Isuru's point about adding a platform in Conda, I think this was the idea behind the issue (conda/conda#7619), but I may be missing things here. One other thing worth discussing is that LLVM's ability to build WASM binaries has grown over time. Does it make sense to start using that, or is Emscripten still needed for some things? The one thing that I recall used to be a stumbling block was a |
Thanks for the feedback @isuruf @jakirkham ! Great to know that cross compilation works in conda-forge.
As far as I understand those The question for me is more whether conda artifacts are really the best format for distributing such files (which were not optimized for this use case). Maybe there are other approaches in the JS/WASM ecosystem that would be better. I haven't studied the question in detail so far. At present, we put the artifacts in S3 under versioned paths and proxy them through the jsDelivr CDN, which allows for a lot of flexibility.
Absolutely, it would be good to revisit whether the situation has evolved much since the discussion in conda/conda#7619
The link you provide is for WASI or outside of the browser, as far as I understand though? Within the browser, we still need an in-memory filesystem (currently provided by emscripten) as CPython wouldn't be very useful without it. For instance Rust can be easily compiled to WASM (without emscripten), but then you cannot do filesystem I/O at least as indicated in the reference materials. Maybe there are indeed lighter projects providing the necessary abstractions, it could be worth checking. |
I have just updated binaryen (conda-forge/binaryen-feedstock#38) and added an Emscripten recipe to conda-forge (conda-forge/staged-recipes#13178). With those two things we could be ready to explore these ideas. I'd be happy to integrate cross-compilation to javascript "natively" into boa so that it becomes very straight-forward ... although I am not sure how simple that will be. I think conda is more than the package format, and that should not be the thing holding us up -- we can easily create a new, more web-optimized package format (different or no compression etc.) if we have to. Regarding libc, I was under the impression that emscripten did some magic to get me a libc :) in general, emscripten seemed to work quite smoothly. Personally, I'd be quite interested in getting the packages in boa-forge to compile to see if we can bootstrap a wasm-micromamba. |
If you don't use emsdk you might need to package it manually which doesn't sound that simple (but they did it in Homebrew) |
It's not very difficult to package emscripten - I have a working recipe that I have been using to build wasm conda packages. So it all sounds quite doable (including running a micromamba in the browser). For that we'd need to rewrite mamba a bit though, to use the emscripten Fetch and FileSystem APIs instead of curl. We could give it a shot though. @rth you might know better how to do this! We could also make sure to only support one of the two (three, actually) compression algorithms that conda is using currently (e.g. allow only I can upload the recipes I have soon. Would be cool to have a collaboration for this! |
Sorry for slow response @wolfv
That's great!
Indeed. In pyodide, we just mount the filesystem and then it can be interacted with directly from Python. However for interacting with remote URLs one indeed needs to rewrite everything with Web APIs (e.g.
Likely https://github.com/emscripten-core/emscripten/blob/1216d230eac6a335f1397f4ab1d2bf297113633b/src/settings.js#L152 needs to be increased with an env variable.
Yes, that would be great. For pyodide that would mean incrementally getting closer to the conda-forge approach and possibly starting to use some of the tooling. One thing where I would be interested in your feedback: assuming Python packages are built with emscripten on conda-forge, would the current way of detecting it at build time in setup.py be appropriate (currently via the |
Are you interested in helping me to bootstrap some "emscripten" enabled recipes? I can create a repo, and we could run an Azure pipeline to build a couple of wasm-compiled conda packages. |
@wolfv yes, we could try to experiment there. |
Ok, I'll have to get emscripten on conda-forge first, then I'll set it up. |
So overall emscripten-forge went down this road, while we are more focused on producing wheels with PyPA / cibuildwheel tooling. But we can certainly open more specific issues about ways to share some of the tools or approaches. |
Here are the current docs. Should they mention MambaLite – which installs packages from emscripten-forge instead of conda-forge, which doesn't host WASM packages – as a third-party tool?
"Mamba meets JupyterLite" (2022-07)
From https://github.com/emscripten-forge/empack :
@DerThorsten |
picomamba /mamba-lite is here: |
|
From https://twitter.com/simonw/status/1559969074607599617 w/ @simonw re: WASM package security controls and the just in a browser tab software supply chain :
From https://simonwillison.net/2022/Aug/17/datasette-lite-plugins/ :
FWIU packages are persisted w/ SQLite in WASM, per-request?
"Pypi.org is running a survey on the state of Python packaging" (2022)
|
Happy to add emscripten-forge under related projects (please open a PR), however I don't think this belongs, at least for now, in the main section on how to install packages (which is already rather confusing). They are alternative distributions. For instance, you won't see the pip documentation saying "well, you can also install this with conda", or conda-forge advertising homebrew in its official documentation. Though in any case, it would be good to first figure out binary compatibility between Pyodide and emscripten-forge packages. Also, I'm very happy to chat with @DerThorsten and see points on which we could work together. For instance, we are currently unvendoring micropip from the monorepo so it's easier to reuse if necessary (#3093). My point in closing this issue was that we can open more specific discussion points, but we can also continue the discussion here if you prefer. @westurner thank you for your comments with this information, but I'm not entirely sure what you are proposing :) |
agreed, that would be a great first step!
Would be happy to work together more! @westurner I have no clue what you are proposing |
This issue having been closed, it seemed out of the way from actual progress that it might be holding up. There are many systems for package metadata and signed cryptographic manifests (some with per-file hash checksums). Where we have functional overlap and duplication of effort, there is potential for security vulnerability. How should pyodide's build change to better integrate with conda-forge (and emscripten-forge)? Hopefully the aforementioned tools copy the manifest signatures over when re-packing and re-hosting. When pyodide / micropip was written:
There are many opportunities to drop the ball in build systems and application dependency composition; DevSecOps for software supply chain security. How can {pyodide, micropip, mambalite,} require package signatures (to a standard better than the tools they build atop) in order to prevent (widescale) exploitation of browsers with WASM and no quotas and someday, local File System Access? |
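To make the "per-file hash checksums" point above concrete, here is a minimal sketch of checking a downloaded artifact against a digest pinned in a manifest. It covers integrity only, not signatures, and it is not how pyodide, micropip, or MambaLite work today: the manifest layout (file name mapped to a SHA-256 hex digest) and the names `sha256_hex` and `verify_artifact` are assumptions made for illustration.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of ``data``."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(name: str, data: bytes, manifest: dict[str, str]) -> bool:
    """Check downloaded bytes against the digest pinned for ``name``.

    ``manifest`` maps file names to expected SHA-256 hex digests.
    This layout is hypothetical and used only for illustration.
    """
    expected = manifest.get(name)
    if expected is None:
        # Files missing from the manifest are rejected, not silently trusted.
        return False
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_hex(data), expected)


# Example data: a fake payload and a manifest that pins its digest.
payload = b"example wheel bytes"
manifest = {"example-1.0-py3-none-any.whl": sha256_hex(payload)}
```

A check like this only helps if the manifest itself is obtained over a trusted channel, which is where signing would come in.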
Yes, these are good questions. Would you mind though opening a separate issue about end-to-end package signing, as it's a very specific technical point that it would be better to discuss separately from this very general issue? In our case, as compared to pip or conda, the attack surface is very much reduced due to the browser sandbox. But yes, the security story can certainly be improved and there is still the usage in Node which doesn't provide a sandbox.
Short answer is we don't plan to integrate with conda forge anymore. emscripten-forge is working on that. For Pyodide we are going for better integration with PyPI, PyPA tooling, cibuildwheel etc. |
Are there projections as to what the load impact on PyPI and its CDN
expenses will be from WASM apps pulling dependencies on every run?
https://pypi.org/sponsors/
- "[Discussions on Python.org] [Packaging]
Draft PEP: PyPI cost solutions: CI, mirrors, containers, and caching to
scale"
https://discuss.python.org/t/draft-pep-pypi-cost-solutions-ci-mirrors-containers-and-caching-to-scale/3681
https://groups.google.com/g/pypa-dev/c/Pdnoi8UeFZ8
- Pip downloads wheels for every CI build and deployment
- Pip does not download wheels for every process invocation
- Micropip downloads wheels for every page load process invocation from PyPI
- MambaLite downloads empkg rebuilds of conda packages unless there's a
noarch conda package, but their HTTP headers aren't set up to CDN for
"download every package on every process invocation / page load"
|
(I must have confused
"building pyodide" /
"building pyodide packages like conda-forge / emscripten-forge"
with "just have piplite solve and install the dependency graph from PyPI for every page load")
|
@westurner I opened #3127 for CDN resource usage discussion. |
Just as an FYI, I opened conda-forge/staged-recipes#20961 adding micropip as a conda-forge noarch package, which should (I hope? idk :D) allow emscriptenforge and micropip to be used together like how conda-forge and pip can be used together. |
awesome @yuvipanda – that's great news. It didn't even come to my mind to just add it to conda-forge :) |
The idea of relying on conda-forge for building Python packages to WebAssembly has been mentioned for a while now (#38 (comment), conda/conda#7619, regro/cf-scripts#1052 (comment)), and in this issue I wanted to start a discussion about the current situation and the existing challenges in moving in that direction from the perspective of pyodide, as I understand it (please correct me if needed).
First the main motivation is that the present way of building all the packages in one repo is not sustainable with the increase of the number of packages and the associated increase in CI time. To resolve this we would need significant development resources, which we don't have. Even if we did, it would amount to doing many things (including a community) that already exist and work great at conda-forge, which wouldn't make sense.
Now as to challenges (it's a long post),
1. Updating emscripten
With a single repo it's relatively fast to rebuild all packages with a different version of the emsdk toolchain (emscripten, binaryen, ...) or different options. We currently still have a couple of patches applied to emscripten, and ideally we also need to update emscripten frequently to benefit from improvements and fixes (we are currently 1.5 years behind the latest release, unfortunately). In conda-forge, rebuilding all the packages with a new compiler would take longer (though this got better recently, regro/cf-scripts#1052 (comment)). Also, the use case where a) we update the emscripten version, b) some package fails to build, and c) we have to go back and change some global emscripten settings would be really impractical, I think.
This would hopefully become less of an issue with time as emscripten becomes more and more stable, but it's still an issue now (see e.g. #480 (comment))
2. Build approach
The cross-compilation of scientific Python packages (based on distutils) is difficult (scipy/scipy#8571, numpy/numpy#17620) as far as I understand, even on Linux between different architectures.
I'm not sure if this was the reason, but pyodide doesn't do cross-compilation in the classical sense. Instead it compiles the package with the host compilers, stores a log of all executed compilation commands, and re-runs those commands with the emscripten compiler.
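As a rough illustration of the "record, then replay" idea described above, the sketch below rewrites a log of host compiler invocations so that they target the emscripten drivers (`emcc`, `em++`). It is not pyodide's actual implementation: the mapping table and the function names `retarget` and `replay` are assumptions, real builds also need flag filtering, and the code only returns the rewritten commands instead of running them.

```python
# Hypothetical mapping from host compiler drivers to emscripten drivers.
HOST_TO_EMSCRIPTEN = {
    "gcc": "emcc",
    "cc": "emcc",
    "g++": "em++",
    "c++": "em++",
}


def retarget(argv: list[str]) -> list[str]:
    """Return ``argv`` with the compiler driver swapped for its emscripten one.

    Commands whose tool is not in the mapping are returned unchanged.
    """
    if not argv:
        return []
    # Compare on the executable name only, so /usr/bin/gcc matches "gcc".
    tool = argv[0].rsplit("/", 1)[-1]
    replacement = HOST_TO_EMSCRIPTEN.get(tool)
    if replacement is None:
        return list(argv)
    return [replacement, *argv[1:]]


def replay(log: list[list[str]]) -> list[list[str]]:
    """Rewrite every recorded command; running them is left to the caller."""
    return [retarget(command) for command in log]


# Example log, as it might be captured during a host build.
recorded = [
    ["/usr/bin/gcc", "-O2", "-c", "module.c", "-o", "module.o"],
    ["g++", "-c", "helper.cpp", "-o", "helper.o"],
    ["ar", "rcs", "libmodule.a", "module.o"],
]
rewritten = replay(recorded)
```

The appeal of this design is that the package's own build script runs unmodified against a compiler it already understands; the cost is that anything the build probes on the host (headers, type sizes) may not match the WebAssembly target.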
3. Shared package specifications
Package specifications were chosen to be as close as possible to the meta.yaml in conda, and hopefully soon the package index will also use the same format (#791).
4. Artifacts format
Currently each package consists of 2 separate files (.data, .js), which we distribute via jsDelivr. Those would probably not fit as conda artifacts, which would mean that we likely need to handle some of this in any case.
5. Dependency resolution in the browser
There are two use cases for pyodide,
Either way we also need to install pure Python wheels (from PyPI or another custom location), so we still have this duality between conda/pyodide packages and Python wheels. This means we have to maintain a minimalistic pip (micropip) in pyodide.
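The split between pure Python wheels and compiled packages mentioned above can be read off the wheel file name, since the wheel naming convention (PEP 427) ends with the Python tag, ABI tag, and platform tag. The sketch below is a simplified illustration, not micropip's actual logic; the function names `wheel_tags` and `is_pure_python_wheel` are made up for this example.

```python
def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Split a wheel file name into (python_tag, abi_tag, platform_tag).

    Wheel names follow
    ``{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl``.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_pure_python_wheel(filename: str) -> bool:
    """A wheel is treated as pure Python when its tags end in ``none-any``."""
    _, abi_tag, platform_tag = wheel_tags(filename)
    return abi_tag == "none" and platform_tag == "any"
```

For example, `is_pure_python_wheel("requests-2.28.1-py3-none-any.whl")` returns `True`, while a wheel tagged `cp310-cp310-manylinux_2_17_x86_64` returns `False` and would need a WebAssembly build instead.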
I haven't closely followed WebAssembly-related developments at conda-forge, so maybe I am missing something.
cc @wolfv @jakirkham