New mode - in-memory install #89

Open
jaraco opened this issue Dec 6, 2023 · 5 comments

@jaraco
Owner

jaraco commented Dec 6, 2023

Imagine: instead of assembling expanded installs of packages into some folder on a file system, pip-run could resolve dependencies to wheels and import modules from those wheels directly, as zip files held in memory, similar to how a web browser can load JavaScript dependencies from a URL. This ambitious approach would require developing custom loaders/finders/resource providers and facilities for handling non-pure wheels.

@jaraco
Owner Author

jaraco commented Dec 6, 2023

I'm unable to assign this issue, but @bswck has volunteered to work on it.

@bswck
Contributor

bswck commented Dec 6, 2023

@bswck
Contributor

bswck commented Dec 31, 2023

I've created a separate repository for the project to share low-level interfaces for downloading, installing and running modules all in memory: mempip. State: WIP.

The reason is the sheer amount of logic and tests the task requires. What primarily led me to this decision is the single-responsibility principle and the common practice of decoupling.

When a mempip MVP is ready, its incorporation into pip-run can be discussed: mempip could be added as an extra dependency, pip-run[mem], and exposed as an optional feature. This could help stabilize the feature and eventually lead to incorporating in-memory installations as a pip-run-only feature, as was initially intended.

I am open to advice and recommendations.

@bswck
Contributor

bswck commented Dec 31, 2023

Here's a quick write-up of my initial ideas.

Things might change as I explore the possibilities. Take this as a 0-prefixed release of my personal stream of consciousness.

Brainstorm

The New Definition of "Installation"

In mempip, installation would mean creating a ready-to-use data structure in memory (a high-level ZIP file wrapper) from a wheel, one that allows importing a new package with all its declared dependencies guaranteed to exist at runtime.

This is similar to how a normal pip installation most often results in a ready-to-use, importable package, typically located in site-packages with its dependencies spread across the neighboring subfolders.
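To make the "ZIP file wrapper in memory" idea concrete, here is a minimal sketch using only the standard library. The class name `InMemoryInstallation`, the helper `build_demo_wheel`, and the `demo_pkg` contents are hypothetical illustrations, not part of mempip's actual design.

```python
import io
import zipfile


def build_demo_wheel() -> bytes:
    """Build a tiny wheel-like ZIP archive entirely in memory (made-up content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", "ANSWER = 42\n")
        archive.writestr(
            "demo_pkg-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: demo-pkg\nVersion: 1.0\n",
        )
    return buffer.getvalue()


class InMemoryInstallation:
    """Hypothetical wrapper: the 'installed' form of one wheel, kept in RAM."""

    def __init__(self, wheel_bytes: bytes) -> None:
        # The archive is backed by a BytesIO object, so nothing is written to disk.
        self._archive = zipfile.ZipFile(io.BytesIO(wheel_bytes))

    def files(self) -> list[str]:
        return sorted(self._archive.namelist())

    def read_text(self, path: str) -> str:
        return self._archive.read(path).decode("utf-8")


installation = InMemoryInstallation(build_demo_wheel())
print(installation.files())
```

A real implementation would also need dependency bookkeeping and an import hook on top of such a wrapper; this only shows that a wheel can be inspected without ever being unpacked.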

A Generic Roadmap

Some things a minimal mempip MUST handle:

  • Downloading packages from PyPI, VCS project URLs, local project directories and remote source archives.
  • Identifying package requirements and resolving dependencies.
    For instance, installing jaraco.functools==4.0.0 in a new environment should result in storing two wheels in memory, jaraco.functools-4.0.0-py3-none-any.whl and more_itertools-10.1.0-py3-none-any.whl (as of 31.12.2023).
  • Reusing wheels from the pip cache, building wheels from source packages.
  • Flawless installation of pure wheels.
  • Import machinery implementation for loading & executing the in-memory modules.
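The last item, import machinery for in-memory modules, could look roughly like the sketch below. It is an assumption about one possible shape, not mempip's implementation: `InMemoryWheelFinder` and the `mempip_demo_pkg` package are hypothetical names, and only pure-Python source files are handled (no bytecode caching, resources, or namespace packages).

```python
import importlib.abc
import importlib.util
import io
import sys
import zipfile


class InMemoryWheelFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader serving modules from a ZIP held in memory."""

    def __init__(self, wheel_bytes: bytes) -> None:
        self._archive = zipfile.ZipFile(io.BytesIO(wheel_bytes))
        self._names = set(self._archive.namelist())

    def _locate(self, fullname):
        # Map "pkg.mod" to "pkg/mod/__init__.py" (package) or "pkg/mod.py" (module).
        base = fullname.replace(".", "/")
        if base + "/__init__.py" in self._names:
            return base + "/__init__.py", True
        if base + ".py" in self._names:
            return base + ".py", False
        return None

    def find_spec(self, fullname, path=None, target=None):
        located = self._locate(fullname)
        if located is None:
            return None  # let other finders try
        _, is_package = located
        return importlib.util.spec_from_loader(fullname, self, is_package=is_package)

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        path, _ = self._locate(module.__name__)
        source = self._archive.read(path).decode("utf-8")
        exec(compile(source, f"<memory>/{path}", "exec"), module.__dict__)


def build_demo_wheel() -> bytes:
    """Create a made-up two-file package as ZIP bytes, without touching disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mempip_demo_pkg/__init__.py", "from .helpers import double\n")
        archive.writestr("mempip_demo_pkg/helpers.py", "def double(x):\n    return 2 * x\n")
    return buffer.getvalue()


sys.meta_path.append(InMemoryWheelFinder(build_demo_wheel()))

import mempip_demo_pkg  # resolved by the finder above, not by the filesystem

print(mempip_demo_pkg.double(21))
```

Appending to `sys.meta_path` (rather than inserting at the front) lets regular filesystem imports win when a name exists in both places; whether that precedence is desirable for mempip is an open design question.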

mempip SHOULD also handle:

  • Non-pure wheels that require running custom code for installation. What if they strongly depend on moving files around?
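As a first step toward handling non-pure wheels, they have to be told apart from pure ones. A minimal sketch, assuming the `Root-Is-Purelib` field of the `.dist-info/WHEEL` metadata file is a good enough first-pass signal; the function names `is_pure_wheel` and `make_wheel` are hypothetical.

```python
import io
import zipfile
from email.parser import Parser


def is_pure_wheel(wheel_bytes: bytes) -> bool:
    """Report whether the wheel declares itself pure via Root-Is-Purelib."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        wheel_files = [
            name for name in archive.namelist() if name.endswith(".dist-info/WHEEL")
        ]
        if len(wheel_files) != 1:
            raise ValueError("expected exactly one .dist-info/WHEEL file")
        # The WHEEL file uses an email-header-like "Key: value" format.
        metadata = Parser().parsestr(archive.read(wheel_files[0]).decode("utf-8"))
    return metadata.get("Root-Is-Purelib", "").strip().lower() == "true"


def make_wheel(pure: bool) -> bytes:
    """Build a made-up wheel in memory whose WHEEL file declares the given purity."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "demo-1.0.dist-info/WHEEL",
            "Wheel-Version: 1.0\n"
            "Generator: demo\n"
            f"Root-Is-Purelib: {'true' if pure else 'false'}\n",
        )
    return buffer.getvalue()


print(is_pure_wheel(make_wheel(True)), is_pure_wheel(make_wheel(False)))
```

This only reads what the wheel declares. Extension modules (`.so`, `.pyd`) are a harder problem, since the standard `zipimport` machinery does not load them from archives, so they would likely need a separate strategy.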

The Can't-Touch-This Policy

mempip should not touch the filesystem. That means pressing Ctrl+C with a running mempip in a shell session will free up all the resources used by mempip immediately and leave no trace (except for the prompt in your shell history file).

pip Compliance

mempip's main goal SHOULD be to offer a known, standard interface with a strong focus on portability across all platforms supported by pip-run. mempip is not, however, meant to replace pip, but rather to broaden its horizons by allowing memory-located installations.

Here's a rundown of the current programmatic interface design, inspired by the current pip CLI commands: https://github.com/bswck/mempip/issues/1.

Reusing pip

Whenever possible, mempip SHOULD reuse the internal pip implementation for resolving dependencies or building wheels.

Distributed Approach

Ideally, mempip SHOULD come with a daemon service (possible names: mempipd, depstore, cachedpip?) that would behave similarly to Redis, allowing in-memory installations to be reused in a distributed manner. This might be useful, for example, for large concurrent test frameworks running in isolated filesystems (like containers) with a lot of dependencies to operate on. The daemon would serve as a store where dependencies can be looked up dynamically by test runners and imported directly from the aforementioned ZIP files in the daemon's memory.

Let's call it the "distributed strategy" of mempip. The implementation MUST obey the ACID rules: broken mempip installations SHOULD NOT pollute the existing public memory state of the daemon, and future importers should wait until their in-memory installation is complete. The daemon could optionally, like Redis, dump its state to a backup file and later be restored to the last saved state, which would be useful for speeding up pipelines by reusing already installed & cached packages.
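The "do not pollute the public state" requirement can be illustrated with a small staging-and-commit sketch. This is an in-process stand-in, not a daemon: `WheelStore` and its methods are hypothetical, the wheel contents are placeholder bytes, and only atomicity and isolation are shown (no durability, and no waiting by importers).

```python
import threading
from contextlib import contextmanager
from typing import Optional


class WheelStore:
    """Hypothetical in-process stand-in for the daemon's public state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._committed: dict[str, bytes] = {}

    @contextmanager
    def transaction(self):
        staged: dict[str, bytes] = {}
        yield staged  # the caller fills the private staging area
        # Reached only if the with-block finished without raising.
        with self._lock:
            self._committed.update(staged)  # publish everything at once

    def get(self, filename: str) -> Optional[bytes]:
        with self._lock:
            return self._committed.get(filename)


store = WheelStore()

# A successful installation publishes both wheels together.
with store.transaction() as staged:
    staged["jaraco.functools-4.0.0-py3-none-any.whl"] = b"placeholder bytes"
    staged["more_itertools-10.1.0-py3-none-any.whl"] = b"placeholder bytes"

# A failed installation leaves the public state untouched.
try:
    with store.transaction() as staged:
        staged["broken-0.0.0-py3-none-any.whl"] = b"partial"
        raise RuntimeError("simulated failed installation")
except RuntimeError:
    pass

print(store.get("broken-0.0.0-py3-none-any.whl"))
```

In a real daemon the same idea would have to hold across processes and network clients, which is a considerably harder problem than this single-lock version suggests.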

Technical Plan

More to come here later...

Downloading Wheels/Source Packages

raise NotImplementedError

Building Wheels from Source Packages

raise NotImplementedError

Installing Wheels into In-Memory ZIP Files

raise NotImplementedError

Distributed Strategy

raise NotImplementedError

Access Control

return NotImplemented

Backups

return NotImplemented

Importing

raise NotImplementedError

Benchmarking

raise NotImplementedError

Testing

raise NotImplementedError

Learning Materials

@bswck
Contributor

bswck commented Feb 14, 2024

As part of my research, I'm exploring PyPA specifications to get a good grasp of what I'm doing. Maybe I will figure out what to do with non-pure Python installations in the process.
