diff --git a/docusaurus-docs/conda-store/explanations/artifacts.md b/docusaurus-docs/conda-store/explanations/artifacts.md
index 6ba8630f5..988655357 100644
--- a/docusaurus-docs/conda-store/explanations/artifacts.md
+++ b/docusaurus-docs/conda-store/explanations/artifacts.md
@@ -4,27 +4,27 @@ description: Understand environment artifacts generated by conda-store
 # Artifacts
-:::warning
-This page is in active development, some content may be inaccurate.
-:::
-
 conda environments can be created in a few different ways.
-conda-store creates "artifacts" (corresponding to different environment creation options) that can be shared with colleagues and can be used to reproduce environments.
-In the conda-store UI, these are available in the "Logs and Artifacts" section at the end of the environment page.
+conda-store creates "artifacts" (corresponding to different environment creation options) for every environment, which can be shared with colleagues and used to reproduce environments.
+In the conda-store UI, these are available in the **"Logs and Artifacts"** section
+at the end of the environment page.
 The following sections describe the various artifacts generated and how to create environments with them.
-### YAML file (pinned)
+Environments in shared namespaces on conda-store can be accessed by everyone with access to that namespace, in which case you may not need to share the artifacts manually.
+Artifacts are used to share your environment with external collaborators who don't have access to conda-store.
-YAML files that follow the conda specification is a common way to create environments.
-conda-store creates a "pinned" YAML, where all the exact versions of requested packages (including `pip` packages) as well as all their dependencies are specified, to ensure new environments created match the original environment as closely as possible.
+:::note
+The libraries (conda, conda-lock, conda-pack, etc.) mentioned in the following sections are separate projects in the conda ecosystem. The environments created using them are not managed by conda-store.
+:::
-A pinned YAML file is generated for each environment ta is built.
-This includes pinning of the `pip`` packages as well.
+## YAML file (pinned)
+
+YAML files that follow the conda specification are a common way to create environments.
+conda-store creates a "pinned" YAML, where all the exact versions of requested packages (including `pip` packages) as well as all their dependencies are specified, to ensure new environments created match the original environment as closely as possible.
 :::info
-In rare cases, the completely pinned packages may not solve because packages are
-routinely marked as broken and removed.
+In rare cases, building environments from "pinned" YAML files may not solve because packages are routinely marked as broken and removed at the repository level.
 **conda-forge** (default channel in conda-store) has a
 [policy that packages are never removed but are marked as
@@ -32,159 +32,130 @@ broken][conda-forge-immutability-policy]. Most other channels do not have such
 a policy.
 :::
-Assuming you have `conda` installed, to create a conda environment (on any machine) using this file:
+Click on the **"Show yml file"** link in the conda-store UI to open the file in a new browser tab. You can copy and paste this file into [conda-store UI's YAML editor][cs-ui-yaml] to create a new environment managed by conda-store in a different namespace.
-1. Click on **"Show yml file"** link in the conda-store UI to open the file in a new browser tab.
-2. Save the file with: Right-click on the page -> Select "Save As" -> Give the file a meaningful name (like `environment.yml`)
-3. Run the following command and use the corresponding filename:
-   ```bash
-   conda env create --file
-   ```
+You can download the file and share it with someone or use it to create an environment on a different machine. Assuming `conda` is installed, run the [CLI commands mentioned in the conda documentation][conda-docs-create-env] with the corresponding filename to create a conda environment (on any machine).
-### Lockfile
+## Lockfile
-A conda lockfile is a representation of only the `conda` dependencies in
+A conda lockfile is a representation of all (`conda` and `pip`) dependencies in
 a given environment.
-conda-store created lockfiles using the [conda-lock][conda-lock-github] project.
+conda-store creates lockfiles using the [conda-lock][conda-lock-github] project.
+
+Click on **"Show lockfile"** to open the lockfile in a new browser tab.
+You can download the file and share it with someone or use it to create an environment on a different machine.
+
+To create an environment at the new location, follow the [commands in the conda-lock documentation][conda-lock-install-env].
+
+## Tarballs or archives
 :::warning
-This file will not reproduce the `pip` dependencies in a given environment.
-It is usually a good practice to not mix pip and conda dependencies.
+Building environments from archives is only supported on Linux machines
+because the tarballs are built on Linux machines.
 :::
-Click the `lockfile` icon to download the
-lockfile. First install `conda-lock` if it is not already installed.
+A tarball or archive is a _packaged_ environment that can be moved, unpacked, and used in a different location or on a different machine.
-```shell
-conda install -c conda-forge lockfile
-```
+conda-store uses [Conda-Pack][conda-pack], a library for
+creating tarballs of conda environments.
-Install the locked environment file from conda-store.
+Click the **"Download archive"** button to download the archive of your conda environment, and share/move it to the desired location.
-```shell
-conda-lock install
-```
+To install the tarball, follow the [instructions for the target machine in the conda-pack documentation][conda-pack-usage].
-### conda-pack archive
+## Docker images
-[Conda-Pack](https://conda.github.io/conda-pack/) is a package for
-creating tarballs of given Conda environments. Creating a Conda archive
-is not as simple as packing and unpacking a given directory. This is
-due to the base path for the environment that may
-change. [Conda-Pack](https://conda.github.io/conda-pack/) handles all
-of these issues. Click the `archive` button and download the given
-environment. The size of the archive will be less than the size seen
-on the environment UI element due to compression.
+:::warning
+Docker image creation is currently only supported on Linux.
-```shell
-conda install -c conda-forge conda-pack
-```
+The docker image generation and registry features are experimental,
+and the following instructions are not thoroughly tested.
+If you face any difficulties, open an issue on the GitHub repository.
+:::
-Install the Conda-Pack tarball. The directions are [slightly
-complex](https://conda.github.io/conda-pack/#commandline-usage). Note
-that `my_env` can be any name in any given prefix.
+conda-store acts as a docker registry.
+It leverages [Conda Docker][conda-docker], which builds docker images without Docker, allowing for advanced caching and reduced image sizes, and removing the need for elevated privileges.
-```shell
-mkdir -p my_env
-tar -xzf .tar.gz -C my_env
+### Authentication
-source my_env/bin/activate
+The `conda-store` docker registry requires authentication.
+You can use **any username** and your **user token as the password**.
-conda-unpack
+```bash
+docker login -u <any-username> -p <your-token>
 ```
-### Docker images
+To get your user token:
-:::note
-Docker image creation is currently only supported on Linux.
-:::
+1. Visit your user page at `/admin/user`
+2. Click on "Create token", which displays your token
+3. Click on "copy" to copy the token to your clipboard
-conda-store acts as a docker registry which allows for interesting
-ways to handle Conda environment. In addition this registry leverages
-[conda-docker](https://github.com/conda-incubator/conda-docker) which
-builds docker images without docker allowing for advanced caching,
-reduced image sizes, and does not require elevated privileges. Click
-on the `docker` link this will copy a url to your clipboard. Note the
-beginning of the url for example `localhost:8080/`. This is required to tell
-docker where the docker registry is located. Otherwise by default it
-will try and user docker hub. Your url will likely be different.
-
-The `conda-store` docker registry requires authentication via any
-username with password set to a token that is generated by visiting
-the user page to generate a token. Alternatively in the
-`conda_store_config.py` you can set
-`c.AuthenticationBackend.predefined_tokens` which have environment
-read permissions on the given docker images needed for pulling.
+Alternatively, you can set `c.AuthenticationBackend.predefined_tokens` in `conda_store_config.py` to tokens that have environment read permissions on the docker images you need to pull.
-```
-docker login -u token -p
-docker pull
-docker run -it python
-```
+### General usage
-#### General usage
+To use a specific environment build, click on the **"Show Docker image"** link to get the URL of the docker image. For example: `localhost:8080/analyst/python-numpy-env:583dd55140491c6b4cfa46e36c203e10280fe7e180190aa28c13f6fc35702f8f-20210825-180211-244815-3-python-numpy-env`.
-```shell
-docker run -it localhost:8080//
-```
+The URL consists of: `<conda-store-domain>/<namespace>/<environment-name>:<build-key>`
+
+* The conda-store domain (for example `localhost:8080/`) at the beginning tells Docker where the docker registry is located. Otherwise, Docker will try to use Docker Hub by default.
+* The `<namespace>/<environment-name>` part refers to the specific conda environment.
+* The "build key" is a combination of `<specification-sha256>-<build-datetime>-<build-id>-<environment-name>`, which points to a specific build of the environment, for example, a past version of the environment.
-If you want to use a specific build (say one that was built in the
-past and is not the current environment) you can visit the specific
-build that you want in the UI and copy its docker registry tag
-name. The tag name is a combination of `---` that we will refer to as build
-key. An example would be
-`localhost:5000/filesystem/python-numpy-env:583dd55140491c6b4cfa46e36c203e10280fe7e180190aa28c13f6fc35702f8f-20210825-180211-244815-3-python-numpy-env`.
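+
+For example, putting the authentication and image URL together, a pull could look like the following sketch. The registry domain shown is the example `localhost:8080` used above, the username is arbitrary, and `<your-token>` is the user token you generated; substitute your own values:
+
+```bash
+# Log in to the conda-store registry (any username works, the password is your user token)
+docker login localhost:8080 -u conda-store-user -p <your-token>
+
+# Pull a specific build of an environment using the full image URL shown in the UI
+docker pull localhost:8080/analyst/python-numpy-env:583dd55140491c6b4cfa46e36c203e10280fe7e180190aa28c13f6fc35702f8f-20210825-180211-244815-3-python-numpy-env
+```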
+To use a conda-store environment docker image:
+
+```bash
+docker run -it <conda-store-domain>/<namespace>/<environment-name>:<build-key>
+```
-#### On Demand Docker Image
+### On-demand (dynamic) docker image
+
+In conda-store, you can also specify the required packages within the docker image name itself, without needing an actual environment to be created in the conda-store UI.
+
+The URL format is: `<conda-store-domain>:<port>/conda-store-dynamic/<package-constraint>/.../<package-constraint>`.
-conda-store has an additional feature which allow for specifying the
-packages within the docker image name itself without requiring an
-actual environment to be created on the conda-store UI side.
+After `conda-store-dynamic`, you can specify packages with constraints separated by
+slashes in the following format:
+* `<=1.10` as `.lt.1.10`
+* `>=1.10` as `.gt.1.10`
-The following convention is used
-`:/conda-store-dynamic/`. After
-`conda-store-dynamic` you specify packages needed separated by
-slashes. Additionally you may specify package constraints
-for example `<=1.10` as `.lt.1.10`.
+For example, if you need Python less than `3.10` and NumPy
+greater than `1.0`, this would be the docker image
+name: `<conda-store-domain>:<port>/conda-store-dynamic/python.lt.3.10/numpy.gt.1.0`.
-As full example support we want python less than `3.8` and NumPy
-greater than `1.0`. This would be the following docker image
-name. `:/conda-store-dynamic/python.lt.3.8/numpy.gt.1.0`. conda-store
-will then create the following environment and the docker image will
-download upon the docker image being built.
+conda-store creates the environment and builds the docker image, which you can then download.
-### Installers
+## Installers
-conda-store uses [constructor] to generate an installer for the current platform
-(where the server is running):
+Installers are another way to share and use a set of (bundled) packages.
+conda-store uses [constructor][constructor-docs] to generate an installer for the current platform (where the server is running):
-- on Linux and macOS, it generates a `.sh` installer
-- on Windows, it generates a `.exe` installer using NSIS.
+- on Linux and macOS, it generates a `.sh` installer
+- on Windows, it generates a `.exe` installer using NSIS
 conda-store automatically adds `conda` and `pip` to the target environment
 because these are required for the installer to work.
-Also note that `constructor` uses a separate dependency solver instead of
+:::note
+`constructor` uses a separate dependency solver instead of
 utilizing the generated lockfile, so the package versions used by the installer
 might be different compared to the environment available in conda-store.
 There are plans to address this issue in the future.
+:::
-#### Existing Deployments
-
-conda-store saves environment settings and doesn't automatically update them on
-startup (see `CondaStore.ensure_settings`). Existing deployments need to
-manually enable installer builds via the admin interface. This can be done by
-going to `/admin/setting///` (or
-clicking on the `Settings` button on the environment page) and adding
-`"CONSTRUCTOR_INSTALLER"` to `build_artifacts`.
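+
+As a rough sketch, a constructor-generated `.sh` installer is typically run non-interactively on Linux or macOS as shown below; the installer filename and target prefix here are hypothetical placeholders:
+
+```bash
+# -b runs the installer in batch (non-interactive) mode, -p sets the installation prefix
+bash my-environment-installer.sh -b -p ~/conda-envs/my-environment
+
+# Activate the newly installed environment
+source ~/conda-envs/my-environment/bin/activate
+```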
+
+[cs-ui-yaml]: ../../conda-store-ui/tutorials/create-envs#yaml-editor
[conda-docs]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html
[conda-forge-immutability-policy]: https://conda-forge.org/docs/maintainer/updating_pkgs.html#packages-on-conda-forge-are-immutable
+[conda-docs-create-env]: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file
[conda-lock-github]: https://github.com/conda-incubator/conda-lock
+[conda-lock-install-env]: https://conda.github.io/conda-lock/output/#environment-lockfile
[constructor]: https://github.com/conda/constructor
+[conda-pack]: https://conda.github.io/conda-pack/
+[conda-pack-usage]: https://conda.github.io/conda-pack/index.html#commandline-usage
+[conda-docker]: https://github.com/conda-incubator/conda-docker
+[constructor-docs]: https://conda.github.io/constructor/
diff --git a/docusaurus-docs/conda-store/explanations/conda-concepts.md b/docusaurus-docs/conda-store/explanations/conda-concepts.md
index 734452a7c..3117caa6e 100644
--- a/docusaurus-docs/conda-store/explanations/conda-concepts.md
+++ b/docusaurus-docs/conda-store/explanations/conda-concepts.md
@@ -1,94 +1,148 @@
---
-description: Understand conda basics
+sidebar_position: 1
+description: Understand basics of package management with conda
---
 # Conda concepts
-:::note
-This page is in active development.
-:::
+conda is a package and environment manager, used widely in the Python data science ecosystem.
+conda-store builds on conda and other supporting libraries in the conda community.
+This page briefly covers some key conda concepts necessary to use conda-store.
+For detailed explanations, check out the [conda documentation][conda-docs].
+
+## Python package
+
+Open source software projects (sometimes called libraries) are shared with users as *packages*. You need to "install" the package in your local workspace to use it.
-## Packages/libraries
+[pip] and [conda][conda-docs] are popular package management tools in the Python ecosystem.
-
+pip ships with the Python programming language, and can install packages from PyPI (the Python Package Index), a community-managed collection of packages, as well as from public/private PyPI mirrors, GitHub sources, and local directories.
+
+conda needs to be downloaded separately (through a distribution like Anaconda or Miniconda), and can install packages from conda [*channels*](#channels) and local builds.
+
+Some Python packages depend on non-Python code (for example, NumPy includes some C libraries). Installing such packages from PyPI using pip can be unreliable, and sometimes it is your responsibility to separately install the non-Python libraries.
+However, conda provides a package management solution that includes both Python and other underlying non-Python code.
 ## Dependencies
-
+Modern open source software (and software in general) is created using or builds on other libraries, which are called the *dependencies* of the project.
+For example, pandas uses NumPy's `ndarray`s and is written partially in Python; hence, NumPy and Python are dependencies of pandas.
+Specifically, they are the direct dependencies.
+The dependencies of NumPy and pandas, and the dependencies of those dependencies, and so on, together form the complete dependency graph for pandas.
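+
+If you want to see direct dependencies for yourself, the sketch below shows two common ways to inspect them, using pandas as an example (the first command assumes pandas is installed in the active environment, the second assumes the channel is reachable):
+
+```bash
+# pip prints the direct Python-level requirements of an installed package (see the "Requires:" line)
+pip show pandas
+
+# conda prints the dependency metadata recorded for the package on its channel
+conda search pandas --info
+```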
+
+Since conda-store focuses on [environments](#environments), the term *dependencies* usually refers to the full set of compatible dependencies for all the packages specified in an environment.
+
+## Channels (conda)
+
+The [conda documentation][conda-docs-channels] defines:
+
+> Conda channels are the locations where packages are stored. They serve as the base for hosting and managing packages. Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages.
+
+Similar to PyPI, conda channels are URLs of remote servers that manage packages.
+
+In conda-store, packages are installed from the [conda-forge][conda-forge] channel by default.
+conda-forge is a community-maintained channel for hosting open source libraries.
+
+:::note
+This behavior is different from conda downloaded through the Anaconda/Miniconda distributions, which gets packages from the "defaults" channel by default.
+
+Other distributions like Miniforge also use conda-forge as the default channel.
+:::
 ## Environments
-conda-store helps you create and manage "conda environments", also referred to as "data science environments" because `conda` is the leading package and environment management library for the Python data science.
+conda-store helps create and manage "conda environments", sometimes also referred to as "data science environments" or simply "environments" in conda-store spaces.
+An environment is an isolated set of installed packages.
 The [official conda documentation][conda-docs-environments] states:
 > A conda environment is a directory that contains a specific collection of conda packages that you have installed.
 >
 > If you change one environment, your other environments are not affected. You can easily activate or deactivate environments, which is how you switch between them.
-conda-store is a higher-level toolkit that enforces some conda best practices behind-the-scenes to enable reliable and reproducible environment sharing for collaborative settings.
-
-One of the ways conda-store ensures reproducibility is by auto-generating certain artifacts.
+In data science and development workflows, you often use different environments for different projects and sub-projects. This gives you a clean space for development with only the packages and versions that you need for the specific project. You can also use different versions of the same package in different environments depending on your project needs.
-## Channels
+Using isolated environments is a good practice to follow. The alternative, where requirements for all projects are added to a single "base" environment, can not only give you unreliable results but also be very tedious to manage across projects.
-
+## Environment specification (spec)
-## Reproducibility of conda
+conda environments are specified through a YAML file, which is called the *environment specification* and has the following major components:
 ```yaml
-name: example
-channels:
-  - defaults
-  - conda-forge
-dependencies:
-  - python >=3.7
+name: my-cool-env # name of your environment
+channels: # conda channels to get packages from, in order of priority
+  - conda-forge
+  - defaults
+dependencies: # list of packages required for your work
+  - python >=3.10
+  - numpy
+  - pandas
+  - matplotlib
+  - scikit-learn
+  - nodejs # conda can install non-Python packages as well, as long as they are available on a channel
+  - pip
+  - pip: # Optionally, conda can also install packages using pip if needed
+    - pytest
 ```
-Suppose we have the given `environment.yaml` file. How does conda
-perform a build?
+conda uses this file to create a conda *environment*.
+
+:::tip
+In some cases, installing packages using pip through conda can cause issues like dependency conflicts. We suggest you use the `pip:` section only if the package you need is not available on conda-forge.
+:::
+
+Learn more in the [conda documentation about creating an environment file manually][conda-docs-env-file].
+
+## Environment creation
-1. Conda downloads `channeldata.json` from each of the channels which
+Given an `environment.yaml` file, this is how conda performs a build (in brief):
+
+1. Conda downloads `channeldata.json`, a metadata file from each of the channels which
   list the available architectures.
2. Conda then downloads `repodata.json` for each of the architectures
-   it is interested in (specifically your compute architecture along
-   with noarch). The `repodata.json` has fields like package name,
+   it is interested in (specifically your particular compute architecture along
+   with noarch[^1]). The `repodata.json` has fields like package name,
   version, and dependencies.
-You may notice that the channels listed above do not have a url. This
-is because in general you can add
-`https://conda.anaconda.org/` to a non-url channel.
+[^1]: noarch is a cross-platform architecture which has no OS-specific files. Read [noarch packages in the conda documentation][conda-docs-noarch] for more information.
-3. Conda then performs a solve to determine the exact version and
-   sha256 of each package that it will download
+:::tip
+You may notice that the channels listed in the YAML do not have a URL. This
+is because, in general, non-URL channels are expected to be present at `https://conda.anaconda.org/<channel-name>`.
+:::
-4. The specific packages are downloaded
+3. Conda then performs a *solve* to determine the exact version and
+   sha256 of each package to download.
-5. Conda does magic to fix the path prefixes of the install
+4. The specific packages are downloaded.
-There are two spots that introduce issues to reproducibility. The
-first issue is tracking when an `environment.yaml` file has
-changes. This can be easily tracked by taking a sha256 of the file
-. This is what conda-store does but sorts the dependencies to make
-sure it has a way of not triggering a rebuild if the order of two
-packages changes in the dependencies list. In step (2) `repodata.json`
-is updated regularly. When Conda solves for a user's environment it
-tries to use the latest version of each package. Since `repodata.json`
-could be updated the next minute the same solve for the same
-`environment.yaml` file can result in different solves.
+For a detailed walkthrough, check out the [conda install deep dive in the conda documentation][conda-docs-install].
+
+Understand how conda-store builds on conda for improved reproducibility in the [conda-store concepts page][conda-store-concepts].
+
+## Conda configuration (`conda config`)
+
+You can configure various behaviors in conda through the [`.condarc` configuration file][conda-docs-config].
+
+conda-store needs to configure some parts of conda without modifying your conda configuration file. To do this, conda-store internally sets some conda
+configuration variables using environment variables.
+
+The impact of this is that if you try to print your conda configuration with the [`conda config --show` CLI command][conda-docs-config-cli], some configuration settings displayed by that command will not reflect the values that are actually used by conda-store.
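+
+As a small illustration of this mechanism, you can run the following in your own shell. This is only a sketch of how a `CONDA_*` environment variable overrides the configuration file; it is not something conda-store asks you to do:
+
+```bash
+# Value resolved from your .condarc (and any CONDA_* variables already set in your shell)
+conda config --show channel_priority
+
+# The same query with an environment-variable override, similar in spirit to what conda-store does internally
+CONDA_CHANNEL_PRIORITY=strict conda config --show channel_priority
+```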
+
+In particular, `conda-store` internally sets `CONDA_FLAGS=--strict-channel-priority`, overriding the channel priority in the conda configuration file. Keep this in mind when using `conda config` to inspect your conda configuration and when viewing the build logs.
+
+[conda-docs]: https://docs.conda.io/
+[pip]: https://pip.pypa.io/en/stable/index.html
[conda-docs-environments]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html
-
-## Understanding `conda config` and how it relates to conda-store
-
-Because conda-store needs to configure some parts of conda without modifying
-the user's conda configuration file, internally conda-store sets some conda
-configuration variables using environment variables. The impact of this is that
-if a user tries to print their conda configuration with `conda config`, some of
-the configuration settings displayed by that command will not reflect the values
-that are actually used by conda-store. In particular, `conda-store` internally
-sets `CONDA_FLAGS=--strict-channel-priority`, overriding the channel priority in
-the conda configuration file. Please keep this in mind when using `conda config`
-to inspect your conda configuration and when viewing the build logs.
+[conda-docs-channels]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html#what-is-a-conda-channel
+[conda-forge]: https://conda-forge.org/
+[conda-docs-env-file]: https://docs.conda.io/projects/conda/en/stable/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually
+[conda-docs-noarch]: https://docs.conda.io/projects/conda/en/stable/user-guide/concepts/packages.html#noarch-packages
+[conda-docs-install]: https://docs.conda.io/projects/conda/en/stable/dev-guide/deep-dives/install.html#fetching-the-index
+[conda-docs-config]: https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html
+[conda-docs-config-cli]: https://conda.io/projects/conda/en/latest/commands/config.html
+
+
+[conda-store-concepts]: conda-store-concepts
diff --git a/docusaurus-docs/conda-store/explanations/conda-store-concepts.md b/docusaurus-docs/conda-store/explanations/conda-store-concepts.md
index a6f878f7d..6acea6f7e 100644
--- a/docusaurus-docs/conda-store/explanations/conda-store-concepts.md
+++ b/docusaurus-docs/conda-store/explanations/conda-store-concepts.md
@@ -1,13 +1,94 @@
---
+sidebar_position: 2
 description: Overview of some conda-store concepts
---
 # conda-store concepts
-:::note
-This page is in active development.
-:::
+conda-store was developed with two key goals in mind: reliable reproducibility of environments, and features for using environments collaboratively.
+This page describes how conda-store achieves these goals.
+
+## Reproducibility
+
+In the [conda-based environment creation process][conda-concepts-env-creation], there are two areas where runtime reproducibility is improved through conda-store:
+
+* Auto-tracking changes to the `environment.yaml` file (which is created and updated manually). conda-store does this by taking a sha256 of the file after sorting the dependencies, so that a rebuild is not triggered when only the order of packages in the dependencies list changes.
+* When a user creates an environment, conda tries to use the latest version of each package requested in the environment specification.
+  Conda channels are constantly being updated with new package versions, so the same solve for the same `environment.yaml` file can result in different dependencies being downloaded. To enable reproducibility, conda-store auto-generates certain artifacts like lockfiles and tarballs that capture the actual versions of packages and can be used to reliably re-create the same environment. Learn more about them in the [artifacts documentation][artifacts].
 ## Namespaces
+Namespaces are how conda-store manages environment access for groups of users.
+
+Every environment in conda-store is a part of a "namespace", and is displayed in the format: `<namespace>/<environment-name>`.
+
+Users can have access to view/edit/manage certain "namespaces", which means they have that level of permission for all the environments in that namespace.
+This allows a large team or organization to have isolated spaces for environment sharing between smaller groups.
+
+Each individual user has a separate namespace, which has the same name as their username (used while logging in). All environments in this namespace are private to the individual.
+
+A user can be a part of several other "shared" namespaces, and based on the level of access given to them, they can view and use the environment, edit the environment, or delete it altogether. The permissions are dictated by "role mappings".
+
 ## Role mappings
+
+By default, the following roles are available in conda-store. All users are in one of these groups and have corresponding permissions.
+
+- **Viewer:** Read-only permissions for environments in selected namespaces
+- **Editor (previously called Developer):** Permission to read, create, and update environments in specific namespaces
+- **Admin:** Permission to read, create, update, and delete environments in all existing namespaces
+
+  Specific role-mappings:
+
+```python
+    _viewer_permissions = {
+        schema.Permissions.ENVIRONMENT_READ,
+        schema.Permissions.NAMESPACE_READ,
+        schema.Permissions.NAMESPACE_ROLE_MAPPING_READ,
+    }
+    _editor_permissions = {
+        schema.Permissions.BUILD_CANCEL,
+        schema.Permissions.ENVIRONMENT_CREATE,
+        schema.Permissions.ENVIRONMENT_READ,
+        schema.Permissions.ENVIRONMENT_UPDATE,
+        schema.Permissions.ENVIRONMENT_SOLVE,
+        schema.Permissions.NAMESPACE_READ,
+        schema.Permissions.NAMESPACE_ROLE_MAPPING_READ,
+        schema.Permissions.SETTING_READ,
+    }
+    _admin_permissions = {
+        schema.Permissions.BUILD_DELETE,
+        schema.Permissions.BUILD_CANCEL,
+        schema.Permissions.ENVIRONMENT_CREATE,
+        schema.Permissions.ENVIRONMENT_DELETE,
+        schema.Permissions.ENVIRONMENT_READ,
+        schema.Permissions.ENVIRONMENT_UPDATE,
+        schema.Permissions.ENVIRONMENT_SOLVE,
+        schema.Permissions.NAMESPACE_CREATE,
+        schema.Permissions.NAMESPACE_DELETE,
+        schema.Permissions.NAMESPACE_READ,
+        schema.Permissions.NAMESPACE_UPDATE,
+        schema.Permissions.NAMESPACE_ROLE_MAPPING_CREATE,
+        schema.Permissions.NAMESPACE_ROLE_MAPPING_READ,
+        schema.Permissions.NAMESPACE_ROLE_MAPPING_UPDATE,
+        schema.Permissions.NAMESPACE_ROLE_MAPPING_DELETE,
+        schema.Permissions.SETTING_READ,
+        schema.Permissions.SETTING_UPDATE,
+    }
+```
+
+
+## Environment versions/builds
+
+conda-store always re-builds an environment from scratch when edits are detected, which is required for ensuring truly reproducible environments.
+Version control is very useful in any collaborative setting, and environments are no exception.
+Hence, conda-store keeps older versions (also called "builds") of the environment for reference, and allows you to select and use different (previous or newer) versions when needed. conda-store-ui also provides a graphical way to [switch between versions][conda-store-ui-version-control].
+
+:::tip
+Internally, conda-store handles versions with ✨ symlinking magic ✨, where the environment name points to different environments corresponding to versions.
+:::
+
+
+[conda-concepts-env-creation]: conda-concepts#environment-creation
+[artifacts]: artifacts
+[conda-store-ui-version-control]: ../../conda-store-ui/tutorials/version-control
diff --git a/docusaurus-docs/conda-store/explanations/performance.md b/docusaurus-docs/conda-store/explanations/performance.md
index f9a685fa7..c7393b13c 100644
--- a/docusaurus-docs/conda-store/explanations/performance.md
+++ b/docusaurus-docs/conda-store/explanations/performance.md
@@ -1,53 +1,51 @@
---
-description: conda-store's performance
+description: Learn to make conda-store performant
---
 # Performance
-:::warning
-This page is in active development, some content may be missing or inaccurate.
+Several components can impact conda-store's overall performance.
+They are listed and described in order of decreasing impact below.
+
+## Worker storage
+
+When conda-store builds a given environment it has to locally install the environment in the directory specified in the [Traitlets][traitlets] configuration `CondaStore.store_directory`.
+Conda environments consist of many hardlinks to small files.
+This means that the performance of `store_directory` is limited by the number of
+[Input/output operations per second (IOPS)][IOPS-wikipedia] the directory can
+perform.
+Many cloud providers have high performance storage options you can consider.
+
+### When to use NFS
+
+If you do not need to mount the environments via NFS into the containers, it's recommended not to use NFS and to use traditional block storage instead.
+Not only is it significantly cheaper, but also the IOPS performance will be better.
+
+If you want to mount the environments in containers or running VMs, then NFS
+may be a good option.
+With NFS, many cloud providers provide a high performance filesystem option at a significant premium in cost, like [GCP Filestore][gcp-filestore], [Amazon EFS][aws-efs], and [Azure Files][azure-files].
+
+:::note
+Choosing an NFS storage option with low IOPS will result in long environment
+creation times.
 :::
-There are several parts of conda-store to consider for performance. We
-have tried to list them in order of performance impact that may be
-seen.
-
-### Worker storage
-
-When conda-store builds a given environment it has to locally install
-the environment in the directory specified in the
-[Traitlets](https://traitlets.readthedocs.io/en/stable/using_traitlets.html)
-configuration `CondaStore.store_directory`. Conda environments consist
-of many hardlinks to small files. This means that the
-`store_directory` is limited to the number of
-[IOPS](https://en.wikipedia.org/wiki/IOPS) the directory can
-perform. Many cloud providers have high performance storage
-options. These include:
-If you do not need to mount the environments via NFS into the
-containers we highly recommend not using NFS and using traditional
-block storage. Not only is it significantly cheaper but the IOPs
-performance will be better as well.
-
-If you want to mount the environments in containers or running VMs NFS
-may be a good option for you. With NFS many cloud providers provide a
-high performance filesystem option at a significant premium in
-cost. Example of these include [GCP
-Filestore](https://cloud.google.com/filestore/docs/performance#expected_performance),
-[AWS EFS](https://aws.amazon.com/efs/features/), and [Azure
-files](https://docs.microsoft.com/en-us/azure/storage/files/understanding-billing#provisioning-method). Choosing
-an nfs storage option with low IOPS will result in long environment
-install times.
-
-### Network speed
-
-While Conda does its best to cache packages, it will have to reach out
-to download the `repodata.json` along with the packages as well. Thus
-network speeds may be important. Typically cloud environments have
-plenty fast Internet.
-
-### S3 storage
-
-All build artifacts from conda-store are stored in object storage that
-behaves S3 like. S3 traditionally has great performance if you use the
-cloud provider implementation.
+## Network speed
+
+While conda does its best to cache packages, it will have to connect over the internet
+to download the `repodata.json` along with the packages.
+Thus network speeds can impact performance, but cloud environments typically have sufficiently fast internet connections.
+
+## Artifact storage
+
+All build artifacts from conda-store are stored in object storage that behaves like [Amazon S3][amazon-s3].
+S3 traditionally has great performance if you use the cloud provider's implementation.
+
+
+
+[amazon-s3]: https://aws.amazon.com/s3/
+[traitlets]: https://traitlets.readthedocs.io/en/stable/using_traitlets.html
+[iops-wikipedia]: https://en.wikipedia.org/wiki/IOPS
+[gcp-filestore]: https://cloud.google.com/filestore/docs/performance#expected_performance
+[aws-efs]: https://aws.amazon.com/efs/features/
+[azure-files]: https://docs.microsoft.com/en-us/azure/storage/files/understanding-billing#provisioning-method
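+
+Relatedly, for the worker storage discussion above, a quick way to sanity-check the IOPS of the disk backing `CondaStore.store_directory` is a short `fio` benchmark. This is only a sketch: the directory path is a hypothetical example and `fio` must be installed separately:
+
+```bash
+# Random 4k read benchmark against the store directory; higher IOPS generally means faster environment builds
+fio --name=store-iops --directory=/opt/conda-store/store --rw=randread --bs=4k --size=256M --runtime=30 --time_based --group_reporting
+```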