From d0f40202945283c33c9b4462eb4a2f04e9d7970e Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Sun, 7 Jan 2024 22:34:27 +0530 Subject: [PATCH 01/12] :memo: Update conda-concepts --- .../explanations/conda-concepts.md | 132 ++++++++++++------ 1 file changed, 88 insertions(+), 44 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/conda-concepts.md b/docusaurus-docs/conda-store/explanations/conda-concepts.md index c326ba769..1a7f15f19 100644 --- a/docusaurus-docs/conda-store/explanations/conda-concepts.md +++ b/docusaurus-docs/conda-store/explanations/conda-concepts.md @@ -1,82 +1,126 @@ --- -description: Understand conda basics +sidebar_position: 1 +description: Understand basics of package management with conda --- # Conda concepts -:::note -This page is in active development. -::: +conda is a Python package and environment manager, used widely in the Python data science ecosystem. +conda-store builds on conda and other supporting libraries in the conda community. +This page briefly covers some key conda concepts necessary to use conda-store. +For detailed explanations, check out the [conda documentation][conda-docs]. -## Packages/libraries +## Python package - +Open source software projects (sometimes called libraries) are shared with users as *packages*. You need to "install" the package on your local workspace to use it. -## Dependencies +[pip][pip-docs] and [conda][conda-docs] are popular package management tools in the Python ecosystem. + +pip ships with the Python programming language, and can install packages from the PyPI (Python Package Index) - a community managed collection of packages, public/private PyPI mirrors, GitHub sources, and local directories. + +conda needs to be downloaded separately (through a distribution like Anaconda or Miniconda), and can install packages from conda [*channels*](#channels) and local builds. 
+ +## Channels (conda) - +The [conda documentation](conda-docs-channels) defines: + +> Conda channels are the locations where packages are stored. They serve as the base for hosting and managing packages. Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. + +In conda-store, packages are installed from the [conda-forge][conda-forge] channel by default. +conda-forge is a community maintained channel for hosting open source libraries. + +:::note +This behavior is different from conda, which gets packages from the `defaults` channel by default. +::: ## Environments -conda-store helps you create and manage "conda environments", also referred to as "data science environments" because `conda` is the leading package and environment management library for the Python data science. +conda-store helps create and manage "conda environments", sometimes also referred to as "data science environments" or simply "environments" in conda-store spaces. +An environment is an isolated set of installed packages. The [official conda documentation][conda-docs-environments] states: > A conda environment is a directory that contains a specific collection of conda packages that you have installed. > > If you change one environment, your other environments are not affected. You can easily activate or deactivate environments, which is how you switch between them. -conda-store is a higher-level toolkit that enforces some conda best practices behind-the-scenes to enable reliable and reproducible environment sharing for collaborative settings. +## Environment specification (spec) -One of the ways conda-store ensures reproducibility is by auto-generating certain artifacts. 
+conda environments are specified through a YAML file, which is called the *environment specification* and has the following major components: -## Channels +```yaml +name: my-cool-env # name of your environment +channels: # conda channels to get packages from, in order of priority + - conda-forge + - defaults +dependencies: # list of packages required for your work + - python >=3.10 + - numpy + - pandas + - matplotlib + - scikit-learn + - nodejs # conda can install non-Python packages as well, if they're available on a channel + - pip + - pip: # Optionally, conda can also install packages using pip if needed + - pytest +``` - +conda uses this file to create a conda *environment*. -## Reproducibility of conda +Learn more in the [conda documentation about creating an environment file manually][conda-docs-env-file]. -```yaml -name: example -channels: - - defaults - - conda-forge -dependencies: - - python >=3.7 -``` +## Dependencies + +Modern open source software (and software in general) is created using or builds on other libraries, which are called the *dependencies* of the project. +For example, pandas uses NumPy's `ndarray`s and is written partially in Python, hence, NumPy and Python are dependencies of pandas. +Specifically, they are the direct dependencies. +The dependencies of NumPy and pandas, and the dependencies of those dependencies, and so on creates a complete dependency graph for pandas. + +Since conda-store focuses on [environments](#environments), the terms *dependencies* usually refers to the full set of compatible dependencies for all the packages specified in an environment. -Suppose we have the given `environment.yaml` file. How does conda -perform a build? +## Environment creation and improving reproducibility -1. Conda downloads `channeldata.json` from each of the channels which +Given an `environment.yaml` file, this is how conda perform a build (in brief): + +1. 
Conda downloads `channeldata.json`, a metadata file from each of the channels which list the available architectures. 2. Conda then downloads `repodata.json` for each of the architectures - it is interested in (specifically your compute architecture along - with noarch). The `repodata.json` has fields like package name, + it is interested in (specifically your particular compute architecture along + with noarch[^1]). The `repodata.json` has fields like package name, version, and dependencies. -You may notice that the channels listed above do not have a url. This -is because in general you can add -`https://conda.anaconda.org/` to a non-url channel. +[^1]: noarch is a cross-platform architecture which has no OS-specific files. Read [noarch packages in the conda documentation][conda-docs-noarch] for more information. + +:::tip +You may notice that the channels listed in the YAML do not have a URL. This +is because in general , non-URL channels are expected to be present at `https://conda.anaconda.org/`. +::: + +3. Conda then performs a *solve* to determine the exact version and + sha256 of each package to download. + +4. The specific packages are downloaded. -3. Conda then performs a solve to determine the exact version and - sha256 of each package that it will download +5. Conda does :sparkles: magic :sparkles: to fix the path prefixes of the installs, which is beyond the scope of this page. -4. The specific packages are downloaded +For a detailed walkthrough, check out the [conda install deep dive in the conda documentation][conda-docs-install]. -5. Conda does magic to fix the path prefixes of the install +In the above process, there are two spots where runtime reproducibility can be improved: -There are two spots that introduce issues to reproducibility. The -first issue is tracking when an `environment.yaml` file has -changes. This can be easily tracked by taking a sha256 of the file -. 
This is what conda-store does but sorts the dependencies to make -sure it has a way of not triggering a rebuild if the order of two -packages changes in the dependencies list. In step (2) `repodata.json` -is updated regularly. When Conda solves for a user's environment it -tries to use the latest version of each package. Since `repodata.json` -could be updated the next minute the same solve for the same -`environment.yaml` file can result in different solves. +* Auto-tracking when an `environment.yaml` (which is created and updated manually) file has changes. This can be easily tracked by taking a sha256 of the file, which is what conda-store does but sorts the dependencies to make sure it has a way of not triggering a rebuild if the order of two packages changes in the dependencies list. +* In step (2) `repodata.json` is updated regularly. When conda solves for a user's environment it tries to use the latest version of each package. Since `repodata.json` could be updated the very next minute, the same solve for the same +`environment.yaml` file can result in different solves. To enable reproducibility, conda-store auto-generates certain artifacts like lockfiles and tarballs that capture the actual versions of packages and can be used reliably re-create the same environment. Learn more about them in the [artifacts documentation][artifacts]. 
+[conda-docs]: https://docs.conda.io/ +[pip-docs]: https://pip.pypa.io/en/stable/index.html [conda-docs-environments]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html +[conda-docs-channels]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html#what-is-a-conda-channel +[conda-forge]: https://conda-forge.org/ +[conda-docs-env-file]: https://docs.conda.io/projects/conda/en/stable/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually +[conda-docs-noarch]: https://docs.conda.io/projects/conda/en/stable/user-guide/concepts/packages.html#noarch-packages +[conda-docs-install]: https://docs.conda.io/projects/conda/en/stable/dev-guide/deep-dives/install.html#fetching-the-index + + +[artifacts]: artifacts.md From 39f26076ca07897346b63c2a8b05595631f0cd85 Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 9 Jan 2024 16:23:53 +0530 Subject: [PATCH 02/12] :broom: fix link --- docusaurus-docs/conda-store/explanations/conda-concepts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docusaurus-docs/conda-store/explanations/conda-concepts.md b/docusaurus-docs/conda-store/explanations/conda-concepts.md index 1a7f15f19..44e09a1ac 100644 --- a/docusaurus-docs/conda-store/explanations/conda-concepts.md +++ b/docusaurus-docs/conda-store/explanations/conda-concepts.md @@ -22,7 +22,7 @@ conda needs to be downloaded separately (through a distribution like Anaconda or ## Channels (conda) -The [conda documentation](conda-docs-channels) defines: +The [conda documentation][conda-docs-channels] defines: > Conda channels are the locations where packages are stored. They serve as the base for hosting and managing packages. Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. 
From bca682c7653af3f83382d02272b5034136d996ad Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 9 Jan 2024 17:16:58 +0530 Subject: [PATCH 03/12] :memo: Update conda-store concepts --- .../explanations/conda-concepts.md | 10 ++-- .../explanations/conda-store-concepts.md | 46 +++++++++++++++++-- 2 files changed, 46 insertions(+), 10 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/conda-concepts.md b/docusaurus-docs/conda-store/explanations/conda-concepts.md index 44e09a1ac..bc970f903 100644 --- a/docusaurus-docs/conda-store/explanations/conda-concepts.md +++ b/docusaurus-docs/conda-store/explanations/conda-concepts.md @@ -78,7 +78,7 @@ The dependencies of NumPy and pandas, and the dependencies of those dependencies Since conda-store focuses on [environments](#environments), the terms *dependencies* usually refers to the full set of compatible dependencies for all the packages specified in an environment. -## Environment creation and improving reproducibility +## Environment creation Given an `environment.yaml` file, this is how conda perform a build (in brief): @@ -106,11 +106,7 @@ is because in general , non-URL channels are expected to be present at `https:// For a detailed walkthrough, check out the [conda install deep dive in the conda documentation][conda-docs-install]. -In the above process, there are two spots where runtime reproducibility can be improved: - -* Auto-tracking when an `environment.yaml` (which is created and updated manually) file has changes. This can be easily tracked by taking a sha256 of the file, which is what conda-store does but sorts the dependencies to make sure it has a way of not triggering a rebuild if the order of two packages changes in the dependencies list. -* In step (2) `repodata.json` is updated regularly. When conda solves for a user's environment it tries to use the latest version of each package. 
Since `repodata.json` could be updated the very next minute, the same solve for the same -`environment.yaml` file can result in different solves. To enable reproducibility, conda-store auto-generates certain artifacts like lockfiles and tarballs that capture the actual versions of packages and can be used reliably re-create the same environment. Learn more about them in the [artifacts documentation][artifacts]. +Understand how conda-store builds on conda for improved reproducibility in [conda-store concepts page][conda-store-concepts]. [conda-docs]: https://docs.conda.io/ @@ -123,4 +119,4 @@ In the above process, there are two spots where runtime reproducibility can be i [conda-docs-install]: https://docs.conda.io/projects/conda/en/stable/dev-guide/deep-dives/install.html#fetching-the-index -[artifacts]: artifacts.md +[conda-store-concepts]: conda-store-concepts diff --git a/docusaurus-docs/conda-store/explanations/conda-store-concepts.md b/docusaurus-docs/conda-store/explanations/conda-store-concepts.md index a6f878f7d..7bd7362e0 100644 --- a/docusaurus-docs/conda-store/explanations/conda-store-concepts.md +++ b/docusaurus-docs/conda-store/explanations/conda-store-concepts.md @@ -1,13 +1,53 @@ --- +sidebar_position: 2 description: Overview of some conda-store concepts --- # conda-store concepts -:::note -This page is in active development. -::: +conda-store was developed with two key goals in mind: reliable reproducibility of environments, and features for collaboratively using an environment. +This page describes how conda-store achieves these goals. + +## Reproducibility + +In the [conda-based environment creation process][conda-concepts-env-creation], there are two areas where runtime reproducibility is improved through conda-store: + +* Auto-tracking when an `environment.yaml` (which is created and updated manually) file has changes. 
This can be easily tracked by taking a sha256 of the file. conda-store does this, but sorts the dependencies first so that reordering packages in the dependencies list does not trigger a rebuild. +* In step (2) `repodata.json` is updated regularly. When conda solves for a user's environment it tries to use the latest version of each package. Since `repodata.json` could be updated the very next minute, the same solve for the same +`environment.yaml` file can result in different solves. To enable reproducibility, conda-store auto-generates certain artifacts like lockfiles and tarballs that capture the actual versions of packages and can be used to reliably re-create the same environment. Learn more about them in the [artifacts documentation][artifacts]. ## Namespaces +Namespaces are how conda-store manages environment access for groups of users. + +Every environment in conda-store is a part of a "namespace", and is displayed in the format: `/`. + +Users can have access to view/edit/manage certain "namespaces", which means they have that level of permission for all the environments in that namespace. +This allows a large team or organization to have isolated spaces for environment sharing between smaller groups. + +Each individual user has a separate namespace, which has the same name as their username (used while logging in). All environments in this namespace are private to the individual. + +A user can be a part of several other "shared" namespaces, and based on the level of access given to them, they can view and use the environment, edit the environment, or delete it altogether. The permissions are dictated by "role mappings". + ## Role mappings + + + +- Viewer +- Developer (to be changed to Editor) +- Admin + +## Environment versions + +conda-store always re-builds an environment from scratch when edits are detected, which is required for ensuring truly reproducible environments. 
+Version control is very useful in any collaborative setting, and environments are no exception. +Hence, conda-store keeps older versions of the environment for reference, and allows you to select different versions when needed. conda-store-ui also provides a graphical way to [switch between versions][conda-store-ui-version-control]. + +:::tip +Internally, conda-store handles versions with ✨ symlinking magic ✨, where the environment name points to different environments corresponding to versions. +::: + + +[conda-concepts-env-creation]: conda-concepts#environment-creation +[artifacts]: artifacts +[conda-store-ui-version-control]: ../../conda-store-ui/tutorials/version-control From 4c809ca833baebef940809a4c514279de7c16463 Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 9 Jan 2024 18:03:39 +0530 Subject: [PATCH 04/12] :memo: Update performance explanation --- .../conda-store/explanations/performance.md | 90 +++++++++---------- 1 file changed, 44 insertions(+), 46 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/performance.md b/docusaurus-docs/conda-store/explanations/performance.md index f9a685fa7..c7393b13c 100644 --- a/docusaurus-docs/conda-store/explanations/performance.md +++ b/docusaurus-docs/conda-store/explanations/performance.md @@ -1,53 +1,51 @@ --- -description: conda-store's performance +description: Learn to make conda-store performant --- # Performance -:::warning -This page is in active development, some content may be missing or inaccurate. +Several components can impact conda-store's overall performance. +They are listed and described in order of decreasing impact below. + +## Worker storage + +When conda-store builds a given environment it has to locally install the environment in the directory specified in the [Traitlets][traitlets] configuration `CondaStore.store_directory`. +Conda environments consist of many hardlinks to small files. 
+This means that the performance of `store_directory` is limited by the number of +[Input/output operations per second (IOPS)][IOPS-wikipedia] the directory can +perform. +Many cloud providers have high performance storage options you can consider. + +### When to use NFS + +If you do not need to mount the environments via NFS into the containers, it's recommended to not use NFS and instead use traditional block storage. +Not only is it significantly cheaper, but also the IOPS performance will be better. + +If you want to mount the environments in containers or running VMs, then NFS +may be a good option. +With NFS, many cloud providers provide a high performance filesystem option at a significant premium in cost, like [GCP Filestore][gcp-filestore], [Amazon EFS][aws-efs], and [Azure Files][azure-files]. + +:::note +Choosing an NFS storage option with low IOPS will result in long environment +creation times. ::: -There are several parts of conda-store to consider for performance. We -have tried to list them in order of performance impact that may be -seen. - -### Worker storage - -When conda-store builds a given environment it has to locally install -the environment in the directory specified in the -[Traitlets](https://traitlets.readthedocs.io/en/stable/using_traitlets.html) -configuration `CondaStore.store_directory`. Conda environments consist -of many hardlinks to small files. This means that the -`store_directory` is limited to the number of -[IOPS](https://en.wikipedia.org/wiki/IOPS) the directory can -perform. Many cloud providers have high performance storage -options. These include: - -If you do not need to mount the environments via NFS into the -containers we highly recommend not using NFS and using traditional -block storage. Not only is it significantly cheaper but the IOPs -performance will be better as well. - -If you want to mount the environments in containers or running VMs NFS -may be a good option for you. 
With NFS many cloud providers provide a -high performance filesystem option at a significant premium in -cost. Example of these include [GCP -Filestore](https://cloud.google.com/filestore/docs/performance#expected_performance), -[AWS EFS](https://aws.amazon.com/efs/features/), and [Azure -files](https://docs.microsoft.com/en-us/azure/storage/files/understanding-billing#provisioning-method). Choosing -an nfs storage option with low IOPS will result in long environment -install times. - -### Network speed - -While Conda does its best to cache packages, it will have to reach out -to download the `repodata.json` along with the packages as well. Thus -network speeds may be important. Typically cloud environments have -plenty fast Internet. - -### S3 storage - -All build artifacts from conda-store are stored in object storage that -behaves S3 like. S3 traditionally has great performance if you use the -cloud provider implementation. +## Network speed + +While conda does its best to cache packages, it will have to connect over the internet +to download the `repodata.json` along with the packages. +Thus network speeds can impact performance, but typically cloud environments have plenty fast Internet. + +## Artifact storage + +All build artifacts from conda-store are stored in object storage that behaves like [Amazon S3][amazon-s3]. +S3 traditionally has great performance if you use the cloud provider implementation. 
+ + + +[amazon-s3]: https://aws.amazon.com/s3/ +[traitlets]: https://traitlets.readthedocs.io/en/stable/using_traitlets.html +[iops-wikipedia]: https://en.wikipedia.org/wiki/IOPS +[gcp-filestore]: https://cloud.google.com/filestore/docs/performance#expected_performance +[aws-efs]: https://aws.amazon.com/efs/features/ +[azure-files]: https://docs.microsoft.com/en-us/azure/storage/files/understanding-billing#provisioning-method From 4fea797814e7c064900f2ae3670a861478991135 Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Wed, 21 Feb 2024 05:20:34 +0530 Subject: [PATCH 05/12] :memo: Update YAML, lockfile, and tarball --- .../conda-store/explanations/artifacts.md | 111 ++++++++++-------- 1 file changed, 64 insertions(+), 47 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/artifacts.md b/docusaurus-docs/conda-store/explanations/artifacts.md index 6ba8630f5..5b6da125c 100644 --- a/docusaurus-docs/conda-store/explanations/artifacts.md +++ b/docusaurus-docs/conda-store/explanations/artifacts.md @@ -4,26 +4,24 @@ description: Understand environment artifacts generated by conda-store # Artifacts -:::warning -This page is in active development, some content may be inaccurate. -::: - conda environments can be created in a few different ways. -conda-store creates "artifacts" (corresponding to different environment creation options) that can be shared with colleagues and can be used to reproduce environments. -In the conda-store UI, these are available in the "Logs and Artifacts" section at the end of the environment page. +conda-store creates "artifacts" (corresponding to different environment creation options) for every environment, that can be shared with colleagues and used to reproduce environments. +In the conda-store UI, these are available in the **"Logs and Artifacts"** section +at the end of the environment page. The following sections describe the various artifacts generated and how to create environments with them. 
+:::note +Environments in shared namespaces can be accessed by everyone with access to that namespace, in which case you may not need to share the artifacts manually. +::: + ### YAML file (pinned) -YAML files that follow the conda specification is a common way to create environments. +YAML file that follows the conda specification is a common way to create environments. conda-store creates a "pinned" YAML, where all the exact versions of requested packages (including `pip` packages) as well as all their dependencies are specified, to ensure new environments created match the original environment as closely as possible. -A pinned YAML file is generated for each environment ta is built. -This includes pinning of the `pip`` packages as well. - :::info -In rare cases, the completely pinned packages may not solve because packages are +In rare cases, the pinned packages may not solve because packages are routinely marked as broken and removed. **conda-forge** (default channel in conda-store) @@ -32,67 +30,84 @@ broken][conda-forge-immutability-policy]. Most other channels do not have such a policy. ::: -Assuming you have `conda` installed, to create a conda environment (on any machine) using this file: +Click on **"Show yml file"** link in the conda-store UI to open the file in a new browser tab. You can download the file[^1] and share with someone or use it to create an environment on a different machine. -1. Click on **"Show yml file"** link in the conda-store UI to open the file in a new browser tab. -2. Save the file with: Right-click on the page -> Select "Save As" -> Give the file a meaningful name (like `environment.yml`) -3. Run the following command and use the corresponding filename: - ```bash - conda env create --file - ``` +[^1]: Concretely, download the browser page displaying the file. For example, on macOS: Right-click on the page -> Select "Save As" -> Give the file a meaningful name (like `environment.yml`). 
+ +Assuming `conda` is installed, run the following CLI command with the corresponding filename to create a conda environment (on any machine): + +```bash +conda env create --file +``` ### Lockfile -A conda lockfile is a representation of only the `conda` dependencies in +A conda lockfile is a representation of all (`conda` and `pip`) dependencies in a given environment. -conda-store created lockfiles using the [conda-lock][conda-lock-github] project. +conda-store creates lockfiles using the [conda-lock][conda-lock-github] project. -:::warning -This file will not reproduce the `pip` dependencies in a given environment. -It is usually a good practice to not mix pip and conda dependencies. -::: +Click on **"Show lockfile"** to open the lockfile in a new browser tab. +You can download the file[^1] and share with someone or use it to create an environment in a different space. -Click the `lockfile` icon to download the -lockfile. First install `conda-lock` if it is not already installed. +At the new location, install `conda-lock` if it is not already installed: ```shell -conda install -c conda-forge lockfile +conda install -c conda-forge conda-lock ``` -Install the locked environment file from conda-store. +Create an environment using the lockfile generated by conda-store: ```shell -conda-lock install +conda-lock install ``` -### conda-pack archive +### Tarballs or archives + +:::warning +Only works on Linux machines because the environment archives are built on Linux machines. +::: + +A tarball or archive is a _packaged_ environment that can be moved, unpacked, and used in a different location or on a different machine. -[Conda-Pack](https://conda.github.io/conda-pack/) is a package for -creating tarballs of given Conda environments. Creating a Conda archive -is not as simple as packing and unpacking a given directory. This is -due to the base path for the environment that may -change. [Conda-Pack](https://conda.github.io/conda-pack/) handles all -of these issues. 
Click the `archive` button and download the given -environment. The size of the archive will be less than the size seen -on the environment UI element due to compression. +conda-store uses [Conda-Pack][conda-pack], a library for +creating tarballs of conda environments. -```shell -conda install -c conda-forge conda-pack +:::tip +Creating an archive of a conda environment is more complex than packing and unpacking a given directory because the base path for the environment can change. +[Conda-Pack][conda-pack] handles this complexity. +::: + +Click **"Download archive"** button to download the archive of your conda environment, and share/move it to the desired location. + +To install the tarball, execute the following commands at the location: + +1. Create a new directory for the environment (called `` here) and unpack the environment tarball in that directory: + +```bash +mkdir -p +tar -xzf -C ``` -Install the Conda-Pack tarball. The directions are [slightly -complex](https://conda.github.io/conda-pack/#commandline-usage). Note -that `my_env` can be any name in any given prefix. +2. Activate the environment with: -```shell -mkdir -p my_env -tar -xzf .tar.gz -C my_env +```bash +source /bin/activate +``` -source my_env/bin/activate +3. From the active environment, clean-up prefixes: +```bash conda-unpack ``` +4. You can use any library present in the environment, and when done, deactivate the environment with: + +```bash +source /bin/deactivate +``` + +Learn more about using environment tarballs in the [conda-pack documentation][conda-pack-usage]. 
+ ### Docker images :::note @@ -188,3 +203,5 @@ clicking on the `Settings` button on the environment page) and adding [conda-forge-immutability-policy]: https://conda-forge.org/docs/maintainer/updating_pkgs.html#packages-on-conda-forge-are-immutable [conda-lock-github]: https://github.com/conda-incubator/conda-lock [constructor]: https://github.com/conda/constructor +[conda-pack]: https://conda.github.io/conda-pack/ +[conda-pack-usage]: https://conda.github.io/conda-pack/index.html#commandline-usage From a1806fbf26140ff9ef38e15e877f738de592ddef Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 27 Feb 2024 19:02:06 +0530 Subject: [PATCH 06/12] :memo: Update Docker, remove installer & constructor --- .../conda-store/explanations/artifacts.md | 120 +++++++----------- 1 file changed, 48 insertions(+), 72 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/artifacts.md b/docusaurus-docs/conda-store/explanations/artifacts.md index 5b6da125c..641312a10 100644 --- a/docusaurus-docs/conda-store/explanations/artifacts.md +++ b/docusaurus-docs/conda-store/explanations/artifacts.md @@ -15,7 +15,7 @@ The following sections describe the various artifacts generated and how to creat Environments in shared namespaces can be accessed by everyone with access to that namespace, in which case you may not need to share the artifacts manually. ::: -### YAML file (pinned) +## YAML file (pinned) YAML file that follows the conda specification is a common way to create environments. conda-store creates a "pinned" YAML, where all the exact versions of requested packages (including `pip` packages) as well as all their dependencies are specified, to ensure new environments created match the original environment as closely as possible. 
@@ -40,7 +40,7 @@ Assuming `conda` is installed, run the following CLI command with the correspond conda env create --file ``` -### Lockfile +## Lockfile A conda lockfile is a representation of all (`conda` and `pip`) dependencies in a given environment. @@ -61,10 +61,11 @@ Create an environment using the lockfile generated by conda-store: conda-lock install ``` -### Tarballs or archives +## Tarballs or archives :::warning -Only works on Linux machines because the environment archives are built on Linux machines. +Building environments from archives is only supported on Linux machines +because the tarballs are built on Linux machines. ::: A tarball or archive is a _packaged_ environment that can be moved, unpacked, and used in a different location or on a different machine. @@ -108,95 +109,69 @@ source /bin/deactivate Learn more about using environment tarballs in the [conda-pack documentation][conda-pack-usage]. -### Docker images +## Docker images -:::note +:::warning Docker image creation is currently only supported on Linux. + +The docker image generation and registry features are experimental, +and the following instructions are not thoroughly tested. +If you face any difficulties, open an issue on the GitHub repository. ::: -conda-store acts as a docker registry which allows for interesting -ways to handle Conda environment. In addition this registry leverages -[conda-docker](https://github.com/conda-incubator/conda-docker) which -builds docker images without docker allowing for advanced caching, -reduced image sizes, and does not require elevated privileges. Click -on the `docker` link this will copy a url to your clipboard. Note the -beginning of the url for example `localhost:8080/`. This is required to tell -docker where the docker registry is located. Otherwise by default it -will try and user docker hub. Your url will likely be different. 
- -The `conda-store` docker registry requires authentication via any -username with password set to a token that is generated by visiting -the user page to generate a token. Alternatively in the -`conda_store_config.py` you can set -`c.AuthenticationBackend.predefined_tokens` which have environment -read permissions on the given docker images needed for pulling. +conda-store acts as a docker registry. +It leverages [Conda Docker][conda-docker], which builds docker images without Docker, allowing for advanced caching, reduced image sizes, and does not require elevated privileges. -``` -docker login -u token -p -docker pull -docker run -it python -``` +### Authentication -#### General usage +The `conda-store` docker registry requires authentication. +You can use **any username** and your **user token as the password**. -```shell -docker run -it localhost:8080// +```bash +docker login -u -p ``` -If you want to use a specific build (say one that was built in the -past and is not the current environment) you can visit the specific -build that you want in the UI and copy its docker registry tag -name. The tag name is a combination of `---` that we will refer to as build -key. An example would be -`localhost:5000/filesystem/python-numpy-env:583dd55140491c6b4cfa46e36c203e10280fe7e180190aa28c13f6fc35702f8f-20210825-180211-244815-3-python-numpy-env`. +To get your user token: -```shell -docker run -it localhost:8080//: -``` +1. Visit your user page at `/admin/user` +2. Click on "Create token", which displays your token +3. Click on "copy" to copy the token to your clipboard + +Alternatively, you can set `c.AuthenticationBackend.predefined_tokens` in `conda_store_config.py`, which have environment read permissions on the given docker images required for pulling images. 
-#### On Demand Docker Image +### General usage -conda-store has an additional feature which allow for specifying the -packages within the docker image name itself without requiring an -actual environment to be created on the conda-store UI side. +To use a specific environment build, click on the **"Show Docker image"** to get the URL to the docker image. For example: `localhost:8080/analyst/python-numpy-env:583dd55140491c6b4cfa46e36c203e10280fe7e180190aa28c13f6fc35702f8f-20210825-180211-244815-3-python-numpy-env`. -The following convention is used -`:/conda-store-dynamic/`. After -`conda-store-dynamic` you specify packages needed separated by -slashes. Additionally you may specify package constraints -for example `<=1.10` as `.lt.1.10`. +The URL consists of: `//:` -As full example support we want python less than `3.8` and NumPy -greater than `1.0`. This would be the following docker image -name. `:/conda-store-dynamic/python.lt.3.8/numpy.gt.1.0`. conda-store -will then create the following environment and the docker image will -download upon the docker image being built. +* The conda-store domain (for example `localhost:8080/`) at the beginning tells Docker where the docker registry is located. Otherwise, Docker will try to use Docker Hub by default. +* The `/` refers to the specific conda environment +* The "build key" is a combination of `---` which points to specific build of the environment. For example, a past version of the environment. -### Installers +To use a conda-store environment docker image: + +```bash +docker run -it +``` -conda-store uses [constructor] to generate an installer for the current platform -(where the server is running): +### On-demand (dynamic) docker image -- on Linux and macOS, it generates a `.sh` installer -- on Windows, it generates a `.exe` installer using NSIS. +In conda-store, you can also specify the required packages within the docker image name itself, without needing an actual environment to be created by conda-store UI. 
-conda-store automatically adds `conda` and `pip` to the target environment -because these are required for the installer to work. +The URL format is: `:/conda-store-dynamic//.../`. -Also note that `constructor` uses a separate dependency solver instead of -utilizing the generated lockfile, so the package versions used by the installer -might be different compared to the environment available in conda-store. There -are plans to address this issue in the future. +After `conda-store-dynamic`, you can specify packages with constraints separated by +slashes in the following format: +* `<=1.10` as `.lt.1.10` +* `>=1.10` as `.gt.1.10` -#### Existing Deployments +For example, if you need Python less than `3.10` and NumPy +greater than `1.0`, this would be the docker image +name: `:/conda-store-dynamic/python.lt.3.10/numpy.gt.1.0`. -conda-store saves environment settings and doesn't automatically update them on -startup (see `CondaStore.ensure_settings`). Existing deployments need to -manually enable installer builds via the admin interface. This can be done by -going to `/admin/setting///` (or -clicking on the `Settings` button on the environment page) and adding -`"CONSTRUCTOR_INSTALLER"` to `build_artifacts`. +conda-store creates the environment and builds the docker image, which you can then download. 
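
The naming convention described above can be sketched as a small helper. This is an illustrative sketch only, not part of conda-store: the function name `dynamic_image_name`, the `OPERATORS` table, and the `localhost:8080` registry address are assumptions, and only the two constraint spellings listed above (`.lt.` and `.gt.`) are covered.

```python
# Hypothetical helper, not a conda-store API: it only assembles the image
# name string following the `conda-store-dynamic` convention described above.
OPERATORS = {"<=": "lt", ">=": "gt"}  # the two spellings listed in this section


def dynamic_image_name(registry, packages):
    """Return a `conda-store-dynamic` image name.

    `registry` is the conda-store address (for example "localhost:8080").
    `packages` is a list of (name, operator, version) tuples; use
    (name, None, None) for a package without a version constraint.
    """
    parts = []
    for name, operator, version in packages:
        if operator is None:
            parts.append(name)
        else:
            parts.append(f"{name}.{OPERATORS[operator]}.{version}")
    return f"{registry}/conda-store-dynamic/" + "/".join(parts)


image = dynamic_image_name(
    "localhost:8080",
    [("python", "<=", "3.10"), ("numpy", ">=", "1.0")],
)
print(image)
```

The resulting string could then be passed to `docker run -it`, assuming the registry is reachable and you have already authenticated as described in the authentication section.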
[conda-docs]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html @@ -205,3 +180,4 @@ clicking on the `Settings` button on the environment page) and adding [constructor]: https://github.com/conda/constructor [conda-pack]: https://conda.github.io/conda-pack/ [conda-pack-usage]: https://conda.github.io/conda-pack/index.html#commandline-usage +[conda-docker]: https://github.com/conda-incubator/conda-docker From 2554920b09fc26ca1cb4ab7d252dcfda3eb3488b Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 27 Feb 2024 19:20:36 +0530 Subject: [PATCH 07/12] :memo: Add installer --- .../conda-store/explanations/artifacts.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/docusaurus-docs/conda-store/explanations/artifacts.md b/docusaurus-docs/conda-store/explanations/artifacts.md index 641312a10..3c162faeb 100644 --- a/docusaurus-docs/conda-store/explanations/artifacts.md +++ b/docusaurus-docs/conda-store/explanations/artifacts.md @@ -173,6 +173,24 @@ name: `:/conda-store-dynamic/python.lt.3.10/numpy.g conda-store creates the environment ands builds the docker image, which you can then download. +## Installers + +Installers are another way to share and use a set of (bundled) packages. +conda-store uses [constructor][constructor-docs] to generate an installer for the current platform (where the server is running): + +- on Linux and MacOS, it generates a `.sh` installer +- on Windows, it generates a `.exe` installer using NSIS + +conda-store automatically adds `conda` and `pip` to the target environment +because these are required for the installer to work. + +:::note +`constructor` uses a separate dependency solver instead of +utilizing the generated lockfile, so the package versions used by the installer +might be different compared to the environment available in conda-store. There +are plans to address this issue in the future. 
+::: + [conda-docs]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html [conda-forge-immutability-policy]: https://conda-forge.org/docs/maintainer/updating_pkgs.html#packages-on-conda-forge-are-immutable @@ -181,3 +199,4 @@ conda-store creates the environment ands builds the docker image, which you can [conda-pack]: https://conda.github.io/conda-pack/ [conda-pack-usage]: https://conda.github.io/conda-pack/index.html#commandline-usage [conda-docker]: https://github.com/conda-incubator/conda-docker +[constructor-docs]: https://conda.github.io/constructor/ From e1a9eba2fd125bde11517aad2337355d0f720d31 Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 4 Jun 2024 18:53:37 +0530 Subject: [PATCH 08/12] Apply suggestions from code review Co-authored-by: Peyton Murray --- docusaurus-docs/conda-store/explanations/artifacts.md | 7 +++---- docusaurus-docs/conda-store/explanations/conda-concepts.md | 4 ++-- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/artifacts.md b/docusaurus-docs/conda-store/explanations/artifacts.md index 3c162faeb..da24e0ae0 100644 --- a/docusaurus-docs/conda-store/explanations/artifacts.md +++ b/docusaurus-docs/conda-store/explanations/artifacts.md @@ -17,12 +17,11 @@ Environments in shared namespaces can be accessed by everyone with access to tha ## YAML file (pinned) -YAML file that follows the conda specification is a common way to create environments. +YAML files that follow the conda specification are a common way to create environments. conda-store creates a "pinned" YAML, where all the exact versions of requested packages (including `pip` packages) as well as all their dependencies are specified, to ensure new environments created match the original environment as closely as possible. :::info -In rare cases, the pinned packages may not solve because packages are -routinely marked as broken and removed. 
+In rare cases, building environments from "pinned" YAML files may not solve because packages are routinely marked as broken and removed at the repository level. **conda-forge** (default channel in conda-store) has a [policy that packages are never removed but are marked as @@ -64,7 +63,7 @@ conda-lock install ## Tarballs or archives :::warning -Environment builds from archives is only supported on Linux machines +Building environments from archives is only supported on Linux machines because the tarballs are built on Linux machines. ::: diff --git a/docusaurus-docs/conda-store/explanations/conda-concepts.md b/docusaurus-docs/conda-store/explanations/conda-concepts.md index bc970f903..80a72e4d5 100644 --- a/docusaurus-docs/conda-store/explanations/conda-concepts.md +++ b/docusaurus-docs/conda-store/explanations/conda-concepts.md @@ -37,7 +37,7 @@ This behavior is different from conda that gets packages from the "default" chan conda-store helps create and manage "conda environments", sometimes also referred to as "data science environments" or simply "environments" in conda-store spaces. -Environments are an isolated set of installed packages. +An environment is an isolated set of installed packages. The [official conda documentation][conda-docs-environments] states: > A conda environment is a directory that contains a specific collection of conda packages that you have installed. @@ -80,7 +80,7 @@ Since conda-store focuses on [environments](#environments), the terms *dependenc ## Environment creation -Given an `environment.yaml` file, this is how conda perform a build (in brief): +Given an `environment.yaml` file, this is how conda performs a build (in brief): 1. Conda downloads `channeldata.json`, a metadata file from each of the channels which list the available architectures. 
From ef2aa0c72e6285a55247a46769da7d3cd5b38c9a Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 4 Jun 2024 20:08:11 +0530 Subject: [PATCH 09/12] remove conda, conda-lock, conda-pack commands Signed-off-by: Pavithra Eswaramoorthy --- .../conda-store/explanations/artifacts.md | 68 ++++--------------- 1 file changed, 14 insertions(+), 54 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/artifacts.md b/docusaurus-docs/conda-store/explanations/artifacts.md index da24e0ae0..eeb25fe96 100644 --- a/docusaurus-docs/conda-store/explanations/artifacts.md +++ b/docusaurus-docs/conda-store/explanations/artifacts.md @@ -11,8 +11,11 @@ at the end of the environment page. The following sections describe the various artifacts generated and how to create environments with them. +Environments in shared namespaces on conda-store can be accessed by everyone with access to that namespace, in which case you may not need to share the artifacts manually. +Artifacts are used to share your environment with external collaborators who don't have access to conda-store. + :::note -Environments in shared namespaces can be accessed by everyone with access to that namespace, in which case you may not need to share the artifacts manually. +The libraries (conda, conda-lock, conda-pack, etc.) mentioned in the following sections are separate projects in the conda ecosystem. The environments created using them are not managed by conda-store. ::: ## YAML file (pinned) @@ -29,15 +32,9 @@ broken][conda-forge-immutability-policy]. Most other channels do not have such a policy. ::: -Click on **"Show yml file"** link in the conda-store UI to open the file in a new browser tab. You can download the file[^1] and share with someone or use it to create an environment on a different machine. - -[^1]: Concretely, download the browser page displaying the file. 
For example, on macOS: Right-click on the page -> Select "Save As" -> Give the file a meaningful name (like `environment.yml`). +Click on **"Show yml file"** link in the conda-store UI to open the file in a new browser tab. You can copy-and-paste this file into [conda-store UI's YAML editor][cs-ui-yaml] to create a new environment managed by conda-store in a different namespace. -Assuming `conda` is installed, run the following CLI command with the corresponding filename to create a conda environment (on any machine): - -```bash -conda env create --file -``` +You can download the file and share with someone or use it to create an environment on a different machine. Assuming `conda` is installed, run the [CLI commands mentioned in the conda documentation][conda-docs-create-env] with the corresponding filename to create a conda environment (on any machine). ## Lockfile @@ -46,19 +43,9 @@ a given environment. conda-store creates lockfiles using the [conda-lock][conda-lock-github] project. Click on **"Show lockfile"** to open the lockfile in a new browser tab. -You can download the file[^1] and share with someone or use it to create an environment in a different space. - -At the new location, install `conda-lock` if it is not already installed: - -```shell -conda install -c conda-forge conda-lock -``` +You can download the file and share with someone or use it to create an environment in a different space. -Create an environment using the lockfile generated by conda-store: - -```shell -conda-lock install -``` +To create an environment at the new location, follow the [commands in the conda-lock documentation][conda-lock-install-env]. ## Tarballs or archives @@ -72,41 +59,9 @@ A tarball or archive is a _packaged_ environment that can be moved, unpacked, an conda-store uses [Conda-Pack][conda-pack], a library for creating tarballs of conda environments. 
-:::tip -Creating an archive of a conda environment is more complex than packing and unpacking a given directory because the base path for the environment can change. -[Conda-Pack][conda-pack] handles this complexity. -::: - Click **"Download archive"** button to download the archive of your conda environment, and share/move it to the desired location. -To install the tarball, execute the following commands at the location: - -1. Create a new directory for the environment (called `` here) and unpack the environment tarball in that directory: - -```bash -mkdir -p -tar -xzf -C -``` - -2. Activate the environment with: - -```bash -source /bin/activate -``` - -3. From the active environment, clean-up prefixes: - -```bash -conda-unpack -``` - -4. You can use any library present in the environment, and when done, deactivate the environment with: - -```bash -source /bin/deactivate -``` - -Learn more about using environment tarballs in the [conda-pack documentation][conda-pack-usage]. +To install the tarball, follow the [instructions for the target machine in the conda-pack documentation][conda-pack-usage]. ## Docker images @@ -190,10 +145,15 @@ might be different compared to the environment available in conda-store. There are plans to address this issue in the future. 
::: + +[cs-ui-yaml]: conda-store-ui/tutorials/create-envs#yaml-editor + [conda-docs]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html [conda-forge-immutability-policy]: https://conda-forge.org/docs/maintainer/updating_pkgs.html#packages-on-conda-forge-are-immutable +[conda-docs-create-env]: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file [conda-lock-github]: https://github.com/conda-incubator/conda-lock +[conda-lock-install-env]: https://conda.github.io/conda-lock/output/#environment-lockfile [constructor]: https://github.com/conda/constructor [conda-pack]: https://conda.github.io/conda-pack/ [conda-pack-usage]: https://conda.github.io/conda-pack/index.html#commandline-usage From be5bac2c57bd1947a0b18f0379b68f2dca61095c Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 4 Jun 2024 20:24:44 +0530 Subject: [PATCH 10/12] Code review suggestion for conda-store-concepts Signed-off-by: Pavithra Eswaramoorthy --- .../conda-store/explanations/artifacts.md | 2 +- .../explanations/conda-store-concepts.md | 58 ++++++++++++++++--- 2 files changed, 51 insertions(+), 9 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/artifacts.md b/docusaurus-docs/conda-store/explanations/artifacts.md index eeb25fe96..988655357 100644 --- a/docusaurus-docs/conda-store/explanations/artifacts.md +++ b/docusaurus-docs/conda-store/explanations/artifacts.md @@ -146,7 +146,7 @@ are plans to address this issue in the future. 
::: -[cs-ui-yaml]: conda-store-ui/tutorials/create-envs#yaml-editor +[cs-ui-yaml]: ../../conda-store-ui/tutorials/create-envs#yaml-editor [conda-docs]: https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html diff --git a/docusaurus-docs/conda-store/explanations/conda-store-concepts.md b/docusaurus-docs/conda-store/explanations/conda-store-concepts.md index 7bd7362e0..b9b96a460 100644 --- a/docusaurus-docs/conda-store/explanations/conda-store-concepts.md +++ b/docusaurus-docs/conda-store/explanations/conda-store-concepts.md @@ -31,17 +31,59 @@ A user can be a part of several other "shared" namespaces, and based on the leve ## Role mappings - - -- Viewer -- Developer (to be changed to Editor) -- Admin - -## Environment versions +By default, the following roles are available in conda-store. All users are in one of these groups and have corresponding permissions. + +- **Viewer:** Read-only permissions for environments in selected namespaces +- **Editor (previously called Developer):** Permission to read, create, and update environments in specific namespaces +- **Admin:** Permission to read, create, update, and delete environments in all existing namespaces + +
+ Specific role-mappings: +```python + _viewer_permissions = { + schema.Permissions.ENVIRONMENT_READ, + schema.Permissions.NAMESPACE_READ, + schema.Permissions.NAMESPACE_ROLE_MAPPING_READ, + } + _editor_permissions = { + schema.Permissions.BUILD_CANCEL, + schema.Permissions.ENVIRONMENT_CREATE, + schema.Permissions.ENVIRONMENT_READ, + schema.Permissions.ENVIRONMENT_UPDATE, + schema.Permissions.ENVIRONMENT_SOLVE, + schema.Permissions.NAMESPACE_READ, + schema.Permissions.NAMESPACE_ROLE_MAPPING_READ, + schema.Permissions.SETTING_READ, + } + _admin_permissions = { + schema.Permissions.BUILD_DELETE, + schema.Permissions.BUILD_CANCEL, + schema.Permissions.ENVIRONMENT_CREATE, + schema.Permissions.ENVIRONMENT_DELETE, + schema.Permissions.ENVIRONMENT_READ, + schema.Permissions.ENVIRONMENT_UPDATE, + schema.Permissions.ENVIRONMENT_SOLVE, + schema.Permissions.NAMESPACE_CREATE, + schema.Permissions.NAMESPACE_DELETE, + schema.Permissions.NAMESPACE_READ, + schema.Permissions.NAMESPACE_UPDATE, + schema.Permissions.NAMESPACE_ROLE_MAPPING_CREATE, + schema.Permissions.NAMESPACE_ROLE_MAPPING_READ, + schema.Permissions.NAMESPACE_ROLE_MAPPING_UPDATE, + schema.Permissions.NAMESPACE_ROLE_MAPPING_DELETE, + schema.Permissions.SETTING_READ, + schema.Permissions.SETTING_UPDATE, + } +``` + +
+ +## Environment versions/builds conda-store always re-builds an environment from scratch when edits are detected, which is required for ensuring truly reproducible environments. Version control is very useful in any collaborative setting, and environments are no exception. -Hence, conda-store keeps older versions of the environment for reference, and allows you to select different versions when needed. conda-store-ui also provides a graphical way to [switch between versions][conda-store-ui-version-control]. +Hence, conda-store keeps older versions (also called "builds") of the environment for reference, and allows you to select and use different (previous or newer) versions when needed. conda-store-ui also provides a graphical way to [switch between versions][conda-store-ui-version-control]. :::tip Internally, conda-store handles versions with ✨ symlinking magic ✨, where the environment name points to different environments corresponding to versions. From 1acb2467e570f290f6384124354f501f942cfa34 Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 4 Jun 2024 20:25:08 +0530 Subject: [PATCH 11/12] Update docusaurus-docs/conda-store/explanations/conda-store-concepts.md Co-authored-by: Peyton Murray --- .../conda-store/explanations/conda-store-concepts.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/conda-store-concepts.md b/docusaurus-docs/conda-store/explanations/conda-store-concepts.md index b9b96a460..6acea6f7e 100644 --- a/docusaurus-docs/conda-store/explanations/conda-store-concepts.md +++ b/docusaurus-docs/conda-store/explanations/conda-store-concepts.md @@ -13,8 +13,7 @@ This page describes how conda-store achieves these goals. 
In the [conda-based environment creation process][conda-concepts-env-creation], there are two areas where runtime reproducibility is improved through conda-store: * Auto-tracking when an `environment.yaml` (which is created and updated manually) file has changes. This can be easily tracked by taking a sha256 of the file, which is what conda-store does but sorts the dependencies to make sure it has a way of not triggering a rebuild if the order of two packages changes in the dependencies list. -* In step (2) `repodata.json` is updated regularly. When conda solves for a user's environment it tries to use the latest version of each package. Since `repodata.json` could be updated the very next minute, the same solve for the same -`environment.yaml` file can result in different solves. To enable reproducibility, conda-store auto-generates certain artifacts like lockfiles and tarballs that capture the actual versions of packages and can be used reliably re-create the same environment. Learn more about them in the [artifacts documentation][artifacts]. +* When a user creates an environment, conda tries to use the latest version of each package requested in the environment specification. Conda channels are constantly being updated with new package versions, so the same solve for the same `environment.yaml` file can result in different dependencies being downloaded. To enable reproducibility, conda-store auto-generates certain artifacts like lockfiles and tarballs that capture the actual versions of packages and can be used to reliably re-create the same environment. Learn more about them in the [artifacts documentation][artifacts]. 
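
The order-insensitive change tracking described in the first bullet can be illustrated with a short sketch. This is a minimal example under stated assumptions, not conda-store's actual implementation: the function name `spec_hash`, the JSON serialization, and the sample specifications are all hypothetical, and the real normalization may differ.

```python
import hashlib
import json


def spec_hash(spec):
    """Hash an environment specification so that dependency order is ignored.

    Assumes `spec` is the parsed YAML as a dict whose "dependencies" list
    holds strings and, optionally, dicts such as {"pip": [...]}.
    """
    deps = spec.get("dependencies", [])
    # Sort plain package strings so reordering them does not change the hash.
    plain = sorted(d for d in deps if isinstance(d, str))
    # Sort the package lists inside nested sections (for example "pip").
    nested = [
        {key: sorted(value) for key, value in d.items()}
        for d in deps
        if isinstance(d, dict)
    ]
    normalized = dict(spec)
    normalized["dependencies"] = plain + nested
    payload = json.dumps(normalized, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical specifications that differ only in dependency order.
spec_a = {"name": "demo", "channels": ["conda-forge"],
          "dependencies": ["python", "numpy", {"pip": ["b", "a"]}]}
spec_b = {"name": "demo", "channels": ["conda-forge"],
          "dependencies": ["numpy", {"pip": ["a", "b"]}, "python"]}
same = spec_hash(spec_a) == spec_hash(spec_b)
```

With this kind of normalization, reordering packages leaves the hash unchanged, while adding, removing, or re-pinning a package produces a different hash and would signal that a rebuild is needed.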
## Namespaces From 1d6677a7a5919037d1a0aa5e5de7c38b2f160466 Mon Sep 17 00:00:00 2001 From: Pavithra Eswaramoorthy Date: Tue, 4 Jun 2024 21:24:22 +0530 Subject: [PATCH 12/12] code review suggestions for conda concepts Signed-off-by: Pavithra Eswaramoorthy --- .../explanations/conda-concepts.md | 39 ++++++++++++------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/docusaurus-docs/conda-store/explanations/conda-concepts.md b/docusaurus-docs/conda-store/explanations/conda-concepts.md index 80a72e4d5..b91eeb8d6 100644 --- a/docusaurus-docs/conda-store/explanations/conda-concepts.md +++ b/docusaurus-docs/conda-store/explanations/conda-concepts.md @@ -20,17 +20,33 @@ pip ships with the Python programming language, and can install packages from th conda needs to be downloaded separately (through a distribution like Anaconda or Miniconda), and can install packages from conda [*channels*](#channels) and local builds. +Some Python packages depend on non-Python code (for example, NumPy includes some C libraries). Installing such packages from PyPI using pip can be un-reliable and sometimes it can be your responsibility to separately install the non-Python libraries. +However, conda provides a package management solution that includes both Python and other underlying non-Python code. + +## Dependencies + +Modern open source software (and software in general) is created using or builds on other libraries, which are called the *dependencies* of the project. +For example, pandas uses NumPy's `ndarray`s and is written partially in Python, hence, NumPy and Python are dependencies of pandas. +Specifically, they are the direct dependencies. +The dependencies of NumPy and pandas, and the dependencies of those dependencies, and so on creates a complete dependency graph for pandas. 
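
As a rough illustration of direct versus full dependencies, the sketch below walks a toy dependency graph. The graph contents are simplified and hypothetical (real package metadata lists many more dependencies), and `full_dependencies` is an illustrative name, not a conda function.

```python
def full_dependencies(package, direct):
    """Collect every direct and indirect dependency of `package`.

    `direct` maps a package name to the list of its direct dependencies.
    """
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in direct.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Simplified, hypothetical graph for illustration only.
TOY_GRAPH = {
    "pandas": ["numpy", "python"],
    "numpy": ["python", "libblas"],
    "python": [],
    "libblas": [],
}

pandas_deps = full_dependencies("pandas", TOY_GRAPH)
```

Here `numpy` and `python` are direct dependencies of `pandas`, while `libblas` only appears once the dependencies of `numpy` are followed.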
+ +Since conda-store focuses on [environments](#environments), the term *dependencies* usually refers to the full set of compatible dependencies for all the packages specified in an environment. + ## Channels (conda) The [conda documentation][conda-docs-channels] defines: > Conda channels are the locations where packages are stored. They serve as the base for hosting and managing packages. Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. -In conda-store, packages are installed from the [conda-forge][conda-forge] channel by default. +Similar to PyPI, conda channels are URLs of remote servers that manage packages. + +In conda-store, packages are installed from the [conda-forge][conda-forge] channel by default. conda-forge is a community maintained channel for hosting open source libraries. :::note -This behavior is different from conda that gets packages from the "default" channel by default. +This behavior is different from conda installed through the Anaconda or Miniconda distributions, which gets packages from the `defaults` channel by default. + +Other distributions like Miniforge also use conda-forge as the default channel. ::: ## Environments @@ -44,6 +60,10 @@ The [official conda documentation][conda-docs-environments] states: > > If you change one environment, your other environments are not affected. You can easily activate or deactivate environments, which is how you switch between them. +In data science and development workflows, you often use different environments for different projects and sub-projects. It gives you a clean space for development with only the packages and versions that you need for the specific project. You can also use different versions of the same package in different environments depending on your project needs. + +Using isolated environments is a good practice to follow. 
The alternative, where requirements for all projects are added to a single "base" environment, can not only give you unreliable results but also be very tedious to manage across projects. + ## Environment specification (spec) conda environments are specified through a YAML file, which is called the *environment specification* and has the following major components: @@ -67,16 +87,11 @@ dependencies: # list of packages required for your work conda uses this file to create a conda *environment*. -Learn more in the [conda documentation about created an environment file manually][conda-docs-env-file] - -## Dependencies - -Modern open source software (and software in general) is created using or builds on other libraries, which are called the *dependencies* of the project. -For example, pandas uses NumPy's `ndarray`s and is written partially in Python, hence, NumPy and Python are dependencies of pandas. -Specifically, they are the direct dependencies. -The dependencies of NumPy and pandas, and the dependencies of those dependencies, and so on creates a complete dependency graph for pandas. +:::tip +In some cases, installing packages using pip through conda can cause dependency conflicts. We suggest you use the `pip:` section only if the package you need is not available on conda-forge. +::: -Since conda-store focuses on [environments](#environments), the terms *dependencies* usually refers to the full set of compatible dependencies for all the packages specified in an environment. +Learn more in the [conda documentation about creating an environment file manually][conda-docs-env-file] ## Environment creation @@ -102,8 +117,6 @@ is because in general , non-URL channels are expected to be present at `https:// 4. The specific packages are downloaded. -5. Conda does :sparkles: magic :sparkles: to fix the path prefixes of the installs, which is beyond the scope of this page. 
- For a detailed walkthrough, check out the [conda install deep dive in the conda documentation][conda-docs-install]. Understand how conda-store builds on conda for improved reproducibility in [conda-store concepts page][conda-store-concepts].