New "Caching" doc page #1615

Merged
23 commits
feeac1f
Move package caching information to new subpage called Caching along …
brycegbrazen Jan 12, 2024
caa12fd
Add doc from predat.
brycegbrazen Jan 22, 2024
2562c18
Add some new lines to fix issue with code blocks not appearing.
brycegbrazen Jan 22, 2024
b7437ff
Add new benefits/downsides section and fix typos, and small reorg.
brycegbrazen Jan 22, 2024
031885e
Another small cleanup.
brycegbrazen Jan 22, 2024
893c804
Add cache invalidation section and more reorg
brycegbrazen Jan 22, 2024
d8a0c89
Add section about cache contents
brycegbrazen Jan 22, 2024
07c179c
Update configuration section. NOTE: The caching cross reference isnt …
brycegbrazen Jan 25, 2024
70820b9
Remove extra = from Caching header
brycegbrazen Jan 25, 2024
d8cf7fb
Move validating operation below cache invalidation.
brycegbrazen Jan 25, 2024
3714703
Cleanup extra - lines
brycegbrazen Jan 25, 2024
4bb926f
Reword stats section
brycegbrazen Jan 25, 2024
a0f5ad7
Move benefits/downsides and dont make them sections.
brycegbrazen Jan 25, 2024
4c0fca8
Add info about memcached, wording fixes
brycegbrazen Jan 25, 2024
ebc2cec
Small wording change
brycegbrazen Jan 25, 2024
be3781e
Small wording change
brycegbrazen Jan 25, 2024
af03fa3
Cleanup some package caching wording
brycegbrazen Jan 25, 2024
2849742
Sentence rewording
brycegbrazen Jan 26, 2024
276c9b5
Update wording of setup for resolve caching
brycegbrazen Jan 26, 2024
8afe28e
* Fix caching reference
JeanChristopheMorinPerso Jan 27, 2024
dbeda38
More tweaks
JeanChristopheMorinPerso Jan 27, 2024
ae7d711
Quick typo fix
brycegbrazen Jan 29, 2024
75997b1
Address wording issue
brycegbrazen Jan 29, 2024
270 changes: 270 additions & 0 deletions docs/source/caching.rst
@@ -0,0 +1,270 @@
=================
Caching
=================

Resolve Caching
===============

Resolve caching is a feature that caches resolves to a memcached (in-memory) server. Because the server is in-memory,
the full contents of the cache are lost if the memcached service shuts down by any means.

Cache contents
--------------
The following information is stored on the memcached server for each solve:

* **Solver**: information about the previously cached solve.
* **Release times**: information about when each package variant in the resolve was last released.
* **Variant states**: information about the state of each variant. For example, in the ``filesystem``
  repository type, a variant's state is the last-modified time of its associated file (such as its
  ``package.py``). If the state of any variant has changed since the resolve was cached (for example,
  because a file was modified), the cached resolve is discarded.
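
The variant-state check above can be illustrated with a small, hypothetical model. It is not rez's implementation: a plain ``dict`` stands in for the memcached server, and a variant's state is assumed to be the modification time of its ``package.py``, as in the ``filesystem`` example.

.. code-block:: python

   import os

   # Hypothetical stand-in for the memcached server: request -> cache entry.
   fake_memcache = {}


   def variant_state(package_file):
       """State of a filesystem variant: the mtime of its package definition file."""
       return os.path.getmtime(package_file)


   def store_resolve(request, resolved, package_files):
       """Cache a resolve together with the current state of each variant's file."""
       fake_memcache[request] = {
           "solver": resolved,
           "variant_states": {f: variant_state(f) for f in package_files},
       }


   def load_resolve(request):
       """Return the cached resolve, or None if it is missing or any state changed."""
       entry = fake_memcache.get(request)
       if entry is None:
           return None
       for package_file, state in entry["variant_states"].items():
           if variant_state(package_file) != state:
               del fake_memcache[request]  # discard the stale entry
               return None
       return entry["solver"]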

Setup
-----

To enable memcached caching, configure the :data:`memcached_uri` config variable. This variable accepts a list of memcached server URIs, or ``None``. For example, with memcached running on localhost on its default port:

.. code-block:: python

   memcached_uri = ["127.0.0.1:11211"]

This is the only parameter you need to configure to enable caching of the content and location of package file definitions and resolutions in Rez.
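
Before pointing rez at a server, you may want to confirm that it is reachable. The sketch below is not part of rez: it is a hypothetical helper that splits a ``memcached_uri`` entry into host and port (assuming the ``host:port`` form shown above and defaulting to port 11211), then sends memcached's plain-text ``version`` command.

.. code-block:: python

   import socket


   def parse_memcached_uri(uri, default_port=11211):
       """Split a "host:port" entry from memcached_uri into (host, port)."""
       host, sep, port = uri.rpartition(":")
       if not sep:
           return uri, default_port
       return host, int(port)


   def memcached_version(uri, timeout=2.0):
       """Ask a memcached server for its version using the text protocol."""
       host, port = parse_memcached_uri(uri)
       with socket.create_connection((host, port), timeout=timeout) as sock:
           sock.sendall(b"version\r\n")
           reply = sock.recv(1024).decode("ascii", "replace").strip()
       # A healthy server answers with a single "VERSION <x.y.z>" line.
       if not reply.startswith("VERSION "):
           raise RuntimeError("unexpected reply from %s: %r" % (uri, reply))
       return reply.split(" ", 1)[1]


   if __name__ == "__main__":
       for uri in ["127.0.0.1:11211"]:
           try:
               print(uri, "->", memcached_version(uri))
           except OSError as exc:
               print(uri, "-> unreachable:", exc)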

Configuration
-------------
Several config variables modify the default behavior of resolve caching:

* :data:`resolve_caching`
* :data:`cache_package_files`
* :data:`cache_listdir`
* :data:`resource_caching_maxsize`
* :data:`memcached_package_file_min_compress_len`
* :data:`memcached_context_file_min_compress_len`
* :data:`memcached_listdir_min_compress_len`
* :data:`memcached_resolve_min_compress_len`


Validating operation
--------------------------
To print debugging information about memcached usage, you can temporarily set the following environment variable in your shell:

.. code-block:: console

   # Linux/macOS (bash)
   export REZ_DEBUG_MEMCACHE=1

   # PowerShell
   $env:REZ_DEBUG_MEMCACHE=1

Alternatively, set :data:`debug_memcache` to ``True`` in your ``rezconfig.py``.

Cache invalidation
----------------------
Cache entries will automatically be invalidated when a newer package version is released that would change the result
of an existing resolve.

For example, say you run ``rez-env`` with the request ``foo-1+<2``, and initially the only available
``foo`` version is ``1.0.0``, so the cached resolve points to ``1.0.0``. If you later release a new
version ``1.0.1``, the cache entry for the request ``foo-1+<2`` is invalidated, and the next resolve
correctly picks up version ``1.0.1``.
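
The release-time check can be sketched as a toy model. This is not rez's implementation: ``releases``, ``cache`` and the trivial ``solve`` function are hypothetical stand-ins, and each cache entry records the latest release time of every package family in its result, in the spirit of the release-time information listed under Cache contents.

.. code-block:: python

   # Hypothetical repository contents: family -> {version: release timestamp}.
   releases = {"foo": {"1.0.0": 100}}

   # Hypothetical stand-in for the resolve cache: request -> cache entry.
   cache = {}


   def latest_release_time(family):
       return max(releases[family].values())


   def solve(request):
       """Toy solver: picks the highest released version of the requested family.

       It ignores the version range, which is enough here because every
       released ``foo`` version satisfies ``foo-1+<2``.
       """
       family = request.split("-", 1)[0]
       version = max(releases[family], key=lambda v: tuple(map(int, v.split("."))))
       return {family: version}


   def resolve(request):
       entry = cache.get(request)
       if entry is not None and all(
           latest_release_time(family) == released
           for family, released in entry["release_times"].items()
       ):
           return entry["result"], "cached"
       # Missing or stale: solve again and record the current release times.
       result = solve(request)
       cache[request] = {
           "result": result,
           "release_times": {family: latest_release_time(family) for family in result},
       }
       return result, "solved"

Calling ``resolve("foo-1+<2")`` twice returns the cached result the second time; adding ``"1.0.1"`` to ``releases["foo"]`` with a later timestamp makes the next call solve again.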


Show stats from memcached server
--------------------------------
Rez provides a command-line tool, :ref:`rez-memcache`, for querying the memcached server and obtaining status information and statistics.

.. code-block:: console

   $ rez-memcache

   CACHE SERVER               UPTIME      HITS      MISSES  HIT RATIO  MEMORY  USED
   ------------               ------      ----      ------  ---------  ------  ----
   127.0.0.1:11211            20 hours    27690     5205    84%        119 Gb  10 Mb (0%)
   central.example.com:11211  6.2 months  19145089  456     99%        64 Mb   1.9 Mb (2%)
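
The HIT RATIO column can be reproduced from the HITS and MISSES columns. The sketch below assumes the ratio is hits / (hits + misses), truncated to a whole percent; truncation rather than rounding is an assumption, although it matches both rows above (and the 1.9 Mb of 64 Mb shown as 2%).

.. code-block:: python

   def percent(part, whole):
       """Whole-number percentage, truncated toward zero (assumed display rule)."""
       return int(100 * part / whole) if whole else 0


   def hit_ratio(hits, misses):
       """Share of lookups served from the cache."""
       return percent(hits, hits + misses)


   for server, hits, misses in [("127.0.0.1:11211", 27690, 5205),
                                ("central.example.com:11211", 19145089, 456)]:
       print(server, "%d%%" % hit_ratio(hits, misses))
   # 127.0.0.1:11211 84%
   # central.example.com:11211 99%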

Benefits
--------
In a studio environment (with many machines), any machine that performs a solve that is already in the
resolve cache simply receives the cached result instead of performing a re-solve.

Downsides
---------
Resolve caching has almost no downsides. Issues arise only in rare edge cases where you have to "hack" a
released package into production. Because resolves are cached, you may then receive a different package than
you expect. In such cases, it's best to manually invalidate the cache.

.. _package-caching:

Package Caching
===============

Package caching is a feature that copies package payloads onto local disk in
order to speed up runtime environments. For example, if your released packages
reside on shared storage (which is common), then running, say, a Python process
fetches all source from the shared storage across your network. The point of
the cache is to copy that content locally instead, and avoid the network cost.

.. note::
   Package caching does **NOT** cache package definitions, only their payloads
   (i.e., the package root directory).

Build behavior
--------------

Package caching during a package build is disabled by default. To enable caching during
a package build, you can set :data:`package_cache_during_build` to True.

.. _enabling-package-caching:

Enabling Package Caching
------------------------

Package caching is not enabled by default. To enable it, you need to configure
:data:`cache_packages_path` to specify a path to
store the cache in.
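
For example, a ``rezconfig.py`` that enables package caching might contain the following. The path is a hypothetical example, and the commented-out line is the optional build setting described under Build behavior.

.. code-block:: python

   # rezconfig.py (sketch): the cache location is a hypothetical example path.
   cache_packages_path = "/home/ajohns/package_cache"

   # Optional: also use the package cache during package builds.
   # package_cache_during_build = True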

You also have granular control over whether an individual package will or will
not be cached. To make a package cachable, set :attr:`cachable`
to ``True`` in its package definition file. Reasons you may *not* want a package to be cached include
packages that are large, or that aren't relocatable because other compiled packages are
linked to them in a way that doesn't support library relocation.
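
For example, a package that should never be cached could opt out in its ``package.py``. The package name, version and the reason given in the comment are hypothetical.

.. code-block:: python

   # package.py (sketch): hypothetical package that opts out of package caching.
   name = "bigpkg"
   version = "1.0.0"

   # Large payload that other compiled packages link against in a way that does
   # not support relocation, so it should not be copied into the package cache.
   cachable = False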

There are also config settings that affect cachability in the event that :attr:`cachable`
is not defined in a package's definition. For example, see
:data:`default_cachable`, :data:`default_cachable_per_package`
and :data:`default_cachable_per_repository`.
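
A hedged sketch of how these defaults might be set together in ``rezconfig.py``. The values are illustrative, and the shapes of the two dictionaries (keyed by package name and by repository location, respectively) are assumptions; check the configuration reference for the exact format and precedence.

.. code-block:: python

   # rezconfig.py (sketch): illustrative values only.

   # Fallback, assumed to apply when no more specific setting matches.
   default_cachable = True

   # Assumed shape: package family name -> bool.
   default_cachable_per_package = {"maya": False}

   # Assumed shape: package repository location -> bool.
   default_cachable_per_repository = {"/svr/packages": True}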

Note that you can also disable package caching on the command line, using
:option:`rez-env --no-pkg-cache`.

Verifying
---------

When you resolve an environment, you can see which variants have been cached by
noting the ``cached`` label in the right-hand column of the :ref:`rez-context` output,
as shown below:

.. code-block:: console

   $ rez-env Flask

   You are now in a rez-configured environment.

   requested packages:
   Flask
   ~platform==linux   (implicit)
   ~arch==x86_64      (implicit)
   ~os==Ubuntu-16.04  (implicit)

   resolved packages:
   Flask-1.1.2         /home/ajohns/package_cache/Flask/1.1.2/d998/a (cached)
   Jinja2-2.11.2       /home/ajohns/package_cache/Jinja2/2.11.2/6087/a (cached)
   MarkupSafe-1.1.1    /svr/packages/MarkupSafe/1.1.1/d9e9d80193dcd9578844ec4c2c22c9366ef0b88a
   Werkzeug-1.0.1      /home/ajohns/package_cache/Werkzeug/1.0.1/fe76/a (cached)
   arch-x86_64         /home/ajohns/package_cache/arch/x86_64/6450/a (cached)
   click-7.1.2         /home/ajohns/package_cache/click/7.1.2/0da2/a (cached)
   itsdangerous-1.1.0  /home/ajohns/package_cache/itsdangerous/1.1.0/b23f/a (cached)
   platform-linux      /home/ajohns/package_cache/platform/linux/9d4d/a (cached)
   python-3.7.4        /home/ajohns/package_cache/python/3.7.4/ce1c/a (cached)

For reference, cached packages also have their original payload location stored to
an environment variable like so:

.. code-block:: console

   $ echo $REZ_FLASK_ORIG_ROOT
   /svr/packages/Flask/1.1.2/88a70aca30cb79a278872594adf043dc6c40af99
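
In a script, you might look up that variable for a given package. The helper below is hypothetical and assumes the variable name is built by upper-casing the package name, as in ``REZ_FLASK_ORIG_ROOT``; names containing characters that are not valid in environment variable names may be handled differently.

.. code-block:: python

   import os


   def original_root(package_name, environ=None):
       """Return the original payload root recorded for a cached package, or None.

       Assumes the variable is named REZ_<NAME>_ORIG_ROOT with the package name
       upper-cased, as in the REZ_FLASK_ORIG_ROOT example.
       """
       environ = os.environ if environ is None else environ
       return environ.get("REZ_%s_ORIG_ROOT" % package_name.upper())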

How it Works
------------

Package caching actually caches :doc:`variants`, not entire packages. When you perform
a resolve, or source an existing context, the required variants are copied to
local disk asynchronously (if they are cachable), in a separate process called
:ref:`rez-pkg-cache`. This means that the first resolve that needs a variant will
not necessarily use a cached copy of it. Package caching is intended to have
a cumulative effect, so that more cached variants are used over time. This is
a tradeoff that avoids blocking resolves while variant payloads are copied across
your network (which can be a slow process).
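
The non-blocking tradeoff can be illustrated with a toy model. This is not how rez is implemented: rez copies payloads in the separate :ref:`rez-pkg-cache` process, whereas this sketch uses a background thread, and ``ToyPackageCache`` is a hypothetical name. A lookup keeps returning the original payload path until a copy has completed; only later lookups see the cached path.

.. code-block:: python

   import os
   import shutil
   import threading


   class ToyPackageCache:
       """Hypothetical model of non-blocking payload caching (not rez's code)."""

       def __init__(self, cache_root):
           self.cache_root = cache_root
           self.workers = []

       def _copy(self, src, dst):
           partial = dst + ".partial"
           shutil.copytree(src, partial)
           os.rename(partial, dst)  # publish the copy only once it is complete

       def root_for(self, name, payload_root):
           """Return the payload path to use right now."""
           cached = os.path.join(self.cache_root, name)
           if os.path.isdir(cached):
               return cached  # an earlier copy finished: use the local payload
           if not os.path.exists(cached + ".partial"):
               # Start copying in the background and return immediately.
               worker = threading.Thread(target=self._copy, args=(payload_root, cached))
               worker.start()
               self.workers.append(worker)
           return payload_root  # this lookup keeps using the original location

The ``.partial`` directory plays a role loosely similar to the **copying** status described below: a half-finished copy is never handed out as a cached payload.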

Note that a package cache is **not** a package repository. It is simply a store
of variant payloads, structured in such a way as to be able to store variants from
any package repository, into the one shared cache.

Variants that are cached are assumed to be immutable. No check is done to see if
a variant's payload has changed, and needs to replace an existing cache entry. So
you should **not** enable caching on package repositories where packages may get
overwritten. It is for this reason that caching is disabled for local packages by
default (see :data:`package_cache_local`).

Commandline Tool
----------------

Inspection
++++++++++

Use the :ref:`rez-pkg-cache` tool to view the state of the cache, and to perform
warming and deletion operations. Example output follows:

.. code-block:: console

   $ rez-pkg-cache
   Package cache at /home/ajohns/package_cache:

   status   package             variant uri                                     cache path
   ------   -------             -----------                                     ----------
   cached   Flask-1.1.2         /svr/packages/Flask/1.1.2/package.py[0]         /home/ajohns/package_cache/Flask/1.1.2/d998/a
   cached   Jinja2-2.11.2       /svr/packages/Jinja2/2.11.2/package.py[0]       /home/ajohns/package_cache/Jinja2/2.11.2/6087/a
   cached   Werkzeug-1.0.1      /svr/packages/Werkzeug/1.0.1/package.py[0]      /home/ajohns/package_cache/Werkzeug/1.0.1/fe76/a
   cached   arch-x86_64         /svr/packages/arch/x86_64/package.py[]          /home/ajohns/package_cache/arch/x86_64/6450/a
   cached   click-7.1.2         /svr/packages/click/7.1.2/package.py[0]         /home/ajohns/package_cache/click/7.1.2/0da2/a
   cached   itsdangerous-1.1.0  /svr/packages/itsdangerous/1.1.0/package.py[0]  /home/ajohns/package_cache/itsdangerous/1.1.0/b23f/a
   cached   platform-linux      /svr/packages/platform/linux/package.py[]       /home/ajohns/package_cache/platform/linux/9d4d/a
   copying  python-3.7.4        /svr/packages/python/3.7.4/package.py[0]        /home/ajohns/package_cache/python/3.7.4/ce1c/a
   stalled  MarkupSafe-1.1.1    /svr/packages/MarkupSafe/1.1.1/package.py[1]    /home/ajohns/package_cache/MarkupSafe/1.1.1/724c/a

Each variant is stored in a directory based on a partial hash of that variant's
unique identifier (its "handle"). The package cache is thread- and multiprocess-safe,
and uses a file lock to control access where necessary.
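
A hypothetical sketch of such a layout, matching the shape of the cache paths above (``<cache>/<name>/<version>/<partial hash>/a``). The choice of SHA-1 over sorted-key JSON, the four-digit truncation, the example handle fields and the meaning of the trailing letter are all assumptions, so this will not reproduce rez's actual directory names.

.. code-block:: python

   import hashlib
   import json
   import os


   def variant_cache_dir(cache_root, name, version, handle):
       """Hypothetical <root>/<name>/<version>/<partial hash>/<slot> cache path."""
       blob = json.dumps(handle, sort_keys=True).encode("utf-8")
       partial_hash = hashlib.sha1(blob).hexdigest()[:4]  # 4 hex digits, as above
       # Assumed: "a" is the first slot; another letter would be needed if a
       # different variant already occupied the same partial hash.
       return os.path.join(cache_root, name, version, partial_hash, "a")

Hashing the handle, rather than the payload contents, keeps the lookup cheap: the directory can be computed without reading the payload at all.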

Cached variants have one of the following statuses at any given time:

* **copying**: The variant is in the process of being copied into the cache, and is not
  yet available for use;
* **cached**: The variant has been cached and is ready for use;
* **stalled**: The variant was being copied, but something went wrong and there is
  now a partial copy present (but unused) in the cache.

Logging
+++++++

Caching operations are logged to files within the cache directory. To view them:

.. code-block:: console

   $ rez-pkg-cache --logs
   rez-pkg-cache 2020-05-23 16:17:45,194 PID-29827 INFO Started daemon
   rez-pkg-cache 2020-05-23 16:17:45,201 PID-29827 INFO Started caching of variant /home/ajohns/packages/Werkzeug/1.0.1/package.py[0]...
   rez-pkg-cache 2020-05-23 16:17:45,404 PID-29827 INFO Cached variant to /home/ajohns/package_cache/Werkzeug/1.0.1/fe76/a in 0.202576 seconds
   rez-pkg-cache 2020-05-23 16:17:45,404 PID-29827 INFO Started caching of variant /home/ajohns/packages/python/3.7.4/package.py[0]...
   rez-pkg-cache 2020-05-23 16:17:46,006 PID-29827 INFO Cached variant to /home/ajohns/package_cache/python/3.7.4/ce1c/a in 0.602037 seconds
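
To analyze these logs, a small parser can extract per-variant copy times. The regular expression below is an assumption based only on the sample lines above, and other message types are ignored.

.. code-block:: python

   import re

   # Assumed format, based only on the sample "Cached variant to ..." lines above.
   CACHED_RE = re.compile(
       r"INFO Cached variant to (?P<path>\S+) in (?P<secs>[\d.]+) seconds")


   def caching_durations(log_lines):
       """Map each cached variant path to its copy time in seconds."""
       durations = {}
       for line in log_lines:
           match = CACHED_RE.search(line)
           if match:
               durations[match.group("path")] = float(match.group("secs"))
       return durations

You could feed it, for example, ``subprocess.run(["rez-pkg-cache", "--logs"], capture_output=True, text=True).stdout.splitlines()``.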

Cleaning The Cache
++++++++++++++++++

Cleaning the cache refers to deleting variants that are stalled or no longer in use.
It isn't really possible to know whether a variant is in use, so there is a
configurable :data:`package_cache_max_variant_days` setting that deletes variants
that have not been used (i.e., that have not appeared in a created or sourced context)
for more than N days.
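
The selection rule can be sketched as follows. This hypothetical helper models an explicit :option:`rez-pkg-cache --clean`, which removes stalled variants as well as cached variants unused for longer than ``package_cache_max_variant_days``; the ``(status, last_used)`` input format is an assumption.

.. code-block:: python

   import time

   SECONDS_PER_DAY = 24 * 60 * 60


   def variants_to_clean(variants, max_variant_days, now=None):
       """Pick cache entries a clean would delete (simplified sketch).

       ``variants`` maps a cache path to (status, last_used_timestamp). Stalled
       entries are always selected; cached entries are selected when unused for
       more than ``max_variant_days`` days. Entries still copying are left alone.
       """
       now = time.time() if now is None else now
       cutoff = now - max_variant_days * SECONDS_PER_DAY
       return sorted(
           path
           for path, (status, last_used) in variants.items()
           if status == "stalled" or (status == "cached" and last_used < cutoff)
       )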

You can also manually remove variants from the cache using :option:`rez-pkg-cache -r`.
Note that when you do this, the variant is no longer available from the cache,
but it is still stored on disk. You must perform a clean (:option:`rez-pkg-cache --clean`)
to purge unused cache files from disk.

You can use the :data:`package_cache_clean_limit`
setting to asynchronously perform some cleanup every time the cache is updated. If
you do not use this setting, it is recommended that you set up cron or another
scheduler to run :option:`rez-pkg-cache --clean` periodically. Otherwise,
your cache will grow indefinitely.

Lastly, note that rez will not attempt to re-cache a stalled variant until a clean
operation removes it. Using :data:`package_cache_clean_limit` will not clean
stalled variants either, as that could result in a problematic variant getting
cached, then stalled, then deleted, then cached again, and so on. You must run
:option:`rez-pkg-cache --clean` to delete stalled variants.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -28,6 +28,7 @@ Welcome to rez's documentation!
context_bundles
suites
managing_packages
caching
pip

.. toctree::