[Docs] Simplifying for better user understanding (#5878)
* Doc simplifying for better user understanding

Signed-off-by: 10sharmashivam <[email protected]>

* Caching Docs

Signed-off-by: 10sharmashivam <[email protected]>

* Reviewed changes and suggestions applied

Signed-off-by: 10sharmashivam <[email protected]>

---------

Signed-off-by: 10sharmashivam <[email protected]>
10sharmashivam authored Oct 23, 2024
1 parent 7e78e71 commit 8bd573e
Showing 1 changed file with 12 additions and 4 deletions.
16 changes: 12 additions & 4 deletions docs/user_guide/development_lifecycle/caching.md
@@ -19,23 +19,31 @@ Let's watch a brief explanation of caching and a demo in this video, followed by
```

### Input Caching

In Flyte, input caching allows tasks to automatically cache the input data required for execution. This feature is particularly useful in scenarios where tasks may need to be re-executed, such as during retries due to failures or when manually triggered by users. By caching input data, Flyte optimizes workflow performance and resource usage, preventing unnecessary recomputation of task inputs.

### Output Caching

Output caching in Flyte allows users to cache the results of tasks to avoid redundant computations. This feature is especially valuable for tasks that perform expensive or time-consuming operations where the results are unlikely to change frequently.

There are four parameters and one command-line flag related to caching.

## Parameters

* `cache` (`bool`): Enables or disables caching of the workflow, task, or launch plan.
By default, caching is disabled to avoid unintended consequences when caching executions with side effects.
To enable caching set `cache=True`.
To enable caching, set `cache=True`.
* `cache_version` (`str`): Part of the cache key.
A change to this parameter will invalidate the cache.
Changing this version number tells Flyte to ignore previous cached results and run the task again if the task's function has changed.
This allows you to explicitly indicate when a change has been made to the task that should invalidate any existing cached results.
Note that this is not the only change that will invalidate the cache (see below).
Also, note that you can manually trigger cache invalidation per execution using the [`overwrite-cache` flag](#overwrite-cache-flag).
* `cache_serialize` (`bool`): Enables or disables [cache serialization](./cache_serializing).
When enabled, Flyte ensures that a single instance of the task is run before any other instances that would otherwise run concurrently.
This allows the initial instance to cache its result and lets the later instances reuse the resulting cached outputs.
Cache serialization is disabled by default.
* `cache_ignore_input_vars` (`Tuple[str, ...]`): Input variables that should not be included when calculating hash for cache. By default, no input variables are ignored. This parameter only applies to task serialization.
* `cache_ignore_input_vars` (`Tuple[str, ...]`): Input variables that Flyte should ignore when deciding if a task’s result can be reused (hash calculation). By default, no input variables are ignored. This parameter only applies to task serialization.
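The behavior described for `cache_serialize` can be pictured with a small standard-library sketch. This is only a toy model of the idea (one instance computes, concurrent duplicates wait and reuse the result), not how Flyte implements it; `SerializedCache`, `get_or_compute`, and the `"task-key"` string are hypothetical names introduced for illustration.

```python
import threading
from typing import Callable, Dict, List


class SerializedCache:
    """Toy model: one caller computes a given key, the others wait and reuse it."""

    def __init__(self) -> None:
        self._results: Dict[str, int] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def get_or_compute(self, key: str, compute: Callable[[], int]) -> int:
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:  # concurrent callers with the same key queue up here
            if key not in self._results:
                self._results[key] = compute()
            return self._results[key]


calls: List[int] = []  # records how many times the expensive work really ran


def expensive() -> int:
    calls.append(1)
    return 42


cache = SerializedCache()
outputs: List[int] = []


def worker() -> None:
    outputs.append(cache.get_or_compute("task-key", expensive))


threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Five concurrent callers share one key, so `expensive` runs once and every caller receives the same cached value.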

Task caching parameters can be specified at task definition time within the `@task` decorator or at task invocation time using the `with_overrides` method.
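To make the roles of `cache_version` and `cache_ignore_input_vars` concrete, here is a minimal standard-library sketch of a cache key. It is a simplified model under stated assumptions, not Flyte's actual hashing scheme: the `cache_key` helper, the JSON payload layout, and the use of SHA-256 are all hypothetical choices made for illustration.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def cache_key(
    task_name: str,
    cache_version: str,
    inputs: Dict[str, Any],
    cache_ignore_input_vars: Tuple[str, ...] = (),
) -> str:
    """Toy cache key: hash of task name, cache version, and non-ignored inputs."""
    # Drop ignored inputs so they cannot influence the key.
    hashed_inputs = {
        name: value
        for name, value in inputs.items()
        if name not in cache_ignore_input_vars
    }
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": hashed_inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same task and version; only the ignored input differs, so the keys match.
base = cache_key("square", "1.0", {"n": 3, "verbose": True}, ("verbose",))
same_hit = cache_key("square", "1.0", {"n": 3, "verbose": False}, ("verbose",))

# Bumping the version produces a different key, which models invalidation.
bumped = cache_key("square", "2.0", {"n": 3, "verbose": True}, ("verbose",))
```

In this model, a changed `cache_version` yields a new key (a cache miss), while a change to an ignored input leaves the key untouched (a cache hit).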

@@ -127,7 +135,7 @@ Task executions can be cached across different versions of the task because a ch

### How does local caching work?

The flytekit package uses the [diskcache](https://github.com/grantjenks/python-diskcache) package, specifically [diskcache.Cache](http://www.grantjenks.com/docs/diskcache/tutorial.html#cache), to aid in the memoization of task executions. The results of local task executions are stored under `~/.flyte/local-cache/` and cache keys are composed of **Cache Version**, **Task Signature**, and **Task Input Values**.
Flyte uses a tool called [diskcache](https://github.com/grantjenks/python-diskcache), specifically [diskcache.Cache](http://www.grantjenks.com/docs/diskcache/tutorial.html#cache), to save task results so that they don’t need to be recomputed when the same task is executed again, a technique known as memoization. The results of local task executions are stored under `~/.flyte/local-cache/`, and cache keys are composed of **Cache Version**, **Task Signature**, and **Task Input Values**.

Similar to the remote case, a local cache entry for a task will be invalidated if either the `cache_version` or the task signature is modified. In addition, the local cache can also be emptied by running the following command: `pyflyte local-cache clear`, which essentially obliterates the contents of the `~/.flyte/local-cache/` directory.
To disable the local cache, you can set the `local.cache_enabled` config option (e.g. by setting the environment variable `FLYTE_LOCAL_CACHE_ENABLED=False`).
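The local caching behavior described above can be sketched with the standard library alone. This is an illustrative stand-in, not flytekit's code: it uses `pickle` files in a temporary directory instead of `diskcache` and `~/.flyte/local-cache/`, and the `LocalCache` class with its `call` and `clear` methods is hypothetical.

```python
import hashlib
import inspect
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple


class LocalCache:
    """Toy on-disk memoizer keyed by cache version, signature, and input values."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, fn: Callable[..., Any], cache_version: str, kwargs: Dict[str, Any]) -> Path:
        # The key combines cache version, task signature, and task input values.
        raw = repr(
            (cache_version, fn.__name__, str(inspect.signature(fn)), sorted(kwargs.items()))
        )
        return self.directory / hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def call(self, fn: Callable[..., Any], cache_version: str, **kwargs: Any) -> Any:
        path = self._path(fn, cache_version, kwargs)
        if path.exists():  # cache hit: reuse the stored result
            return pickle.loads(path.read_bytes())
        result = fn(**kwargs)  # cache miss: run the task and store its output
        path.write_bytes(pickle.dumps(result))
        return result

    def clear(self) -> None:
        # Rough analogue of emptying the local cache directory.
        for entry in self.directory.iterdir():
            entry.unlink()


executions: List[Tuple[int, int]] = []  # records each real execution


def add(a: int, b: int) -> int:
    executions.append((a, b))
    return a + b


cache = LocalCache(Path(tempfile.mkdtemp()) / "local-cache")
first = cache.call(add, "1.0", a=1, b=2)   # miss: executes the function
second = cache.call(add, "1.0", a=1, b=2)  # hit: served from disk
third = cache.call(add, "2.0", a=1, b=2)   # version changed: miss
cache.clear()
fourth = cache.call(add, "2.0", a=1, b=2)  # cache emptied: miss
```

Only three of the four calls execute `add`: the second call is a hit, while the version bump and the `clear()` each force a fresh execution.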
