diff --git a/docs/user_guide/flyte_fundamentals/optimizing_tasks.md b/docs/user_guide/flyte_fundamentals/optimizing_tasks.md
index b6b41514ba..15ed1c0555 100644
--- a/docs/user_guide/flyte_fundamentals/optimizing_tasks.md
+++ b/docs/user_guide/flyte_fundamentals/optimizing_tasks.md
@@ -52,26 +52,26 @@ represents the cache key. Learn more in the {ref}`User Guide float:
@@ -80,10 +80,12 @@ def compute_mean(data: List[float]) -> float:
     return sum(data) / len(data)
 ```

-```{note}
-Retries only take effect when running a task on a Flyte cluster.
-See {ref}`Fault Tolerance ` for details on the types of errors that will be retried.
-```
+
+- **System Errors**: Managed at the platform level through settings like `max-node-retries-system-failures` in the FlytePropeller configuration. This setting helps manage retries without requiring changes to the task code.
+
+  Additionally, the `interruptible-failure-threshold` option in the node-config key defines how many system-level retries are considered interruptible. This is particularly useful for tasks running on preemptible instances.
+
+  For more details, refer to the [Flyte Propeller Configuration](https://docs.flyte.org/en/latest/deployment/configuration/generated/flytepropeller_config.html#config-nodeconfig).

 ### Interruptible Tasks and Map Tasks

@@ -92,9 +94,13 @@ Tasks marked as interruptible can be preempted and retried without counting agai

 For map tasks, the interruptible behavior aligns with that of regular tasks. The `retries` field in the task annotation is not necessary for handling SYSTEM errors, as these are managed by the platform's configuration. Alternatively, the USER budget is set by defining retries in the task decorator.

+Map Tasks: The behavior of interruptible tasks extends seamlessly to map tasks. The platform's configuration manages SYSTEM errors, ensuring consistency across task types without additional task-level settings.
+
 ### Advanced Retry Policies

-Flyte also supports advanced retry policies that allow finer control over retry behavior, such as defining a threshold for interruptible failures. This means you can specify how many retries should be considered as interruptible before marking a task as non-interruptible. Refer this for details: [Flyte Propeller Configuration](https://docs.flyte.org/en/latest/deployment/configuration/generated/flytepropeller_config.html).
+Flyte supports advanced configurations that allow more granular control over retry behavior, such as specifying the number of retries that can be interruptible. This advanced setup helps in finely tuning the task executions based on the criticality and resource availability.
+
+For a deeper dive into configuring retries and understanding their impact, see the [Fault Tolerance](https://docs.flyte.org/en/latest/concepts/fault-tolerance.html) section in the Flyte documentation.

 ## Timeouts