Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation around interruptible #527

Merged
merged 1 commit into from
Sep 29, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions rsts/user/features/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,4 @@ Flyte Features
roles
single_task_execution
on_failure_policy
interruptible
50 changes: 50 additions & 0 deletions rsts/user/features/interruptible.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
.. _features-interruptible

Interruptible
####################

What is interruptible?
======================

Interruptible allows users to specify that their tasks are ok to be scheduled on machines that may get preempted such as AWS spot instances.
`Spot instances <https://aws.amazon.com/ec2/spot/?cards.sort-by=item.additionalFields.startDateTime&cards.sort-order=asc>`_ can lead up to 90% savings over on-demand. Anyone looking to realize cost savings should look into interruptible.

What are Spot Instances?
========================

Spot Instances are unused EC2 capacity in AWS. Spot instances are available at up to a 90% discount compared to on-demand prices. The caveat is that at any point these instances can be preempted and no longer be available for use. This can happen due to:

* Price – The Spot price is greater than your maximum price.
* Capacity – If there are not enough unused EC2 instances to meet the demand for Spot Instances, Amazon EC2 interrupts Spot Instances. The order in which the instances are interrupted is determined by Amazon EC2.
* Constraints – If your request includes a constraint such as a launch group or an Availability Zone group, these Spot Instances are terminated as a group when the constraint can no longer be met.

As a general rule of thumb, most spot instances are obtained for around 2 hours (median), with the floor being around 20 minutes, and the ceiling being unbounded duration.

Setting Interruptible
=====================

In order to run your workload on spot, you can set interruptible to True. Example:

.. code-block:: python

@inputs(value_to_print=Types.Integer)
@outputs(out=Types.Integer)
@python_task(cache_version='1', interruptible=True)
def add_one_and_print(workflow_parameters, value_to_print, out):
workflow_parameters.stats.incr("task_run")
added = value_to_print + 1
print("My printed value: {}".format(added))
out.set(added)


By setting this value, Flyte will schedule your task on an ASG with only spot instances. In the case your task gets preempted, Flyte will retry your task on a non-spot instance. This retry will not count towards a retry that a user sets.


What tasks should be set to interruptible?
==========================================

Most Flyte workloads should be good candidates for spot instances. If your task does not exhibit the following properties, then the recommendation would be to set interruptible to true.

* Time sensitive. I need this to run now and can not have any unexpected delays.
* Side Effects. My task is not idempotent and retrying will cause issues.
* Long Running Task. My task takes > 2 hours. Having an interruption during this time frame could potentially waste a lot of computation already done.