# [Alpha] - v0.16.0a2 (pre-release, almost final)
We've been hard at work over the holidays putting this together. This will be the final alpha release for the new natively typed Flytekit. Please see the two earlier releases as well as the proposal doc, which will be updated again and finalized for the coming beta release; we expect to make that release by the end of next week.
## Changes

### Potentially Breaking

First, the updates in this release that may break your existing code, and what changes you'll need to make.
- Task settings have been condensed into `TaskConfig` subclasses, as opposed to just indiscriminate `kwargs`. For instance:

  ```python
  from flytekit.taskplugins.hive.task import HiveConfig

  HiveTask(
      # cluster_label="flyte",  # (old)
      task_config=HiveConfig(cluster_label="flyte"),  # (new)
      ...
  )
  ```
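  The `TaskConfig` pattern simply groups a plugin's settings into one typed object instead of loose keyword arguments. Here is a plain-Python sketch of the idea, using illustrative stand-in classes (this is not the actual flytekit implementation):

  ```python
  from dataclasses import dataclass


  @dataclass
  class HiveConfig:
      # All Hive-specific settings live in one typed object.
      cluster_label: str


  class HiveTask:
      def __init__(self, task_config: HiveConfig):
          # The task receives a single typed config rather than loose kwargs,
          # so unknown or misspelled settings fail loudly at construction time.
          self.task_config = task_config


  task = HiveTask(task_config=HiveConfig(cluster_label="flyte"))
  print(task.task_config.cluster_label)  # flyte
  ```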
- Spark session functionality has been nested inside the Spark context:

  ```python
  sess = flytekit.current_context().spark_session
  # count = sess.parallelize(...)  # (old)
  count = sess.sparkContext.parallelize(...)  # (new)
  ```
- Tasks that take an explicit metadata argument should be changed to use the `TaskMetadata` dataclass instead of the `metadata()` function:

  ```python
  # from flytekit import metadata  # (old)
  from flytekit import TaskMetadata  # (new)

  # in a task declaration
  WaitForObjectStoreFile(
      # metadata=metadata(retries=2),  # (old)
      metadata=TaskMetadata(retries=2),  # (new)
      ...
  )
  ```
- Types have been moved to a subfolder, so your imports may need to change. This was done because previously, importing one custom type (say `FlyteFile`) would trigger imports of all the custom types in flytekit.

  ```python
  # from flytekit.types import FlyteFile, FlyteSchema  # (old)
  from flytekit.types.file import FlyteFile  # (new)
  from flytekit.types.schema import FlyteSchema
  ```
- Flytekit tasks and workflows by default assign names to the outputs: `o0`, `o1`, etc. If you want to explicitly name your outputs, you can use a `typing.NamedTuple`, like:

  ```python
  rankings = typing.NamedTuple("Rankings", order=int)

  def t1() -> rankings:
      ...
  ```

  Previously though, flytekit was accidentally de-tuple-tizing single-output named tuples. That has been fixed:

  ```python
  r = t1()
  # read_ordering_task(r=r)  # (old)
  read_ordering_task(r=r.order)  # (new)
  ```
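  Outside of any Flyte machinery, the output-naming mechanism is plain `typing.NamedTuple`. A minimal standalone sketch (the field names and values here are made up for illustration), using the list-of-fields form of `NamedTuple`:

  ```python
  import typing

  # Each field name becomes an explicit output name instead of o0, o1, ...
  Rankings = typing.NamedTuple("Rankings", [("order", int), ("score", float)])


  def t1() -> Rankings:
      # A single-output named tuple stays a tuple; fields are read by name.
      return Rankings(order=3, score=0.9)


  r = t1()
  print(r.order, r.score)  # 3 0.9
  ```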
## Process Changes
Registration of Flyte entities (that is, translating your Python tasks, workflows, and launch plans from Python code to something that Flyte understands) has always been a two-step process, even if it looks like one step. The first step is compilation (aka serialization), where users' Python code is compiled down to protobuf files; the second step is sending those files over the wire to the Flyte control plane.
In this release, we've further isolated where certain settings are read and applied. That is, when you call

```shell
pyflyte --config /code/sandbox.config serialize workflows -f /tmp/output
```

the `project`, `domain`, and `version` values are no longer in the compiled protobuf files. `kubernetes-service-account`, `assumable-iam-role`, and `output-location-prefix` have all also been removed. If you inspect the protobuf files via

```shell
flyte-cli parse-proto -f serialized_task.pb -p flyteidl.admin.task_pb2.TaskSpec
```

you should see that those values are now missing. Instead, they will be filled in during the second step, by `flyte-cli`. Your registration command should now look like:
```shell
flyte-cli register-files -p <project> -d <domain> -v <your version, we recommend still the git sha> \
    --kubernetes-service-account <account> OR --assumable-iam-role \
    --output-location-prefix s3://some/bucket -h flyte.host.com serialized_protos/*
```
Note, however, that the container image that tasks run in is still specified at serialization time (and we suggest that the image version also be the same git sha).
This change was made so that serialized protos are completely portable. You can serialize your code, hand the protos to someone else, and as long as they have access to the container images specified in the task specifications, they can register them under a completely different AWS account or cloud provider.
In the near future, we hope to add some automation to our examples repo so that with each release of flytekit, we also publish the Docker image for the cookbook (Second Edition), along with all the serialized artifacts, so that users can run one registration command to pick them all up and play around with them.
## Other Changes
- Resources, Auth, and custom environment variables are now piped through correctly in the task decorator
- Schedules and Notifications added to launch plans
- Reference Entities refactored and Reference Launch Plans added
- Added `FlyteDirectory` as a parallel to `FlyteFile`
- Minor cleanup of some mypy warnings
- Shift operators (aka the `runs_before` function), along with an explicit node-creating call, introduced. For those of you familiar with the existing Flytekit (master), this style should be reminiscent.
- Additional task types ported over
  - Pytorch
  - Sagemaker task and custom training task
  - Sagemaker HPO
- The default node and output names that Flytekit assigns have been condensed from `out0`, `out1`, etc. to `o0`, `o1`, ... Node names have been shortened from `node-0` to `n0`. This is just to save on disk, network, and compute.
- Workflow metadata settings, like failure policy and interruptible, have been added to the workflow decorator.