forked from flyteorg/flyte

Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others, and is truly open source.


palchicz/flyte

This branch is 4044 commits behind flyteorg/flyte:master.

Latest commit: 3ba39c5 · Dec 23, 2021



Flyte

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale.


💥 Introduction

Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable, and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine: it uses the workflow as a core concept and the task (a single unit of execution) as a top-level concept. Multiple tasks arranged in data producer-consumer order create a workflow.
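As a rough illustration of the task/workflow split, here is a minimal sketch using Flytekit's `@task` and `@workflow` decorators. The task names `double` and `add_one` are hypothetical. The `ImportError` fallback (plain pass-through decorators) is an assumption added only so the sketch runs as ordinary Python without Flytekit installed; it does not reproduce Flyte's type checking, caching, or remote execution.

```python
# Minimal sketch: two tasks wired into a workflow in producer-consumer order.
try:
    from flytekit import task, workflow
except ImportError:
    # Assumption: pass-through stand-ins so the sketch still runs as plain
    # Python when Flytekit is not installed. They do no checking or scheduling.
    def task(fn):
        return fn

    def workflow(fn):
        return fn


@task
def double(x: int) -> int:
    """Producer: a single unit of execution."""
    return x * 2


@task
def add_one(x: int) -> int:
    """Consumer: depends on the output of `double`."""
    return x + 1


@workflow
def pipeline(x: int) -> int:
    # Tasks inside a Flytekit workflow are called with keyword arguments;
    # the data dependency (double -> add_one) determines execution order.
    doubled = double(x=x)
    return add_one(x=doubled)


if __name__ == "__main__":
    print(pipeline(x=3))
```

Run locally, `pipeline(x=3)` doubles 3 and then adds one, returning 7. On a Flyte cluster, the same decorated functions would be registered and run as containerized tasks instead.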

Workflows and tasks can be written in any language, with out-of-the-box support for Python, Java, and Scala. Flyte was designed to manage the complexity that arises in data and ML teams and to help them keep up a high velocity of delivering business-impacting features. One way it achieves this is by separating the control plane from the user plane. Every organization can therefore offer Flyte as a service to its end users: infrastructure-focused engineers manage the service, while users work through the intuitive interface of Flytekit.

⏳ Five Reasons to Use Flyte

  • Kubernetes-Native Workflow Automation Platform
  • Ergonomic SDKs in Python, Java & Scala
  • Versioned, Auditable & Reproducible Pipelines
  • Data-Aware and Strongly Typed
  • Resource-Aware and Deployed to Scale with Your Organization

🚀 Quick Start

With Docker and Flytectl installed, run the following command:

  flytectl sandbox start

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: Flyte is ready! Flyte UI is available at http://localhost:30081/console.

Visit http://localhost:30081/console to view the Flyte dashboard.

Here is a quick visual tour of the console:

[Image: Flyte console example]

To dig deeper into Flyte, refer to the Documentation.

⭐️ Current Deployments & Contributors

🛣️ Live Roadmap

The project's live roadmap is available on GitHub (GitHub Live Roadmap).

🔥 Features

  • Used at scale in production by 500+ users on a single deployment, and in production at multiple firms. Proven to scale to more than 1 million executions and 40+ million containers
  • Data-aware and resource-aware (allows organizations to separate concerns: users work with the API, while platform/infra teams manage deployment and scaling)
  • Enables collaboration across your organization by:
    • Executing distributed data pipelines/workflows
    • Making it easy to stitch together workflows from different teams and domain experts and share them across teams
    • Comparing results of training workflows over time and across pipelines
    • Simplifying the complexity of multi-step, multi-owner workflows
  • Get started quickly: start locally and scale to the cloud instantly
  • gRPC / REST interface to define and execute tasks and workflows
  • Type-safe construction of pipelines: each task has an interface characterized by its inputs and outputs, so illegal pipeline constructions fail at declaration time rather than at runtime
  • Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
  • Memoization and Lineage tracking
  • Provides logging and observability
  • Workflow features:
    • Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand
    • Parallel step execution
    • Extensible backend to add customized plugin experience (with simplified user experience)
    • Branching
    • Workflow of workflows - subworkflows (a workflow can be embedded within one node of the top-level workflow)
    • Distributed remote child workflows (a remote workflow can be triggered and statically verified at compile time)
    • Array Tasks (map a function over a large dataset while keeping the execution of thousands of containers under control)
    • Dynamic workflow creation and execution with runtime type safety
    • Flytekit plugins with first-class support in Python
    • Arbitrary Flytekit-less container tasks (RawContainer)
  • Guaranteed reproducibility of pipelines via:
  • Multi-cloud support (AWS, GCP, and others)
  • No single point of failure; resilient by design
  • Automated notifications to Slack, Email, and PagerDuty
  • Multi-K8s-cluster support
  • Out of the box support to run Spark jobs on K8s, Hive queries, etc.
  • Snappy Console & Golang CLI (Flytectl)
  • Written in Golang and optimized for long-running jobs
  • Grafana templates (user/system observability)
  • Deploy with Helm and Kustomize
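The declaration-time type check mentioned in the feature list (each task's interface is characterized by its inputs and outputs) can be illustrated with a toy model. This is a hypothetical sketch, not Flytekit's compiler: `TaskNode`, `connect`, and the example functions are invented for illustration, and the check compares plain Python type annotations only. The key point is that `connect` validates an edge between two tasks before any task body runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, get_type_hints


@dataclass
class TaskNode:
    """Hypothetical node: a function plus its declared input/output types."""
    fn: Callable
    inputs: Dict[str, type] = field(init=False)
    output: type = field(init=False)

    def __post_init__(self) -> None:
        # Read the interface from annotations; "return" is the output type.
        hints = get_type_hints(self.fn)
        self.output = hints.pop("return")
        self.inputs = hints


def connect(producer: TaskNode, consumer: TaskNode, param: str) -> None:
    """Validate a producer -> consumer edge at declaration time."""
    expected = consumer.inputs[param]
    if producer.output is not expected:
        raise TypeError(
            f"{producer.fn.__name__} returns {producer.output.__name__}, "
            f"but {consumer.fn.__name__}.{param} expects {expected.__name__}"
        )


# Hypothetical task bodies; none of them is executed by connect().
def load_count() -> int:
    return 42


def scale(value: int) -> int:
    return value * 10


def format_report(name: str) -> str:
    return f"report for {name}"


if __name__ == "__main__":
    connect(TaskNode(load_count), TaskNode(scale), "value")  # int -> int: accepted
    try:
        connect(TaskNode(load_count), TaskNode(format_report), "name")
    except TypeError as err:
        print(f"rejected at declaration time: {err}")
```

Flyte's real check operates on its own type system (defined in flyteidl) rather than raw Python annotations, so treat this only as a picture of the idea.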

🔌 Available Plugins

📦 Component Repos

Repo | Language | Purpose | Status
flyte | Kustomize, RST | Deployment, documentation, issues | Production-grade
flyteidl | Protobuf | gRPC/REST API, workflow language spec | Production-grade
flytepropeller | Go | Execution engine | Production-grade
flyteadmin | Go | Control plane | Production-grade
flytekit | Python | Python SDK and tools | Production-grade
flyteconsole | TypeScript | Flyte UI | Production-grade
datacatalog | Go | Manage input & output artifacts | Production-grade
flyteplugins | Go | Flyte backend plugins | Production-grade
flytecopilot | Go | Sidecar to manage input/output for SDK-less | Production-grade
flytestdlib | Go | Standard library | Production-grade
flytesnacks | Python | Examples, tips, and tricks | Maintained
flytekit-java | Java/Scala | Java & Scala SDK for authoring Flyte workflows | Incubating
flytectl | Go | A standalone Flyte CLI | Production-grade
homebrew-tap | Ruby | Tap for downloadable Flyte tools (CLI, etc.) | Production-grade
bazel-rules | Skylark/Python | Use Bazel to build Flyte workflows and tasks | Incubating

Functional Tests Matrix

We run a suite of tests (defined in https://github.com/flyteorg/flytesnacks/blob/master/cookbook/flyte_tests_manifest.json) to ensure that basic functionality and a subset of the integrations work across a variety of release versions. These tests run in a cluster where specific versions of the Flyte components, such as console, flyteadmin, datacatalog, and flytepropeller, are installed. The table below has release versions as columns and the result of each test suite as rows.

Workflow group | Nightly
core | status badge
integrations-hive | status badge
integrations-k8s-spark | status badge
integrations-kfpytorch | status badge
integrations-pod | status badge
integrations-pandera_examples | status badge
integrations-papermilltasks | status badge
integrations-greatexpectations | status badge
integrations-sagemaker-pytorch | status badge
integrations-sagemaker-training | status badge

🛣️ RFCs (Requests for Comments) & Proposals

Flyte is community-driven and community-owned software. It is managed by a steering committee and encourages collaboration. The community has a long roadmap for Flyte, but recognizes that you may have other interesting ideas, extensions, or additions to propose. This is usually done starting with a

RFCs are not required for small changes, but they are encouraged for larger ones. If you want to test the waters before proposing, you are welcome to drop into the Slack channel and talk to the community.

🤝 Community & Resources

Here are some resources to help you learn more about Flyte.

Communication Channels

Biweekly Community Sync

  • 📣 Flyte OSS Community Sync: every other Tuesday, 9:00 am to 10:00 am PT. Check out the calendar and register to stay up to date with our meeting times, or join us on Zoom.
  • Upcoming meeting agenda, previous meeting notes, and a backlog of topics are captured in this document.
  • If you'd like to revisit any previous community sync meetings, you can access the video recordings on Flyte's YouTube channel.

Office Hours

Ask us anything about Flyte, weekly on Wednesdays:

Blog Posts

Newsletter

Conference Talks & Podcasts

Conferences

2019

  • KubeCon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform video | deck
  • KubeCon 2019 - Running Large-Scale Stateful Workloads on Kubernetes at Lyft video
  • re:Invent 2019 - Implementing ML Workflows with Kubernetes and Amazon SageMaker video
  • Cloud-Native Machine Learning at Lyft with AWS Batch and Amazon EKS video

2020

2021

  • OSPOCon 2021:
    • Building and Growing an Open Source Community for an Incubating Project video
    • Enforcing Data Quality in Data Processing and ML Pipelines with Flyte and Pandera video
    • Self-serve Feature Engineering Platform Using Flyte and Feast video
    • Efficient Data Parallel Distributed Training with Flyte, Spark & Horovod video
  • KubeCon+CloudNativeCon North America 2021 - How Spotify Leverages Flyte To Coordinate Financial Analytics Company-Wide session
  • PyData Global 2021 - Robust, End-to-end Online Machine Learning Applications with Flytekit, Pandera and Streamlit session
  • ODSC West Reconnect - Deep Dive Into Flyte workshop

2022

  • DataCouncil Austin - Type-Safe Data Processing and Machine Learning Pipelines with Flyte and Pandera session

Podcasts

💖 All Contributors

A big thank you to the community for making Flyte possible!



Languages

  • Python 49.9%
  • Shell 25.1%
  • HCL 13.7%
  • Mustache 5.1%
  • Dockerfile 3.0%
  • Makefile 2.5%
  • CSS 0.7%