Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

flyteorg/flyte

Latest commit

0ec52f9 · Mar 7, 2023

Flyte and LF AI & Data Logo

Flyte

💻 🛳 🚀

Code. Ship. Scale.

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale.



What is Flyte?

Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable, and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine -- it uses workflow as a core concept, and task (a single unit of execution) as a top-level concept. Multiple tasks arranged in a data producer-consumer order create a workflow.

Workflows and Tasks can be written in any language, with out-of-the-box support for Python, Java and Scala. Flyte was designed to manage the complexity that arises in Data and ML teams and to ensure they keep up their high velocity of delivering business-impacting features. One way it achieves this is by separating the control-plane from the user-plane. Thus, every organization can offer Flyte as a service to its end-users, where the service is managed by folks who are more infrastructure-focused, while the users use the intuitive interface of Flytekit.

⁉️ Why Flyte

  • Kubernetes-native workflow automation platform
  • Ergonomic SDKs in Python, Java, and Scala
  • Versioned, auditable, and reproducible pipelines
  • Data-aware and strongly-typed
  • Resource-aware, with deployments at scale
✨ Flyte UI ✨

Quickstart

NOTE: If you want to try Flyte in the browser without installing anything locally, visit the Flyte Hosted Sandbox.


  1. Ensure you're using Python version >=3.7, <=3.10
  2. Install Flytekit, Flyte's Python SDK:
       $ pip install flytekit
  3. Then install flytectl, the CLI for interacting with a Flyte backend:
       $ brew install flyteorg/homebrew-tap/flytectl

You're all set!

For installation instructions for other operating systems and how to run an example workflow, visit the Getting Started guide!

⭐️ Organizations/projects using Flyte (excerpt)

🔥 Features

  • Used at scale in production by multiple firms and projects
  • Proven to scale to more than 1 million executions and 40+ million containers
  • Data Aware and Resource Aware (Allows organizations to separate concerns - users can use the API, platforms/infra teams can manage the deployments and scaling)
  • Enables collaboration across your organization by:
    • Executing distributed data pipelines/workflows
    • Making it easy to stitch together workflows from different teams and domain experts and share them across teams
    • Comparing results of training workflows over time and across pipelines
    • Simplifying the complexity of multi-step, multi-owner workflows
  • Get Started quickly -- start locally and scale to the cloud instantly
  • gRPC / REST interface to define and execute tasks and workflows
  • Typesafe construction of pipelines -- each task has an interface characterized by its input and output, so illegal construction of pipelines fails during declaration, rather than at runtime
  • Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
  • Memoization and Lineage tracking
  • Provides logging and observability
  • Workflow features:
    • Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand
    • Parallel step execution
    • Extensible backend to add customized plugin experience (with simplified user experience)
    • Branching
    • Workflow of workflows - subworkflows (a workflow can be embedded within one node of the top-level workflow)
    • Distributed remote external workflows (a remote workflow can be triggered and statically verified at compile time)
    • Array Tasks (map a function over a large dataset -- ensures controlled execution of thousands of containers)
    • Dynamic workflow creation and execution with runtime type safety
    • Flytekit plugins with first-class support in Python
    • Arbitrary Flytekit-less container tasks (RawContainer)
  • Guaranteed reproducibility of pipelines via:
    • Versioned data, code, and models
    • Automatically tracked executions
    • Declarative pipelines
  • Multi-cloud support (AWS, GCP, and others)
  • No single point of failure; resilient by design
  • Automated notifications to Slack, Email, and PagerDuty
  • Multi K8s cluster support
  • Out-of-the-box support to run Spark jobs on K8s, Dask jobs on K8s, Hive queries, etc.
  • Snappy Console & Golang CLI (Flytectl)
  • Written in Golang and optimized for jobs that run for a long period of time.
  • Grafana templates (user/system observability)
  • Deploy with Helm and Kustomize
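
To make the "typesafe construction of pipelines" feature above more concrete, here is a conceptual sketch that uses only the Python standard library. It is not how Flyte or Flytekit implements interface checking; `chain`, `PipelineTypeError`, and the three step functions are hypothetical names invented for this illustration. The point it shows is that a mismatch between one step's output type and the next step's input type can be rejected when the pipeline is declared, before any step runs.

```python
from typing import Callable, get_type_hints


class PipelineTypeError(TypeError):
    """Raised when two adjacent steps have incompatible interfaces."""


def chain(*steps: Callable) -> Callable:
    """Compose single-argument steps, checking their interfaces up front."""
    for producer, consumer in zip(steps, steps[1:]):
        out_type = get_type_hints(producer).get("return")
        hints = get_type_hints(consumer)
        in_types = [t for name, t in hints.items() if name != "return"]
        # Simplification: exactly one input, and the types must be identical.
        if len(in_types) != 1 or in_types[0] is not out_type:
            raise PipelineTypeError(
                f"{producer.__name__} -> {consumer.__name__}: "
                f"produces {out_type}, but consumer expects {in_types}"
            )

    def run(value):
        for step in steps:
            value = step(value)
        return value

    return run


def load(path: str) -> list:
    return [len(path), 2, 3]


def total(values: list) -> int:
    return sum(values)


def label(n: int) -> str:
    return f"total={n}"


pipeline = chain(load, total, label)  # interfaces line up
print(pipeline("abc"))  # total=8

try:
    chain(load, label)  # list output fed into an int input
except PipelineTypeError as err:
    print("rejected at declaration:", err)
```

The second `chain` call fails without executing `load` or `label`, which mirrors the idea of catching illegal pipelines at declaration rather than at runtime.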

🔌 Available Plugins

Check out the current list of plugins and how to use them


Contributing

There are many ways in which you can participate in this project, including:

🛣️ Live Roadmap

The live roadmap for the project can be found on GitHub: Live Roadmap

📄 RFCs (Request for Comments) & Proposals

Flyte is community-driven and community-owned software. It is managed by a steering committee and encourages collaboration. The community has a long roadmap for Flyte, but there might be interesting ideas, extensions, or additions that you may want to propose. This is usually done by starting with:

RFCs are encouraged for larger changes. You are welcome to hop into our Slack and talk to the community if you want to test the waters before proposing.

📦 Component Repos

| Repo | Language | Purpose | Status |
|---|---|---|---|
| flyte | Kustomize, RST | deployment, documentation, issues | Production-grade |
| flyteidl | Protobuf | gRPC/REST API, Workflow language spec | Production-grade |
| flytepropeller | Go | execution engine | Production-grade |
| flyteadmin | Go | control plane | Production-grade |
| flytekit | Python | Python SDK and tools | Production-grade |
| flyteconsole | TypeScript | Flyte UI | Production-grade |
| datacatalog | Go | manage input & output artifacts | Production-grade |
| flyteplugins | Go | Flyte Backend plugins | Production-grade |
| flytecopilot | Go | Sidecar to manage input/output for sdk-less | Production-grade |
| flytestdlib | Go | standard library | Production-grade |
| flytesnacks | Python | examples, tips, and tricks | Maintained |
| flytekit-java | Java/Scala | Java & Scala SDK for authoring Flyte workflows | Incubating |
| flytectl | Go | A standalone Flyte CLI | Production-grade |
| homebrew-tap | Ruby | Tap for downloadable Flyte tools (CLI etc.) | Production-grade |
| bazel-rules | skylark/py | Use Bazel to build Flyte workflows and tasks | Incubating |

🤝 Community & Resources

Use the following resources to communicate with Flyte maintainers, contributors and other community members to ask questions and enhance your learning experience:

Asynchronous

Social media

Meetings

Biweekly Community Sync

  • 📣 Flyte OSS Community Sync
    Join this meeting to learn how other users are leveraging Flyte for different use cases, ask questions, or continue community-related discussions.
  • When: every other Tuesday, 9:00 am - 10:00 am PT
  • Where: Zoom bridge
  • Previous meetings: [notes | recordings]
  • Subscribe to the calendar

Office Hours

Receive support from Flyte maintainers:

  • When: weekly on Wednesdays
  • How: schedule a 30 min session from one of the available slots (7AM, 1PM and 9PM PT) using this link

Knowledge Base

Find answers to the FAQs at Knowledge Base: our minified StackOverflow and magnified Slack.

Conference Talks & Podcasts

Videos and recordings can be found on Flyte's YouTube channel under the Conference Talks and Podcasts playlist.

Podcasts

  • MLOps Coffee Session - [Why You Need More Than Airflow](http://go.mlops.community/Oz48gY)
  • Kelsey Hightower Twitter Space - [Machine Learning in Production](https://twitter.com/i/spaces/1ZkKzbXLekWKv)
  • Contributor.fyi - [Flyte with Ketan Umare](https://www.contributor.fyi/flyte)
  • TWIML&AI - [Scalable and Maintainable ML Workflows at Lyft - Flyte](https://twimlai.com/twiml-talk-343-scalable-and-maintainable-workflows-at-lyft-with-flyte-w-haytham-abuelfutuh-and-ketan-umare/)
  • Software Engineering Daily - [Flyte: Lyft Data Processing Platform](https://softwareengineeringdaily.com/2020/03/12/flyte-lyft-data-processing-platform-with-allyson-gale-and-ketan-umare/)
  • MLOps Coffee Session - [Flyte: an open-source tool for scalable, extensible, and portable workflows](https://anchor.fm/mlops/episodes/MLOps-Coffee-Sessions-12-Flyte-an-open-source-tool-for-scalable--extensible---and-portable-workflows-eksa5k)
  • Open Data Science - [West Warm Up session with Ketan Umare - Creator of Flyte](https://twitter.com/odsc/status/1451594432369758212)

🧪 Functional Tests Matrix

We run a suite of tests to ensure that basic functionality and a subset of the integrations work across a variety of release versions. These tests run in a cluster where specific versions of the Flyte components (such as flyteconsole, flyteadmin, datacatalog, and flytepropeller) are installed: the released versions and, for the nightly runs, the latest versions.

The table below has the release versions as columns and the result of each test suite as rows:

| workflow group | nightly | v1.0.1 | v1.0.0 | v0.19.4 |
|---|---|---|---|---|
| core | core | core | core | core |
| integrations-hive | integration-hive | integration-hive | integration-hive | integration-hive |
| integrations-k8s-spark | integrations-k8s-spark | integrations-k8s-spark | integrations-k8s-spark | integrations-k8s-spark |
| integrations-kfpytorch | integrations-kfpytorch | integrations-kfpytorch | integrations-kfpytorch | integrations-kfpytorch |
| integrations-pod | integrations-pod | integrations-pod | integrations-pod | integrations-pod |
| integrations-pandera_examples | integrations-pandera_examples | integrations-pandera_examples | integrations-pandera_examples | integrations-pandera_examples |
| integrations-papermilltasks | integrations-papermilltasks | integrations-papermilltasks | integrations-papermilltasks | integrations-papermilltasks |
| integrations-greatexpectations | integrations-greatexpectations | integrations-greatexpectations | integrations-greatexpectations | integrations-greatexpectations |
| integrations-sagemaker-pytorch | integrations-sagemaker-pytorch | integrations-sagemaker-pytorch | integrations-sagemaker-pytorch | integrations-sagemaker-pytorch |
| integrations-sagemaker-training | integrations-sagemaker-training | integrations-sagemaker-training | integrations-sagemaker-training | integrations-sagemaker-training |

💖 Contributors

A big thank you to the community for making Flyte possible!

