Flow deployment patterns #4042
-
Here's an example -- I haven't actually used this approach yet but want to provide some tangible ideas for how the template would be filled.
Overview
Project structure
Developing flows
Writing flows
Flows are written in their own files within the
Flows define their own executor, but storage and run configs are set by the CI process.
Testing flows
Flows are tested locally using
Deploying flows
Registering flows
Flows are registered in CI by iterating through every file in the
Here we will set the storage and run_config to be consistent across all flows (a sketch of this registration step follows below).
Storing flows
Flows are stored on a remote backend of your choice, e.g. S3/GCS.
Running flows
Run config
The run config for this project is a
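Purely as an illustration of that registration step, here is a rough sketch assuming Prefect 1.x, a flows/ directory, S3 storage, and a DockerRun run config; the bucket, image, and project names below are placeholders rather than part of the template.

```python
# register_flows.py -- an illustrative sketch, not code from the template above
from pathlib import Path

from prefect.run_configs import DockerRun
from prefect.storage import S3
from prefect.utilities.storage import extract_flow_from_file

# Placeholder values -- substitute your own bucket, image, and project name.
BUCKET = "my-prefect-flows"
IMAGE = "my-registry.example.com/prefect-flows:latest"
PROJECT = "my-project"

for path in Path("flows").glob("*.py"):
    # Load the Flow object defined in each file.
    flow = extract_flow_from_file(str(path))

    # Storage and run config are applied here so they stay consistent across
    # all flows; the flow file itself only sets its executor.
    flow.storage = S3(bucket=BUCKET)
    flow.run_config = DockerRun(image=IMAGE)

    flow.register(project_name=PROJECT)
```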
Agent
Uses a Docker agent. The agent could be run locally or on a single cloud node.
Environments
Dependencies are managed by
Limitations
Benefits
-
Thanks for starting this conversation, @madkinsz ! Here is my first crack at this:
Overview
I am not finished building this out, so some items are "how I intend to do it" as opposed to "what I've already done".
We're an AWS shop, use CodePipeline/CodeBuild for CI/CD, and run our agents and flows on EKS. We have some fairly consistent and opinionated practices around infra-as-code and CI/CD. In short, we design our repos/apps so that each git branch is deployable as a fully operational entity. This means our Prefect projects, flows, etc. will all be defined in this context, e.g. a project name of
I'm eager and grateful for any feedback, thoughts, or alternative ways of doing things! I'm also willing to share (lightly edited) code examples if helpful.
Project structure
Here is my basic structure. I am omitting common code and other unrelated items. The
Developing flows
Flows go in the
Writing flows
TBD
Testing flows
TBD
Deploying flows
Discovering flows in source, building storage, registering the serialized flow with the backend.
Registering flows
I have a designated directory called
Philosophically, I have tried to set as many sensible defaults as possible so that flow files can be focused entirely on the logic of the flow/tasks, and flow authors can override specific configuration options as needed.
If a given flow has storage already defined, the build process respects that. If not, it adds default Docker storage:
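Not the author's actual code, but a minimal sketch of what such a default Docker storage could look like with Prefect 1.x; the registry URL, branch, and dependency list are placeholders.

```python
from prefect import Flow
from prefect.storage import Docker

# Placeholders standing in for values the build process would already know.
GIT_BRANCH = "dev"
EXTRA_DEPENDENCIES = ["pandas", "boto3"]

def apply_default_storage(flow: Flow) -> None:
    """Respect storage already defined on the flow; otherwise attach default Docker storage."""
    if flow.storage is not None:
        return
    flow.storage = Docker(
        registry_url="123456789012.dkr.ecr.us-east-1.amazonaws.com",  # placeholder ECR registry
        image_name=f"flows-{flow.name.lower().replace(' ', '-')}",
        image_tag=GIT_BRANCH,  # branch-per-environment convention described above
        python_dependencies=EXTRA_DEPENDENCIES,
    )
```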
If a given flow has a run configuration already defined, the build process respects that. If not, it adds default configuration:
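Likewise, a hedged sketch of a default run configuration, assuming a KubernetesRun on EKS; the label, resource, and environment values are invented for illustration.

```python
from prefect import Flow
from prefect.run_configs import KubernetesRun

def apply_default_run_config(flow: Flow, environment: str = "dev") -> None:
    """Respect a run config already defined on the flow; otherwise attach sensible defaults."""
    if flow.run_config is not None:
        return
    flow.run_config = KubernetesRun(
        labels=[environment],              # placeholder per-environment agent label
        cpu_request="500m",                # placeholder resource defaults
        memory_request="512Mi",
        env={"ENVIRONMENT": environment},  # basic environment variables set on the run config
    )
```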
Finally, registration is called:
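The call itself might look roughly like this; treat it as a sketch rather than the author's code (the idempotency key is one common way to avoid version bumps when a flow hasn't changed).

```python
from prefect import Flow

def register_flow(flow: Flow, project_name: str) -> None:
    """Register the flow with the backend, e.g. under a "<repo>-<branch>" project name."""
    flow.register(
        project_name=project_name,
        idempotency_key=flow.serialized_hash(),  # skip a new version if the flow is unchanged
    )
```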
Storing flows
We are currently using Docker storage. This is a natural fit, as we were already using Docker images and are comfortable with the tooling. I'd be open to other storage (such as S3) if there were a particular use case where Docker was not well suited.
Running flows
I am still building this out, but I am planning to create a wrapper around the Prefect CLI which will make running flows easier, given the opinionated nature of our infra and CI/CD processes. I anticipate all flows will run on Kubernetes.
Run config
So far, exclusively using
Agent
I have a dedicated repo for our Prefect agents. We have one agent per environment (dev, prod), which are run as Kubernetes deployments via Helm.
Environments
I anticipate passing a list of Python dependencies to be installed in the Docker image at build/registration time, but have not focused on this area yet. Generally, we only use a few basic environment variables, and these are set on the run configuration. Other, more specific variables can be looked up from the SSM Parameter Store (an AWS service) if needed.
Limitations
Benefits
-
Overview
This is the approach used at infima, developed by @marwan116 and myself. A first pass with a few iterations on it has been built out and is in active use. We can provide more detailed code for certain parts if that would be of interest.
Project structure
Flows are a module within the broader package:
Developing & Writing flows
The dev's job is to implement the method that contains the actual flow logic, define the flow parameters, and specify environment config values.
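As a loose illustration of that division of responsibilities (the flow, task, and parameter names and the executor choice here are made up, not infima's actual code):

```python
from prefect import Flow, Parameter, task
from prefect.executors import LocalDaskExecutor

@task
def load_dataset(dataset: str, env: str) -> None:
    # The flow author's actual logic would live here.
    print(f"Loading {dataset} in {env}")

# The author defines the flow logic, its parameters, and per-environment config values;
# storage, run config, and agent concerns are handled elsewhere in the package.
with Flow("load-dataset", executor=LocalDaskExecutor()) as flow:
    dataset = Parameter("dataset", default="prices")
    env = Parameter("env", default="dev")
    load_dataset(dataset, env)
```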
Testing flows
We plan on using
Deploying/Registering flows
We have a manual way to indicate within the package if a flow is "active". Our CLI tool calls
Storing flows
We currently use S3 storage because it's easy. Also, last we checked, you could not pull a flow by name from Git storage, though it should be possible. We will likely overhaul all of this when we migrate to
Running flows
We have an agent running in EKS. Our
Future plans
We are planning to migrate to a staging/prod split where the flows will still be automatically registered, but the auto-register to prod will only occur when we create a prod release through a tag. This means there will be duplication of some of the flows that require it.
Limitations
Benefits
-
Overview
Our approach is as follows:
Project structure
Developing flows
Writing a flow, testing a flow
We currently keep flows very simple. Here is a very minimal example:
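The original snippet isn't reproduced above; a comparably minimal Prefect 1.x flow, purely for illustration, might look like:

```python
from prefect import Flow, task

@task
def say_hello() -> str:
    return "hello"

@task
def shout(greeting: str) -> None:
    print(greeting.upper())

with Flow("hello-flow") as flow:
    shout(say_hello())

if __name__ == "__main__":
    flow.run()  # convenient for quick local runs while developing
```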
We use
We store tests in the
Writing flows
Flows are all stored in the
Testing flows
See example above. On each PR we run:
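The exact commands aren't preserved above; as a generic illustration, pytest-driven flow tests could look like this (the module path mirrors the hypothetical example flow earlier in this comment):

```python
# tests/test_hello_flow.py -- illustrative only; the import path is hypothetical
from flows.hello_flow import flow, say_hello

def test_say_hello_task():
    # Prefect 1.x tasks can be exercised directly through their .run() method.
    assert say_hello.run() == "hello"

def test_flow_runs_successfully():
    state = flow.run()
    assert state.is_successful()
```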
Deploying flows
Our CI process executes a
Some of the other elements referenced above are the Dockerfile:
And our setup.py file:
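The actual file isn't shown here; a generic setup.py for packaging flows alongside their shared dependencies might look roughly like:

```python
# setup.py -- a generic sketch, not this team's actual file
from setuptools import find_packages, setup

setup(
    name="flows",               # placeholder package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "prefect>=0.15,<2",     # Prefect 1.x
        "pandas",               # placeholder shared dependency
    ],
)
```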
And if anyone is using CircleCI, this is what our
Registering flows
See above.
Storing flows
During the CI process we add each flow to a Docker container. We then build and push a monster Docker container that contains all flows and all dependencies. This is not ideal, as a new container is built every time the CI runs. We want to move to a process where we have a base Docker image that contains all dependencies and Python library code (that only gets updated if
Running flows
During
We run the Prefect agent using a standard k8s deployment.yaml file. The only notable thing is that we use
Run config
See above.
Agent
Standard Kubernetes Deployment.
Environments
See above.
Limitations
Benefits
Future plans
We have lots of improvements we'd like to make:
-
Overview
We want to make ETL easy for every developer, even for non-data engineers. From a high-level perspective, the dev -> deploy -> run flow works as follows:
Project structure
Flows are organized into projects as described below.
This file structure is reflected on Prefect Server, meaning that once the
Developing flows
Developers modify
Writing flows
Where are flows written? Do they share common patterns?
This Dockerfile is used to build the image where the Prefect flow is stored.
Testing flows
We do not have automatic tests yet.
Deploying flows
Discovering flows in source, building storage, registering the serialized flow with the backend.
And in each Dockerfile we do the following:
This Dockerfile is used in our CI/CD process to create the flow storage.
Registering flows
We have a little
(A sketch of what such a registration helper could look like appears at the end of this comment.)
Storing flows
We use Docker storage.
Running flows
We try to keep our flows as simple as possible and we encourage developers to use
Run config
We are still using
Prefect Agent
We have a
Environments
Flow dependencies are listed in a dedicated
Limitations
The biggest downside of this approach is that we are generating a lot of ECR repos/images.
Benefits
The main benefit of this approach is that developers can focus on writing ETL; they do not need to care about deployment stuff.
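To round out the Registering flows step mentioned above: a hedged sketch (not this team's actual script) of registering a flow with its own Dockerfile-built image, the pattern that produces one ECR repo/image per flow; the registry, tag, and Dockerfile path are placeholders.

```python
from prefect import Flow
from prefect.run_configs import KubernetesRun
from prefect.storage import Docker

def register_with_per_flow_image(flow: Flow, project_name: str) -> None:
    """Build a dedicated image for the flow from its own Dockerfile, then register it."""
    flow.storage = Docker(
        registry_url="123456789012.dkr.ecr.eu-west-1.amazonaws.com",  # placeholder ECR registry
        image_name=flow.name,                        # one ECR repo per flow
        image_tag="latest",
        dockerfile=f"flows/{flow.name}/Dockerfile",  # hypothetical per-flow Dockerfile path
    )
    flow.run_config = KubernetesRun()
    flow.register(project_name=project_name)
```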
-
Adding a link to a post about a deployment pattern on Azure, outlined in https://infinitelambda.com/post/prefect-workflow-automation-azure-devops-aks/ by @nikvakl
-
See also this guide on using GitHub Actions: https://www.prefect.io/blog/deploying-prefect-flows-with-github-actions/
-
This is a home for discussion of patterns users are using for CI/CD of their flows.
We are working on developing an opinion on which patterns are best practice and, in the interim, want to enable our community to share their thoughts on this important part of developing at scale with Prefect.
Please abide by some simple posting rules so this can stay organized:
(this template may be updated after a few examples are in here)