feat: Canary-release anything behind K8s service
Resolves #371

---

This adds support for `corev1.Service` as the `targetRef.kind`, so that Flagger can be used purely for canary analysis and traffic shifting on existing, pre-created services. In this mode Flagger doesn't touch deployments or HPAs.

This is useful for keeping full control over the resources backing the service to be canary-released, including pods (behind a ClusterIP service) and external services (behind an ExternalName service).
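To illustrate the split between the two modes, here is a minimal Go sketch of selecting a controller by `targetRef.kind`. The `Controller` interface below is a hypothetical, trimmed-down stand-in with a single method, and all type and function names are illustrative rather than Flagger's actual API. The point is only that a service-backed controller can treat workload operations such as scaling as no-ops.

```go
package main

import "fmt"

// Controller is a hypothetical one-method stand-in for a canary controller.
type Controller interface {
	Scale(replicas int32) string
}

// deploymentController manages workloads, so scaling is meaningful.
type deploymentController struct{}

func (deploymentController) Scale(replicas int32) string {
	return fmt.Sprintf("scale deployment to %d", replicas)
}

// serviceController only shifts traffic between services, so scaling does nothing.
type serviceController struct{}

func (serviceController) Scale(int32) string {
	return "no-op: services have no replicas"
}

// controllerFor picks a controller by kind and falls back to the
// deployment controller for anything it does not recognise.
func controllerFor(kind string) Controller {
	switch kind {
	case "Service":
		return serviceController{}
	default:
		return deploymentController{}
	}
}

func main() {
	for _, kind := range []string{"Deployment", "Service"} {
		fmt.Printf("%s: %s\n", kind, controllerFor(kind).Scale(0))
	}
}
```

Falling back to the deployment controller keeps existing `Canary` resources working unchanged when no service target is used.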

The major use cases I have in mind are:

- Canary-release a K8s cluster. You create two clusters and a master cluster. In the master cluster, you create two `ExternalName` services, each pointing to (the hostname of the load balancer of the targeted app instance in) one of the two clusters. Flagger runs on the master cluster and helps roll out a new K8s cluster safely by doing a canary release on the `ExternalName` service.
- You want annotations and labels added to the service to integrate with things like external load balancers (without extending Flagger to support customizing every aspect of the K8s service it manages).

**Design**:

A canary release on a K8s service is almost the same as one on a K8s deployment. The only fundamental difference is that it operates only on a set of K8s services.

For example, one may start by creating two Helm releases, `podinfo-blue` and `podinfo-green`, and a K8s service `podinfo`. The `podinfo` service should initially have the same `Spec` as that of `podinfo-blue`.

On a new release, you update `podinfo-green`, then trigger Flagger by updating the K8s service `podinfo` so that it points to the pods or `externalName` declared in `podinfo-green`. Flagger does the rest. The end result is that traffic to `podinfo` is gradually and safely shifted from `podinfo-blue` to `podinfo-green`.

**How it works**:

Under the hood, Flagger maintains two K8s services, `podinfo-primary` and `podinfo-canary`. Unlike canaries on K8s deployments, it doesn't create the service named `podinfo`, because you already provide it.

Once Flagger detects a change in the `podinfo` service, it updates the `podinfo-canary` service and the routes, then analyzes the canary. On successful analysis, it promotes the canary service to the `podinfo-primary` service. You expose the `podinfo` service via any L7 ingress solution or a service mesh so that Flagger manages the traffic for safe deployments.
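The promotion step can be sketched in Go as follows. This is a simplified illustration rather than Flagger's implementation: `service` is a hypothetical, heavily trimmed stand-in for `corev1.Service`, and `promote` is an illustrative helper. It mirrors the idea in this commit's `Promote` method: copy the tracked service onto the primary while keeping the primary's own name, `ResourceVersion`, `UID` and, for `ClusterIP` services, its allocated cluster IP.

```go
package main

import "fmt"

// service is a hypothetical, trimmed stand-in for corev1.Service.
type service struct {
	Name            string
	ResourceVersion string
	UID             string
	Type            string
	ClusterIP       string
	ExternalName    string
	Selector        map[string]string
}

// promote builds the object that would be written back as the primary:
// everything comes from source except the fields that identify the
// primary object in the API server.
func promote(source, primary service) service {
	out := source

	// Copy the selector so the result does not alias the source map.
	out.Selector = make(map[string]string, len(source.Selector))
	for k, v := range source.Selector {
		out.Selector[k] = v
	}

	out.Name = primary.Name
	out.ResourceVersion = primary.ResourceVersion
	out.UID = primary.UID

	// An allocated cluster IP cannot be changed on update, so keep the primary's.
	if out.Type == "ClusterIP" {
		out.ClusterIP = primary.ClusterIP
	}
	return out
}

func main() {
	target := service{
		Name: "podinfo", ResourceVersion: "42", UID: "uid-target",
		Type: "ClusterIP", ClusterIP: "10.0.0.10",
		Selector: map[string]string{"app": "podinfo-green"},
	}
	primary := service{
		Name: "podinfo-primary", ResourceVersion: "7", UID: "uid-primary",
		Type: "ClusterIP", ClusterIP: "10.0.0.20",
		Selector: map[string]string{"app": "podinfo-blue"},
	}

	fmt.Printf("%+v\n", promote(target, primary))
}
```

Keeping the primary's `ResourceVersion` lets the update go through optimistic concurrency checks as a normal update of the existing object. The IP addresses and UIDs above are made-up sample values.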

**Giving it a try**:

To give it a try, create a `Canary` as usual, but with its `targetRef` pointing to a K8s service:

```
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
spec:
  provider: kubernetes
  targetRef:
    apiVersion: core/v1
    kind: Service
    name: podinfo
  service:
    port: 9898
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed checks before rollback
    threshold: 2
    # number of checks to run before rollback
    iterations: 2
    # Prometheus checks based on
    # http_request_duration_seconds histogram
    metrics: []
```

Create a K8s service named `podinfo`, then update it. Now watch the services `podinfo`, `podinfo-primary`, and `podinfo-canary`.

Flagger tracks the `podinfo` service for changes. Upon any change, it reconciles the `podinfo-primary` and `podinfo-canary` services. `podinfo-canary` always replicates the latest `podinfo`. In contrast, `podinfo-primary` replicates the latest successful `podinfo-canary`.
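The change detection that drives this can be sketched as a hash comparison against two remembered values. The sketch below is an assumption-laden illustration: it follows the shape of the `hasSpecChanged` helper added in this commit, but swaps the `hashstructure` library for an FNV-1a hash over the JSON encoding so that it only needs the standard library. The `status` and `serviceSpec` types and the `example.com` hostnames are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// status holds the two hashes tracked between reconciliations (hypothetical type).
type status struct {
	LastAppliedSpec  string
	LastPromotedSpec string
}

// serviceSpec is a hypothetical, trimmed stand-in for a service spec.
type serviceSpec struct {
	Type         string
	ExternalName string
}

// specHash hashes the JSON encoding of spec with FNV-1a.
// This is a stand-in for hashstructure.Hash, not the same algorithm.
func specHash(spec interface{}) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", fmt.Errorf("hash error: %w", err)
	}
	h := fnv.New64a()
	h.Write(b)
	return fmt.Sprintf("%d", h.Sum64()), nil
}

// hasSpecChanged reports whether a new analysis should start.
func hasSpecChanged(st status, spec interface{}) (bool, error) {
	// Nothing recorded yet: treat the first observation as a change.
	if st.LastAppliedSpec == "" {
		return true, nil
	}
	newHash, err := specHash(spec)
	if err != nil {
		return false, err
	}
	// Reverting to the last promoted spec is a manual rollback, not a release.
	if st.LastPromotedSpec == newHash {
		return false, nil
	}
	return st.LastAppliedSpec != newHash, nil
}

func main() {
	blue := serviceSpec{Type: "ExternalName", ExternalName: "blue.example.com"}
	green := serviceSpec{Type: "ExternalName", ExternalName: "green.example.com"}
	blueHash, _ := specHash(blue)
	greenHash, _ := specHash(green)

	cases := []struct {
		name string
		st   status
		spec serviceSpec
	}{
		{"first run", status{}, blue},
		{"unchanged", status{LastAppliedSpec: blueHash, LastPromotedSpec: blueHash}, blue},
		{"new release", status{LastAppliedSpec: blueHash, LastPromotedSpec: blueHash}, green},
		{"manual rollback", status{LastAppliedSpec: greenHash, LastPromotedSpec: blueHash}, blue},
	}
	for _, c := range cases {
		changed, err := hasSpecChanged(c.st, c.spec)
		fmt.Printf("%-16s changed=%v err=%v\n", c.name, changed, err)
	}
}
```

Only the first-run and new-release cases report a change. Pointing the service back at the last promoted spec is deliberately ignored, so a manual rollback does not start another analysis.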

**Notes**:

- For the canary cluster use case, we would need to write a K8s operator that, e.g. for App Mesh, syncs `ExternalName` services to App Mesh `VirtualNode`s. But that's another story!
mumoshu committed Nov 25, 2019
1 parent 3fbe62a commit 692402f
Showing 19 changed files with 1,018 additions and 199 deletions.
11 changes: 11 additions & 0 deletions .circleci/config.yml
@@ -92,6 +92,17 @@ jobs:
       - run: test/e2e-kubernetes.sh
       - run: test/e2e-kubernetes-tests.sh

+  e2e-kubernetes-svc-testing:
+    machine: true
+    steps:
+      - checkout
+      - attach_workspace:
+          at: /tmp/bin
+      - run: test/container-build.sh
+      - run: test/e2e-kind.sh
+      - run: test/e2e-kubernetes.sh
+      - run: test/e2e-kubernetes-svc-tests.sh
+
   e2e-smi-istio-testing:
     machine: true
     steps:
4 changes: 3 additions & 1 deletion pkg/canary/controller.go
@@ -1,6 +1,8 @@
 package canary

-import "github.com/weaveworks/flagger/pkg/apis/flagger/v1alpha3"
+import (
+	"github.com/weaveworks/flagger/pkg/apis/flagger/v1alpha3"
+)

 type Controller interface {
 	IsPrimaryReady(canary *v1alpha3.Canary) (bool, error)
31 changes: 6 additions & 25 deletions pkg/canary/deployment.go → pkg/canary/deployment_controller.go
@@ -6,7 +6,6 @@ import (
 	"io"

 	"github.com/google/go-cmp/cmp"
-	"github.com/mitchellh/hashstructure"
 	"go.uber.org/zap"
 	appsv1 "k8s.io/api/apps/v1"
 	hpav1 "k8s.io/api/autoscaling/v2beta1"
@@ -34,6 +33,7 @@ type DeploymentController struct {
 // scales to zero the canary deployment and returns the pod selector label and container ports
 func (c *DeploymentController) Initialize(cd *flaggerv1.Canary, skipLivenessChecks bool) (label string, ports map[string]int32, err error) {
 	primaryName := fmt.Sprintf("%s-primary", cd.Spec.TargetRef.Name)
+
 	label, ports, err = c.createPrimaryDeployment(cd)
 	if err != nil {
 		return "", ports, fmt.Errorf("creating deployment %s.%s failed: %v", primaryName, cd.Namespace, err)
@@ -143,30 +143,7 @@ func (c *DeploymentController) HasTargetChanged(cd *flaggerv1.Canary) (bool, err
 		return false, fmt.Errorf("deployment %s.%s query error %v", targetName, cd.Namespace, err)
 	}

-	if cd.Status.LastAppliedSpec == "" {
-		return true, nil
-	}
-
-	newHash, err := hashstructure.Hash(canary.Spec.Template, nil)
-	if err != nil {
-		return false, fmt.Errorf("hash error %v", err)
-	}
-
-	// do not trigger a canary deployment on manual rollback
-	if cd.Status.LastPromotedSpec == fmt.Sprintf("%d", newHash) {
-		return false, nil
-	}
-
-	if cd.Status.LastAppliedSpec != fmt.Sprintf("%d", newHash) {
-		return true, nil
-	}
-
-	return false, nil
-}
-
-// HaveDependenciesChanged returns true if the canary configmaps or secrets have changed
-func (c *DeploymentController) HaveDependenciesChanged(cd *flaggerv1.Canary) (bool, error) {
-	return c.configTracker.HasConfigChanged(cd)
+	return hasSpecChanged(cd, canary.Spec.Template)
 }

 // Scale sets the canary deployment replicas
@@ -425,6 +402,10 @@ var sidecars = map[string]bool{
 	"envoy": true,
 }

+func (c *DeploymentController) HaveDependenciesChanged(cd *flaggerv1.Canary) (bool, error) {
+	return c.configTracker.HasConfigChanged(cd)
+}
+
 // getPorts returns a list of all container ports
 func (c *DeploymentController) getPorts(cd *flaggerv1.Canary, deployment *appsv1.Deployment) (map[string]int32, error) {
 	ports := make(map[string]int32)
File renamed without changes.
7 changes: 7 additions & 0 deletions pkg/canary/factory.go
@@ -41,10 +41,17 @@ func (factory *Factory) Controller(kind string) Controller {
 			FlaggerClient: factory.flaggerClient,
 		},
 	}
+	serviceCtrl := &ServiceController{
+		logger: factory.logger,
+		kubeClient: factory.kubeClient,
+		flaggerClient: factory.flaggerClient,
+	}

 	switch {
 	case kind == "Deployment":
 		return deploymentCtrl
+	case kind == "Service":
+		return serviceCtrl
 	default:
 		return deploymentCtrl
 	}
135 changes: 135 additions & 0 deletions pkg/canary/service_controller.go
@@ -0,0 +1,135 @@
+package canary
+
+import (
+	"fmt"
+
+	ex "github.com/pkg/errors"
+	"go.uber.org/zap"
+	"k8s.io/apimachinery/pkg/api/errors"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/client-go/kubernetes"
+
+	flaggerv1 "github.com/weaveworks/flagger/pkg/apis/flagger/v1alpha3"
+	clientset "github.com/weaveworks/flagger/pkg/client/clientset/versioned"
+)
+
+// ServiceController is managing the operations for Kubernetes service kind
+type ServiceController struct {
+	kubeClient kubernetes.Interface
+	flaggerClient clientset.Interface
+	logger *zap.SugaredLogger
+}
+
+// SetStatusFailedChecks updates the canary failed checks counter
+func (c *ServiceController) SetStatusFailedChecks(cd *flaggerv1.Canary, val int) error {
+	return setStatusFailedChecks(c.flaggerClient, cd, val)
+}
+
+// SetStatusWeight updates the canary status weight value
+func (c *ServiceController) SetStatusWeight(cd *flaggerv1.Canary, val int) error {
+	return setStatusWeight(c.flaggerClient, cd, val)
+}
+
+// SetStatusIterations updates the canary status iterations value
+func (c *ServiceController) SetStatusIterations(cd *flaggerv1.Canary, val int) error {
+	return setStatusIterations(c.flaggerClient, cd, val)
+}
+
+// SetStatusPhase updates the canary status phase
+func (c *ServiceController) SetStatusPhase(cd *flaggerv1.Canary, phase flaggerv1.CanaryPhase) error {
+	return setStatusPhase(c.flaggerClient, cd, phase)
+}
+
+var _ Controller = &ServiceController{}
+
+// Initialize creates the primary deployment, hpa,
+// scales to zero the canary deployment and returns the pod selector label and container ports
+func (c *ServiceController) Initialize(cd *flaggerv1.Canary, skipLivenessChecks bool) (label string, ports map[string]int32, err error) {
+	return "", nil, nil
+}
+
+// Promote copies target's spec from canary to primary
+func (c *ServiceController) Promote(cd *flaggerv1.Canary) error {
+	targetName := cd.Spec.TargetRef.Name
+	primaryName := fmt.Sprintf("%s-primary", targetName)
+
+	canary, err := c.kubeClient.CoreV1().Services(cd.Namespace).Get(targetName, metav1.GetOptions{})
+	if err != nil {
+		if errors.IsNotFound(err) {
+			return fmt.Errorf("service %s.%s not found", targetName, cd.Namespace)
+		}
+		return fmt.Errorf("service %s.%s query error %v", targetName, cd.Namespace, err)
+	}
+
+	primary, err := c.kubeClient.CoreV1().Services(cd.Namespace).Get(primaryName, metav1.GetOptions{})
+	if err != nil {
+		if errors.IsNotFound(err) {
+			return fmt.Errorf("service %s.%s not found", primaryName, cd.Namespace)
+		}
+		return fmt.Errorf("service %s.%s query error %v", primaryName, cd.Namespace, err)
+	}
+
+	primaryCopy := canary.DeepCopy()
+	primaryCopy.ObjectMeta.Name = primary.ObjectMeta.Name
+	if primaryCopy.Spec.Type == "ClusterIP" {
+		primaryCopy.Spec.ClusterIP = primary.Spec.ClusterIP
+	}
+	primaryCopy.ObjectMeta.ResourceVersion = primary.ObjectMeta.ResourceVersion
+	primaryCopy.ObjectMeta.UID = primary.ObjectMeta.UID
+
+	// apply update
+	_, err = c.kubeClient.CoreV1().Services(cd.Namespace).Update(primaryCopy)
+	if err != nil {
+		return fmt.Errorf("updating service %s.%s spec failed: %v",
+			primaryCopy.GetName(), primaryCopy.Namespace, err)
+	}
+
+	return nil
+}
+
+// HasServiceChanged returns true if the canary service spec has changed
+func (c *ServiceController) HasTargetChanged(cd *flaggerv1.Canary) (bool, error) {
+	targetName := cd.Spec.TargetRef.Name
+	canary, err := c.kubeClient.CoreV1().Services(cd.Namespace).Get(targetName, metav1.GetOptions{})
+	if err != nil {
+		if errors.IsNotFound(err) {
+			return false, fmt.Errorf("service %s.%s not found", targetName, cd.Namespace)
+		}
+		return false, fmt.Errorf("service %s.%s query error %v", targetName, cd.Namespace, err)
+	}
+
+	return hasSpecChanged(cd, canary.Spec)
+}
+
+// Scale sets the canary deployment replicas
+func (c *ServiceController) Scale(cd *flaggerv1.Canary, replicas int32) error {
+	return nil
+}
+
+func (c *ServiceController) ScaleFromZero(cd *flaggerv1.Canary) error {
+	return nil
+}
+
+func (c *ServiceController) SyncStatus(cd *flaggerv1.Canary, status flaggerv1.CanaryStatus) error {
+	dep, err := c.kubeClient.CoreV1().Services(cd.Namespace).Get(cd.Spec.TargetRef.Name, metav1.GetOptions{})
+	if err != nil {
+		if errors.IsNotFound(err) {
+			return fmt.Errorf("service %s.%s not found", cd.Spec.TargetRef.Name, cd.Namespace)
+		}
+		return ex.Wrap(err, "SyncStatus service query error")
+	}
+
+	return syncCanaryStatus(c.flaggerClient, cd, status, dep.Spec, func(cdCopy *flaggerv1.Canary) {})
+}
+
+func (c *ServiceController) HaveDependenciesChanged(cd *flaggerv1.Canary) (bool, error) {
+	return false, nil
+}
+
+func (c *ServiceController) IsPrimaryReady(cd *flaggerv1.Canary) (bool, error) {
+	return true, nil
+}
+
+func (c *ServiceController) IsCanaryReady(cd *flaggerv1.Canary) (bool, error) {
+	return true, nil
+}
30 changes: 30 additions & 0 deletions pkg/canary/spec.go
@@ -0,0 +1,30 @@
+package canary
+
+import (
+	"fmt"
+
+	"github.com/mitchellh/hashstructure"
+	"github.com/weaveworks/flagger/pkg/apis/flagger/v1alpha3"
+)
+
+func hasSpecChanged(cd *v1alpha3.Canary, spec interface{}) (bool, error) {
+	if cd.Status.LastAppliedSpec == "" {
+		return true, nil
+	}
+
+	newHash, err := hashstructure.Hash(spec, nil)
+	if err != nil {
+		return false, fmt.Errorf("hash error %v", err)
+	}
+
+	// do not trigger a canary deployment on manual rollback
+	if cd.Status.LastPromotedSpec == fmt.Sprintf("%d", newHash) {
+		return false, nil
+	}
+
+	if cd.Status.LastAppliedSpec != fmt.Sprintf("%d", newHash) {
+		return true, nil
+	}
+
+	return false, nil
+}