This repository has been archived by the owner on Jan 25, 2019. It is now read-only.

Improve release name uniqueness and ownership #57

Merged 6 commits on Oct 25, 2018
Changes from 3 commits
92 changes: 71 additions & 21 deletions helm-app-operator/pkg/helm/helm.go
@@ -69,8 +69,10 @@ const (
// API_VERSION and KIND environment variables.
HelmChartEnvVar = "HELM_CHART"

operatorName = "helm-app-operator"
defaultHelmChartWatchesFile = "/opt/helm/watches.yaml"

annotationReleaseName = "helm.operator-sdk/release-name"
annotationUseNameAsReleaseName = "helm.operator-sdk/use-name-as-release-name"
)

// Installer can install and uninstall Helm releases given a custom resource
@@ -235,8 +237,10 @@ func (c installer) ReconcileRelease(r *unstructured.Unstructured) (*unstructured
return r, needsUpdate, fmt.Errorf("failed to sync release status: %s", err)
}

releaseName := getReleaseName(r)

// Get release history for this release name
- releases, err := c.storageBackend.History(releaseName(r))
+ releases, err := c.storageBackend.History(releaseName)
if err != nil && !notFoundErr(err) {
return r, needsUpdate, fmt.Errorf("failed to retrieve release history: %s", err)
}
@@ -254,30 +258,36 @@ }
}

var updatedRelease *release.Release
- latestRelease, err := c.storageBackend.Deployed(releaseName(r))
+ latestRelease, err := c.storageBackend.Deployed(releaseName)
if err != nil || latestRelease == nil {
- updatedRelease, err = c.installRelease(r, tiller, chart, config)
+ // If there's no deployed release, attempt a tiller install.
+ updatedRelease, err = c.installRelease(tiller, r.GetNamespace(), releaseName, chart, config)
if err != nil {
return r, needsUpdate, fmt.Errorf("install error: %s", err)
}
needsUpdate = true
logrus.Infof("Installed release for %s release=%s", ResourceString(r), updatedRelease.GetName())

} else if status.Release == nil {

Collaborator:
@joelanford I'm wondering if this scenario is possible:

  • There is no deployed release, so we install the release c.installRelease()
  • We fail to update the status.Release
  • Retry the reconcile loop and this time c.storageBackend.Deployed() returns latestRelease, nil.
  • status.Release == nil from our last failed status update, and we return an error thinking this release is not owned by the CR (even though it actually is).
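
A condensed sketch of that sequence (updateStatus is a hypothetical helper; the other names follow the diff):

	// Reconcile #1: no deployed release is found, so we install one.
	rel, _ := c.installRelease(tiller, r.GetNamespace(), releaseName, chart, config)
	status.Release = rel
	err := updateStatus(r, status) // fails; the CR's status.Release stays nil

	// Reconcile #2: the release is now deployed.
	latest, _ := c.storageBackend.Deployed(releaseName)
	// status.Release == nil from the failed update, so the operator concludes
	// the CR does not own "latest" and returns "release already exists".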

Member Author:
Yeah, that definitely sounds possible. Off the top of my head, I can think of a few things we can try:

  • If we fail to update, try to roll back the release to whatever the previous status.Release was, or uninstall if it was nil. Depending on the error, the rollback may also fail.
  • Rather than using c.storageBackend.Deployed() as the latestRelease, use status.Release.
  • Keep an in-memory map of resource UUID to release name.
  • See if there's a way to store the resource UUID in the release (maybe with the chart values), such that c.storageBackend.Deployed() would "know" its owner.

Each of these may cause other interesting failure scenarios though, so I'd have to experiment to know whether any of these are feasible.

Any other ideas?

Member Author:
See if there's a way to store the resource UUID in the release (maybe with the chart values), such that c.storageBackend.Deployed() would "know" its owner.

Turns out it's pretty easy to add an arbitrary config value to the release, which gets saved with the release in the storage backend. It seems like this is probably more robust than the other options. The latest commit changes the ownership checks to use this method instead.
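
A minimal sketch of what that might look like, assuming standard imports (fmt, strings); the ownerResourceUID key and both helper names are illustrative, not the PR's actual code:

	// withOwnerUID appends the CR's UID to the release's config values.
	// cpb.Config.Raw holds the values as YAML, and Tiller persists the config
	// alongside the release in the storage backend.
	func withOwnerUID(config *cpb.Config, r *unstructured.Unstructured) *cpb.Config {
		raw := config.GetRaw()
		if raw != "" && !strings.HasSuffix(raw, "\n") {
			raw += "\n"
		}
		raw += fmt.Sprintf("ownerResourceUID: %q\n", string(r.GetUID()))
		return &cpb.Config{Raw: raw}
	}

	// ownsRelease reports whether a deployed release carries this CR's UID.
	// (A substring check keeps the sketch short; real code would parse the YAML.)
	func ownsRelease(rel *release.Release, r *unstructured.Unstructured) bool {
		return strings.Contains(rel.GetConfig().GetRaw(),
			fmt.Sprintf("ownerResourceUID: %q", string(r.GetUID())))
	}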

// If the object has no release status, it does not own the release,
// so return an error.
return r, needsUpdate, fmt.Errorf("install error: release \"%s\" already exists", releaseName)
} else {
- candidateRelease, err := c.getCandidateRelease(r, tiller, chart, config)
+ candidateRelease, err := c.getCandidateRelease(tiller, releaseName, chart, config)
if err != nil {
return r, needsUpdate, fmt.Errorf("failed to generate candidate release: %s", err)
}

latestManifest := latestRelease.GetManifest()
if latestManifest == candidateRelease.GetManifest() {
- err = c.reconcileRelease(r, latestManifest)
+ err = c.reconcileRelease(r.GetNamespace(), latestManifest)
if err != nil {
return r, needsUpdate, fmt.Errorf("reconcile error: %s", err)
}
updatedRelease = latestRelease
logrus.Infof("Reconciled release for %s release=%s", ResourceString(r), updatedRelease.GetName())
} else {
- updatedRelease, err = c.updateRelease(r, tiller, chart, config)
+ updatedRelease, err = c.updateRelease(tiller, releaseName, chart, config)
if err != nil {
return r, needsUpdate, fmt.Errorf("update error: %s", err)
}
@@ -298,8 +308,17 @@
// UninstallRelease accepts a custom resource, uninstalls the existing Helm release
// using Tiller, and returns the custom resource with updated `status`.
func (c installer) UninstallRelease(r *unstructured.Unstructured) (*unstructured.Unstructured, error) {
// If the object has no release status, it does not own the release,
// so there's nothing to do.
status := v1alpha1.StatusFor(r)
if status.Release == nil {
return r, nil
}

releaseName := getReleaseName(r)

// Get history of this release
- h, err := c.storageBackend.History(releaseName(r))
+ h, err := c.storageBackend.History(releaseName)
if err != nil {
return r, fmt.Errorf("failed to get release history: %s", err)
}
@@ -312,13 +331,13 @@

tiller := c.tillerRendererForCR(r)
_, err = tiller.UninstallRelease(context.TODO(), &services.UninstallReleaseRequest{
- Name: releaseName(r),
+ Name: releaseName,
Purge: true,
})
if err != nil {
return r, err
}
logrus.Infof("Uninstalled release for %s release=%s", ResourceString(r), releaseName(r))
logrus.Infof("Uninstalled release for %s release=%s", ResourceString(r), releaseName)
return r, nil
}

@@ -327,37 +346,55 @@ func ResourceString(r *unstructured.Unstructured) string {
return fmt.Sprintf("apiVersion=%s kind=%s name=%s/%s", r.GetAPIVersion(), r.GetKind(), r.GetNamespace(), r.GetName())
}

- func (c installer) installRelease(r *unstructured.Unstructured, tiller *tiller.ReleaseServer, chart *cpb.Chart, config *cpb.Config) (*release.Release, error) {
+ func (c installer) installRelease(tiller *tiller.ReleaseServer, namespace, name string, chart *cpb.Chart, config *cpb.Config) (*release.Release, error) {
installReq := &services.InstallReleaseRequest{
- Namespace: r.GetNamespace(),
- Name: releaseName(r),
+ Namespace: namespace,
+ Name: name,
Chart: chart,
Values: config,
}

releaseResponse, err := tiller.InstallRelease(context.TODO(), installReq)
if err != nil {
// Workaround for helm/helm#3338
uninstallReq := &services.UninstallReleaseRequest{
Name: releaseResponse.GetRelease().GetName(),
Purge: true,
}
_, uninstallErr := tiller.UninstallRelease(context.TODO(), uninstallReq)
if uninstallErr != nil {
return nil, fmt.Errorf("failed to roll back failed installation: %s: %s", uninstallErr, err)
}
return nil, err
}
return releaseResponse.GetRelease(), nil
}

- func (c installer) updateRelease(r *unstructured.Unstructured, tiller *tiller.ReleaseServer, chart *cpb.Chart, config *cpb.Config) (*release.Release, error) {
+ func (c installer) updateRelease(tiller *tiller.ReleaseServer, name string, chart *cpb.Chart, config *cpb.Config) (*release.Release, error) {
updateReq := &services.UpdateReleaseRequest{
- Name: releaseName(r),
+ Name: name,
Chart: chart,
Values: config,
}

releaseResponse, err := tiller.UpdateRelease(context.TODO(), updateReq)
if err != nil {
// Workaround for helm/helm#3338
rollbackReq := &services.RollbackReleaseRequest{
Name: name,
Force: true,
}
_, rollbackErr := tiller.RollbackRelease(context.TODO(), rollbackReq)
if rollbackErr != nil {
return nil, fmt.Errorf("failed to roll back failed update: %s: %s", rollbackErr, err)
}
return nil, err
}
return releaseResponse.GetRelease(), nil
}

- func (c installer) reconcileRelease(r *unstructured.Unstructured, expectedManifest string) error {
- expectedInfos, err := c.tillerKubeClient.BuildUnstructured(r.GetNamespace(), bytes.NewBufferString(expectedManifest))
+ func (c installer) reconcileRelease(namespace string, expectedManifest string) error {
+ expectedInfos, err := c.tillerKubeClient.BuildUnstructured(namespace, bytes.NewBufferString(expectedManifest))
if err != nil {
return err
}
@@ -387,9 +424,9 @@ func (c installer) reconcileRelease(r *unstructured.Unstructured, expectedManife
})
}

- func (c installer) getCandidateRelease(r *unstructured.Unstructured, tiller *tiller.ReleaseServer, chart *cpb.Chart, config *cpb.Config) (*release.Release, error) {
+ func (c installer) getCandidateRelease(tiller *tiller.ReleaseServer, name string, chart *cpb.Chart, config *cpb.Config) (*release.Release, error) {
dryRunReq := &services.UpdateReleaseRequest{
- Name: releaseName(r),
+ Name: name,
Chart: chart,
Values: config,
DryRun: true,
@@ -442,8 +479,21 @@ func (c installer) tillerRendererForCR(r *unstructured.Unstructured) *tiller.Rel
return tiller.NewReleaseServer(env, internalClientSet, false)
}

- func releaseName(r *unstructured.Unstructured) string {
- return fmt.Sprintf("%s-%s", operatorName, r.GetName())
+ func getReleaseName(r *unstructured.Unstructured) string {
status := v1alpha1.StatusFor(r)
if status.Release != nil {

Collaborator:
@joelanford The implementation seems fine to me but I'm thinking we should try our best to avoid relying on Status for our control logic. That seems to be the upstream convention as well.

https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#spec-and-status
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/principles.md#api

Object status must be 100% reconstructable by observation. Any history kept must be just an optimization and not required for correct operation.

The resourceUID value is useful in letting us verify the correct owner of a pre-existing release.
But we still end up storing the release name in the Status to look up again in the future.

Right now, with this implementation, what happens if:

  • We deploy a new release with no release name, which generates a new unique release name e.g "foo"
      c.installRelease(tiller, r.GetNamespace(), "", chart, config)
  • Fail to update the status with updatedRelease=foo
  • Retry the reconcile. See that status.Release is empty from our last failed attempt.
  • Retry deploying the release with no release name
      c.installRelease(tiller, r.GetNamespace(), "", chart, config)
  • Does that give us a new deployed release with a new unique name updatedRelease=bar?

Ideally we could make the UUID part of the release name, e.g. cr.name+UUID, to make it fully unique. That avoids the problem of saving the release name, since we can reconstruct it just by looking at the CR, and it solves the ownership problem as well.
But there's the issue of the 53-character limit. I know UUIDs are 36 chars, so we could stay within the limit, but I'm wondering how human-readable release names need to be.

Member Author:
Yeah, you're right. That would result in a separate new release.

Thanks for linking the conventions and principles. That's good info!

I did think about using UUIDs in the release name, but there are some downsides:

  • As you point out, UUIDs would take up 36 of the 53 characters, leaving only 17 if we wanted to include the CR name. That may be cutting it a little too close for comfort.
  • Release names end up in the generated resource names for all helm chart resources, so users would end up with really long names for everything managed by the helm operator.

We could do something like cr.Name + last_N_chars(cr.UUID), where N < 36. Assuming the UUIDs are uniformly random hex characters, the probability of collision would likely be sufficiently low at N=8.
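
A sketch of that idea (illustrative, not part of the PR); the 53-character cap is Helm v2's release name limit:

	const maxReleaseNameLen = 53

	func shortUIDReleaseName(r *unstructured.Unstructured) string {
		uid := string(r.GetUID())
		if len(uid) > 8 {
			uid = uid[len(uid)-8:] // keep the last 8 characters of the UID
		}
		base := r.GetName()
		// Trim the CR name rather than the UID suffix so uniqueness survives.
		if max := maxReleaseNameLen - len(uid) - 1; len(base) > max {
			base = base[:max]
		}
		return base + "-" + uid
	}

With 8 random hex characters there are 16^8 (about 4.3 billion) possible suffixes, so by the birthday bound the collision probability across k CRs is roughly k^2 / (2 * 16^8), which is negligible for realistic k.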

If we want better human readability, another option could be that we add the release name to the CR (somewhere in metadata or spec maybe?) prior to any attempt to invoke the release installation, update, etc. That way a CR never results in any deployed release until it has been updated with a release name.

return status.Release.GetName()
}
if v, ok := r.GetAnnotations()[annotationReleaseName]; ok {
return v
}
if v, ok := r.GetAnnotations()[annotationUseNameAsReleaseName]; ok && v == "true" {
return r.GetName()
}

// An empty release name will be populated automatically by tiller
// during installation
return ""
}

func notFoundErr(err error) bool {
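
As a usage sketch (not part of the diff), the precedence implemented by getReleaseName is: an existing status.Release name wins, then the explicit release-name annotation, then the opt-in use-name-as-release-name annotation, and finally an empty name that tiller replaces with a generated one:

	cr := &unstructured.Unstructured{Object: map[string]interface{}{}}
	cr.SetName("my-app")

	cr.SetAnnotations(map[string]string{
		"helm.operator-sdk/use-name-as-release-name": "true",
	})
	fmt.Println(getReleaseName(cr)) // "my-app"

	cr.SetAnnotations(map[string]string{
		"helm.operator-sdk/release-name": "custom-release",
	})
	fmt.Println(getReleaseName(cr)) // "custom-release": the explicit annotation wins

	cr.SetAnnotations(nil)
	fmt.Println(getReleaseName(cr)) // "": tiller will generate a unique name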