Releases · bookingcom/shipper
v0.10.0-alpha.0
Changelog since v0.8
You might notice that there is no version 0.9 of Shipper. This is
because in version 0.9, we tried to split Shipper into two components
(`shipper-mgmt` and `shipper-app`) which would run in the management and
application clusters respectively. However, that version was behaving
erratically in a way that was hard to predict and debug. After
spending months trying to patch all the holes, we decided to forgo the
separation for now and move on with the development of other features.

Please note that this means that Shipper is still one component,
running only in the management cluster.
Breaking Changes
- Shipper now uses different names for its service accounts, roles, rolebindings and clusterrolebindings. Refer to the Migrating to 0.10 section below for more information on how to migrate to the new version safely.
Improvements
- `shipperctl admin clusters apply` is split into multiple commands, so that each operation can be done separately. For example, this allows operators to only set up the application clusters, without touching the management cluster (#358)
- Shipper now rejects all modifications to the `environment` field of all releases. This fixes an issue where users would modify this field and cause unsupported behavior (#357)
- Shipper now exposes metrics on the health of the webhook. For now, that includes the time at which the SSL certificate expires, and a heartbeat that is updated every second (#366)
- Shipperctl now creates and modifies the webhook with the failure policy set to `Fail` (#366). This means that the webhook becomes a very important piece of the user experience, and we suggest you monitor the Shipper webhook's health using the metrics mentioned above.
Migrating to 0.10
- Run `shipperctl clusters setup management` and `shipperctl clusters join` to create the relevant CRDs, service accounts and RBAC objects
- Make sure your context is set to the management cluster, and apply the Shipper 0.10 deployment object with `kubectl apply -f https://github.com/bookingcom/shipper/releases/download/v0.10.0/shipper.deployment.v0.10.0.yaml`, and the shipper-state-metrics deployment with `kubectl apply -f https://github.com/bookingcom/shipper/releases/download/v0.10.0/shipper-state-metrics.deployment.v0.10.0.yaml`
- Start monitoring the health of the webhook. You can use the `shipper_webhook_health_expire_time_epoch` and `shipper_webhook_health_heartbeat` Prometheus metrics (see the sketch below for what they represent).
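Only the metric names above come from Shipper. The following is a minimal, hypothetical Go sketch of what the two gauges represent (a Unix timestamp for the certificate expiry, and a value refreshed every second), not Shipper's actual implementation; the port and the certificate lifetime are made up for illustration:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Unix timestamp at which the webhook's serving certificate expires.
	certExpiry = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "shipper_webhook_health_expire_time_epoch",
		Help: "Time at which the webhook's SSL certificate expires, as a Unix epoch.",
	})
	// Gauge refreshed every second while the webhook is alive.
	heartbeat = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "shipper_webhook_health_heartbeat",
		Help: "Heartbeat updated every second while the webhook is healthy.",
	})
)

func main() {
	prometheus.MustRegister(certExpiry, heartbeat)

	// notAfter stands in for the expiry date read from the webhook's certificate.
	notAfter := time.Now().Add(90 * 24 * time.Hour)
	certExpiry.Set(float64(notAfter.Unix()))

	go func() {
		for range time.Tick(time.Second) {
			heartbeat.SetToCurrentTime()
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

In practice, an alert that fires when the current time approaches `shipper_webhook_health_expire_time_epoch`, or when `shipper_webhook_health_heartbeat` stops increasing, covers both failure modes described above.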
Reverting to 0.8
- Remove the Shipper deployments on the management cluster
- Run `shipperctl` 0.8 to revert the service accounts and cluster role objects to the state that Shipper 0.8 expects them to be in
- Create the Shipper deployment on the management cluster with the relevant image tag, `v0.8.2`
v0.8.2: Fixed inconsistency in historical release strategy state conditions
This commit fixes the release status strategy condition inconsistency reported in #299. The issue boils down to the way the strategy executor handles historical releases: it only runs against the traffic and capacity ensurers and updates the release strategy state if it looks incomplete. Under the positive scenario branch, an update never happens, causing strategy conditions to stall in a mistakenly broken state.

This commit changes strategy executor behavior slightly. In particular, all releases except the contender are forced to look ahead only. The prior implementation was not strict about this, and all releases kept reporting their incumbent's state along with their own state. The contender is the only release in the chain that looks behind at its incumbent. This makes for a more deterministic strategy state and conditions buildup: non-contender releases drop any information about the incumbent's state from their conditions, retaining only the information about the owning release's state. The motivation behind this move is to reduce the number of oscillations in the release chain by eliminating simultaneous look-ahead and look-behind actions. In fact, there is no use for a non-contender release's incumbent state: the only essential transition happens between the incumbent and the contender.

Apart from this change, the commit includes a series of bug fixes. It introduces a minor change in the shipper v1alpha1 `ReleaseStrategyCondition` type definition: the attributes `Reason`, `Message` and `Step` have dropped the `omitempty` flag. This fixes the problem where conditions remained in an inconsistent state: the status indicated a healthy state, but stale reason and message attributes were still present. This behavior is explained by the combination of `omitempty` and k8s strategic merge patch usage. When an attribute is set to a zero value (https://dave.cheney.net/2013/01/19/what-is-the-zero-value-and-why-is-it-useful), `json.Marshal` omits it, causing the k8s strategic merge strategy to merge old non-empty attributes of a structure (like `Reason` or `Step`) with the updated ones (like `Status`). Explicitly dropping `omitempty` forces json to encode empty values in patches and fixes this inconsistency.

Signed-off-by: Oleg Sidorov <[email protected]>
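To illustrate why dropping `omitempty` fixes the stale attributes, here is a small self-contained Go sketch; the structs are stand-ins rather than the real `ReleaseStrategyCondition` type, and only the `omitempty` behavior is the point:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Stand-in for a condition whose optional fields carry `omitempty`.
type withOmitempty struct {
	Status  string `json:"status"`
	Reason  string `json:"reason,omitempty"`
	Message string `json:"message,omitempty"`
}

// Stand-in for the fixed type: empty strings are encoded explicitly.
type withoutOmitempty struct {
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

func main() {
	a, _ := json.Marshal(withOmitempty{Status: "True"})
	b, _ := json.Marshal(withoutOmitempty{Status: "True"})

	// {"status":"True"} -- reason/message are missing from the patch, so a
	// strategic merge keeps whatever stale values the API server already has.
	fmt.Println(string(a))

	// {"status":"True","reason":"","message":""} -- the empty values are sent
	// and overwrite the stale ones.
	fmt.Println(string(b))
}
```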
v0.8.1: Removed SecretChecksumAnnotation from cluster secret annotations
This commit removes the SecretChecksumAnnotation annotation. Before this commit, this annotation was a must-have item that ensured that the cluster was provisioned correctly and notified the clusterclientstore of secret changes. In this commit we remove this annotation and re-compute the cluster secret checksum every time `syncSecret` is invoked. This approach ensures no preliminary config is needed to get a cluster object to an operating state. Every time a cluster secret changes, the cache invalidates the stored cluster object and re-creates it.

Signed-off-by: Oleg Sidorov <[email protected]>
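As a rough illustration only (the helper name and hashing scheme are assumptions, not Shipper's code), recomputing the checksum on every sync could look something like this:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"

	corev1 "k8s.io/api/core/v1"
)

// secretChecksum hashes the secret's data in a deterministic key order, so the
// checksum can be recomputed on every sync instead of being read from a
// precomputed annotation.
func secretChecksum(secret *corev1.Secret) string {
	keys := make([]string, 0, len(secret.Data))
	for k := range secret.Data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(secret.Data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	secret := &corev1.Secret{Data: map[string][]byte{
		"token": []byte("example"),
	}}
	// If the stored cluster object was built from a different checksum, the
	// cache would invalidate it and rebuild the cluster clients.
	fmt.Println(secretChecksum(secret))
}
```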
v0.8.0-beta.1: capacity controller: better summarization of conditions
Before this, the capacity controller would only put a list of unready clusters in the CapacityTarget's Ready condition when it set it to False. This required users to go digging into each cluster condition, and most likely they would only be directed to SadPods, where they could finally get some useful information.

Now, that information is summarized in a very brief format, in the hope that users will have to do less jumping around when investigating why their CapacityTarget is not progressing. For instance, if the CapacityTarget is stuck because one container can't pull its image, we'll now have the following in the CapacityTarget's `.status.conditions`:

```
[
  {
    "lastTransitionTime": "2020-02-12T13:16:44Z",
    "status": "True",
    "type": "Operational"
  },
  {
    "lastTransitionTime": "2020-02-12T13:16:44Z",
    "message": "docker-desktop: PodsNotReady 3/3: 3x\"test-nginx\" containers with [ImagePullBackOff]",
    "reason": "ClustersNotReady",
    "status": "False",
    "type": "Ready"
  }
]
```

As a bonus, this is also shown on a `kubectl get ct`:

```
% kubectl get ct snowflake-db84be2b-0
NAME                   OPERATIONAL   READY   REASON                                                                                 AGE
snowflake-db84be2b-0   True          False   docker-desktop: PodsNotReady 3/3: 3x"test-nginx" containers with [ImagePullBackOff]   8d
```
v0.8.0-alpha.6: capacity controller: better summarization of conditions
Before this, the capacity controller would only put a list of unready clusters in the CapacityTarget's Ready condition when it set it to False. This required users to go digging into each cluster condition, and most likely they would only be directed to SadPods, where they could finally get some useful information.

Now, that information is summarized in a very brief format, in the hope that users will have to do less jumping around when investigating why their CapacityTarget is not progressing. For instance, if the CapacityTarget is stuck because one container can't pull its image, we'll now have the following in the CapacityTarget's `.status.conditions`:

```
[
  {
    "lastTransitionTime": "2020-02-12T13:16:44Z",
    "status": "True",
    "type": "Operational"
  },
  {
    "lastTransitionTime": "2020-02-12T13:16:44Z",
    "message": "docker-desktop: PodsNotReady 3/3: 3x\"test-nginx\" containers with [ImagePullBackOff]",
    "reason": "ClustersNotReady",
    "status": "False",
    "type": "Ready"
  }
]
```

As a bonus, this is also shown on a `kubectl get ct`:

```
% kubectl get ct snowflake-db84be2b-0
NAME                   OPERATIONAL   READY   REASON                                                                                 AGE
snowflake-db84be2b-0   True          False   docker-desktop: PodsNotReady 3/3: 3x"test-nginx" containers with [ImagePullBackOff]   8d
```
v0.8.0: capacity controller: better summarization of conditions
Before this, the capacity controller would only put a list of unready clusters in the CapacityTarget's Ready condition when it set it to False. This required users to go digging into each cluster condition, and most likely they would only be directed to SadPods, where they could finally get some useful information.

Now, that information is summarized in a very brief format, in the hope that users will have to do less jumping around when investigating why their CapacityTarget is not progressing. For instance, if the CapacityTarget is stuck because one container can't pull its image, we'll now have the following in the CapacityTarget's `.status.conditions`:

```
[
  {
    "lastTransitionTime": "2020-02-12T13:16:44Z",
    "status": "True",
    "type": "Operational"
  },
  {
    "lastTransitionTime": "2020-02-12T13:16:44Z",
    "message": "docker-desktop: PodsNotReady 3/3: 3x\"test-nginx\" containers with [ImagePullBackOff]",
    "reason": "ClustersNotReady",
    "status": "False",
    "type": "Ready"
  }
]
```

As a bonus, this is also shown on a `kubectl get ct`:

```
% kubectl get ct snowflake-db84be2b-0
NAME                   OPERATIONAL   READY   REASON                                                                                 AGE
snowflake-db84be2b-0   True          False   docker-desktop: PodsNotReady 3/3: 3x"test-nginx" containers with [ImagePullBackOff]   8d
```
v0.8.0-alpha.5
Fixed historical release awakening due to targetStep resolve inconsis…
v0.8.0-alpha.4: Bugfixes in release controller strategy executor
This commit addresses multiple issues identified in the strategy executor, among which:

* Wrong recipients for patch updates: due to an error in the code, some patches were applied to the wrong generation of release and target objects.
* Target object spec checkers used to return an incomplete spec if only some of the clusters were misbehaving: there was a risk of de-scheduling the workload on healthy clusters.

Signed-off-by: Oleg Sidorov <[email protected]>
v0.8.0-alpha.3: Release controller: patches are applied on the owning release only
Since the introduction of the Patch interface, the release controller, and the strategy executor in particular, started using the `Patch.Alters()` method to distinguish altering patches from no-ops. It turned out there was an inconsistency between the recipient and the validation objects. In essence, we were checking whether a patch alters the predecessor release object, whereas on a positive check it was sent to alter the successor release. This patch ensures all patches are validated against the same generation of releases.

Signed-off-by: Oleg Sidorov <[email protected]>
v0.8.0-alpha.2: Fixed out-of-range index dereferencing error in release controller
This commit fixes a problem where the corresponding release strategy was resolved twice: once for actual execution and a second time for reporting. These resolutions were happening in distinct places: the former in the strategy executor, the latter in the release controller. As a result, the release controller one was causing a panic, as it was not taking into account the updated strategy resolution logic where an incumbent is supposed to look ahead and use its successor's strategy; the index validity check was only happening in the strategy executor, which was calculating the desired strategy correctly. This commit also moves things around: the strategy executor is now initialized with a specific strategy and a pointer to the target step, and `Execute()` takes the sequence of executable releases as arguments.

Signed-off-by: Oleg Sidorov <[email protected]>