From 653387b5ced994d427742e8cbdefde05c051055f Mon Sep 17 00:00:00 2001
From: Eric Fried
Date: Tue, 8 Aug 2023 17:23:29 -0500
Subject: [PATCH] e2e-pool: Start real pool provisions after inventory added

Goal: reduce e2e-pool wallclock time by ~35m.

Problem Statement:
When ClusterPool inventory (ClusterDeploymentCustomization) testing was
added to e2e-pool (4fddbe7 / #1672), it triggered ClusterPool's
staleness algorithm such that we were wasting a whole cluster while
waiting for the real pool to become ready. Grab a cup of coffee...

To make the flow of the test a little easier, we were creating the real
pool, then using its definition to generate the fake pool definition --
which does not have inventory -- and then adding inventory to the real
pool.

But if you add or change a pool's inventory, we mark all its clusters
stale.

So because of the flow above, when we initially created the real pool
without inventory, it started provisioning a cluster. Then when we
updated it (mere seconds later, if that), that cluster immediately
became stale.

Now, the way we architected replacement of stale clusters, we
prioritize _having claimable clusters_ over _all clusters being
current_. Thus in this scenario we ended up waiting until the stale
cluster was fully provisioned before deleting it and starting over
with the (inventory-affected) cluster.

Solution:
Create the real pool with an initial `size=0`. Scale it up to `size=1`
_after_ adding the inventory.
---
 hack/e2e-pool-test.sh | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hack/e2e-pool-test.sh b/hack/e2e-pool-test.sh
index c43ccd97afa..43cdaa49607 100755
--- a/hack/e2e-pool-test.sh
+++ b/hack/e2e-pool-test.sh
@@ -215,7 +215,12 @@ go run "${SRC_ROOT}/contrib/cmd/hiveutil/main.go" clusterpool create-pool \
     --pull-secret-file="${PULL_SECRET_FILE}" \
     --image-set "${IMAGESET_NAME}" \
     --region us-east-1 \
-    --size "${POOL_SIZE}" \
+    --size 0 \
     ${REAL_POOL_NAME}
+# NOTE: We start with a zero-size pool and scale it up after we add the inventory.
+# Otherwise, adding the inventory immediately makes the already-provisioning cluster
+# stale, BUT it doesn't get purged until it's done provisioning. (Intentional
+# architectural decision to prefer presenting a stale claimable cluster early vs.
+# delaying until a non-stale one is available.)
 
 ### INTERLUDE: FAKE POOL
@@ -234,7 +239,8 @@ oc get clusterpool ${REAL_POOL_NAME} -o json \
 NEW_CLUSTER_NAME=cdcci-${CLUSTER_NAME#*-}
 create_customization "cdc-test" "${CLUSTER_NAMESPACE}" "${NEW_CLUSTER_NAME}"
 oc patch cp -n $CLUSTER_NAMESPACE $REAL_POOL_NAME --type=merge -p '{"spec": {"inventory": [{"name": "cdc-test"}]}}'
-
+# Now we can scale up the pool so it starts creating clusters
+oc scale cp -n $CLUSTER_NAMESPACE $REAL_POOL_NAME --replicas=$POOL_SIZE
 wait_for_pool_to_be_ready $FAKE_POOL_NAME