Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discard and replace new VMs if they never join the cluster #123

Merged
merged 2 commits into from
Jun 24, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions api/v1beta1/cloudstackmachine_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ package v1beta1
import (
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"time"
)

const (
Expand Down Expand Up @@ -130,10 +131,23 @@ type CloudStackMachineStatus struct {
// +optional
InstanceState InstanceState `json:"instanceState,omitempty"`

// InstanceStateLastUpdated is the time the instance state was last updated.
// +optional
InstanceStateLastUpdated metav1.Time `json:"instanceStateLastUpdated,omitempty"`

// Ready indicates the readiness of the provider resource.
Ready bool `json:"ready"`
}

// TimeSinceLastStateChange returns the amount of time that's elapsed since the state was last updated. If the state
// hasn't ever been updated, it returns a negative value.
func (s *CloudStackMachineStatus) TimeSinceLastStateChange() time.Duration {
if s.InstanceStateLastUpdated.IsZero() {
return time.Duration(-1)
}
return time.Since(s.InstanceStateLastUpdated.Time)
}
Comment on lines +144 to +149
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we start adding some unit tests here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I just added some.


// +kubebuilder:object:root=true
// +kubebuilder:resource:path=cloudstackmachines,scope=Namespaced,categories=cluster-api,shortName=csm
// +kubebuilder:storageversion
Expand Down
1 change: 1 addition & 0 deletions api/v1beta1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,11 @@ spec:
description: InstanceState is the state of the CloudStack instance
for this machine.
type: string
instanceStateLastUpdated:
description: InstanceStateLastUpdated is the time the instance state
was last updated.
format: date-time
type: string
ready:
description: Ready indicates the readiness of the provider resource.
type: boolean
Expand Down
21 changes: 19 additions & 2 deletions controllers/cloudstackmachinestatechecker_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -70,16 +70,33 @@ func (r *CloudStackMachineStateCheckerReconciliationRunner) Reconcile() (ctrl.Re
if res, err := r.GetParent(r.ReconciliationSubject, r.CSMachine)(); r.ShouldReturn(res, err) {
return res, err
}

if res, err := r.GetParent(r.CSMachine, r.CAPIMachine)(); r.ShouldReturn(res, err) {
return res, err
}

if err := r.CSClient.ResolveVMInstanceDetails(r.CSMachine); err != nil && !strings.Contains(strings.ToLower(err.Error()), "no match found") {
return r.ReturnWrappedError(err, "failed to resolve VM instance details")
}
if r.CSMachine.Status.InstanceState == "Running" {

csRunning := r.CSMachine.Status.InstanceState == "Running"
csTimeInState := r.CSMachine.Status.TimeSinceLastStateChange()
capiRunning := r.CAPIMachine.Status.Phase == "Running"

// capiTimeout indicates that a new VM is running, but it isn't reachable due to a network issue or a misconfiguration.
// When this happens, the machine should be deleted or else the cluster won't ever recover.
capiTimeout := csRunning && !capiRunning && csTimeInState > 5*time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be configurable instead of hardcoded? Maybe default to 5 minutes, unless some env var is set or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to have a good mechanism for making it configurable. I wish we could put something in the EKS-A cluster definition file, but nothing there passes through nicely. Is there a good way to set environment variables on the VMs in bulk?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a GitHub issue and move forward without this configuration for now, since it's not essential.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #127


if csRunning && capiRunning {
r.ReconciliationSubject.Status.Ready = true
} else {
} else if !csRunning || capiTimeout {
r.Log.Info("CloudStack instance in bad state",
"name", r.CSMachine.Name,
"instance-id", r.CSMachine.Spec.InstanceID,
"cs-state", r.CSMachine.Status.InstanceState,
"cs-time-in-state", csTimeInState.String(),
"capi-phase", r.CAPIMachine.Status.Phase)

if err := r.K8sClient.Delete(r.RequestCtx, r.CAPIMachine); err != nil {
return r.ReturnWrappedError(err, "failed to delete CAPI machine")
}
Expand Down
7 changes: 6 additions & 1 deletion pkg/cloud/instance.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ package cloud

import (
"fmt"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"strings"

capiv1 "sigs.k8s.io/cluster-api/api/v1beta1"
Expand All @@ -41,7 +42,11 @@ func setMachineDataFromVMMetrics(vmResponse *cloudstack.VirtualMachinesMetric, c
csMachine.Spec.ProviderID = pointer.StringPtr(fmt.Sprintf("cloudstack:///%s", vmResponse.Id))
csMachine.Spec.InstanceID = pointer.StringPtr(vmResponse.Id)
csMachine.Status.Addresses = []corev1.NodeAddress{{Type: corev1.NodeInternalIP, Address: vmResponse.Ipaddress}}
csMachine.Status.InstanceState = infrav1.InstanceState(vmResponse.State)
newInstanceState := infrav1.InstanceState(vmResponse.State)
if newInstanceState != csMachine.Status.InstanceState || (newInstanceState != "" && csMachine.Status.InstanceStateLastUpdated.IsZero()) {
csMachine.Status.InstanceState = newInstanceState
csMachine.Status.InstanceStateLastUpdated = metav1.Now()
}
}

// ResolveVMInstanceDetails Retrieves VM instance details by csMachine.Spec.InstanceID or csMachine.Name, and
Expand Down
2 changes: 1 addition & 1 deletion test/e2e/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,6 @@ You can specify JOB environment variable which value is a regular expression to
For example,

```shell
JOB=PR-Blocking make run
JOB=PR-Blocking make run-e2e
```
This command runs the e2e tests that contains `PR-Blocking` in their spec names.