Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make src/dst checks configurable on awsmachineclass #504

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions kubernetes/crds/machine.sapcloud.io_awsmachineclasses.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,11 @@ spec:
items:
type: string
type: array
srcAndDstChecksEnabled:
description: If set to false, source and destination checks are
disabled on machine network inferface level, default value is
true
type: boolean
subnetID:
description: The ID of the subnet associated with the network
string. Applies only if creating a network interface when launching
Expand All @@ -203,6 +208,8 @@ spec:
name must be unique.
type: string
type: object
spotPrice:
type: string
tags:
additionalProperties:
type: string
Expand Down
23 changes: 23 additions & 0 deletions kubernetes/crds/machine.sapcloud.io_machinedeployments.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ spec:
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
description: Standard object metadata.
type: object
spec:
description: Specification of the desired behavior of the MachineDeployment.
Expand Down Expand Up @@ -221,6 +222,28 @@ spec:
description: Name of machine class
type: string
type: object
creationTimeout:
description: MachineCreationTimeout is the timeout after which
machinie creation is declared failed.
type: string
drainTimeout:
description: MachineDraintimeout is the timeout after which
machine is forcefully deleted.
type: string
healthTimeout:
description: MachineHealthTimeout is the timeout after which
machine is declared unhealhty/failed.
type: string
maxEvictRetries:
description: MaxEvictRetries is the number of retries that will
be attempted while draining the node.
format: int32
type: integer
nodeConditions:
description: NodeConditions are the set of conditions if set
to true for MachineHealthTimeOut, machine will be declared
failed.
type: string
nodeTemplate:
description: NodeTemplateSpec describes the data a node should
have when created from a template
Expand Down
21 changes: 21 additions & 0 deletions kubernetes/crds/machine.sapcloud.io_machines.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,27 @@ spec:
description: Name of machine class
type: string
type: object
creationTimeout:
description: MachineCreationTimeout is the timeout after which machinie
creation is declared failed.
type: string
drainTimeout:
description: MachineDraintimeout is the timeout after which machine
is forcefully deleted.
type: string
healthTimeout:
description: MachineHealthTimeout is the timeout after which machine
is declared unhealhty/failed.
type: string
maxEvictRetries:
description: MaxEvictRetries is the number of retries that will be attempted
while draining the node.
format: int32
type: integer
nodeConditions:
description: NodeConditions are the set of conditions if set to true
for MachineHealthTimeOut, machine will be declared failed.
type: string
nodeTemplate:
description: NodeTemplateSpec describes the data a node should have
when created from a template
Expand Down
22 changes: 22 additions & 0 deletions kubernetes/crds/machine.sapcloud.io_machinesets.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,28 @@ spec:
description: Name of machine class
type: string
type: object
creationTimeout:
description: MachineCreationTimeout is the timeout after which
machinie creation is declared failed.
type: string
drainTimeout:
description: MachineDraintimeout is the timeout after which
machine is forcefully deleted.
type: string
healthTimeout:
description: MachineHealthTimeout is the timeout after which
machine is declared unhealhty/failed.
type: string
maxEvictRetries:
description: MaxEvictRetries is the number of retries that will
be attempted while draining the node.
format: int32
type: integer
nodeConditions:
description: NodeConditions are the set of conditions if set
to true for MachineHealthTimeOut, machine will be declared
failed.
type: string
nodeTemplate:
description: NodeTemplateSpec describes the data a node should
have when created from a template
Expand Down
3 changes: 3 additions & 0 deletions pkg/apis/machine/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -829,6 +829,9 @@ type AWSNetworkInterfaceSpec struct {
// The ID of the subnet associated with the network string. Applies only if
// creating a network interface when launching an machine.
SubnetID string

// If set to false, source and destination checks are disabled on machine network inferface level, default value is true
SrcAndDstChecksEnabled *bool
}

/********************** AzureMachineClass APIs ***************/
Expand Down
3 changes: 3 additions & 0 deletions pkg/apis/machine/v1alpha1/aws_machineclass_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -201,4 +201,7 @@ type AWSNetworkInterfaceSpec struct {
// The ID of the subnet associated with the network string. Applies only if
// creating a network interface when launching an machine.
SubnetID string `json:"subnetID,omitempty"`

// If set to false, source and destination checks are disabled on machine network inferface level, default value is true
SrcAndDstChecksEnabled *bool `json:"srcAndDstChecksEnabled,omitempty"`
}
2 changes: 2 additions & 0 deletions pkg/apis/machine/v1alpha1/zz_generated.conversion.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions pkg/apis/machine/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions pkg/apis/machine/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 13 additions & 4 deletions pkg/driver/driver_aws.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,7 @@ func (d *AWSDriver) Create() (string, string, error) {
if err != nil {
return "Error", "Error", err
}

UserDataEnc := base64.StdEncoding.EncodeToString([]byte(d.UserData))

var imageIds []*string
imageID := aws.String(d.AWSMachineClass.Spec.AMI)
imageIds = append(imageIds, imageID)
Expand Down Expand Up @@ -96,7 +94,7 @@ func (d *AWSDriver) Create() (string, string, error) {
if err != nil {
return "Error", "Error", err
}

srcDstChecks := true
var networkInterfaceSpecs []*ec2.InstanceNetworkInterfaceSpecification
for i, netIf := range d.AWSMachineClass.Spec.NetworkInterfaces {
spec := &ec2.InstanceNetworkInterfaceSpecification{
Expand All @@ -113,6 +111,10 @@ func (d *AWSDriver) Create() (string, string, error) {
}

networkInterfaceSpecs = append(networkInterfaceSpecs, spec)

if netIf.SrcAndDstChecksEnabled != nil && !*netIf.SrcAndDstChecksEnabled {
srcDstChecks = false
}
}

// Specify the details of the machine
Expand Down Expand Up @@ -150,7 +152,14 @@ func (d *AWSDriver) Create() (string, string, error) {
return "Error", "Error", err
}
metrics.APIRequestCount.With(prometheus.Labels{"provider": "aws", "service": "ecs"}).Inc()

// disable srcDstChecks on ec2 Instance if set
if !srcDstChecks {
klog.V(2).Infof("Disabling src/dst checks for machine with instanceID: %s", *runResult.Instances[0].InstanceId)
_, err = svc.ModifyInstanceAttribute(&ec2.ModifyInstanceAttributeInput{InstanceId: runResult.Instances[0].InstanceId, SourceDestCheck: &ec2.AttributeBooleanValue{Value: aws.Bool(false)}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the Best Effort and not Guaranteed feature, would that be feasible for the request for now?

  • Basically, a VM might not have the src/dst check disabled if MCM crashes immediately after making a CreateVM() call, doesnt get a chance for ModifyInstanceAttr call.
    It basically loses the internal state and adopts the existing VM in the next reconciliation It's rare but it can happen.

To resolve it, we need to fix #444 .
Happy to go ahead with further review/merge if we think this is a feasible solution with a known shortcoming.
cc @zanetworker

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a feasible solution for now but in the long term we require the fix from #444 as you mentioned. I'm eager to see that change happen.

Copy link
Contributor

@zanetworker zanetworker Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hardikdr This is an important call and 'best effort' would not be enough IMO. New VMs without src/destination checks disabled will fail to communicate cross zones which is a downtime that is not desired.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing the feature it would be enough but if you want to go live with it, it wouldn't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to achieve something like this we will need in-place updates or some sort of intermediate updates. This solution wouldn't work is what even i think.

Copy link
Contributor

@prashanth26 prashanth26 Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, something worth mentioning would be that.

We could probably work this PR for now. So what would happen is say machine creation fails in intermediate step (VM created but setting this flag fails). Then the machine is assumed to be created (as the VM creation succeeds, and next reconciliation assumes that VM is found) and waits for it to join the cluster in 20mins. And I am guessing that the machine won't join (hopefully) once we enable calico's "CrossSubnet" mode. And then the machine would re-create. And at the gardener, it would still require a rolling update of machines. We could use this approach, however, I wouldn't prefer this. We could use this only as a stop-gap solution until we support in-place updates.

However, my preferred approach would be to wait for the in-place updates, and not go with this approach.

cc: @amshuman-kr

Copy link
Contributor

@zanetworker zanetworker Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prashanth26 The machine will probably join (access to the internet is not much affected by this change, its NATed from the node to the internet). The affected clusters are multi-zoned clusters and intra-node routes (node-to-node networking).So I would not rely on Calico failing the join procedure as a workaround.

if err != nil {
return "Error", "Error", err
}
}
return d.encodeMachineID(d.AWSMachineClass.Spec.Region, *runResult.Instances[0].InstanceId), *runResult.Instances[0].PrivateDnsName, nil
}

Expand Down
7 changes: 7 additions & 0 deletions pkg/openapi/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.