Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make src/dst checks configurable on awsmachineclass #504

Closed

Conversation

DockToFuture
Copy link
Member

What this PR does / why we need it:
This PR makes the aws src/dst checks configurable on a machine network interface level.
To disable the the src/dst checks the srcAndDstChecksEnabled field has to be set to false in the awsmachineclass.
The default value is set to true.

spec:
  networkInterfaces:
  -   srcAndDstChecksEnabled: false

Which issue(s) this PR fixes:
Fixes #499

Src/dst checks are now configurable on awsmachineclass via `spec.networkInterfaces.srcAndDstChecksEnabled` field. The checks are enabled by default.

@DockToFuture DockToFuture requested review from ggaurav10 and a team as code owners August 26, 2020 15:52
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 26, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Aug 26, 2020
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Aug 27, 2020
// disable srcDstChecks on ec2 Instance if set
if !srcDstChecks {
klog.V(2).Infof("Disabling src/dst checks for machine with instanceID: %s", *runResult.Instances[0].InstanceId)
_, err = svc.ModifyInstanceAttribute(&ec2.ModifyInstanceAttributeInput{InstanceId: runResult.Instances[0].InstanceId, SourceDestCheck: &ec2.AttributeBooleanValue{Value: aws.Bool(false)}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the Best Effort and not Guaranteed feature, would that be feasible for the request for now?

  • Basically, a VM might not have the src/dst check disabled if MCM crashes immediately after making a CreateVM() call, doesnt get a chance for ModifyInstanceAttr call.
    It basically loses the internal state and adopts the existing VM in the next reconciliation It's rare but it can happen.

To resolve it, we need to fix #444 .
Happy to go ahead with further review/merge if we think this is a feasible solution with a known shortcoming.
cc @zanetworker

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a feasible solution for now but in the long term we require the fix from #444 as you mentioned. I'm eager to see that change happen.

Copy link
Contributor

@zanetworker zanetworker Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hardikdr This is an important call and 'best effort' would not be enough IMO. New VMs without src/destination checks disabled will fail to communicate cross zones which is a downtime that is not desired.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing the feature it would be enough but if you want to go live with it, it wouldn't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to achieve something like this we will need in-place updates or some sort of intermediate updates. This solution wouldn't work is what even i think.

Copy link
Contributor

@prashanth26 prashanth26 Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, something worth mentioning would be that.

We could probably work this PR for now. So what would happen is say machine creation fails in intermediate step (VM created but setting this flag fails). Then the machine is assumed to be created (as the VM creation succeeds, and next reconciliation assumes that VM is found) and waits for it to join the cluster in 20mins. And I am guessing that the machine won't join (hopefully) once we enable calico's "CrossSubnet" mode. And then the machine would re-create. And at the gardener, it would still require a rolling update of machines. We could use this approach, however, I wouldn't prefer this. We could use this only as a stop-gap solution until we support in-place updates.

However, my preferred approach would be to wait for the in-place updates, and not go with this approach.

cc: @amshuman-kr

Copy link
Contributor

@zanetworker zanetworker Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prashanth26 The machine will probably join (access to the internet is not much affected by this change, its NATed from the node to the internet). The affected clusters are multi-zoned clusters and intra-node routes (node-to-node networking).So I would not rely on Calico failing the join procedure as a workaround.

@hardikdr
Copy link
Member

hardikdr commented Sep 5, 2020

Update: As discussed with @zanetworker and @DockToFuture, we'll first complete #444 and then merge this PR.
cc @prashanth26

@CLAassistant
Copy link

CLAassistant commented Sep 16, 2020

CLA assistant check
All committers have signed the CLA.

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Nov 16, 2020
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels May 19, 2021
@prashanth26
Copy link
Contributor

/close in favour of gardener/machine-controller-manager-provider-aws#39.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress lifecycle/rotten Nobody worked on this for 12 months (final aging stage) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants