-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make src/dst checks configurable on awsmachineclass #504
Conversation
6cdb0d9
to
192d850
Compare
// disable srcDstChecks on ec2 Instance if set | ||
if !srcDstChecks { | ||
klog.V(2).Infof("Disabling src/dst checks for machine with instanceID: %s", *runResult.Instances[0].InstanceId) | ||
_, err = svc.ModifyInstanceAttribute(&ec2.ModifyInstanceAttributeInput{InstanceId: runResult.Instances[0].InstanceId, SourceDestCheck: &ec2.AttributeBooleanValue{Value: aws.Bool(false)}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be the Best Effort
and not Guaranteed
feature, would that be feasible for the request for now?
- Basically, a VM might not have the src/dst check disabled if MCM crashes immediately after making a CreateVM() call, doesnt get a chance for
ModifyInstanceAttr
call.
It basically loses the internal state and adopts the existing VM in the next reconciliation It's rare but it can happen.
To resolve it, we need to fix #444 .
Happy to go ahead with further review/merge if we think this is a feasible solution with a known shortcoming.
cc @zanetworker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a feasible solution for now but in the long term we require the fix from #444 as you mentioned. I'm eager to see that change happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hardikdr This is an important call and 'best effort' would not be enough IMO. New VMs without src/destination checks disabled will fail to communicate cross zones which is a downtime that is not desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing the feature it would be enough but if you want to go live with it, it wouldn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, to achieve something like this we will need in-place updates or some sort of intermediate updates. This solution wouldn't work is what even i think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, something worth mentioning would be that.
We could probably work this PR for now. So what would happen is say machine creation fails in intermediate step (VM created but setting this flag fails). Then the machine is assumed to be created (as the VM creation succeeds, and next reconciliation assumes that VM is found) and waits for it to join the cluster in 20mins. And I am guessing that the machine won't join (hopefully) once we enable calico's "CrossSubnet" mode. And then the machine would re-create. And at the gardener, it would still require a rolling update of machines. We could use this approach, however, I wouldn't prefer this. We could use this only as a stop-gap solution until we support in-place updates.
However, my preferred approach would be to wait for the in-place updates, and not go with this approach.
cc: @amshuman-kr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prashanth26 The machine will probably join (access to the internet is not much affected by this change, its NATed from the node to the internet). The affected clusters are multi-zoned clusters and intra-node routes (node-to-node networking).So I would not rely on Calico failing the join procedure as a workaround.
Update: As discussed with @zanetworker and @DockToFuture, we'll first complete #444 and then merge this PR. |
/close in favour of gardener/machine-controller-manager-provider-aws#39. |
What this PR does / why we need it:
This PR makes the aws src/dst checks configurable on a machine network interface level.
To disable the the src/dst checks the
srcAndDstChecksEnabled
field has to be set to false in theawsmachineclass
.The default value is set to true.
Which issue(s) this PR fixes:
Fixes #499