Should service action (restore) remove all node ports from a service? #354

Closed

ncdc opened this issue Mar 6, 2018 · 14 comments
Labels: Breaking change (Impacts backwards compatibility), Enhancement/User (End-User Enhancement to Velero)
Milestone: v1.0.0

Comments

@ncdc
Contributor

ncdc commented Mar 6, 2018

We are currently removing all nodePorts from all services when we restore them:

https://github.com/heptio/ark/blob/973f630cc7b2bdd464cc4e78d776ab4099a43381/pkg/restore/service_action.go#L60-L63
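For context, the effect of those lines is roughly the following. This is a minimal sketch, not the actual source; it assumes the action simply zeroes out each port's nodePort so the API server assigns fresh ones on restore, and it uses a typed Service where the real action operates on unstructured data.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// clearNodePorts zeroes every nodePort on the Service; a value of 0 tells the
// API server to allocate a new port when the object is created.
func clearNodePorts(svc *corev1.Service) {
	for i := range svc.Spec.Ports {
		svc.Spec.Ports[i].NodePort = 0
	}
}

func main() {
	svc := &corev1.Service{
		Spec: corev1.ServiceSpec{
			Ports: []corev1.ServicePort{{Port: 80, NodePort: 30080}},
		},
	}
	clearNodePorts(svc)
	fmt.Println(svc.Spec.Ports[0].NodePort) // prints 0
}
```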

This doesn't really make sense to me. @skriss do you remember what the original motivation for this was?

@skriss
Contributor

skriss commented Mar 6, 2018

IIRC it was because keeping them makes the "clone namespace" use case fail due to port collisions.

@ncdc
Contributor Author

ncdc commented Mar 6, 2018

Ok, that makes sense. I don't think they should generally be dropped by default, though. We should think through ways to allow the user to express the desired behavior in a clean manner.

@ncdc
Contributor Author

ncdc commented Mar 8, 2018

cc @jbeda - another UX question. How do we support both 1) restore into a new cluster and 2) restore into a new cluster under a different namespace?

@jbeda

jbeda commented Mar 8, 2018

My answer would be similar to #192 (comment). By all means, let's come up with granular flags for stuff like this, or have a config file for specifying what gets remapped how. But then package up the common use cases under a single flag, and document it so that users can tweak it if they want.

@ncdc
Contributor Author

ncdc commented Mar 9, 2018

Makes sense. Maybe we allow users to remap node ports as an option?

ark restore create --from-backup foo --namespace-mappings bob1:bob2 --node-port-mappings 40001:40002,41234:42345
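If something like that existed, the server-side handling could be a small lookup applied to each restored Service. A hedged sketch; the --node-port-mappings flag above is only a proposal, and the helper here is hypothetical:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// remapNodePorts rewrites nodePorts according to a user-supplied mapping,
// e.g. {40001: 40002, 41234: 42345} parsed from the proposed
// --node-port-mappings flag. Ports not present in the map are left untouched.
func remapNodePorts(svc *corev1.Service, mappings map[int32]int32) {
	for i, p := range svc.Spec.Ports {
		if target, ok := mappings[p.NodePort]; ok {
			svc.Spec.Ports[i].NodePort = target
		}
	}
}

func main() {
	svc := &corev1.Service{
		Spec: corev1.ServiceSpec{
			Ports: []corev1.ServicePort{{Port: 80, NodePort: 40001}},
		},
	}
	remapNodePorts(svc, map[int32]int32{40001: 40002, 41234: 42345})
	fmt.Println(svc.Spec.Ports[0].NodePort) // prints 40002
}
```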

@jbeda

jbeda commented Mar 9, 2018

That sounds great. But from a user's point of view, you'd want to be able to ask "What node ports does this backup/namespace use?" and "Will those conflict with the cluster I'm trying to apply to?" Answering those questions doesn't fit cleanly into this model, unfortunately.

In addition, you may want to let the system pick the node port as you remap. Unfortunately, it is difficult to know whether the node port in the backup was assigned by the system during admission control or assigned explicitly by the user. Once Daniel's big multi-merge thing goes in, we may have more data. To really tell what the intent was, we'd have to start cracking into the annotation added by kubectl apply.

Perhaps we need some sort of restore dry run that helps suss out these issues and gives you a summary of the resources to be created, along with perhaps testing those against the target cluster? That is a pretty big new feature, though.

@ncdc
Contributor Author

ncdc commented Mar 9, 2018

Yeah, this is potentially a pretty big thing... Right now it's "fire and forget" when it comes to restoring. The server does the best it can, but it's all automated and there's no opportunity for user input. Nor do we have a holistic in-memory view of either the backed-up resources or the destination cluster's resources, so we can't currently easily evaluate everything and say "hey, your node port is going to conflict".

nrb pushed a commit to nrb/velero that referenced this issue Mar 29, 2018
When restoring resources that raise an already exists error, check their
equality before logging a message on the restore. If they're the same
except for some metadata, don't generate a message.

The restore process was modified so that if an object had an empty
namespace string, no namespace key is created on the object. This was to
avoid manipulating the copy of the current cluster's object by adding
the target namespace.

There are some cases right now that are known to not be equal via this
method:

- The `default` ServiceAccount in a namespace will not match, primarily
because of differing default tokens. These will be handled in their own
patch
- IP addresses for Services are recorded in the backup object, but are
either not present on the cluster object, or different. An issue for
this already exists at vmware-tanzu#354
- Endpoints have differing values for `renewTime`. This may be
insubstantial, but isn't currently handled by the resetMetadataAndStatus
function.
- PersistentVolume objects in the backup have a `claimRef` section,
while those in cluster do not.

Signed-off-by: Nolan Brubaker <[email protected]>
nrb pushed a commit to nrb/velero that referenced this issue Apr 5, 2018
When restoring resources that raise an already exists error, check their
equality before logging a message on the restore. If they're the same
except for some metadata, don't generate a message.

The restore process was modified so that if an object had an empty
namespace string, no namespace key is created on the object. This was to
avoid manipulating the copy of the current cluster's object by adding
the target namespace.

There are some cases right now that are known to not be equal via this
method:

- The `default` ServiceAccount in a namespace will not match, primarily
because of differing default tokens. These will be handled in their own
patch
- IP addresses for Services are recorded in the backup object, but are
either not present on the cluster object, or different. An issue for
this already exists at vmware-tanzu#354
- Endpoints have differing values for `renewTime`. This may be
insubstantial, but isn't currently handled by the resetMetadataAndStatus
function.
- PersistentVolume objects do not match on spec fields, such as
claimRef and cloud provider persistent disk info

Signed-off-by: Nolan Brubaker <[email protected]>
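The equality check described in these commit messages boils down to stripping volatile metadata and status from both objects before comparing them. A rough sketch of that idea, assuming unstructured objects; the field list here is illustrative, not the one from the patch:

```go
package restore

import (
	"reflect"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// stripVolatileFields removes fields that legitimately differ between a
// backed-up object and the copy already in the cluster, so the comparison
// only sees fields the user actually specified.
func stripVolatileFields(obj *unstructured.Unstructured) *unstructured.Unstructured {
	out := obj.DeepCopy()
	unstructured.RemoveNestedField(out.Object, "status")
	unstructured.RemoveNestedField(out.Object, "metadata", "uid")
	unstructured.RemoveNestedField(out.Object, "metadata", "resourceVersion")
	unstructured.RemoveNestedField(out.Object, "metadata", "creationTimestamp")
	return out
}

// objectsMatch reports whether the backed-up and in-cluster objects are equal
// once the volatile fields above are ignored; if they match, the restore can
// skip logging an "already exists" message.
func objectsMatch(backup, cluster *unstructured.Unstructured) bool {
	return reflect.DeepEqual(stripVolatileFields(backup).Object, stripVolatileFields(cluster).Object)
}
```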
@rosskukulinski rosskukulinski added this to the v1.0.0 milestone Jun 15, 2018
@rosskukulinski
Contributor

Ref #448

This is tricky, because NodePort can be defined by the user, or can be selected by kube-controller-manager at creation time. IMO, if kube-controller-manager picked the NodePort, then let that process happen again. If the user specified it, then we probably need a remap mechanism. I don’t believe this information is actually available to Ark after the fact.

This is a big-impact issue that touches UX (via CLI flags) and user expectations. Will need more thought around use-cases.

@jbeda

jbeda commented Jun 18, 2018

This is tricky, but I think there is a generally accepted way here. There is a set of annotations that lists the specified state of the resource; kubectl apply uses these. From there you can figure out user intent. It isn't 100% guaranteed to work, but it is something that we can document.

I'd say:

  1. If the annotation is there and the user specified the node port, then keep that specification. If they didn't, then remove it from the backup (see the sketch after this list).
  2. If the annotation is missing, then default to something and give users an option to modify that default. I'm leaning toward removing the node port specification but could go either way.
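A sketch of what step 1 could look like. The annotation key is the standard one kubectl apply writes; everything else (the helper name, the fallback behavior) is illustrative rather than a description of Ark's code:

```go
package restore

import (
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
)

const lastAppliedAnnotation = "kubectl.kubernetes.io/last-applied-configuration"

// userSpecifiedNodePorts returns the nodePorts the user explicitly put in the
// spec they applied, based on kubectl's last-applied-configuration annotation.
// If the annotation is missing or unparsable, it returns nil and the caller
// falls back to the default behavior (e.g. dropping all nodePorts).
func userSpecifiedNodePorts(svc *corev1.Service) map[int32]bool {
	raw, ok := svc.Annotations[lastAppliedAnnotation]
	if !ok {
		return nil
	}
	var applied corev1.Service
	if err := json.Unmarshal([]byte(raw), &applied); err != nil {
		return nil
	}
	ports := map[int32]bool{}
	for _, p := range applied.Spec.Ports {
		if p.NodePort != 0 {
			ports[p.NodePort] = true
		}
	}
	return ports
}
```

The restore action could then keep only the nodePorts in that set and zero out the rest.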

Over time, we may need a way to specify a generic munging policy so that users can specify how/where to remove stuff like this on a per-restore basis.

Also be aware that there is work to make that annotation more official with a feature called "server-side apply", and we should be tracking that. BryanL has experience with this issue wrt ksonnet, and it may be worth reaching out to him.

@rosskukulinski
Contributor

WFM +1

Perhaps #474 could help in terms of generic munging policy down the road.

@rosskukulinski rosskukulinski added Enhancement/User End-User Enhancement to Velero and removed Enhancement labels Jun 25, 2018
@ncdc ncdc added the Breaking change Impacts backwards compatibility label Jun 27, 2018
@timoreimann
Contributor

timoreimann commented Jul 16, 2018

Just to add one user data point: we just encountered this very issue when running a test recovery with Ark on one of our namespaces, where the omission of the node ports prevented some access patterns we had employed before. In this case, it would have been desirable to default to keeping the specified node ports, or at least to have an option to enforce that behavior.

It's nothing overly critical for us right now, though, as we pin node ports in a single, non-production use case only. However, it's something we'll have to look out for when creating new service specs.

The approach @jbeda describes, leveraging the specified JSON state from the annotation (when available), sounds like a reasonable first step. If that's something we could agree on, I'd be happy to take a stab at a PR. Please let me know.

@rosskukulinski
Contributor

Thanks for volunteering, @timoreimann! @ncdc / @skriss, does Joe's approach work for you? [My suggestion for (2) is to default to removing nodePort and add a CLI flag to enable keeping it.]

@skriss
Contributor

skriss commented Jul 17, 2018

The approach makes sense to me, and I think removing nodePort by default during restore is sensible. I'd still back everything up, which is in line with Ark's existing practice.

@skriss
Contributor

skriss commented Oct 12, 2018

This was fixed by #712; closing.

@skriss skriss closed this as completed Oct 12, 2018
weshayutin pushed a commit to weshayutin/velero that referenced this issue Oct 21, 2024