nomad namespace apply -region=otherregion mynamespace does not propagate to federated region #20128

Closed
benvanstaveren opened this issue Mar 12, 2024 · 5 comments · Fixed by #20196
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/acl theme/docs Documentation issues and enhancements theme/federation theme/namespaces type/bug

Comments

@benvanstaveren

Nomad version

1.6.2

Issue

Running nomad namespace apply -region=otherregion mynamespace results in "mynamespace" showing up in the region of the server the command was run against, instead of the otherregion specified in the command.

Reproduction steps

Federate 2 clusters together, create a namespace in the "other" region.

Expected Result

Namespace to show up in the region specified using the -region flag

Actual Result

Namespace shows up on the region of the nomad server the command was run against

@tgross
Member

tgross commented Mar 14, 2024

Hi @benvanstaveren! Applying a namespace to a single region when regions are federated doesn't work the way you might expect. When two regions are federated, the namespace apply command should be forwarded to the authoritative region, and the namespace will be first written there and then replicated to the other regions. So if you're passing the -region flag, all that should be doing is sending the command to the -region region first, and then forwarding to the authoritative region anyways (so there's no point in doing it, as it just creates more hops).

If that's not what you're seeing, it'd be good to know more about the cluster topology involved here.
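To make the hop count concrete, here is a minimal Go sketch of the routing described above for federated regions with ACLs enabled. The function `forwardingPath` and the region names are hypothetical illustrations, not Nomad's implementation.

```go
package main

import "fmt"

// forwardingPath returns the regions a namespace write visits, in order,
// under the behavior described in this comment. Hypothetical helper, not
// part of Nomad.
func forwardingPath(localRegion, flagRegion, authoritativeRegion string) []string {
	path := []string{localRegion}
	// An explicit -region flag sends the request to that region first.
	if flagRegion != "" && flagRegion != localRegion {
		path = append(path, flagRegion)
	}
	// The write is then forwarded to the authoritative region, which
	// stores it and acts as the source for replication.
	if last := path[len(path)-1]; last != authoritativeRegion {
		path = append(path, authoritativeRegion)
	}
	return path
}

func main() {
	fmt.Println(forwardingPath("east", "", "global"))     // [east global]
	fmt.Println(forwardingPath("east", "west", "global")) // [east west global]
}
```

In this model the `-region` flag only adds an intermediate hop; the write ends up in the authoritative region either way.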

@benvanstaveren
Author

Hey @tgross, that's a little confusing... I don't see any replication taking place. On the authoritative region for instance:

$ nomad namespace list
Name          Description
developers    Developer namespace
database      Database Servers
default       Default shared namespace
ops           Ops namespace

but then if you do this:

$ nomad namespace list -region=other-region
Name     Description
default  Default shared namespace

Which would seem to suggest the other-region has only the default namespace. I tested it with a job, and indeed the other-region doesn't seem to have anything other than default.

Topology wise, we have the authoritative region, with 3 other clusters joined to that. TLS is enabled, but ACL is not enabled.

@tgross
Member

tgross commented Mar 15, 2024

but ACL is not enabled

Oh, well that's the problem then! I think there's probably a documentation gap around this. Namespaces are access control objects, and only access control objects like ACLs, auth methods, etc. are replicated between regions, and that only happens if ACLs are enabled. (See leader.go#L414-L434.)

So when you federated your regions, all you did was allow RPC forwarding between them (which can be useful, no doubt), but there's no "authoritative region" because ACLs are disabled. So when you write an ACL-related object like a namespace, it tries to forward the write to an authoritative region (ref namespace_endpoint.go#L33), but that value is effectively empty and no forwarding happens.
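As a rough illustration of the replication condition described here, the following Go sketch models when a region would pull namespaces from the authoritative region. The `serverConfig` type and `replicatesNamespaces` function are hypothetical, and the check that the authoritative region does not replicate from itself is an assumption rather than something stated in this thread.

```go
package main

import "fmt"

// serverConfig is a hypothetical stand-in for the settings that matter here.
type serverConfig struct {
	Region              string
	AuthoritativeRegion string
	ACLEnabled          bool
}

// replicatesNamespaces reports whether a region would replicate namespaces
// from the authoritative region. Per the thread, this requires ACLs to be
// enabled. Excluding the authoritative region itself is an assumption.
func replicatesNamespaces(c serverConfig) bool {
	return c.ACLEnabled &&
		c.AuthoritativeRegion != "" &&
		c.Region != c.AuthoritativeRegion
}

func main() {
	// The reporter's setup: federated, TLS on, ACLs off.
	noACL := serverConfig{Region: "other-region", ACLEnabled: false}
	fmt.Println(replicatesNamespaces(noACL)) // false

	withACL := serverConfig{Region: "other-region", AuthoritativeRegion: "home", ACLEnabled: true}
	fmt.Println(replicatesNamespaces(withACL)) // true
}
```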

I just took a quick look through a bunch of documentation and I don't see this relationship between federation and namespaces explicitly called out anywhere in the federation tutorial, the nomad server join docs, the authoritative_region config docs, the namespace spec docs, or even the ACL policy docs. So that's one of those big docs gaps that must have been hard for us as Nomad engineers to see because it's "obvious" because we're too close to it. Sorry to hear you stumbled onto this gap, but we'll get that fixed.

We should probably also have a validation step in the Namespace.Apply RPC (and maybe similar RPCs) so that if the region has been set explicitly and doesn't match the current region, and the authoritative region is empty, that we return an error that explains you can't forward these RPCs without ACLs enabled.
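A minimal Go sketch of the validation proposed above, assuming the three region values are available as plain strings. `validateNamespaceWrite` and `errCannotForward` are hypothetical names, and this is not the check that was eventually implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// errCannotForward is a hypothetical error explaining the failure mode.
var errCannotForward = errors.New(
	"cannot forward namespace write to another region: ACLs are not enabled, so there is no authoritative region")

// validateNamespaceWrite rejects a write when the region was set explicitly,
// differs from the current region, and no authoritative region is configured.
func validateNamespaceWrite(requestRegion, localRegion, authoritativeRegion string) error {
	explicitOtherRegion := requestRegion != "" && requestRegion != localRegion
	if explicitOtherRegion && authoritativeRegion == "" {
		return errCannotForward
	}
	return nil
}

func main() {
	// -region=other-region with ACLs disabled: rejected with an explanation.
	fmt.Println(validateNamespaceWrite("other-region", "home", ""))
	// No explicit region: allowed.
	fmt.Println(validateNamespaceWrite("", "home", "") == nil)
}
```

Returning an explicit error here would surface the misconfiguration at write time instead of silently creating the namespace in the local region.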

I've marked this issue for roadmapping.

@tgross tgross added stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/acl theme/docs Documentation issues and enhancements theme/federation labels Mar 15, 2024
@tgross tgross removed their assignment Mar 15, 2024
@benvanstaveren
Author

So that's one of those big docs gaps that must have been hard for us as Nomad engineers to see because it's "obvious" because we're too close to it. Sorry to hear you stumbled onto this gap, but we'll get that fixed.

Hah! I know all about that, I fall into the same trap with my own projects way too often 😅 Anyway, at least if it's in the docs that'll be nice. I guess I can cheat the system by just creating the namespace on the cluster directly. Cheers!

@tgross tgross self-assigned this Mar 15, 2024
tgross added a commit that referenced this issue Mar 22, 2024
Our documentation has a hidden assumption that users know that federation
replication requires ACLs to be enabled and bootstrapped. Add notes at some of
the places users are likely to look for it.

A separate follow-up PR to the federation tutorial should point to the ACL
multi-region tutorial as well.

Fixes: #20128
@tgross
Member

tgross commented Mar 22, 2024

Docs PR is at #20196. I've pulled a discussion about the RPC's behavior out to #20197.

tgross added a commit that referenced this issue Mar 22, 2024
… into release/1.5.x (#20201)

Fixes: #20128

Co-authored-by: Tim Gross <[email protected]>
tgross added a commit that referenced this issue Mar 22, 2024
… into release/1.6.x (#20202)

Fixes: #20128

Co-authored-by: Tim Gross <[email protected]>
tgross added a commit that referenced this issue Mar 25, 2024
Although it's not recommended, it's possible to federate regions without ACLs
enabled. In this case, ACL-related objects such as namespaces and node pools can
be written independently in each region and won't be replicated. If you use
commands like `namespace apply` or `node pool delete`, the RPC is supposed to be
forwarded to the authoritative region. But when ACLs are disabled, there is no
authoritative region and so the RPC will always be applied to the local region
even if the `-region` flag is passed.

Remove the change to the RPC region for the namespace and node pool write RPCs
whenever ACLs are disabled, so that forwarding works.

Fixes: #20197
Ref: #20128
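
The behavior this commit message describes can be sketched in Go roughly as follows. `writeRegion` is a hypothetical function that models only the region selection as described in the message; it is not the code from the commit.

```go
package main

import "fmt"

// writeRegion picks the region that handles a namespace or node pool write.
// Hypothetical model of the behavior described in the commit message.
func writeRegion(requestRegion, localRegion, authoritativeRegion string, aclEnabled bool) string {
	if aclEnabled {
		// With ACLs enabled, writes go to the authoritative region and
		// are replicated from there.
		return authoritativeRegion
	}
	// With ACLs disabled, the requested region is left alone so that the
	// -region flag is honored by normal RPC forwarding.
	if requestRegion != "" {
		return requestRegion
	}
	return localRegion
}

func main() {
	// ACLs disabled: the -region flag decides where the write lands.
	fmt.Println(writeRegion("other-region", "home", "", false)) // other-region
	// ACLs enabled: the authoritative region always handles the write.
	fmt.Println(writeRegion("other-region", "home", "home", true)) // home
}
```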
tgross added a commit that referenced this issue Mar 26, 2024
…#20220)

Fixes: #20197
Ref: #20128
philrenaud pushed a commit that referenced this issue Apr 18, 2024
Fixes: #20128
philrenaud pushed a commit that referenced this issue Apr 18, 2024
…#20220)

Fixes: #20197
Ref: #20128