[sled-agent] Allocate VNICs over etherstubs, fix inter-zone routing #1066
This ended up being a major aspect of this patch - without it, I could ping all `/64` addresses between GZ / non-GZ zones, but not the DNS addresses.

However, by opening it up to the AZ prefix, I can also communicate between arbitrary "sled-local" services and the internal-dns server, which resides outside the sled's `/64`.
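
To make the prefix relationship concrete, here is a minimal sketch using Python's standard `ipaddress` module. The AZ `/48` and the Nexus address are quoted elsewhere in this thread; the sled `/64` is inferred from those addresses, and the DNS address is a hypothetical placeholder chosen only so that it sits in a different `/64` of the same AZ.

```python
import ipaddress

# AZ-wide prefix, as quoted in this thread.
AZ_PREFIX = ipaddress.ip_network("fd00:1122:3344::/48")
# Sled prefix: an assumption inferred from the fd00:1122:3344:101::N addresses.
SLED_PREFIX = ipaddress.ip_network("fd00:1122:3344:101::/64")

# Address of a sled-local service zone (Nexus), as quoted in this thread.
nexus_addr = ipaddress.ip_address("fd00:1122:3344:101::3")
# HYPOTHETICAL internal-dns address, placed in a different /64 of the same AZ.
dns_addr = ipaddress.ip_address("fd00:1122:3344:1::1")

print(nexus_addr in SLED_PREFIX)  # True: sled-local traffic matches the /64
print(dns_addr in SLED_PREFIX)    # False: DNS is outside the sled's /64
print(dns_addr in AZ_PREFIX)      # True: the /48 covers both
```

A route scoped to the `/64` therefore never matches the DNS address, while one scoped to the `/48` matches both.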
I can see how we need that, but I'm not entirely sure it's how we want to solve the problem of routing to the DNS server. IIUC, you're saying that the sled agent's VNICs for Oxide services (sled agent, nexus, propolis, etc) are now things like `fd00:1122:3344:101:/48`. What does the sled's /64 prefix mean in this setup? I think @rcgoodfellow or @rmustacc should probably weigh in here, since it seems to me to kinda be skirting the real meaning of that prefix.

I think one option would be to add a separate route which specifies the DNS server's address / prefix. I believe DDM will ultimately be manipulating the OS's routing tables so that's actually true. But that may not be enough, in that traffic from the VNIC also needs a route pointing it to an interface for the DNS address. I don't know enough to be sure here.
For context, this is what my global zone looks like:
Meanwhile, in Nexus (non-global zone):
This `/48` specifically alters the routing within the non-global zone - `netstat -rn -f inet6` in Nexus shows the following:

Having all traffic destined for the AZ routed through the interface is the piece I really care about here.
Do you think it would be preferable to:
/64
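
For readers following along, the effect of a `/48` interface route can be sketched as a longest-prefix-match lookup. This is an illustration only, not the zone's real routing table: the table contents are assumptions, the interface name `oxControlService1` is taken from elsewhere in this thread, and the default route is included purely for contrast.

```python
import ipaddress

# ILLUSTRATIVE routing table for the Nexus zone: (destination prefix, next hop).
ROUTES = [
    (ipaddress.ip_network("::/0"), "default gateway"),
    (ipaddress.ip_network("fd00:1122:3344::/48"), "interface oxControlService1"),
]

def lookup(dest, routes=ROUTES):
    """Return the next hop of the most specific route that contains `dest`."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    # Longest prefix wins, so the /48 beats the default route (::/0).
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Any address inside the AZ prefix goes out the VNIC, on-sled or not.
print(lookup("fd00:1122:3344:101::3"))
# Anything else falls through to the default route.
print(lookup("2001:db8::1"))
```

With this table, every destination in the AZ `/48` is sent out the zone's VNIC, which is the behavior described above.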
Thanks for those details. I think that is what I expected, that we'd have a route entry that says "anything in `fd00:1122:3344::/48` should go out the VNIC `oxControlService1`". The part I'm wondering about is, that would imply that the netstack would expect that it could use that interface for any traffic from the Nexus zone for any other sled. As I write this, I realize that may be fine. If Nexus is trying to reach another service on the same sled, that packet will go out the VNIC, to the etherstub, and then presumably to the other zone's VNIC. If it's trying to reach something off the sled, I'm less sure of what'll happen there. It looks like it'll still go to the zone VNIC, the etherstub, and then to whatever route you have in the GZ that matches that (if one exists).

I think this is probably fine. It also seems to be working for a single machine, and it's easy enough to update this if we find it doesn't work for multiple machines.
I've verified that this method works too - I'm seeing routing between zones by using `route add -inet6 <address>/48 <address> -interface` when setting up a non-GZ address.
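
As a rough sketch of how that command line could be assembled and validated before being run, the helper below builds the argument vector without executing anything. The function name `route_add_argv` is hypothetical and is not part of the sled agent; the argument order simply mirrors the command quoted above, and the example values are addresses that appear elsewhere in this thread.

```python
import ipaddress

def route_add_argv(prefix, local_addr):
    """Build argv for `route add -inet6 <prefix> <local_addr> -interface`.

    HYPOTHETICAL helper for illustration; it only validates and formats.
    """
    # ip_network() raises ValueError if host bits are set in the prefix.
    net = ipaddress.ip_network(prefix)
    addr = ipaddress.ip_address(local_addr)
    if net.version != 6 or addr.version != 6:
        raise ValueError("expected an IPv6 prefix and an IPv6 address")
    return ["route", "add", "-inet6", str(net), str(addr), "-interface"]

argv = route_add_argv("fd00:1122:3344::/48", "fd00:1122:3344:101::3")
print(" ".join(argv))
```

Validating the prefix up front catches mistakes such as passing a host address with a `/48` suffix before anything touches the routing table.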
Isn't the point of the `default` route that "if the destination address doesn't match the other rules, it should use this gateway"?

I tried your suggestion, but this doesn't seem to be working for me:
Chatting out-of-band with @bnaecker a bit: By issuing the following in the GZ:
I'm seeing the routing make the extra hop, from NGZ -> GZ (and now) -> NGZ
That's true, I explained that very poorly. I was trying to point out that this command:
isn't what I'd expect. In particular, that says for any traffic without a more specific route, send it to the gateway `fd00:1122:3344:101::1`. But that's not a gateway that the nexus zone has! The `netstat -rn` output shows the gateway we need as `fd00:1122:3344:101::3`.

But in any case, Robert pointed out that these routing tables are necessary but not sufficient to get this all to work. Specifically, we need to tell the GZ to actually act as a router, forwarding packets between different networks. That is, we've provided rules (assuming we can figure out how to express them 😆 ) for the routing daemon to use when forwarding packets, but it'll only do so if it's explicitly told it should.
I believe this can be accomplished with the command `routeadm -e ipv6-forwarding -u`, which enables route forwarding and restarts the SMF service(s) necessary to make that apply to the running system. IIUC, at that point, when the GZ networking stack receives a packet from the nexus zone, with an IP address of the (non-global) DNS zone, it'll attempt to forward that, by consulting the routing table.

I'm hypothesizing, but it seems like we need two routes then:
The former could be a default route, or a more constrained one listing the prefix for the DNS server. It seems like either should work, as long as the gateway is the IP address of the VNIC in the nexus zone, in this case `fd00:1122:3344:101::3`.

The latter can be accomplished by adding a route table entry that directs all the DNS traffic to the GZ's VNIC, I think. My understanding is that this would go onto the GZ VNIC, to the etherstub, and then be forwarded to the non-global DNS zone VNIC.
I was initially confused as to why the "virtual switch" that `man dladm` describes under the `create-etherstub` command doesn't transparently do this. All the traffic is within that etherstub, and I'd have expected neighbor discovery and thus routing to be done automatically. So why do we need this?

The key is that the DNS addresses are in a different subnet. The etherstub will transparently create routes between all the other non-global zones, but once you're trying to reach an address in a different subnet, that has to involve routing. This explains why the `-interface` flag worked initially, too. That's effectively telling the etherstub that the other subnet can actually be routed to through the same L2 domain, even though it's on a different L3 subnet.

Robert pointed out that we may actually want a separate etherstub for the DNS zone. That'd more closely model the actual network we're emulating. In particular, we're trying to say that the GZ and all the non-DNS service zones are one little subnet, in the sled's /64. The DNS service is explicitly in a separate /64, for route summarization and the fact that it really is supposed to be a rack-wide or AZ-wide service.
To be clear, we should not add an additional etherstub in this PR. I think that's where we want to go longer-term, but we can defer it for sure.
So, summarizing everything: when nexus wants to send a packet to the DNS server, that'll first go to the etherstub. The etherstub will not explicitly have a gateway for that, since it's in another subnet. It'll deliver it to the GZ. At that point, the IP stack in the GZ will take the packet and also note that it doesn't have that address. It'll instead consult the routing tables (assuming forwarding is enabled), and note that it can send that...back to the etherstub! That'll then go to the DNS zone.
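
The path summarized above can be modeled as a toy simulation. This is a sketch under stated assumptions rather than a description of the real network stack: the sled `/64` is inferred from addresses in this thread, the DNS subnet and address are hypothetical placeholders, the Nexus zone is assumed to already have a route handing off-subnet traffic to the GZ, and `path_from_nexus` is an invented name.

```python
import ipaddress

# Sled subnet: an assumption inferred from addresses quoted in this thread.
SLED_SUBNET = ipaddress.ip_network("fd00:1122:3344:101::/64")
# HYPOTHETICAL DNS subnet and address, in a separate /64.
DNS_SUBNET = ipaddress.ip_network("fd00:1122:3344:1::/64")
DNS_ADDR = ipaddress.ip_address("fd00:1122:3344:1::1")

def path_from_nexus(dest, gz_forwarding, gz_routes):
    """Return the hops a packet from the Nexus zone takes, or None if dropped."""
    hops = ["nexus", "etherstub"]
    if dest in SLED_SUBNET:
        # Same L3 subnet: neighbor discovery over the etherstub is enough.
        return hops + ["destination zone"]
    # Different subnet: the packet is handed to the GZ acting as a router.
    hops.append("GZ")
    if not gz_forwarding:
        return None  # the GZ will not forward unless explicitly told to
    if not any(dest in net for net in gz_routes):
        return None  # forwarding is on, but no route matches
    # The GZ sends the packet back onto the etherstub toward the target zone.
    return hops + ["etherstub", "destination zone"]

print(path_from_nexus(DNS_ADDR, gz_forwarding=True, gz_routes=[DNS_SUBNET]))
print(path_from_nexus(DNS_ADDR, gz_forwarding=False, gz_routes=[DNS_SUBNET]))
```

The second call returns `None`, which mirrors the point that routing tables alone are necessary but not sufficient: without forwarding enabled in the GZ, the packet goes no further.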
I believe the following will work, which summarizes the above conversation in part, and also makes a few simplifications.
I've tested this out on a fresh VM by creating the zones and doing all the plumbing and things appear to work. Here is what the setup looks like live. It does still require `routeadm -e ipv6-forwarding -u` in the GZ.

GZ
Ping the Omicron zone
Ping the DNS zone
DNS Zone
Ping the Omicron zone
Omicron Zone
Ping the DNS zone
As of 44dc885, I am automatically adding these routes within the Sled Agent, and can confirm connectivity between all zones / GZ.