
Networking: Bootstrap address needed in switch zone #2298

Closed
andrewjstone opened this issue Feb 1, 2023 · 8 comments
Assignees: andrewjstone
Labels: networking (Related to the networking.)

@andrewjstone (Contributor)

For both the wicketd artifact server (used to serve recovery images) and (potentially) MGS, there needs to be a bootstrap address to listen on. This needs to be established by the bootstrap agent in a few steps (a rough sketch of the commands involved follows the list):

  1. Configure an etherstub for the bootstrap network in the switch zone
  2. Set up a VNIC in the switch zone over the etherstub
  3. Assign an address using the bootstrap prefix to the VNIC in the switch zone. This address should probably end in ::2, since the global zone has the ::1 suffix on the bootstrap network.
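
As a rough, non-authoritative sketch of what those three steps could look like if the bootstrap agent shells out to `dladm` and `ipadm` (the link name, address object, and prefix below are placeholders, not the real configuration):

```rust
use std::process::Command;

// Hypothetical helper: run a command and fail loudly if it does not succeed.
fn run(argv: &[&str]) -> std::io::Result<()> {
    let status = Command::new(argv[0]).args(&argv[1..]).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("command failed: {argv:?}"),
        ))
    }
}

fn main() -> std::io::Result<()> {
    // 1. Etherstub for the bootstrap network.
    run(&["dladm", "create-etherstub", "bootstrap_stub0"])?;
    // 2. VNIC over the etherstub for the switch zone.
    run(&["dladm", "create-vnic", "-l", "bootstrap_stub0", "bootstrap_vnic0"])?;
    // 3. Static bootstrap address ending in ::2 on that VNIC; the prefix
    //    here is a placeholder, not the real bootstrap prefix.
    run(&[
        "ipadm", "create-addr", "-T", "static",
        "-a", "fdb0:1122:3344:101::2/64", "bootstrap_vnic0/bootstrap6",
    ])?;
    Ok(())
}
```

Note that the dladm/ipadm invocations themselves don't encode which zone they run in or where the VNIC ends up; that question comes up in the comments below.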
andrewjstone added the networking label Feb 1, 2023
andrewjstone added this to the Manufacturing PVT1 milestone Feb 1, 2023
andrewjstone self-assigned this Feb 1, 2023
@bnaecker (Collaborator) commented Feb 1, 2023

I could be wrong, but I think the etherstub in (1) should be in the GZ, with the VNIC for wicketd in the zone. Also, (1) is tracked in #2297 for reference.

@rmustacc commented Feb 1, 2023

@bnaecker is correct. The etherstub should be in the GZ, not in a zone.

@rcgoodfellow (Contributor) commented Feb 1, 2023

I have some high-level concerns here.

  1. Services that straddle switch ports and gimlet ports.
  2. Putting non-networking services in the switch zone.

Both concerns are rooted in the idea that the switch zone is an abstraction that defines a boundary between what part of the compute sled is the switch and what part is the gimlet. When we start creating services that communicate over both the tfports in the switch zone and the gimlet ports in the global zone, we break that abstraction.

If we continue to think of the switch zone as a switch: would we normally put services like wicketd and mgs on a switch? Probably not. We'd probably connect some sort of computer to the switch and put them there. And we can do that with a bit of virtual network plumbing (sketched after this comment).

[diagram: current service placement (left) vs. a split with a separate management network services zone (right)]

The diagram on the left shows what we are currently doing, and the diagram on the right shows how we could break things up a bit. Network services in blue, other services in green.

There is a tension here between wanting separation of concerns and the proliferation of virtual network interfaces and zones. On the one hand it's nice to have separation across zones; on the other hand we don't want to stack VNICs, etherstubs, and zones to the moon. And from a certain perspective, just being able to bring the services to the closest network attachment point has appeal.
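
For concreteness, a hedged sketch of the kind of plumbing the right-hand diagram implies: a shared etherstub in the global zone with one VNIC delegated to each zone. All names here (mgmt_stub0, switch_vnic0, mgmtsvc_vnic0, mgmt_services) are hypothetical, and the zonecfg step is only illustrative.

```rust
use std::process::Command;

// Hypothetical link and zone names; error handling kept minimal.
fn run(argv: &[&str]) {
    let ok = Command::new(argv[0])
        .args(&argv[1..])
        .status()
        .map(|s| s.success())
        .unwrap_or(false);
    assert!(ok, "command failed: {argv:?}");
}

fn main() {
    // Shared etherstub in the global zone.
    run(&["dladm", "create-etherstub", "mgmt_stub0"]);
    // One VNIC per zone over the shared etherstub.
    run(&["dladm", "create-vnic", "-l", "mgmt_stub0", "switch_vnic0"]);
    run(&["dladm", "create-vnic", "-l", "mgmt_stub0", "mgmtsvc_vnic0"]);
    // Each VNIC would then be delegated to its (exclusive-IP) zone, e.g.:
    run(&["zonecfg", "-z", "mgmt_services",
          "add net; set physical=mgmtsvc_vnic0; end"]);
}
```

Whether the extra zone is worth the additional plumbing is exactly the tension described above.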

@andrewjstone (Contributor, Author)

@rcgoodfellow I think your proposal here would work. I totally get your perspective on the switch zone abstraction, which is definitely something I hadn't thought about before.

Presumably, this allows traffic from the technician port to flow into the management network services zone. Does that mean that we can run the login shell for the technician port (wicket) in the management network services zone also? Or does wicket have to run in the switch zone? CC @rmustacc

@ahl (Contributor) commented Feb 1, 2023

@andrewjstone to your point, I imagine that we've got sshd in the "management network services zone" for both the operator user -> wicket and service user -> bash (in one possible outcome of RFD 354).

@rcgoodfellow this proposal would seem to add complexity (and architectural rigor) at a time when we may need to optimize for urgency--though I confess to not having a strong handle on the relative amount of work associated with each approach.

@jgallagher (Contributor) commented Feb 1, 2023

Primarily just for clarity, these are MGS's requirements on the networking system / organization:

  1. It needs access to all of the management network vlan devices (so it can communicate with the SPs, with the scope IDs on those devices being how it distinguishes incoming packets; see the sketch after this comment)
  2. It has to be able to listen on the underlay network once it exists (to allow nexus to talk to it)
  3. It has to be reachable from wicketd before the underlay network exists. Today we plan on doing this by having MGS listen on localhost, since it and wicketd are in the same zone.

I don't think any of these are violated by either of the proposals above, although I'm less clear about the management network vlan devices, since dendrite is creating them. AFAIK MGS does not and should not listen on the bootstrap network.
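
To make requirement (1) concrete, here is a minimal, non-authoritative sketch (plain std, not MGS code) of using the sender's IPv6 scope ID to tell which interface a packet arrived on; the port is a placeholder:

```rust
use std::net::{SocketAddr, UdpSocket};

fn main() -> std::io::Result<()> {
    // Listen on the IPv6 wildcard address (placeholder port).
    let sock = UdpSocket::bind("[::]:11111")?;
    let mut buf = [0u8; 1024];
    loop {
        let (len, peer) = sock.recv_from(&mut buf)?;
        if let SocketAddr::V6(v6) = peer {
            // For link-local senders, scope_id() is the index of the
            // interface (e.g. a management network vlan device) the
            // packet came in on.
            println!("{} bytes from {} via ifindex {}", len, v6.ip(), v6.scope_id());
        }
    }
}
```

In practice MGS's actual binding and configuration would differ; this only illustrates the scope-ID mechanism the requirement refers to.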

@rmustacc commented Feb 1, 2023

Thanks for raising this @rcgoodfellow. I think there are a couple of different aspects that we should consider and tease apart.

First, I think the question of what the switch zone represents is an interesting one. When I proposed this as a thing to exist to run Dendrite and somewhat forced @Nieuwejaar to deal with it, it wasn't because of any notion of which part is the compute server and which part is the switch. More fundamentally, I did it because I wanted isolation from the global zone. That said, if I actually had a full Gimlet inside the switch, would I put wicketd or mgs on that server, inside their own disjoint, isolated zones? Yes.

Another part of this is that while we have management network traffic coming over the PCIe interface, strictly speaking we didn't have to design it this way, and that was a source of conversation between Arjen and me. For example, you could imagine a totally different design (not saying we should do this) where we forward that traffic to some other component over the standard 100 GbE network, encapsulated with Geneve. Anyway, I mostly point this out because a lot of this looking like a switch, and being specific to one, is an incidental design artifact and not necessarily intentional.

Now, if we look at this from the general position of the colocation of disjoint services, in our ideal world, all of the following would probably be operating in different zones that were locked down in different ways:

  • dpd / tfportd
  • lldpd (theoretical)
  • The internal maghemite instance
  • Each distinct bgp instance from maghemite (especially when we have VRFs)
  • wicketd
  • installinator's image server (sharing a file system with wicketd)
  • mgs
  • ndpd
  • Some other daemon that we haven't thought of

So in an ideal world these would all be disjoint, with some subset of them potentially sharing partial netstacks, interfaces, or ways of getting at them. We know that we won't want to colocate BGP instances running in different VRFs, just so we can keep things consistent and not have to overthink the netstack.

I mostly agree with @ahl here that, for the time being, we should stay the course with what we have. It's not the best split, but it will work. I think we can help with some of the colocation concerns by further locking down privileges and related mechanisms. Some of the tensions you laid out are just fundamental to what we're building, but I'd also say we can do better, more fine-grained isolation in the future.

@andrewjstone (Contributor, Author)

Implemented in #2320
