[BUG] VLANs don't work on IOL/IOLL2 #1381
Comments
This is a more serious bug. Symlinking iosvl2 results in the configuration being deployed successfully, but the interfaces come up with "no switchport". Still at work, will look into it later. |
Symlinking initial/iosvl2.vlan.j2 into initial/ioll2.vlan.j2 and vlan/iosvl2.j2 into vlan/ioll2.j2 resulted in a working 01-vlan-bridge-simple.yml test. Will run the full set of VLAN integration tests once the BGP plugin ones finish. IOL is a different story. It does not have the VLAN database, but it also does not work with the IOS bridging configuration. You can't even configure the IEEE STP (which is a huge red flag). I would suggest we declare VLANs unsupported on IOL unless you really want to figure out how to make it work ;) |
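For anyone who wants to reproduce the symlink step, here is a minimal sketch. The two link/target pairs come from the comment above; the `root` argument (the directory that holds the `initial/` and `vlan/` template folders) and the `link_templates` helper are assumptions made for illustration, not part of netlab.

```python
import os
import tempfile
from pathlib import Path

# (link name, existing template) pairs taken from the comment above.
LINKS = [
    ("initial/ioll2.vlan.j2", "initial/iosvl2.vlan.j2"),
    ("vlan/ioll2.j2", "vlan/iosvl2.j2"),
]


def link_templates(root: Path) -> list:
    """Create relative symlinks so ioll2 reuses the iosvl2 templates.

    `root` is a hypothetical templates directory; adjust it to your checkout.
    """
    created = []
    for link_name, target in LINKS:
        link = root / link_name
        source = root / target
        if not source.exists():
            raise FileNotFoundError(source)
        if link.is_symlink() or link.exists():
            continue  # leave existing files or links alone
        # Link and target share a directory, so the bare file name is enough.
        os.symlink(source.name, link)
        created.append(link)
    return created
```

Calling `link_templates(Path("path/to/templates"))` a second time is a no-op, so it is safe to rerun.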
Indeed, it does work. And no, I do not have an immediate itch to figure this out. I'd rather spend the time I have learning more about netlab internals and exploring the test suite. I learned a lot these past days, and your comments were very useful, but there is much more left. |
So, I ran the VLAN integration tests for IOLL2 and all the more complex ones failed. The results are here: https://tests.netlab.tools/_html/ioll2-clab-vlan Unfortunately, there's not much one can do to validate the VLAN setups apart from end-to-end pings, so the errors are not particularly enlightening. If you want to fix stuff, it's best if you spin up one of the failing scenarios, figure out what's wrong, fix the config, and repeat. |
I created the |
I think I found the root cause: all IOLL2 instances have the same base MAC address (STP system ID), so the trunk ports go into blocking because the switches think they hear themselves. No idea how to change that on IOLL2 :( |
How the heck did you figure that out? Anyway, I will look into options. It might be possible to change it at image startup. There are NETMAP IOL startup file options to dig into, or env vars. I'll ask the containerlab guy who did the IOL integration if he knows the full NETMAP format. |
Apparently VIRL can do it: https://learningnetwork.cisco.com/s/question/0D53i00000KszBMCAZ/change-switch-base-mac-in-virl-and-remove-management-ports-from-stp-evaluation Otherwise, we could start with supporting at most 1 node per topology |
The trunk port was not in the list of active VLAN ports, so I started investigating. It was blocking, so STP was the culprit. STP claimed the device is the root bridge, so I started looking at STP details and found that both devices use the same system ID.
It's definitely possible (or VIRL wouldn't be able to do it), but I couldn't figure out how. Anyway, looking at GNS3 code, it looks like IOL can take node ID, and the GNS3 code has "512 + id" in https://github.com/GNS3/gns3-server/blob/225779bc11a0d5a5af6aeb2c9a7642639cf3da06/gns3server/compute/iou/iou_vm.py#L776, and there's hard-coded 513 in https://github.com/hellt/vrnetlab/blob/master/cisco/iol/docker/entrypoint.sh#L14 so... 🤔 |
@DanPartelly: I would start with a very strong caveat saying "bridge domains don't work on IOL, so we disabled VLANs, and all IOL-L2 nodes use the same System ID, so you can have only one IOL-L2 node in the bridging domain". I can also add the same caveat to integration tests. Without that, the current state of IOL-L2 is a release show-stopper. We can't release a broken functionality that is not described in caveats. |
@ipspace Hey, I did the integration for IOL in Containerlab, and I'm currently working on a fix for this. I discussed this with @DanPartelly in the Containerlab Discord. To sum it up, the system base MAC is set by the PID which the IOL binary launches as. You have to set a PID when executing the IOL binary, and the entrypoint script for the container statically sets the PID to 1. NETMAP uses the PID to bind the IOL process's interfaces to UDP ports, then IOUYAP binds the UDP ports to the Linux container interfaces (eth0, eth1 etc.). It should be easy enough to signal a PID to the entrypoint script when launching the container in containerlab; the problem is just making sure each IOL node has a unique PID that persists across reboots. VIRL/CML launches IOL in LXCs and has some mechanism to increment the PID that IOL launches with to make sure there are no overlaps between the nodes. |
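To make the "unique PID that persists across reboots" requirement concrete, here is a rough sketch of one possible allocation scheme. It is not what containerlab does: `allocate_iol_pids` and `bridge_id` are hypothetical helpers, and the only values taken from this thread are the static PID 1, the "512 + id" expression from the GNS3 code, and the hard-coded 513 in the vrnetlab entrypoint.

```python
# Offset seen in the GNS3 code ("512 + id"); with PID 1 it yields the
# hard-coded 513 found in the vrnetlab entrypoint script.
BRIDGE_ID_OFFSET = 512


def allocate_iol_pids(node_names, first_pid=1):
    """Assign each node a unique PID derived from its sorted name.

    Hypothetical scheme: the same set of node names always produces the
    same mapping, so the PIDs survive a restart of the lab.
    """
    unique = sorted(set(node_names))
    return {name: first_pid + index for index, name in enumerate(unique)}


def bridge_id(pid):
    """Return the NETMAP-side ID paired with an IOL PID (assumed 512 + PID)."""
    return BRIDGE_ID_OFFSET + pid


pids = allocate_iol_pids(["s2", "s1", "r1"])
print(pids)
print({name: bridge_id(pid) for name, pid in pids.items()})
```

The trade-off of a name-sorted scheme is that adding or renaming a node can shift the PIDs (and therefore the base MACs) of the nodes that sort after it.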
FYI, @ipspace Big fan of your blog and your work. I see in a recent commit that the docs have been edited to say Catalyst 8000v doesn't support MPLS. Maybe you are already aware of this but you just have to upgrade the boot license to 'advantage' or 'premier' for MPLS/SRv6 support. vrnetlab already does this with
Since we initially boot the node in the container build process, the license is applied in the bootstrap config. Then when the node is booted in a containerlab topology this license will have been applied on boot. |
I totally agree. Documentation was always a first-class citizen in networklab; few tools are so well documented. When do you want to release the next version? If it is not right around the corner, maybe we can give it a few days. The weekend is a day away and we can work on it then. I will keep in touch with @kaelemc on this issue, with his permission.
|
I've submitted the PRs which fix this. Even in the CML the base of the MAC is |
That was fast, thanks a million. @DanPartelly: I would suggest we still add that caveat explaining what's going on (so we can push out a new release at any time), and once the new containerlab version comes out, I run the integration tests, change the containerlab release in the installation script, and we revise the caveats. OK? |
No rush, we don't have any major feature to push out (but have accumulated enough stuff so I'm not comfortable with a -post1 release), I just like to have my Ts crossed ;) |
Thank you!
Thanks a million, will add to the initial configuration script (in case someone is running a Cat8K VM) and run the tests. |
@ipspace No problem. Netlab looks really cool and could be of some use to me. I'm currently a heavy user of IOS-XR, but XRv runs too old a software version (6.x) and XRv9k is, well, too heavy. I'm curious, how much effort do you think it would be for me to integrate XRd support into netlab? I would say XRd is almost on par with the containerised IOL: fast boot, instant commits and 90% feature parity with the full-fat XR VMs. I assume it's not that much work as XR support is somewhat existent with XRv/9k? Maybe just adding the relevant provider 'stuff'? (Sorry, not too familiar with the project code.) |
I think it's working: https://netlab.tools/platforms/#supported-virtualization-providers I never tried it myself, but someone submitted XRv patches and claimed it was running for him. |
I'm saying this should already be supported. It uses |
Awesome, thanks. I'll give it a shot 😊. Sorry for clouding this issue with XR stuff. |
Yes, we should add the caveats.
I think point 2 should be documented as a caveat too. I'll run more tests this evening with the more complex netlab VLAN topologies. I've run a simple test and I have RSTP up.
|
@ipspace I've run almost the whole VLAN test battery. The first 5 tests (prefixes 01 to 23) all succeed. As soon as trunking was involved, starting with test prefix 31, everything went south and nothing worked anymore. If anyone has any quick ideas, I'm all ears. |
Yes, the moment you add the second IOLL2 node the "duplicate STP system ID" kicks in. We have to wait for the vrnetlab/containerlab fixes. |
Of course we should document it (give me a day or so), but this just makes it more like real life where you never know who the root bridge will be after you add a node to the network (unless you set bridge priorities). Nonetheless, if you don't rename IOLL2 nodes, their relative order will not change, and the node with the highest MAC address will stay the same. |
Both of them are using the new PR branches. I have different STP IDs on all nodes now. Ports do go through learning and end up in forwarding state in 31-xxxx_xxx, where I spent some time. |
Oh, so it's worse than I thought. No further ideas at the moment, will wait for the new releases. I could rebuild the IOL container, but would setting the environment variable for the container be enough? Looking at the containerlab code, it seems it's doing more than that. |
In theory yes, you could do that and pass a unique PID to the image. But you have more important things to do probably.
|
Yeah, containerlab generates the NETMAP and IOUYAP files. NETMAP needs to know the PID of the IOL container so that it can do its IOL->Container interface binding magic (with IOUYAP). Manually changing the PID will not get you connectivity into IOL and ports won't work. You can always use the gh actions build artifacts. Containerlab artifact download |
The baseline settings and caveats are in #1390. We should merge that one to stop 'netlab initial' crashes and to disable VLANs on IOL. |
I merged that one the day you put it up for review. As a side note, the only way I could get spanning tree to work on IOLL2 was in MSTP mode, with the changes to containerlab and image generation that give the nodes different bridge IDs. But there is still no connectivity in advanced labs. In all other modes, BPDUs are sent by interface E0/1 (s1-s2) but not received, according to show spanning-tree detail. I then stopped looking into it, as I wanted to do netlab exec.
|
When you decide that's good enough, please add the necessary configuration commands to the VLAN configuration module (so we'll have a working config), document the caveat, and submit a PR.
If we can't get a two-switch network with a single trunk to work we might as well call it a day and disable VLANs for IOLL2 (or drop IOLL2 support -- what good is it without VLANs?) |
At this stage, I'm tempted to drop it altogether, at least for the foreseeable future. I'll stash all IOLL2 changes in a backup branch until at least all VLAN tests run. Then we can put it back. Are you OK with this move? About IOL L3, are there integration tests that still need to run?
|
As we have IOLL2 mentioned in so many places, I'd just write a caveat. It would also be evident from the integration tests that things don't work. Maybe it will trigger someone to chime in ;) OK? |
Ok sure. It will be done. |
So, I've been going about this the wrong way. Today, after capturing packets from 31-vlan-bridge-trunk again, what I saw did not make much sense. So I decided to replicate the config in GNS3. I dumped the device configs from netlab, copy-pasted them, and lo and behold, everything works. The key difference is that the utility used to tunnel IOU UDP in GNS3 is a new one, called ubridge. This result is something tractable we can follow up on. It points towards a possible bug in the utility used in the container. Finally, light. |
@DanPartelly I saw uBridge but didn't give any thought after seeing it supported plenty of other non-IOL relevant things, my take was to keep the IOL container somewhat lean in a sense, and considering IOL just 'worked' in my limited testing I didn't need to pursue uBridge. I tried to make some modifications to iouyap using the source but I couldn't even get it to build after mucking around with it for a while, decided it's not worth my time and just to live with whatever issues we have. It's also the reason we use the prepackaged iouyap via apt, couldn't get it to build. We could give uBridge a shot and see if you get the desired behaviour 🙂. |
Thanks a million for figuring this out. I added an IOL L2 caveat warning users that multi-node topologies won't work, and will remove it whenever we manage to solve this. |
There seems to be an Alpine ubridge package (https://pkgs.alpinelinux.org/package/edge/community/x86/ubridge) and an RPM (https://rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/u/ubridge-0.9.18-13.fc41.x86_64.html) so maybe it's as simple as installing a different package and creating a more complex config file? I know I'm kibitzing ;) |
There is a package for ubridge in the GNS3 PPA used by the Dockerfile to install iouyap. So yeah, fingers crossed: we install it, generate a different config on the containerlab side, and test.
|
@ipspace Unfortunately, crossing fingers did not work out ;) It seems uBridge at this time can't create an IOL-type bridge from an ini file, and you need special logic to multiplex/demultiplex IOL packets. In GNS3 they control the creation of bridges using telnet. That is, the bridge utility opens a local port, and you connect to this port with telnet and send instructions on what type of bridge to create and what its members are. So that's that. |
Another hour of debugging. Some light struck. An update: when you capture traffic using an AF_PACKET raw socket in Linux, it appears that VLAN tags are always stripped and stored away in a data structure. The tag is later made available to the raw socket (if the relevant setsockopt() options are set) in a special control message. This is by design [1]. I looked into the libpcap source code and indeed, they are using cmsg() to rebuild the VLAN data. This behavior explains all the oddities I have seen with Cisco L2 IOL devices (ports put into a broken state in tests which had a native VLAN over a trunk; pv-rstp not negotiating over trunks which have no native VLAN; no multicasts reaching the switch interface, as reported by Cisco IOS when using a non-native VLAN trunk, while Wireshark shows them coming in on the veth interface; symmetric behavior on the rx path). I will try to confirm this by dumping the raw socket data. But even at this step, I feel pretty strongly that we will need to fix
Chime in with your opinions ! |
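For readers who have not run into this before, the tag rebuilding described above can be sketched roughly as follows. This is not the iouyap or libpcap code, just an illustration of the idea under stated assumptions: the layout follows `struct tpacket_auxdata` and the `TP_STATUS_*` flags from `<linux/if_packet.h>`, the numeric socket constants are copied from the Linux headers, and the helper names `reinsert_vlan_tag` and `recv_frame` are made up for the example.

```python
import socket
import struct

# struct tpacket_auxdata: u32 tp_status, u32 tp_len, u32 tp_snaplen,
# u16 tp_mac, u16 tp_net, u16 tp_vlan_tci, u16 tp_vlan_tpid (20 bytes).
AUXDATA = struct.Struct("=IIIHHHH")
TP_STATUS_VLAN_VALID = 0x10
TP_STATUS_VLAN_TPID_VALID = 0x40
ETH_P_8021Q = 0x8100
SOL_PACKET = 263      # value from the Linux headers
PACKET_AUXDATA = 8    # value from <linux/if_packet.h>


def reinsert_vlan_tag(frame: bytes, auxdata: bytes) -> bytes:
    """Put the 802.1Q tag the kernel stripped back into an Ethernet frame."""
    status, _, _, _, _, tci, tpid = AUXDATA.unpack_from(auxdata)
    if len(frame) < 12:
        return frame  # too short to hold both MAC addresses
    if tci == 0 and not status & TP_STATUS_VLAN_VALID:
        return frame  # the frame was untagged
    if not status & TP_STATUS_VLAN_TPID_VALID:
        tpid = ETH_P_8021Q  # older kernels do not report the TPID
    tag = struct.pack("!HH", tpid, tci)
    # The tag sits between the source MAC and the original EtherType.
    return frame[:12] + tag + frame[12:]


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one frame from an AF_PACKET socket and restore its VLAN tag.

    The socket must have been prepared with
    sock.setsockopt(SOL_PACKET, PACKET_AUXDATA, 1).
    """
    frame, ancdata, _, _ = sock.recvmsg(65535, socket.CMSG_SPACE(AUXDATA.size))
    for level, ctype, data in ancdata:
        if level == SOL_PACKET and ctype == PACKET_AUXDATA:
            return reinsert_vlan_tag(frame, data)
    return frame
```

`recv_frame` needs a raw socket (and therefore root privileges) to try out; `reinsert_vlan_tag` is a pure function and can be checked with hand-built frames.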
Most other vrnetlab integrations connect to the VM console port and run an equivalent of an expect script. Taking that and extracting the relevant bits should be good enough to do the telnet stuff. It would definitely be simpler than rewriting iouyap. |
@ipspace As my C is good, it took me half an hour to understand iouyap and another half to fix the bugs in it and switch to the ubridge way of rebuilding VLAN headers (which is more or less copied from libpcap). I'll upload the code to my GitHub. However, the following considerations must be made regarding a potential new iouyap release:
|
Wow. Congratulations!! Now we can only hope the rest of your todo list gets implemented. |
Thanks. The code is up now on github for those adventurous enough to want to compile it and use it in their images.
|
Can we close this now, or does the documentation still need some PRs? |
If you wish you could mention something or add a point to this issue in caveats.md. Right now, the caveats describe the current state (without your changes), so I'm OK with the way documentation is right now. |
A topology using VLANs on IOL/IOLL2 crashes during "netlab initial". The initial configuration template tries to include platform-specific VLAN configuration, and those files don't exist for iol/ioll2. We could either create symlinks or change the include logic.
Sample test scenario:
tests/integration/vlan/01-vlan-bridge-single.yml