Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VoQ configuration using minigraph.xml file #5991

Merged
merged 1 commit into from
Apr 27, 2021

Conversation

mlorrillere
Copy link
Contributor

@mlorrillere mlorrillere commented Nov 20, 2020

Contains the following changes:

  • Add support for system ports configuration to minigraph
  • Add support for SwitchId, SwitchType and MaxCores to minigraph
  • Add support for inband vlan configuration in minigraph
  • Make asic_name in CONFIG_DB mandatory for VoQ switches
  • Added sample-voq-graph.xml and updated test_cfggen.py to test for VoQ switches config attributes

- Why I did it

Adding support for configuring a VoQ switch using a minigraph.xml file.

- How I did it

Added the following new sections to the minigraph.xml file:

  • VoqInbandInterfaces to DeviceDataPlaneInfo
    Contains a list of VoQInbandInterface for this device. Example:
        <a:VoqInbandInterface>
          <Name>Vlan3094</Name>
          <Type>vlan</Type>
          <a:PrefixStr>1.1.1.1/24</a:PrefixStr>
        </a:VoqInbandInterface>
  • SystemPorts in DeviceInfo
    Contains the list of all system ports present in the system as SystemPort elements. Example:
        <SystemPort>
          <Name>Ethernet1/6</Name>
          <Hostname>linecard-2</Hostname>
          <AsicName>ASIC0</AsicName>
          <Speed>40000</Speed>
          <SystemPortId>258</SystemPortId>
          <SwitchId>2</SwitchId>
          <CoreId>1</CoreId>
          <CorePortId>2</CorePortId>
          <NumVoq>8</NumVoq>
        </SystemPort>
  • SwitchId to DeviceMetadata
    This is used to set the switch ID of a device. Example:
      <a:DeviceMetadata>
        <a:Name>ASIC0</a:Name>
        <a:Properties>
          <a:DeviceProperty>
            <a:Name>SwitchId</a:Name>
            <a:Reference i:nil="true"/>
            <a:Value>2</a:Value>
          </a:DeviceProperty>
        </a:Properties>
      </a:DeviceMetadata>

- How to verify it

  • Updated tests in sonic-config-engine
  • Manually tested on single and multi-asic VoQ switches

@mlorrillere mlorrillere marked this pull request as ready for review November 20, 2020 23:21
src/sonic-config-engine/minigraph.py Outdated Show resolved Hide resolved
src/sonic-config-engine/minigraph.py Outdated Show resolved Hide resolved
src/sonic-config-engine/minigraph.py Outdated Show resolved Hide resolved
else:
if child.tag == str(QName(ns, "DpgDec")):
(intfs, lo_intfs, mvrf, mgmt_intf, vlans, vlan_members, pcs, pc_members, acls, vni, tunnel_intfs, dpg_ecmp_content) = parse_dpg(child, asic_name)
(intfs, lo_intfs, mvrf, mgmt_intf, voq_inband_intfs, vlans, vlan_members, pcs, pc_members, acls, vni, tunnel_intfs, dpg_ecmp_content) = parse_dpg(child, asic_name)
host_lo_intfs = parse_host_loopback(child, hostname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the minigraph.xml this is defined under each asic.. We don't need this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For single-asic linecards this is defined under the host's DpgDec section.

elif child.tag == str(QName(ns, "LinkMetadataDeclaration")):
linkmetas = parse_linkmeta(child, hostname)
elif child.tag == str(QName(ns, "DeviceInfos")):
(port_speeds_default, port_descriptions) = parse_deviceinfo(child, hwsku)
(port_speeds_default, port_descriptions, sys_ports) = parse_deviceinfo(child, hwsku)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sysports are also for each asic. I dont think we need it here.

Copy link
Contributor Author

@mlorrillere mlorrillere Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sysports a defined globally but needs to be programmed by each asic.

@@ -1268,6 +1347,9 @@ def parse_xml(filename, platform=None, port_config_file=None, asic_name=None, hw
results['INTERFACE'] = phyport_intfs
results['VLAN_INTERFACE'] = vlan_intfs

if sys_ports:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a unit tests for this ?

@@ -1115,6 +1143,141 @@
<Speed>40000</Speed>
</a:EthernetInterface>
</EthernetInterfaces>
<SystemPorts>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the system ports needs to be defined inside each ASIC, not at the line card level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each asic needs to know about all the system ports in the system. Currently system ports are initialized during orchagent init so this makes it easier to have them defined in DeviceInfo and parsed by each asic (either in multi-asic setup or in single-asic linecards) than having them duplicated for each asic in the minigraph.xml

speed = sysport.find(str(QName(ns, "Speed"))).text
num_voq = sysport.find(str(QName(ns, "NumVoq"))).text
key = "%s|%s" % (device, portname)
if slot_id is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't slot_id a required attribute ? will handling of system ports in swss work, if slot-id is not there in the key ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slot_id is required to identify each linecard in a chassis. For a fixed system with multiple asics this is not required since the device is enough to identify the each asic of the system.

In the end slot_id is only necessary to build a unique key to identify each asic in the system. If that helps we can make it mandatory for single-asic systems even if this is not required.

@@ -808,7 +823,15 @@ def parse_meta(meta, hname):
cloudtype = value
elif name == "ResourceType":
resource_type = value
return syslog_servers, dhcp_servers, ntp_servers, tacacs_servers, mgmt_routes, erspan_dst, deployment_id, region, cloudtype, resource_type
elif name == "SwitchId":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is configuration is for the Line card host not asic, do we need these attributes ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are required in the case of single asic linecards. In this case the linecard is a single host device. For multi-asic linecards or multi-asic fixed systems these attributes needs to be parsed for each asic in parse_asic_meta.

@arlakshm
Copy link
Contributor

arlakshm commented Feb 5, 2021

Hi @mlorrillere, The multi_npu minigraph.xml has configuration which is not applicable to the VoQ chassis and vice-versa. Can you please add a new sample minigraph.xml and unit tests for VoQ chassis, instead of modifying the multi_npu ones.

Copy link
Contributor

@arlakshm arlakshm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as comments

@mlorrillere
Copy link
Contributor Author

Hi @mlorrillere, The multi_npu minigraph.xml has configuration which is not applicable to the VoQ chassis and vice-versa. Can you please add a new sample minigraph.xml and unit tests for VoQ chassis, instead of modifying the multi_npu ones.

Sure, I'll add a dedicated unit test for VoQ minigraph.xml

Comment on lines 1264 to 1266
# on Voq system each asic has a instance_name
if instance_name is not None:
results['DEVICE_METADATA']['localhost']['instance_name'] = instance_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not use the "instance_name" anywhere in the voq implementation. If this is not used anywhere we can remove this.
But we need "slot_id" and "device" in DEVICE_METADATA to derive the system lag name in the same way how system port name is derived (Refer: Voq LAG HLD sonic-net/SONiC#697)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "instance_name" here has the same meaning as "asic_name" in the VoQ LAG HLD, so I will rename "instance_name" -> "asic_name".

Copy link
Contributor

@vganesan-nokia vganesan-nokia Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that there is an api in minigraph.py that extracts asic_id from asic_name. This indicates that asic_name will be either of the two forms like asic0, asic1, etc. or asic<PCI device id> form.
If we have to have the asic_id in the format PCI device id, I do not think it will be good idea to use it for system lag name.

Therefore, I recommend to change the voq lag hld to use "slot_id" and "device" and add these attributes in device meta data so that both system port name and system lag name will be derived in similar way (using slot_id and device). We can get rid of "instance_name".

If this is fine with all, I'll take care of updating voq lag hld.

Copy link
Contributor Author

@mlorrillere mlorrillere Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that asic_id in minigraph.py has the same meaning as asic_id from CONFIG_DB. The former is used to extract the instance ID as described by systemd services, while the later is intended to be used internally by the platform to identify a particular asic (like PCI device ID, see also https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-orchagent/orchagent.sh#L28-L42).

I think asic_id in minigraph.py is misleading as it really is the "instance name" as defined by systemd.unit manual.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vganesan-nokia

It looks like asic_name in minigraph.py is actually the asic's hostname. Adding device to minigraph.xml will basically replicate asic's Hostname. I think it would be fair here to simply do:

        results['DEVICE_METADATA']['localhost']['asic_name'] = asic_name

or, if it makes more sense to name it device_name in CONFIG_DB:

        results['DEVICE_METADATA']['localhost']['device_name'] = asic_name

Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vganesan-nokia

It looks like asic_name in minigraph.py is actually the asic's hostname. Adding device to minigraph.xml will basically replicate asic's Hostname. I think it would be fair here to simply do:

        results['DEVICE_METADATA']['localhost']['asic_name'] = asic_name

or, if it makes more sense to name it device_name in CONFIG_DB:

        results['DEVICE_METADATA']['localhost']['device_name'] = asic_name

Let me know what you think.

I think both in minigraph and config_db "asic_name" takes the value "asic0", "asic1" and so on. Also note that asic_name is nothing but the namespace passed to parse_xml() by sonic-cfggen. And we have a separate config_db attribute for "hostname" which is "Hostname" in minigraph. So I guess "asic_name" cannot be hostname but should be considered as asic instance name. Therefore, we do not need "InstanceName" and "Device" in minigraph instead use the "asic_name".

The "hostname" which is used for both single asic and multi-asic should be equivalent to the line card in chassis context. So we can consider the "Hostname" for "slot_id" and avoid introducing "SlotId" in minigraph.

So my recommendations are:
(1) Set config db ['DEVICE_METADATA']['localhost']['asic_name'] = asic_name, as you suggested.
(2) Set config_db ['DEVICE_METADATA']['localhost']['hostname'] = hostname
(3) Remove "InstanceName" and use "asic_name" for instance name if instance name is needed
(4) For system port table: Remove "SlotId" and "Device". Use "hostname" and "asic_name" instead and derive system port key using the format "hostname|asic_name|portname"

I think (1) and (2) are already there as part of multi-asic. I feel that it is better to extend the use of existing multi-asic attributes for voq chassis systems also instead of introducing new attributes in config db and minigraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vganesan-nokia ,

Regarding this point:

(4) For system port table: Remove "SlotId" and "Device". Use "hostname" and "asic_name" instead and derive system port key using the format "hostname|asic_name|portname"

Just to confirm that we are on the same page, we still need to add Hostname and AsicName to each SystemPort because they are defined globally. We can't infer hostname and asic_name while parsing each individual system port.

Copy link
Contributor

@vganesan-nokia vganesan-nokia Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. We need "hostname" and "asic_name" in the DEVICE_METADATA and should use these for deriving the system port name something like "<hostname>|<asic_name>|Ethernet1". The recommendation (4) is to say that we do not need to define "SlotId", 'Device" and "InstanceName" in minigraph. We already have "hostname" and "asic_name" and "asic_name" is same as instance name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to define these in the minigraph, but we still do need to use them in SystemPort, i.e.:

        <SystemPort>
          <Name>Ethernet1/3</Name>
          <Hostname>multi_npu_platform_01</Hostname>
          <AsicName>ASIC0</AsicName>
          [...]
        </SystemPort>

The reason for that is that the minigraph contains all system ports defined in the chassis, so each system port has to include hostname and asic_name of the system port.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sounds good to me. It will be good to have the values of HostName and AsicName consistent with values in DEVICE_METADATA "hostname" and "asic_name". The reason is: the values of ['DEVICE_METADATA']['localhost']['hostname'] and ['DEVICE_METADATA']['localhost']['asic_name'] are used for deriving the system lag name. It will be good to have naming convention consistent for system port also. So for here, if we make sure HostName and AsicName, use the same corresponding values (the values of ['DEVICE_METADATA']['localhost']['hostname'] and ['DEVICE_METADATA']['localhost']['asic_name'], it will be great.

@@ -1192,6 +1252,25 @@ def parse_xml(filename, platform=None, port_config_file=None, asic_name=None, hw
current_device['sub_role'] = sub_role
results['DEVICE_METADATA']['localhost']['sub_role'] = sub_role
results['DEVICE_METADATA']['localhost']['asic_name'] = asic_name
# on Voq system each asic has a switch_id
Copy link
Contributor

@vganesan-nokia vganesan-nokia Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VoQ system needs "asic_id" also. Setting of asic_id in ['DEVICE_METADATA']['localhost']['asic_id''] is done in sonic-cfggen. But this asic_id will be in simple number format like 0, 1, etc. Please make sure that this is over-written by asic_id (in either PCI device id format or simple number) from default_config.json before orchagent starts as discussed in sonic-net/sonic-utilities#1228.

@lgtm-com
Copy link

lgtm-com bot commented Mar 16, 2021

This pull request introduces 1 alert when merging d5fe4253932597614b97d179bfbbe0ed59ee4a5a into 43d4d45 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

@mlorrillere mlorrillere force-pushed the pr-voq-minigraph branch 3 times, most recently from 973a926 to 509a315 Compare March 22, 2021 20:36
@mlorrillere mlorrillere force-pushed the pr-voq-minigraph branch 2 times, most recently from 511b834 to 6d511db Compare April 1, 2021 22:10
@lgtm-com
Copy link

lgtm-com bot commented Apr 1, 2021

This pull request introduces 1 alert when merging 6d511db8ebd653b4623dae5accd953d2ae2df6c7 into aa63c71 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

@lgtm-com
Copy link

lgtm-com bot commented Apr 1, 2021

This pull request introduces 1 alert when merging 8448c1448f903ca490533be65dddece4451b188f into aa63c71 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

@mlorrillere
Copy link
Contributor Author

Hi @vganesan-nokia and @arlakshm ,

Last changes include updated unit test with a dedicated sample minigraph XML file and address other comments related to InstanceName and SystemPorts .

output = json.loads(self.run_script(argument))
self.assertEqual(output['localhost']['switch_id'], '0')
self.assertEqual(output['localhost']['switch_type'], 'voq')
self.assertEqual(output['localhost']['max_cores'], '16')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we checking for the "asic_id", "asic_name", and "hostname" in any other test case? If not would you please add verification (assertEqual) for ['localhost']['asic_id'], ['localhost']['hostname'] and ['localhost']['asic_name'] here? These are mandatory attributes if "switch_type" is "voq".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asic_id and asic_name are multi-asic attributes, not specific to VoQ switches. I don't think we need either of them on single-asic if switch_type is "voq". On single-asic devices where asic_name is not specified the system port key is "hostname|portname".

For checks:

  • asic_name is tested in test_multinpu_cfggen.py but not asic_id.
  • hostname is not specific to VoQ switches but I didn't see any check for this one.

I'll update test_cfggen.py to add a check for hostname and test_multinpu_cfggen.py for asic_id.

@vganesan-nokia
Copy link
Contributor

Hi @vganesan-nokia and @arlakshm ,

Last changes include updated unit test with a dedicated sample minigraph XML file and address other comments related to InstanceName and SystemPorts .

@mlorrillere, thanks for addressing comments. Things look good. I have left a minor comment in test_cfggen.py.

@mlorrillere
Copy link
Contributor Author

Hi @vganesan-nokia ,

Last changes include making asic_name mandatory for VoQ switches and updated unit test for hostname and asic_name.

vganesan-nokia
vganesan-nokia previously approved these changes Apr 9, 2021
@mlorrillere
Copy link
Contributor Author

Hi @arlakshm and @lguohan

Can you approve and merge this review if there are no further comments?

Thanks,
Maxime

Contains the following changes:
- Add support for system ports configuration to minigraph
- Add support for SwitchId, SwitchType and MaxCores to minigraph
- Add support for inband vlan configuration in minigraph
- `asic_name` is now a mandatory attribute in CONFIG_DB on VoQ switches
Copy link
Contributor

@arlakshm arlakshm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@arlakshm arlakshm merged commit a92da83 into sonic-net:master Apr 27, 2021
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-buildimage that referenced this pull request May 23, 2021
This commit contains the following changes to support for configuring a VoQ switch using a minigraph.xml file.:
- Add support for system ports configuration to minigraph
- Add support for SwitchId, SwitchType and MaxCores to minigraph
- Add support for inband vlan configuration in minigraph
- `asic_name` is now a mandatory attribute in CONFIG_DB on VoQ switches

Co-authored-by: Maxime Lorrillere <[email protected]>
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
This commit contains the following changes to support for configuring a VoQ switch using a minigraph.xml file.:
- Add support for system ports configuration to minigraph
- Add support for SwitchId, SwitchType and MaxCores to minigraph
- Add support for inband vlan configuration in minigraph
- `asic_name` is now a mandatory attribute in CONFIG_DB on VoQ switches

Co-authored-by: Maxime Lorrillere <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Chassis 🤖 Modular chassis support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants