Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVPN Multi Home Support #2084

Merged
merged 2 commits into from
Dec 18, 2024
Merged

EVPN Multi Home Support #2084

merged 2 commits into from
Dec 18, 2024

Conversation

JaiOCP
Copy link
Contributor

@JaiOCP JaiOCP commented Oct 1, 2024

Host may be reachable via multiple VTEPs in the EVPN topology. In this case load balancing need to happen across VTEPs.

This is achieved using a new bridge port type SAI_BRIDGE_PORT_TYPE_BRIDGE_PORT_NEXT_HOP_GROUP that is going to use SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_NEXT_HOP_GROUP_ID for nexthop selection. Nexthop wll be the one of the available VTEP tunnels.

There can be an associated protection NHG configured SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_PROTECTION_NEXT_HOP_GROUP_ID, as a backup path in case the bridge port goes down.

This PR is open in lieu of
#2058

@JaiOCP JaiOCP mentioned this pull request Oct 1, 2024
inc/saibridge.h Outdated Show resolved Hide resolved
@tjchadaga tjchadaga added the reviewed PR is discussed in SAI Meeting label Oct 24, 2024
- SAI_OBJECT_TYPE_ISOLATION_GROUP_MEMBER isogrp_mbr_oid_31 corresponding to isogrp_oid_3+bp_lag_oid_2 ( Traffic from VTEP-3 will be dropped on LAG-2 )
- SAI_OBJECT_TYPE_ISOLATION_GROUP_MEMBER isogrp_mbr_oid_41 corresponding to isogrp_oid_4+bp_lag_oid_1 ( Traffic from VTEP-4 will be dropped on LAG-1 )
- SAI_OBJECT_TYPE_ISOLATION_GROUP_MEMBER isogrp_mbr_oid_42 corresponding to isogrp_oid_4+bp_lag_oid_2 ( Traffic from VTEP-4 will be dropped on LAG-2 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double check the naming of isogrp_mbr_oid. I think the first two can be renamed to isogrp_mbr_oid_22 and isogrp_mbr_oid_32 respectively.
nit: swap lines 3 and 4, so 3 lines for LAG-2 are together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@ashutosh-agrawal ashutosh-agrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

vector<sai_attribute_t> attrs;

attr.id = SAI_FDB_ENTRY_ATTR_TYPE;
attr.value.s32 = SAI_FDB_ENTRY_TYPE_DYNAMIC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be or TYPE STATIC ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should be STATIC. Changed the doc.

attr.value.oid = bp_nh_grp_1_oid;
attrs.push_back(attr);

attr.id = SAI_FDB_ENTRY_ATTR_ALLOW_MAC_MOVE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this operation have any relation with saihost "SAI_HOSTIF_TRAP_TYPE_STATIC_FDB_MOVE". Default command is set to "drop". Does application need to configure explicit trap object with action "forward" ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that ALLOW_MAC_MOVE is set to true there will be no need to
trap/drop the packets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that ALLOW_MAC_MOVE is set to true there will be no need to trap/drop the packets.

So the assumption here is that "ALLOW_MAC_MOVE = true" on specific fdb entry overrides the packet command configured on SAI_HOSTIF_TRAP_TYPE_STATIC_FDB_MOVE ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the discussion on the above in the below PR where ALLOC_MAC_MOVE was introduced.

#1024

inc/saivlan.h Outdated
* @brief Indicates if the bridge port is set to drop the broadcast, unknown unicast and multicast traffic
*
* When set to true, ingress BUM traffic will be dropped
* Valid only when the SAI_VLAN_MEMBER_ATTR_BRIDGE_PORT_ID is of type PORT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these new attributes, can we indicate the conditions using "@Validonly" flag ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SAI_VLAN_MEMBER_ATTR_BRIDGE_PORT_ID is an oid. So I am not clear how one can formalise this in a @validonly statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SAI_VLAN_MEMBER_ATTR_BRIDGE_PORT_ID is an oid. So I am not clear how one can formalise this in a @validonly statement.

Looks like SAI infra has no way to mention this. We can go by comment description. Thanks.

@@ -85,6 +85,8 @@ typedef enum _sai_bridge_port_type_t
/** Bridge tunnel port */
SAI_BRIDGE_PORT_TYPE_TUNNEL,

/** Bridge port next hop group */
SAI_BRIDGE_PORT_TYPE_BRIDGE_PORT_NEXT_HOP_GROUP,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this named as SAI_BRIDGE_PORT_TYPE_NEXT_HOP_GROUP ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specific to NHG of type bridgeport and hence this name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JaiOCP Please comment if the extra BRIDGE_PORT in the name can be dropped ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rational for naming it like is to be explicit in intent that the NHG type is BRIDGE_PORT. Though the attributes are for bridge port object and it may seem redundant but the NHG are of many type and one used here is of type bridge port.

It would be good to clarify this in comments to avoid such confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comments.

* @objects SAI_OBJECT_TYPE_NEXT_HOP_GROUP
* @condition SAI_BRIDGE_PORT_ATTR_TYPE == SAI_BRIDGE_PORT_TYPE_BRIDGE_PORT_NEXT_HOP_GROUP
*/
SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_NEXT_HOP_GROUP_ID,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above comment, can we name it as SAI_BRIDGE_PORT_ATTR_NEXT_HOP_GROUP_ID ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response as above.

*
* When set to true, egress BUM traffic will be dropped
* Valid only when the SAI_VLAN_MEMBER_ATTR_BRIDGE_PORT_ID is of type PORT.
* Valid only for .1Q bridge ports.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the comment be made explicit and state it as below to include both port and lag ?

  • Valid only when the SAI_VLAN_MEMBER_ATTR_BRIDGE_PORT_ID is of type SAI_BRIDGE_PORT_TYPE_PORT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack


- Tunnels of peer mode P2P are only considered here. The isolation group object is used to achieve the split horizon functionality.
- There is no change to the Isolation group and group member definition.
- Bridgeport of type TUNNEL will also now have the isolation group attribute set.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the comment more explicit, stating only 'SAI_ISOLATION_GROUP_TYPE_BRIDGE_PORT' is applicable to realize split horizon filtering - with applicable bridge port types - SAI_BRIDGE_PORT_TYPE_PORT and SAI_BRIDGE_PORT_TYPE_TUNNEL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

* @allownull true
* @default SAI_NULL_OBJECT_ID
*/
SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_PROTECTION_NEXT_HOP_GROUP_ID,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid repeating 'BRIDGE_PORT' in the attribute names ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded earlier.


### 3.2.4 Fast Failover support

A new Bridge port attribute (for type PORT) is introduced to specify the protection nexthop group ID.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be explicit and use 'SAI_BRIDGE_PORT_TYPE_PORT' when you say PORT. As all these cases are applicable to port and lag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

*
* @type sai_object_id_t
* @flags CREATE_AND_SET
* @objects SAI_OBJECT_TYPE_NEXT_HOP_GROUP

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the next hop group type be explicitly stated as SAI_NEXT_HOP_GROUP_TYPE_PROTECTION ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NH group type should be of type BRIDGE_PORT. Will mention this

* @type bool
* @flags CREATE_AND_SET
* @default false
* @validonly SAI_BRIDGE_PORT_ATTR_TYPE == SAI_BRIDGE_PORT_TYPE_PORT or SAI_BRIDGE_PORT_ATTR_TYPE == SAI_BRIDGE_PORT_TYPE_SUB_PORT or SAI_BRIDGE_PORT_ATTR_TYPE == SAI_BRIDGE_PORT_TYPE_TUNNEL

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need bridge port of type "SAI_BRIDGE_PORT_TYPE_TUNNEL" in DF election ? Any use case?
It would be helpful if we can extend "4.3 DF workflow" with bridge port type as SAI_BRIDGE_PORT_TYPE_TUNNEL.

Copy link
Contributor

@srj102 srj102 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are for "Interconnect ES" (RFC 9014) wherein the ES could be a Tunnel.

### 3.2.5 Single Active Redundancy Mode support

Single Active redundancy mode requires unicast and BUM traffic to be dropped in the ingress and egress directions
on the client port.
Copy link

@skumar041 skumar041 Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shall use "multi-homed host/server" or "Multihomed ES port" instead of client, applicable for all places wherever client port is being used in this document

Copy link
Contributor

@srj102 srj102 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of single active redundancy the client is still single homed whereas in other scenarios it is dual homed client. Since the diagrams make it explicit I don't think it needs to be elaborated.

* @flags MANDATORY_ON_CREATE | CREATE_ONLY
* @condition SAI_NEXT_HOP_ATTR_TYPE == SAI_NEXT_HOP_TYPE_IP or SAI_NEXT_HOP_ATTR_TYPE == SAI_NEXT_HOP_TYPE_MPLS or SAI_NEXT_HOP_ATTR_TYPE == SAI_NEXT_HOP_TYPE_TUNNEL_ENCAP or SAI_NEXT_HOP_ATTR_TYPE == SAI_NEXT_HOP_TYPE_BRIDGE_PORT
*/
SAI_NEXT_HOP_ATTR_IP,
Copy link

@skumar041 skumar041 Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need attr SAI_NEXT_HOP_ATTR_IP for SAI_NEXT_HOP_TYPE_BRIDGE_PORT? Is this for SAI_TUNNEL_PEER_MODE_P2MP?

if yes, please mention this under @brief .

Copy link
Contributor

@srj102 srj102 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes for P2MP. Same behaviour as existing SAI_NEXT_HOP_TYPE_TUNNEL_ENCAP and hence not mentioning it explicitly.

nhg_attr.value.s32 = SAI_NEXT_HOP_GROUP_TYPE_BRIDGE_PORT;
nhg_attrs.push_back(nhg_attr);

sai_object_id_t next_hop_group_id;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shall use "sai_object_id_t nh_grp_oid_1"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

* @allownull true
* @default SAI_NULL_OBJECT_ID
*/
SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_PROTECTION_NEXT_HOP_GROUP_ID,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, instead of having the protectin at nexthop group level, having protection for each nexthop(NHG member) might helps.

Assume, VTEP-5, had a nexthop group with members VTEP-{1,2,3,4}. what will be the backup members for this NHG. instead VTEP-1, can be protected with one member from VTEP-{2,3,4} and same way, VTEP-2 can be protected with one of the VTEP from VTEP-{1,3,4}.
Generally, not all members in a give nexthop group will not go down, so under what condition, switchover to backup Nexthop group is Initiated. if backup NHG also constituted from same VTEP list{1,2,3,4} by skipping one member. how does NOS knowns which member will go down

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you look at section 4.4 Fast Failover workflow and see if it clarifies ?

Copy link

@sathk-marvell sathk-marvell Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it didn't explain how the backup members are constructed. SONIC HLD mentions that we can remove the, member from L2-NHG when VTEP advertises ES_ID Down notification, so that all FDBs learnt on the Ethernet segment will be updated with a single forwading update(NHG member delete). but for that we don't need backup protection next-hop group, simple member removal from Primary NHG, when VTEP advertises ES_ID down notification.

Still not clear why the backup NHG is needed and how it will be constructed

Copy link
Contributor

@srj102 srj102 Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.4 has no VTEP-5 in the diagram.

As mentioned in 4.4,
Step1 - Traffic from Host1 received on LAG1 and fwded to Host2 over LAG2.
This is due to Local bias.
Step2 - When LAG2 fails, Traffic will be sent to a backup NHG comprising of
the remaining members sharing the ES.

This is the reason why the backup exists and is not for the scenario you mention.

This is a different flow from 4.1 which you seem to be referring to.

As to how the backup NHG is constructed and for more queries on the need for this usecase please raise this query in the SONiC EVPN-MH HLD PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explaination, it addressed my query

mem_attr[0].id = SAI_ISOLATION_GROUP_MEMBER_ATTR_ISOLATION_GROUP_ID;
mem_attr[0].value.oid = isogrp_oid_2;
mem_attr[1].id = SAI_ISOLATION_GROUP_MEMBER_ATTR_ISOLATION_OBJECT;
mem_attr[1].value.oid = bp_po_oid_2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is bp_po_oid_2 ? No other reference to this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can attaching the isolation group also be covered in the workflow.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can workflow for realizing local-bias be included ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bp_po_oid_2 should be renamed as bp_lag_oid_2. Will do the change.

Following lines show isolation group being attached to the bridgeport.
505 attr.id = SAI_BRIDGE_PORT_ATTR_ISOLATION_GROUP;
506 attr.value.oid = isogrp_oid_2;
507 status = sai_bridge_api->set_bridge_port_attribute(bp_tnl_oid_2, &attr);

Local bias is basically MAC being HW learnt locally or MAC learnt over BGP from the peers being made to point to the local ES. These are existing constructs and fairly basic and hence did not cover them.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srj102 , regarding local bias, Assume this scenario. In section "4.3 DF workflow".
a) BUM traffic originates from a single homed server (say HOST-3) connected to VTEP-4.
b) This BUM packet reaches VTEP-3, which is the DF-node. It should not forward the packet to HOST-2.
c) At VTEP-3, this filtering should be based on 'ES and SIP of VTEP-4'.
How is this expected to be realized in your proposal ? I understand you are not considering P2MP now and the split horizon explained in 4.2 is expected to take effect. But is this proposal future extensible to support P2MP later ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal does not add anything specific for Split Horizon+Local bias.
It just suggests a mechanism using existing SAI spec to realise split horizon + Local bias with P2P as an example.

For P2MP, ACLs could be used and this would work for P2P as well.

SAI_ACL_ENTRY_ATTR_FIELD_TUNNEL_TERMINATED
SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP
SAI_ACL_ENTRY_ATTR_ACTION_EGRESS_BLOCK_PORT_LIST

Note however that the EGRESS_BLOCK_PORT_LIST has the below comment.

This action would be deprecated in future release. To achieve this
* functionality use isolation group

So whenever this is deprecated, the isolation group will have to be enhanced to support P2MP tunnels.

@kcudnik
Copy link
Collaborator

kcudnik commented Nov 4, 2024

please fix build errors, you will need to squash your commits and force push, since validator is checking each commit separately

@srj102 srj102 force-pushed the evpn-mh branch 2 times, most recently from cd78928 to cb197ed Compare November 7, 2024 15:04
Signed-off-by: JaiOCP <[email protected]>

Updated EVPN MH workflow document

Signed-off-by: Rajesh Sankaran <[email protected]>
@srj102
Copy link
Contributor

srj102 commented Nov 13, 2024

Can all the reviewers signoff if there are no outstanding comments to be handled ?

@adyeung
Copy link

adyeung commented Nov 14, 2024

Can all the reviewers signoff if there are no outstanding comments to be handled ?

@helloanandhi @sathk-marvell @skumar041 @kcudnik The proposal has been reviewed in the community call, if there is no further comments, please signoff and approve

Signed-off-by: Rajesh Sankaran <[email protected]>
@srj102
Copy link
Contributor

srj102 commented Dec 11, 2024

@tjchadaga could you please advise why the below check is stuck ?

"opencomputeproject.SAIExpected — Waiting for status to be reported"

@tjchadaga
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tjchadaga
Copy link
Collaborator

@kcudnik - could you please help sign off?

@tjchadaga tjchadaga merged commit 0aba236 into opencomputeproject:master Dec 18, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reviewed PR is discussed in SAI Meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants