Orchagent changes in sonic-swss submodule to support NAT feature. #1125

Merged
merged 14 commits into from
Jan 19, 2020

Contributor

@AkhileshSamineni AkhileshSamineni commented Nov 11, 2019

Added orchagent and zone related changes.

Link to NAT HLD : https://github.com/Azure/SONiC/blob/master/doc/nat/nat_design_spec.md

Depends on:
sonic-swss : #1059 and #1126
sonic-swss-common : sonic-net/sonic-swss-common#304
sonic-linux-kernel : sonic-net/sonic-linux-kernel#100
sonic-sairedis : sonic-net/sonic-sairedis#519 and sonic-net/sonic-sairedis#546

Signed-off-by: Akhilesh Samineni [email protected]

AkhileshSamineni added a commit to AkhileshSamineni/sonic-swss that referenced this pull request Nov 11, 2019
AkhileshSamineni added a commit to AkhileshSamineni/sonic-swss that referenced this pull request Nov 13, 2019
sai_attribute_t attr;
memset(&attr, 0, sizeof(attr));
attr.id = SAI_SWITCH_ATTR_AVAILABLE_SNAT_ENTRY;
maxAllowedNatEntries = 0;
Contributor

@arlakshm arlakshm Nov 15, 2019

Please rename this variable, since we use it to check the number of SNAT entries.

Contributor Author

Renamed this variable to maxAllowedSNatEntries.

attr.id = SAI_SWITCH_ATTR_AVAILABLE_SNAT_ENTRY;
maxAllowedNatEntries = 0;

status = sai_switch_api->get_switch_attribute(gSwitchId, 1, &attr);
Contributor

Please move this SAI call so that it runs when NAT is enabled from the CLI.

Contributor Author

A get operation on an unimplemented SAI attribute does not result in a crash, so the call is retained in the current changes, as discussed.
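A minimal sketch of the pattern described in this reply, assuming a failed get is reported through a status code rather than a crash. `Status`, `AttrGetter`, and `queryMaxAllowedSNatEntries` are hypothetical stand-ins for illustration; they are not the SAI API or the code in this PR:

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a SAI status code.
enum class Status { Success, NotImplemented };

// Hypothetical query callback: fills `value` and reports whether the get worked.
using AttrGetter = std::function<Status(uint32_t &value)>;

// Returns the SNAT capacity reported by the platform, or 0 when the
// attribute is not implemented. The caller treats 0 as "no entries allowed".
uint32_t queryMaxAllowedSNatEntries(const AttrGetter &getAttr)
{
    uint32_t value = 0;
    if (getAttr(value) != Status::Success)
    {
        return 0;
    }
    return value;
}
```

Falling back to zero keeps the query safe to run at startup, whether or not NAT is later enabled from the CLI.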

if (m_natEntries.find(ip_address) != m_natEntries.end())
{
SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing",
entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri
Contributor

Maybe this check can be done at the beginning of the function.

Contributor Author

NAT entries have to be cached irrespective of whether NAT is enabled.

Contributor

Why are counters being updated here?

Contributor Author

The existing dynamic entry is replaced by the static entry because a static entry has higher priority, so the dynamic counter is decremented and the static counter is incremented.
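The counter adjustment described here can be sketched as follows. This is an illustration under assumed names (`NatCache`, `EntryType`, `add`), not the natorch implementation, and it keys entries by a plain IP string for brevity:

```cpp
#include <cstdint>
#include <map>
#include <string>

enum class EntryType { Dynamic, Static };

// Hypothetical cache that tracks per-type entry counts.
struct NatCache
{
    std::map<std::string, EntryType> entries; // keyed by IP address string
    uint32_t staticCount = 0;
    uint32_t dynamicCount = 0;

    // Returns true when the cache changed.
    bool add(const std::string &ip, EntryType type)
    {
        auto it = entries.find(ip);
        if (it == entries.end())
        {
            entries[ip] = type;
            if (type == EntryType::Static) { ++staticCount; }
            else { ++dynamicCount; }
            return true;
        }
        if (it->second == EntryType::Dynamic && type == EntryType::Static)
        {
            // Static has higher priority: it replaces the dynamic entry,
            // so one count moves from dynamic to static.
            it->second = EntryType::Static;
            --dynamicCount;
            ++staticCount;
            return true;
        }
        return false; // duplicate, or a dynamic entry that cannot displace a static one
    }
};
```

The sum of the two counters stays equal to the number of cached entries, which is why a replacement needs both a decrement and an increment.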

Contributor

Do we need to clean up the entries that were added to HW as well?

Contributor Author

Yes, it will also delete the entries from hardware.

Contributor

As I understand it, this notification comes when an L3 interface is added to a NAT zone, so it fires every time an interface is added to a NAT_ZONE. I have 2 questions:

  • Since all the conntrack entries are flushed, the entries in the ASIC will also be removed, right? What will happen to the traffic? Will it be dropped?
  • When we need to flush conntrack entries from natorch, can this be done from natmgr instead?

Contributor Author

This notification comes when "sonic-clear nat translations" is executed. Yes, all the entries in the ASIC are removed. Current traffic flows will see a drop, then be relearned and added to the ASIC again.

OA is the component that tries to keep the kernel conntrack entries and the hardware entries in sync, so it performs the flush.

Contributor

@arlakshm arlakshm Nov 16, 2019

In the current SONiC architecture, kernel entries are not updated, added, or deleted from orchagent. Can you move this functionality to natmgr?

Contributor Author

OA is responsible for checking the hardware activity (hit bit) of each entry in its cache. Since the cache is in OA code, it is the appropriate place to update the timeouts for the corresponding entries in the kernel.

Contributor

I understand OA is checking the hit bit, but in the current SONiC architecture the kernel entries are not updated from OA. The mgr modules normally update the kernel entries.

Collaborator

Agree with Arvind.

Contributor Author

The natmgr has no knowledge of the hardware activity of NAT entries, so it cannot update the entries' timeouts in the kernel. If natmgr had to update the timeouts, OA would have to inform natmgr through a separate communication channel, which is an overhead and has its own scaling issues.

Contributor

The NAT entries are not updated in real time; polling for the hit bit happens every 30 seconds, so adding another hop will not have a big impact on performance.

Contributor

As discussed offline, let's keep the current code for now. Let's open an issue to track this change. We can make the code changes in the next release.
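For readers following this thread, one polling pass of the kind being debated can be sketched as below. It assumes the cache owner reads a per-entry hit bit and collects the entries whose kernel timeouts should then be refreshed. `collectActiveEntries` and `readHitBit` are hypothetical names; the actual refresh of the conntrack timeout (by OA or by natmgr) is deliberately left out:

```cpp
#include <functional>
#include <string>
#include <vector>

// One polling pass over the cached NAT entries (run periodically, for
// example every 30 seconds as mentioned above). Returns the keys of the
// entries whose hardware hit bit was set since the last pass.
std::vector<std::string> collectActiveEntries(
    const std::vector<std::string> &cachedKeys,
    const std::function<bool(const std::string &)> &readHitBit)
{
    std::vector<std::string> active;
    for (const auto &key : cachedKeys)
    {
        if (readHitBit(key))
        {
            active.push_back(key);
        }
    }
    return active;
}
```

Whichever component owns the kernel update, only the entries in the returned list need their timeouts extended. This is why the reviewers weigh the cost of forwarding that list across one more hop.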

@@ -73,6 +73,8 @@ static map<string, sai_hostif_trap_type_t> trap_id_map = {
{"udld", SAI_HOSTIF_TRAP_TYPE_UDLD},
{"bfd", SAI_HOSTIF_TRAP_TYPE_BFD},
{"bfdv6", SAI_HOSTIF_TRAP_TYPE_BFDV6},
{"src_nat_miss", SAI_HOSTIF_TRAP_TYPE_SNAT_MISS},
Contributor

Can we make sure that these trap IDs are created only on platforms that support NAT?

Contributor Author

Done based on the feature check.
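One way such a feature check could look, sketched with hypothetical names (`TrapType`, `buildTrapIdMap`) rather than the real `trap_id_map` and SAI trap types: the NAT trap is registered only when the platform reports NAT support.

```cpp
#include <map>
#include <string>

// Hypothetical trap identifiers standing in for sai_hostif_trap_type_t values.
enum class TrapType { Bfd, SnatMiss };

// Builds the trap name to trap type map. The NAT trap is added only when
// the platform supports NAT, so no NAT trap is created elsewhere.
std::map<std::string, TrapType> buildTrapIdMap(bool natSupported)
{
    std::map<std::string, TrapType> traps = { {"bfd", TrapType::Bfd} };
    if (natSupported)
    {
        traps.emplace("src_nat_miss", TrapType::SnatMiss);
    }
    return traps;
}
```

On a platform without NAT, a configuration that names `src_nat_miss` would then fail the map lookup instead of reaching the SAI.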

chassisorch.cpp \
debugcounterorch.cpp

orchagent_SOURCES += flex_counter/flex_counter_manager.cpp flex_counter/flex_counter_stat_manager.cpp
orchagent_SOURCES += debug_counter/debug_counter.cpp debug_counter/drop_counter.cpp


Collaborator

Please remove this extra line

Contributor Author

Done.

@@ -50,6 +50,7 @@ class CoppOrch : public Orch
protected:
object_map m_trap_group_map;
bool enable_sflow_trap;
bool isNatSupported;
Collaborator

Please follow the same style for variables as in the rest of the file.

Contributor Author

Removed this variable and using 'gIsNatSupported' global variable.

std::set<IpPrefix> getSubnetRoutes();
std::map<string, uint32_t> m_nat_zone;
bool isNatSupported;
Collaborator

Please use the same style.

Contributor Author

Removed this variable and using 'gIsNatSupported' global variable.

@@ -1015,3 +1087,30 @@ void IntfsOrch::doTask(SelectableTimer &timer)
}
}
}

void IntfsOrch::getNatSupportedInfo()
Collaborator

I see that you are repeating this same function in multiple places. Can you do this query one time, like in switchorch, and use a single API to check if NAT is supported?

Contributor Author

Ok. Moved this functionality to main.cpp file and updating the global variable 'gIsNatSupported'.
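The "query once, read everywhere" arrangement in this reply might look roughly like the sketch below. Only the name `gIsNatSupported` comes from the thread; `initNatSupport`, `probeNatSupport`, and `shouldProgramNatZone` are hypothetical and stand in for the startup code in main.cpp and for the individual orchs:

```cpp
#include <functional>

// Global flag, named after the variable mentioned in the reply above.
bool gIsNatSupported = false;

// Hypothetical startup hook: runs the capability probe exactly once
// and caches the result in the global flag.
void initNatSupport(const std::function<bool()> &probeNatSupport)
{
    gIsNatSupported = probeNatSupport();
}

// Hypothetical call site: an orch reads the cached flag instead of
// repeating the capability query.
bool shouldProgramNatZone()
{
    return gIsNatSupported;
}
```

With this shape, the per-orch copies of the query (such as `IntfsOrch::getNatSupportedInfo()` in the snippet above) are no longer needed.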

m_nat_zone[alias] = nat_zone_id;
if (isNatSupported)
{
setRouterIntfsNatZoneId(port, nat_zone_id);
Collaborator

I don't understand this call. The router interface is created only in setIntf, so this API is always going to return true after its first check. You are actually setting the zone as part of Line 808. Can you please check?

Contributor Author

A user can configure a new 'nat_zone' even after the RIF is created; this call is used to set the new 'nat_zone'.

Collaborator

Got it, but setting RIF attributes is currently handled in setIntf. It is done only for SUB_PORT at the moment, but I think it can be modified to set the NAT zone for any RIF type. A follow-up question: why do we need a separate m_nat_zone map? Can the NAT zone be part of the port structure, like m_rif_id?

Contributor Author

The NAT zone is set for Physical, Vlan and Lag interfaces only, so it is not handled in setIntf.
Yes, the NAT zone can be part of the port structure; made changes to address it.

@@ -739,6 +796,21 @@ bool IntfsOrch::addRouterIntfs(sai_object_id_t vrf_id, Port &port)
attr.value.u32 = port.m_mtu;
attrs.push_back(attr);

if (isNatSupported)
Collaborator

This is not required.

Contributor Author

A set operation on an unimplemented SAI attribute (SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID as of SAI 1.5) results in a crash; this check was added to avoid it.

attr.id = SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID;
if (m_nat_zone.find(port.m_alias) == m_nat_zone.end())
{
attr.value.u32 = DEFAULT_NAT_ZONE_ID;
Collaborator

This is not required; it's the SAI default.

Contributor Author

Yes, removed it.

else
{
attr.value.u32 = m_nat_zone[port.m_alias];
}
Collaborator

You can rework the check as: if the NAT zone is present in the configuration, set the attribute.

Contributor Author

Yes, modified the code to set the attribute if it is present in the cache.
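Taken together, the outcome of the last few threads (zone stored on the port, no explicit default, attribute set only when NAT is supported and a zone is configured) could be sketched like this. `Port`, `RifAttr`, `buildRifAttrs`, the string attribute names, and the 9100 MTU default are assumptions for illustration; the real code builds `sai_attribute_t` values:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, trimmed-down port structure carrying the NAT zone itself,
// instead of a separate alias-to-zone map.
struct Port
{
    uint32_t mtu = 9100;            // assumed default, for illustration only
    bool natZoneConfigured = false; // true once a nat_zone is configured
    uint32_t natZoneId = 0;
};

// Hypothetical attribute record standing in for sai_attribute_t.
struct RifAttr
{
    std::string name;
    uint32_t value;
};

// Builds the router interface attribute list. The NAT zone attribute is
// appended only when the platform supports NAT (setting an unimplemented
// attribute may crash) and a zone is configured (otherwise the SAI default applies).
std::vector<RifAttr> buildRifAttrs(const Port &port, bool natSupported)
{
    std::vector<RifAttr> attrs;
    attrs.push_back({"MTU", port.mtu});
    if (natSupported && port.natZoneConfigured)
    {
        attrs.push_back({"NAT_ZONE_ID", port.natZoneId});
    }
    return attrs;
}
```

Leaving the attribute out in the unconfigured case relies on the reviewer's point that the default zone is already the SAI default.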

sai_samplepacket_api_t* sai_samplepacket_api;
sai_debug_counter_api_t* sai_debug_counter_api;


Collaborator

Please remove this extra line

Contributor Author

Done.

@kirankella
Contributor

The current nexthop tracking in OA works on different SAI implementations (those that track nexthop changes and those that do not), though it will be sub-optimal on a SAI that does its own tracking. We can proceed as-is with the current OA changes until the SAI implementations converge.

@AkhileshSamineni
Contributor Author

Retest this please.

AkhileshSamineni added a commit to AkhileshSamineni/sonic-swss that referenced this pull request Dec 10, 2019
AkhileshSamineni added a commit to AkhileshSamineni/sonic-swss that referenced this pull request Dec 10, 2019
prsunny previously approved these changes Dec 20, 2019
@arlakshm
Contributor

@AkhileshSamineni, @kirankella,
There seem to be conflicts with the PR. Please resolve the conflicts.

arlakshm previously approved these changes Dec 20, 2019
@AkhileshSamineni
Contributor Author

AkhileshSamineni commented Dec 21, 2019

@AkhileshSamineni, @kirankella,
There seem to be conflicts with the PR. Please resolve the conflicts.

@arlakshm Resolved merge conflicts.

@AkhileshSamineni
Contributor Author

retest this please.

@AkhileshSamineni
Contributor Author

Retest this please.

@AkhileshSamineni
Contributor Author

Retest this please.

@AkhileshSamineni
Contributor Author

Retest this please.

@kirankella
Contributor

retest vs please

@rlhui rlhui merged commit ea8b1da into sonic-net:master Jan 19, 2020
@kirankella
Contributor

retest vs please

lguohan added a commit that referenced this pull request Jan 30, 2020
abdosi pushed a commit that referenced this pull request Feb 4, 2020
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
Add more dependencies to setup.py. Now that we're building as a wheel, we can add all dependent packages to make building and installation more robust, as there will no longer be a need to install them explicitly.

Move sonic-config-engine from `install_requires` to `tests_require`, as it is only needed by the unit tests.

Remove code to install fastentrypoints package if not installed. When building a Python 3 wheel using python3 setup.py bdist_wheel, we would receive a permissions error when calling easy_install.main(['fastentrypoints']), as with Python 3 it appears elevated permissions are required to install the package. This appears to be a behavior change between the Python 2 and Python 3 versions of setuptools. This needs to be explicitly installed before building the package. We are already installing in the SONiC slave container. Will update the sonic-utilities README to specify, also.

6 participants