Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

portsorch: initial support for link-training and remote advertisement #2149

Closed
wants to merge 1 commit into from

Conversation

ds952811
Copy link
Contributor

@ds952811 ds952811 commented Feb 18, 2022

Signed-off-by: Dante Su [email protected]

What I did

  1. Add Link-Training support to portsorch, while Gearbox is not in the scope
  2. Add support for remote speed advertisement
  3. Add support for capability checker

Why I did it

  1. In the case of DAC, static pre-calibrated pre-emphasis is rarely available on SONIC, as most of the ODM are expecting this to be done dynamically at runtime via link-training, hence we'll need this feature to improve the link quality
  2. Help users identify the cause of autoneg link failure
  3. Avoid un-necessary SAI calls for autoneg and link-training controls if this feature is not supported

How I verified it

  1. Manual test
  2. Ran the Unit-tests to the corresponding changes

Details if related

HLD: sonic-net/SONiC#924, sonic-net/SONiC#925

@@ -296,9 +317,11 @@ static bool isValidPortTypeForLagMember(const Port& port)
PortsOrch::PortsOrch(DBConnector *db, DBConnector *stateDb, vector<table_name_with_pri_t> &tableNames, DBConnector *chassisAppDb) :
Orch(db, tableNames),
m_portStateTable(stateDb, STATE_PORT_TABLE_NAME),
m_stateLinkTrainingTable(stateDb, STATE_LINK_TRAINING_TABLE_NAME),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a separate link training table? Why can't we add two fields to existing port table in State DB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now merged into 'link_training_status' of PORT_TABLE in STATE_DB


auto executor = new ExecutableTimer(m_timer, this, "PORT_STATUS_POLL");
Orch::addExecutor(executor);
m_timer->start();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timer shouldn't be enabled if the platform doesn't support link training or there is not link training config present.

Copy link
Contributor Author

@ds952811 ds952811 Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be migrated to use the implementation in #2215, which is for AutoNeg, and it will only activate the poll when the port is admin up, link_training feature is available and is enabled.

sai_attribute_t attr;

attr.id = SAI_PORT_ATTR_SUPPORTED_LINK_TRAINING_MODE;
status = sai_port_api->set_port_attribute(port.m_port_id, &attr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this logic. This is not the way to query and understand capabilities. We have separate API sai_query_attribute_capability. Please use that API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in #2215, sai_query_attribute_capability does not support per-port basis capability query, while the AN/LT may only be available on certain ports on the switch ASIC

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the API used should have been get_port_attribute instead of set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will fix it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, sorry about that, I thought I did this, but it's on the AutoNeg PR.

{
string supp;

if (m_stateLinkTrainingTable.hget(port.m_alias, "supported", supp))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use internal Port data structure to cache the capability rather than writing to DB and reading from it within the same process

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now cached as member field of the internal Port data structure

@@ -112,6 +113,26 @@ static map<string, int> autoneg_mode_map =
{ "off", 0 }
};

static map<string, int> link_training_mode_map =
{
{ "on", 1 },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use a bool instead of int? This will make code much simpler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to distinguish 'NOT CONFIURED' and 'OFF', we need -1 for this, and this is the same approach for the AutoNeg in the current implementation of SONiC

}
continue;
}
m_stateLinkTrainingTable.hset(p.m_alias, "status", lt > 0 ? "on" : "off");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use lt_str as such in status?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will do.

fvs = swsscommon.FieldValuePairs([("link_training","on")])
tbl.set("Ethernet4", fvs)

# validate if autoneg false is pushed to asic db when set first time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comments, should be link-training instead of autoneg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will fix it

expected_fields = {"SAI_PORT_ATTR_LINK_TRAINING_ENABLE":"false"}
adb.wait_for_field_match("ASIC_STATE:SAI_OBJECT_TYPE_PORT", port_oid, expected_fields)

# validate if autoneg true is pushed to asic db when set first time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comments, should be link-training instead of autoneg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will fix it

sai_attribute_t attr;

attr.id = SAI_PORT_ATTR_SUPPORTED_LINK_TRAINING_MODE;
status = sai_port_api->set_port_attribute(port.m_port_id, &attr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the API used should have been get_port_attribute instead of set.

sai_status_t status;
sai_attribute_t attr;

#ifdef SAI_PORT_ATTR_SUPPORTED_LINK_TRAINING_MODE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this ifdef needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the build failure in the PR and also the backward compatibility of the legacy SAI implementation, where SAI_PORT_ATTR_SUPPORTED_LINK_TRAINING_MODE is not implemented, while the other control attributes are supported

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot have #ifdef in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -61,6 +61,7 @@ extern string gMyAsicName;
#define DEFAULT_VLAN_ID 1
#define MAX_VALID_VLAN_ID 4094

#define PORT_STAT_POLLING_SEC 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe 1 second polling is aggressive for this scenario. We should change it to 5 or 10 seconds. @prsunny What is your opinion?

Copy link
Collaborator

@dgsudharsan dgsudharsan Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also rename this to PORT_STATE_POLLING_SEC or similar? STAT means port statistics (like rx packets/bytes ) which is a different thing altogether

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's done

@@ -2351,6 +2426,9 @@ bool PortsOrch::initPort(const string &alias, const string &role, const int inde
/* Create associated Gearbox lane mapping */
initGearboxPort(p);

/* Initialize port capabilities */
initPortCapLinkTraining(p);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function loops through all the ports and get capability. This would increase the port initialization time which is a bottleneck in fastboot scenarios. If the user has no LT configs this operation is not needed.
Instead init the capability only when the user configures LT and capability not in cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, and it's done

@@ -298,7 +319,8 @@ PortsOrch::PortsOrch(DBConnector *db, DBConnector *stateDb, vector<table_name_wi
m_portStateTable(stateDb, STATE_PORT_TABLE_NAME),
port_stat_manager(PORT_STAT_COUNTER_FLEX_COUNTER_GROUP, StatsMode::READ, PORT_STAT_FLEX_COUNTER_POLLING_INTERVAL_MS, false),
port_buffer_drop_stat_manager(PORT_BUFFER_DROP_STAT_FLEX_COUNTER_GROUP, StatsMode::READ, PORT_BUFFER_DROP_STAT_POLLING_INTERVAL_MS, false),
queue_stat_manager(QUEUE_STAT_COUNTER_FLEX_COUNTER_GROUP, StatsMode::READ, QUEUE_STAT_FLEX_COUNTER_POLLING_INTERVAL_MS, false)
queue_stat_manager(QUEUE_STAT_COUNTER_FLEX_COUNTER_GROUP, StatsMode::READ, QUEUE_STAT_FLEX_COUNTER_POLLING_INTERVAL_MS, false),
m_timer(new SelectableTimer(timespec { .tv_sec = PORT_STATE_POLLING_SEC, .tv_nsec = 0 }))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename m_timer to something more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

sai_attribute_t attr;

attr.id = SAI_PORT_ATTR_SUPPORTED_LINK_TRAINING_MODE;
status = sai_port_api->set_port_attribute(port.m_port_id, &attr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not fixed

}
else
{
port.m_cap_lt = 1; /* Default to 1 for vstest */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. It means that m_cap_lt will be 1 even if SAI return error. I would suggest to implement unit test if VS test does not support link training.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue in VS, do you know where is the source of this SAI model? SAI/bm/p4-sai?
Could we this VS enhancement deal in a separate PR? We'll need to do this for both AutoNeg and LinkTraining

sai_status_t status = sai_port_api->set_port_attribute(port.m_port_id, &attr);
if (status != SAI_STATUS_SUCCESS)
{
SWSS_LOG_ERROR("Failed to set LT %s to port pid:%" PRIx64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to print the port alias here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return handleSaiSetStatus(SAI_API_PORT, status);
}

SWSS_LOG_INFO("Set LT %s to port pid:%" PRIx64, op.c_str(), port.m_port_id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to print the port alias here, and maybe it should be NOTICE instead of INFO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


sai_attribute_t attr;
attr.id = SAI_PORT_ATTR_LINK_TRAINING_FAILURE_STATUS;
attr.value.u32 = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is redundant, will have it removed

{
if ((it->second == 0) || !getPort(it->first, port))
{
m_port_state_poll.erase(it);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will invalid the iterator and cause unpredictable behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


typedef enum {
PORT_STATE_POLL_NONE = 0,
PORT_STATE_POLL_AN = 0x00000001, /* Auto Negotiation */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of the review for AutoNeg, we're having 2 PRs in review at the same time, and they're using the same port state poller.

}
else
{
m_port_state_poll[port.m_alias] &= ~type;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if it should be remove from m_port_state_poll and stop timer if need

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stop() will only take place inside the poller thread to reduce the race conditions if any.

if (active)
{
m_port_state_poll[port.m_alias] |= type;
m_timer->start();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the timer is already started?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now addressed in the PR below, issue start() to a running poller is no longer having the expiration timer reset.
sonic-net/sonic-swss-common#614

@prsunny prsunny requested a review from prgeor April 28, 2022 17:30
@ds952811 ds952811 changed the title portsorch: initial support for link-training portsorch: initial support for link-training and remote advertisement May 4, 2022
@zhangyanzhao
Copy link
Collaborator

/azpw run Azure sonic-swss

@zhangyanzhao
Copy link
Collaborator

/azpw run Azure.sonic-swss

1 similar comment
@ds952811
Copy link
Contributor Author

/azpw run Azure.sonic-swss

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).


if (!lt_str.empty() && (p.m_type == Port::PHY))
{
if (link_training_mode_map.find(lt_str) == link_training_mode_map.end())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to AN, I don't see a lt capability query here. Please add

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentionally removed at the earlier commit

  1. LT is similar to AN, it's a port capability available on a per-port basis, not system-wide on a single switch ASIC
  2. For a port capability available on a per-port basis, SAI_PORT_ATTR_SUPPORTED_XXXX should be introduced into saiport.h
  3. As per the latest comments from Rita, the SAI spec. was finalized around March time frame, it's too late for 202205 right now
  4. Since this is a port capability available on a per-port basis, using SAI_PORT_ATTR_LINK_TRAINING_ENABLE in sai_query_attribute_capability is not going to work, and during the SAI subgroup meeting a few months back, there was a conclusion that per-port capability query should be tracked in a separated PR, as all the existing attributes should all be migrated, not just the new attribute for LT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For your reference, the function prototype of sai_query_attribute_capability is as follows, it does not work on a per-port basis
https://github.com/opencomputeproject/SAI/blob/master/inc/saiobject.h#L221

sai_status_t sai_query_attribute_capability(
        _In_ sai_object_id_t switch_id,
        _In_ sai_object_type_t object_type,
        _In_ sai_attr_id_t attr_id,
        _Out_ sai_attr_capability_t *attr_capability);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment to make sure once SAI is there we revisit and add the check? It is important that we ensure it so that devices and specific ports where it is not supported will have check in place to avoid SAI failure thereby prevent orchagent crash

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
if (port.m_admin_state_up && port.m_link_training > 0)
{
updatePortStateLinkTraining(port);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this move inside the below if condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have the if-statement for the port admin state and LT enablement, may I have more details of the additional checker that's missing? Thank you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to autoneg should updatePortStateLinkTraining be done only when status == SAI_PORT_OPER_STATUS_UP? Else it will be called redundantly during down as updateStatePoll will take care of invoking this.

Copy link
Contributor Author

@ds952811 ds952811 May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of AN

  • Link Up: Refresh AN states
  • Link Down: Clear the remote advertisement

In the case of LT

  • Link Up: Refresh LT states
  • Link Down: Refresh LT states

Hence the updatePortStateLinkTraining() is invoked before checking the operation link status

@@ -6883,3 +7132,115 @@ std::unordered_set<std::string> PortsOrch::generateCounterStats(const string& ty
}
return counter_stats;
}

void PortsOrch::updatePortStateAutoNeg(const Port &port)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name is misleading. Should it be updatePortRemoveAdvSpeeds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a placeholder for the operational AN state, not specific to remote speed only, there are more upcoming in the PRs in the furture (e.g. tx/rx pause, FEC ....)

fvs = swsscommon.FieldValuePairs([("autoneg","on"),("admin_status","up")])
ctbl.set("Ethernet0", fvs)

time.sleep(10)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need 10 second sleep?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right now, I'll remove this, thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allow me to rephrase, 10 sec is necessary.
Please note the poll interval is 5 sec, and in a worst-case scenario, the remote advertisement will not be available until 2nd poll.

@@ -2128,6 +2211,32 @@ task_process_status PortsOrch::setPortAutoNeg(sai_object_id_t id, int an)
return task_success;
}

task_process_status PortsOrch::setPortLinkTraining(const Port &port, bool state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is it ensured, LT is not enabled if its not supported on this port?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LT capability checker is now restored, such a request will be blocked in doPortTask()

lt = link_training_mode_map[lt_str];
if (lt != p.m_link_training)
{
auto status = setPortLinkTraining(p, lt > 0 ? true : false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we ensuring if port is admin down then we are not enabling LT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LT could be enabled/disable regardless of the port admin state, it's a set of control registers on the SerDes

}
}

void PortsOrch::updatePortStateLinkTraining(const Port &port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brief comment what this function does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -6883,3 +7132,115 @@ std::unordered_set<std::string> PortsOrch::generateCounterStats(const string& ty
}
return counter_stats;
}

void PortsOrch::updatePortStateAutoNeg(const Port &port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brief comment what this func does

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
if (port.m_admin_state_up && port.m_link_training > 0)
{
updatePortStateLinkTraining(port);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to autoneg should updatePortStateLinkTraining be done only when status == SAI_PORT_OPER_STATUS_UP? Else it will be called redundantly during down as updateStatePoll will take care of invoking this.

sai_status_t status;
sai_attribute_t attr;

attr.id = SAI_PORT_ATTR_SUPPORTED_AUTO_NEG_MODE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one major concern here. Since SAI enforcing this involves SAI implementing this attribute, do we know whether all vendors support it? If not it might break autoneg in their platforms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will cause breakage, hence the AN capability will be updated to use 1(supported) by default

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Please use this capability to a warning log in case if SAI returns not supported for the attribute. SAI vendors can then mitigate to implement this attribute without the functional impact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// it's not available in SAI v1.10 paried with SONiC.202205.
// And given that LT is part of AN, it should be okay to use
// SAI_PORT_ATTR_SUPPORTED_AUTO_NEG_MODE as a fallback plan.
attr.id = SAI_PORT_ATTR_SUPPORTED_AUTO_NEG_MODE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to use a wrong attribute. Instead add a comment as todo and leave without using capability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

SWSS_LOG_NOTICE("Unable to get %s AN capability", port.m_alias.c_str());
// To avoid breakage on the existing platforms, AN should be 1 by default
port.m_cap_an = 1;
SWSS_LOG_WARN("Unable to get %s AN capability, assumming it's supported",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer you just add "Unable to get AN support capability" in the log. Please remove "assuming it's supported"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

m_portStateTable.hset(port.m_alias, "rmt_adv_speeds", adv_speeds);
}
else
if (port.m_admin_state_up)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the get remote adv speeds is done even when admin state is down and this new logic will restrict it to only when admin up.
This means polling becomes irrelevant. Can you explain the change for the reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the port is admin down, the SerDes is stopped, and it's impossible to get a correct remote advertisement, hence this redundant operation should be skipped if the port is disabled.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why do we need the poller? Can't we just update the remote advertised speeds on link up? The entire polling logic was introduced to get remote advertisement when link was down. Should that be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion, but that's a different story

  1. MAC disabled + link down
    This is a case we have port.m_admin_state_up == false, while the link is surely down, and the remote advertisement is not available since the MAC/SerDes is disabled

  2. MAC enabled + link down
    This is a case we have port.m_admin_state_up == true, while the link is operationally down, and the remote advertisement could be available as the MAC/SerDes is enabled, frame exchanges are happening on the PHY layer (i.e. It's the internal PHY, in this case. Precisely speaking, it's the TSC drivers on the Broadcom chips)

  3. MAC enabled + link up
    This is a general case of port.m_admin_state_up == true and link is operationally up.

Conclusion:
While the remote advertisement could be available in case 2 and 3, it's not available in case 1, hence the poller should only fetching the status when MAC/SerDes is enabled.

@prgeor
Copy link
Contributor

prgeor commented Jun 20, 2022

"/azp run"

@prgeor
Copy link
Contributor

prgeor commented Jun 22, 2022

@ds952811 can you rebase with latest swss master to see if the build test passes?

@prsunny
Copy link
Collaborator

prsunny commented Jun 22, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

commit 18632a3
Author: Dante Su <[email protected]>
Date:   Mon May 23 12:22:49 2022 +0000

    optimize port state refresh logic

    Signed-off-by: Dante Su <[email protected]>

commit 081d491
Author: ds952811 <[email protected]>
Date:   Mon May 23 02:33:56 2022 +0000

    address review comments

    Signed-off-by: ds952811 <[email protected]>

commit 84bdde4
Author: Dante Su <[email protected]>
Date:   Fri May 20 02:15:59 2022 +0000

    update the default LT capability upon get failures

    Signed-off-by: Dante Su <[email protected]>

commit 0f73666
Author: Dante Su <[email protected]>
Date:   Thu May 19 11:28:38 2022 +0000

    Rename updatePortStatesXX as refreshPortStatesXX

    Signed-off-by: Dante Su <[email protected]>

commit ddd57fe
Author: Dante Su <[email protected]>
Date:   Thu May 19 04:03:13 2022 +0000

    Have AN cap defaults to 1, and use AN attr for LT cap query

    Signed-off-by: Dante Su <[email protected]>

commit 876e605
Author: Dante Su <[email protected]>
Date:   Fri May 13 11:15:12 2022 +0000

    drop LT capability query

    Signed-off-by: Dante Su <[email protected]>

commit 55ced7d
Author: Dante Su <[email protected]>
Date:   Fri Apr 29 13:53:17 2022 +0000

    incorporate autoneg support from PR#2215

    Signed-off-by: Dante Su <[email protected]>

commit a04594e
Author: Dante Su <[email protected]>
Date:   Thu Apr 28 16:33:14 2022 +0000

    address review comments

    Signed-off-by: Dante Su <[email protected]>

commit e9eeb9a
Author: Dante Su <[email protected]>
Date:   Thu Apr 28 15:00:04 2022 +0000

    address review comments

    Signed-off-by: Dante Su <[email protected]>

commit 4ff604d
Author: Dante Su <[email protected]>
Date:   Fri Apr 22 03:51:56 2022 +0000

    Stop the port state poll by default

    Signed-off-by: Dante Su <[email protected]>

commit bdfb8d8
Author: Dante Su <[email protected]>
Date:   Fri Apr 22 03:48:07 2022 +0000

    address review comments

    Signed-off-by: Dante Su <[email protected]>

commit 1c6bda8
Author: Dante Su <[email protected]>
Date:   Mon Apr 18 08:46:21 2022 +0000

    Restore pre-emphasis when LT is transitioned from ON to OFF

    Signed-off-by: Dante Su <[email protected]>

commit 09a9b33
Author: Dante Su <[email protected]>
Date:   Mon Apr 18 02:33:11 2022 +0000

    fix build failure due to SAI_PORT_ATTR_SUPPORTED_LINK_TRAINING_MODE

    Signed-off-by: Dante Su <[email protected]>

commit b0bee3e
Author: Dante Su <[email protected]>
Date:   Thu Apr 14 07:54:14 2022 +0000

    address review comments

    Signed-off-by: Dante Su <[email protected]>

commit c4345ef
Author: Dante Su <[email protected]>
Date:   Fri Mar 25 02:26:05 2022 +0000

    portsorch: initial support for link-training

    - What I did
    Add Link-Training support to portsorch, while Gearbox is not in the scope

    - Why I did it
    In the case of DAC, static pre-calibrated pre-emphasis is rarely available on SONIC, as most of the ODM are expecting this to be done dynamically at runtime via link-training, hence we'll need this feature to improve the link quality

    - How I verified it
    Manual test
    Ran the Unit-tests to the corresponding changes

    Signed-off-by: Dante Su <[email protected]>
@ds952811 ds952811 closed this Jun 28, 2022
@ds952811 ds952811 deleted the link-training branch June 28, 2022 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants