[portsorch] fix PortsOrch::allPortsReady() returns true when it should not #1103
orchagent/orchdaemon.cpp
Outdated
@@ -193,7 +193,7 @@ bool OrchDaemon::init()
  * when iterating ConsumerMap.
  * That is ensured implicitly by the order of map key, "LAG_TABLE" is smaller than "VLAN_TABLE" in lexicographic order.
  */
- m_orchList = { gSwitchOrch, gCrmOrch, gBufferOrch, gPortsOrch, gIntfsOrch, gNeighOrch, gRouteOrch, copp_orch, tunnel_decap_orch, qos_orch, wm_orch, policer_orch };
+ m_orchList = { gSwitchOrch, gCrmOrch, gPortsOrch, gBufferOrch, gIntfsOrch, gNeighOrch, gRouteOrch, copp_orch, tunnel_decap_orch, qos_orch, wm_orch, policer_orch };
This is a better order as BufferOrch::doTask() needs to wait for PortsOrch::isInitDone() to be true to proceed
It is interesting that BufferOrch was placed behind PortsOrch earlier, as of PR #515
retest this please
Retest this please
gPortsOrch
Do not reorder if you don't have a strong reason. Any order should work for warm-reboot if we iterate enough rounds. #Closed
Agree with Qi
@qiluo-msft @prsunny I agree that it is just a matter of enough iterations. However, this order makes more sense, at least to me, so it is more of a cosmetic change.
Besides, I now think that with this order 3 instead of 4 iterations are needed in warm boot. Let me check.
Do you have a concern that this may break something? #Closed
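To make the "enough iterations" argument concrete, here is a minimal toy model, not orchagent code. `ToyOrch` and `roundsToConverge` are hypothetical names, and the only dependency modelled is the one discussed in this thread (BufferOrch waiting for PortsOrch). It is an assumption-laden sketch and does not try to reproduce the 4 vs 3 figure from a real warm boot.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an orch: a name plus the one orch it waits for
// (an empty string means "no dependency").
struct ToyOrch {
    std::string name;
    std::string waitsFor;
};

// Runs doTask-like rounds over `order` until every orch has made progress.
// Returns the number of rounds, or -1 if a whole round makes no progress.
int roundsToConverge(const std::vector<ToyOrch>& order) {
    std::set<std::string> done;
    int rounds = 0;
    while (done.size() < order.size()) {
        ++rounds;
        bool progressed = false;
        for (const ToyOrch& orch : order) {
            if (done.count(orch.name)) continue;  // already finished earlier
            if (orch.waitsFor.empty() || done.count(orch.waitsFor)) {
                done.insert(orch.name);
                progressed = true;
            }
            // Otherwise the orch skips this round, like an early return in doTask().
        }
        if (!progressed) return -1;  // unsatisfiable dependency
    }
    return rounds;
}
```

In this toy, the order {BufferOrch, PortsOrch} needs 2 rounds and {PortsOrch, BufferOrch} needs 1. Both orders converge, which matches the point that any order works given enough rounds; the order only changes how many rounds that takes.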
Let me know if you have proof that 4 iterations -> 3. Otherwise I don't see any improvement.
Please check one more commit to have 3 iterations.
In any case, let me separate this change in another PR
This PR will be an important bug fix. To make lives easier, could you please provide a VS test case that fails on the current version but is fixed by this PR? #Closed
Stepan, as you analyzed in sonic-net/sonic-mgmt#834 (comment), the design of m_pendingPortSet has an issue. If we can populate all physical ports into m_pendingPortSet in the PortsOrch::bake() phase, we can drop the change that moves down the section of code pasted below:
|
82207af
to
51e8707
Compare
@qiluo-msft @wendani Not everything in this PR can be fixed by initializing m_pendingPortSet in bake(). Another issue is that the PFC watchdog start action is not protected by allPortsReady(), which causes errors in the logs during warm boot. That is fixed as part of this PR and is not related to m_pendingPortSet.
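A rough sketch of what "protected by allPortsReady()" could look like, under stated assumptions. `ToyWatchdog`, `TaskResult`, and `errorsDuringWarmBoot` are hypothetical names and do not come from pfcwdorch.cpp; the model only assumes that ports are not ready in the first iteration and are ready in the second.

```cpp
#include <cassert>

enum class TaskResult { Done, Retry };

// Hypothetical stand-in for the watchdog state; not the real PfcWdOrch.
struct ToyWatchdog {
    bool started = false;
    int errors = 0;  // counts "action before watchdog start" log errors

    // The watchdog itself only starts once ports are ready.
    TaskResult start(bool allPortsReady) {
        if (!allPortsReady) return TaskResult::Retry;
        started = true;
        return TaskResult::Done;
    }

    // The start action. With `guarded` set it is deferred until ports are ready;
    // without the guard it runs immediately and may hit a watchdog that is not started.
    TaskResult action(bool allPortsReady, bool guarded) {
        if (guarded && !allPortsReady) return TaskResult::Retry;
        if (!started) ++errors;
        return TaskResult::Done;
    }
};

// Replays two iterations: ports not ready, then ports ready.
int errorsDuringWarmBoot(bool guarded) {
    ToyWatchdog wd;
    bool actionPending = true;
    const bool readyPerIteration[] = {false, true};
    for (bool ready : readyPerIteration) {
        wd.start(ready);
        if (actionPending) {
            actionPending = (wd.action(ready, guarded) == TaskResult::Retry);
        }
    }
    return wd.errors;
}
```

In this toy, the unguarded action logs one error because it runs in the first iteration, before the watchdog has started. The guarded action is retried in the second iteration and logs none.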
@qiluo-msft |
retest this please
Regarding the test, I don't mean an end-to-end simulation of a warm reboot plus a PFC storm. Maybe a Google Test (unit test) could help here. You could prepare some mock data in Redis, or mock Redis, and let orchagent consume it. This test case should fail on the old code but pass on your new code.
@qiluo-msft A Google Test (unit test) would be simpler to create than a VS test, but the tests in tests/mock_tests fail to compile on recent master, and I don't see them being run on PRs. Are these tests expected to work? #Resolved
I recently fixed the google test. If you check
tests/mock_tests are not currently included
…d not

Warm start flow before the change:

1st iteration:
- BufferOrch::doTask(): returns since PortInitDone hasn't arrived yet
- PortsOrch::doTask(): processes all PORT_TABLE until the PortInitDone flag; m_pendingPortSet is still empty and m_portInitDone is true, so allPortsReady() will return true
- AnyOrch::doTask(): checks g_portsOrch->allPortsReady()

2nd iteration:
- BufferOrch::doTask(): now buffers are applied

This causes BufferOrch to override PfcWdOrch's zero-buffer profile.

The change swaps BufferOrch and PortsOrch in m_orchList, because the 1st BufferOrch iteration will always skip processing, and this eliminates the possibility of m_pendingPortSet not being filled with ports after m_initDone is set to true.

Signed-off-by: Stepan Blyschak <[email protected]>
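The false positive described in this commit message can be reproduced in a small toy model. `ToyPortsOrch` and `readyAfterFirstIteration` are hypothetical names; the only thing taken from the commit message is that allPortsReady() returns true when the init-done flag is set while the pending set is still empty. The `fillPendingFirst` switch is an illustrative assumption, not necessarily how this PR fixes the problem.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for the two pieces of PortsOrch state named in the commit message.
struct ToyPortsOrch {
    bool portInitDone = false;
    std::set<std::string> pendingPortSet;  // ports still waiting for buffer configuration

    bool allPortsReady() const {
        return portInitDone && pendingPortSet.empty();
    }
};

// Replays the first warm-start iteration and returns what another orch would
// see from allPortsReady() at the end of it.
bool readyAfterFirstIteration(const std::vector<std::string>& ports,
                              bool fillPendingFirst) {
    ToyPortsOrch portsOrch;
    // BufferOrch::doTask() ran first and returned early, so no buffers are applied yet.
    if (fillPendingFirst) {
        // Assumption: every port is registered as pending before the flag flips.
        portsOrch.pendingPortSet.insert(ports.begin(), ports.end());
    }
    // PortsOrch::doTask() consumes PORT_TABLE up to PortInitDone.
    portsOrch.portInitDone = true;
    return portsOrch.allPortsReady();
}
```

With `fillPendingFirst` set to false the function returns true although no buffers have been applied, which is the behaviour the PR title complains about. With it set to true the function returns false until something removes the ports from the pending set.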
Signed-off-by: Stepan Blyschak <[email protected]>
… started in warm boot

It turned out that the PFC watchdog relied on the buggy behaviour of PortsOrch::allPortsReady(). With the fixed PortsOrch::allPortsReady(), the watchdog action tries to start before the watchdog itself has been started, because allPortsReady() in PfcWdOrch::doTask() returns false. Before the fix, the watchdog was started earlier, because allPortsReady() wrongly reported that the ports were ready when they were not.

Signed-off-by: Stepan Blyschak <[email protected]>
This reverts commit 84f80e0.
8de2a78
to
6a2073b
Compare
Signed-off-by: Stepan Blyschak <[email protected]>
Without fix:
|
retest this please
Signed-off-by: Stepan Blyschak <[email protected]>
@stepanblyschak This change cannot be cherry-picked cleanly into the 201811 branch. Please create a PR for the 201811 branch. Thanks!
…onic-net#1103) Signed-off-by: Danny Allen <[email protected]>
$(top_srcdir)/orchagent/orchdaemon.cpp \
$(top_srcdir)/orchagent/orch.cpp \
$(top_srcdir)/orchagent/notifications.cpp \
$(top_srcdir)/orchagent/routeorch.cpp \
$(top_srcdir)/orchagent/neighorch.cpp \
$(top_srcdir)/orchagent/intfsorch.cpp \
$(top_srcdir)/orchagent/portsorch.cpp \
$(top_srcdir)/orchagent/copporch.cpp \
$(top_srcdir)/orchagent/tunneldecaporch.cpp \
$(top_srcdir)/orchagent/qosorch.cpp \
$(top_srcdir)/orchagent/bufferorch.cpp \
$(top_srcdir)/orchagent/mirrororch.cpp \
$(top_srcdir)/orchagent/fdborch.cpp \
$(top_srcdir)/orchagent/aclorch.cpp \
$(top_srcdir)/orchagent/saihelper.cpp \
$(top_srcdir)/orchagent/switchorch.cpp \
$(top_srcdir)/orchagent/pfcwdorch.cpp \
$(top_srcdir)/orchagent/pfcactionhandler.cpp \
$(top_srcdir)/orchagent/policerorch.cpp \
$(top_srcdir)/orchagent/crmorch.cpp \
$(top_srcdir)/orchagent/request_parser.cpp \
$(top_srcdir)/orchagent/vrforch.cpp \
$(top_srcdir)/orchagent/countercheckorch.cpp \
$(top_srcdir)/orchagent/vxlanorch.cpp \
$(top_srcdir)/orchagent/vnetorch.cpp \
$(top_srcdir)/orchagent/dtelorch.cpp \
$(top_srcdir)/orchagent/flexcounterorch.cpp \
$(top_srcdir)/orchagent/watermarkorch.cpp \
$(top_srcdir)/orchagent/chassisorch.cpp \
$(top_srcdir)/orchagent/sfloworch.cpp
@stepanblyschak @Pterosaur @nazariig @lguohan please build orchagent as a libtool library (*.la) so that it is a single file. orchagent is already compiled, and here in the tests you are compiling the same sources again, which doubles the compilation time. It now takes 19 minutes to compile; using it as a library, it could take about 10 minutes.
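Warm start flow before the change:
1st iteration:
- BufferOrch::doTask(): returns since PortInitDone hasn't arrived yet
- PortsOrch::doTask(): processes all PORT_TABLE until the PortInitDone flag; m_pendingPortSet is still empty and m_portInitDone is true, so allPortsReady() will return true
- AnyOrch::doTask(): checks g_portsOrch->allPortsReady()
2nd iteration:
- BufferOrch::doTask(): now buffers are applied
This causes BufferOrch to override PfcWdOrch's zero-buffer profile.
The change swaps BufferOrch and PortsOrch in m_orchList, because the 1st BufferOrch iteration will always skip processing, and this eliminates the possibility of m_pendingPortSet not being filled with ports after m_initDone is set to true.
Signed-off-by: Stepan Blyschak [email protected]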