-
Notifications
You must be signed in to change notification settings - Fork 537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support neighsyncd system warmreboot. #661
Support neighsyncd system warmreboot. #661
Conversation
5a7326b
to
d746efb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update NEIGH_RESTORE_TABLE schema in swss-schema.md
retest this please |
Updated the swss-schema.md and made the state check function more accurate. BTW: the vs tests failing was due to the dependency on sonic-net/sonic-buildimage#2213 |
|
||
steady_clock::time_point starttime = steady_clock::now(); | ||
while (!sync.isNeighRestoreDone()) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some thought on current restore_neighbors.py implementation. For system warm reboot, the stale entries won't be filtered out and has to wait for a new cycle of arp age out time interval, then the whole reconciliation logic in neighsyncd becomes redundant (skippable). neighorch simply drops the duplicate entries and accept new entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not exactly true, in case the neighbor ports were down during the warm reboot, the neighbor entries won't be inserted to kernel so we need reconciliation logic to remove them.
Also it is good to use the same logic/code path to handle both scenarios.
On top of that, we are still exploring the solutions to only get alive entries into kernel during the restoring phase if possible in later PRs.
So overall, we should use the same reconciliation logic.
retest this please |
1 similar comment
retest this please |
after merge bgp warm boot, the test fails, can you check? |
71b2875
to
b9269ab
Compare
The merge/rebase on github was not correct, my commit id and content were altered, some necessary code was removed. I will fix locally and recommit to PR. |
neighsyncd will waits for kernel restore process to be done before reconciliation Add vs testcases to cover kernel neighbor table restore process and neignsyncd process upon system warm reboot Signed-off-by: Zhenggen Xu <[email protected]>
Make the state check function more accurate. Signed-off-by: Zhenggen Xu <[email protected]>
In case system warm reboot is enabled, it will try to restore the neighbor table from appDB into kernel through netlink API calls and update the neighbor table by sending arp/ns requests to all neighbor entries, then it sets the stateDB flag for neighsyncd to continue the reconciliation process. Added timeout in neighsyncd when waiting for restore_neighbors to finish Updated vs testcases Signed-off-by: Zhenggen Xu <[email protected]>
Use monotonic lib for python time check Update the warmrestart python binding lib and re-enabled restore cnt check in vs tests Signed-off-by: Zhenggen Xu <[email protected]>
thanks |
Time-out value changes vs test case changes to support default host side neigh table settings. Signed-off-by: Zhenggen Xu <[email protected]>
b9269ab
to
742ac62
Compare
Merge is done, still some failure, does not seem to relate to the PR. can you check if everything passed without this PR ? |
@zhenggen-xu , build looks fine with this pr. https://sonic-jenkins.westus2.cloudapp.azure.com/job/vs/job/sonic-swss-build/19/ |
Signed-off-by: Zhenggen Xu <[email protected]>
retest this please |
@@ -301,13 +307,37 @@ def check_neighsyncd_timer(dvs, timer_value): | |||
(exitcode, num) = dvs.runcmd(['sh', '-c', "grep getWarmStartTimer /var/log/syslog | grep neighsyncd | tail -n 1 | rev | cut -d ' ' -f 1 | rev"]) | |||
assert num.strip() == timer_value | |||
|
|||
def check_redis_neigh_entries(dvs, neigh_tbl, number): | |||
(exitcode, lb_output) = dvs.runcmd(['sh', '-c', "redis-cli keys NEIGH_TABLE:lo* | grep NEI | wc -l"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redis-cli keys NEIGH_TABLE:lo* [](start = 53, length = 30)
Try prevent usage redis-cli keys *
because if will join keys by blanks without escaping. Since you are already write python code, can you use redis-py or swsscommon ?
…gbsyncd_startup.py (sonic-net#661) sonic-sairedis: rename physyncd_startup.sh and physyncd_startup.py * change needed to be consistent with renaming in sonic-buildimage
Support neighsyncd system warmreboot.
neighsyncd will waits for kernel restore process to be done
before reconciliation
Add vs testcases to cover kernel neighbor table restore process
and neignsyncd process upon system warm reboot
Signed-off-by: Zhenggen Xu [email protected]
What I did
Support neighsyncd system warmreboot.
neighsyncd will waits for kernel restore process to be done
before reconciliation
Add vs testcases to cover kernel neighbor table restore process
and neignsyncd process upon system warm reboot
Why I did it
Support system warm reboot
How I verified it
vs test cases and on box manual tests
Details if related
This is dependent on below PRs:
sonic-net/sonic-buildimage#2213
sonic-net/sonic-swss-common#243
sonic-net/sonic-swss-common#246