Support neighsyncd system warmreboot. #661

Merged
merged 6 commits into from
Nov 12, 2018

zhenggen-xu
Collaborator

@zhenggen-xu zhenggen-xu commented Oct 30, 2018

Support neighsyncd system warmreboot.

neighsyncd will wait for the kernel restore process to finish
before reconciliation

Add vs test cases to cover the kernel neighbor table restore process
and the neighsyncd process upon system warm reboot

Signed-off-by: Zhenggen Xu [email protected]

What I did
Support neighsyncd system warmreboot.

neighsyncd will wait for the kernel restore process to finish
before reconciliation

Add vs test cases to cover the kernel neighbor table restore process
and the neighsyncd process upon system warm reboot
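
To illustrate the "wait before reconciliation" step, here is a minimal Python sketch of a bounded polling loop on a monotonic clock. It is not the neighsyncd code (which is C++); `wait_for_restore`, its parameters, and the polling interval are hypothetical names chosen for illustration, and what the caller does after a timeout is left open.

```python
import time


def wait_for_restore(is_restore_done, timeout_s, poll_interval_s=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_restore_done() until it returns True or timeout_s elapses.

    Returns True if the restore finished in time, False on timeout.
    A monotonic clock is used so wall-clock adjustments during boot
    cannot shorten or lengthen the wait.
    """
    start = clock()
    while not is_restore_done():
        if clock() - start >= timeout_s:
            return False  # hypothetical policy: report the timeout to the caller
        sleep(poll_interval_s)
    return True
```

A caller would pass a function that checks the restore flag, for example `wait_for_restore(check_flag, timeout_s=120)`, where both `check_flag` and the timeout value are assumptions.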

Why I did it
Support system warm reboot

How I verified it
vs test cases and on-box manual tests

Details if related
This depends on the PRs below:
sonic-net/sonic-buildimage#2213
sonic-net/sonic-swss-common#243
sonic-net/sonic-swss-common#246

Contributor

@jipanyang jipanyang left a comment

Please update NEIGH_RESTORE_TABLE schema in swss-schema.md

@lguohan
Contributor

lguohan commented Oct 31, 2018

retest this please

@zhenggen-xu
Collaborator Author

Updated swss-schema.md and made the state check function more accurate.

BTW: the vs test failures were due to the dependency on sonic-net/sonic-buildimage#2213

neighsyncd/neighsyncd.cpp (resolved)
neighsyncd/neighsyncd.cpp (outdated, resolved)
neighsyncd/neighsync.cpp (outdated, resolved)
neighsyncd/neighsync.h (outdated, resolved)

steady_clock::time_point starttime = steady_clock::now();
while (!sync.isNeighRestoreDone())
{
Contributor

Just a thought on the current restore_neighbors.py implementation: for system warm reboot, stale entries are not filtered out and have to wait for a new ARP age-out interval, so the whole reconciliation logic in neighsyncd becomes redundant (skippable). neighorch simply drops duplicate entries and accepts new ones.

Collaborator Author

@zhenggen-xu zhenggen-xu Nov 8, 2018

That is not exactly true. If the neighbor ports were down during the warm reboot, the neighbor entries won't be inserted into the kernel, so we need the reconciliation logic to remove them.

Also, it is good to use the same logic and code path to handle both scenarios.

On top of that, we are still exploring ways to restore only live entries into the kernel during the restore phase, possibly in later PRs.

So overall, we should use the same reconciliation logic.
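
The case discussed in this thread (entries that were restored from the application DB but never made it into the kernel) can be sketched as a plain set comparison. This is an illustrative Python sketch under assumed data shapes, not the reconciliation code used by neighsyncd; `reconcile` and the `(interface, ip) -> mac` mapping are hypothetical.

```python
def reconcile(restored, kernel):
    """Compare entries restored from the application DB with the kernel view.

    Both arguments map (interface, ip) -> mac.
    Returns (to_delete, to_set, unchanged):
      to_delete: keys known before the reboot but absent from the kernel,
                 e.g. because the neighbor port was down during the reboot
      to_set:    kernel entries that are new or whose MAC changed
      unchanged: keys whose entry is identical on both sides (duplicates)
    """
    to_delete = sorted(k for k in restored if k not in kernel)
    to_set = {k: mac for k, mac in kernel.items() if restored.get(k) != mac}
    unchanged = sorted(k for k, mac in kernel.items() if restored.get(k) == mac)
    return to_delete, to_set, unchanged
```

With this shape, the port-down case shows up as a non-empty `to_delete`, which is the part that would be lost if reconciliation were skipped.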

@lguohan
Contributor

lguohan commented Nov 10, 2018

retest this please

1 similar comment
@lguohan
Contributor

lguohan commented Nov 10, 2018

retest this please

@lguohan
Contributor

lguohan commented Nov 10, 2018

After merging BGP warm boot, the test fails. Can you check?

@zhenggen-xu
Collaborator Author

> After merging BGP warm boot, the test fails. Can you check?

The merge/rebase on GitHub was not correct: my commit ID and content were altered, and some necessary code was removed. I will fix it locally and recommit to the PR.

neighsyncd will wait for the kernel restore process to finish
before reconciliation

Add vs test cases to cover the kernel neighbor table restore process
and the neighsyncd process upon system warm reboot

Signed-off-by: Zhenggen Xu <[email protected]>
Make the state check function more accurate.

Signed-off-by: Zhenggen Xu <[email protected]>
If system warm reboot is enabled, it restores the neighbor table from
appDB into the kernel through netlink API calls and refreshes the neighbor
table by sending ARP/NS requests to all neighbor entries, then sets the
stateDB flag so that neighsyncd can continue the reconciliation process.

Added timeout in neighsyncd when waiting for restore_neighbors to finish
Updated vs testcases

Signed-off-by: Zhenggen Xu <[email protected]>
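
The restore sequence described in the commit message above (install entries, probe them, then set the flag) can be sketched with in-memory stand-ins. This is not the restore_neighbors.py script: the dictionaries replace the kernel table and the databases, `probe` replaces sending an ARP request or neighbor solicitation, and the `NEIGH_RESTORE_TABLE|Flags` key with its `restored` field is an assumed layout used only for illustration.

```python
def restore_neighbors(app_db_neighbors, kernel_table, probe, state_db):
    """Replay saved neighbor entries, probe them, then signal completion.

    app_db_neighbors: {(interface, ip): mac} saved before the reboot
    kernel_table:     dict standing in for the kernel neighbor table
    probe:            callable(interface, ip) standing in for an ARP/NS request
    state_db:         dict standing in for the state database
    """
    for (ifname, ip), mac in app_db_neighbors.items():
        kernel_table[(ifname, ip)] = mac  # stand-in for a netlink neighbor add
        probe(ifname, ip)                 # ask the neighbor to confirm it is alive
    # Set the flag last, so a waiting consumer never sees a half-restored table.
    state_db["NEIGH_RESTORE_TABLE|Flags"] = {"restored": "true"}
```

Writing the flag only after the loop is the design point: the waiting side can treat the flag as "restore attempted for every entry".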
Use monotonic lib for python time check

Update the warmrestart python binding lib and
re-enabled restore cnt check in vs tests

Signed-off-by: Zhenggen Xu <[email protected]>
@lguohan
Contributor

lguohan commented Nov 11, 2018

thanks

Time-out value changes
vs test case changes to support default host side neigh table settings.

Signed-off-by: Zhenggen Xu <[email protected]>
@zhenggen-xu
Collaborator Author

The merge is done. There are still some failures, but they do not seem to be related to this PR. Can you check whether everything passes without this PR?

@lguohan
Contributor

lguohan commented Nov 11, 2018

@lguohan
Contributor

lguohan commented Nov 12, 2018

retest this please

@lguohan lguohan merged commit afdcf34 into sonic-net:master Nov 12, 2018
@zhenggen-xu zhenggen-xu deleted the neigh-system-warmreboot branch October 7, 2019 16:45
@@ -301,13 +307,37 @@ def check_neighsyncd_timer(dvs, timer_value):
    (exitcode, num) = dvs.runcmd(['sh', '-c', "grep getWarmStartTimer /var/log/syslog | grep neighsyncd | tail -n 1 | rev | cut -d ' ' -f 1 | rev"])
    assert num.strip() == timer_value

def check_redis_neigh_entries(dvs, neigh_tbl, number):
    (exitcode, lb_output) = dvs.runcmd(['sh', '-c', "redis-cli keys NEIGH_TABLE:lo* | grep NEI | wc -l"])
Contributor

redis-cli keys NEIGH_TABLE:lo*

Try to avoid using redis-cli keys *, because it joins keys with blanks without escaping. Since you are already writing Python code, can you use redis-py or swsscommon?
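
One possible shape for this suggestion, as a sketch: count matching keys through a client object instead of parsing shell output. `count_neigh_keys` is a hypothetical helper; it relies only on a `scan_iter(match=...)` method, which redis-py's `Redis` client provides. `FakeRedis` is an in-memory stand-in so the example runs without a server, and its fnmatch-based matching only approximates Redis glob patterns.

```python
import fnmatch


def count_neigh_keys(client, pattern):
    """Count keys matching pattern without going through a shell pipeline."""
    return sum(1 for _ in client.scan_iter(match=pattern))


class FakeRedis:
    """Minimal in-memory stand-in exposing only scan_iter(match=...)."""

    def __init__(self, keys):
        self._keys = list(keys)

    def scan_iter(self, match=None):
        for key in self._keys:
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key


client = FakeRedis([
    "NEIGH_TABLE:lo:127.0.0.1",
    "NEIGH_TABLE:Ethernet0:10.0.0.1",
])
lo_count = count_neigh_keys(client, "NEIGH_TABLE:lo*")
```

In a test, the fake would be replaced by a real redis-py client connected to the relevant database; the connection parameters depend on the test environment and are not shown here.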

oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
…gbsyncd_startup.py (sonic-net#661)

sonic-sairedis: rename physyncd_startup.sh and physyncd_startup.py

* change needed to be consistent with renaming in sonic-buildimage