Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add teamd as a depedent service to swss #5628

Merged
merged 8 commits into from
Oct 23, 2020

Conversation

judyjoseph
Copy link
Contributor

@judyjoseph judyjoseph commented Oct 14, 2020

- Why I did it
On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present.

- How I did it
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service

- How to verify it

Verified the following scenario's with the following testbed
VM1 ----------------------------[DUT 6100] -----------------------VM2, ping traffic continuous between VMs

  1. Stop teamd docker alone

    swss, syncd dockers seen going away
    The LAG reference count error messages seen for a while till swss docker stops.
    Dockers back up.

  2. Enable WR mode for teamd. Stop teamd docker alone

    swss, syncd dockers not removed.
    The LAG reference count error messages not seen
    Repeated stop teamd docker test - same result, no effect on swss/syncd.

  3. Stop swss docker.

    swss, teamd, syncd goes off - dockers comes back correctly, interfaces up

  4. Enable WR mode for swss . Stop swss docker

    swss goes off not affecting syncd/teamd dockers.

  5. Config reload

    no reference counter error seen, dockers comes back correctly, with interfaces up

  6. Warm reboot, observations below

    swss docker goes off first
    teamd + syncd goes off to the end of WR process.
    dockers comes back up fine.
    ping traffic between VM's was NOT HIT

  7. Fast reboot, observations below

    teamd goes off first ( confirmed swss don't exit here )
    swss goes off next
    syncd goes away at the end of the FR process
    dockers comes back up fine.
    there is a traffic HIT as per fast-reboot

  8. Verified in multi-asic platform, the tests above other than WR/FB scenarios

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

files/image_config/misc/docker-wait-any Outdated Show resolved Hide resolved
files/image_config/misc/docker-wait-any Outdated Show resolved Hide resolved
files/image_config/misc/docker-wait-any Outdated Show resolved Hide resolved
@judyjoseph
Copy link
Contributor Author

judyjoseph commented Oct 17, 2020

Updated the tests done to verify this fix in the conversation comments

@jleveque jleveque self-requested a review October 17, 2020 01:25
jleveque
jleveque previously approved these changes Oct 17, 2020
Copy link
Contributor

@jleveque jleveque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Please wait for other reviewers.

@jleveque jleveque requested a review from lguohan October 17, 2020 01:27
@judyjoseph judyjoseph marked this pull request as ready for review October 20, 2020 05:15
@lgtm-com
Copy link

lgtm-com bot commented Oct 20, 2020

This pull request introduces 1 alert when merging 9be883d into 5c5e424 - view on LGTM.com

new alerts:

  • 1 for Syntax error

@lgtm-com
Copy link

lgtm-com bot commented Oct 20, 2020

This pull request introduces 1 alert when merging 9c7c59e into 5c5e424 - view on LGTM.com

new alerts:

  • 1 for Unused import

yxieca
yxieca previously approved these changes Oct 21, 2020
@judyjoseph
Copy link
Contributor Author

retest vs please

abdosi
abdosi previously approved these changes Oct 21, 2020
@judyjoseph judyjoseph dismissed stale reviews from abdosi and yxieca via 82ea642 October 22, 2020 01:27
@judyjoseph
Copy link
Contributor Author

retest mellanox please

@judyjoseph
Copy link
Contributor Author

retest mellanox please

@lguohan lguohan merged commit ace7f24 into sonic-net:master Oct 23, 2020
abdosi pushed a commit that referenced this pull request Oct 23, 2020
**- Why I did it**
On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present.

**- How I did it**
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service

**- How to verify it**

Verified the following scenario's with the following testbed
VM1 ----------------------------[DUT 6100] -----------------------VM2,  ping traffic continuous between VMs

1. Stop teamd docker alone
      >  swss, syncd dockers seen going away
      >  The LAG reference count error messages seen for a while till swss docker stops.
      >  Dockers back up.

2. Enable WR mode for teamd. Stop teamd docker alone
      >  swss, syncd dockers not removed.
      >  The LAG reference count error messages not seen
      >  Repeated stop teamd docker test - same result, no effect on swss/syncd.

3. Stop swss docker.
      >  swss, teamd, syncd goes off - dockers comes back correctly, interfaces up

4. Enable WR mode for swss . Stop swss docker
      >  swss goes off not affecting syncd/teamd dockers.

5. Config reload
      > no reference counter error seen, dockers comes back correctly, with interfaces up

6. Warm reboot, observations below
	 > swss docker goes off first
	 > teamd + syncd goes off to the end of WR process.
 	 > dockers comes back up fine.
	 > ping traffic between VM's was NOT HIT

7. Fast reboot, observations below
	 > teamd goes off first ( **confirmed swss don't exit here** )
	 > swss goes off next
	 > syncd goes away at the end of the FR process
	 > dockers comes back up fine.
	 > there is a traffic HIT as per fast-reboot

8. Verified in multi-asic platform, the tests above other than WR/FB scenarios
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021
**- Why I did it**
On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present.

**- How I did it**
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service   

**- How to verify it**

Verified the following scenario's with the following testbed 
VM1 ----------------------------[DUT 6100] -----------------------VM2,  ping traffic continuous between VMs

1. Stop teamd docker alone  
      >  swss, syncd dockers seen going away
      >  The LAG reference count error messages seen for a while till swss docker stops.
      >  Dockers back up.

2. Enable WR mode for teamd. Stop teamd docker alone  
      >  swss, syncd dockers not removed.
      >  The LAG reference count error messages not seen
      >  Repeated stop teamd docker test - same result, no effect on swss/syncd.

3. Stop swss docker. 
      >  swss, teamd, syncd goes off - dockers comes back correctly, interfaces up

4. Enable WR mode for swss . Stop swss docker 
      >  swss goes off not affecting syncd/teamd dockers.

5. Config reload 
      > no reference counter error seen, dockers comes back correctly, with interfaces up

6. Warm reboot, observations below
	 > swss docker goes off first 
	 > teamd + syncd goes off to the end of WR process.
 	 > dockers comes back up fine.
	 > ping traffic between VM's was NOT HIT

7. Fast reboot, observations below
	 > teamd goes off first ( **confirmed swss don't exit here** )
	 > swss goes off next 
	 > syncd goes away at the end of the FR process
	 > dockers comes back up fine.
	 > there is a traffic HIT as per fast-reboot

8. Verified in multi-asic platform, the tests above other than WR/FB scenarios
@judyjoseph judyjoseph deleted the teamd_REPAIR branch March 1, 2022 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants