Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 #7083

Merged
merged 3 commits into from
Mar 28, 2021
Merged

[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 #7083

merged 3 commits into from
Mar 28, 2021

Conversation

jleveque
Copy link
Contributor

@jleveque jleveque commented Mar 17, 2021

Why I did it

To prevent error messages like the following from being logged:

Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46

This is basically an addendum to #5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example here.

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012

@lguohan
Copy link
Collaborator

lguohan commented Mar 18, 2021

not sure why different docker has different buffer size.

@jleveque
Copy link
Contributor Author

not sure why different docker has different buffer size.

It varies based on the number of processes running in the container. More processes = more events. We could choose a number high enough to work for all containers and apply it everywhere, but this approach is more conservative with memory usage in containers which don't have many processes running.

@qiluo-msft
Copy link
Collaborator

How large is the one buffer slot in memory? If the buffer overflow is critical error, let's use very large buffer size.

@jleveque
Copy link
Contributor Author

How large is the one buffer slot in memory? If the buffer overflow is critical error, let's use very large buffer size.

There are different types of events, so they vary, but they appear to be relatively small: https://github.com/Supervisor/supervisor/blob/master/supervisor/events.py. I considered just using a very large size for all. Thoughts?

@lguohan
Copy link
Collaborator

lguohan commented Mar 19, 2021

suggest 1024 for all.

@jleveque jleveque changed the title [dockers][supervisor] Increase event buffer size for process exit listener [dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 Mar 19, 2021
@jleveque
Copy link
Contributor Author

suggest 1024 for all.

Done in latest commit.

lguohan
lguohan previously approved these changes Mar 19, 2021
qiluo-msft
qiluo-msft previously approved these changes Mar 19, 2021
@jleveque jleveque dismissed stale reviews from qiluo-msft and lguohan via 5f351a7 March 19, 2021 22:00
@jleveque
Copy link
Contributor Author

This will not cherry-pick cleanly into 201911, so I opened a separate PR here.

@lguohan
Copy link
Collaborator

lguohan commented Mar 27, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lguohan lguohan merged commit c651a9a into sonic-net:master Mar 28, 2021
@jleveque jleveque deleted the docker_event_buffer_exit_listener branch March 29, 2021 15:37
jleveque added a commit that referenced this pull request Mar 29, 2021
…exit listener (#7106)

Backport of #7083 to the 201911 branch.

#### Why I did it

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to #5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
@daall
Copy link
Contributor

daall commented Mar 31, 2021

@jleveque can you also open a separate PR for 202012 please? There is a conflict.

@jleveque
Copy link
Contributor Author

@jleveque can you also open a separate PR for 202012 please? There is a conflict.

202012 PR opened here: #7203

jleveque added a commit that referenced this pull request Apr 1, 2021
…exit listener; Set all event buffer sizes to 1024 (#7203)

#### Why I did it

Backport of #7083 to the 202012 branch.

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to #5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-buildimage that referenced this pull request May 23, 2021
…tener; Set all event buffer sizes to 1024 (sonic-net#7083)

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to sonic-net#5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
…tener; Set all event buffer sizes to 1024 (sonic-net#7083)

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to sonic-net#5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants