[logger] Fix Kernel GP seen for any short-lived *syncd process #444

Merged
merged 3 commits into from
Jan 12, 2021


@vaibhavhd vaibhavhd commented Jan 11, 2021

Fixes: sonic-net/sonic-buildimage#6103

This PR fixes the kernel general protection (GP) faults seen in any short-lived process within swss.

As part of *syncd initialization, the Logger::linkToDbNative function is called. This call starts a SettingThread in the background.
When the main *syncd process terminates, the destructor Logger::~Logger simply detaches the SettingThread, so the detached thread keeps executing in the background.

At the same time, the exiting main process destroys the static variables that were in its scope.
The fault is hit when the detached thread (still executing in its infinite loop) tries to access these destroyed variables.

This issue is easily reproducible when these *syncd processes are short-lived (and the detached SettingThread is still executing). The examples below reproduce it by simply running each binary with the help option:

Sample 1 : Orchagent core and Kernel GP:

for i in {1..10}; do docker exec -it swss /usr/bin/orchagent -h  ; done

Jan 11 23:32:50.690125 str-s6100-acs-2 INFO kernel: [ 1198.660072] traps: orchagent[6172] general protection ip:7fda4ccf34fa sp:7fda4c0eb210 error:0 in libswsscommon.so.0.0.0[7fda4cccf000+5b000]
Jan 11 23:32:59.810183 str-s6100-acs-2 INFO kernel: [ 1207.781516] traps: orchagent[7380] general protection ip:7f2bfd3724fa sp:7f2bfc76a210 error:0 in libswsscommon.so.0.0.0[7f2bfd34e000+5b000]

-rw-rw-rw- 1 root root 331K Jan 11 23:32 orchagent.1610407968.6463.core.gz

Sample 2 : Gearsyncd core and Kernel GP:

for i in {1..10}; do docker exec -it swss /usr/bin/gearsyncd -h  ; done

Jan 11 23:22:40.959207 str-s6100-acs-2 INFO kernel: [  588.941879] traps: gearsyncd[25118] general protection ip:7f5e3ec924fa sp:7f5e3e706690 error:0 in libswsscommon.so.0.0.0[7f5e3ec6e000+5b000]
Jan 11 23:22:41.426841 str-s6100-acs-2 INFO kernel: [  589.409417] traps: gearsyncd[25188] general protection ip:7f35fd1504fa sp:7f35fcbc49b0 error:0 in libswsscommon.so.0.0.0[7f35fd12c000+5b000]

-rw-rw-rw- 1 root root 142K Jan 11 23:20 gearsyncd.1610407223.879.core.gz

Sample 3 : portsyncd core and Kernel GP:

for i in {1..10}; do docker exec -it swss /usr/bin/portsyncd -h  ; done

Jan 11 23:19:53.478111 str-s6100-acs-2 INFO kernel: [  421.443939] traps: portsyncd[6052] general protection ip:7f36b47714fa sp:7f36b41e59b0 error:0 in libswsscommon.so.0.0.0[7f36b474d000+5b000]
Jan 11 23:19:54.570082 str-s6100-acs-2 INFO kernel: [  422.536931] traps: portsyncd[6207] general protection ip:7f9e208ce4fa sp:7f9e203429b0 error:0 in libswsscommon.so.0.0.0[7f9e208aa000+5b000]

-rw-rw-rw- 1 root root 144K Jan 11 23:20 portsyncd.1610407201.848.core.gz

Fix:

  1. Before the main process exits, set a flag in the Logger destructor to signal the SettingThread that the main process is finishing up.
  2. Instead of detaching the SettingThread (which leaves it free to access static variables that have already been destroyed), join it.
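
The two steps above can be sketched in isolation as follows. This is a minimal illustration, not the code merged in this PR: the class `BackgroundWorker` and the members `m_terminate`, `m_iterations`, and `m_thread` are hypothetical names, and the sketch uses `std::atomic<bool>` for the stop flag as an assumption of its own, since that is the standard C++ way to share a flag between two threads.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for a class that owns a background polling thread.
// None of these names come from swss-common.
class BackgroundWorker
{
public:
    // Start the background loop (analogous to starting a settings thread).
    void start()
    {
        m_thread = std::make_unique<std::thread>(&BackgroundWorker::run, this);
    }

    ~BackgroundWorker()
    {
        // Step 1: signal the loop that the owner is going away.
        m_terminate = true;

        // Step 2: join instead of detach, so the thread has finished
        // before any member (or static) it uses is destroyed.
        if (m_thread && m_thread->joinable())
        {
            m_thread->join();
        }
    }

    // Number of loop iterations completed so far.
    int iterations() const { return m_iterations; }

private:
    void run()
    {
        // The loop checks the flag on every pass instead of spinning forever.
        while (!m_terminate)
        {
            ++m_iterations;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> m_terminate{false};
    std::atomic<int> m_iterations{0};
    std::unique_ptr<std::thread> m_thread;
};
```

With this shape, destroying a `BackgroundWorker` blocks until the loop has observed the flag and returned. The cost is that shutdown waits for at most one loop iteration, so the loop body should not block indefinitely.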

common/logger.h Outdated
@@ -145,6 +145,7 @@ class Logger
std::atomic<Output> m_output = { SWSS_SYSLOG };
std::unique_ptr<std::thread> m_settingThread;
std::mutex m_mutex;
bool terminateSettingThread = false;

@qiluo-msft qiluo-msft Jan 12, 2021

Add volatile #Closed

Contributor Author

Updated to volatile

@qiluo-msft

In the title, could you mention the logger context?

@vaibhavhd vaibhavhd changed the title from "Fix Kernel GP seen for any short-lived *syncd process" to "[logger] Fix Kernel GP seen for any short-lived *syncd process" Jan 12, 2021
@vaibhavhd vaibhavhd merged commit 2db7bea into sonic-net:master Jan 12, 2021
@vaibhavhd vaibhavhd deleted the memory-error-fix branch January 12, 2021 17:26
lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jan 13, 2021
…ction error fix (#6436)

To include Kernel GP fault seen in *syncd processes:
sonic-net/sonic-swss-common#444
lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jan 15, 2021
…ction error fix (#6436)

To include Kernel GP fault seen in *syncd processes:
sonic-net/sonic-swss-common#444
Development

Successfully merging this pull request may close these issues.

Kernel GP during swss/syncd/teamd shutdown
3 participants