GPF in orchagent. ACL counters related code #201

Closed
pavel-shirshov opened this issue Apr 30, 2017 · 6 comments

@pavel-shirshov
Contributor

Reading symbols from /usr/bin/orchagent...Reading symbols from /usr/lib/debug/.build-id/ef/36e12d55ea2b4540a35abab394a233817622dc.debug...done.
done.
[New LWP 108]
[New LWP 97]
[New LWP 107]
[New LWP 105]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `orchagent -m 90:b1:1c:f4:a8:51'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000046a139 in pair (this=0x7f1b1fffee50)
at /usr/include/c++/4.9/bits/stl_pair.h:127
127 /usr/include/c++/4.9/bits/stl_pair.h: No such file or directory.
(gdb) bt
#0 0x000000000046a139 in pair (this=0x7f1b1fffee50)
at /usr/include/c++/4.9/bits/stl_pair.h:127
#1 AclOrch::collectCountersThread (pAclOrch=0x2387690) at aclorch.cpp:1315
#2 0x00007f1b25a7f970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f1b26825064 in start_thread (arg=0x7f1b1ffff700) at pthread_create.c:309
#4 0x00007f1b251ef62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@lguohan
Contributor

lguohan commented May 1, 2017

@qiluo-msft , can you take a look at this issue?

@pavel-shirshov
Contributor Author

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `orchagent -m 90:b1:1c:f4:a8:51'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000046a139 in pair (this=0x7f1b1fffee50) at /usr/include/c++/4.9/bits/stl_pair.h:127
127 constexpr pair(const pair&) = default;
(gdb) bt
#0 0x000000000046a139 in pair (this=0x7f1b1fffee50) at /usr/include/c++/4.9/bits/stl_pair.h:127
#1 AclOrch::collectCountersThread (pAclOrch=0x2387690) at aclorch.cpp:1315
#2 0x00007f1b25a7f970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f1b26825064 in start_thread (arg=0x7f1b1ffff700) at pthread_create.c:309
#4 0x00007f1b251ef62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

stcheng pushed a commit that referenced this issue May 1, 2017
@oleksandrivantsiv
Collaborator

Are there any steps to reproduce? Do you observe this crash during a particular test?

@oleksandrivantsiv
Collaborator

Can you attach or share logs with me?

@oleksandrivantsiv
Collaborator

The problem is incorrect use of the thread class in AclOrch. The thread is initialized and started in the middle of AclOrch object construction. Once started, the thread reads AclOrch members. Because of this race condition, the thread sometimes reads those members before they are initialized.

I was able to simulate the race condition by adding a slow-to-construct element (the Test class in the patch below) between the thread and the other members of AclOrch. This gave me a 100% reproduction rate. The following patch reproduces the issue:

diff --git a/orchagent/aclorch.cpp b/orchagent/aclorch.cpp
index 371fdde..845cc70 100644
--- a/orchagent/aclorch.cpp
+++ b/orchagent/aclorch.cpp
@@ -791,9 +791,18 @@ bool AclRange::remove()
     return true;
 }

+Test::Test(void *ptr) :
+        m_ptr(ptr)
+{
+    SWSS_LOG_ERROR("Test class before sleep");
+    sleep(1);
+    SWSS_LOG_ERROR("Test class after sleep");
+}
+
 AclOrch::AclOrch(DBConnector *db, vector<string> tableNames, PortsOrch *portOrch, MirrorOrch *mirrorOrch) :
         Orch(db, tableNames),
         thread(AclOrch::collectCountersThread, this),
+        Test(this),
         m_portOrch(portOrch),
         m_mirrorOrch(mirrorOrch)
 {
diff --git a/orchagent/aclorch.h b/orchagent/aclorch.h
index a005a7e..e646b09 100644
--- a/orchagent/aclorch.h
+++ b/orchagent/aclorch.h
@@ -1,6 +1,7 @@
 #ifndef SWSS_ACLORCH_H
 #define SWSS_ACLORCH_H

+#include <unistd.h>
 #include <iostream>
 #include <sstream>
 #include <thread>
@@ -228,7 +229,15 @@ inline void split(string str, Iterable& out, char delim = ' ')
     }
 }

-class AclOrch : public Orch, public Observer, public thread
+class Test {
+public:
+    Test(void *ptr);
+
+private:
+    void *m_ptr;
+};
+
+class AclOrch : public Orch, public Observer, public thread, public Test
 {
 public:
     AclOrch(DBConnector *db, vector<string> tableNames, PortsOrch *portOrch, MirrorOrch *mirrorOrch);

@stcheng
Contributor

stcheng commented May 9, 2017

#203 and #205 fix this issue

@stcheng stcheng closed this as completed May 9, 2017