Debian Jessie build from source is hanging #1434

Open
AndrewAday opened this issue Nov 6, 2017 · 24 comments

@AndrewAday

I am following this guide: https://github.com/iovisor/bcc/blob/master/INSTALL.md?utf8=%E2%9C%93#debian---source

I had to modify CMakeLists.txt to add the -std=c++11 option before it would build at all. But when running debuild -b -uc -us, numerous tests fail with "Exception: Failed to compile BPF module", and debuild hangs at "Start 29: py_test_tools_memleak".

Any idea what's wrong? I am running a 4.14-rc7 kernel on Debian 8.9.

Attached is my build output.
bcc_0.4.0-1_amd64.build.txt

@yonghong-song
Collaborator

I see this:

chdir(/lib/modules/4.14.0-rc7/build): No such file or directory
Failed

=====
You need the kernel-devel package for bcc to work.
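
The missing path from the log can be checked before starting a build. Below is a minimal Python sketch, assuming the headers are expected at /lib/modules/<release>/build as in the error above. The helper name `kernel_build_dir_exists` and its `modules_root` parameter are made up for illustration and are not part of bcc.

```python
import os


def kernel_build_dir_exists(release=None, modules_root="/lib/modules"):
    """Return True if <modules_root>/<release>/build is a directory.

    Hypothetical helper for illustration; bcc does not ship this function.
    """
    if release is None:
        # os.uname().release is the same string that `uname -r` prints.
        release = os.uname().release
    # isdir() follows symlinks, so a `build` symlink to the headers counts.
    return os.path.isdir(os.path.join(modules_root, release, "build"))


if __name__ == "__main__":
    release = os.uname().release
    if kernel_build_dir_exists(release):
        print("kernel build directory found for", release)
    else:
        print("missing /lib/modules/%s/build" % release)
```

If the check fails, the kernel headers for the running kernel are probably not installed.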

@AndrewAday
Author

Thanks for the quick reply.
What is the kernel-devel package equivalent for Debian?
Is it the linux-headers-<version> and linux-image-<version> package?

@yonghong-song
Collaborator

yonghong-song commented Nov 6, 2017 via email

@smun

smun commented Nov 8, 2017

I had the same problems as AndrewAday above. While there are no build errors, the build seems to hang on tests #15 and #16 (py_test_brb and py_test_brb2):

Kernel: 4.9.0-0.bpo.3-amd64 (on Debian 8.9)

...
make[2]: Entering directory '/var/tmp/bcc/obj-x86_64-linux-gnu'
Running tests...
/usr/bin/ctest --force-new-ctest-process -j1
Test project /var/tmp/bcc/obj-x86_64-linux-gnu
      Start  1: style-check
 1/35 Test  #1: style-check ......................   Passed    0.00 sec
      Start  2: c_test_static
 2/35 Test  #2: c_test_static ....................   Passed    0.11 sec
      Start  3: test_libbcc
 3/35 Test  #3: test_libbcc ......................   Passed    9.19 sec
      Start  4: py_test_stat1_b
 4/35 Test  #4: py_test_stat1_b ..................   Passed    0.66 sec
      Start  5: py_test_bpf_log
 5/35 Test  #5: py_test_bpf_log ..................   Passed    0.65 sec
      Start  6: py_test_stat1_c
 6/35 Test  #6: py_test_stat1_c ..................   Passed    0.70 sec
      Start  7: py_test_xlate1_c
 7/35 Test  #7: py_test_xlate1_c .................   Passed    0.60 sec
      Start  8: py_test_call1
 8/35 Test  #8: py_test_call1 ....................   Passed    0.61 sec
      Start  9: py_test_trace1
 9/35 Test  #9: py_test_trace1 ...................   Passed    0.35 sec
      Start 10: py_test_trace2
10/35 Test #10: py_test_trace2 ...................   Passed    2.32 sec
      Start 11: py_test_trace3_c
11/35 Test #11: py_test_trace3_c .................   Passed    2.40 sec
      Start 12: py_test_trace4
12/35 Test #12: py_test_trace4 ...................   Passed    1.13 sec
      Start 13: py_test_probe_count
13/35 Test #13: py_test_probe_count ..............   Passed    4.81 sec
      Start 14: py_test_debuginfo
14/35 Test #14: py_test_debuginfo ................   Passed    0.71 sec
      Start 15: py_test_brb
15/35 Test #15: py_test_brb ......................***Exception: Other 39733.01 sec
E
======================================================================
ERROR: test_brb (__main__.TestBPFSocket)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/bcc/tests/python/test_brb.py", line 162, in test_brb
    disable_ipv6=True)
  File "/var/tmp/bcc/tests/python/simulation.py", line 94, in _create_ns
    disable_ipv6)
  File "/var/tmp/bcc/tests/python/simulation.py", line 70, in _ns_add_ifc
    in_ifc = ns_ipdb.interfaces[in_ifname]
KeyError: 'ns1b'

----------------------------------------------------------------------
Ran 1 test in 3.586s

FAILED (errors=1)

      Start 16: py_test_brb2

@drzaeus77
Collaborator

Does the initial cmake .. list any warnings? It should, because these tests depend on 3 binaries: iperf, netperf, and arping. Do you have those 3 binaries installed? See tests/python/CMakeLists.txt for the warning checks.
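
A quick way to double-check the three binaries independently of the cmake output is sketched below in Python. The helper name `missing_tools` is hypothetical. It only looks at PATH, which is an assumption about how the tests find the tools.

```python
import shutil

# Binaries named in the comment above.
REQUIRED_TOOLS = ("iperf", "netperf", "arping")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable only so the helper is easy to test.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("iperf, netperf and arping are all on PATH")
```

Run it as the same user that runs the build, since a tool installed under a directory such as /usr/sbin may not be on every user's PATH.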

@smun

smun commented Nov 8, 2017

No warnings as far as I know:

-- The C compiler identification is GNU 4.9.2
-- The CXX compiler identification is GNU 4.9.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Latest recognized Git tag is v0.4.0
-- Git HEAD is 06c0528417be48415cd22cd1cd25b498132bffbd
-- Revision is 0.4.0-06c05284
-- Performing Test HAVE_NO_PIE_FLAG
-- Performing Test HAVE_NO_PIE_FLAG - Failed
-- Found BISON: /usr/bin/bison (found version "3.0.2") 
-- Found FLEX: /usr/bin/flex (found version "2.5.39") 
-- Found LLVM: /usr/lib/llvm-3.8/include 3.8.1
-- Found LibElf: /usr/lib/x86_64-linux-gnu/libelf.so  
-- Performing Test ELF_GETSHDRSTRNDX
-- Performing Test ELF_GETSHDRSTRNDX - Success
-- Using static-libstdc++
-- Found LuaJIT: /usr/lib/x86_64-linux-gnu/libluajit-5.1.a;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so  
-- Configuring done
-- Generating done
-- Build files have been written to: /var/tmp/bcc/build

And all three tools are also available:

dpkg -l | egrep -i "arping|netperf|iperf"
ii  arping                             2.14-1                           amd64        sends IP and/or ARP pings (to the MAC address)
ii  iperf                              2.0.5+dfsg1-2                    amd64        Internet Protocol bandwidth measuring tool
ii  netperf                            2.6.0-2                          amd64        Network performance benchmark

@AndrewAday
Author

I've retried with a 4.9 kernel and, like smun, the build now hangs on test 15: py_test_brb.
Additionally, my other tests are failing with "Exception: Failed to load BPF program count: Invalid argument". I also have all three tools installed.

======================================================================
ERROR: test_count (__main__.TestAutoKprobe)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/bcc/tests/python/test_probe_count.py", line 53, in setUp
    """)
  File "/root/bcc/src/python/bcc/__init__.py", line 308, in __init__
    self._trace_autoload()
  File "/root/bcc/src/python/bcc/__init__.py", line 915, in _trace_autoload
    fn = self.load_func(func_name, BPF.KPROBE)
  File "/root/bcc/src/python/bcc/__init__.py", line 348, in load_func
    (func_name, errstr))
Exception: Failed to load BPF program kprobe__schedule: Invalid argument

======================================================================
ERROR: test_attach1 (__main__.TestKprobeCnt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/bcc/tests/python/test_probe_count.py", line 17, in setUp
    self.b.attach_kprobe(event_re="^vfs_.*", fn_name="wololo")
  File "/root/bcc/src/python/bcc/__init__.py", line 505, in attach_kprobe
    matches = BPF.get_kprobe_functions(event_re)
  File "/root/bcc/src/python/bcc/__init__.py", line 478, in get_kprobe_functions
    with open("%s/available_filter_functions" % TRACEFS) as avail_file:
IOError: [Errno 2] No such file or directory: '/sys/kernel/debug/tracing/available_filter_functions'

======================================================================
ERROR: test_probe_quota (__main__.TestProbeGlobalCnt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/bcc/tests/python/test_probe_count.py", line 38, in test_probe_quota
    self.b1.attach_kprobe(event="schedule", fn_name="count")
  File "/root/bcc/src/python/bcc/__init__.py", line 517, in attach_kprobe
    fn = self.load_func(fn_name, BPF.KPROBE)
  File "/root/bcc/src/python/bcc/__init__.py", line 348, in load_func
    (func_name, errstr))
Exception: Failed to load BPF program count: Invalid argument

----------------------------------------------------------------------
Ran 6 tests in 0.753s

FAILED (errors=3)
Failed

      Start 14: py_test_debuginfo
14/35 Test #14: py_test_debuginfo ................   Passed    1.79 sec
      Start 15: py_test_brb

hangs

Any ideas why?
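
The IOError in the second traceback points at a missing tracefs file, which can be checked separately from the test suite. A minimal Python sketch, assuming the path from the traceback. The helper name `missing_tracefs_files` is made up for illustration.

```python
import os

# Directory taken from the IOError in the traceback above.
TRACEFS = "/sys/kernel/debug/tracing"


def missing_tracefs_files(tracefs=TRACEFS,
                          names=("available_filter_functions",)):
    """Return the expected tracefs files that do not exist.

    Hypothetical helper; run it as root, because an unreadable
    /sys/kernel/debug also makes the files look missing.
    """
    return [name for name in names
            if not os.path.exists(os.path.join(tracefs, name))]


if __name__ == "__main__":
    missing = missing_tracefs_files()
    print("missing:", missing if missing else "nothing")
```

A missing file here would explain the IOError only. It says nothing about the separate "Invalid argument" errors when loading the BPF programs.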

@AndrewAday
Author

Looks like the test is causing a deadlock:

[12030.756029] =========================================================
[12030.762755] [ INFO: possible irq lock inversion dependency detected ]
[12030.769650] 4.9.0 #2 Not tainted
[12030.772990] ---------------------------------------------------------
[12030.779555] python/22977 just changed the state of lock:
[12030.784989]  (&head->lock){+.-...}, at: [<ffffffff8163e677>] pcpu_freelist_pop+0xf7/0x1c0
but this lock was taken by another, HARDIRQ-safe lock in the past:
[12030.799706]  (&rq->lock){-.-.-.}
[12030.803110] 
[12030.803110] and interrupts could create inverse lock ordering between them.
[12030.803110] 
[12030.813608] 
[12030.813608] other info that might help us debug this:
[12030.820519] Chain exists of:
  &rq->lock --> &htab->buckets[i].lock --> &head->lock

[12030.829251]  Possible interrupt unsafe locking scenario:
[12030.829251] 
[12030.836324]        CPU0                    CPU1
[12030.840962]        ----                    ----
[12030.845599]   lock(&head->lock);
[12030.849305]                                local_irq_disable();
[12030.855641]                                lock(&rq->lock);
[12030.861560]                                lock(&htab->buckets[i].lock);
[12030.868763]   <Interrupt>
[12030.871620]     lock(&rq->lock);
[12030.875215] 
[12030.875215]  *** DEADLOCK ***
[12030.875215] 
[12030.881462] 1 lock held by python/22977:
[12030.885533]  #0:  (rcu_read_lock){......}, at: [<ffffffff8160390b>] trace_call_bpf+0x4b/0x1c0
[12030.894862] 

@goldshtn
Collaborator

@4ast @yonghong-song This looks like a more serious issue

@yonghong-song
Collaborator

I can reproduce the issue with the latest net-next. I will take a closer look at this.

@yonghong-song
Collaborator

I did not see the deadlock message on a default 4.9 kernel.
I did see the deadlock message on the latest net-next, but it appears during boot; during the test, although it hangs, no kernel message is shown. In my case, I see something like the following:

[    9.352782]  Possible unsafe locking scenario:

[    9.353242]        CPU0                    CPU1
[    9.354198]        ----                    ----
[    9.354515]   lock((timer));
[    9.354790]                                lock(slock-AF_INET6);
[    9.355147]                                lock((timer));
[    9.355490]   lock(slock-AF_INET6);
[    9.355780] 
 *** DEADLOCK ***

[    9.356282] 1 lock held by swapper/2/0:
[    9.356581]  #0:  ((timer)){+.-.}, at: [<ffffffff9c2e54b5>] call_timer_fn+0x5/0x310
[    9.357062] 
stack backtrace:

I will look at this separately. At least the system is healthy even though I saw this at boot time.

After some experimenting, it looks like the following patch is responsible for the failure.
commit 97a0cac
Author: Brenden Blanco [email protected]
Date: Thu May 18 09:57:42 2017 -0700

Workaround for possible race in pyroute2.ipdb

In simulation.py, add a call to initdb() to force-refresh the netlink
socket and the interface list.

Signed-off-by: Brenden Blanco <[email protected]>

Reverting this patch fixes the issue.
@drzaeus77 can you take a look?

@yonghong-song
Collaborator

The deadlock warning I reported above from kernel lockdep is a false positive. The issue has been fixed in 4.14 and the latest net-next.

@arssher

arssher commented Mar 11, 2018

I'm experiencing the same issue. I tried reverting commit 97a0cac and the deadlock in test 15 was indeed resolved; however, the build still fails because tests 3, 27, 28, 29, 30 and 31 do not pass. I'm on Debian Stretch, kernel version 4.14.13. Build log attached.
debuild.txt

@yonghong-song
Collaborator

Maybe the test_libbcc failure is related to the ruby package?
test_tools_smoke cannot attach certain probes? Maybe a kernel config issue?
Maybe test_tools_memleak is a Python 2/3 compatibility issue?
The test_usdt* failures seem related to gcc compilation?

Could you help debug further on these test failures? Thanks!

@yonghong-song
Collaborator

Looks like test_brb failed? There are a few possibilities:
- netperf is not installed, or
- some python module related to networking is not installed.

The command format to run the test by itself is:

${TEST_WRAPPER} py_brb_c sudo ${CMAKE_CURRENT_SOURCE_DIR}/test_brb.py test_brb.c

Test wrapper is at build/tests/wrapper.sh.
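
Since the problem in this thread is a hang, it can help to run a single test through ctest with a timeout rather than invoking the wrapper by hand. The Python sketch below only builds such a command line, using ctest's `-R`, `--timeout` and `--output-on-failure` options. The helper name `ctest_command` and the 300 second default are assumptions, not values from the bcc build.

```python
import shlex


def ctest_command(test_regex, timeout_sec=300):
    """Build a ctest command line for the tests matching test_regex."""
    return [
        "ctest",
        "-R", test_regex,                # only tests whose name matches
        "--timeout", str(timeout_sec),   # default per-test timeout, seconds
        "--output-on-failure",           # show the test output on failure
    ]


if __name__ == "__main__":
    # The trailing "$" keeps py_test_brb2 out of the match.
    cmd = ctest_command("py_test_brb$")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed command is meant to be run from the build directory (obj-x86_64-linux-gnu in the logs above), with whatever privileges the test itself needs.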

@jingshui001

I'm experiencing the same issue. Any idea what's wrong?

@yonghong-song
Collaborator

If I remember correctly, the following hack can fix the issue:

-bash-4.4$ git diff
diff --git a/examples/networking/simulation.py b/examples/networking/simulation.py
index 2c6a0f3..1de8667 100644
--- a/examples/networking/simulation.py
+++ b/examples/networking/simulation.py
@@ -66,7 +66,7 @@ class Simulation(object):
 
         if out_ifc: out_ifc.up().commit()
         ns_ipdb.interfaces.lo.up().commit()
-        ns_ipdb.initdb()
+        # ns_ipdb.initdb()
         in_ifc = ns_ipdb.interfaces[in_ifname]
         with in_ifc as v:
             v.ifname = ns_ifc
-bash-4.4$

Could you give it a try? If this is the case, we will need to figure out the reason...

@jingshui001

jingshui001 commented Jan 2, 2019

Yes, this fixes the issue, thank you!

But three other tests failed:

92% tests passed, 3 tests failed out of 37

Total Test time (real) = 512.33 sec

The following tests FAILED:
	 27 - py_test_tools_smoke (Failed)
	 31 - py_test_usdt3 (Failed)
	 37 - lua_test_standalone (Failed)
Errors while running CTest

VirtualBox 6.0
Kernel: Debian 9, 4.9.0-8-amd64
bcc_0.4.0-1_amd64.build.txt

@yonghong-song
Collaborator

Right. That is why we have not fixed this issue yet :-(.
Since you have the environment, it would be great if you could dig in a little more to find the root cause and a workable solution!

@kmahmou1

Hi,

I am also having the same issue.
Any solution yet?

@kmahmou1

I am facing the same error and the same hang.
Here is my environment:
Linux d 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08) x86_64 GNU/Linux

I tried all of the suggestions in this thread and verified the dependencies.

bcc_0.8.0-1_amd64.zip

@yonghong-song
Collaborator

I had hoped somebody working on Debian could help debug why the workaround mentioned above fixes the issue there while having the opposite effect on some other systems...

@tarrall

tarrall commented Mar 4, 2019

For @kmahmou1 -- check that your linux-headers package version exactly matches your linux-image version. If you just do "apt-get install linux-headers-$(uname -r)" you'll get linux-headers-4.9.0-8-amd64=4.9.144-3.1 which won't work with that 4.9.110-3+deb9u6 kernel.

If they don't match, either remove the linux-headers packages and do:

sudo apt-get install \
linux-headers-4.9.0-8=4.9.110-3+deb9u6 \
linux-headers-4.9.0-8-common=4.9.110-3+deb9u6

or upgrade linux-image to the matching 4.9.144-3.1 package.
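
The comparison described above can be sketched in Python. This assumes the `uname -v` layout seen in this thread ("#1 SMP Debian <version> (<date>)"). Both helper names are hypothetical, and the headers version has to be supplied by the caller.

```python
import re


def running_kernel_package_version(uname_v):
    """Extract the Debian package version from `uname -v` output.

    Assumes the "#1 SMP Debian <version> (<date>)" layout seen above.
    Returns None if the string does not match.
    """
    match = re.search(r"\bDebian (\S+)", uname_v)
    return match.group(1) if match else None


def headers_match(uname_v, headers_pkg_version):
    """True if the headers package version equals the running kernel's."""
    return running_kernel_package_version(uname_v) == headers_pkg_version


if __name__ == "__main__":
    uname_v = "#1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08)"
    for headers in ("4.9.110-3+deb9u6", "4.9.144-3.1"):
        print(headers, "matches" if headers_match(uname_v, headers)
              else "does not match")
```

The installed headers version could be read with something like `dpkg-query -W -f='${Version}' linux-headers-4.9.0-8-amd64` and passed in as the second argument.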

@opentokix

When you build on a clean machine, the Python module does not exist yet, so the tests cannot import it.

So for the first build you can use the workaround below, which skips the tests and just builds the packages.

DEB_BUILD_OPTIONS=nocheck debuild -b -uc -us -i
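
For a scripted build, the same workaround can be expressed as an environment tweak. The Python sketch below adds `nocheck` to DEB_BUILD_OPTIONS without dropping options that are already set. The helper name `nocheck_env` is made up for illustration.

```python
import os


def nocheck_env(base_env=None):
    """Return a copy of the environment with nocheck in DEB_BUILD_OPTIONS.

    Hypothetical helper; existing space-separated options are kept.
    """
    env = dict(os.environ if base_env is None else base_env)
    options = env.get("DEB_BUILD_OPTIONS", "").split()
    if "nocheck" not in options:
        options.append("nocheck")
    env["DEB_BUILD_OPTIONS"] = " ".join(options)
    return env


# Same flags as the debuild command quoted above.
DEBUILD_CMD = ["debuild", "-b", "-uc", "-us", "-i"]

if __name__ == "__main__":
    print("DEB_BUILD_OPTIONS=" + nocheck_env()["DEB_BUILD_OPTIONS"],
          " ".join(DEBUILD_CMD))
```

The build itself could then be started with `subprocess.check_call(DEBUILD_CMD, env=nocheck_env())` from the source tree.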
