
BUG: Core dump and Segfault when calculating Network.nsi_betweenness() centrality #142

Closed
OzanSahin92 opened this issue Jun 12, 2020 · 3 comments


OzanSahin92 commented Jun 12, 2020

I am currently trying to calculate the nsi_betweenness of my directed graph. Printing my network object gives me:

Network: undirected, 25600 nodes, 61761383 links, link density 0.188.

I tried to find the error with gdb, and it seems to happen because of index errors in the Cython function _nsi_betweenness.
I think the problem lies in the indexing via the degree: some arrays in the function are of size [0, E], with E being the number of edges/links, while indexing is done via the sum of degrees, which can range over [0, 2*E]. That results in a segmentation fault.
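The suspected mismatch can be illustrated with a small hypothetical sketch (all values and array names invented for illustration, not taken from the pyunicorn source): an array sized for E links is indexed with cumulative degree sums, which reach 2*E because every link counts toward the degree of both endpoints.

```python
import numpy as np

# Hypothetical illustration of the suspected bug (all values invented):
# an array sized for the number of links E, indexed via cumulative degree
# sums, which reach 2*E because every link counts toward two node degrees.
n_links = 5
degrees = np.array([2, 3, 2, 3])   # sum = 10 = 2 * n_links
offsets = np.cumsum(degrees)       # last offset is 10
flat = np.zeros(n_links)           # sized E = 5 instead of 2*E = 10
try:
    flat[offsets[-1] - 1] = 1.0    # index 9 into an array of size 5
except IndexError as err:
    print(err)                     # out of bounds, like the reported crash
```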

Running gdb python on the core dump file led to this:

Reading symbols from python...

warning: core file may not match specified executable file.
[New LWP 9520]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `python nsi_bc.py 2 95 100'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00002b4d53031756 in __main__::_nsi_betweenness$241(long long, long long, Array<double, 1, C, mutable, aligned>, Array<long long, 1, C, mutable, aligned>, long long, Array<float, 1, C, mutable, aligned>, Array<float, 1, C, mutable, aligned>, Array<int, 1, C, mutable, aligned>, Array<int, 1, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, Array<int, 1, C, mutable, aligned>) ()

Running gdb python again, with the _nsi_betweenness function now pythonized, led to this:

(gdb) run nsi_bc.py 2 95 100
Starting program: /home/osahin/anaconda3/envs/myenv/bin/python nsi_bc.py 2 95 100
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fc0e8d06700 (LWP 18074)]
[... 38 further "[New Thread ...]" lines, LWPs 18075-18112 ...]
[... 39 "[Thread ... exited]" lines ...]
[Detaching after fork from child process 18119]
yo
0
Traceback (most recent call last):
  File "nsi_bc.py", line 174, in <module>
    _nsi_betweenness(N, E, w, k, j, betweenness_to_j, excess_to_j, offsets.astype(np.int32), flat_neighbors, is_source, flat_predecessors)
  File "nsi_bc.py", line 55, in _nsi_betweenness
    flat_predecessors[fi] = i
IndexError: index 123522767 is out of bounds for axis 0 with size 123522767
[Inferior 1 (process 18036) exited with code 01]


ntfrgl added a commit that referenced this issue Apr 8, 2022
- Single-source Cython/C type definitions: `core._ext.types`
- Introduce semantic type aliases: ADJ, NODE, DEGREE, WEIGHT, FIELD, etc.
- Remove deprecated type aliases: `np.{int|float}`
- Replace C type aliases by Numpy type aliases
- Add missing Numpy type casts before Python/Cython boundary
- Disambiguate `timeseries._ext.numerics._twin_surrogates_{s|r}`
  (blame: 098489b & 2f6a5b9)
- Fix some broken output pointers
- Reduce redundant copying
- Vectorize some simple Python loops
- Eliminate some dead code
- Potentially related issues: #128, #141, #142, #145
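The "semantic type alias" pattern mentioned in the commit can be sketched as follows (a hypothetical illustration only: the alias names mirror the commit message, but the concrete dtype choices here are guesses, not pyunicorn's actual definitions):

```python
import numpy as np

# Hypothetical sketch of semantic type aliases: naming dtypes by role so
# that Cython/C signatures and NumPy arrays stay mutually consistent.
# The dtype choices below are illustrative guesses.
ADJ = np.int8        # adjacency-matrix entries (0/1)
NODE = np.int32      # node indices
DEGREE = np.int32    # node degrees
WEIGHT = np.float64  # node weights

adjacency = np.zeros((4, 4), dtype=ADJ)
degrees = adjacency.sum(axis=0).astype(DEGREE)  # explicit cast at boundary
```

Casting explicitly at the Python/Cython boundary (as in the last line) is what prevents silent dtype mismatches in the compiled signatures.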
@fkuehlein fkuehlein changed the title Core dump and Segfault when calculating nsi_betweenness centrality BUG: Core dump and Segfault when calculating Network.nsi_betweenness() centrality Sep 26, 2023
@fkuehlein fkuehlein added the bug label Sep 26, 2023
@fkuehlein fkuehlein added this to the New Release milestone Oct 12, 2023

fkuehlein commented Oct 13, 2023

Could not reproduce this: Network.nsi_betweenness() works fine for me on networks of up to 10000 nodes (I'm limited in network size by my old machine's RAM here).

Assuming this has been resolved by @ntfrgl's recent fixes in the Cython/C modules, I am closing the issue.

ntfrgl added a commit that referenced this issue Oct 19, 2023
Changes:
- Cythonize outer loop over nodes
- remove repeated array allocations
- enable parallel outer loop using `multiprocess` package
  (naive, no shared memory)
- assert precondition on array sizes related to node degrees

Context:
- related: #142
- original Weave -> Cython port: 20ad40d
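The "assert precondition on array sizes" item above could look roughly like this (a hypothetical sketch, not the actual pyunicorn code; function and parameter names are invented):

```python
import numpy as np

# Hypothetical sketch of the kind of precondition the commit describes:
# flat_predecessors must have room for the full degree sum, which by the
# handshake lemma equals 2 * n_links for both directed and undirected cases.
def check_nsi_betweenness_inputs(degree, flat_predecessors):
    E = int(np.sum(degree))
    assert flat_predecessors.shape[0] == E, (
        f"flat_predecessors has size {flat_predecessors.shape[0]}, "
        f"expected degree sum {E}")

degree = np.array([2, 3, 2, 3])                              # sums to 10
check_nsi_betweenness_inputs(degree, np.zeros(10, np.int32))  # passes
```

Failing fast with an assertion like this turns a hard segfault inside compiled code into a readable Python error.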

ntfrgl commented Oct 19, 2023

@fkuehlein, I don't think that you have provided sufficient reason for closing this issue: The work you mentioned on the Cython/C method signatures wouldn't concern an indexing error deeply inside a Cython function, and you could in principle, say, spin up a moderately sized Amazon Elastic Compute Cloud instance to obtain enough RAM for reproducing the issue.

Nonetheless, I agree with closing the issue for now. My commit above removes some obvious slowdowns in the method, after which I successfully tested it on a random network with slightly higher node and edge counts than the bug report:

from pyunicorn import Network
net = Network.Model("ErdosRenyi", n_nodes=26000, link_probability=.2)
btw = net.nsi_betweenness(parallelize=True)

The explanation suggested by the issue author is unfortunately incorrect: As is now explicitly asserted, the array flat_predecessors is of size E = 2 * n_links, which is the sum of all degrees (both for directed and undirected networks) rather than the number of edges.

In contrast, the error message above appears to have n_links = 61761383 and E = 123522767 = 2 * n_links + 1, which might indicate an inconsistent data structure produced by the user's script. Otherwise, it is of course still conceivable that there remains an uncovered edge case in the implementation, but we would probably need the exact adjacency matrix to reproduce it.
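The off-by-one claim above checks out arithmetically against the numbers in the original report:

```python
# Checking the arithmetic in the report: the out-of-bounds index equals the
# array size, which is one more than the expected degree sum 2 * n_links,
# hinting at an inconsistent data structure rather than a plain size bug.
n_links = 61761383          # from the printed Network object
reported_size = 123522767   # from the IndexError message
print(reported_size == 2 * n_links + 1)  # True
```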

@fkuehlein

Right, I was a bit too quick to conclude there.

All the more thanks for your speedup work and for providing good reference and documentation!
