Attempt to fix for clang 15 #93

Merged
merged 23 commits into from
Dec 19, 2023

Conversation

johnlees
Member

Closes #92

@jakirkham

There is this warning on CI:

/home/runner/micromamba/envs/pp_env/lib/python3.12/site-packages/h5py/__init__.py:36: UserWarning: h5py is running against HDF5 1.14.3 when it was built against 1.14.2, this may cause problems
  _warn(("h5py is running against HDF5 {0} when it was built against {1}, "

It looks similar to these comments: conda-forge/h5py-feedstock#122 (comment)

AFAICT this is just a warning (not an error)

A bit later in the log it looks like something caused a segfault:

Progress (CPU): 0 / 28Segmentation fault (core dumped)
Traceback (most recent call last):
2.1.2
90ac5fc6064660e6814c86f47b3679ac1050388f
  File "/home/runner/work/pp-sketchlib/pp-sketchlib/test/run_test.py", line 29, in <module>
    subprocess.run("python ../sketchlib-runner.py sketch -l references.txt -o test_db -s 10000 -k 15,29,4 --cpus 2", shell=True, check=True)
  File "/home/runner/micromamba/envs/pp_env/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python ../sketchlib-runner.py sketch -l references.txt -o test_db -s 10000 -k 15,29,4 --cpus 2' returned non-zero exit status 139.

Are we able to debug this offline? Maybe there is a stacktrace that would help identify what caused the segfault
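For reference, the exit status 139 in the traceback above follows the usual shell convention of 128 plus the signal number, so it corresponds to signal 11 (SIGSEGV). A minimal sketch of that arithmetic, where `describe_exit_status` is a hypothetical helper and not part of pp-sketchlib or its test script:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper for illustration).

    Shells report "killed by signal N" as 128 + N, so 139 means signal 11.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


print(describe_exit_status(139))
```

This only applies when the status comes from a shell, as it does here with `shell=True`; when `subprocess.run` launches the program directly, a signal death is reported as a negative return code instead.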

@johnlees
Member Author

Thanks for looking too. The segfault coincides with the point where the HDF5 file is created, and when I was building, the HDF5 version being linked differed from the one used at runtime, which made me suspicious.

Annoyingly, my local build doesn't segfault, so it's going to take me more time to sort this out. (Note to self: I should try running under valgrind, in case the fault is happening locally but not being caught.)

@jakirkham

Given the warning, maybe a first step would be to try pinning hdf5 to 1.14.2 to see whether the warning/segfault go away?

@johnlees
Member Author

This appears to be an issue with OpenMP:

(gdb) bt
#0  PyErr_CheckSignals () at /usr/local/src/conda/python-3.12.0/Modules/signalmodule.c:1771
#1  0x00007fffd6e63e0d in _Z15create_sketchesRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS4_SaIS4_EERKS7_IS9_SaIS9_EERKS7_ImSaImEEmbbmbm._omp_fn.0(void) ()
    at /home/runner/work/pp-sketchlib/pp-sketchlib/src/api.cpp:84
#2  0x00007fffd6cf96d9 in gomp_thread_start (xdata=<optimized out>) at ../../../libgomp/team.c:129
#3  0x00007ffff7c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4  0x00007ffff7d26660 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) info threads
  Id   Target Id                                 Frame
  1    Thread 0x7ffff7eaf740 (LWP 4511) "python" 0x00007fffd6dc541a in SeqBuf::SeqBuf (this=<optimized out>, filenames=..., kmer_len=<optimized out>, this=<optimized out>,
    filenames=..., kmer_len=<optimized out>) at /home/runner/work/pp-sketchlib/pp-sketchlib/src/sketch/seqio.cpp:79
  2    Thread 0x7ffff4c84640 (LWP 4515) "python" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffff6fedb60 <thread_status+96>)
    at ./nptl/futex-internal.c:57
  3    Thread 0x7ffff3c83640 (LWP 4516) "python" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffff6fedbe0 <thread_status+224>)
    at ./nptl/futex-internal.c:57
  4    Thread 0x7fffeac82640 (LWP 4517) "python" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffff6fedc60 <thread_status+352>)
    at ./nptl/futex-internal.c:57
* 5    Thread 0x7fffd6be0640 (LWP 4518) "python" PyErr_CheckSignals () at /usr/local/src/conda/python-3.12.0/Modules/signalmodule.c:1771

With a single thread the command runs. Trying this in the CI now.

@johnlees
Member Author

Confirmed: it is OpenMP. It would still be worth checking the compile-time and runtime versions and the CMakeLists to see if I can fix this in the CI. Otherwise, omitting the multithreaded test would be fine.

@johnlees
Member Author

johnlees commented Dec 19, 2023

Very annoying to debug, but the underlying reason for this appears to be calling PyErr_CheckSignals() from worker threads. I don't fully understand why, given what the documentation says:

If the function is called from the main thread and under the main Python interpreter, it checks whether a signal has been sent to the processes and if so, invokes the corresponding signal handler.
If the function is called from a non-main thread, or under a non-main Python interpreter, it does nothing and returns 0.

Perhaps there is some interaction or difference between OpenMP/pthreads threads and Python threads. And I guess it may always have been segfaulting, just without being caught.

Making sure you are in the main thread when calling it seems to work.
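The actual change is in the project's C++ code and is not shown in this thread. As a rough illustration of the guard pattern only, here is a Python analogue in which a check runs on the main thread and is skipped everywhere else. `poll_if_main_thread` and `fake_poll` are hypothetical names, and `fake_poll` merely stands in for a main-thread-only call such as the signal check.

```python
import threading


def poll_if_main_thread(poll):
    """Run poll() only on the main thread; any other thread skips it and gets 0."""
    if threading.current_thread() is threading.main_thread():
        return poll()
    return 0


calls = []  # names of the threads that actually ran the poll


def fake_poll():
    # Hypothetical stand-in for a check that is only safe on the main thread
    calls.append(threading.current_thread().name)
    return 0


def worker():
    for _ in range(3):
        poll_if_main_thread(fake_poll)  # skipped: not the main thread


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
poll_if_main_thread(fake_poll)  # runs only if this is the main thread
for t in threads:
    t.join()

print(calls)
```

The workers call the wrapper freely, but only the main thread ever reaches the underlying check, which is the property the fix relies on.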

@johnlees johnlees merged commit 7ee661b into master Dec 19, 2023
4 checks passed
@johnlees johnlees deleted the clang-rng branch December 19, 2023 17:14
Successfully merging this pull request may close these issues.

Clang 15 compilation issues on macOS
2 participants