Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault with shipped example #835

Closed
tillea opened this issue Mar 8, 2023 · 20 comments
Closed

Segmentation fault with shipped example #835

tillea opened this issue Mar 8, 2023 · 20 comments

Comments

@tillea
Copy link

tillea commented Mar 8, 2023

The bug is primarily related to salmon (bulk mode)

Describe the bug
After building salmon it segfaults with the example file shipped with the source tarball.

The problem is described in detail in the Debian bug tracking system. While the Debian bug is against version 1.9.0 I can perfectly reproduce the issue with a recent download of version 1.10.0

To Reproduce
Steps and data to reproduce the behavior:

Download your release tarball.

$ tar xaf v1.10.0.tar.gz
$ cd salmon-1.10.0
# build on Debian testing system
$ cmake -DCMAKE_BUILD_TYPE=Release -DUSE_SHARED_LIBS=TRUE -DBZIP2_LIBRARIES=-lbz2 -DBZIP2_INCLUDE_DIR=/usr/include -DLIBLZMA_INCLUDE_DIR=/usr/include/ -DLIBLZMA_LIBRARY=lzma -DCMAKE_MODULE_PATH=/usr/share/cmake/Modules -DTBB_WILL_RECONFIGURE=FALSE -DBOOST_WILL_RECONFIGURE=FALSE
$ make
$ tar xaf sample_data.tgz
$ src/salmon index -t sample_data/transcripts.fasta -i sample_salmon_quasi_index
Version Info: This is the most recent version of salmon.
[2023-03-08 17:30:38.873] [jLog] [warning] The salmon index is being built without any decoy sequences.  It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.
[2023-03-08 17:30:38.873] [jLog] [info] building index
out : sample_salmon_quasi_index
[2023-03-08 17:30:38.873] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers

[2023-03-08 17:30:38.879] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides
[2023-03-08 17:30:38.879] [puff::index::jointLog] [info] Clipped poly-A tails from 0 transcripts
wrote 15 cleaned references
Segmentation fault.
  • Version 1.9.0 as well as version 1.10.0 are affected. Unfortunately we did not managed to package version 1.7.0 and 1.8.0 but I confirm that version 1.6.0 was not affected by the described problem.
  • Salmon was build as Debian package as well as built from source (see above)
  • The reference was taken from the sample_data shipped with the release tarball

Expected behavior
Clean processing without SEGFAULT.

Desktop (please complete the following information):

  • OS: Debian (testing or unstable)

Additional context

Kind regards, Andreas.

@nileshpatra
Copy link

/cc: @rob-p

@rob-p
Copy link
Collaborator

rob-p commented Mar 9, 2023

Hi @tillea and @nileshpatra, thanks for the report (and ping). Can you point me to a Docker / Singularity container of the relevant Debian build so I can try and reproduce locally? This will make debugging much easier.

For example, I am unable to reproduce this issue building the latest release from the master branch using the latest official Debian image.

In particular, release 1.10 addresses a rare (but stubborn) segfault that certainly was present in 1.9. However, the fix for this is in the corresponding tagged release of pufferfish, which is pulled in by the build script when salmon is built.

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

I also tried on the testing image. For more detail, here are the steps performed (perhaps taking a look at the installed packages will highlight a difference, as I did this from a clean testing Docker image, so my environment had nothing else in it).

Attempt to reproduce segfault on Debian:testing

$ docker pull debian:testing

$ docker run -it debian:testing

$ apt-get update

$ apt-get install build-essential git libboost-all-dev liblzma-dev libbz2-dev cmake zlib1g-dev curl unzip wget libcurl4-openssl-dev

$ git clone https://github.com/COMBINE-lab/salmon.git
$ cd salmon
$ mkdir build && cd build
$ cmake -DNO_IPO=TRUE .. # GCC still doesn't handle LTO robustly
$ make -j8
$ make install
$ make test

which leads to the output

root@fd877e359439:/salmon/build# make install
[  7%] Built target libcereal
[ 13%] Built target libtbb
[ 16%] Built target ntcard
[ 19%] Built target graphdump
[ 27%] Built target twopaco
[ 29%] Built target ksw2pp_sse2
[ 32%] Built target ksw2pp_sse4
[ 37%] Built target ksw2pp_basic
[ 38%] Built target ksw2pp
[ 59%] Built target puffer
[ 73%] Built target salmon_core
[ 76%] Built target alevin_core
[ 77%] Built target UnitTestsMain
[ 81%] Built target unitTests
[100%] Built target salmon
Install the project...
-- Install configuration: "Release"
-- Installing: /salmon/lib/libntcard.a
-- Installing: /salmon/lib/ntcard/ntcard-targets.cmake
-- Installing: /salmon/lib/ntcard/ntcard-targets-release.cmake
-- Installing: /salmon/lib/libgraphdump.a
-- Installing: /salmon/lib/graphdump/graphdump-targets.cmake
-- Installing: /salmon/lib/graphdump/graphdump-targets-release.cmake
-- Installing: /salmon/lib/libtwopaco.a
-- Installing: /salmon/lib/twopaco/twopaco-targets.cmake
-- Installing: /salmon/lib/twopaco/twopaco-targets-release.cmake
-- Installing: /salmon/lib/libtbb.so
-- Installing: /salmon/lib/libtbb.so.12
-- Installing: /salmon/lib/libtbb.so.12.5
-- Installing: /salmon/lib/libtbbmalloc.so
-- Installing: /salmon/lib/libtbbmalloc.so.2
-- Installing: /salmon/lib/libtbbmalloc.so.2.5
-- Installing: /salmon/lib/libtbbmalloc_proxy.so
-- Installing: /salmon/lib/libtbbmalloc_proxy.so.2
-- Installing: /salmon/lib/libtbbmalloc_proxy.so.2.5
-- Installing: /salmon/bin/salmon
-- Installing: /salmon/lib/libsalmon_core.a



Installation complete. Please ensure the following paths are set properly.
==========================================================================
Please add /salmon/bin to your PATH
Please add /salmon/lib to your LD_LIBRARY_PATH
==========================================================================
root@fd877e359439:/salmon/build# make test
Running tests...
Test project /salmon/build
    Start 1: unit_tests
1/2 Test #1: unit_tests .......................   Passed    0.37 sec
    Start 2: salmon_read_test_quasi
2/2 Test #2: salmon_read_test_quasi ...........   Passed    1.80 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) =   2.17 sec

The make test command itself runs the test the builds the index and maps the reads against it. Either way, I can do that explicitly too (from within build):

$ ./src/salmon index -t ../sample_data/transcripts.fasta -i sample_idx

returns succesfully with a built index

...
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] contig count for validation: 23
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] Total # of Contigs : 23
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] Total # of numerical Contigs : 23
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] Total # of contig vec entries: 36
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] bits per offset entry 6
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] Done constructing the contig vector. 24
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] # segments = 23
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] total length = 19592
[2023-03-10 05:51:33.748] [puff::index::jointLog] [info] Reading the reference files ...
[2023-03-10 05:51:33.750] [puff::index::jointLog] [info] positional integer width = 15
[2023-03-10 05:51:33.750] [puff::index::jointLog] [info] seqSize = 19592
[2023-03-10 05:51:33.750] [puff::index::jointLog] [info] rankSize = 19592
[2023-03-10 05:51:33.750] [puff::index::jointLog] [info] edgeVecSize = 0
[2023-03-10 05:51:33.750] [puff::index::jointLog] [info] num keys = 18902
for info, total work write each  : 2.331    total work inram from level 3 : 4.322  total work raw : 25.000
[Building BooPHF]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec
Bitarray          105024  bits (100.00 %)   (array + ranks )
final hash             0  bits (0.00 %) (nb in final hash 0)
[2023-03-10 05:51:33.781] [puff::index::jointLog] [info] mphf size = 0.0125198 MB
[2023-03-10 05:51:33.781] [puff::index::jointLog] [info] chunk size = 9796
[2023-03-10 05:51:33.781] [puff::index::jointLog] [info] chunk 0 = [0, 9796)
[2023-03-10 05:51:33.781] [puff::index::jointLog] [info] chunk 1 = [9796, 19562)
[2023-03-10 05:51:33.784] [puff::index::jointLog] [info] finished populating pos vector
[2023-03-10 05:51:33.784] [puff::index::jointLog] [info] writing index components
[2023-03-10 05:51:33.784] [puff::index::jointLog] [info] finished writing dense pufferfish index
[2023-03-10 05:51:33.784] [jLog] [info] done building index

So on testing at least, I can't yet reproduce this issue.

@nileshpatra
Copy link

Weird. Can you try with a debian:sid image instead? I am able to repro this in a debian-sid choot for instance.

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

I can try that later today. Does your list of installed dependencies look different?

The thing that's strange about this is that the main motivation for releasing 1.10 was a discovery of an intermittent segfault due to UB at exactly this spot. But that was resolved in the associated tagged pufferfish upstream and checked further with both valgrind and asan.

@tillea
Copy link
Author

tillea commented Mar 10, 2023

I've tried to reproduce the issue in docker by using the Build-Depends that are used in Debian:

$ docker pull debian:testing
$ docker run -it debian:testing
$ echo "deb-src http://deb.debian.org/debian unstable main" > /etc/apt/sources.list.d/10-debsrc.list
$ apt update
$ apt upgrade
$ apt build-dep salmon
$ wget https://github.com/COMBINE-lab/salmon/archive/refs/tags/v1.10.0.tar.gz
$ tar xaf v1.10.0.tar.gz
$ cd salmon-1.10.0
$ mkdir build && cd build
$ cmake -DNO_IPO=TRUE ..
$ make -j8
make -j8
[  3%] Built target ksw2pp_sse4
[  6%] Built target ntcard
[ 15%] Built target twopaco
[ 18%] Built target graphdump
[ 21%] Built target ksw2pp_sse2
[ 27%] Built target ksw2pp_basic
[ 43%] Built target salmon_core
[ 67%] Built target puffer
[ 68%] Built target ksw2pp
[ 69%] Built target UnitTestsMain
[ 73%] Built target alevin_core
[ 74%] Linking CXX executable unitTests
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libcurl.a(libcurl_gnutls_la-psl.o): in function `Curl_psl_destroy':
(.text+0x21): undefined reference to `psl_free'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libcurl.a(libcurl_gnutls_la-psl.o): in function `Curl_psl_use':
(.text+0xbc): undefined reference to `psl_latest'
/usr/bin/ld: (.text+0x157): undefined reference to `psl_builtin'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libcurl.a(libcurl_gnutls_la-version.o): in function `curl_version':
(.text+0x129): undefined reference to `BrotliDecoderVersion'
/usr/bin/ld: (.text+0x16f): undefined reference to `ZSTD_versionNumber'
/usr/bin/ld: (.text+0x1e3): undefined reference to `idn2_check_version'
/usr/bin/ld: (.text+0x20e): undefined reference to `psl_get_version'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libcurl.a(libcurl_gnutls_la-version.o): in function `curl_version_info':
(.text+0x386): undefined reference to `idn2_check_version'
/usr/bin/ld: (.text+0x3ad): undefined reference to `BrotliDecoderVersion'
/usr/bin/ld: (.text+0x3b8): undefined reference to `BrotliDecoderVersion'
/usr/bin/ld: (.text+0x3f5): undefined reference to `ZSTD_versionNumber'
/usr/bin/ld: (.text+0x400): undefined reference to `ZSTD_versionNumber'
/usr/bin/ld: (.text+0x475): undefined reference to `nghttp2_version'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libcurl.a(libcurl_gnutls_la-libssh2.o): in function `ssh_attach':
(.text+0x3e1): undefined reference to `libssh2_session_abstract'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libcurl.a(libcurl_gnutls_la-libssh2.o): in function `ssh_statemach_act':
(.text+0x4b1): undefined reference to `libssh2_session_set_blocking'
/usr/bin/ld: (.text+0x4fb): undefined reference to `libssh2_session_handshake'
/usr/bin/ld: (.text+0x58d): undefined reference to `libssh2_hostkey_hash'
/usr/bin/ld: (.text+0x6a0): undefined reference to `libssh2_hostkey_hash'
/usr/bin/ld: (.text+0x75d): undefined reference to `libssh2_knownhost_free'
...

So somehow this does not build - but I have the impression that the linker issues are caused by some missing CMAKE options (as well as using the build directory). Thus I used the cmake command line as its used in the Debian packaging:

$ docker pull debian:testing
$ docker run -it debian:testing
$ echo "deb-src http://deb.debian.org/debian unstable main" > /etc/apt/sources.list.d/10-debsrc.list
$ apt update
$ apt upgrade
$ apt build-dep salmon
$ wget https://github.com/COMBINE-lab/salmon/archive/refs/tags/v1.10.0.tar.gz
$ tar xaf v1.10.0.tar.gz
$ cd salmon-1.10.0
$ cmake -DCMAKE_BUILD_TYPE=Release -DUSE_SHARED_LIBS=TRUE -DBZIP2_LIBRARIES=-lbz2 -DBZIP2_INCLUDE_DIR=/usr/include -DLIBLZMA_INCLUDE_DIR=/usr/include/ -DLIBLZMA_LIBRARY=lzma -DCMAKE_MODULE_PATH=/usr/share/cmake/Modules -DTBB_WILL_RECONFIGURE=FALSE -DBOOST_WILL_RECONFIGURE=FALSE
$ make
$ tar xaf sample_data.tgz
# src/salmon index -t sample_data/transcripts.fasta -i sample_salmon_quasi_index
Version Info: This is the most recent version of salmon.
index ["sample_salmon_quasi_index"] did not previously exist  . . . creating it
[2023-03-10 11:56:01.434] [jLog] [warning] The salmon index is being built without any decoy sequences.  It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be  rovided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.
[2023-03-10 11:56:01.435] [jLog] [info] building index
out : sample_salmon_quasi_index
[2023-03-10 11:56:01.435] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers

[2023-03-10 11:56:01.441] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides
[2023-03-10 11:56:01.441] [puff::index::jointLog] [info] Clipped poly-A tails from 0 transcripts
wrote 15 cleaned references
Segmentation fault (core dumped)

Please note that -DNO_IPO=TRUE is not part of the above cmake call. I've added this to the packaging, but there is no difference. Please also note that I did not cloned the Git repository but have downloaded the release tarball since this is what we are packaging.

Hope this helps tracking down the issue.

Kind regards, Andreas.

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

Hi @tillea! Thanks for the added details, I'll try and repro with these. I noticed 2 other differences and wonder if they matter. The first is that your configure does DUSE_SHARED_LIBS=TRUE. Could there be shared libs found at runtime other than what is linked during compile?

The other is that you seem to be doing an in source build. I.e. building directly in the salmon dir rather than in a build directory. I'm actually a bit surprised the build works in that way, as it isn't designed to be an in source build. I can try and see if either of those matter on my end, but wonder if you have insight into either.

Thanks,
Rob

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

Hi @tillea,

Thanks for the further details. I'll try with your approach. Two other differences to note between the builds that I wonder if they would matter:

  • Your build forces shared libraries, whereas the default is to statically link. I wonder if there is any difference between what is found at link time and runtime on your system?

  • You seem to be doing an in-source build (i.e. invoking cmake and make from the top-level salmon directory). I'm actually kind of surprised this works, as the build isn't designed to be in source, so I'll see if this has an effect.

Anyway, I will try these out today but wanted to mention them to get your read just in case.

Thanks,
Rob

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

Ok, when I attempt the build the way you say above, I get the following error during CMake:

-- fetch PUFFERFISH exit code 127
CMake Error at CMakeLists.txt:317 (message):
  Could not fetch pufferfish source [fetchPufferfish.sh returned exit code
  127].


-- Configuring incomplete, errors occurred!
See also "/salmon-1.10.0/build/CMakeFiles/CMakeOutput.log".

It seems wget, curl and unzip were missing, and I had to install them. After that, I was able to build and install.

At that point, I was able to reproduce the issue! So, it seems to me the underlying problem is coming from one of the upstream dependencies (i.e. libraries being linked to). I will try see if I can find the offender. In general, we like to statically link salmon for exactly this reason. Outside of package systems with which I am familiar (e.g. conda), we don't have a lot of experience in specifying dependent package version constrains, which I believe to be at fault here.

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

Using the rest of the same configure flags without -DUSE_SHARED_LIBS=TRUE, the build does not link properly. I think you should try building without these extra flags. Since the LTO seems not to be a problem on this system, a simple cmake .. && make should work. In the mean time, I'll try and pare back the configure command line to find the maximum viable interpolation between our different configurations.

--Rob

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

So, it turns out that with the dependency setup that is pulled in, I can't even get salmon to build without USE_SHARED=TRUE. I think at this point, it's not clear the segfault is due to something that is broken / can be fixed in the salmon code itself. Rather, it's likely due to a misversioning of a dependency that is pulled in and then linked against.

For the time being, I think the options are to do a more "standard" build (i.e. like the first one I suggested that pull in only that minimal set of dependencies and let salmon itself statically link the rest), or to try and look at the upstream shared libraries being linked and figure out which of them is misversioned.

--Rob

@tillea
Copy link
Author

tillea commented Mar 10, 2023

Ok, when I attempt the build the way you say above, I get the following error during CMake:

-- fetch PUFFERFISH exit code 127
CMake Error at CMakeLists.txt:317 (message):
  Could not fetch pufferfish source [fetchPufferfish.sh returned exit code
  127].

Did you do the

apt build-dep salmon

step? I can't imagine that you get this problem if you follow my log step by step.

Debian is usually using dynamic linking. By having all Build-Dependencies (which is ensured in the step above) the existence of the libraries is granted and the options for cmake I specified are ensuring that the libs are found.

Kind regards, Andreas.

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

Hi @tillea,

It seems this is exactly the problem. The build deps here are not quite correct. There are dependencies that salmon no longer has, and some of the dependencies it does have are out of date and can't be used from upstream (e.g. libstaden in the latest version, among others).

On the bright side, it's not the dynamic linking alone that is problematic. The following works fine on my end:

$ docker pull debian:testing
$ docker run -it debian:testing
$ echo "deb-src http://deb.debian.org/debian unstable main" > /etc/apt/sources.list.d/10-debsrc.list
$ apt-get update
$ apt-get upgrade
$ apt-get install build-essential git libboost-all-dev liblzma-dev libbz2-dev cmake zlib1g-dev curl unzip wget libcurl4-openssl-dev libtbb-dev libtbb12 liblzma-dev libjemalloc2 pkg-config libgff-dev
$ wget https://github.com/COMBINE-lab/salmon/archive/refs/tags/v1.10.0.tar.gz
$ tar xaf v1.10.0.tar.gz
$ cd salmon-1.10.0
$ mkdir build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DUSE_SHARED_LIBS=TRUE -DBZIP2_LIBRARIES=-lbz2 -DBZIP2_INCLUDE_DIR=/usr/include -DLIBLZMA_INCLUDE_DIR=/usr/include/ -DLIBLZMA_LIBRARY=lzma -DCMAKE_MODULE_PATH=/usr/share/cmake/Modules -DTBB_WILL_RECONFIGURE=FALSE -DBOOST_WILL_RECONFIGURE=FALSE ..
$ make -j8
$ make install
$ make test

This is preferring dynamic linking, and the resulting installed executable runs fine without a segfault. Can you try this on your end? Then the thing to do may be to find what is discordant between the packages I install above and what gets pulled in by apt build-dep salmon.

Thanks,
Rob

@tillea
Copy link
Author

tillea commented Mar 10, 2023

Hi @rob-p,
thanks a lot for your investigation. Could you please be more verbose on those incorrect Build-Depends? What dependencies can be removed (if not used they should not really harm, thought but you are correct that it makes sense to remove these) and more importantly which can not be used. For instance if we can't use libstaden as packaged we have a problem. All preconditions for a Debian package have to be packaged first. Fetching something from network is not permitted at package build time.
Thus I simply tried changing the cmake options to

$ cmake -DCMAKE_BUILD_TYPE=Release -DUSE_SHARED_LIBS=TRUE -DBZIP2_LIBRARIES=-lbz2 -DBZIP2_INCLUDE_DIR=/usr/include -DLIBLZMA_INCLUDE_DIR=/usr/include/ -DLIBLZMA_LIBRARY=lzma -DCMAKE_MODULE_PATH=/usr/share/cmake/Modules -DTBB_WILL_RECONFIGURE=FALSE -DBOOST_WILL_RECONFIGURE=FALSE ..

which does not change the SEGFAULT problem. If the issue belongs to something we need to download from somewhere please let me know what looks suspicious to you. This would be helpful since we could either add it to the Debian package source in debian/missing-sources ... or rather fix the predependency that would break salmon.
Kind regards, Andreas.

@rob-p
Copy link
Collaborator

rob-p commented Mar 10, 2023

Hi Andreas,

So I don't know if there is a easy way to get the specific list of reverse dependencies, but then we can cross-check it with my explicit list above:

build-essential
git 
libboost-all-dev 
liblzma-dev 
libbz2-dev 
cmake 
zlib1g-dev 
curl 
unzip 
wget 
libcurl4-openssl-dev 
libtbb-dev 
libtbb12 
liblzma-dev 
libjemalloc2 
pkg-config 
libgff-dev

One thing I noticed during build is that, while I included libjemalloc2 here, the salmon build procedure still downloaded and built jemalloc. However, I don't think jemalloc is the thing causing the segfault.

Regarding dependencies that can't be used — the current libstaden is behind the upstream release. The upstream release contains an important bug fix for a bug (and suggested fix that we proposed to the developer) upon which we rely. More importantly, afaik there is no relevant libpufferfish-dev package (we certainly have not made one), and so there is not even e.g. a check in the CMakeLists.txt file. Salmon's build always tries to run fetchPufferfish.sh to download the relevant pufferfish source files needed to build salmon. Critically, the relevant pufferfish dependencies and salmon releases move in lockstep. Each new salmon release it accompanied by a new tag in the pufferfish repo (so that the specific source used to build a given salmon release is fixed and easily trackable).

So, I think the easiest way to move forward is to:

  • do a diff of my list of packages above with what is pulled in by apt build-dep salmon.

  • figure out why, even when libjemalloc2 is installed, the build system tries to build jemalloc itself (maybe we need the dev package?).

  • determine what folks want to do upstream about the lockstep pufferfish dependency. Right now, the fetchPufferfish.sh script pulls a tagged tarball from github and checks that the sha matches, and moves the relevant source files into place. This is true both when we build our own releases as well as when salmon is built for other distribution channels like conda. While I am happy to have someone figure out how to package that up as a package that can be put into the repo and depended upon, we currently don't have the capacity to tackle that ourselves and don't have a suitable mechanism to replace the current approach to obtaining the dependent pufferfish files. However, this question is very important as e.g. a segfault exactly like the one you are encountering was actually a bug in the pufferfish source used in the 1.9.0 release of salmon that was fixed for the 1.10.0 release.

Best,
Rob

@tillea
Copy link
Author

tillea commented Mar 11, 2023

I think we should sort out this issue step by step. If you say libstaden has an important bugfix we should upgrade to the latest version in any case. Do you have a link to this bug? I admit this update simply slipped through - we should have upgraded this in the beginning of this year. Usually we try to follow upstream closely (which we failed for salmon blatantly for several reasons - one is the close connection to pufferfish).
Regarding pufferfish: We tried hard to get pufferfish packaged but failed (due to the use of other versions of spdlog, cereal, and fmt) However, since we can't run fetchPufferfish.sh inside the build process I was running it separately and added the downloaded source in debian/external/pufferfish So I think the requirement of salmon should be fulfilled. I confirm your feeling that pufferfish is important for the current issue.
However, in the test I did when opening this bug report I did not do that pre-downloading of pufferfish since I was building right in the downloaded source tarball. libpufferfish-dev was not installed by apt build-dep salmon since this package does not exist.
Kind regards, Andreas.

@rob-p
Copy link
Collaborator

rob-p commented Mar 11, 2023

Hi @tillea,

So I went through the list of deps pulled in by apt build-dep salmon and the minimal set I gave above. I tried to make the smallest number of changes I could to the apt build-dep salmon list while also removing things that are clearly outdated (we no longer use jellyfish, rapmap, etc. and we use the header-only version of spdlog).

As a result I came up with this list of dependencies. The offending dependency seems to be libcereal-dev. Specifically, I was able to install just this list of dependencies (minus libcereal-dev) atop a clean debian:testing and get a working version where the only thing downloaded from the internet was the appropriate version of the pufferfish files grabbed by fetchPufferfish.sh in the 1.10.0 release. Once I installed libcereal-dev with apt-get install, and rebuilt, then I got the segfault mentioned at the top of this issue. So, it seems that we either have to let salmon build it's own libcereal, or figure out what the problem is with the library upstream.

Please let me know if this you observe this same behavior as well (also @nileshpatra may want to try this out). If so, perhaps we can get libstaden updated upstream, and then use this as the new dep list for salmon. I installed these deps with a simple xargs apt-get install -y < deps_sorted_updated.txt (without libcereal-dev for the working version, and with it included, as below, for the segfault).

Best,
Rob

deps_sorted_updated.txt

@rob-p
Copy link
Collaborator

rob-p commented Mar 12, 2023

Hi @tillea and @nileshpatra,

Ok, I dug deeper and found out what's going on. The culprit is, in fact, libcereal. The problem is that libcereal bumped patch versions only since the version corresponding to the headers included in pufferfish, but their changes are not, in fact, backwards compatible! This lead to a version mismatch between the headers used in pufferfish and the headers found from the installed package, ultimately resulting in an assertion failure in rapidjson (which cereal is using) and a segfault.

On the plus side, this was relatively easy to fix by bumping the included cereal headers in pufferfish. I also updated the Findcereal.cmake module and added a version constraint so that we now require the new version (1.3.2).

This is now tagged and released as salmon 1.10.1. Please give that a go when you have a chance. I'll note that, before this is added upstream in debian, I'd still advocate for fixing the libstaden package to update to the new version. I'd also recommend moving to dependencies like the ones I've included above to remove some really antiquated dependencies that salmon no longer requires but are still being pulled in.

Thanks!
Rob

@tillea
Copy link
Author

tillea commented Mar 14, 2023

Hi,
I confirm that salmon 1.10.1 fixes the issue. I'll update libstaden in Debian. We probably should talk about some separate pufferfish package for the next Debian release. We actually started with the packaging but this has somehow stalled (I don't remember the reasons but I'll keep an eye on this soon (but its too late for bookworm).
Kind regards and thanks a lot for the fix, Andreas.

@rob-p
Copy link
Collaborator

rob-p commented Mar 14, 2023

Thanks! We can circle back around to the pufferfish question in a different issue when folks have more thoughts about how to move forward there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants