Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] Clean up vg augment #2474

Merged
merged 3 commits into from
Sep 20, 2019
Merged

[wip] Clean up vg augment #2474

merged 3 commits into from
Sep 20, 2019

Conversation

glennhickey
Copy link
Contributor

Switch everything to handle graph code.

Will clean out all support for the old pileup logic in this pr (

@glennhickey
Copy link
Contributor Author

@ekg I just tried this branch successfully on one of your test files as follows

export graph=MHC_37
# quality filter
vg filter $graph.300x.gam -q 15 -t 32 -v > $graph.300x.q15.gam
#Total Filtered:                132353 / 9719544

# covert to packed graph (takes 2 seconds)
vg convert $graph.X32.vg -p > $graph.X32.pg

# augment the packed graph (takes 37 minutes, 2.6G RAM)
vg augment $graph.X32.pg $graph.300x.q15.gam -A $graph.300x.q15.aug.gam -C > $graph.X32.aug.pg

# compare the two graphs
 vg stats -NEl $graph.X32.vg
155330
155329
length  4970557
vg convert $graph.X32.aug.pg -v | vg stats - -NEl
5758895
8389027
length  7724570

As discussed, there are probably too many read errors in the augmented graph. I'll be looking at ways to get around this via either vg filter or vg augment options soon.

@ekg
Copy link
Member

ekg commented Sep 20, 2019 via email

@glennhickey
Copy link
Contributor Author

glennhickey commented Sep 20, 2019

Continuing the above example:

# compute coverage (3m45s , 1.4G RAM)
vg pack -x $graph.X32.aug.pg -g  $graph.300x.q15.aug.gam -o  $graph.300x.q15.aug.pack -Q 15 -t 5

# compute snarls (2m30s, 5.4G RAM)
vg snarls $graph.X32.aug.pg > $graph.X32.aug.snarls

# make a vcf (dies after 1m20s, 2G RAM)
vg call $graph.X32.aug.pg -k $graph.300x.q15.aug.pack -r $graph.X32.aug.snarls > $graph.300x.q15.aug.vcf

The crash doesn't seem obviously related to a cycle in the reference path (though I wouldn't rule it out). The stack doesn't contain anything useful. I'll try the debugger. But when augmentation increases the node count by 40X (and average node length down to 1.3bp), I think you gotta expect trouble (good stress test though)

Crash report for vg v1.18.0-450-g8d7b74726 "Zungoli"
Stack trace (most recent call last) in thread 82464:
#14   Object "", at 0xffffffffffffffff, in 
#13   Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7fcf558c588e, in __clone
      Source "../sysdeps/unix/sysv/linux/x86_64/clone.S", line 95, in __clone [0x7fcf558c588e]
#12   Object "/lib/x86_64-linux-gnu/libpthread-2.27.so", at 0x7fcf581dd6da, in start_thread
      Source "/build/glibc-OTsEL5/glibc-2.27/nptl/pthread_create.c", line 463, in start_thread [0x7fcf581dd6da]
#11   Object "/usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0", at 0x7fcf55dc395d, in 
#10   Object "/data1/vg/bin/vg", at 0x55d14f8ab913, in vg::SnarlManager::for_each_top_level_snarl_parallel(std::function<void (vg::Snarl const*)> const&) const [clone ._omp_fn.0]
    | Source "src/snarls.cpp", line 709, in operator()
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in _ZNK2vg12SnarlManager33for_each_top_level_snarl_parallelERKSt8functionIFvPKNS_5SnarlEEE._omp_fn.0 [0x55d14f8ab913]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#9    Object "/data1/vg/bin/vg", at 0x55d14f9145f7, in std::_Function_handler<void (vg::Snarl const*), vg::GraphCaller::call_top_level_snarls(bool)::{lambda(vg::Snarl const*)#1}>::_M_invoke(std::_Any_data const&, vg::Snarl const*&&)
    | Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
    |   314:       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
    |   315:       {
    | > 316:    (*_Base::_M_get_pointer(__functor))(
    |   317:        std::forward<_ArgTypes>(__args)...);
    |   318:       }
      Source "src/graph_caller.cpp", line 22, in _M_invoke [0x55d14f9145f7]
#8    Object "/data1/vg/bin/vg", at 0x55d14f91b7ac, in vg::LegacyCaller::call_snarl(vg::Snarl const&)
    | Source "src/graph_caller.cpp", line 323, in ~vector
    | Source "/usr/include/c++/7/bits/stl_vector.h", line 435, in ~_Vector_base
    |   433:       ~vector() _GLIBCXX_NOEXCEPT
    |   434:       { std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
    | > 435:                  _M_get_Tp_allocator()); }
    |   436: 
    |   437:       /**
    | Source "/usr/include/c++/7/bits/stl_vector.h", line 162, in _M_deallocate
    |   161:       ~_Vector_base() _GLIBCXX_NOEXCEPT
    | > 162:       { _M_deallocate(this->_M_impl._M_start, this->_M_impl._M_end_of_storage
    |   163:                  - this->_M_impl._M_start); }
    | Source "/usr/include/c++/7/bits/stl_vector.h", line 180, in deallocate
    |   178:    typedef __gnu_cxx::__alloc_traits<_Tp_alloc_type> _Tr;
    |   179:    if (__p)
    | > 180:      _Tr::deallocate(_M_impl, __p, __n);
    |   181:       }
    | Source "/usr/include/c++/7/bits/alloc_traits.h", line 462, in deallocate
    |   460:       static void
    |   461:       deallocate(allocator_type& __a, pointer __p, size_type __n)
    | > 462:       { __a.deallocate(__p, __n); }
    |   463: 
    |   464:       /**
      Source "/usr/include/c++/7/ext/new_allocator.h", line 125, in call_snarl [0x55d14f91b7ac]
        122:        return;
        123:      }
        124: #endif
      > 125:    ::operator delete(__p);
        126:       }
        127: 
        128:       size_type
#7    Object "/lib/x86_64-linux-gnu/libgcc_s.so.1", at 0x7fcf55ba5e94, in _Unwind_Resume
#6    Object "/lib/x86_64-linux-gnu/libgcc_s.so.1", at 0x7fcf55ba5612, in 
#5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7fcf5606e487, in __gxx_personality_v0
#4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7fcf5606db18, in 
#3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7fcf5606eab5, in 
#2    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7fcf56068956, in 
#1    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7fcf557e4800, in abort
      Source "/build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c", line 79, in abort [0x7fcf557e4800]
#0    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7fcf557e2e97, in raise
      Source "../sysdeps/unix/sysv/linux/raise.c", line 51, in raise [0x7fcf557e2e97]

@glennhickey
Copy link
Contributor Author

@ekg Trying a coverage we're more used to:

# downsample to 30x, and use more stringent filter
vg filter $graph.300x.gam -r 0.90 -fu -m 1 -q 15 -v -d 1.1 > $graph.30x.filter.gam

# augment (only takes a few seconds now)
vg augment $graph.X32.pg $graph.30x.filter.gam -A $graph.30x.filter.aug.gam -C > $graph.X32.aug.30X.pg

# our graph is somehwat less mangled
vg convert -v $graph.X32.aug.30X.pg | vg stats -NEl -
909671
1181198
length  5249626

# try calling again (runs in seconds)
vg pack -x $graph.X32.aug.30X.pg -g $graph.30x.filter.aug.gam -Q 15 -o $graph.30x.filter.aug.pack -t 5
vg snarls $graph.X32.aug.30X.pg > $graph.X32.aug.30X.snarls
vg call $graph.X32.aug.30X.pg -k $graph.30x.filter.aug.pack -r $graph.X32.aug.30X.snarls > $graph.30x.filter.aug.vcf

Seems to have worked

tail $graph.30x.filter.aug.vcf
GRCh37-MHC      4966598 155207_484295   A       C       7.5     PASS    DP=16   GT:DP:AD:XADL:XAAD      1/0:16:8,8:-0.513846:8
GRCh37-MHC      4966895 507596_507598   C       CTT,CT  9.5     PASS    DP=22   GT:DP:AD:XADL:XAAD      1/2:22:0,10,12:-0.303450:22
GRCh37-MHC      4966934 155217_507613   C       G       34      PASS    DP=34   GT:DP:AD:XADL:XAAD      1/1:34:0,34:.:34
GRCh37-MHC      4968165 155256_473481   G       T       9       PASS    DP=23   GT:DP:AD:XADL:XAAD      1/0:23:14,9:-0.110954:9
GRCh37-MHC      4968737 501903_501868   A       G       35      PASS    DP=35   GT:DP:AD:XADL:XAAD      1/1:35:0,35:.:35
GRCh37-MHC      4968983 482316_494814   AT      A       8       PASS    DP=17   GT:DP:AD:XADL:XAAD      0/1:17:9,8:-0.693147:8
GRCh37-MHC      4969267 155290_478186   C       A       4.5     PASS    DP=12   GT:DP:AD:XADL:XAAD      0/1:12:5,7:-0.215483:7
GRCh37-MHC      4970081 155316_504125   TTATTATTATTATTATTATTAT  T,TTATTAT       4.5     PASS    DP=13   GT:DP:AD:XADL:XAAD      1/2:13:0,5,8:-0.143204:13
GRCh37-MHC      4970448 465056_465058   AAAT    A       7.5     PASS    DP=18   GT:DP:AD:XADL:XAAD      1/0:18:10,8:-0.274886:8
GRCh37-MHC      4970467 155328_465064   G       A       5.5     PASS    DP=14   GT:DP:AD:XADL:XAAD      1/0:14:8,6:-0.238226:6

@ekg
Copy link
Member

ekg commented Sep 23, 2019

I guess that via downsampling we get the same kind of filtering effect with augment. It'd just be nicer if we could make that happen directly in augment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants