[wip] Clean up vg augment #2474

glennhickey · 2019-09-20T02:00:06Z

Switch everything to handle graph code.

Will clean out all support for the old pileup logic in this pr (

glennhickey · 2019-09-20T14:42:53Z

@ekg I just tried this branch successfully on one of your test files as follows

export graph=MHC_37
# quality filter
vg filter $graph.300x.gam -q 15 -t 32 -v > $graph.300x.q15.gam
#Total Filtered:                132353 / 9719544

# covert to packed graph (takes 2 seconds)
vg convert $graph.X32.vg -p > $graph.X32.pg

# augment the packed graph (takes 37 minutes, 2.6G RAM)
vg augment $graph.X32.pg $graph.300x.q15.gam -A $graph.300x.q15.aug.gam -C > $graph.X32.aug.pg

# compare the two graphs
 vg stats -NEl $graph.X32.vg
155330
155329
length  4970557
vg convert $graph.X32.aug.pg -v | vg stats - -NEl
5758895
8389027
length  7724570

As discussed, there are probably too many read errors in the augmented graph. I'll be looking at ways to get around this via either vg filter or vg augment options soon.

ekg · 2019-09-20T14:54:06Z

Awesome! Now I just have get call to work on cyclic graphs. Can we go right from GFA to pg format?

…

On Fri, Sep 20, 2019, 16:42 Glenn Hickey ***@***.***> wrote: @ekg <https://github.com/ekg> I just tried this branch successfully on one of your test files as follows export graph=MHC_37 # quality filter vg filter $graph.300x.gam -q 15 -t 32 -v > $graph.300x.q15.gam #Total Filtered: 132353 / 9719544 # covert to packed graph (takes 2 seconds) vg convert $graph.X32.vg -p > $graph.X32.pg # augment the packed graph (takes 37 minutes, 2.6G RAM) vg augment $graph.X32.pg $graph.300x.q15.gam -A $graph.300x.q15.aug.gam -C > $graph.X32.aug.pg # compare the two graphs vg stats -NEl $graph.X32.vg 155330 155329 length 4970557 vg convert $graph.X32.aug.pg -v | vg stats - -NEl 5758895 8389027 length 7724570 As discussed, there are probably too many read errors in the augmented graph. I'll be looking at ways to get around this via either vg filter or vg augment options soon. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2474?email_source=notifications&email_token=AABDQEL3UKY4SW5JRFRYJHDQKTOO5A5CNFSM4IYRZIHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7G5B6Q#issuecomment-533582074>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AABDQEN3JC6TFN5J2GSW2KTQKTOO5ANCNFSM4IYRZIHA> .

glennhickey · 2019-09-20T15:12:45Z

Continuing the above example:

# compute coverage (3m45s , 1.4G RAM)
vg pack -x $graph.X32.aug.pg -g  $graph.300x.q15.aug.gam -o  $graph.300x.q15.aug.pack -Q 15 -t 5

# compute snarls (2m30s, 5.4G RAM)
vg snarls $graph.X32.aug.pg > $graph.X32.aug.snarls

# make a vcf (dies after 1m20s, 2G RAM)
vg call $graph.X32.aug.pg -k $graph.300x.q15.aug.pack -r $graph.X32.aug.snarls > $graph.300x.q15.aug.vcf

The crash doesn't seem obviously related to a cycle in the reference path (though I wouldn't rule it out). The stack doesn't contain anything useful. I'll try the debugger. But when augmentation increases the node count by 40X (and average node length down to 1.3bp), I think you gotta expect trouble (good stress test though)

Crash report for vg v1.18.0-450-g8d7b74726 "Zungoli"
Stack trace (most recent call last) in thread 82464:
#14   Object "", at 0xffffffffffffffff, in 
#13   Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7fcf558c588e, in __clone
      Source "../sysdeps/unix/sysv/linux/x86_64/clone.S", line 95, in __clone [0x7fcf558c588e]
#12   Object "/lib/x86_64-linux-gnu/libpthread-2.27.so", at 0x7fcf581dd6da, in start_thread
      Source "/build/glibc-OTsEL5/glibc-2.27/nptl/pthread_create.c", line 463, in start_thread [0x7fcf581dd6da]
#11   Object "/usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0", at 0x7fcf55dc395d, in 
#10   Object "/data1/vg/bin/vg", at 0x55d14f8ab913, in vg::SnarlManager::for_each_top_level_snarl_parallel(std::function<void (vg::Snarl const*)> const&) const [clone ._omp_fn.0]
    | Source "src/snarls.cpp", line 709, in operator()
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in _ZNK2vg12SnarlManager33for_each_top_level_snarl_parallelERKSt8functionIFvPKNS_5SnarlEEE._omp_fn.0 [0x55d14f8ab913]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#9    Object "/data1/vg/bin/vg", at 0x55d14f9145f7, in std::_Function_handler<void (vg::Snarl const*), vg::GraphCaller::call_top_level_snarls(bool)::{lambda(vg::Snarl const*)#1}>::_M_invoke(std::_Any_data const&, vg::Snarl const*&&)
    | Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
    |   314:       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
    |   315:       {
    | > 316:    (*_Base::_M_get_pointer(__functor))(
    |   317:        std::forward<_ArgTypes>(__args)...);
    |   318:       }
      Source "src/graph_caller.cpp", line 22, in _M_invoke [0x55d14f9145f7]
#8    Object "/data1/vg/bin/vg", at 0x55d14f91b7ac, in vg::LegacyCaller::call_snarl(vg::Snarl const&)
    | Source "src/graph_caller.cpp", line 323, in ~vector
    | Source "/usr/include/c++/7/bits/stl_vector.h", line 435, in ~_Vector_base
    |   433:       ~vector() _GLIBCXX_NOEXCEPT
    |   434:       { std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
    | > 435:                  _M_get_Tp_allocator()); }
    |   436: 
    |   437:       /**
    | Source "/usr/include/c++/7/bits/stl_vector.h", line 162, in _M_deallocate
    |   161:       ~_Vector_base() _GLIBCXX_NOEXCEPT
    | > 162:       { _M_deallocate(this->_M_impl._M_start, this->_M_impl._M_end_of_storage
    |   163:                  - this->_M_impl._M_start); }
    | Source "/usr/include/c++/7/bits/stl_vector.h", line 180, in deallocate
    |   178:    typedef __gnu_cxx::__alloc_traits<_Tp_alloc_type> _Tr;
    |   179:    if (__p)
    | > 180:      _Tr::deallocate(_M_impl, __p, __n);
    |   181:       }
    | Source "/usr/include/c++/7/bits/alloc_traits.h", line 462, in deallocate
    |   460:       static void
    |   461:       deallocate(allocator_type& __a, pointer __p, size_type __n)
    | > 462:       { __a.deallocate(__p, __n); }
    |   463: 
    |   464:       /**
      Source "/usr/include/c++/7/ext/new_allocator.h", line 125, in call_snarl [0x55d14f91b7ac]
        122:        return;
        123:      }
        124: #endif
      > 125:    ::operator delete(__p);
        126:       }
        127: 
        128:       size_type
#7    Object "/lib/x86_64-linux-gnu/libgcc_s.so.1", at 0x7fcf55ba5e94, in _Unwind_Resume
#6    Object "/lib/x86_64-linux-gnu/libgcc_s.so.1", at 0x7fcf55ba5612, in 
#5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7fcf5606e487, in __gxx_personality_v0
#4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7fcf5606db18, in 
#3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7fcf5606eab5, in 
#2    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7fcf56068956, in 
#1    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7fcf557e4800, in abort
      Source "/build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c", line 79, in abort [0x7fcf557e4800]
#0    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7fcf557e2e97, in raise
      Source "../sysdeps/unix/sysv/linux/raise.c", line 51, in raise [0x7fcf557e2e97]

glennhickey · 2019-09-20T15:33:33Z

@ekg Trying a coverage we're more used to:

# downsample to 30x, and use more stringent filter
vg filter $graph.300x.gam -r 0.90 -fu -m 1 -q 15 -v -d 1.1 > $graph.30x.filter.gam

# augment (only takes a few seconds now)
vg augment $graph.X32.pg $graph.30x.filter.gam -A $graph.30x.filter.aug.gam -C > $graph.X32.aug.30X.pg

# our graph is somehwat less mangled
vg convert -v $graph.X32.aug.30X.pg | vg stats -NEl -
909671
1181198
length  5249626

# try calling again (runs in seconds)
vg pack -x $graph.X32.aug.30X.pg -g $graph.30x.filter.aug.gam -Q 15 -o $graph.30x.filter.aug.pack -t 5
vg snarls $graph.X32.aug.30X.pg > $graph.X32.aug.30X.snarls
vg call $graph.X32.aug.30X.pg -k $graph.30x.filter.aug.pack -r $graph.X32.aug.30X.snarls > $graph.30x.filter.aug.vcf

Seems to have worked

tail $graph.30x.filter.aug.vcf
GRCh37-MHC      4966598 155207_484295   A       C       7.5     PASS    DP=16   GT:DP:AD:XADL:XAAD      1/0:16:8,8:-0.513846:8
GRCh37-MHC      4966895 507596_507598   C       CTT,CT  9.5     PASS    DP=22   GT:DP:AD:XADL:XAAD      1/2:22:0,10,12:-0.303450:22
GRCh37-MHC      4966934 155217_507613   C       G       34      PASS    DP=34   GT:DP:AD:XADL:XAAD      1/1:34:0,34:.:34
GRCh37-MHC      4968165 155256_473481   G       T       9       PASS    DP=23   GT:DP:AD:XADL:XAAD      1/0:23:14,9:-0.110954:9
GRCh37-MHC      4968737 501903_501868   A       G       35      PASS    DP=35   GT:DP:AD:XADL:XAAD      1/1:35:0,35:.:35
GRCh37-MHC      4968983 482316_494814   AT      A       8       PASS    DP=17   GT:DP:AD:XADL:XAAD      0/1:17:9,8:-0.693147:8
GRCh37-MHC      4969267 155290_478186   C       A       4.5     PASS    DP=12   GT:DP:AD:XADL:XAAD      0/1:12:5,7:-0.215483:7
GRCh37-MHC      4970081 155316_504125   TTATTATTATTATTATTATTAT  T,TTATTAT       4.5     PASS    DP=13   GT:DP:AD:XADL:XAAD      1/2:13:0,5,8:-0.143204:13
GRCh37-MHC      4970448 465056_465058   AAAT    A       7.5     PASS    DP=18   GT:DP:AD:XADL:XAAD      1/0:18:10,8:-0.274886:8
GRCh37-MHC      4970467 155328_465064   G       A       5.5     PASS    DP=14   GT:DP:AD:XADL:XAAD      1/0:14:8,6:-0.238226:6

ekg · 2019-09-23T12:05:03Z

I guess that via downsampling we get the same kind of filtering effect with augment. It'd just be nicer if we could make that happen directly in augment.

glennhickey added 3 commits September 19, 2019 21:56

handlify augment_main and remove pileup mode

0cebdb0

fix up tests in light of augment cleanup

6f80ff7

clear out old calling and pileup code

8d7b747

glennhickey merged commit ed86372 into master Sep 20, 2019

glennhickey mentioned this pull request Sep 20, 2019

Better support for high coverage in augment #2475

Open

RenzoTale88 mentioned this pull request Nov 25, 2019

Issues with vg call on graph generated from cactus #2546

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[wip] Clean up vg augment #2474

[wip] Clean up vg augment #2474

glennhickey commented Sep 20, 2019

glennhickey commented Sep 20, 2019

ekg commented Sep 20, 2019 via email

glennhickey commented Sep 20, 2019 •

edited

Loading

glennhickey commented Sep 20, 2019

ekg commented Sep 23, 2019

[wip] Clean up vg augment #2474

[wip] Clean up vg augment #2474

Conversation

glennhickey commented Sep 20, 2019

glennhickey commented Sep 20, 2019

ekg commented Sep 20, 2019 via email

glennhickey commented Sep 20, 2019 • edited Loading

glennhickey commented Sep 20, 2019

ekg commented Sep 23, 2019

glennhickey commented Sep 20, 2019 •

edited

Loading