Commit

Merge pull request #3699 from vgteam/giraffe-readme
Add Giraffe to the README
adamnovak authored Jul 8, 2022
2 parents cdbe857 + c7e491f commit 2a029a2
Showing 1 changed file with 41 additions and 13 deletions.
54 changes: 41 additions & 13 deletions README.md
@@ -71,7 +71,7 @@ At present, you will need GCC version 4.9 or greater, with support for C++14, to

Other libraries may be required. Please report any build difficulties.

Note that a 64-bit OS is required. Ubuntu 18.04 should work.
Note that a 64-bit OS is required. Ubuntu 20.04 should work.

When you are ready, build with `. ./source_me.sh && make`, and run with `./bin/vg`.

@@ -189,29 +189,39 @@ Note that `vg` tools can generally read all supported graph formats (VG, uncompr

The format of a given graph file can be retrieved with `vg stats -F`.
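
As a quick illustration, the check might look like the sketch below. It assumes the `x.vg` example graph used elsewhere in this README is in the current directory; the guard is only there so the snippet does nothing when `vg` or the file is missing.

```sh
# x.vg is the example graph from this README; substitute your own file.
GRAPH=x.vg

if command -v vg >/dev/null 2>&1 && [ -f "$GRAPH" ]; then
    # -F reports the format of the given graph file
    vg stats -F "$GRAPH"
else
    echo "skipping: need vg on PATH and $GRAPH" >&2
fi
```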

### Alignment
### Mapping

As this is a small graph, you could align to it using a full-length partial order alignment:
If you have more than one sequence, or you are working on a large graph, you will want to map rather than merely aligning.

<!-- !test check Align a string to a graph -->
```sh
vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG x.vg
```
There are multiple read mappers in `vg`:

Note that you don't have to store the graph on disk at all, you can simply pipe it into the local aligner:
* `vg giraffe` is designed to be fast for highly accurate short reads, against graphs with haplotype information.
* `vg map` is a general-purpose read mapper.
* `vg mpmap` does "multi-path" mapping, to allow describing local alignment uncertainty. [This is useful for transcriptomics.](#Transcriptomic-analysis)

<!-- !test check Align a string to a piped graph -->
#### Mapping with `vg giraffe`

To use `vg giraffe` to map reads, you will first need to prepare indexes. This is best done using `vg autoindex`. In order to get `vg autoindex` to use haplotype information from a VCF file, you can give it the VCF and the associated linear reference directly.

<!-- !test check Simulate and map back with surjection with Giraffe -->
```sh
vg construct -r small/x.fa -v small/x.vcf.gz | vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG -
# construct the graph and indexes (paths below assume running from `vg/test` directory)
vg autoindex --workflow giraffe -r small/x.fa -v small/x.vcf.gz -p x

# simulate a bunch of 150bp reads from the graph, into a GAM file of reads aligned to a graph
vg sim -n 1000 -l 150 -x x.giraffe.gbz -a > x.sim.gam
# now re-map these reads against the graph, and get BAM output in linear space
# FASTQ input uses -f instead of -G.
vg giraffe -Z x.giraffe.gbz -G x.sim.gam -o BAM > aln.bam
```
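
For FASTQ input, the comment in the block above says to swap `-G` for `-f`. A minimal sketch of that variant follows. `reads.fq` is a hypothetical file name, not one shipped in `vg/test`, and the guard is only there so the snippet does nothing when `vg` or the input files are missing.

```sh
# x.giraffe.gbz is the index written by `vg autoindex` above.
# reads.fq is a hypothetical FASTQ file of short reads; substitute your own.
GBZ=x.giraffe.gbz
READS=reads.fq

if command -v vg >/dev/null 2>&1 && [ -f "$GBZ" ] && [ -f "$READS" ]; then
    # -f takes FASTQ input where -G took GAM input; output is still BAM
    vg giraffe -Z "$GBZ" -f "$READS" -o BAM > aln.bam
else
    echo "skipping: need vg on PATH plus $GBZ and $READS" >&2
fi
```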

Most commands allow the streaming of graphs into and out of `vg`.
[More information on using `vg giraffe` can be found on the `vg` wiki.](https://github.com/vgteam/vg/wiki/Mapping-short-reads-with-Giraffe)

### Mapping
#### Mapping with `vg map`

If your graph is large, you want to use `vg index` to store the graph and `vg map` to align reads. `vg map` implements a kmer-based seed-and-extend alignment model similar to that used in aligners like novoalign or MOSAIK. First, an on-disk index is built with `vg index`, which includes the graph itself and kmers of a particular size. When mapping, any kmer size shorter than that used in the index can be employed, and by default the mapper will decrease the kmer size to increase sensitivity when alignment at a particular _k_ fails.

<!-- !test check Simulate and map back with surjection -->
<!-- !test check Simulate and map back with surjection with map -->
```sh
# construct the graph (paths below assume running from `vg/test` directory)
vg construct -r small/x.fa -v small/x.vcf.gz > x.vg
@@ -381,6 +391,24 @@ vg mpmap -n rna -t 4 -x vg_rna.spliced.xg -g vg_rna.spliced.gcsa -d vg_rna.splic

This will produce alignments in the multipath format. For more information on the multipath alignment format and `vg mpmap` see [wiki page on mpmap](https://github.com/vgteam/vg/wiki/Multipath-alignments-and-vg-mpmap). Running the two commands on the small example data using 4 threads should on most machines take less than a minute.

### Alignment

If you have a small graph, you can align a sequence to the whole graph, using a full-length partial order alignment:

<!-- !test check Align a string to a graph -->
```sh
vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG x.vg
```

Note that you don't have to store the graph on disk at all; you can simply pipe it into the local aligner:

<!-- !test check Align a string to a piped graph -->
```sh
vg construct -r small/x.fa -v small/x.vcf.gz | vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG -
```

Most commands allow the streaming of graphs into and out of `vg`.

### Command line interface

A variety of commands are available:

1 comment on commit 2a029a2

@adamnovak
Member Author

vg CI tests complete for merge to master. View the full report here.

16 tests passed, 0 tests failed and 0 tests skipped in 12783 seconds