Minigraph-Cactus Pangenomes

These are some pangenomes created using Minigraph-Cactus, along with all steps used to produce them.

The commits used can be found in the logs in the Output links. Data is provided in HAL, vg, GFA and VCF formats. Please read about VCF output before using it.

Please cite the Minigraph-Cactus paper when using these pangenomes. For the HPRC pangenomes, please also cite the HPRC Paper.

Please Read First

  • These pangenomes are provided "as-is". Unless they have been published on, they are here merely as examples: we have used them for debugging and testing but have not necessarily spent much time validating them.
  • The Minigraph-Cactus Pangenome pipeline's interface has seen many changes (especially prior to publication of the method). The given commands may therefore not work with the latest version of Cactus. If you are having issues with the interface, try looking at the most recent example(s).
  • The 30-mouse-pg-2022-09-23 pangenome is nearly 40% Ns due, apparently, to gappy input assemblies. This should be taken into account when using it. How best to handle Ns is still an area of future investigation.
  • The commands given below (up to early 2023, when we switched to SLURM) to construct these graphs were run from Toil AWS leader instances. If you are not running on an AWS/EC2 cluster and want to reproduce the graphs, you will need to adapt the commands to your environment. To run locally, remove the AWS-specific options such as --batchSystem, --defaultPreemptable, --nodeType, --nodeStorage, --maxNodes, --betaInertia, --targetTime and --provisioner, and use a local (non-S3) jobstore (like ./jobstore).
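
As a rough illustration of the last point, the sketch below strips the AWS-specific options listed above from a command string and swaps an `aws:` jobstore for a local one. It is not part of Cactus or Toil: the helper name `strip_aws_options` is hypothetical, the split between options that take a value and bare flags is an assumption, and the example command is made up for illustration rather than copied from the commands files. Check the result against the options of your Toil/Cactus version before running it.

```python
import shlex

# AWS-specific options named in the note above. Which of them take a value
# and which are bare flags is an ASSUMPTION of this sketch.
VALUE_OPTIONS = {
    "--batchSystem", "--nodeType", "--nodeStorage", "--maxNodes",
    "--betaInertia", "--targetTime", "--provisioner",
}
FLAG_OPTIONS = {"--defaultPreemptable"}


def strip_aws_options(command, local_jobstore="./jobstore"):
    """Return `command` without the AWS-specific options and with any
    `aws:` jobstore argument replaced by a local path (hypothetical helper)."""
    kept = []
    skip_next = False
    for token in shlex.split(command):
        if skip_next:
            # This token is the value of an option that was just dropped.
            skip_next = False
            continue
        name = token.split("=", 1)[0]
        if name in FLAG_OPTIONS:
            continue
        if name in VALUE_OPTIONS:
            # "--opt value" carries its value in the next token,
            # "--opt=value" carries it in the same token.
            skip_next = "=" not in token
            continue
        if token.startswith("aws:"):
            token = local_jobstore
        kept.append(token)
    return shlex.join(kept)


if __name__ == "__main__":
    # Illustrative command only; the real ones are in the commands files.
    example = (
        "cactus-pangenome aws:us-west-2:example-jobstore seqfile.txt "
        "--batchSystem mesos --provisioner aws --defaultPreemptable "
        "--nodeType r5.8xlarge --maxNodes=10 --logFile out.log"
    )
    print(strip_aws_options(example))
    # prints: cactus-pangenome ./jobstore seqfile.txt --logFile out.log
```

The option sets are kept as plain data so they are easy to adjust if an option turns out to behave differently in your environment. Any other S3 paths in a command (for example output locations) are left untouched and would still need to be edited by hand.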

Links to example pangenomes and how to make them

| Name | Species | Date | Haplotypes | Reference | SeqFile | Commands | Output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Fruit fly | Drosophila melanogaster | 2023-08-25 | 16 | dm6 | 16-fly-pg-2023-08-25-seqfile.txt | 16-fly-pg-2023-08-25-commands.md | files |
| HPRC v1.1 | Homo sapiens | 2023-07-04 | 90 | GRCh38 | hprc-v1.1-mc.seqfile | hprc-v1.1-mc.md | index, files |
| HPRC v1.1 | Homo sapiens | 2023-07-04 | 90 | CHM13 | hprc-v1.1-mc.seqfile | hprc-v1.1-mc.md | index, files |
| Dog | Canis lupus familiaris | 2023-06-27 | 9 | canFam4 | 9-dog-pg-2023-06-27-seqfile.txt | 9-dog-pg-2023-06-27-commands.md | files |
| Chicken | Gallus gallus | 2023-06-27 | 10 | galGal6 | 10-chicken-pg-2023-06-27-seqfile.txt | 10-chicken-pg-2023-06-27-commands.md | files |
| GRCh38-alts | Homo sapiens | 2023-04-13 | 262 | GRCh38 | grch38-alts-pg-2023-04-13-seqfile.txt | grch38-alts-pg-2023-04-13-commands.md | files |
| Cow | Bos taurus | 2023-03-31 | 5 | bosTau9 | 5-cow-pg-2023-03-31-seqfile.txt | 5-cow-pg-2023-03-31-commands.md | files |
| Soybean | Glycine max | 2022-09-26 | 17 | Glycine_max_v4.0 | 17-soybean-pg-2022-09-26-seqfile.txt | 17-soybean-pg-2022-09-26-commands.md | files |
| Mouse | Mus musculus | 2022-09-23 | 30 | mm39 | 30-mouse-pg-2022-09-23-seqfile.txt | 30-mouse-pg-2022-09-23-commands.md | files |
| Chicken | Gallus gallus | 2022-09-23 | 10 | galGal6 | 10-chicken-pg-2022-09-23-seqfile.txt | 10-chicken-pg-2022-09-23-commands.md | files |
| Dog | Canis lupus familiaris | 2022-09-23 | 9 | canFam4 | 9-dog-pg-2022-09-23-seqfile.txt | 9-dog-pg-2022-09-23-commands.md | files |
| Cow | Bos taurus | 2022-09-22 | 5 | bosTau9 | 5-cow-pg-2022-09-22-seqfile.txt | 5-cow-pg-2022-09-22-commands.md | files |
| Fruit fly | Drosophila melanogaster | 2022-05-26 | 16 | dm6 | 16-fly-pg-2022-05-26-seqfile.txt | 16-fly-pg-2022-05-26-commands.md | files |
| HPRC v1.0 | Homo sapiens | 2021-08-11 | 90 | GRCh38 | see here | see here | see here |
| HPRC v1.0 | Homo sapiens | 2021-08-11 | 90 | CHM13 | see here | see here | see here |

Creating an HPRC seqfile and running the current pipeline on it is described here.