Whole genome construction, indexing and GBWT

Glenn Hickey edited this page Mar 1, 2018 · 13 revisions
# Set your AWS availability zone
export TOIL_AWS_ZONE="us-west-2a"
# This will be used below for the Toil Jobstore.  "jobstore" can be replaced with any name
export TOIL_JOBSTORE="aws:us-west-2:jobstore"
# All the output will be put in this S3 bucket.  "outstore" can be replaced with any name
export TOIL_OUTSTORE="aws:us-west-2:outstore"
# This must be a valid AWS keypair, with keys appropriately set up on your computer
export KEYPAIR_NAME=my_keypair_name
# Name of the cluster leader node. It can be any name
export LEADER=leader

# Create a leader node (assumes the current directory is toil-vg/, as cloned with: git clone https://github.com/vgteam/toil-vg.git)
./scripts/create-ec2-leader.sh "$LEADER" "$KEYPAIR_NAME"
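
The exports above are easy to mistype, so it can help to check them before launching anything. The following is a minimal sketch, not part of toil-vg: `check_toil_env` is a hypothetical helper name, and it assumes that the job store and output store follow the `aws:<region>:<name>` form shown above and that they should live in the region of `TOIL_AWS_ZONE` (taken here to be the zone name minus its trailing letter).

```shell
# Hypothetical helper, not part of toil-vg: sanity-check the variables exported above.
check_toil_env() {
    # Every variable used by the commands on this page must be non-empty.
    for name in TOIL_AWS_ZONE TOIL_JOBSTORE TOIL_OUTSTORE KEYPAIR_NAME LEADER; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            return 1
        fi
    done
    # Assumption: a zone is its region plus one trailing letter (us-west-2a -> us-west-2).
    region="${TOIL_AWS_ZONE%?}"
    for name in TOIL_JOBSTORE TOIL_OUTSTORE; do
        eval "value=\${$name}"
        case "$value" in
            "aws:${region}:"?*) ;;
            *)
                echo "$name should look like aws:${region}:<name>, got: $value" >&2
                return 1
                ;;
        esac
    done
    return 0
}
```

Running `check_toil_env && ./scripts/create-ec2-leader.sh "$LEADER" "$KEYPAIR_NAME"` would then only create the leader when the check passes.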

Construct a set of whole genome graphs (snp1kg = built from the 1000 Genomes VCF, primary = flat graph with no variants) and all their indexes (GCSA, xg and GBWT):

./scripts/construct-hs37d5-ec2.py "$LEADER" "$TOIL_JOBSTORE" "$TOIL_OUTSTORE" --gcsa --xg --gbwt --primary --node i3.8xlarge:0.85
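
Once the workflow finishes, the output should be in the S3 bucket named by `TOIL_OUTSTORE`. A small sketch for locating it, assuming the AWS CLI is installed and that the part of the locator after the last colon is the bucket name (`outstore_bucket` is a hypothetical helper name, not something toil-vg provides):

```shell
# Hypothetical helper: print the bucket part of an aws:<region>:<bucket> locator.
outstore_bucket() {
    # Strip everything up to and including the last colon.
    printf '%s\n' "${1##*:}"
}

# Usage (needs AWS credentials, so it is left commented out here):
# aws s3 ls "s3://$(outstore_bucket "$TOIL_OUTSTORE")/"
```

With the values used on this page, the helper prints `outstore`, so the listing command would inspect `s3://outstore/`.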