Hatchet
Hatchet is a tool used to split the GTDB-Tk reference tree into smaller subtrees, which reduces the memory footprint of GTDB-Tk.
Hatchet is an internal tool and may break outside of ACE.
This step reduces the number of genomes in the reference tree to one genome per rank of interest.
hatchet pick --domain bac -r f --tree release89/pplacer/gtdb_r89_bac120.refpkg/bac120_r89_unroot.pplacer.tree --msa release89/pplacer/gtdb_r89_bac120.refpkg/bac120_msa_r89.faa --taxonomy release89/taxonomy/gtdb_taxonomy.tsv --red_file release89/mrca_red/gtdbtk_r89_bac120.tsv --output_dir output
hatchet pick generates a shell script, 'pick_one_genome.sh', that calls several third-party tools.
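To make "one genome per rank of interest" concrete, here is a minimal Python sketch of the idea. It is not Hatchet's actual selection logic, which this page does not describe. It assumes the taxonomy file uses the usual GTDB layout (genome ID, a tab, then a semicolon-separated lineage with prefixes such as f__), and it assumes that -r f refers to the family rank. The function name, the genome IDs, and the "keep the first genome seen" rule are all hypothetical.

```python
# Illustrative sketch only: keeps the first genome encountered for each taxon
# at the chosen rank. Hatchet's real selection criteria may differ.

# Assumed mapping from a one-letter rank code to the GTDB lineage prefix.
RANK_PREFIXES = {"p": "p__", "c": "c__", "o": "o__", "f": "f__", "g": "g__", "s": "s__"}


def pick_one_per_rank(taxonomy_lines, rank="f"):
    """Return {taxon: genome_id}, keeping one genome per taxon at `rank`."""
    prefix = RANK_PREFIXES[rank]
    picked = {}
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue
        genome_id, lineage = line.split("\t")
        taxa = [t.strip() for t in lineage.split(";")]
        # Find the lineage entry for the requested rank, e.g. "f__Bacillaceae".
        taxon = next((t for t in taxa if t.startswith(prefix)), None)
        if taxon is None or taxon == prefix:
            continue  # rank missing or unnamed in this lineage
        # setdefault keeps the first genome seen for each taxon.
        picked.setdefault(taxon, genome_id)
    return picked


# Hypothetical genome IDs; the lineages only illustrate the format.
EXAMPLE = [
    "genome_A\td__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli",
    "genome_B\td__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella enterica",
    "genome_C\td__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__Bacillus subtilis",
]

if __name__ == "__main__":
    for taxon, genome_id in pick_one_per_rank(EXAMPLE, rank="f").items():
        print(taxon, genome_id)
```

In this sketch genome_A and genome_B share a family, so only genome_A is kept for it. With rank="g", all three genomes would be kept.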
Once the pruned tree is generated, the red file must be recreated for the nodes that remain in it.
hatchet red --raw_tree ../release89/pplacer/gtdb_r89_bac120.refpkg/bac120_r89_unroot.pplacer.tree --pruned_tree gtdb_pruned.tree --red_file ../release89/mrca_red/gtdbtk_r89_bac120.tsv --output new_red_file.tsv
hatchet hatchet_wf -d bac -t gtdb_r207_bac120_decorated_fullids.tree --msa /srv/projects/gtdbtk/test_for_ms/benchmark_time_r207/tk_package/pplacer/gtdb_r207_bac120.refpkg/gtdb_r207_bac120_concatenated_gtdb_headers.faa --tax ../../taxonomy/bac120_taxonomy_r207_reps.tsv -o split/hatchet_wf_use_original_log --red_file phylorank_outliers/gtdb_r207_bac120_decorated_fullids.node_rd.tsv --original_log gtdb_r207_bac120_fasttree.log