Thanks for the advice. I'll look into modifying my QC criteria and finding a more suitable k-mer range. I did run Kchooser4 from the kSNP4 package on my cohort, which reported an optimal k-mer length of 23; however, even with that parameter, the kSNP4 pipeline produced a tree with low confidence. As for ANI, I ran fastANI on my dataset, but no strain meets the ANI threshold when compared with my strain: all other strains of importance show ANI values <90%, which I think encapsulates my core problem.
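To make the ANI comparison concrete, here is a minimal sketch of ranking fastANI hits for one query genome. It assumes fastANI's standard tab-separated output (query, reference, ANI, mapped fragments, total fragments) and a 95% species cutoff, which is a common convention rather than a value stated in this thread. The function names and the example rows are hypothetical placeholders, not real results.

```python
import csv
import io

# Assumption: 95% ANI as the species boundary (a common convention,
# not a value taken from this thread).
SPECIES_ANI_CUTOFF = 95.0


def closest_hits(fastani_lines, query, top_n=5):
    """Return up to top_n (reference, ANI) pairs for `query`, highest ANI first."""
    hits = []
    for row in csv.reader(fastani_lines, delimiter="\t"):
        # fastANI columns: query, reference, ANI, mapped fragments, total fragments
        if len(row) < 3 or row[0] != query or row[1] == query:
            continue  # skip malformed rows, other queries, and self-comparisons
        hits.append((row[1], float(row[2])))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_n]


def above_cutoff(hits, cutoff=SPECIES_ANI_CUTOFF):
    """Keep only hits at or above the species-level cutoff."""
    return [hit for hit in hits if hit[1] >= cutoff]


# Placeholder rows for illustration only; these are not real results.
example = io.StringIO(
    "my_strain.fna\tref_A.fna\t89.1\t1400\t2100\n"
    "my_strain.fna\tref_B.fna\t84.7\t1100\t2100\n"
    "my_strain.fna\tmy_strain.fna\t100.0\t2100\t2100\n"
    "ref_A.fna\tref_B.fna\t96.2\t1800\t2000\n"
)
hits = closest_hits(example, "my_strain.fna")
print(hits)                # nearest references, best first
print(above_cutoff(hits))  # empty when nothing reaches the cutoff
```

An empty result from `above_cutoff` corresponds to the situation described above, where no reference reaches the species threshold.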
I was hoping to ask a question about a draft Nocardiopsis spp. model I've built using PopPUNK. I work in the Natural Products space, primarily with phylum Actinomycetota, a group of bacteria that can be incredibly genetically diverse even at the species level. I am working with a novel Nocardiopsis species and was using PopPUNK to try to determine its taxonomic position relative to the Nocardiopsis genomes available via NCBI. Because my strain is a new species (confirmed chemotaxonomically), I need a tool that is appropriate for mixed-species taxonomic characterization. I started with PopPUNK because the 2019 publication highlighted its utility in classifying multi-species cohorts of bacterial pathogens, and also because I'd like to use core + accessory genomes to build a phylogeny. My question is this: is PopPUNK appropriate for a diverse genus like Nocardiopsis, given the following results?
I've attached all files generated from the database creation here as a .tar.gz file.
Reference Nocardiopsis genomes were downloaded from NCBI using Datasets. 157 genomes (either complete or draft) were used.
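For context, PopPUNK's `--r-files` input is a tab-separated list with a sample name and a sequence path on each line. A minimal sketch of building such a list from a directory of downloaded FASTA files might look like this. The helper name `write_rfile`, the `*.fna` pattern, and the rule of using the file stem as the sample name are all assumptions made for illustration, not details from the original workflow.

```python
from pathlib import Path


def write_rfile(genome_dir, out_path, pattern="*.fna"):
    """Write one 'name<TAB>path' line per FASTA file found under genome_dir.

    Returns the number of genomes listed.
    """
    seen = set()
    lines = []
    for fasta in sorted(Path(genome_dir).rglob(pattern)):
        name = fasta.stem  # hypothetical naming rule: file name without extension
        if name in seen:
            raise ValueError(f"duplicate sample name: {name}")
        seen.add(name)
        lines.append(f"{name}\t{fasta.resolve()}")
    # One line per genome, each terminated by a newline.
    Path(out_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

Calling something like `write_rfile("ncbi_dataset/data", "ref.txt")` would then list every `.fna` file below that directory; the directory name here is only an example of where the downloaded genomes might live.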
The following workflow was used to generate this first database:
Sketch:
poppunk --create-db --output nocardiopsis_ref --r-files ref.txt --min-k 13 --max-k 29
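As background on why the k-mer range matters: as I understand the PopPUNK paper, the probability that a k-mer matches between two genomes is modelled as (1 - a)(1 - π)^k, where π is the core distance and a the accessory distance, and both are estimated from a straight-line fit of log(Jaccard) against k. The sketch below illustrates that regression on synthetic values. It is not PopPUNK's own code, the function names are hypothetical, and the step of 4 between k values is an assumption.

```python
import math


def expected_jaccard(core, accessory, k):
    """Assumed model: k-mer match probability (1 - a) * (1 - pi)^k."""
    return (1.0 - accessory) * (1.0 - core) ** k


def fit_core_accessory(ks, jaccards):
    """Least-squares line through (k, log J); returns (core, accessory).

    slope = log(1 - pi), intercept = log(1 - a) under the assumed model.
    """
    ys = [math.log(j) for j in jaccards]
    n = len(ks)
    mean_k = sum(ks) / n
    mean_y = sum(ys) / n
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    sxx = sum((k - mean_k) ** 2 for k in ks)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    return 1.0 - math.exp(slope), 1.0 - math.exp(intercept)


# k values spanning --min-k 13 to --max-k 29; the step of 4 is an assumption.
ks = list(range(13, 30, 4))
# Synthetic similarities generated from known distances, for illustration only.
synthetic = [expected_jaccard(0.10, 0.30, k) for k in ks]
core, accessory = fit_core_accessory(ks, synthetic)
print(round(core, 4), round(accessory, 4))
```

Under this model, a pair with a core distance of about 0.1 would share only a small fraction of its k-mers at k = 29 (0.9^29 is a little under 0.05), which is one reason the upper end of the range deserves attention for divergent genomes.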
QC database:
poppunk --qc-db --ref-db nocardiopsis_ref --qc-keep
This resulted in 128 files failing QC for various reasons, mostly for failing distance.
Fit model:
poppunk --fit-model bgmm --ref-db nocardiopsis_ref --output nocardiopsis_ref
Fit summary:
Avg. entropy of assignment 0.0160
Number of components used 2
Scaled component means:
[0.69171908 0.07866528]
[0.36643846 0.3914161 ]
Network summary:
Components 1
Density 0.1921
Transitivity 0.5733
Mean betweenness 0.1924
Weighted-mean betweenness 0.1924
Score 0.4631
Score (w/ betweenness) 0.3740
Score (w/ weighted-betweenness) 0.3740
Removing 136 sequences
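For reference, the network scores reported above appear to follow from the other summary statistics. The sketch below assumes the score is transitivity multiplied by (1 - density), and that the betweenness variant multiplies that score by (1 - mean betweenness). This is my reading of the PopPUNK documentation rather than something confirmed in this thread, and the function names are hypothetical.

```python
def network_score(transitivity, density):
    """Assumed definition: transitivity * (1 - density)."""
    return transitivity * (1.0 - density)


def betweenness_adjusted(score, betweenness):
    """Assumed definition: score * (1 - mean betweenness)."""
    return score * (1.0 - betweenness)


# Values copied from the network summary above.
score = network_score(transitivity=0.5733, density=0.1921)
adjusted = betweenness_adjusted(score, betweenness=0.1924)
print(f"{score:.4f} {adjusted:.4f}")
```

With the rounded inputs from the summary, both values agree with the reported 0.4631 and 0.3740 to within rounding, so the assumed formulas look consistent with the output.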
Thanks!
nocardiopsis_ref.tar.gz