it would be nice to be able to decontaminate GTDB itself, but one of the problems we face there is that charcoal doesn't work well when the exact genome being checked is itself in the reference database. this is because the first filtering step uses gather to search the reference database, and in that situation gather will return precisely 1 match: the genome itself, which already explains all of its own hashes.
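to see why gather collapses to a single match, here is a toy sketch of gather-style greedy min-set-cover on plain Python sets of hashes. it is a simplification, not sourmash's implementation or API: `toy_gather` and the genome names are hypothetical, and real sketches are FracMinHash hash sets with thresholds that this ignores.

```python
from typing import Dict, List, Set, Tuple


def toy_gather(query: Set[int], database: Dict[str, Set[int]]) -> List[Tuple[str, int]]:
    """Greedy min-set-cover in the spirit of gather (hypothetical, simplified).

    Returns (reference name, number of newly explained query hashes) per step.
    """
    remaining = set(query)
    matches = []
    while remaining:
        # pick the reference that explains the most still-unexplained hashes
        best_name, best_overlap = None, 0
        for name, hashes in database.items():
            overlap = len(remaining & hashes)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is None:
            break  # nothing in the database explains the remaining hashes
        matches.append((best_name, best_overlap))
        remaining -= database[best_name]
    return matches


query = set(range(1, 11))
db = {
    "query_genome": set(range(1, 11)),      # the exact genome is in the database
    "relative": set(range(1, 8)) | {20},
    "contaminant_source": {8, 9, 10, 11},
}
print(toy_gather(query, db))  # [('query_genome', 10)]

del db["query_genome"]
print(toy_gather(query, db))  # [('relative', 7), ('contaminant_source', 3)]
```

with the exact genome present, the first greedy step explains every hash and the loop ends, so there is nothing left for the downstream taxonomy checks to flag.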
so the question is, how can we deal with this? two ideas:
- allow the initial gather to be a search instead, and then tell just_taxonomy.py to ignore exact matches internally.
- find ways to mask or temporarily eliminate specific signatures from the databases.
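a minimal sketch of both ideas, again on toy sets of hashes rather than real sourmash signatures. `search_ignoring_exact`, `masked`, the 0.1 Jaccard threshold, and treating an entry as "exact" when it has the query's name or an identical hash set are all assumptions for illustration; this is not how just_taxonomy.py actually works.

```python
from typing import Dict, List, Set, Tuple

Database = Dict[str, Set[int]]  # hypothetical: genome name -> set of sketch hashes


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two hash sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search_ignoring_exact(query_name: str, query: Set[int], database: Database,
                          threshold: float = 0.1) -> List[Tuple[float, str]]:
    """Idea 1: similarity search instead of gather, skipping exact matches.

    An entry counts as 'exact' if it has the query's own name or an
    identical hash set (e.g. the same genome under another accession).
    """
    results = []
    for name, hashes in database.items():
        if name == query_name or hashes == query:
            continue  # ignore the genome being decontaminated
        sim = jaccard(query, hashes)
        if sim >= threshold:
            results.append((sim, name))
    results.sort(reverse=True)  # most similar first
    return results


def masked(database: Database, exclude: Set[str]) -> Database:
    """Idea 2: a copy of the database with specific signatures masked out."""
    return {name: hashes for name, hashes in database.items() if name not in exclude}


query = set(range(1, 11))
db = {
    "query_genome": set(range(1, 11)),
    "relative": set(range(1, 8)) | {20},
    "contaminant_source": {8, 9, 10, 11},
}
print(search_ignoring_exact("query_genome", query, db))
print(sorted(masked(db, {"query_genome"})))  # ['contaminant_source', 'relative']
```

the first approach keeps the database untouched and filters per query, while masking lets the existing gather step run unchanged against a reduced database; which is cheaper at GTDB scale is an open question.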
on GTDB 25k, which is really nice and non-redundant, it should be straightforward (if moderately expensive) to do the search, so maybe we should start there.
ref sourmash-bio/sourmash#849