
thoughts about leveraging database covers for biology #11

Open
taylorreiter opened this issue Mar 28, 2022 · 0 comments

@taylorreiter
Member

So an immediate drawback of using covers to build databases is that we lose strain-level identification: there is no longer a guarantee that the best-matching strain will be returned.

But a great side benefit is that if a species occurs in multiple biomes, its first match in each biome will always be the same sketch: whichever sketch for that species the database cover encountered first when it was built. Any additional matches then represent chunks of sequence that are not in that original sketch but are still present in the environment.
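To make the ordering effect concrete, here is a minimal Python sketch. It assumes a cover is built greedily: sketches are visited in build order and each one keeps only the hashes not already claimed by an earlier sketch. Sketches are modeled as plain sets of integer hashes; the genome names and hash values are made up for illustration, and `build_cover` is a hypothetical helper, not part of sourmash.

```python
from typing import Dict, List, Set, Tuple


def build_cover(sketches: List[Tuple[str, Set[int]]]) -> Dict[str, Set[int]]:
    """Greedy cover (assumed model): each sketch keeps only hashes not seen earlier."""
    seen: Set[int] = set()
    cover: Dict[str, Set[int]] = {}
    for name, hashes in sketches:
        novel = hashes - seen
        if novel:  # a sketch with nothing new drops out of the cover entirely
            cover[name] = novel
        seen |= hashes
    return cover


# Hypothetical toy hashes: K-12 is added first, so it claims the shared core.
sketches = [
    ("E. coli K-12", {1, 2, 3, 4, 5, 6}),
    ("E. coli EHEC", {1, 2, 3, 4, 7, 8}),
    ("E. coli MX", {1, 2, 3, 5, 9}),
]
cover = build_cover(sketches)
for name, hashes in cover.items():
    print(name, sorted(hashes))
```

In this toy model any sample containing E. coli matches the K-12 entry first, because it holds the shared core; the EHEC and MX entries only match if their extra hashes are present. The same structure shows why the strain-level guarantee goes away: an EHEC genome shares 4 hashes with the K-12 entry but only 2 with its own.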

What this might allow us to do is look at the species that are present across biomes, but look for specific matches that are only present in one biome. What I'm imagining is something like this:

- E. coli K-12 is in both the gut and soil.
- E. coli EHEC is only in the gut.
- E. coli MX is only in soil.
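One way to pull out those biome-specific matches, sketched in Python: assume each biome's search results have already been reduced to a set of matched cover-entry names, and that a hypothetical `species_of` lookup maps each entry to its species. Species found in both biomes are the intersection of the two species sets; for each shared species, the entries matched in only one biome are the candidates.

```python
from typing import Dict, Set


def biome_specific_matches(
    matches_a: Set[str], matches_b: Set[str], species_of: Dict[str, str]
) -> Dict[str, Dict[str, Set[str]]]:
    """For species seen in both biomes, return cover entries matched in only one."""
    species_a = {species_of[m] for m in matches_a}
    species_b = {species_of[m] for m in matches_b}
    result: Dict[str, Dict[str, Set[str]]] = {}
    for sp in species_a & species_b:
        # Entries of this species matched in A but not B, and vice versa.
        only_a = {m for m in matches_a - matches_b if species_of[m] == sp}
        only_b = {m for m in matches_b - matches_a if species_of[m] == sp}
        result[sp] = {"only_a": only_a, "only_b": only_b}
    return result


# Hypothetical matches mirroring the example above.
species_of = {
    "E. coli K-12": "Escherichia coli",
    "E. coli EHEC": "Escherichia coli",
    "E. coli MX": "Escherichia coli",
}
gut = {"E. coli K-12", "E. coli EHEC"}
soil = {"E. coli K-12", "E. coli MX"}
print(biome_specific_matches(gut, soil, species_of))
```

With real data the shared entry (K-12 here) is the one the cover would always return first for that species, so the leftover entries are the interesting, environment-specific signal.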

Might be cool for identifying strains, or at least genome sequences, that are specific to an environment, or for seeding hypotheses about this stuff.

(Side note: if these covers weren't built with GTDB representative genomes first, it would be really good to rebuild them that way. Then our favorite model organisms will always get the biggest chunk of matches, which is a very useful thing.)
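Putting GTDB representatives first amounts to sorting the build order before constructing the cover. A minimal sketch under the same assumed greedy, set-based model of a cover; the `is_rep` flag and the toy hashes are hypothetical and do not correspond to a GTDB or sourmash field.

```python
from typing import Dict, List, Set, Tuple


def build_cover(sketches: List[Tuple[str, Set[int]]]) -> Dict[str, Set[int]]:
    """Greedy cover (assumed model): each sketch keeps only hashes not seen earlier."""
    seen: Set[int] = set()
    cover: Dict[str, Set[int]] = {}
    for name, hashes in sketches:
        novel = hashes - seen
        if novel:
            cover[name] = novel
        seen |= hashes
    return cover


# (name, is_rep, hashes); is_rep marks a hypothetical species representative.
genomes = [
    ("E. coli EHEC", False, {1, 2, 3, 4, 7, 8}),
    ("E. coli K-12", True, {1, 2, 3, 4, 5, 6}),
]
# Key is False for reps, so they sort first; sorted() is stable, so
# non-reps keep their original relative order.
ordered = sorted(genomes, key=lambda g: not g[1])
cover = build_cover([(name, hashes) for name, _, hashes in ordered])
for name, hashes in cover.items():
    print(name, len(hashes))
```

Without the sort, EHEC would be visited first and claim all 6 of its hashes, leaving the K-12 entry with only the 2 hashes EHEC lacks; with representatives first, the representative holds the shared core and gets the biggest chunk of matches.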
