Silva 138? #3
Yes please! I am hoping to run a fragment insertion analysis with SILVA but my OTUs were picked using a SILVA 138 reference database prepared for QIIME2. |
Yes please! This would be amazing! |
Hello, I've been trying to recreate a SeppReferenceTree artefact pipeline for Silva 138.1 from the repo.
For the moment I am particularly puzzled by the masking step from here.
Any help would be much appreciated! |
It seems @diego92sigma6 has had some luck with this issue: smirarab/pasta#61. Perhaps he can chime in.
That masking step is meant to remove super-gappy sites from the alignment (not just retaining V4).
I'd be happy to share the scripts. In fact, I thought everything necessary was here already: https://github.com/smirarab/sepp-refs/tree/master/silva and https://github.com/smirarab/sepp/tree/master/sepp-package/buildref When I last tried to use SILVA 138, I ran into the issue of non-monophyly of archaea. I didn't have time to follow up further on that. |
Hi @crusher083, I hope you are well. Since my dataset is too big to process on a desktop computer (a 12GB fasta), I had to write a small C++ program for gappy sequence filtering that uses streams to limit resource usage. Filtering this 12GB alignment with 2 million sequences takes 120 seconds. I would be really happy to help with this program if you are in a similar situation regarding resource availability. My current situation is that only 3 of the 2 million sequences were 97+% gaps, so I'm following a second piece of advice from @smirarab to filter similar sequences, which I will write here for convenience.
On my side, filtering to V4 only may be the step I was missing to reduce the dataset. I was wondering if you know of any tool to perform this task. I'm happy to help with anything you need. |
Diego, you may have misunderstood what I asked you to filter. I was advising removing sites (columns) that have more than 99.5% gaps, not species (rows). Did you try simply removing gappy sites? |
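For readers following along, the column-wise masking described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the script used for the published SEPP references: the function names are hypothetical, both `-` and `.` are assumed to mark gaps, and the 0.995 default mirrors the 99.5% figure mentioned in this thread. It reads the alignment one record at a time in two passes, so the whole file never has to sit in memory.

```python
from itertools import compress
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption: both '-' and '.' mark gaps in the aligned FASTA.
GAP_CHARS = frozenset("-.")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def columns_to_keep(records: Iterable[Tuple[str, str]],
                    max_gap_fraction: float = 0.995) -> List[bool]:
    """First pass: flag each column (site) as kept unless its gap
    fraction exceeds max_gap_fraction."""
    gap_counts: Optional[List[int]] = None
    n_rows = 0
    for _, seq in records:
        if gap_counts is None:
            gap_counts = [0] * len(seq)
        elif len(seq) != len(gap_counts):
            raise ValueError("sequences are not aligned to the same length")
        for i, ch in enumerate(seq):
            if ch in GAP_CHARS:
                gap_counts[i] += 1
        n_rows += 1
    if gap_counts is None:
        return []
    return [count / n_rows <= max_gap_fraction for count in gap_counts]


def mask_columns(records: Iterable[Tuple[str, str]],
                 keep: List[bool]) -> Iterator[Tuple[str, str]]:
    """Second pass: drop the flagged columns from every row; no row is removed."""
    for header, seq in records:
        yield header, "".join(compress(seq, keep))
```

In practice the alignment file would be opened twice, once for `columns_to_keep` and once for `mask_columns`. A pure-Python loop like this is likely to be slow on an alignment with millions of rows; it is meant to show the logic (columns are removed, rows are untouched), not to be a fast implementation.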
Oh, that's my bad! |
Hi Siavash, |
Do you have a file that tells you the taxonomy for all of the species in
your sample? If so, we need to root at the LCA of archaea. I'd be happy to
help with this step if you have that mapping file.
One issue that I faced was that Archaea was actually not monophyletic in the
previous version, but there are ways around that.
…On Wed, Sep 22, 2021 at 10:10 AM Diego Alonso Marquez Palacios < ***@***.***> wrote:
Hi Siavash,
I'm getting very close to having the reference. I was wondering if you could
please refer me to a resource for understanding the step of rooting at the
lowest common ancestor of archaea. I'm honestly a bit lost here. Does this
mean associating the RAxML output tree with another preexisting one? And which
tools would you use to perform this?
Thank you!
--
Siavash Mirarab
|
Hi Siavash, I think we can use this taxonomy file, which contains the accession and semicolon-separated taxonomy path for each entry. My input for the RAxML steps was the full aligned sequences from SILVA (ended up using these) with 99.99 sites removed, with a tree generated from FastTree. I decided to manually build the tree instead of using this one because some accessions were associated with the same taxa, producing undesired results in the RAxML steps. The produced tree has accessions as nodes. I think this is correct because the sepp-ref for 12.8 is also based on an accession tree. |
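As a rough sketch of how such a taxonomy file could feed the rooting step discussed above, the snippet below collects the accessions whose taxonomy path starts with Archaea. The function name and the column layout are assumptions (tab-separated, one header row, accession in the first column, path in the fourth) and should be checked against the actual file before use. Finding the LCA of these tips and rerooting the tree would still need a tree library and is not shown here.

```python
import csv
from typing import Iterable, Set


def archaeal_accessions(lines: Iterable[str],
                        accession_col: int = 0,   # assumed column position
                        path_col: int = 3,        # assumed column position
                        domain: str = "Archaea",
                        has_header: bool = True) -> Set[str]:
    """Return accessions whose semicolon-separated taxonomy path
    begins with the given domain."""
    # QUOTE_NONE so quote characters in organism names are read literally.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)
    hits: Set[str] = set()
    for row in reader:
        if len(row) <= max(accession_col, path_col):
            continue  # skip short or blank rows
        top_rank = row[path_col].split(";")[0].strip()
        if top_rank == domain:
            hits.add(row[accession_col])
    return hits
```

Note that the tip labels in the tree may not be bare accessions (they could, for example, also carry start and stop coordinates), so the returned set may need to be mapped to the tree's label format first. Comparing this set against the tips under the candidate root is also one way to see whether archaea come out monophyletic.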
Hi Diego,
Sorry for my long silence on this. Was this successful? Any help needed
from my side? Could I point people to the output of your work?
Thanks
Siavash
…On Fri, Sep 24, 2021 at 2:10 PM Diego Alonso Marquez Palacios < ***@***.***> wrote:
Hi Siavash, I think we can use this taxonomy file
<https://www.arb-silva.de/fileadmin/silva_databases/release_138_1/Exports/taxonomy/taxmap_slv_ssu_ref_nr_138.1.txt.gz>,
which contains the accession and semicolon-separated taxonomy path for each
entry. My input for the RAxML steps was the full aligned sequences from
SILVA (ended up using these) with 99.99 sites removed, with a tree
generated from fasttree. I decided to manually build the tree instead of
using this one
<https://www.arb-silva.de/fileadmin/silva_databases/release_138_1/Exports/taxonomy/tax_slv_ssu_138.1.tre.gz>
because some accessions were associated with the same taxa, producing
undesired results in the raxml steps. The produced tree has accessions as
nodes. I think this is correct because the sepp-ref for 12.8 is also based
on an accession tree.
I will expose a public folder with the results so far once the branch
length step is done. I will let you know.
Thank you!
--
Siavash Mirarab
|
Hi Siavash. Here is a public Google Drive folder with my work. I hope this helps. |
Is it possible to get either the scripts to do the alignment or a reference built from the new SILVA 138 release? SILVA updates about annually, and it would be really nice to be able to update things that rely on SEPP, with a consistent database alongside.