Soil contig tax #24
Merge branch 'main' into soil-contig-tax
@brynnz22 Overall, an impressive feat to pull together all these API calls and merges in a consumable way. Wow! The notebook rendered fine without running the calls or accessing the pkl files, so I think we are good on that front. We should, though, add a note to the README for the Google Colab that rerunning this in an interactive environment is not recommended and will likely break for these reasons. Do you use the 'taxonomic_dist_by_soil_layer/python/mongodb_query.txt.js' file? Maybe I'm missing it, but if not, please remove it from the branch. In the first Markdown cell, the word 'object' is misspelled. Once we have the TSV URLs, I think it would be useful to show a single sample's results before concatenating them all together. That dataframe should have soil horizon, biosample id, geo_loc, taxa, and count. Biologically, it's not correct to add together counts between samples, so I think we need to revisit the last couple of code chunks so they make a bit more sense. I have a couple of ideas for this that shouldn't be too painful (hopefully!).
@kheal addressing your points above, I:
I also created a second plot faceted by locations in Colorado. Thanks for the feedback :)
In md cell 35, "Example of what the TSV contig taxa file looks like": we decided the third column is not a percent (otherwise it would add up to > 100%), so that text should be edited. How about something along the lines of "The first column is the identifier of a single contig, the second is the taxonomic placement of the contig, the third is a simple count"? In py cell 36, I would rename the percent column to count. Also, we will need to calculate relative abundance per sample per taxon, and then calculate average relative abundance per horizon, as we discussed.
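For reference, the two-step calculation described above could be sketched roughly as follows. This is only an illustration under assumptions: the column names (`biosample_id`, `soil_horizon`, `taxa`, `count`), the helper `mean_relative_abundance`, and the toy values are all hypothetical and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical toy data: one row per (sample, taxon) with a raw contig count.
# Column names and values are illustrative assumptions, not the notebook's data.
counts = pd.DataFrame({
    "biosample_id": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "soil_horizon": ["M horizon"] * 4 + ["O horizon"] * 2,
    "taxa": ["Proteobacteria", "Acidobacteria"] * 3,
    "count": [30, 10, 10, 10, 5, 15],
})


def mean_relative_abundance(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize counts within each sample, then average per horizon and taxon."""
    df = df.copy()
    # Step 1: relative abundance per sample per taxon
    # (each count divided by that sample's total count).
    sample_totals = df.groupby("biosample_id")["count"].transform("sum")
    df["rel_abundance"] = df["count"] / sample_totals
    # Step 2: average the per-sample relative abundances within each horizon.
    return df.groupby(["soil_horizon", "taxa"], as_index=False)["rel_abundance"].mean()


result = mean_relative_abundance(counts)
print(result)
```

Normalizing within each sample first avoids adding raw counts across samples. One caveat with this sketch: a taxon that is absent from a sample has no row rather than a zero, so it is left out of that horizon's mean; filling missing sample/taxon combinations with zero before averaging may be preferable.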
Okay! I think we are good! Thanks again for all of your help!!
@kheal I believe I got the nbviewer to work now: https://nbviewer.org/github/microbiomedata/notebook_hackathons/blob/soil-contig-tax/taxonomic_dist_by_soil_layer/python/taxonomic_dist_soil_layer.ipynb
Everything looks good, we should be good to merge now. Cheers!
closes #8!
Added a Python notebook to look at how the taxonomic distribution of contigs differs by soil layer (mineral vs. organic) in Colorado. This uses NMDC metadata to access and analyze metagenome data.