
Soil contig tax #24

Merged
merged 22 commits into from
Jan 3, 2024


brynnz22
Contributor

@brynnz22 brynnz22 commented Dec 20, 2023

Added a Python notebook that looks at how the taxonomic distribution of contigs differs by soil layer (mineral vs. organic) in Colorado. It uses NMDC metadata to access and analyze metagenome data.

@brynnz22 brynnz22 requested a review from kheal December 20, 2023 21:11
@kheal
Collaborator

kheal commented Dec 20, 2023

@brynnz22 Overall, an impressive feat to pull together all these API calls and merges in a consumable way. Wow! The notebook rendered fine without running the calls or accessing the pkl files, so I think we are good on that front. However, we should add a note to the README for the Google Colab version saying that rerunning this in an interactive environment is not recommended and will likely break for these reasons.

Do you use the 'taxonomic_dist_by_soil_layer/python/mongodb_query.txt.js' file? Maybe I'm missing it, but if not, please remove it from the branch.

In the first Markdown cell, the word 'object' is misspelled.

Once we have the TSV URLs, I think it would be useful to show a single sample's results before concatenating them all together. That dataframe should have soil horizon, biosample ID, geo_loc, taxa, and count columns.
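The single-sample view described above could look something like the minimal plain-Python sketch below. It assumes, hypothetically, that the contig taxa TSV has three tab-separated columns with no header (contig identifier, taxonomic placement, count) and that the biosample's soil horizon and geo_loc have already been pulled from the NMDC metadata. The function name `single_sample_table`, the example IDs, and the example rows are all made up for illustration. Counts are only summed within this one sample.

```python
import csv
import io
from collections import Counter


def single_sample_table(tsv_text, biosample_id, soil_horizon, geo_loc):
    """Collapse one sample's contig taxa TSV into one row per taxon.

    Hypothetical layout: contig id <TAB> taxonomic placement <TAB> count.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        _contig_id, taxon, count = row[0], row[1], row[2]
        counts[taxon] += float(count)  # float in case the column is non-integer
    # Tag every per-taxon total with this sample's metadata
    return [
        {
            "soil_horizon": soil_horizon,
            "biosample_id": biosample_id,
            "geo_loc": geo_loc,
            "taxa": taxon,
            "count": n,
        }
        for taxon, n in sorted(counts.items())
    ]


# Hypothetical input for a single sample
example_tsv = (
    "contig_1\tBacteria;Acidobacteriota\t3\n"
    "contig_2\tBacteria;Actinomycetota\t5\n"
    "contig_3\tBacteria;Acidobacteriota\t2\n"
)
rows = single_sample_table(example_tsv, "nmdc:bsm-example", "organic", "USA: Colorado")
for r in rows:
    print(r)
```

In the notebook itself the same shape would more naturally be a pandas DataFrame; plain dictionaries are used here only to keep the sketch dependency-free.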

Biologically, it's not correct to add together counts between samples, so I think we need to revisit the last couple of code chunks so they make more sense. I have a couple of ideas for this that shouldn't be too painful (hopefully!).
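One general way to see the problem with pooling (a standard identity, not necessarily the reasoning used in the notebook): write $c_{s,t}$ for the count of taxon $t$ in sample $s$, $N_s = \sum_t c_{s,t}$ for that sample's total, and $p_{s,t} = c_{s,t}/N_s$ for its relative abundance. Summing counts across $S$ samples before normalizing gives a depth-weighted average rather than a per-sample average:

$$
\hat{p}^{\text{pooled}}_t = \frac{\sum_{s=1}^{S} c_{s,t}}{\sum_{s=1}^{S} N_s}
= \sum_{s=1}^{S} \frac{N_s}{\sum_{r=1}^{S} N_r}\, p_{s,t}
\qquad\text{vs.}\qquad
\bar{p}_t = \frac{1}{S}\sum_{s=1}^{S} p_{s,t}
$$

The two agree only when all samples have the same total $N_s$ (or the same $p_{s,t}$); otherwise the samples with the most contigs dominate. For example, with $N_1 = 900$, $p_{1,t} = 0.1$ and $N_2 = 100$, $p_{2,t} = 0.5$, pooling gives $(90 + 50)/1000 = 0.14$, while the per-sample mean is $0.3$.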

@brynnz22
Contributor Author

brynnz22 commented Dec 22, 2023

@kheal addressing your points above, I:

  • edited the README to explain that running the notebook in an interactive environment is not recommended

  • kept the mongodb_query.txt.js file, because it helped inform the API request traversals and is also useful for informing the endpoint being created

  • fixed the misspelling of "object"

  • printed a snippet of a TSV to show what it looks like

  • finally, discussed the last point with you, and we concluded that the way I did it was correct.

I also created a second plot faceted by location in Colorado.

Thanks for the feedback :)

@kheal
Collaborator

kheal commented Dec 22, 2023

In Markdown cell 35 ("Example of what the TSV contig taxa file looks like"), we decided the third column is not a percent (otherwise the values would add up to more than 100%), so that text should be edited. How about something along the lines of "The first column is the identifier of a single contig, the second is the taxonomic placement of the contig, and the third is a simple count"? In Python cell 36, I would rename the percent column to count. Also, as we discussed, we will need to calculate the relative abundance of each taxon per sample, and then the average relative abundance per horizon.
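The two-step calculation described above (relative abundance within each sample, then the mean of those per-sample values within each horizon) can be sketched in plain Python as follows. The record layout (`biosample_id`, `soil_horizon`, `taxa`, `count`), the function names, and the example numbers are hypothetical. One design choice worth flagging: a taxon missing from a sample is treated as 0 when averaging, so every sample in a horizon contributes equally to the mean.

```python
from collections import defaultdict


def relative_abundance(records):
    """Return {(biosample_id, taxa): fraction of that sample's total count}."""
    counts = defaultdict(float)  # (sample, taxon) -> summed count
    totals = defaultdict(float)  # sample -> total count across all taxa
    for r in records:
        counts[(r["biosample_id"], r["taxa"])] += r["count"]
        totals[r["biosample_id"]] += r["count"]
    # Skip samples with a zero total to avoid dividing by zero
    return {key: c / totals[key[0]] for key, c in counts.items() if totals[key[0]] > 0}


def mean_abundance_by_horizon(records):
    """Average per-sample relative abundances within each soil horizon."""
    rel = relative_abundance(records)
    horizon_of = {r["biosample_id"]: r["soil_horizon"] for r in records}
    samples_in = defaultdict(set)  # horizon -> samples with usable totals
    for sample, _taxon in rel:
        samples_in[horizon_of[sample]].add(sample)
    taxa = {taxon for _sample, taxon in rel}
    result = {}
    for horizon, samples in samples_in.items():
        for taxon in taxa:
            # Absent taxa count as 0 so each sample has equal weight
            mean = sum(rel.get((s, taxon), 0.0) for s in samples) / len(samples)
            if mean > 0:
                result[(horizon, taxon)] = mean
    return result


# Hypothetical records: two organic samples with very different depths, one mineral
records = [
    {"biosample_id": "A", "soil_horizon": "organic", "taxa": "Acidobacteriota", "count": 90},
    {"biosample_id": "A", "soil_horizon": "organic", "taxa": "Actinomycetota", "count": 810},
    {"biosample_id": "B", "soil_horizon": "organic", "taxa": "Acidobacteriota", "count": 50},
    {"biosample_id": "B", "soil_horizon": "organic", "taxa": "Actinomycetota", "count": 50},
    {"biosample_id": "C", "soil_horizon": "mineral", "taxa": "Acidobacteriota", "count": 20},
]
result = mean_abundance_by_horizon(records)
for key in sorted(result):
    print(key, round(result[key], 3))
```

Here sample A's 900 contigs do not outweigh sample B's 100: Acidobacteriota averages about 0.3 in the organic horizon, whereas pooling the raw counts would give 0.14.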

@brynnz22
Copy link
Contributor Author

brynnz22 commented Jan 2, 2024

Okay! I think we are good! Thanks again for all of your help!!

@brynnz22
Contributor Author

brynnz22 commented Jan 3, 2024

Collaborator

@kheal kheal left a comment


Everything looks good; we should be good to merge now. Cheers!

@kheal
Collaborator

kheal commented Jan 3, 2024

closes #8!

@kheal kheal merged commit 99adf59 into main Jan 3, 2024
@kheal kheal deleted the soil-contig-tax branch May 29, 2024 22:05

3 participants