Prepare and release Bio-Scales notebooks for exploration #22

Merged
merged 4 commits into from
Nov 29, 2023
1 change: 1 addition & 0 deletions .renvignore
@@ -0,0 +1 @@
+*.ipynb
3 changes: 2 additions & 1 deletion README.md
@@ -4,6 +4,7 @@
Notebooks that are ready for use and exploration.

- [NEON Soil MetaData](https://github.com/microbiomedata/notebook_hackathons/tree/main/NEON_soil_metadata)
+- [Bio-Scales Biogeochemical MetaData](https://github.com/microbiomedata/notebook_hackathons/tree/main/bioscales_biogeochemical_metadata)


## Overview
@@ -18,7 +19,7 @@ This repository includes jupyter notebooks that explore and analyze microbiome d

Each folder's scope attempts to explore a scientific question using the NMDC's (meta)data. A folder includes a `README.md` that outlines the question or analysis posed as well as two sub-folders, one labeled `R`, and the other `python` that comprises the sample notebooks using the R and Python programming languages, respectively.

-R and Python were chosen since they are popular languages among scientists to explore and visualize data. Jupyter Notebook is used because of its interactive code and data exploration features, effectiveness in teaching, language independency, and ease of sharing code.
+R and Python were chosen since they are popular languages among scientists to explore and visualize data. Jupyter Notebook paired with Google Colab is used because of its interactive code and data exploration features, effectiveness in teaching, language independency, and ease of sharing code.

A challenging aspect that has been highlighted with this process is accessing the (meta)data in a user-friendly way via the NMDC API. Because the NMDC metadata schema is highly modular, retrieving metadata is not straight forward without extensive knowledge of the metadata schema's infrastructure, modeling language ([LinkML](https://linkml.io/)), and naming conventions. A proposed solution to this challenge is the creation of an R or Python package that would allow users to access NMDC's data in an easier and more straight forward way.

17 changes: 10 additions & 7 deletions bioscales_biogeochemical_metadata/R/bioscales_metadata.Rmd
@@ -1,17 +1,17 @@
---
title: "Bio-Scales metadata exploration"
output: rmarkdown::github_document
-date: "2023-11-21"
+date: "2023-11-29"
---

```{r setup, include=FALSE}
# Load essential libraries
knitr::opts_chunk$set(echo = TRUE)
-library(jsonlite)
-library(dplyr)
-library(tidyr)
-library(ggplot2)
-if (!require('GGally')) install.packages('GGally'); library(GGally)
+library(jsonlite, warn.conflicts=FALSE)
+library(dplyr, warn.conflicts=FALSE)
+library(tidyr, warn.conflicts=FALSE)
+library(ggplot2, warn.conflicts=FALSE)
+if (!require('GGally')) install.packages('GGally', quiet = TRUE); library(GGally, warn.conflicts=FALSE)
```
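
Note on the next hunk in this file (`@@ -120,7 +120,10 @@`): it passes `na.omit(df)` to `ggpairs` instead of `df`. On a data frame, `na.omit` drops every row that has an NA in any column, not only in the plotted columns `3:7`. The sketch below illustrates that difference in Python, the repository's other notebook language. It is a minimal illustration, not code from this PR: the helper `drop_incomplete`, the column names, and the sample values are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# One record maps column name -> value; None stands in for R's NA.
Row = Dict[str, Optional[object]]


def drop_incomplete(rows: List[Row], columns: Optional[Sequence[str]] = None) -> List[Row]:
    """Keep only rows with no missing (None) values.

    With columns=None every column is checked, which mirrors how R's
    na.omit() treats a data frame. Passing column names limits the
    check to those columns only.
    """
    kept = []
    for row in rows:
        checked = list(row.keys()) if columns is None else columns
        if all(row[name] is not None for name in checked):
            kept.append(row)
    return kept


# Hypothetical sample data (made up for illustration, not Bio-Scales values).
samples = [
    {"site": "A", "depth": 10.0, "carbon": 1.2, "nitrogen": 0.10},
    {"site": "B", "depth": None, "carbon": 1.5, "nitrogen": 0.12},
    {"site": "C", "depth": 20.0, "carbon": None, "nitrogen": 0.09},
]

# Check every column, as na.omit(df) does.
all_columns = drop_incomplete(samples)

# Check only the columns that would be plotted.
plotted_only = drop_incomplete(samples, columns=["carbon", "nitrogen"])

print([r["site"] for r in all_columns])
print([r["site"] for r in plotted_only])
```

If rows are being lost only because of gaps in columns that are not plotted, restricting the check to the plotted columns keeps more data. In R, applying `complete.cases()` to the column subset is one way to do that.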


@@ -120,7 +120,10 @@ glimpse(df)
## Plot chemical data in a correlation matrix
Create paired correlation matrix using GGally package's [ggpairs function](https://ggobi.github.io/ggally/articles/ggpairs.html)
```{r, message=FALSE, warning=FALSE}
-g <- ggpairs(df,
+# Drop rows with NAs before plotting to avoid NA warning
+df_complete <- na.omit(df)
+
+g <- ggpairs(df_complete,
columns = c(3:7),
title = "Correlation Matrix of Chemicals in Bio-Scales Data",
lower = list(continuous = wrap("points", alpha = 0.5, size = 0.7)),
15 changes: 9 additions & 6 deletions bioscales_biogeochemical_metadata/R/bioscales_metadata.ipynb
@@ -13,11 +13,11 @@
"source": [
"# Load essential libraries\n",
"knitr::opts_chunk$set(echo = TRUE)\n",
-"library(jsonlite)\n",
-"library(dplyr)\n",
-"library(tidyr)\n",
-"library(ggplot2)\n",
-"if (!require('GGally')) install.packages('GGally'); library(GGally)\n"
+"library(jsonlite, warn.conflicts=FALSE)\n",
+"library(dplyr, warn.conflicts=FALSE)\n",
+"library(tidyr, warn.conflicts=FALSE)\n",
+"library(ggplot2, warn.conflicts=FALSE)\n",
+"if (!require('GGally')) install.packages('GGally', quiet = TRUE); library(GGally, warn.conflicts=FALSE)\n"
]
},
{
@@ -184,7 +184,10 @@
"metadata": {},
"outputs": [],
"source": [
-"g <- ggpairs(df, \n",
+"# Drop rows with NAs before plotting to avoid NA warning\n",
+"df_complete <- na.omit(df)\n",
+"\n",
+"g <- ggpairs(df_complete, \n",
" columns = c(3:7), \n",
" title = \"Correlation Matrix of Chemicals in Bio-Scales Data\",\n",
" lower = list(continuous = wrap(\"points\", alpha = 0.5, size = 0.7)),\n",
7 changes: 5 additions & 2 deletions bioscales_biogeochemical_metadata/R/bioscales_metadata.md
@@ -1,6 +1,6 @@
Bio-Scales metadata exploration
================
-2023-11-21
+2023-11-29

## Get study IDs associated with Bio-Scales sites using API

@@ -198,7 +198,10 @@ Create paired correlation matrix using GGally package’s [ggpairs
function](https://ggobi.github.io/ggally/articles/ggpairs.html)

``` r
-g <- ggpairs(df,
+# Drop rows with NAs before plotting to avoid NA warning
+df_complete <- na.omit(df)
+
+g <- ggpairs(df_complete,
columns = c(3:7),
title = "Correlation Matrix of Chemicals in Bio-Scales Data",
lower = list(continuous = wrap("points", alpha = 0.5, size = 0.7)),