microbiomedata · kheal · Nov 29, 2023
diff --git a/.renvignore b/.renvignore
@@ -0,0 +1 @@
+*.ipynb
diff --git a/README.md b/README.md
@@ -4,6 +4,7 @@
 Notebooks that are ready for use and exploration.
 
 - [NEON Soil MetaData](https://github.com/microbiomedata/notebook_hackathons/tree/main/NEON_soil_metadata)
+- [Bio-Scales Biogeochemical MetaData](https://github.com/microbiomedata/notebook_hackathons/tree/main/bioscales_biogeochemical_metadata)
 
 
 ## Overview 
@@ -18,7 +19,7 @@ This repository includes jupyter notebooks that explore and analyze microbiome d
 
 Each folder's scope attempts to explore a scientific question using the NMDC's (meta)data. A folder includes a `README.md` that outlines the question or analysis posed as well as two sub-folders, one labeled `R`, and the other `python` that comprises the sample notebooks using the R and Python programming languages, respectively. 
 
-R and Python were chosen since they are popular languages among scientists to explore and visualize data. Jupyter Notebook is used because of its interactive code and data exploration features, effectiveness in teaching, language independency, and ease of sharing code.
+R and Python were chosen because they are popular languages among scientists for exploring and visualizing data. Jupyter Notebook paired with Google Colab is used because of its interactive code and data exploration features, effectiveness in teaching, language independence, and ease of sharing code.
 
 A challenging aspect that has been highlighted with this process is accessing the (meta)data in a user-friendly way via the NMDC API. Because the NMDC metadata schema is highly modular, retrieving metadata is not straight forward without extensive knowledge of the metadata schema's infrastructure, modeling language ([LinkML](https://linkml.io/)), and naming conventions. A proposed solution to this challenge is the creation of an R or Python package that would allow users to access NMDC's data in an easier and more straight forward way.
 

diff --git a/bioscales_biogeochemical_metadata/R/bioscales_metadata.Rmd b/bioscales_biogeochemical_metadata/R/bioscales_metadata.Rmd
@@ -1,17 +1,17 @@
 ---
 title: "Bio-Scales metadata exploration"
 output: rmarkdown::github_document
-date: "2023-11-21"
+date: "2023-11-29"
 ---
 
 ```{r setup, include=FALSE}
 # Load essential libraries
 knitr::opts_chunk$set(echo = TRUE)
-library(jsonlite)
-library(dplyr)
-library(tidyr)
-library(ggplot2)
-if (!require('GGally')) install.packages('GGally'); library(GGally)
+library(jsonlite, warn.conflicts=FALSE)
+library(dplyr, warn.conflicts=FALSE)
+library(tidyr, warn.conflicts=FALSE)
+library(ggplot2, warn.conflicts=FALSE)
+if (!require('GGally')) install.packages('GGally', quiet = TRUE); library(GGally, warn.conflicts=FALSE)
 ```
 
 
@@ -120,7 +120,10 @@ glimpse(df)
 ## Plot chemical data in a correlation matrix
 Create paired correlation matrix using GGally package's [ggpairs function](https://ggobi.github.io/ggally/articles/ggpairs.html)
 ```{r, message=FALSE, warning=FALSE}
-g <- ggpairs(df, 
+# Drop rows with NAs before plotting to avoid NA warning
+df_complete <- na.omit(df)
+
+g <- ggpairs(df_complete, 
         columns = c(3:7), 
         title = "Correlation Matrix of Chemicals in Bio-Scales Data",
         lower = list(continuous = wrap("points", alpha = 0.5, size = 0.7)),
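
Review note on the setup chunk in this file: `warn.conflicts = FALSE` silences the masking notices, but packages can also emit startup messages when attached. The sketch below shows one possible alternative that wraps the install-if-missing and quiet-attach steps in a single function. The helper name `load_quietly` is hypothetical and not part of this repository, and base packages are used in the example so it runs offline.

```r
# Hypothetical helper (not in the repository): install a package only if it
# is missing, then attach it without startup or conflict messages.
load_quietly <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, quiet = TRUE)
  }
  suppressPackageStartupMessages(
    library(pkg, character.only = TRUE, warn.conflicts = FALSE)
  )
  invisible(pkg)
}

# Base packages are used here so the sketch runs without a network connection;
# the notebook would pass "jsonlite", "dplyr", "tidyr", "ggplot2", "GGally".
loaded <- vapply(c("stats", "utils"), load_quietly, character(1))
```

Whether this is preferable to the explicit `library()` calls is a style choice; the explicit calls keep the dependency list easy to scan.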

diff --git a/bioscales_biogeochemical_metadata/R/bioscales_metadata.ipynb b/bioscales_biogeochemical_metadata/R/bioscales_metadata.ipynb
@@ -13,11 +13,11 @@
             "source": [
                 "# Load essential libraries\n",
                 "knitr::opts_chunk$set(echo = TRUE)\n",
-                "library(jsonlite)\n",
-                "library(dplyr)\n",
-                "library(tidyr)\n",
-                "library(ggplot2)\n",
-                "if (!require('GGally')) install.packages('GGally'); library(GGally)\n"
+                "library(jsonlite, warn.conflicts=FALSE)\n",
+                "library(dplyr, warn.conflicts=FALSE)\n",
+                "library(tidyr, warn.conflicts=FALSE)\n",
+                "library(ggplot2, warn.conflicts=FALSE)\n",
+                "if (!require('GGally')) install.packages('GGally', quiet = TRUE); library(GGally, warn.conflicts=FALSE)\n"
             ]
         },
         {
@@ -184,7 +184,10 @@
             "metadata": {},
             "outputs": [],
             "source": [
-                "g <- ggpairs(df, \n",
+                "# Drop rows with NAs before plotting to avoid NA warning\n",
+                "df_complete <- na.omit(df)\n",
+                "\n",
+                "g <- ggpairs(df_complete, \n",
                 "        columns = c(3:7), \n",
                 "        title = \"Correlation Matrix of Chemicals in Bio-Scales Data\",\n",
                 "        lower = list(continuous = wrap(\"points\", alpha = 0.5, size = 0.7)),\n",

diff --git a/bioscales_biogeochemical_metadata/R/bioscales_metadata.md b/bioscales_biogeochemical_metadata/R/bioscales_metadata.md
@@ -1,6 +1,6 @@
 Bio-Scales metadata exploration
 ================
-2023-11-21
+2023-11-29
 
 ## Get study IDs associated with Bio-Scales sites using API
 
@@ -198,7 +198,10 @@ Create paired correlation matrix using GGally package’s [ggpairs
 function](https://ggobi.github.io/ggally/articles/ggpairs.html)
 
 ``` r
-g <- ggpairs(df, 
+# Drop rows with NAs before plotting to avoid NA warning
+df_complete <- na.omit(df)
+
+g <- ggpairs(df_complete, 
         columns = c(3:7), 
         title = "Correlation Matrix of Chemicals in Bio-Scales Data",
         lower = list(continuous = wrap("points", alpha = 0.5, size = 0.7)),
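
Review note on the `na.omit(df)` step: `na.omit()` drops a row if any column contains an `NA`, including columns that are not passed to `ggpairs` through `columns = c(3:7)`. If only the plotted columns should decide which rows are kept, a minimal sketch could look like the following. The toy data frame and its column names are hypothetical stand-ins for the notebook's `df`.

```r
# Hypothetical toy data frame standing in for the notebook's `df`;
# column names are illustrative only.
df <- data.frame(
  id   = c("a", "b", "c", "d"),
  note = c(NA, "x", "y", "z"),  # NA in a column that is NOT plotted
  c1   = c(1, 2, NA, 4),        # NA in a plotted column
  c2   = c(5, 6, 7, 8),
  c3   = c(9, 10, 11, 12),
  c4   = c(13, 14, 15, 16),
  c5   = c(17, 18, 19, 20)
)

plot_cols <- 3:7

# na.omit() on the whole frame drops rows "a" and "c"
df_all <- na.omit(df)

# complete.cases() restricted to the plotted columns drops only row "c"
df_complete <- df[complete.cases(df[, plot_cols]), ]

nrow(df_all)       # 2
nrow(df_complete)  # 3
```

If dropping rows on any `NA` is the intended behavior, the existing `na.omit(df)` call is the simpler choice.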