Move folders #1

Merged
merged 6 commits into from
Oct 17, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@
.Rhistory
.RData
.Ruserdata
.ipynb_checkpoints
@@ -82,37 +82,77 @@ for (i in 1:length(study_ids)){

```{r}
library(lubridate)
my_theme <- theme_bw()
df <- dat_all %>%
-select(collection_date, ph, geo_loc_name, lat_lon) %>%
-unnest(cols = c(collection_date, geo_loc_name,
-lat_lon), names_sep = "_") %>%
+select(
+collection_date, ph, geo_loc_name, lat_lon
+) %>%
+unnest(
+cols = c(
+collection_date,
+geo_loc_name,
+lat_lon
+), names_sep = "_") %>%
rename(collection_date = collection_date_has_raw_value ,
geo_loc = geo_loc_name_has_raw_value)

df2 <- df %>%
mutate(collection_date2 = as.Date(collection_date))

df3 <- df2 %>%
mutate(geo_loc_grouped = geo_loc %>%
factor() %>%
fct_lump(n = 6)
) %>%
filter(geo_loc_grouped != "Other")


g <- ggplot(data = df2) +
geom_point(aes(x=collection_date, y = ph)) +
my_theme +
facet_wrap(facets = vars(geo_loc))

g <- ggplot(data = df3) +
geom_point(aes(x=collection_date2, y = ph)) +
my_theme +
scale_x_date()+
facet_wrap(facets = vars(geo_loc_grouped),
labeller = label_wrap_gen(width=30))
g

```
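
For readers working from the Python notebooks, the data-preparation steps in the chunk above (parse the collection date, then keep only the most common locations) can be sketched with the standard library. This is only an approximation: the records are invented for illustration, `parse_collection_date` and `keep_top_locations` are hypothetical helper names, and ties between equally frequent locations are not handled the way `fct_lump()` handles them.

```python
from collections import Counter
from datetime import date


def parse_collection_date(raw):
    """Parse the leading YYYY-MM-DD of a raw date string into a date."""
    return date.fromisoformat(raw[:10])


def keep_top_locations(records, n=6):
    """Drop records whose location is not among the n most frequent ones."""
    counts = Counter(r["geo_loc"] for r in records)
    top = {loc for loc, _ in counts.most_common(n)}
    return [r for r in records if r["geo_loc"] in top]


# Hypothetical records, invented for illustration only.
records = [
    {"geo_loc": "USA: Alaska", "collection_date": "2019-07-01", "ph": 5.1},
    {"geo_loc": "USA: Alaska", "collection_date": "2019-08-15T10:30Z", "ph": 5.3},
    {"geo_loc": "USA: Kansas", "collection_date": "2020-06-02", "ph": 6.8},
    {"geo_loc": "USA: Kansas", "collection_date": "2020-06-20", "ph": 6.9},
    {"geo_loc": "USA: Utah", "collection_date": "2018-05-11", "ph": 7.9},
]

# Keep the two most common locations, then build (date, pH) points to plot.
kept = keep_top_locations(records, n=2)
points = [(parse_collection_date(r["collection_date"]), r["ph"]) for r in kept]
```

The R chunk uses `n = 6`; the smaller `n = 2` here only keeps the toy data readable.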


```{r}
library("rnaturalearth")
library("rnaturalearthdata")

locs_with_ph <- df2 %>%
group_by(
geo_loc
) %>%
mutate(
count_with_ph = n()
) %>%
select(
geo_loc,
lat_lon_longitude,
lat_lon_latitude,
count_with_ph
) %>%
distinct()

world <- ne_countries(scale = "medium", returnclass = "sf")
class(world)
g2 <- ggplot(data = world) +
geom_sf() +
geom_point(
data = locs_with_ph,
-aes(x = lat_lon_longitude, y = lat_lon_latitude, color = geo_loc)) +
+aes(x = lat_lon_longitude, y = lat_lon_latitude,
+size = count_with_ph)) +
my_theme +
-labs(x = "Longitude", y = "Latitude")+
-theme(legend.position = "none")+
+labs(x = "Longitude", y = "Latitude", size = "Samples with \n pH measurements")+
+theme()+
coord_sf(xlim = c(-165, -66), ylim = c(17, 72), expand = FALSE)
g2

File renamed without changes.
3 changes: 3 additions & 0 deletions NEON_ph_by_time/README.md
@@ -0,0 +1,3 @@
# Interactive application exploring pH vs. time at NEON sites

This folder includes two notebooks, one in R and one in Python, that look at how soil pH changes over time at the various NEON sites.
File renamed without changes.
29 changes: 29 additions & 0 deletions README.md
@@ -0,0 +1,29 @@
# NMDC Data and Metadata R and Python Sample Jupyter Notebooks

## Overview

This repository includes Jupyter notebooks that explore and analyze microbiome data from the National Microbiome Data Collaborative's (NMDC) data portal. These notebooks aim to:

- highlight the NMDC's metadata and data
- demonstrate how the NMDC's API may be used to retrieve metadata and data from various microbiome research studies
- illustrate example use cases of using the NMDC's (meta)data to answer scientific questions
- encourage scientists to programmatically access the NMDC Data Portal
- promote the accessibility of microbiome research by demonstrating various modes of finding, accessing, and reusing existing microbiome data

Each folder explores a scientific question using the NMDC's (meta)data. A folder includes a `README.md` that outlines the question or analysis posed, as well as two sub-folders, one labeled `R` and the other `python`, that contain the sample notebooks written in the R and Python programming languages, respectively.

R and Python were chosen because they are popular languages among scientists for exploring and visualizing data. Jupyter Notebook is used because of its interactive code and data exploration features, effectiveness in teaching, language independence, and ease of sharing code.

One challenge this process has highlighted is accessing the (meta)data in a user-friendly way via the NMDC API. Because the NMDC metadata schema is highly modular, retrieving metadata is not straightforward without extensive knowledge of the metadata schema's infrastructure, its modeling language ([LinkML](https://linkml.io/)), and its naming conventions. One proposed solution is an R or Python package that would give users simpler, more direct access to the NMDC's data.
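
To make the modularity concrete, here is a minimal sketch of the kind of flattening a user currently has to do by hand. The record below is hypothetical: its nested shape is only inferred from column names such as `collection_date_has_raw_value` and `lat_lon_latitude` that appear in this repository's R notebook, and `flatten` is an illustrative helper, not part of any NMDC package.

```python
def flatten(record, sep="_", prefix=""):
    """Flatten nested dicts into one level, joining key names with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested fields, carrying the parent name as a prefix.
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


# Hypothetical biosample record; field layout assumed, values invented.
biosample = {
    "ph": 6.2,
    "collection_date": {"has_raw_value": "2019-07-01"},
    "lat_lon": {"latitude": 64.8, "longitude": -147.7},
}

row = flatten(biosample)
```

After flattening, `row` has one flat key per nested field (for example `lat_lon_latitude`), which is the form that tabular tools in either language expect.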

## Adding new notebooks

To add a new notebook to this repository:

1. Create a folder in the base directory.
   - Name the folder with a short version of the analysis or question that will be explored.
   - Use `snake_case` for the folder name.
2. Create a `README.md` in the folder outlining the analysis or question.
3. Create a sub-folder for each language that will be demonstrated.
   - For example, one sub-folder named `R` and one named `python`.
4. Create a Jupyter notebook in each sub-folder, written in the corresponding language.
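
The folder layout from steps 1 to 3 could be scaffolded with a short script along these lines. This is a sketch only: `scaffold_notebook_folder` is a hypothetical helper that does not exist in this repository, the name check is an assumed reading of "snake_case" that still allows uppercase acronyms as in `NEON_ph_by_time`, and the notebooks themselves (step 4) are left to be created in Jupyter.

```python
import re
from pathlib import Path

# Words of letters/digits joined by single underscores (assumed convention).
FOLDER_NAME = re.compile(r"^[A-Za-z0-9]+(_[A-Za-z0-9]+)*$")


def scaffold_notebook_folder(base_dir, name, question, languages=("R", "python")):
    """Create <base_dir>/<name> with a README.md and one sub-folder per language."""
    if not FOLDER_NAME.match(name):
        raise ValueError(f"folder name is not snake_case: {name!r}")
    folder = Path(base_dir) / name
    folder.mkdir(parents=True, exist_ok=False)  # refuse to overwrite existing work
    (folder / "README.md").write_text(f"# {question}\n", encoding="utf-8")
    for language in languages:
        (folder / language).mkdir()
    return folder
```

For example, `scaffold_notebook_folder(".", "soil_ph_by_depth", "How does soil pH vary with depth?")` would create `soil_ph_by_depth/README.md` plus empty `R` and `python` sub-folders; the folder name and question here are made up.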