source commit: 03ca5c2

datacarpentry · Jan 11, 2024 · 383241f · 383241f
commit 383241f
Showing 55 changed files with 9,134 additions and 0 deletions.
diff --git a/.Rhistory b/.Rhistory
diff --git a/01-rstudio-intro.md b/01-rstudio-intro.md
diff --git a/02-project-intro.md b/02-project-intro.md
@@ -0,0 +1,243 @@
+---
+title: Project Management With RStudio
+teaching: 10
+exercises: 5
+source: Rmd
+---
+
+::::::::::::::::::::::::::::::::::::::: objectives
+
+- Create self-contained projects in RStudio
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: questions
+
+- How can I manage my projects in R?
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+
+## Introduction
+
+The scientific process is naturally incremental, and many projects start life as
+random notes, some code, then a manuscript, and eventually everything is a bit
+mixed together. Organising a project involving spatial data is no different from
+any other data analysis project, although you may require more disk space than
+usual.
+
+<div class="text-center">
+
+<blockquote class="twitter-tweet"><p>Managing your projects in a reproducible fashion doesn't just make your science reproducible, it makes your life easier.</p>— Vince Buffalo (@vsbuffalo) <a href="https://twitter.com/vsbuffalo/status/323638476153167872">April 15, 2013</a></blockquote>
+
+<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
+
+</div>
+
+Most people tend to organize their projects like this:
+
+![](fig/bad_layout.png){alt='A screenshot of a project folder containing multiple versions of data, analysis scripts, figures, and results files'}
+
+There are many reasons why we should *ALWAYS* avoid this:
+
+1. It is really hard to tell which version of your data is
+  the original and which has been modified;
+2. It gets really messy because it mixes files with various
+  extensions together;
+3. It probably takes you a lot of time to actually find
+  things, and to relate the correct figures to the exact code
+  that was used to generate them.
+
+A good project layout will ultimately make your life easier:
+
+- It will help ensure the integrity of your data;
+- It makes it simpler to share your code with someone else
+  (a lab-mate, collaborator, or supervisor);
+- It allows you to easily upload your code with your manuscript submission;
+- It makes it easier to pick the project back up after a break.
+
+## A possible solution
+
+Fortunately, there are tools and packages which can help you manage your work effectively.
+
+One of the most powerful and useful aspects of RStudio is its project management
+functionality. We'll be using this today to create a self-contained, reproducible
+project.
+
+:::::::::::::::::::::::::::::::::::::::  instructor
+
+Make sure learners download the data files in Challenge 1 and move those files
+to their `data/` directory.
+
+When learners load an RStudio project, their R session's working directory should
+automatically be set to the same folder as the `.Rproj` file. We'll be using relative
+paths throughout the lesson to refer to files, so it's important to make sure that
+learners have loaded the right project and are in the right directory! You may also
+want to introduce other ways to make file paths, such as the `here` package, after
+creating the project.
+
+:::::::::::::::::::::::::::::::::::::::  
+
+:::::::::::::::::::::::::::::::::::::::  challenge
+
+## Challenge: Creating a self-contained project
+
+We're going to create a new project in RStudio:
+
+1. Click the "File" menu button, then "New Project".
+2. Click "New Directory".
+3. Click "Empty Project".
+4. Type in "r-geospatial" as the name of the directory.
+5. Click the "Create Project" button.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+A key advantage of an RStudio Project is that whenever we open this project in
+subsequent RStudio sessions our working directory will *always* be set to the
+folder `r-geospatial`.
+Let's check our working directory by entering the following into the R console:
+
+```r
+getwd()
+```
+
+R should return `your/path/r-geospatial` as the working directory.
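+
+As a minimal sketch of why this matters, the check below compares the name of the current working directory with the project folder and builds a relative path to a data file. The file name `nordic-data.csv` comes from Challenge 1 further down; the variable names are illustrative only.
+
+```r
+# Is the working directory the project folder created above?
+in_project <- basename(getwd()) == "r-geospatial"
+
+if (!in_project) {
+  message("Working directory is ", getwd(), "; open the r-geospatial project first.")
+}
+
+# Relative paths are resolved against the working directory,
+# so this points inside the project whenever the project is open.
+nordic_path <- file.path("data", "nordic-data.csv")
+nordic_path
+```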
+
+## Best practices for project organization
+
+Although there is no "best" way to lay out a project, there are some general
+principles to adhere to that will make project management easier:
+
+### Treat data as read only
+
+This is probably the most important goal of setting up a project. Data are
+typically time consuming and/or expensive to collect. Working with them
+interactively (e.g., in Excel) where they can be modified means you are never
+sure of where the data came from, or how they have been modified since collection.
+It is therefore a good idea to treat your data as "read-only".
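+
+One way to enforce this on disk, sketched here with base R only, is to remove write permission from the files in the raw data folder. The helper name `make_read_only()` is hypothetical, and `Sys.chmod()` has limited effect on Windows, where it only toggles the read-only flag.
+
+```r
+# Hypothetical helper: remove write permission from every file under `dir`.
+make_read_only <- function(dir = "data") {
+  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
+  # 0444 = readable by everyone, writable by no one
+  Sys.chmod(files, mode = "0444", use_umask = FALSE)
+  invisible(files)
+}
+```
+
+Calling `make_read_only("data")` once the raw files are in place makes accidental edits fail instead of silently changing the originals.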
+
+### Data Cleaning
+
+In many cases your data will be "dirty": it will need significant preprocessing
+to get into a format R (or any other programming language) will find useful. This
+task is sometimes called "data munging". I find it useful to store the scripts
+that do this cleaning in a separate folder, and create a second "read-only" data
+folder to hold the "cleaned" data sets.
+
+### Treat generated output as disposable
+
+Anything generated by your scripts should be treated as disposable: you should
+be able to regenerate all of it from your scripts.
+
+There are lots of different ways to manage this output. I find it useful to
+have an output folder with different sub-directories for each separate
+analysis. This makes it easier later, as many of my analyses are exploratory
+and don't end up being used in the final project, and some of the analyses
+get shared between projects.
+
+### Keep related data together
+
+Some GIS file formats, e.g. shapefiles, are really 3-6 files that need to be kept together
+and share the same base name. It may be tempting to store those components separately,
+but your spatial data will be unusable if you do that.
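+
+A quick check along these lines can catch a missing component before it causes a confusing read error. The sketch below uses base R only; the helper name is hypothetical, and it checks just the three components the shapefile format requires (`.shp`, `.shx`, `.dbf`), not optional ones such as `.prj`.
+
+```r
+# Hypothetical helper: report which mandatory shapefile components are absent.
+missing_shapefile_parts <- function(shp_path) {
+  # Strip the extension to get the shared base name, e.g. "data/roads"
+  stem <- tools::file_path_sans_ext(shp_path)
+  required <- c("shp", "shx", "dbf")
+  present <- file.exists(paste0(stem, ".", required))
+  required[!present]
+}
+```
+
+An empty result (`character(0)`) means all three files sit side by side under the same base name.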
+
+### Keep a consistent naming scheme
+
+It is generally best to avoid renaming downloaded spatial data,
+so that a clear connection is maintained with the point of truth.
+You may otherwise find yourself wondering whether `file_A` really is just a copy of `Official_file_on_website` or not.
+
+For datasets you generate, it's worth taking the time to come up with a naming convention that works for your project,
+and sticking to it. File names don't have to be long; they just have to be long enough that you can tell what the file
+is about. Date generated, topic, and whether a product is intermediate or final are good bits of information to keep
+in a file name. For more tips on naming files, check out [the slides from Jenny Bryan's talk "Naming things" at the 2015 Reproducible Science Workshop](https://speakerdeck.com/jennybc/how-to-name-files).
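+
+As an illustration only, a small helper can assemble names from those pieces so that every generated file follows the same pattern. The function name, argument names, and the `<date>_<topic>_<stage>.<ext>` pattern are assumptions, not a convention prescribed by this lesson.
+
+```r
+# Hypothetical helper: build "<date>_<topic>_<stage>.<ext>".
+make_file_name <- function(topic, stage, ext, date = Sys.Date()) {
+  # ISO 8601 dates (YYYY-MM-DD) sort chronologically when sorted by name
+  paste0(format(date, "%Y-%m-%d"), "_", topic, "_", stage, ".", ext)
+}
+
+make_file_name("ndvi", "intermediate", "csv", date = as.Date("2024-01-11"))
+```
+
+With the date fixed as above, the call returns `"2024-01-11_ndvi_intermediate.csv"`.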
+
+:::::::::::::::::::::::::::::::::::::::::  callout
+
+## Tip: Good Enough Practices for Scientific Computing
+
+[Good Enough Practices for Scientific Computing](https://github.com/swcarpentry/good-enough-practices-in-scientific-computing/blob/gh-pages/good-enough-practices-for-scientific-computing.pdf) gives the following recommendations for project organization:
+
+1. Put each project in its own directory, which is named after the project.
+2. Put text documents associated with the project in the `doc` directory.
+3. Put raw data and metadata in the `data` directory, and files generated during cleanup and analysis in a `results` directory.
+4. Put source for the project's scripts and programs in the `src` directory, and programs brought in from elsewhere or compiled locally in the `bin` directory.
+5. Name all files to reflect their content or function.
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
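+
+The layout recommended above can be created from the R console rather than by hand. This is a minimal sketch using base R; the helper name `create_project_dirs()` is hypothetical, and the folder names are the ones listed in the callout.
+
+```r
+# Hypothetical helper: create the recommended sub-directories under `root`.
+create_project_dirs <- function(root = ".") {
+  dirs <- file.path(root, c("doc", "data", "results", "src", "bin"))
+  for (d in dirs) {
+    # showWarnings = FALSE keeps the call quiet if the folder already exists
+    dir.create(d, recursive = TRUE, showWarnings = FALSE)
+  }
+  invisible(dirs)
+}
+```
+
+Run `create_project_dirs()` with the project open so the folders land next to the `.Rproj` file.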
+
+### Save the data in the data directory
+
+Now that we have a good directory structure, we will save our data files in the `data/` directory.
+
+:::::::::::::::::::::::::::::::::::::::  challenge
+
+## Challenge 1
+
+1\. Download each of the data files listed below (<kbd>Ctrl</kbd>\+<kbd>S</kbd>, right mouse click -> "Save as", or File -> "Save page as")
+
+- [nordic country data](https://datacarpentry.org/r-intro-geospatial/data/nordic-data.csv)
+- [nordic country data (version 2)](https://datacarpentry.org/r-intro-geospatial/data/nordic-data-2.csv)
+- [gapminder data](https://datacarpentry.org/r-intro-geospatial/data/gapminder_data.csv)
+
+2\. Make sure the files have the following names:
+
+- `nordic-data.csv`
+- `nordic-data-2.csv`
+- `gapminder_data.csv`
+
+3\. Save the files in the `data/` folder within your project.
+
+We will load and inspect these data later.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::  challenge
+
+## Challenge 2
+
+We also want to move the data that we downloaded from the [data page](https://datacarpentry.org/geospatial-workshop/data/) into a subdirectory
+inside `r-geospatial`. If you haven't already downloaded the data, you can do so by clicking
+[this download link](https://ndownloader.figshare.com/articles/2009586/versions/10).
+
+1. Move the downloaded zip file to the `data` directory.
+2. Once the data have been moved, unzip all files.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+Once you have completed moving the data across to the new folder,
+your data directory should look as follows:
+
+```
+data/
+   gapminder_data.csv
+   NEON-DS-Airborne-Remote-Sensing/
+   NEON-DS-Landsat-NDVI/
+   NEON-DS-Met-Time-Series/
+   NEON-DS-Site-Layout-Files/
+   NEON-DS-Airborne-Remote-Sensing.zip
+   NEON-DS-Landsat-NDVI.zip
+   NEON-DS-Met-Time-Series.zip
+   NEON-DS-Site-Layout-Files.zip
+   nordic-data.csv
+   nordic-data-2.csv
+```
+
+### Stage your scripts
+
+Creating separate R scripts or R Markdown documents for different stages of a project lets you re-run one stage without repeating the others.
+For instance, separating data download commands into their own file means that you won't re-download data unnecessarily.
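+
+A sketch of such a download script is shown below. It uses the CSV links from Challenge 1 and skips any file that is already present; the script structure and the helper name `download_if_missing()` are assumptions.
+
+```r
+# URLs taken from Challenge 1
+data_urls <- c(
+  "https://datacarpentry.org/r-intro-geospatial/data/nordic-data.csv",
+  "https://datacarpentry.org/r-intro-geospatial/data/nordic-data-2.csv",
+  "https://datacarpentry.org/r-intro-geospatial/data/gapminder_data.csv"
+)
+
+# Hypothetical helper: fetch each URL into `dest_dir` unless the file exists.
+download_if_missing <- function(urls, dest_dir = "data") {
+  dir.create(dest_dir, showWarnings = FALSE)
+  dest <- file.path(dest_dir, basename(urls))
+  for (i in seq_along(urls)) {
+    if (!file.exists(dest[i])) {
+      # mode = "wb" avoids corrupting files on Windows
+      download.file(urls[i], dest[i], mode = "wb")
+    }
+  }
+  invisible(dest)
+}
+```
+
+Calling `download_if_missing(data_urls)` from its own script fetches the files once; a later call finds them in `data/` and makes no network requests.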
+
+:::::::::::::::::::::::::::::::::::::::: keypoints
+
+- Use RStudio to create and manage projects with consistent layout.
+- Treat raw data as read-only.
+- Treat generated output as disposable.
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+