Commit
0 parents
commit 383241f
Showing 55 changed files with 9,134 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,243 @@ | ||
--- | ||
title: Project Management With RStudio | ||
teaching: 10 | ||
exercises: 5 | ||
source: Rmd | ||
--- | ||
|
||
::::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- Create self-contained projects in RStudio | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
:::::::::::::::::::::::::::::::::::::::: questions | ||
|
||
- How can I manage my projects in R? | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
|
||
|
||
## Introduction | ||
|
||
The scientific process is naturally incremental, and many projects start life as | ||
random notes, some code, then a manuscript, and eventually everything is a bit | ||
mixed together. Organising a project involving spatial data is no different from | ||
any other data analysis project, although you may require more disk space than | ||
usual. | ||
|
||
<div class="text-center"> | ||
|
||
<blockquote class="twitter-tweet"><p>Managing your projects in a reproducible fashion doesn't just make your science reproducible, it makes your life easier.</p>— Vince Buffalo (@vsbuffalo) <a href="https://twitter.com/vsbuffalo/status/323638476153167872">April 15, 2013</a></blockquote> | ||
|
||
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> | ||
|
||
</div> | ||
|
||
Most people tend to organize their projects like this: | ||
|
||
![](fig/bad_layout.png){alt='A screenshot of a project folder containing multiple versions of data, analysis scripts, figures, and results files'} | ||
|
||
There are many reasons why we should *ALWAYS* avoid this: | ||
|
||
1. It is really hard to tell which version of your data is | ||
the original and which is the modified; | ||
2. It gets really messy because it mixes files with various | ||
extensions together; | ||
3. It probably takes you a lot of time to actually find | ||
things, and relate the correct figures to the exact code | ||
that has been used to generate them. | ||
|
||
A good project layout will ultimately make your life easier: | ||
|
||
- It will help ensure the integrity of your data; | ||
- It makes it simpler to share your code with someone else | ||
(a lab-mate, collaborator, or supervisor); | ||
- It allows you to easily upload your code with your manuscript submission; | ||
- It makes it easier to pick the project back up after a break. | ||
|
||
## A possible solution | ||
|
||
Fortunately, there are tools and packages which can help you manage your work effectively. | ||
|
||
One of the most powerful and useful aspects of RStudio is its project management | ||
functionality. We'll be using this today to create a self-contained, reproducible | ||
project. | ||
|
||
::::::::::::::::::::::::::::::::::::::: instructor | ||
|
||
Make sure learners download the data files in Challenge 1 and move those files | ||
to their `data/` directory. | ||
|
||
When learners load an RStudio project, their R session's working directory should | ||
automatically be set to the same folder as the `.Rproj` file. We'll be using relative | ||
paths throughout the lesson to refer to files, so it's important to make sure that | ||
learners have loaded the right project and are in the right directory! You may also | ||
want to introduce other ways to make file paths, such as the `here` package, after | ||
creating the project. | ||
|
||
::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::::: challenge | ||
|
||
## Challenge: Creating a self-contained project | ||
|
||
We're going to create a new project in RStudio: | ||
|
||
1. Click the "File" menu button, then "New Project". | ||
2. Click "New Directory". | ||
3. Click "Empty Project". | ||
4. Type in "r-geospatial" as the name of the directory. | ||
5. Click the "Create Project" button. | ||
|
||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
A key advantage of an RStudio Project is that whenever we open this project in | ||
subsequent RStudio sessions our working directory will *always* be set to the | ||
folder `r-geospatial`. | ||
Let's check our working directory by entering the following into the R console: | ||
|
||
```r | ||
getwd() | ||
``` | ||
|
||
R should return `your/path/r-geospatial` as the working directory. | ||
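If you would rather check this programmatically, the following is a minimal sketch. The helper `in_project()` is a hypothetical name introduced here for illustration; it is not part of RStudio or base R. It simply compares the last component of a path with the expected folder name.

```r
# Hypothetical helper: TRUE if the given directory (by default the current
# working directory) is a folder with the expected project name.
in_project <- function(name, wd = getwd()) {
  identical(basename(wd), name)
}

# Check the current session; TRUE is expected inside the r-geospatial project.
in_project("r-geospatial")
```

If this returns `FALSE`, the project is probably not loaded, and relative paths used later may not resolve.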
|
||
## Best practices for project organization | ||
|
||
Although there is no "best" way to lay out a project, there are some general | ||
principles to adhere to that will make project management easier: | ||
|
||
### Treat data as read only | ||
|
||
This is probably the most important goal of setting up a project. Data are | ||
typically time-consuming and/or expensive to collect. Working with them | ||
interactively (e.g., in Excel), where they can be modified, means you are never | ||
sure of where the data came from, or how they have been modified since collection. | ||
It is therefore a good idea to treat your data as "read-only". | ||
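One way to enforce this on disk is to remove write permission from raw data files. The sketch below assumes a Unix-like system (file permissions behave differently on Windows) and uses a throwaway temporary file in place of real data.

```r
# Create a throwaway file standing in for a raw data file.
raw_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)), raw_file,
          row.names = FALSE)

# Remove write permission for everyone (octal mode 444 means read-only).
# Assumption: a Unix-like system; behaviour differs on Windows.
Sys.chmod(raw_file, mode = "444", use_umask = FALSE)

# Inspect the resulting permissions as an octal string.
file_mode <- format(file.mode(raw_file))
file_mode
```

Reading the file with `read.csv()` still works as before; the permission change only guards against accidental overwrites.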
|
||
### Data Cleaning | ||
|
||
In many cases your data will be "dirty": it will need significant preprocessing | ||
to get into a format R (or any other programming language) will find useful. This | ||
task is sometimes called "data munging". I find it useful to store the scripts that do this cleaning | ||
in a separate folder, and create a second "read-only" data folder to hold the | ||
"cleaned" data sets. | ||
|
||
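As a rough illustration of the cleaning workflow described above, the sketch below reads a raw file, drops incomplete rows, and writes the result to a separate folder. The folder names `data` and `data_clean`, the file names, and the data values are all made up for this example, and everything happens in a temporary directory.

```r
# Build a throwaway project layout in a temporary directory.
# "data" and "data_clean" are hypothetical folder names.
project   <- file.path(tempdir(), "cleaning-sketch")
raw_dir   <- file.path(project, "data")
clean_dir <- file.path(project, "data_clean")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(clean_dir, recursive = TRUE, showWarnings = FALSE)

# Made-up raw data with one incomplete row.
raw <- data.frame(site = c("a", "b", "c"), height = c(1.2, NA, 3.4))
write.csv(raw, file.path(raw_dir, "heights.csv"), row.names = FALSE)

# Cleaning step: read the raw file, drop incomplete rows, and write the
# result to the cleaned-data folder. The raw file is never overwritten.
dirty   <- read.csv(file.path(raw_dir, "heights.csv"))
cleaned <- dirty[complete.cases(dirty), ]
write.csv(cleaned, file.path(clean_dir, "heights_clean.csv"), row.names = FALSE)
```

Keeping the cleaning step in its own script means the cleaned folder can always be rebuilt from the untouched raw data.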
### Treat generated output as disposable | ||
|
||
Anything generated by your scripts should be treated as disposable: you should | ||
be able to regenerate all of it from your scripts. | ||
|
||
There are lots of different ways to manage this output. I find it useful to | ||
have an output folder with different sub-directories for each separate | ||
analysis. This makes it easier later, as many of my analyses are exploratory | ||
and don't end up being used in the final project, and some of the analyses | ||
get shared between projects. | ||
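A minimal sketch of this arrangement is shown below. The `output` folder and the analysis names are hypothetical, and the example runs in a temporary directory so it does not touch a real project.

```r
# Throwaway project root; "output" and the analysis names are hypothetical.
project  <- file.path(tempdir(), "output-sketch")
analyses <- c("exploratory", "final-figures")
out_dirs <- file.path(project, "output", analyses)

# Create one sub-directory per analysis.
for (d in out_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# A script writes its result into its own sub-directory ...
result_file <- file.path(out_dirs[1], "summary.csv")
write.csv(data.frame(mean_value = mean(c(1, 2, 3))), result_file,
          row.names = FALSE)

# ... and because output is disposable, the whole folder can be deleted
# and regenerated by rerunning the scripts.
unlink(file.path(project, "output"), recursive = TRUE)
output_exists <- dir.exists(file.path(project, "output"))
```

If deleting the output folder would lose something you cannot regenerate, that is a sign the item belongs with the data or the scripts instead.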
|
||
### Keep related data together | ||
|
||
Some GIS file formats, e.g. shapefiles, actually consist of 3-6 files that need to be kept together and share the same base name. | ||
It may be tempting to store those components separately, | ||
but your spatial data will be unusable if you do that. | ||
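For shapefiles, the `.shp`, `.shx`, and `.dbf` files are the mandatory components. The sketch below checks that they sit side by side; `missing_shapefile_parts()` is a hypothetical helper written for this example, and the placeholder files are empty, so this checks names only and not whether the data are valid.

```r
# Hypothetical helper: given the path to a .shp file, return the required
# components (.shp, .shx, .dbf) that are missing from the same folder.
missing_shapefile_parts <- function(shp_path,
                                    required = c("shp", "shx", "dbf")) {
  base  <- tools::file_path_sans_ext(shp_path)
  parts <- paste0(base, ".", required)
  parts[!file.exists(parts)]
}

# Demonstration with empty placeholder files in a temporary directory.
demo_dir <- file.path(tempdir(), "shapefile-sketch")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("sites.shp", "sites.dbf")))

# The .shx index is absent, so it is reported as missing.
basename(missing_shapefile_parts(file.path(demo_dir, "sites.shp")))
```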
|
||
### Keep a consistent naming scheme | ||
|
||
It is generally best to avoid renaming downloaded spatial data, | ||
so that a clear connection is maintained with the point of truth. | ||
You may otherwise find yourself wondering whether `file_A` really is just a copy of `Official_file_on_website` or not. | ||
|
||
For datasets you generate, it's worth taking the time to come up with a naming convention that works for your project, | ||
and sticking to it. File names don't have to be long, they just have to be long enough that you can tell what the file | ||
is about. Date generated, topic, and whether a product is intermediate or final are good bits of information to keep | ||
in a file name. For more tips on naming files, check out [the slides from Jenny Bryan's talk "Naming things" at the 2015 Reproducible Science Workshop](https://speakerdeck.com/jennybc/how-to-name-files). | ||
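As one possible convention (an assumption for illustration, not a rule from the lesson), a small helper can assemble file names from the date, topic, and stage so that every generated file follows the same pattern. The name `make_filename()` and the example topic are hypothetical.

```r
# Hypothetical helper: build a file name from date, topic, and stage.
make_filename <- function(topic, stage = c("intermediate", "final"),
                          date = Sys.Date(), ext = "csv") {
  stage <- match.arg(stage)
  paste0(format(date, "%Y-%m-%d"), "_", topic, "_", stage, ".", ext)
}

# A fixed date keeps the example reproducible.
make_filename("ndvi-summary", "final", date = as.Date("2024-03-01"))
```

Putting the date first in year-month-day order means files sort chronologically in a directory listing.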
|
||
::::::::::::::::::::::::::::::::::::::::: callout | ||
|
||
## Tip: Good Enough Practices for Scientific Computing | ||
|
||
[Good Enough Practices for Scientific Computing](https://github.com/swcarpentry/good-enough-practices-in-scientific-computing/blob/gh-pages/good-enough-practices-for-scientific-computing.pdf) gives the following recommendations for project organization: | ||
|
||
1. Put each project in its own directory, which is named after the project. | ||
2. Put text documents associated with the project in the `doc` directory. | ||
3. Put raw data and metadata in the `data` directory, and files generated during cleanup and analysis in a `results` directory. | ||
4. Put source for the project's scripts and programs in the `src` directory, and programs brought in from elsewhere or compiled locally in the `bin` directory. | ||
5. Name all files to reflect their content or function. | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::: | ||
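The directory layout recommended in the tip above can be created from R. This is a sketch only: `create_project_dirs()` is a hypothetical helper, and the demonstration runs in a temporary directory.

```r
# Hypothetical helper: create the folders recommended above inside a project.
create_project_dirs <- function(project,
                                dirs = c("doc", "data", "results", "src", "bin")) {
  paths <- file.path(project, dirs)
  for (p in paths) dir.create(p, recursive = TRUE, showWarnings = FALSE)
  invisible(paths)
}

# Try it on a throwaway project in a temporary directory.
demo_project <- file.path(tempdir(), "layout-sketch")
create_project_dirs(demo_project)
list.files(demo_project)
```

Inside a loaded RStudio project, calling `create_project_dirs(".")` would create the same folders in the project root.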
|
||
### Save the data in the data directory | ||
|
||
Now that we have a good directory structure, we will save our data files in the `data/` directory. | ||
|
||
::::::::::::::::::::::::::::::::::::::: challenge | ||
|
||
## Challenge 1 | ||
|
||
1\. Download each of the data files listed below (<kbd>Ctrl</kbd>\+<kbd>S</kbd>, right mouse click -> "Save as", or File -> "Save page as") | ||
|
||
- [nordic country data](https://datacarpentry.org/r-intro-geospatial/data/nordic-data.csv) | ||
- [nordic country data (version 2)](https://datacarpentry.org/r-intro-geospatial/data/nordic-data-2.csv) | ||
- [gapminder data](https://datacarpentry.org/r-intro-geospatial/data/gapminder_data.csv) | ||
|
||
2\. Make sure the files have the following names: | ||
|
||
- `nordic-data.csv` | ||
- `nordic-data-2.csv` | ||
- `gapminder_data.csv` | ||
|
||
3\. Save the files in the `data/` folder within your project. | ||
|
||
We will load and inspect these data later. | ||
|
||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::::: challenge | ||
|
||
## Challenge 2 | ||
|
||
We also want to move the data that we downloaded from the [data page](https://datacarpentry.org/geospatial-workshop/data/) into a subdirectory | ||
inside `r-geospatial`. If you haven't already downloaded the data, you can do so by clicking | ||
[this download link](https://ndownloader.figshare.com/articles/2009586/versions/10). | ||
|
||
1. Move the downloaded zip file to the `data` directory. | ||
2. Once the data have been moved, unzip all files. | ||
|
||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::: | ||
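The unzipping step in Challenge 2 can also be done from the R console instead of a file manager. The sketch below defines a hypothetical helper, `unzip_all()`, built on `utils::unzip()`; the demonstration runs on an empty temporary folder, so nothing is extracted.

```r
# Hypothetical helper: unzip every .zip archive found in a directory,
# extracting the contents into that same directory.
unzip_all <- function(dir) {
  zips <- list.files(dir, pattern = "\\.zip$", full.names = TRUE)
  for (z in zips) utils::unzip(z, exdir = dir)
  invisible(basename(zips))
}

# Demonstration on an empty throwaway folder: no archives are found.
demo_dir <- file.path(tempdir(), "unzip-sketch")
dir.create(demo_dir, showWarnings = FALSE)
found <- unzip_all(demo_dir)
```

From the project root, `unzip_all("data")` would extract every archive stored in `data/`.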
|
||
Once you have completed moving the data across to the new folder, | ||
your data directory should look as follows: | ||
|
||
``` | ||
data/ | ||
gapminder_data.csv | ||
NEON-DS-Airborne-Remote-Sensing/ | ||
NEON-DS-Landsat-NDVI/ | ||
NEON-DS-Met-Time-Series/ | ||
NEON-DS-Site-Layout-Files/ | ||
NEON-DS-Airborne-Remote-Sensing.zip | ||
NEON-DS-Landsat-NDVI.zip | ||
NEON-DS-Met-Time-Series.zip | ||
NEON-DS-Site-Layout-Files.zip | ||
nordic-data.csv | ||
nordic-data-2.csv | ||
``` | ||
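To compare your own folder against this listing, something like the following sketch may help. `missing_entries()` is a hypothetical helper, and the demonstration uses a temporary folder that holds only one of the expected files.

```r
# Hypothetical helper: which of the expected entries are absent from a folder?
missing_entries <- function(dir, expected) {
  setdiff(expected, list.files(dir))
}

# Files and unzipped folders from the listing above.
expected <- c("gapminder_data.csv", "nordic-data.csv", "nordic-data-2.csv",
              "NEON-DS-Airborne-Remote-Sensing", "NEON-DS-Landsat-NDVI",
              "NEON-DS-Met-Time-Series", "NEON-DS-Site-Layout-Files")

# Demonstration on a throwaway folder that holds only one of the files.
demo_dir <- file.path(tempdir(), "listing-sketch")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "nordic-data.csv"))
still_missing <- missing_entries(demo_dir, expected)
still_missing
```

In the project itself, `missing_entries("data", expected)` should return an empty character vector once both challenges are complete.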
|
||
### Stage your scripts | ||
|
||
Creating separate R scripts or R Markdown documents for different stages of a project lets you rerun only the stage that has changed. | ||
For instance, separating data download commands into their own file means that you won't re-download data unnecessarily. | ||
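A download script along these lines might look like the sketch below. The URLs are the ones listed in Challenge 1; `download_if_missing()` is a hypothetical helper that skips files already present in the destination folder. The block only defines the helper, so running it does not touch the network.

```r
# URLs of the lesson's CSV files, as listed in Challenge 1.
urls <- c(
  "https://datacarpentry.org/r-intro-geospatial/data/nordic-data.csv",
  "https://datacarpentry.org/r-intro-geospatial/data/nordic-data-2.csv",
  "https://datacarpentry.org/r-intro-geospatial/data/gapminder_data.csv"
)

# Hypothetical helper: download only the files that are not already present.
download_if_missing <- function(urls, dest_dir = "data") {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dest_dir, basename(urls))
  todo <- !file.exists(dest)
  for (i in which(todo)) utils::download.file(urls[i], dest[i], mode = "wb")
  invisible(dest)
}
```

Calling `download_if_missing(urls)` from the project root would fetch any missing files into `data/` and leave existing ones alone.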
|
||
:::::::::::::::::::::::::::::::::::::::: keypoints | ||
|
||
- Use RStudio to create and manage projects with a consistent layout. | ||
- Treat raw data as read-only. | ||
- Treat generated output as disposable. | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
|