This is a tutorial on the very basics of R and RStudio. The tutorial is designed for a cloud version of RStudio on Posit Cloud but also works with a local installation. Most of the educational material was pulled directly from Carpentries lessons on R, including:
https://swcarpentry.github.io/r-novice-inflammation/ and https://swcarpentry.github.io/r-novice-gapminder/01-rstudio-intro.html
R is a scripting language originally developed at the University of Auckland, NZ, for statistical data analysis. R is free, open-source software released under the GNU General Public License. R runs natively in a command line interface and is available for Windows, Mac, and Linux operating systems. The base source code is maintained by the R Core Team, but additional software packages are developed by thousands of independent people around the world.
Scripting languages like R are very useful for data exploration and analysis because they offer limitless customization for instructing a computer to do something. They can, however, take more time to learn compared with purely graphical user interface (GUI) programs (think Microsoft Word).
RStudio is an integrated development environment (IDE) built specifically for using the R language. It is developed by Posit PBC and licensed under the GNU Affero General Public License version 3.
RStudio makes it easier and more intuitive to use the R language by providing point-and-click elements, a script editor, a console, package management features, help documentation, and tools to visualize data.
RStudio can be run on your local computer, on a server, or in the cloud (i.e., on a Posit computer).
For learning R/RStudio today, we will all use a cloud version of RStudio hosted on Posit Cloud. Cloud instances of software are great for educational environments because they require minimal setup and avoid the pitfalls of local installation. The drawback of such an approach is that educational cloud instances often have low computing resources, and requesting more resources costs money. We will interact with cloud RStudio using a web browser (e.g., Google Chrome).
Sign up for a free account with Posit Cloud to launch a cloud instance of RStudio: https://posit.cloud/plans. It is probably easiest to sign up using your University of Arizona Google account.
When you first open RStudio, you will be greeted by three panels:
- The interactive R console/Terminal (entire left)
  - The console is for typing R commands (one at a time)
  - The terminal is the Linux command line. This is where you give instructions to the underlying computer.
- Environment/History/Connections (tabbed in upper right)
  - The Environment pane shows all of your variables and objects
- Files/Plots/Packages/Help/Viewer (tabbed in lower right)
  - The Files tab gives you access to the directories and files on the underlying computer
  - The Plots tab shows graphical representations of data
  - The Packages tab shows the installed R packages
  - The Help tab shows documentation for R functions and packages
To create a new R script, use File >>> New File >>> R Script
Once you open files, such as R scripts, an editor panel will also open in the top left.
An R script (.R) is a file that holds multiple R commands that can be run automatically in sequence.
A '#' turns the rest of the line into a comment, so R does not run it as code
With the cursor on the code line of interest, you can run it by clicking the Run button at the top of the editor panel, or by typing CTRL + Return on Windows or Linux, or ⌘ + Return on macOS.
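As a small sketch of how comments and line-by-line execution fit together, the lines below can be pasted into a new script and run one at a time. The variable name and values are made up for illustration and are not part of the inflammation analysis.

```r
# Lines that start with '#' are comments; R skips them.
x = 3 + 4       # text after a '#' on a code line is also ignored

# x = 100       # this assignment is commented out, so it never runs

print(x)        # x is still 7
```

Running each line in turn should show the value of x appear in the Environment pane after the first assignment.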
Often, there are R software packages we want to use that are not currently installed. We can easily find, download, and install R packages from the Comprehensive R Archive Network (CRAN). CRAN is an online repository for approved R packages.
You can install packages using the RStudio GUI (Tools >>> Install Packages)
Or you can install a package with install.packages() and load an installed package with library()
install.packages("lidR")
library("spatial")
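Installing a package every time a script runs is slow, so one common pattern is to install only when the package is missing. The sketch below assumes a helper named install_if_missing, which is hypothetical and written here for illustration; requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper (not part of R itself): install a package only when missing.
install_if_missing = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from CRAN; needs an internet connection
  }
  # Report (invisibly) whether the package is available now.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call installs nothing and returns TRUE.
ok = install_if_missing("stats")
print(ok)

# For a CRAN package such as lidR you would call (not run here):
# install_if_missing("lidR")
# library("lidR")
```

Installing is needed once per computer or cloud project, while library() must be called in every new R session that uses the package.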
We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in CSV format (comma-separated values): each row holds information for a single patient, and the columns represent successive days.
In the terminal:
Download r-novice-inflammation-data.zip
curl -o r-novice-inflammation-data.zip https://swcarpentry.github.io/r-novice-inflammation/data/r-novice-inflammation-data.zip
Unzip the file
unzip r-novice-inflammation-data.zip
Once unzipped, go into the data directory
cd data
List the files in the directory
ls
In the R script, type:
read.csv(file = "data/inflammation-01.csv", header = FALSE)
Assign the data as a variable
inflam1 = read.csv(file = "data/inflammation-01.csv", header = FALSE)
Now that our data are loaded into R, we can start doing things with them. First, let’s ask what type of thing inflam1 is:
class(inflam1)
output = "data.frame"
dim(inflam1)
output = "60 40"
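To see what header = FALSE does without depending on the downloaded files, here is a minimal sketch that writes a tiny stand-in CSV to a temporary file and reads it back. The file contents and the name toy are made up for illustration.

```r
# Write a tiny stand-in CSV (3 patients x 4 days) to a temporary file.
toy_file = tempfile(fileext = ".csv")
writeLines(c("0,1,3,2", "0,2,4,1", "1,1,2,3"), toy_file)

# header = FALSE tells R that the first row is data, not column names,
# so R names the columns V1, V2, V3, ... automatically.
toy = read.csv(file = toy_file, header = FALSE)

class(toy)   # "data.frame"
dim(toy)     # 3 4  (rows first, then columns)
names(toy)   # "V1" "V2" "V3" "V4"
```

This is where column names such as V16 in the inflammation data come from: the file has no header row, so R generates the names.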
Get value of row 1 and column 1
inflam1[1,1]
Get value of row 30 and column 20
inflam1[30,20]
Get values from rows 1, 3, 5, and columns 10, 20
inflam1[c(1,3,5), c(10,20)]
Get values for rows 1-4, and columns 1-10
inflam1[1:4, 1:10]
Get values for row 5 and all columns
inflam1[5, ]
Get values from all rows, but only for columns 16-18
inflam1[, 16:18]
Assign the values to a new variable (i.e., subset the data)
subset = inflam1[, 16:18]
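The same [row, column] indexing patterns can be checked by eye on a small data frame. The sketch below uses a made-up 3 x 3 data frame named toy, chosen so that each result is easy to verify.

```r
# Hypothetical 3 x 3 data frame with easy-to-check values.
toy = data.frame(V1 = c(1, 2, 3), V2 = c(4, 5, 6), V3 = c(7, 8, 9))

toy[2, 3]               # single value: row 2, column 3 -> 8
toy[c(1, 3), c(1, 2)]   # rows 1 and 3, columns 1 and 2
toy[2, ]                # all columns of row 2 (still a data frame)
toy[, 2:3]              # all rows of columns 2 and 3

toy_subset = toy[, 2:3] # store the subset for later use
dim(toy_subset)         # 3 2
```

Leaving the row or column position empty means "take all of them", which is why the comma is still required.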
Addressing Columns by Name
column_16 = inflam1$V16
Calculate the Median value of column 16
median(inflam1$V16)
Calculate the Mean value of column 16
mean(inflam1[, 16])
Try to calculate the mean value of row 3 (this returns NA with a warning)
mean(inflam1[3, ])
Note that R may give unexpected results when you perform similar calculations on subsetted rows of data frames. A single row of a data frame is itself a data frame, and only some functions convert it to a numeric vector automatically (e.g. max(inflam1[1, ]) works as expected, while mean(inflam1[1, ]) returns NA and a warning). You get the expected output by including an explicit call to as.numeric(), e.g. mean(as.numeric(inflam1[1, ])). By contrast, calculations on subsetted columns work as expected, since columns of data frames are already vectors.
For row statistics try:
mean(as.numeric(inflam1[3, ]))
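The difference between a row and a column of a data frame can be seen on a small example. This is a sketch with a made-up data frame named toy; the values were picked so the means are whole numbers.

```r
# Hypothetical 2 x 3 data frame.
toy = data.frame(V1 = c(1, 2), V2 = c(3, 4), V3 = c(5, 9))

# A single row is still a data frame, not a numeric vector.
row1 = toy[1, ]
class(row1)                      # "data.frame"

# Convert the row to a numeric vector before taking the mean.
row1_values = as.numeric(row1)   # 1 3 5
mean(row1_values)                # 3

# A single column is already a numeric vector, so mean() works directly.
col3 = toy[, 3]
is.numeric(col3)                 # TRUE
mean(col3)                       # 7
```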
Use summary() to get basic descriptive stats
summary(inflam1$V16)
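The result of summary() on a numeric vector is a named set of six statistics, and individual values can be pulled out by name. A minimal sketch with made-up values:

```r
# Hypothetical values with one large outlier.
values = c(0, 1, 2, 3, 14)

stats = summary(values)
names(stats)        # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

# Extract single statistics by name.
stats[["Median"]]   # 2
stats[["Mean"]]     # 4 (pulled upward by the outlier)
```

Comparing the median and the mean like this is a quick way to spot skewed data.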
Get the mean inflammation value for each patient. The '1' argument specifies rows.
mean_patient_inflammation = apply(inflam1, 1, mean)
Get the mean inflammation value for each day. The '2' argument specifies columns.
mean_day_inflammation = apply(inflam1, 2, mean)
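To check what the 1 and 2 arguments do, it helps to run apply() on a data frame small enough to verify by hand. The sketch below uses a made-up data frame named toy, with rows standing in for patients and columns for days.

```r
# Hypothetical 2 x 3 data frame: 2 "patients" (rows), 3 "days" (columns).
toy = data.frame(V1 = c(1, 2), V2 = c(3, 4), V3 = c(5, 9))

# MARGIN = 1: apply mean() to each row (one value per patient).
row_means = apply(toy, 1, mean)   # 3 5

# MARGIN = 2: apply mean() to each column (one value per day).
col_means = apply(toy, 2, mean)   # V1 = 1.5, V2 = 3.5, V3 = 7

# Base R also provides rowMeans() and colMeans() for this common case.
all.equal(unname(row_means), unname(rowMeans(toy)))   # TRUE
all.equal(unname(col_means), unname(colMeans(toy)))   # TRUE
```

apply() is the more general tool because any function (e.g. max or median) can be passed in place of mean.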
The apply function can be used to summarize datasets and subsets of data across rows and columns using the MARGIN argument. Suppose you want to calculate the mean inflammation for specific days and patients in the patient dataset (i.e. 60 patients across 40 days).
Please use a combination of the apply function and indexing to:
- calculate the mean inflammation for patients 1 to 5 over the whole 40 days
- calculate the mean inflammation for days 1 to 10 (across all patients)
- calculate the mean inflammation for every second day (across all patients)
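One possible approach to the exercise is sketched below. So that it runs on its own, it uses a simulated 60 x 40 data frame named demo_data (random values, an assumption made for illustration) in place of the real data; with the data loaded, inflam1 can be substituted for demo_data. Taking "every second day" to mean days 1, 3, 5, ... is also an assumption.

```r
# Simulated stand-in for the patient data: 60 patients (rows) x 40 days (columns).
set.seed(42)
demo_data = as.data.frame(
  matrix(sample(0:20, 60 * 40, replace = TRUE), nrow = 60, ncol = 40)
)

# Patients 1 to 5 over all 40 days: subset rows, then average across each row.
mean_patients_1_to_5 = apply(demo_data[1:5, ], 1, mean)

# Days 1 to 10 across all patients: subset columns, then average each column.
mean_days_1_to_10 = apply(demo_data[, 1:10], 2, mean)

# Every second day (days 1, 3, 5, ...) across all patients.
every_second_day = seq(1, 40, by = 2)
mean_every_second_day = apply(demo_data[, every_second_day], 2, mean)

length(mean_patients_1_to_5)    # 5
length(mean_days_1_to_10)       # 10
length(mean_every_second_day)   # 20
```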
plot(mean_day_inflammation, xlab = "Day")
barplot(mean_day_inflammation, main="Mean Inflammation Over Time", xlab="Day", ylab="Mean Inflammation", col = "blue")
Write the summarized data to a new CSV file in the current working directory
write.csv(mean_day_inflammation, file = 'mean_day_inflammation.csv')
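When a named vector is written with write.csv(), the names become an unnamed first column and the values land in a column called x. A sketch of one way to get a tidier file is to build a small data frame with explicit column names first. The values, the column names day and mean_inflammation, and the use of a temporary file are all assumptions made for illustration.

```r
# Hypothetical per-day means, named the way apply() would name them.
toy_means = c(V1 = 1.5, V2 = 3.5, V3 = 7)

# Put the values in a data frame with descriptive column names.
out = data.frame(
  day = seq_along(toy_means),
  mean_inflammation = unname(toy_means)
)

# Write to a temporary file; row.names = FALSE skips the extra row-number column.
out_file = tempfile(fileext = ".csv")
write.csv(out, file = out_file, row.names = FALSE)

# Read the file back to confirm what was saved.
check = read.csv(out_file)
names(check)              # "day" "mean_inflammation"
check$mean_inflammation   # 1.5 3.5 7.0
```

Reading a file back in right after writing it is a cheap way to confirm that the saved layout is the one you intended.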
R Shiny is an R package that makes it easy to create web applications to help colleagues and the public interact with your scientific data.
Check out some examples here and here
This content showed you the very basics of getting started with R and RStudio using Posit Cloud. There is of course much more to learn about R, but proficiency will come with repeated use and a compelling reason for using it. In my opinion, the best way to learn R (or any other technology) is to have a specific goal in mind. If you know what you want to accomplish, then you have the motivation to figure it out step by step. Please don't think you need to have a vast general knowledge of R before you can accomplish something great.
Tips for Learning R
- Have a specific reason to use R and be curious about figuring out the steps to accomplish it
- Use LLMs (ChatGPT, Claude, Gemini) as your co-pilot to help you write code
- Join the Posit Community Forum
- There are tons of free resources online to learn R. This content was pulled directly from lessons from The Carpentries