From fa2c4e575b4f44a00092646eafaa7d3067c414d3 Mon Sep 17 00:00:00 2001
From: KarenZhuqianZhou
Date: Tue, 27 Sep 2016 17:55:50 -0400
Subject: [PATCH] Zhuqian's assignment 3

---
 .gitignore                |   4 +
 Class_7_Instructions.html | 250 ++++++++++++++++++++++++++++++++++++++
 Zhuqian Zhou-Plot.png     | Bin 0 -> 3444 bytes
 class7.Rproj              |  13 ++
 4 files changed, 267 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 Class_7_Instructions.html
 create mode 100644 Zhuqian Zhou-Plot.png
 create mode 100644 class7.Rproj

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..5b6a065
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata
diff --git a/Class_7_Instructions.html b/Class_7_Instructions.html
new file mode 100644
index 0000000..9d5a255
--- /dev/null
+++ b/Class_7_Instructions.html
@@ -0,0 +1,250 @@
+Assignment 3
+
+

In this assignment you will be practising data tidying. You will be using the data we have collected from class and data generated from the instructor wearing a wristband activity tracker.

+
+
+

First, you need to import into R a data set containing information about Charles’ activity for the last three weeks. You can find this data set within the Assignment 3 repository you cloned to create this project.

+
+
+

Install packages for manipulating data

+

We will use two packages: tidyr and dplyr

+
#Install packages
+install.packages("tidyr")
+install.packages("dplyr")
+#Load packages (library() loads only one package per call)
+library(tidyr)
+library(dplyr)
+
+
+

Upload wide format instructor data (instructor_activity_wide.csv)

+
data_wide <- read.table("~/Career/TC/Courses/Core Methods in EDM/class7/instructor_activity_wide.csv", sep = ",", header = TRUE)
+
+#Now view the data you have uploaded and notice its structure: each variable is a date and each row is a type of measure.
+View(data_wide)
+
+#R doesn't like having variable names that consist only of numbers so, as you can see, every variable starts with the letter "X". The numbers represent dates in the format year-month-day.
+
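The "X" prefix comes from R's name sanitising: `read.table` passes column headers through `make.names` by default. A minimal sketch of that behaviour, using made-up headers (the dates below are illustrative assumptions, not values read from the data file):

```r
# Column headers as they might appear in the CSV (hypothetical dates).
raw_names <- c("variable", "20160101", "20160102")

# read.table() applies make.names() to headers when check.names = TRUE (the default).
clean_names <- make.names(raw_names)
print(clean_names)

# Strip the leading "X" and parse the rest as a year-month-day date.
dates <- as.Date(sub("^X", "", clean_names[-1]), format = "%Y%m%d")
print(dates)
```

Parsing the names into real dates is optional here, but it can make later grouping by week easier to reason about.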
+
+

This is not a convenient format for us to analyze. What we need is for each type of measure to be a column. Your first task is to convert the data from wide format to long format. To do this we will use the “gather” function: gather(data, key, value, …)

+

The gather command requires the following input arguments:

+
    +
  • data: Data object
  • +
  • key: Name of new key column (made from names of data columns)
  • +
  • value: Name of new value column
  • +
  • …: Names of source columns that contain values
  • +
+
data_long <- gather(data_wide, date, variables)
+#Rename the variables so we don't get confused about what is what!
+names(data_long) <- c("variables", "date", "measure")
+#Take a look at your new data, looks weird huh?
+View(data_long)
+
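To make the reshaping concrete, here is a small sketch of `gather` on a toy data frame. The column names (`variables`, `X20160101`, `X20160102`) and the values are assumptions chosen for illustration; the real file may differ.

```r
library(tidyr)

# Toy wide data: one row per measure, one column per (hypothetical) date.
toy_wide <- data.frame(
  variables = c("steps", "s_deep"),
  X20160101 = c(1000, 3),
  X20160102 = c(2000, 4),
  stringsAsFactors = FALSE
)

# Gather every column except `variables` into date/measure pairs.
toy_long <- gather(toy_wide, key = "date", value = "measure", -variables)
print(toy_long)
```

Excluding the identifier column with `-variables` keeps it as its own column, so each output row reads as "this measure, on this date, had this value". Newer tidyr releases also provide `pivot_longer` for the same job.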
+
+

Now convert this long format into separate columns using the “spread” function to separate by the type of measure

+

The spread function requires the following input:

+
    +
  • data: Data object
  • +
  • key: Name of column containing the new column names
  • +
  • value: Name of column containing values
  • +
+
instructor_data <- spread(data_long, variables, measure)
+
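The reverse step can be sketched the same way. This toy example (hypothetical measures and dates, not the instructor's data) spreads a long table so that each type of measure becomes its own column:

```r
library(tidyr)

# Toy long data: one row per measure per (hypothetical) date.
toy_long <- data.frame(
  variables = c("steps", "s_deep", "steps", "s_deep"),
  date      = c("X20160101", "X20160101", "X20160102", "X20160102"),
  measure   = c(1000, 3, 2000, 4),
  stringsAsFactors = FALSE
)

# Values of `variables` become column names; `measure` fills the cells.
toy_tidy <- spread(toy_long, key = variables, value = measure)
print(toy_tidy)
```

The result has one row per date, which is the shape the later `mutate` and `summarise` steps expect.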
+
+

Now we have a workable instructor data set! The next step is to create a workable student data set. Upload the data “student_activity.csv”. View your file once you have uploaded it and then draw on a piece of paper the structure that you want before you attempt to code it. Write the code you use in the chunk below. (Hint: you can do it in one step)

+
student_activity <- read.table("~/Career/TC/Courses/Core Methods in EDM/class7/student_activity.csv", sep = ",", header = TRUE)
+student_data <- spread(student_activity, variable, measure)
+
+
+

Now that you have a workable student data set, subset it to create a data set that only includes data from the second class.

+

To do this we will use the dplyr package. (We will call dplyr explicitly by writing dplyr:: in front of each command, because some dplyr commands share their names with commands in other packages that perform different operations.)

+

Notice that the way we subset is with a logical rule, in this case date == 20160204. In R, when we want to say that something “equals” something else we need to use a double equals sign “==”. (A single equals sign means the same as <-).

+
student_data_2 <- dplyr::filter(student_data, date == 20160204)
+

Now subset the student_activity data frame to create a data frame that only includes students who have sat at table 4. Write your code in the following chunk:

+
student_data_3 <- dplyr::filter(student_data, table == 4)
+
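As a quick illustration of subsetting with a logical rule, the sketch below filters a toy student table. The `id`, `date`, and `table` values are invented for the example.

```r
library(dplyr)

# Toy student data (hypothetical values).
toy_students <- data.frame(
  id    = c(1, 2, 3),
  date  = c(20160128, 20160204, 20160204),
  table = c(4, 4, 2)
)

# Keep only rows where the rule `date == 20160204` is TRUE.
second_class <- dplyr::filter(toy_students, date == 20160204)
print(second_class)

# Several rules separated by commas must all hold for a row to be kept.
second_class_table4 <- dplyr::filter(toy_students, date == 20160204, table == 4)
print(second_class_table4)
```

Writing `date = 20160204` with a single equals sign would not test equality; `filter` treats it as a named argument and stops with an error.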
+
+

Make a new variable

+

It is useful to be able to make new variables for analysis. We can either append a new variable to our data frame or replace some variables with a new variable. Below we will use the “mutate” function to create a new variable “total_sleep” from the light and deep sleep variables in the instructor data.

+
instructor_data <- dplyr::mutate(instructor_data, total_sleep = s_deep + s_light)
+

Now, referring to the cheat sheet, create a data frame called “instructor_sleep” that contains ONLY the total_sleep variable. Write your code in the following code chunk:

+
instructor_sleep <- dplyr::select(instructor_data, total_sleep)
+
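The difference between appending a variable and replacing variables can be sketched on toy data. The sleep values here are invented; only the column names `s_deep` and `s_light` follow the instructor data.

```r
library(dplyr)

# Toy sleep data (hypothetical hours).
toy_sleep <- data.frame(
  date    = c(20160101, 20160102),
  s_deep  = c(3, 4),
  s_light = c(4, 3.5)
)

# mutate() appends total_sleep and keeps the existing columns.
appended <- dplyr::mutate(toy_sleep, total_sleep = s_deep + s_light)

# transmute() keeps only the columns it is given, replacing the rest.
replaced <- dplyr::transmute(toy_sleep, date, total_sleep = s_deep + s_light)

# select() picks columns by name from an existing data frame.
only_total <- dplyr::select(appended, total_sleep)
print(only_total)
```

`select` still returns a data frame, here with a single column, rather than a bare vector.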

Now, we can combine several commands together to create a new variable that contains a grouping. The following code creates a weekly grouping variable called “week” in the instructor data set:

+
instructor_data <- dplyr::mutate(instructor_data, week = dplyr::ntile(date, 3))
+
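`ntile` ranks its input and splits the ranks into the requested number of roughly equal buckets. A small sketch with nine made-up day numbers:

```r
library(dplyr)

# Nine consecutive (hypothetical) days split into three equal groups.
days  <- 1:9
weeks <- dplyr::ntile(days, 3)
print(weeks)
```

Note that the buckets are based on rank, not on the calendar. The result matches real weeks only when the dates are evenly spread, as they should be for three weeks of daily tracker data.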

Create the same variable for the student data frame. Write your code in the code chunk below:

+
student_data <- dplyr::mutate(student_data, week=dplyr::ntile(date, 3))
+
+
+

Summarizing

+

Next we will summarize the student data. First we can simply take an average of one of our student variables such as motivation:

+
student_data %>% dplyr::summarise(mean(motivation))
+
+#That isn't super interesting, so let's break it down by class date:
+
+student_data %>% dplyr::group_by(date) %>% dplyr::summarise(mean(motivation))
+
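Grouped summaries are easier to work with when the summary column has an explicit name. The sketch below uses invented motivation scores and a hypothetical column name, `avg_motivation`:

```r
library(dplyr)

# Toy student data (hypothetical scores).
toy <- data.frame(
  week       = c(1, 1, 2, 2),
  motivation = c(3, 5, 2, 4)
)

# One row per week, with a named summary column.
by_week <- toy %>%
  dplyr::group_by(week) %>%
  dplyr::summarise(avg_motivation = mean(motivation))
print(by_week)
```

Without a name, the column is called `mean(motivation)`, which has to be quoted with backticks or renamed before it can be used conveniently.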

Create two new data sets using this method: one that summarizes average motivation for students for each week (student_week) and another that summarizes “m_active_time” for the instructor per week (instructor_week). Write your code in the following chunk:

+
student_week <- student_data %>% dplyr::group_by(week) %>% dplyr::summarise(mean(motivation))
+instructor_week <- instructor_data %>% dplyr::group_by(week) %>% dplyr::summarise(mean(m_active_time))
+
+
+

Merging

+

Now we will merge these two data frames using dplyr.

+
merge <- dplyr::full_join(instructor_week, student_week, "week")
+
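A sketch of how `full_join` lines up two weekly summaries, using invented values and hypothetical column names:

```r
library(dplyr)

# Toy weekly summaries (hypothetical values).
instructor_toy <- data.frame(week = c(1, 2, 3), avg_instructor = c(30, 45, 50))
student_toy    <- data.frame(week = c(1, 2),    avg_student    = c(3.5, 4))

# full_join() keeps every week found in either table and fills gaps with NA.
joined <- dplyr::full_join(instructor_toy, student_toy, by = "week")
print(joined)
```

The columns of the first table come first in the result, followed by the columns of the second, so the order of the arguments determines the column order.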
+
+

Visualize

+

Visualize the relationship between these two variables (mean motivation and mean instructor activity) with the “plot” command and then run a Pearson correlation test (hint: cor.test()). Write the code for these commands below:

+
names(merge)<-c("week", "avg_instructor", "avg_student") #instructor_week is the first table in the join, so its mean comes first
+plot(merge$avg_student, merge$avg_instructor)
+cor.test(merge$avg_student, merge$avg_instructor)
+
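The same two steps can be tried on toy data to see what `cor.test` returns. The four pairs of values below are invented for illustration and do not come from the class data.

```r
# Toy weekly means (hypothetical values).
toy <- data.frame(
  avg_instructor = c(30, 45, 50, 20),
  avg_student    = c(3.1, 4.0, 4.4, 2.5)
)

# Scatter plot with labelled axes.
plot(toy$avg_instructor, toy$avg_student,
     xlab = "Mean instructor active time",
     ylab = "Mean student motivation")

# Pearson correlation test (the default method of cor.test).
result <- cor.test(toy$avg_instructor, toy$avg_student)
print(result$estimate)   # correlation coefficient r
print(result$p.value)    # p-value of the test
print(result$parameter)  # degrees of freedom, n - 2
```

With only three weekly means the test has a single degree of freedom, so the p-value should be read with caution.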

Finally, save your markdown document and your plot to this folder and commit, push and pull your repo to submit.

+
+ + + + +
+ + + + + + + + diff --git a/Zhuqian Zhou-Plot.png b/Zhuqian Zhou-Plot.png new file mode 100644 index 0000000000000000000000000000000000000000..b04e028c8b5ad30afae5533a6e3d03798ab8c21a GIT binary patch literal 3444 zcmdTHc{tQtd%CieB~eU>(v`B6^)fMT)?`aD#xiDXa}j2SJ9a8zkX@Fn7op79vQ1L06cG)SsMZn1cHD-Xb>a?gl+^d2nYfJp&@8AB!!Ot@dLon zz}MJ_Ze*oEKq(M_2BoB+Q|PQlG>FZnr!=w}8(A!R3WQE)v*|1r@DHE?0I&gy+pon6 z*y;%Onw>ugB+$hkhk7wj+(00{S`&k-)9 z`MzU9kw|yZqf$Es!+C1iY3UCnGfVw&Af5P0S&#tA<~T_%(#Ei+f(Lf8vqBnfbC?+a z|HpjQ599s(a~2KX)p_LaCZw*T@RUziqcEt3Z<8OD4uunYWU~T~{IIPUlM#VWXP~Roagw9~O{YiyiBj@~ zAvO9KRptDY3faxdHt1yHX;TFCbr`V~F)zO7^V%F6>d|X%+Rw=mBSNmW(m5O38p78# z(-x9wW2t~|=o3~kT-)&L1l0+~YuYL5H|_2U%KeU%3X@pdTlL+bxNN%%YezZN6>4rM z&SE|LJ(k4R47MmL_Sg0QsAv+?5Y=s&3*YZwxUwepl@if9vAF!!asR#k{1-bOQdKfD z1V8f=p*iPNR`c#ibz5QV?u9$jOa_CzMNGJ-I;+a%RO@)TLm^5!X6RjVbEZX(0MvRi zZs6_*ufpn#0ZtD8%1lhj*btpWe8|hSg5ORD%t**QQ43eZ6mYK*svVWUm*I-TkKaXv z(nGxiU0YdG5SR|`5jXS&G=V>*meCg(JaZOTeT=Cd?DQPf@%E=t)s9QKok{p#G|dxi zklEa@D0y*qYA)Y9sXR*aZ1T>`0(MfwYWb}`y9r46;_cjfDr5tQlgL)b*D%7uy-@{G zzOmj84qjt6hHb5gHRZae4k)X1>hK7_333;NR{~Zg6iEZBesOAPCj-<(xMAMNS~((V zPJ$XeIKiy2nQHL{6AQ=*nMvp6Ub$v-+^1#xl#AtYLvx;-S@aO_zP*FgXwna)J|#Y? zrGhE6Hho(S#5DPI8 zSomzL3QfZ^C z>|37BG+|L9U|WJELT1s!DbF6|mxS&iA_2kzta$OS`gis#`(M2PY+PO|?UXZvjCvvK0n6zKwznWY(Wa=m zc4@gSS@`6#H@*RgMiDYm{!D*Mkw_2~pyvC_UL^9bbVND7KKjGjrI2l8ZmMA^V3?0c zu~>mDP|bCfnYFme7}2%B%?_pukjsulo4nCv6PI4}S)nT%dT!1EZKd>p?FcP3-l5P( z*Uyg+qRMC`#t;E*8tx^R1Sv#C_ZR!z($GJussLwa)2lN_E%6+ax~Is#&rzrYFgd~u zu5Kmw$j!e)4g(ZHENeut*ug8`dZxs$da@%xSvn6(Vo4Tb@05|eW}o@7?tEsFU8cX8 zw8?CMo#Zm*#Q^|mw2S2l=DQ?HS1^duj>#HGi^u>#e6UEQc8nA%$&xyr9()yQ4NX@0*}Tk$JyYkQ0K80q+38+Dr9VPuMs8BxkL(0}yVG^$)sT3t^cf? zq`R2)3^J6$BKLfPq*}0)FVozY$|ynMJjUe&%wW<`zEK#$fcTfdmtTrUzAK-fMK?qg z+{QDl+b>{_EnYg`8#F$-!n1Ic?N%ihCr)YIQPwgt z{}x(sdREf_zd4P;mg=um^?sbe0% z+;4WRUI#yA!CN7_@>31>OM_u8f;N}E>L)1RAj{$D>mE40<=fvl^6wSW=A^ey@!HQQ zQHrXb#+0pZsd+16MS-=F&?sA!r+Y<_z-xWp4$Ey)i|)2vsGSr`U~0fBq? 
z6@1G%0QtY)<2HOqYj>cF(CES07Pw+^0NuvuJ(y70v3d zY}@nMPno4mBPNAHwpOPbXwTkQjJWAE2d@c)KM)_zJh@}F(AKbfwPRQJJq>R{Z^*rL z^{T{?Qk{c9pN&|Rhy`j>%g>Q}QpwwcvxR!!aDg78P39U{?`>_Ag8*z)Cj8xMR`bAZ z-FbVntj)53XX_idRTp82V+qFfgaF(Q-F~^2wt08evZh(797aE^h?gBR@n{dfHcE8A zE`HOIb@Gn9T2IlTsyC#OCG&MlQUP~&#WAV(G=0_jVB3bA!u>GcTa5RJ#qDZ1VzAPv zzqxniDbYs@j>s1#ejwyJsU1m3qNsEl)l4*!FD5TX?5)3_)*uvcVZs_(7lVH#?P6UT zzTRw<+UrYcI0yi9u%<@hp!&yhg(OP<93BUgsTsnFz`atN!1cYY??SLpO#=k;l94W0 zn^SdexJHAPk3Am3Y=$B5l9xJa`Y3kIwk3I&;b!B)+xG-@6trMB)Rl#H@ zruCEzmqgA7q&i_n<_`~bn<-D9*#!M|p?KJ>bM|K96g5Jb=SvOd#HjH*Micjc^HMlC Y)-_)2HCQM}VZWYC3@r?*^iYw118x