Reproducible Research: Peer Assessment 1

Getting set up

library(knitr)
library(ggplot2)
library(scales)
library(Hmisc)

## Loading required package: lattice

## Loading required package: survival

## Loading required package: Formula

## 
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units

opts_chunk$set(echo = TRUE, results = 'hold')

Loading and preprocessing the data

# Assume that activity.zip is present.
if(!file.exists('activity.csv')){
    unzip('activity.zip')
}
data <- read.csv('activity.csv')

What is mean total number of steps taken per day?

Histogram of the total number of steps taken each day

Mean and median number of steps taken each day

dailySteps <- aggregate(steps ~ date, data, sum)
hist(dailySteps$steps, main = "Total Steps Each Day", col = "blue", xlab = "Number of Steps")

dailyMean <- mean(dailySteps$steps)
dailyMedian <- median(dailySteps$steps)

The mean is approximately 10766.19 and the median is 10765.

What is the average daily activity pattern?

Time series plot of the average number of steps taken

intervalSteps <- aggregate(steps ~ interval, data, mean)

plot(intervalSteps$interval,intervalSteps$steps, type="l", xlab="Interval", ylab="# of Steps",main="Average # of Steps per Day by Interval")

maximumInterval <- intervalSteps[which.max(intervalSteps$steps),1]

The 5-minute interval that, on average, contains the maximum number of steps

Interval with maximum number of steps: 835
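The interval identifier appears to encode clock time as hours followed by minutes, so 835 would correspond to 08:35. That reading is an assumption about the dataset, not something stated in this report. A minimal sketch of the conversion, using a hypothetical helper named intervalToClock:

```r
# Hypothetical helper: assumes interval identifiers are coded as HHMM
# (for example, 835 means 08:35). This coding is an assumption.
intervalToClock <- function(interval) {
    interval <- as.integer(interval)
    # Integer division gives the hour, the remainder gives the minute.
    sprintf("%02d:%02d", interval %/% 100L, interval %% 100L)
}

intervalToClock(c(0, 55, 835, 2355))
```

Under that assumption, the busiest interval on average starts at 08:35 in the morning.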

Imputing missing values

Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with NAs)

numMissingValues <- sum(is.na(data$steps))

Number of missing values: 2304

Devise a strategy for filling in all of the missing values in the dataset

Create a new dataset that is equal to the original dataset but with the missing data filled in

# http://stackoverflow.com/questions/20273070/function-to-impute-missing-value
dataImpute <- data
dataImpute$steps <- impute(data$steps, fun=mean)
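The strategy used here replaces every missing steps value with the mean of the observed values. As a rough illustration of what that does, the following base-R sketch applies the same idea to a small made-up data frame (the toy values and the names toy, observedMean and toyImpute are illustrative only, not part of the assignment data):

```r
# Toy data frame, made up for illustration.
toy <- data.frame(steps = c(0, 10, NA, 20, NA),
                  interval = c(0, 5, 10, 15, 20))

# Mean of the observed (non-missing) values: (0 + 10 + 20) / 3 = 10.
observedMean <- mean(toy$steps, na.rm = TRUE)

# Copy the data and fill each NA with that mean.
toyImpute <- toy
toyImpute$steps[is.na(toyImpute$steps)] <- observedMean

toyImpute$steps
```

One difference from the Hmisc call: this version leaves steps as a plain numeric vector, whereas impute returns an object that also carries the class "impute".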

Make a histogram of the total number of steps taken each day, and calculate and report the mean and median total number of steps taken per day

imputedDailySteps <- tapply(dataImpute$steps, dataImpute$date, sum)
qplot(imputedDailySteps, xlab="Imputed Total Steps per Day", ylab="Frequency", binwidth=500)

Do these values differ from the estimates from the first part of the assignment?

Only slightly: the mean is unchanged, while the median moves from 10765 to about 10766.19.

imputedDailyStepsMean <- mean(imputedDailySteps)
imputedDailyStepsMedian <- median(imputedDailySteps)

Imputed Mean: 10766.19
Imputed Median: 10766.19
Mean: 10766.19
Median: 10765

What is the impact of imputing missing data on the estimates of the total daily number of steps?

Imputing with the overall mean leaves the mean of the daily totals unchanged at about 10766.19, while the median rises slightly from 10765 to equal the mean.
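One way to see why the mean can stay fixed while the median moves: when a day has no observations at all, mean imputation gives it a total equal to the average daily total. The sketch below uses made-up daily totals, and it assumes the missing values fall on whole days, which this report does not verify.

```r
# Made-up daily totals for three fully observed days.
observedTotals <- c(100, 200, 600)

# An entirely missing day, filled by mean imputation, receives the mean total.
imputedTotals <- c(observedTotals, mean(observedTotals))

# The mean is 300 before and after; the median moves from 200 to 250.
c(before = mean(observedTotals), after = mean(imputedTotals))
c(before = median(observedTotals), after = median(imputedTotals))
```

Adding values equal to the mean cannot change the mean, but it does pull the middle of the distribution toward it.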

Are there differences in activity patterns between weekdays and weekends?

Create a new factor variable in the dataset with two levels – “weekday” and “weekend” indicating whether a given date is a weekday or weekend day

# Wrap in factor() so that dateType is a factor, as the task requires.
dataImpute$dateType <- factor(ifelse(as.POSIXlt(dataImpute$date)$wday %in% c(0, 6), 'weekend', 'weekday'))
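In POSIXlt, wday counts days from Sunday (0) through Saturday (6), so the values 0 and 6 pick out the weekend. A quick check on a few hand-picked dates (chosen for illustration; 2012-10-06 was a Saturday, and the names toyDates and toyType are hypothetical):

```r
# Hand-picked dates: Friday, Saturday, Sunday, Monday.
toyDates <- c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08")

# wday is 0 for Sunday and 6 for Saturday; tz is fixed to keep the check stable.
toyType <- ifelse(as.POSIXlt(toyDates, tz = "UTC")$wday %in% c(0, 6),
                  "weekend", "weekday")

# Convert to a two-level factor.
factor(toyType, levels = c("weekday", "weekend"))
```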

Make a panel plot containing a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).

averagedDataImputed <- aggregate(steps ~ interval + dateType, data=dataImpute, mean)
ggplot(averagedDataImputed, aes(interval, steps)) + 
    geom_line() + 
    facet_grid(dateType ~ .) +
    xlab("5-Min Interval") + 
    ylab("Average # of Steps")

Overall, there is more activity on the weekends.
Peak activity occurs earlier on weekdays.
