Reproducible Research: Peer Assessment 1
## Import the necessary libraries
library(data.table)
library(ggplot2)
library(plyr)
## Read the CSV file
data <- read.csv(file = "activity.csv")
stepsData <- as.data.table(x = data)
## Total steps per day; the formula is passed positionally so the call works across R versions
totalSteps <- aggregate(steps ~ date, data = stepsData, FUN = sum)
A portion of totalSteps is as follows:
##         date steps
## 1 2012-10-02   126
## 2 2012-10-03 11352
## 3 2012-10-04 12116
## 4 2012-10-05 13294
## 5 2012-10-06 15420
## 6 2012-10-07 11015
qplot(x = steps, data = totalSteps, geom = "histogram", main = "Total Steps Per Day")
stepsMean <- mean(x = totalSteps$steps, na.rm = TRUE)
stepsMedian <- median(x = totalSteps$steps, na.rm = TRUE)
The mean of the total number of steps taken per day is 10766.19, and the median is 10765.
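The per-day totals above begin at 2012-10-02 rather than 2012-10-01. This is consistent with how the formula interface of aggregate behaves: by default it drops rows with missing values (na.action = na.omit), so a day with no recorded steps does not appear in the result at all. A minimal sketch on hypothetical data (the toyDays data frame and its labels are invented for illustration):

```r
## Hypothetical data: day "d1" has only missing values, day "d2" is complete
toyDays <- data.frame(
  date  = c("d1", "d1", "d2", "d2"),
  steps = c(NA, NA, 3, 4)
)
## Rows with NA steps are dropped before summing, so "d1" disappears
toyTotals <- aggregate(steps ~ date, data = toyDays, FUN = sum)
```

Here toyTotals contains a single row for "d2". Days that are missing entirely therefore do not pull the mean or median down as zeros would.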
1. Make a time series plot of the 5-minute interval and the average number of steps taken, averaged across all days
## Calculate the average number of steps taken in the 5-minute interval
averageSteps <- ddply(.data = stepsData, .variables = "interval", .fun = summarise, steps = mean(steps, na.rm = TRUE))
## Create the plot
qplot(x = interval, y = steps, data = averageSteps, geom = "line", xlab = "5-Minute Interval(HHMM)", ylab = "Average Number of Steps", main = "Time Series of Avg. Steps Against 5-Minute Interval")
2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
## Find the index of the maximum of steps
index <- which.max(x = averageSteps$steps)
## Get the interval located at the index
maxInterval <- averageSteps$interval[index]
The interval with the maximum average number of steps is 835.
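The index returned by which.max refers to a row of the per-interval summary, so the interval should be read from that same summary table. A minimal sketch on hypothetical data (toyIntervals and its values are invented for illustration):

```r
## Hypothetical data: two days with three intervals each
toyIntervals <- data.frame(
  interval = c(0, 5, 10, 0, 5, 10),
  steps    = c(0, 40, 10, 2, 60, NA)
)
## Average steps per interval; rows with NA steps are dropped by default
toyAverage <- aggregate(steps ~ interval, data = toyIntervals, FUN = mean)
## Row holding the largest average, then the interval stored in that row
toyIndex <- which.max(toyAverage$steps)
toyMaxInterval <- toyAverage$interval[toyIndex]
```

In this toy case the averages are 1, 50 and 10, so toyMaxInterval is 5.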
countNA <- sum(is.na(stepsData[,steps]))
The total number of missing values in the dataset is 2304.
2. Fill in all of the missing values in the dataset with the average 5-minute interval values and create a new dataset with all the missing data filled in
We replace each NA value with the mean number of steps for the corresponding 5-minute interval, using ceiling to round each mean up to a whole number of steps. The new dataset with the missing values filled in is named completeStepsData.
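## Mean steps for each 5-minute interval, rounded up; names are the interval labels
intervalMeans <- ceiling(tapply(X = stepsData[,steps], INDEX = stepsData[,interval], FUN = mean, na.rm = TRUE))
## Get the steps column from the dataset
stepCount <- stepsData[,steps]
## Replace each NA with the mean of its own interval, looked up by name
## (avoids relying on the missing rows being in interval order)
missingRows <- is.na(stepCount)
stepCount[missingRows] <- intervalMeans[as.character(stepsData[,interval][missingRows])]
## Create a new dataset combining the columns
completeStepsData <- cbind(stepCount, data[,2:3])
colnames(completeStepsData) <- colnames(data)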
A portion of the new dataset completeStepsData, with all the missing data filled in, is as follows:
##   steps       date interval
## 1     2 2012-10-01        0
## 2     1 2012-10-01        5
## 3     1 2012-10-01       10
## 4     1 2012-10-01       15
## 5     1 2012-10-01       20
## 6     3 2012-10-01       25
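The fill-in strategy can be sketched on a small hypothetical dataset (toySteps and its values are invented for illustration). Each missing value is looked up by the name of its own interval, so the result does not depend on the order of the rows:

```r
## Hypothetical data: the second day's intervals are deliberately out of order
toySteps <- data.frame(
  steps    = c(1, NA, 4, NA, 2, 7),
  interval = c(0, 5, 10, 10, 0, 5)
)
## Mean per interval, rounded up; the names of the result are the interval labels
toyMeans <- ceiling(tapply(toySteps$steps, toySteps$interval, mean, na.rm = TRUE))
## Fill each NA with the rounded mean of its own interval
toyFilled <- toySteps$steps
toyGaps <- is.na(toyFilled)
toyFilled[toyGaps] <- toyMeans[as.character(toySteps$interval[toyGaps])]
```

The interval means here are 2 (1.5 rounded up), 7 and 4, so the two gaps, which belong to intervals 5 and 10, are filled with 7 and 4.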
4. Make a histogram of the total number of steps taken each day, then calculate and report the mean and median total number of steps taken per day
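## Total steps per day after filling in; the formula is passed positionally so the call works across R versions
newTotalSteps <- aggregate(steps ~ date, data = completeStepsData, FUN = sum)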
A portion of newTotalSteps is as follows:
##         date steps
## 1 2012-10-01 10909
## 2 2012-10-02   126
## 3 2012-10-03 11352
## 4 2012-10-04 12116
## 5 2012-10-05 13294
## 6 2012-10-06 15420
qplot(x = steps, data = newTotalSteps, geom = "histogram", main = "Total Steps Per Day With Missing Data Filled In")
Calculate the mean and median of the total number of steps taken per day (with missing values filled in)
newStepsMean <- mean(x = newTotalSteps$steps, na.rm = TRUE)
newStepsMedian <- median(x = newTotalSteps$steps, na.rm = TRUE)
With the missing values filled in, the mean of the total number of steps taken per day is 10784.92, and the median is 10909.00.
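Both estimates are higher than the ones computed before the missing values were filled in, which is what one would expect when each imputed value is a mean rounded up with ceiling. A minimal sketch of the comparison, using the rounded values reported in this document (the reported* and *Shift names are introduced here for illustration only):

```r
## Values as reported in the text (rounded), before and after filling in
reportedMean      <- 10766.19
reportedMedian    <- 10765
reportedNewMean   <- 10784.92
reportedNewMedian <- 10909
## Change in each estimate caused by the imputation
meanShift   <- reportedNewMean - reportedMean
medianShift <- reportedNewMedian - reportedMedian
```

On these figures the mean rises by roughly 18.73 steps and the median by 144 steps.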
## Create a new data frame with a column for the type of the day
## as.Date avoids time zone effects; note that weekdays() returns names in the current locale
day.name <- weekdays(as.Date(completeStepsData$date))
day.type <- ifelse(day.name == "Saturday" | day.name == "Sunday", "weekend", "weekday" )
weeklySteps <- cbind(completeStepsData, day.type)
## Calculate the average steps for 5-minute intervals for weekend and weekday
newAverageSteps <- ddply(.data = weeklySteps, .variables = c("interval", "day.type"), .fun = summarise, steps = mean(steps))
## Plotting
qplot(x = interval, y = steps, data = newAverageSteps, geom = "line", xlab = "5-Minute Interval(HHMM)", ylab = "Average Number of Steps", main = "Time Series of Avg. Steps Against 5-Minute Interval", facets = day.type ~ .)
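The weekday/weekend split above compares day names against the English strings "Saturday" and "Sunday", which assumes an English locale. A locale-independent alternative, shown here as a sketch on hypothetical sample dates rather than as the method used in this report, is to work with the numeric day of the week from as.POSIXlt:

```r
## Hypothetical sample dates: 2012-10-05 is a Friday
sampleDates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
## wday counts days since Sunday: 0 = Sunday, 6 = Saturday
dayNumber <- as.POSIXlt(sampleDates)$wday
sampleDayType <- ifelse(dayNumber %in% c(0, 6), "weekend", "weekday")
```

The four sample dates are classified as weekday, weekend, weekend and weekday, whatever language the session uses for day names.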