The assignment requires us to take the data from the UCI source and perform five steps to produce a clean data set:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set
- Appropriately labels the data set with descriptive variable names.
- Creates a second, independent tidy data set with the average of each variable for each activity and each subject.

The script performs all of the above steps:
- Downloads the data from the specified site
- Reads the data into local variables using read.table()
- Uses rbind() to combine the test and training data into a single data set
- For Step 4, renames the columns of the X (feature) data with gsub() to make them more meaningful
- Includes README.txt from the original data set for reference
- Extracts only the mean() and std() columns, which reduces the feature column count from 561 to 66
- Uses aggregate() to compute the mean of every variable in the reduced set for each subject and each activity
- For Step 2, 33 columns were selected for the mean measurements and 33 columns for the standard deviation measurements
- The meanFreq() and angle() measurements were not included in this set; only the mean() and std() variables of each measurement were considered
- For Step 4, I assumed this meant cleaning up the feature column names. To make them more readable, "()" was removed, "-" was replaced with ".", and camel case was applied to the statistic name (e.g. tBodyAcc-mean()-X becomes tBodyAcc.Mean.X)
- Since the format of the tidy data set was not specified, I wrote it as a .txt file, similar to the original data
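The merge, column-selection and activity-naming steps listed above can be sketched in base R as follows. This is a minimal illustration, not the actual script: small in-memory data frames stand in for the training and test files that the script reads with read.table(), the four feature names are a hypothetical subset in the style of features.txt, and the regular expression used to keep only mean() and std() columns is an assumption about how the selection could be done.

```r
# Hypothetical stand-ins for the UCI files; the real script reads them with read.table()
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

x_train <- as.data.frame(matrix(c(0.1, 0.2, 0.3, 0.4,
                                  0.5, 0.6, 0.7, 0.8), nrow = 2, byrow = TRUE))
x_test  <- as.data.frame(matrix(c(0.9, 1.0, 1.1, 1.2), nrow = 1))
y_train <- data.frame(activity = c(1, 2))
y_test  <- data.frame(activity = 2)
s_train <- data.frame(subject = c(1, 1))
s_test  <- data.frame(subject = 2)

# Merge: stack training and test rows; the three parts stay row-aligned
x <- rbind(x_train, x_test)
y <- rbind(y_train, y_test)
s <- rbind(s_train, s_test)
names(x) <- features

# Keep only "-mean()" and "-std()" features; requiring the literal "()"
# excludes meanFreq() and angle(...) columns
keep <- grep("-(mean|std)\\(\\)", features)
x <- x[, keep]

# Replace numeric activity ids with their descriptive names
y$activity <- activity_labels$name[match(y$activity, activity_labels$id)]

merged <- cbind(s, y, x)
print(merged)
```

With the real files the same calls apply unchanged; only the column count differs (561 features before selection, 66 after).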
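The column-renaming convention described for Step 4 can be expressed as a short chain of gsub() calls. The tidy_names() helper below is hypothetical and was written only to reproduce the tBodyAcc.Mean.X example; capitalising std as Std is an assumption made by analogy with Mean.

```r
# Hypothetical helper: drop "()", turn "-" into ".", and camel-case the statistic
tidy_names <- function(x) {
  x <- gsub("()", "", x, fixed = TRUE)  # remove the empty parentheses
  x <- gsub("-", ".", x, fixed = TRUE)  # hyphens become dots
  x <- gsub("\\.mean", ".Mean", x)      # mean -> Mean
  x <- gsub("\\.std", ".Std", x)        # std  -> Std (assumed)
  x
}

raw <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-Y", "fBodyGyro-mean()-Z")
tidy_names(raw)
# returns c("tBodyAcc.Mean.X", "tBodyAcc.Std.Y", "fBodyGyro.Mean.Z")
```

Using fixed = TRUE for the first two substitutions avoids escaping the parentheses and hyphen as regular-expression metacharacters.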
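The final averaging step (the mean of each variable for each subject and each activity) can be sketched with aggregate() and write.table(). The data frame below is a made-up stand-in for the reduced data set, and the output file name tidy_data.txt is a placeholder; the actual script's file name and write.table() options may differ.

```r
# Made-up reduced data set: subject, activity and two already-renamed features
reduced <- data.frame(
  subject         = c(1, 1, 1, 2, 2),
  activity        = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  tBodyAcc.Mean.X = c(0.2, 0.4, 0.1, 0.5, 0.7),
  tBodyAcc.Std.X  = c(1.0, 3.0, 2.0, 4.0, 6.0),
  stringsAsFactors = FALSE
)

# "." on the left-hand side means every column not used as a grouping variable,
# so each feature is averaged within each subject/activity pair
tidy <- aggregate(. ~ subject + activity, data = reduced, FUN = mean)
tidy <- tidy[order(tidy$subject, tidy$activity), ]

# Write as a plain-text table (placeholder path in the session's temp directory)
out <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy, out, row.names = FALSE)
print(tidy)
```

The formula interface keeps the code independent of the number of feature columns, so the same call works for all 66 selected variables.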