Tidy Data Codebook

This is a short description of the variables in the "tidy.txt" dataset.

Based on the following dataset: Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012.

This dataset is distributed AS-IS and no responsibility implied or explicit can be addressed to the authors or their institutions for its use or misuse. Any commercial use is prohibited. Jorge L. Reyes-Ortiz, Alessandro Ghio, Luca Oneto, Davide Anguita. November 2012.

Original dataset description:

The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers was selected for generating the training data and 30% the test data. The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain. See 'features_info.txt' for more details.
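The windowing arithmetic in the description can be checked with a short sketch: at 50 Hz, a 2.56 s window holds 128 readings, and a 50% overlap means each window starts 64 readings after the previous one. The function name `sliding_windows` and the synthetic signal are illustrative assumptions, not part of the original dataset's tooling.

```python
SAMPLE_RATE_HZ = 50      # sampling rate stated in the description
WINDOW_SECONDS = 2.56    # window width stated in the description
OVERLAP = 0.5            # 50% overlap between consecutive windows


def sliding_windows(signal, size, step):
    """Return every full window of `size` readings, starting every `step` readings."""
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]


window_size = round(SAMPLE_RATE_HZ * WINDOW_SECONDS)  # 128 readings per window
step = int(window_size * (1 - OVERLAP))               # 64 readings between window starts

# Synthetic stand-in for one sensor axis (hypothetical data, not from the dataset).
signal = list(range(320))
windows = sliding_windows(signal, window_size, step)

print(window_size, step, len(windows))
```

Trailing readings that do not fill a whole window are dropped in this sketch; how the original authors handled signal edges is not stated here.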

For each record provided:

  • Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
  • Triaxial Angular velocity from the gyroscope.
  • A 561-feature vector with time and frequency domain variables.
  • Its activity label.
  • An identifier of the subject who carried out the experiment.

The dataset included the following files:

  • 'README.txt'
  • 'features_info.txt': Shows information about the variables used on the feature vector.
  • 'features.txt': List of all features.
  • 'activity_labels.txt': Links the class labels with their activity name.
  • 'train/X_train.txt': Training set.
  • 'train/y_train.txt': Training labels.
  • 'test/X_test.txt': Test set.
  • 'test/y_test.txt': Test labels.

Notes:

  • Features are normalized and bounded within [-1,1].
  • Each feature vector is a row on the text file.
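The two notes above suggest a simple way to read and sanity-check the feature files. The following is a minimal sketch, assuming each row is a whitespace-separated line of numbers; the helper names and the sample line are made up for illustration and are not taken from the dataset.

```python
def parse_feature_row(line):
    """Split one whitespace-separated text line into a list of floats."""
    return [float(token) for token in line.split()]


def within_bounds(values, low=-1.0, high=1.0):
    """Check that every value lies in the normalized range [low, high]."""
    return all(low <= value <= high for value in values)


# Hypothetical sample line (illustrative values only).
sample_line = "  2.5e-001 -1.0e+000  9.9e-001"
row = parse_feature_row(sample_line)

print(row, within_bounds(row))
```

For a real file, the same helper could be applied to each line of 'train/X_train.txt', where a full row would be expected to hold 561 values.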

(For more information about this dataset contact: [email protected])

Processing data description:

  • Training and test datasets were merged into one large dataset, using the variable labels from 'features.txt'.
  • Variables not dealing with mean or standard deviation were excluded from the dataset.
  • Activities were labeled with the corresponding text from 'activity_labels.txt' (e.g. if activity = '1' then activity label = 'WALKING').
  • The processed dataset contains averages for every activity for each subject instead of individual observations (so if subject 1 had 10 rows for the 'WALKING' activity, the processed dataset has a single row holding the average of those 10 entries for each variable).
  • Variable names were altered somewhat to be more meaningful and readable.
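The merge, filter, label, and average steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script that produced "tidy.txt": the function `tidy`, the tiny in-memory rows, the rule of matching `mean()` and `std()` in the feature name, and the label for activity code 2 are all assumptions, and the renaming step is left out because the exact renaming rules are not given here.

```python
from collections import defaultdict

# Three example feature names; only the first two deal with mean or std.
feature_names = ["tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X"]

# Code 1 -> WALKING comes from the text; the mapping for code 2 is assumed.
activity_labels = {1: "WALKING", 2: "WALKING_UPSTAIRS"}

# Rows are (subject, activity code, feature values); all numbers are made up.
train = [(1, 1, [0.25, -0.5, 0.9]), (1, 1, [0.75, -1.0, 0.7])]
test = [(2, 2, [0.125, -0.25, 0.6])]


def tidy(train_rows, test_rows, names, labels):
    """Merge both sets, keep mean/std columns, label activities, average per group."""
    merged = train_rows + test_rows
    # Assumed selection rule: keep features whose name contains mean() or std().
    keep = [i for i, name in enumerate(names) if "mean()" in name or "std()" in name]
    kept_names = [names[i] for i in keep]

    # Group the kept values by (subject, activity label).
    groups = defaultdict(list)
    for subject, code, values in merged:
        groups[(subject, labels[code])].append([values[i] for i in keep])

    # One output row per group, holding the average of each kept variable.
    result = {}
    for key, rows in sorted(groups.items()):
        result[key] = {name: sum(column) / len(column)
                       for name, column in zip(kept_names, zip(*rows))}
    return result


tidy_rows = tidy(train, test, feature_names, activity_labels)
for key, averages in tidy_rows.items():
    print(key, averages)
```

Subject 1 has two 'WALKING' rows in the input, so the output holds a single row for that pair with the average of each kept variable.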