Vignette develop #184

itismeghasyam · 2019-02-07T08:48:24Z

Resolves get_heartrate failing for psd and peak methods #179 (I didn't write the tests because commenting out that code totally eliminates that problem)
Update vignettes #144 - Sensors and Modules vignettes are now updated to reflect new architecture of the package. Also Phil can you please include examples for using funs and models in these vignettes ?
I also made changes to heartrate_data. It now has only columns t, red, green and blue and a much better illustrative record. Before we had some record that was not as illustrative and also we had extra columns of brightness, hue and saturation.
Examples take a very long time to run #185 I reduced the number of examples in many feature extraction functions, and they also run on a subset of the sensor data to make the processing faster. The travis build now takes ~15min instead of ~18min

Getting up to date with Sage - Bionetworks Develop

- Removed data/sensor_data.rda - roxygenized the package

Develop

Commented L#225 - L#229. It is a If loop that makes the hr values NA if the confidence values are NA

- Phil, your input is needed. Can you please write a sample use case of funs(custom defined funs) and models parameter for gyroscope and accelerometer data?

- Updated the modules and sensors vignettes to reflect the changes in the package - The vignette's have a line ## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models), Phil please comment it out once you have addressed it

Develop

Merge pull request #27 from Sage-Bionetworks/develop

if any of the outputs are null, they are recast as NAs

philerooski · 2019-02-07T17:53:56Z

BTW, you don't need to close a PR if you need to make changes. Just push new commits to the reference branch (in this case, itismeghasyam:vignette-develop) and they will be reflected in the PR.

philerooski · 2019-02-08T23:29:58Z

R/get_heartrate.R

-  if (is.na(confidence) || is.na(hr)) {
-    confidence <- NA
-    hr <- NA
+  if ((length(hr) == 0) || is.null(hr)) {


You check if hr is null here, but then reassign it to NA. Are these the same object or are you reusing a variable name? If hr is supposed to be non-numeric, then its non-value representation is NULL. If hr is numeric, its non-value representation is NA.

This is important because if you need to check for non-value-ness somewhere down the line, you should know whether it's numeric (check for NA) or otherwise (check for NULL).

hr is a supposed to be a numeric value hence why it is recast into NA

As for the NULL check and length(hr)>0, they are there to ensure that the hr does not come out with weird results. For eg., if all the signal is 0, the output of L#193 is a list of NaN And the heart rate estimated in the following 5 lines will give a numeric output, but a vector of length 0. So the check. I also dropped in the NULL for safety.

So I don't believe in making any changes to these lines of code.

Alright. I just find it confusing.

philerooski · 2019-02-11T17:16:24Z

vignettes/sensors.Rmd

 library(dplyr)
 ```

 ## Accelerometer

 #### Data format and Sample data

-`mhealthtools` already comes preloaded with sample accelerometer data. You can access the data through `data(accelerometer_data)`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract accelerometer features). Where `t` is the timestamp; `x`,`y` and `z` are the X,Y and Z- axis respectively
+`mhealthtools` already comes preloaded with sample accelerometer data. You can access the data through `data(accelerometer_data)` or `mhealthtools::accelerometer_data`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract accelerometer features). Where `t` is the timestamp; `x`,`y` and `z` are the X,Y and Z- axis respectively


Calling library(mhealthtools) loads our datasets into the global namespace. So you can reference a dataset by simply typing accelerometer_data.

also put spaces after commas.

Calling library(mhealthtools) loads our datasets into the global namespace. So you can reference a dataset by simply typing accelerometer_data.

Yes, but I personally feel package::dataset is the best way to do it and keep track of things. There is nothing wrong in doing so, infact this is more robust.

also put spaces after commas.

Will do

You should never have to explicitly self-reference a package within itself. package::dataset is for external libraries. Keep in mind this is a vignette. It reads like a book. A very different context from package development.

You should never have to explicitly self-reference a package within itself. package::dataset is for external libraries. Keep in mind this is a vignette. It reads like a book. A very different context from package development.

Is there any problem with the referencing execution wise or is it purely aesthetic?

I have no issues in changing it back, but I need a valid reason to do so. If it is for easy reading, then yes sure.

In a sense, it's "purely aesthetic". But it's also redundant and not the way a more experienced programmer would write it unless there was a namespace conflict.

philerooski · 2019-02-11T17:21:32Z

vignettes/sensors.Rmd

 library(ggplot2)
 a <- accelerometer_data
 a$t <- a$t - a$t[1]
 a <- tidyr::gather(a, 'axis' , 'val', -t)
 ggplot(a, aes(x = t, y= val)) + geom_line()+facet_wrap(~axis) + ylim(c(-0.025, 0.025))
 ```
+We are looking at 10s of data sampled at a high enough sampling rate(for accelerometer data) of 100Hz. The data looks noisy, and looks like it could make good use of some processing like frequency filters, maybe subsetting the time (time filtering) - all of which mhealthtools conveniently offers.


Space between "rate" and "(for...". Format the package name in a consistent way (either mhealthtools or mhealthtools throughout the vignette). You can type a double hyphen "--" in Rmarkdown to insert an em dash (little known facts of English: https://www.thepunctuationguide.com/em-dash.html)

philerooski · 2019-02-11T17:29:12Z

vignettes/sensors.Rmd


 #### Extracting features (using pre-existing pipeline)
-To use the existing feature set (you can define your own features or feed in your own model to extract features in `mhealthtools`!!). Note: You need to get the data in the proper format (as that of the sample data) to use the feature extraction pipeline.
+NOTE: To use the existing feature set (you can define your own features or feed in your own model to extract features in `mhealthtools`!!) you need to get the data in the proper format (as that of the sample data) to use the feature extraction pipeline.


You wouldn't typically have a NOTE at the very beginning of a section. 2+ exclamation marks are a little too informal for the tone we're trying to use for the vignettes. (Single exclamation point is fine).

Wouldn't this sentence be a better fit for the "Data format and Sample data" section?

You wouldn't typically have a NOTE at the very beginning of a section. 2+ exclamation marks are a little too informal for the tone we're trying to use for the vignettes. (Single exclamation point is fine).

There is no rule for where a NOTE should be. I will lose the double exclamation marks, missed that one!

Wouldn't this sentence be a better fit for the "Data format and Sample data" section?

I agree with you, but I do want to convey to the people that the data needs to be in the correct format. Maybe put this note in the data format section, and reference the data format section as a note in L#53? What are your thoughts on this Phil?

What are your thoughts on this Phil?

I think all data formatting requirements should go in the data formatting section.

There is no rule

There aren't hard and fast rules for structuring your writing. I'm merely noting that it's unconventional and doesn't seem to add anything.

I'll shift the requirements to the data formatting section, and just add a line referencing this in other sections as required .

philerooski · 2019-02-11T17:41:28Z

vignettes/sensors.Rmd

 accel_features <- accelerometer_features(accelerometer_data)
 # using default parameters
 ```
+The output file is a list containing features extracted using supplied/default functions (extracted_features), features calculated using models(model_features) and an element for error (error). Let's look at the extracted_features to see how the default features look like.


supplied/default functions (extracted_features)

What is extracted_features? A variable? Another name for "supplied/default functions"? Let's pick one consistent name for the features computed by default. I prefer to call them default features -- especially in this context. Keep in mind that we've also been using the name feature extraction functions to refer explicitly to the functions that, for example, would be passed to the funs parameter in the feature functions. (I know this is a headache, but it's important to be consistent to avoid confusing the reader).

I think you meant to put a space in "models(model_features)".

It's also important to explain under what circumstances results are stored under $extracted_features, $model_features, and $error. This should probably go in the modules vignette.

The reason I mentioned extracted_features was because that is the name of the key in the output list. I explain the keys with their tag(in brackets), the whole sentence reads that way. I'll restructure it to be more coherent

Ah, I see. I use a dollar sign $extracted_features elsewhere in the package to make it more clear that it's a named list element, but I haven't seen a convention around these things. At least format it as a code block so it is obvious we are talking about a code object.

That's a good idea. I will move to $extracted_features, to be consistent across the package.

philerooski · 2019-02-11T17:49:19Z

vignettes/sensors.Rmd


 ```{r, warning=FALSE, error=FALSE, prompt=FALSE}
 accel_features <- accelerometer_features(
  accelerometer_data,
  window_length = 128,
  window_overlap = 0.5)
-# Consider a window length of 128 samples
+# Consider a window length of 128 samples,


Perhaps instead of saying we "Consider a window length..." or "Consider the time ranges..." we could say "Extract features over windows 128 samples in length" and "Extract features over the 2s to 5s time period" -- likewise for the other comments. "Extract features" seems to describe what we're doing more explicitly than "Consider".

I explicitly did not do that.
I always found it effective to mention the parameters as Consider ... of ...

To write it as Extract features.. it would have to be a whole sentence like
Extract features over a window length of 128 samples with a window overlap of 0.5

Because two different sentences like
Extract features over a window length of 128
Extract features over a window overlap of 0.5

read like two different instances of feature extractions.

I always found it effective to mention the parameters as Consider ... of ...

Effective how? In my opinion, the phrasing is unconventional and not explicit. It is also redundant (it is clear from the function call that we are setting window_length to be 128, no need to comment afterwards # Set/Consider a window length of 128 samples).

I always found it effective to mention the parameters as Consider ... of ...

Effective how? In my opinion, the phrasing is unconventional and not explicit. It is also redundant (it is clear from the function call that we are setting window_length to be 128, no need to comment afterwards # Set/Consider a window length of 128 samples).

The many texts I read reference parameters this way. And I always found it better when talking about a function (in our case it is explicitly feature extraction) say Consider the parameter x as 35 etc., You are right about it not being explicit and yes, because I wanted it to read like every other genetic function description. I mean the section is about accelerometer_features

I agree that it looks redundant, but I feel it is a good thing - only shows how good our parameters are named.

philerooski · 2019-02-11T17:53:36Z

vignettes/sensors.Rmd

 ```
+Notice how we now have three extra columns in the feature extraction output namely, window, window_start_time and window_stop_time - which indicate the number of the window in the signal (window 1 : the first window, (0 - window_length) samples), and the start and stop time respectively of that window.


(window 1 : the first window, (0 - window_length) samples)

This confuses me. Can we take it out and still get our meaning across?

My bad, will restructure the sentence to be more coherent

philerooski · 2019-02-11T17:59:06Z

vignettes/sensors.Rmd

 # Consider the frequencies only from 4Hz to 16Hz
 ```

+There are also advanced processing techniques that you can apply to the signal like detrending (using loess), or decompose the signal into Intrinsic Mode Functions (IMF) using the Hilbert Huang Transform. `detrend` and `IMF` respectively are the parameters that refer to these techniques.


Is detrending advanced? I thought you would always want an unbiased signal. (I'm asking because I honestly don't know, maybe detrending is as fancy a function as decomposing the signal into IMFs).

Yes! detrending can be done using a lot of methods. Infact for eg. high pass filtering is one way of detrending the signal, by removing the slow moving trends.

We are making sure the reader understands that the default detrending is done using loess, and if they want more, they can supply the transformation/pre-processing function to mhealthtools

The IMFs are a result of the Hilbert Huang Transform, they got nothing to do with detrending in our package.

philerooski · 2019-02-11T17:59:47Z

vignettes/sensors.Rmd

+# the default feature extraction pipeline
+```
+
+## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models)


We will reference the extending mhealthtools vignette once I make my changes.

philerooski · 2019-02-11T18:07:04Z

vignettes/sensors.Rmd

+```
+
+## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models)
+
 Please read the function documentation `?accelerometer_features` for more info.

 ## Gyroscope


This is an exact copy paste + replace of the previous section on accelerometer data. Please consolidate these into a single section or write a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.

Yes it is. I thought if somebody skipped accelerometer and went to gyroscope directly, they would still get the whole vignette.

I thought of consolidating, but we do have two different feature extraction functions for accelerometer and gyroscope, which we may make individual changes to down the line. But I worry that if I consolidate the vignette, people will point out why not do so with the functions too? I can summarize the gyroscope section by pointing them to the accelerometer section for further reference, but again I have the same worry.

What is your call on this?

The alternative is that they will worry "what are the differences?" if we have two distinct, but otherwise identical sections for each. There are differences between the two feature functions, but only in how column names of the transformed sensor data are handled, not the data itself.

Redundancy is fine and sometimes even good in documentation, but the redundancy here only conveys "that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data" while obfuscating that same message with many more words. I recommend keeping the gyroscope section, but writing a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.

The alternative is that they will worry "what are the differences?" if we have two distinct, but otherwise identical sections for each. There are differences between the two feature functions, but only in how column names of the transformed sensor data are handled, not the data itself.

Good point Phil, and point made.

Redundancy is fine and sometimes even good in documentation, but the redundancy here only conveys "that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data" while obfuscating that same message with many more words. I recommend keeping the gyroscope section, but writing a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.

I will make the gyroscope section concise, we can always add things later on if we make changes

philerooski · 2019-02-11T18:12:25Z

vignettes/sensors.Rmd

@@ -158,7 +197,7 @@ For more information on how this data was collected please look into the Parkins

 #### Data format and Sample data

-`mhealthtools` already comes preloaded with sample tapping data. You can access the data through `data(tap_data)`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract gyroscope features). Where `t` is the timestamp; `x` and `y` are the co-ordinates of the tap, and `buttonid` is to indicate whether the user tapped the left or the right button.
+`mhealthtools` already comes preloaded with sample tapping data. You can access the data through `data(tap_data)` or `mhealthtools::tap_data`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract gyroscope features). Where `t` is the timestamp; `x` and `y` are the co-ordinates of the tap, and `buttonid` is to indicate whether the user tapped the left or the right button.


Github isn't allowing me to comment on the relevant section, but it's not clear to me what depress_threshold is doing, specifically because I'm not understanding what kind of metric "intertap distance" is supposed to be measuring.

I will elaborate on both depress_threshold and 'intertap distance'

philerooski · 2019-02-11T18:35:14Z

vignettes/modules.Rmd

@@ -30,19 +30,20 @@ library(mhealthtools)
 library(dplyr)
 ```



the front pocket of your pant while you walk

pants is always plural.

(also referred to as module, task or activity)

You didn't make the changes I requested in #144 about referring to the experimental design as an assay. Also module comes from the software engineering vernacular. It is a different concept from "experimental design".

I learnt about modules as building blocks for entities. The entities can be programs, algorithms, or complex tasks, this is the actual definition.

For eg., look at this https://en.wikipedia.org/wiki/Module_(mathematics), has been in use since long before the software engineering vernacular.

BUT, if you think most of our readership would get confused about module, I can remove it. In my defense though, an experiment design is a complex task/activity and a module is its building block.

Internally, we have only ever talked about modules in a software engineering context. I'm confused why it's mentioned as an alternative term for "Experimental Design" here. If you are using "module" here in its most context agnostic sense, it's like a cookbook saying "Please add the eggs (also known as food)".

Oh.. alright if you put it that way. The point is to get the point across. I will remove module then.

Also that egg analogy is exactly not valid, as food forms a group in which eggs are one class of modules.

philerooski · 2019-02-11T18:40:21Z

vignettes/modules.Rmd

@@ -30,19 +30,20 @@ library(mhealthtools)
 library(dplyr)
 ```

-#### Walk Module
+### Walk Module

 In this task, the participant is asked to keep the phone in his/her pant's front pocket (if not in the waistband of the pants) and then walk in a straight line for some fixed amount of time (like 30s). For someone with Parkinson's we expect to see symptoms of Dyskinesia and Bradykinesia. 


pant's

pants

philerooski · 2019-02-11T18:41:47Z

vignettes/modules.Rmd


-`mhealthtools` comes with preloaded walk data from one of the mpower participants. Let's take a look at the data before we proceed to analyze it.
+`mhealthtools` comes with preloaded sample walk data from the mpower study. Let's take a look at the walk data before we proceed to analyze it.

 ```{r}


Please make the changes I referenced in #144 and I will do a full review of the modules vignette from there.

(This is in reference to the call to data(...) below as well as other changes that still need to be made in this document).

Added a check to change mean_filter_order to 33, when the sampling rate is low (<32fps)

…tworks#144

1. get_walk_features 2. get_kinetic_tremor_features 3. get_tremor_features 4. get_balance_features

Merge pull request Sage-Bionetworks#184 from itismeghasyam/vignette-develop

itismeghasyam and others added 15 commits January 28, 2019 16:28

Merge pull request #24 from Sage-Bionetworks/develop

b018157

Getting up to date with Sage - Bionetworks Develop

- Removed FeatureDefinitions.md since we now have a vignette for that

2ed849a

- Removed data/sensor_data.rda - roxygenized the package

sensor_data reference removed

6f2c81e

Merge branch 'pr/25' into develop

2162caf

Merge pull request #26 from Sage-Bionetworks/develop

97225b5

Develop

Update get_heartrate.R

faa316b

Commented L#225 - L#229. It is a If loop that makes the hr values NA if the confidence values are NA

- Updated sensors.Rmd Vignette

bdf4bc0

- Phil, your input is needed. Can you please write a sample use case of funs(custom defined funs) and models parameter for gyroscope and accelerometer data?

- Changed heartrate_data to a much cleaner data

5cb888d

- Updated the modules and sensors vignettes to reflect the changes in the package - The vignette's have a line ## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models), Phil please comment it out once you have addressed it

- Changed library dependency from tidyverse to dplyr

9c5a0dd

- Changed dependency from tidyverse to dplyr

f5a9748

eval = False

7f73483

Merge pull request #27 from Sage-Bionetworks/develop

6800fbe

Develop

Merge pull request #28 from itismeghasyam/develop

a326982

Merge pull request #27 from Sage-Bionetworks/develop

Update get_heartrate.R

d2718a5

if any of the outputs are null, they are recast as NAs

NULL and empty numeric handling in get_heartrate

e56e927

philerooski requested changes Feb 8, 2019

View reviewed changes

philerooski requested changes Feb 11, 2019

View reviewed changes

itismeghasyam added 6 commits February 14, 2019 12:38

Update get_heartrate.R

4842dd1

Added a check to change mean_filter_order to 33, when the sampling rate is low (<32fps)

Made changes to modules and sensors vignettes according to Sage-Bione…

a882ae9

…tworks#144

Made changes referenced in Sage-Bionetworks#144

1d2f6d5

- Removed function referencing link in vignette

bd2da84

Made changes to possibly reduce examples run time

dfb5b61

Reduced the number of examples in

57581d0

1. get_walk_features 2. get_kinetic_tremor_features 3. get_tremor_features 4. get_balance_features

philerooski merged commit d32c7c9 into Sage-Bionetworks:develop Mar 4, 2019

philerooski mentioned this pull request Mar 4, 2019

Examples take a very long time to run #185

Closed

itismeghasyam added a commit to itismeghasyam/mHealthTools that referenced this pull request Apr 8, 2019

Merge pull request #29 from Sage-Bionetworks/develop

f5fc309

Merge pull request Sage-Bionetworks#184 from itismeghasyam/vignette-develop

		```
		Notice how we now have three extra columns in the feature extraction output namely, window, window_start_time and window_stop_time - which indicate the number of the window in the signal (window 1 : the first window, (0 - window_length) samples), and the start and stop time respectively of that window.

Vignette develop #184

Vignette develop #184

Conversation

itismeghasyam commented Feb 7, 2019 • edited Loading

philerooski commented Feb 7, 2019

Choose a reason for hiding this comment

itismeghasyam Feb 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

philerooski Feb 11, 2019 • edited Loading

Choose a reason for hiding this comment

itismeghasyam Feb 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

itismeghasyam Feb 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

itismeghasyam Feb 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

itismeghasyam Feb 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

itismeghasyam Feb 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

itismeghasyam commented Feb 7, 2019 •

edited

Loading

itismeghasyam Feb 12, 2019 •

edited

Loading

philerooski Feb 11, 2019 •

edited

Loading

itismeghasyam Feb 12, 2019 •

edited

Loading

itismeghasyam Feb 12, 2019 •

edited

Loading

itismeghasyam Feb 12, 2019 •

edited

Loading

itismeghasyam Feb 12, 2019 •

edited

Loading

itismeghasyam Feb 12, 2019 •

edited

Loading