Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vignette develop #184

Merged
merged 21 commits into from
Mar 4, 2019

Conversation

itismeghasyam
Copy link
Contributor

@itismeghasyam itismeghasyam commented Feb 7, 2019

  • Resolves get_heartrate failing for psd and peak methods #179 (I didn't write the tests because commenting out that code totally eliminates that problem)
  • Update vignettes #144 - Sensors and Modules vignettes are now updated to reflect new architecture of the package. Also Phil can you please include examples for using funs and models in these vignettes ?
  • I also made changes to heartrate_data. It now has only columns t, red, green and blue and a much better illustrative record. Before we had some record that was not as illustrative and also we had extra columns of brightness, hue and saturation.
  • Examples take a very long time to run #185 I reduced the number of examples in many feature extraction functions, and they also run on a subset of the sensor data to make the processing faster. The travis build now takes ~15min instead of ~18min

itismeghasyam and others added 15 commits January 28, 2019 16:28
Getting up to date with Sage - Bionetworks Develop
- Removed data/sensor_data.rda
- roxygenized the package
Commented L#225 - L#229. It is a If loop that makes the hr values NA if the confidence values are NA
- Phil, your input is needed. Can you please write a sample use case of funs(custom defined funs) and models parameter for gyroscope and accelerometer data?
- Updated the modules and sensors vignettes to reflect the changes in the package
- The vignette's have a line ## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models),
Phil please comment it out once you have addressed it
Merge pull request #27 from Sage-Bionetworks/develop
if any of the outputs are null, they are recast as NAs
@philerooski
Copy link
Collaborator

BTW, you don't need to close a PR if you need to make changes. Just push new commits to the reference branch (in this case, itismeghasyam:vignette-develop) and they will be reflected in the PR.

if (is.na(confidence) || is.na(hr)) {
confidence <- NA
hr <- NA
if ((length(hr) == 0) || is.null(hr)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You check if hr is null here, but then reassign it to NA. Are these the same object or are you reusing a variable name? If hr is supposed to be non-numeric, then its non-value representation is NULL. If hr is numeric, its non-value representation is NA.

This is important because if you need to check for non-value-ness somewhere down the line, you should know whether it's numeric (check for NA) or otherwise (check for NULL).

Copy link
Contributor Author

@itismeghasyam itismeghasyam Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hr is a supposed to be a numeric value hence why it is recast into NA

As for the NULL check and length(hr)>0, they are there to ensure that the hr does not come out with weird results. For eg., if all the signal is 0, the output of L#193 is a list of NaN And the heart rate estimated in the following 5 lines will give a numeric output, but a vector of length 0. So the check. I also dropped in the NULL for safety.

So I don't believe in making any changes to these lines of code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I just find it confusing.

library(dplyr)
```

## Accelerometer

#### Data format and Sample data

`mhealthtools` already comes preloaded with sample accelerometer data. You can access the data through `data(accelerometer_data)`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract accelerometer features). Where `t` is the timestamp; `x`,`y` and `z` are the X,Y and Z- axis respectively
`mhealthtools` already comes preloaded with sample accelerometer data. You can access the data through `data(accelerometer_data)` or `mhealthtools::accelerometer_data`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract accelerometer features). Where `t` is the timestamp; `x`,`y` and `z` are the X,Y and Z- axis respectively
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling library(mhealthtools) loads our datasets into the global namespace. So you can reference a dataset by simply typing accelerometer_data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also put spaces after commas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling library(mhealthtools) loads our datasets into the global namespace. So you can reference a dataset by simply typing accelerometer_data.

Yes, but I personally feel package::dataset is the best way to do it and keep track of things. There is nothing wrong in doing so, infact this is more robust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also put spaces after commas.

Will do

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should never have to explicitly self-reference a package within itself. package::dataset is for external libraries. Keep in mind this is a vignette. It reads like a book. A very different context from package development.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should never have to explicitly self-reference a package within itself. package::dataset is for external libraries. Keep in mind this is a vignette. It reads like a book. A very different context from package development.

Is there any problem with the referencing execution wise or is it purely aesthetic?

I have no issues in changing it back, but I need a valid reason to do so. If it is for easy reading, then yes sure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a sense, it's "purely aesthetic". But it's also redundant and not the way a more experienced programmer would write it unless there was a namespace conflict.

library(ggplot2)
a <- accelerometer_data
a$t <- a$t - a$t[1]
a <- tidyr::gather(a, 'axis' , 'val', -t)
ggplot(a, aes(x = t, y= val)) + geom_line()+facet_wrap(~axis) + ylim(c(-0.025, 0.025))
```
We are looking at 10s of data sampled at a high enough sampling rate(for accelerometer data) of 100Hz. The data looks noisy, and looks like it could make good use of some processing like frequency filters, maybe subsetting the time (time filtering) - all of which mhealthtools conveniently offers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space between "rate" and "(for...". Format the package name in a consistent way (either mhealthtools or mhealthtools throughout the vignette). You can type a double hyphen "--" in Rmarkdown to insert an em dash (little known facts of English: https://www.thepunctuationguide.com/em-dash.html)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted


#### Extracting features (using pre-existing pipeline)
To use the existing feature set (you can define your own features or feed in your own model to extract features in `mhealthtools`!!). Note: You need to get the data in the proper format (as that of the sample data) to use the feature extraction pipeline.
NOTE: To use the existing feature set (you can define your own features or feed in your own model to extract features in `mhealthtools`!!) you need to get the data in the proper format (as that of the sample data) to use the feature extraction pipeline.
Copy link
Collaborator

@philerooski philerooski Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wouldn't typically have a NOTE at the very beginning of a section. 2+ exclamation marks are a little too informal for the tone we're trying to use for the vignettes. (Single exclamation point is fine).

Wouldn't this sentence be a better fit for the "Data format and Sample data" section?

Copy link
Contributor Author

@itismeghasyam itismeghasyam Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wouldn't typically have a NOTE at the very beginning of a section. 2+ exclamation marks are a little too informal for the tone we're trying to use for the vignettes. (Single exclamation point is fine).

There is no rule for where a NOTE should be. I will lose the double exclamation marks, missed that one!

Wouldn't this sentence be a better fit for the "Data format and Sample data" section?

I agree with you, but I do want to convey to the people that the data needs to be in the correct format. Maybe put this note in the data format section, and reference the data format section as a note in L#53? What are your thoughts on this Phil?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on this Phil?

I think all data formatting requirements should go in the data formatting section.

There is no rule

There aren't hard and fast rules for structuring your writing. I'm merely noting that it's unconventional and doesn't seem to add anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll shift the requirements to the data formatting section, and just add a line referencing this in other sections as required .

accel_features <- accelerometer_features(accelerometer_data)
# using default parameters
```
The output file is a list containing features extracted using supplied/default functions (extracted_features), features calculated using models(model_features) and an element for error (error). Let's look at the extracted_features to see how the default features look like.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supplied/default functions (extracted_features)

What is extracted_features? A variable? Another name for "supplied/default functions"? Let's pick one consistent name for the features computed by default. I prefer to call them default features -- especially in this context. Keep in mind that we've also been using the name feature extraction functions to refer explicitly to the functions that, for example, would be passed to the funs parameter in the feature functions. (I know this is a headache, but it's important to be consistent to avoid confusing the reader).

I think you meant to put a space in "models(model_features)".

It's also important to explain under what circumstances results are stored under $extracted_features, $model_features, and $error. This should probably go in the modules vignette.

Copy link
Contributor Author

@itismeghasyam itismeghasyam Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I mentioned extracted_features was because that is the name of the key in the output list. I explain the keys with their tag(in brackets), the whole sentence reads that way. I'll restructure it to be more coherent

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I use a dollar sign $extracted_features elsewhere in the package to make it more clear that it's a named list element, but I haven't seen a convention around these things. At least format it as a code block so it is obvious we are talking about a code object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I will move to $extracted_features, to be consistent across the package.


```{r, warning=FALSE, error=FALSE, prompt=FALSE}
accel_features <- accelerometer_features(
accelerometer_data,
window_length = 128,
window_overlap = 0.5)
# Consider a window length of 128 samples
# Consider a window length of 128 samples,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps instead of saying we "Consider a window length..." or "Consider the time ranges..." we could say "Extract features over windows 128 samples in length" and "Extract features over the 2s to 5s time period" -- likewise for the other comments. "Extract features" seems to describe what we're doing more explicitly than "Consider".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explicitly did not do that.
I always found it effective to mention the parameters as Consider ... of ...

To write it as Extract features.. it would have to be a whole sentence like
Extract features over a window length of 128 samples with a window overlap of 0.5

Because two different sentences like
Extract features over a window length of 128
Extract features over a window overlap of 0.5

read like two different instances of feature extractions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always found it effective to mention the parameters as Consider ... of ...

Effective how? In my opinion, the phrasing is unconventional and not explicit. It is also redundant (it is clear from the function call that we are setting window_length to be 128, no need to comment afterwards # Set/Consider a window length of 128 samples).

Copy link
Contributor Author

@itismeghasyam itismeghasyam Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always found it effective to mention the parameters as Consider ... of ...

Effective how? In my opinion, the phrasing is unconventional and not explicit. It is also redundant (it is clear from the function call that we are setting window_length to be 128, no need to comment afterwards # Set/Consider a window length of 128 samples).

The many texts I read reference parameters this way. And I always found it better when talking about a function (in our case it is explicitly feature extraction) say Consider the parameter x as 35 etc., You are right about it not being explicit and yes, because I wanted it to read like every other genetic function description. I mean the section is about accelerometer_features

I agree that it looks redundant, but I feel it is a good thing - only shows how good our parameters are named.

```
Notice how we now have three extra columns in the feature extraction output namely, window, window_start_time and window_stop_time - which indicate the number of the window in the signal (window 1 : the first window, (0 - window_length) samples), and the start and stop time respectively of that window.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(window 1 : the first window, (0 - window_length) samples)

This confuses me. Can we take it out and still get our meaning across?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, will restructure the sentence to be more coherent

# Consider the frequencies only from 4Hz to 16Hz
```

There are also advanced processing techniques that you can apply to the signal like detrending (using loess), or decompose the signal into Intrinsic Mode Functions (IMF) using the Hilbert Huang Transform. `detrend` and `IMF` respectively are the parameters that refer to these techniques.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is detrending advanced? I thought you would always want an unbiased signal. (I'm asking because I honestly don't know, maybe detrending is as fancy a function as decomposing the signal into IMFs).

Copy link
Contributor Author

@itismeghasyam itismeghasyam Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! detrending can be done using a lot of methods. Infact for eg. high pass filtering is one way of detrending the signal, by removing the slow moving trends.

We are making sure the reader understands that the default detrending is done using loess, and if they want more, they can supply the transformation/pre-processing function to mhealthtools

The IMFs are a result of the Hilbert Huang Transform, they got nothing to do with detrending in our package.

# the default feature extraction pipeline
```

## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will reference the extending mhealthtools vignette once I make my changes.

```

## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models)

Please read the function documentation `?accelerometer_features` for more info.

## Gyroscope
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an exact copy paste + replace of the previous section on accelerometer data. Please consolidate these into a single section or write a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.

Copy link
Contributor Author

@itismeghasyam itismeghasyam Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is. I thought if somebody skipped accelerometer and went to gyroscope directly, they would still get the whole vignette.

I thought of consolidating, but we do have two different feature extraction functions for accelerometer and gyroscope, which we may make individual changes to down the line. But I worry that if I consolidate the vignette, people will point out why not do so with the functions too? I can summarize the gyroscope section by pointing them to the accelerometer section for further reference, but again I have the same worry.

What is your call on this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is that they will worry "what are the differences?" if we have two distinct, but otherwise identical sections for each. There are differences between the two feature functions, but only in how column names of the transformed sensor data are handled, not the data itself.

Redundancy is fine and sometimes even good in documentation, but the redundancy here only conveys "that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data" while obfuscating that same message with many more words. I recommend keeping the gyroscope section, but writing a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is that they will worry "what are the differences?" if we have two distinct, but otherwise identical sections for each. There are differences between the two feature functions, but only in how column names of the transformed sensor data are handled, not the data itself.

Good point Phil, and point made.

Redundancy is fine and sometimes even good in documentation, but the redundancy here only conveys "that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data" while obfuscating that same message with many more words. I recommend keeping the gyroscope section, but writing a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.

I will make the gyroscope section concise, we can always add things later on if we make changes

@@ -158,7 +197,7 @@ For more information on how this data was collected please look into the Parkins

#### Data format and Sample data

`mhealthtools` already comes preloaded with sample tapping data. You can access the data through `data(tap_data)`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract gyroscope features). Where `t` is the timestamp; `x` and `y` are the co-ordinates of the tap, and `buttonid` is to indicate whether the user tapped the left or the right button.
`mhealthtools` already comes preloaded with sample tapping data. You can access the data through `data(tap_data)` or `mhealthtools::tap_data`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract gyroscope features). Where `t` is the timestamp; `x` and `y` are the co-ordinates of the tap, and `buttonid` is to indicate whether the user tapped the left or the right button.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github isn't allowing me to comment on the relevant section, but it's not clear to me what depress_threshold is doing, specifically because I'm not understanding what kind of metric "intertap distance" is supposed to be measuring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will elaborate on both depress_threshold and 'intertap distance'

@@ -30,19 +30,20 @@ library(mhealthtools)
library(dplyr)
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the front pocket of your pant while you walk

pants is always plural.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also referred to as module, task or activity)

You didn't make the changes I requested in #144 about referring to the experimental design as an assay. Also module comes from the software engineering vernacular. It is a different concept from "experimental design".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learnt about modules as building blocks for entities. The entities can be programs, algorithms, or complex tasks, this is the actual definition.

For eg., look at this https://en.wikipedia.org/wiki/Module_(mathematics), has been in use since long before the software engineering vernacular.

BUT, if you think most of our readership would get confused about module, I can remove it. In my defense though, an experiment design is a complex task/activity and a module is its building block.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, we have only ever talked about modules in a software engineering context. I'm confused why it's mentioned as an alternative term for "Experimental Design" here. If you are using "module" here in its most context agnostic sense, it's like a cookbook saying "Please add the eggs (also known as food)".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh.. alright if you put it that way. The point is to get the point across. I will remove module then.

Also that egg analogy is exactly not valid, as food forms a group in which eggs are one class of modules.

@@ -30,19 +30,20 @@ library(mhealthtools)
library(dplyr)
```

#### Walk Module
### Walk Module

In this task, the participant is asked to keep the phone in his/her pant's front pocket (if not in the waistband of the pants) and then walk in a straight line for some fixed amount of time (like 30s). For someone with Parkinson's we expect to see symptoms of Dyskinesia and Bradykinesia.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pant's

pants

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted


`mhealthtools` comes with preloaded walk data from one of the mpower participants. Let's take a look at the data before we proceed to analyze it.
`mhealthtools` comes with preloaded sample walk data from the mpower study. Let's take a look at the walk data before we proceed to analyze it.

```{r}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the changes I referenced in #144 and I will do a full review of the modules vignette from there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is in reference to the call to data(...) below as well as other changes that still need to be made in this document).

@philerooski philerooski merged commit d32c7c9 into Sage-Bionetworks:develop Mar 4, 2019
itismeghasyam added a commit to itismeghasyam/mHealthTools that referenced this pull request Apr 8, 2019
Merge pull request Sage-Bionetworks#184 from itismeghasyam/vignette-develop
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants