-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vignette develop #184
Vignette develop #184
Conversation
Getting up to date with Sage - Bionetworks Develop
- Removed data/sensor_data.rda - roxygenized the package
Commented L#225 - L#229. It is a If loop that makes the hr values NA if the confidence values are NA
- Phil, your input is needed. Can you please write a sample use case of funs(custom defined funs) and models parameter for gyroscope and accelerometer data?
- Updated the modules and sensors vignettes to reflect the changes in the package - The vignette's have a line ## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models), Phil please comment it out once you have addressed it
Merge pull request #27 from Sage-Bionetworks/develop
if any of the outputs are null, they are recast as NAs
BTW, you don't need to close a PR if you need to make changes. Just push new commits to the reference branch (in this case, itismeghasyam:vignette-develop) and they will be reflected in the PR. |
if (is.na(confidence) || is.na(hr)) { | ||
confidence <- NA | ||
hr <- NA | ||
if ((length(hr) == 0) || is.null(hr)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You check if hr
is null here, but then reassign it to NA. Are these the same object or are you reusing a variable name? If hr
is supposed to be non-numeric, then its non-value representation is NULL
. If hr
is numeric, its non-value representation is NA
.
This is important because if you need to check for non-value-ness somewhere down the line, you should know whether it's numeric (check for NA) or otherwise (check for NULL).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hr
is a supposed to be a numeric value hence why it is recast into NA
As for the NULL check and length(hr)>0, they are there to ensure that the hr
does not come out with weird results. For eg., if all the signal is 0, the output of L#193 is a list of NaN
And the heart rate estimated in the following 5 lines will give a numeric output, but a vector of length 0
. So the check. I also dropped in the NULL for safety.
So I don't believe in making any changes to these lines of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. I just find it confusing.
vignettes/sensors.Rmd
Outdated
library(dplyr) | ||
``` | ||
|
||
## Accelerometer | ||
|
||
#### Data format and Sample data | ||
|
||
`mhealthtools` already comes preloaded with sample accelerometer data. You can access the data through `data(accelerometer_data)`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract accelerometer features). Where `t` is the timestamp; `x`,`y` and `z` are the X,Y and Z- axis respectively | ||
`mhealthtools` already comes preloaded with sample accelerometer data. You can access the data through `data(accelerometer_data)` or `mhealthtools::accelerometer_data`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract accelerometer features). Where `t` is the timestamp; `x`,`y` and `z` are the X,Y and Z- axis respectively |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling library(mhealthtools) loads our datasets into the global namespace. So you can reference a dataset by simply typing accelerometer_data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also put spaces after commas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling library(mhealthtools) loads our datasets into the global namespace. So you can reference a dataset by simply typing
accelerometer_data
.
Yes, but I personally feel package::dataset is the best way to do it and keep track of things. There is nothing wrong in doing so, infact this is more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also put spaces after commas.
Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should never have to explicitly self-reference a package within itself. package::dataset is for external libraries. Keep in mind this is a vignette. It reads like a book. A very different context from package development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should never have to explicitly self-reference a package within itself. package::dataset is for external libraries. Keep in mind this is a vignette. It reads like a book. A very different context from package development.
Is there any problem with the referencing execution wise or is it purely aesthetic?
I have no issues in changing it back, but I need a valid reason to do so. If it is for easy reading, then yes sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a sense, it's "purely aesthetic". But it's also redundant and not the way a more experienced programmer would write it unless there was a namespace conflict.
vignettes/sensors.Rmd
Outdated
library(ggplot2) | ||
a <- accelerometer_data | ||
a$t <- a$t - a$t[1] | ||
a <- tidyr::gather(a, 'axis' , 'val', -t) | ||
ggplot(a, aes(x = t, y= val)) + geom_line()+facet_wrap(~axis) + ylim(c(-0.025, 0.025)) | ||
``` | ||
We are looking at 10s of data sampled at a high enough sampling rate(for accelerometer data) of 100Hz. The data looks noisy, and looks like it could make good use of some processing like frequency filters, maybe subsetting the time (time filtering) - all of which mhealthtools conveniently offers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space between "rate" and "(for...". Format the package name in a consistent way (either mhealthtools
or mhealthtools throughout the vignette). You can type a double hyphen "--" in Rmarkdown to insert an em dash (little known facts of English: https://www.thepunctuationguide.com/em-dash.html)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
vignettes/sensors.Rmd
Outdated
|
||
#### Extracting features (using pre-existing pipeline) | ||
To use the existing feature set (you can define your own features or feed in your own model to extract features in `mhealthtools`!!). Note: You need to get the data in the proper format (as that of the sample data) to use the feature extraction pipeline. | ||
NOTE: To use the existing feature set (you can define your own features or feed in your own model to extract features in `mhealthtools`!!) you need to get the data in the proper format (as that of the sample data) to use the feature extraction pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wouldn't typically have a NOTE at the very beginning of a section. 2+ exclamation marks are a little too informal for the tone we're trying to use for the vignettes. (Single exclamation point is fine).
Wouldn't this sentence be a better fit for the "Data format and Sample data" section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wouldn't typically have a NOTE at the very beginning of a section. 2+ exclamation marks are a little too informal for the tone we're trying to use for the vignettes. (Single exclamation point is fine).
There is no rule for where a NOTE should be. I will lose the double exclamation marks, missed that one!
Wouldn't this sentence be a better fit for the "Data format and Sample data" section?
I agree with you, but I do want to convey to the people that the data needs to be in the correct format. Maybe put this note in the data format section, and reference the data format section as a note in L#53? What are your thoughts on this Phil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on this Phil?
I think all data formatting requirements should go in the data formatting section.
There is no rule
There aren't hard and fast rules for structuring your writing. I'm merely noting that it's unconventional and doesn't seem to add anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll shift the requirements to the data formatting section, and just add a line referencing this in other sections as required .
vignettes/sensors.Rmd
Outdated
accel_features <- accelerometer_features(accelerometer_data) | ||
# using default parameters | ||
``` | ||
The output file is a list containing features extracted using supplied/default functions (extracted_features), features calculated using models(model_features) and an element for error (error). Let's look at the extracted_features to see how the default features look like. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supplied/default functions (extracted_features)
What is extracted_features? A variable? Another name for "supplied/default functions"? Let's pick one consistent name for the features computed by default. I prefer to call them default features -- especially in this context. Keep in mind that we've also been using the name feature extraction functions to refer explicitly to the functions that, for example, would be passed to the funs
parameter in the feature functions. (I know this is a headache, but it's important to be consistent to avoid confusing the reader).
I think you meant to put a space in "models(model_features)".
It's also important to explain under what circumstances results are stored under $extracted_features
, $model_features
, and $error
. This should probably go in the modules vignette.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I mentioned extracted_features
was because that is the name of the key in the output list. I explain the keys with their tag(in brackets), the whole sentence reads that way. I'll restructure it to be more coherent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I use a dollar sign $extracted_features
elsewhere in the package to make it more clear that it's a named list element, but I haven't seen a convention around these things. At least format it as a code block so it is obvious we are talking about a code object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I will move to $extracted_features
, to be consistent across the package.
|
||
```{r, warning=FALSE, error=FALSE, prompt=FALSE} | ||
accel_features <- accelerometer_features( | ||
accelerometer_data, | ||
window_length = 128, | ||
window_overlap = 0.5) | ||
# Consider a window length of 128 samples | ||
# Consider a window length of 128 samples, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps instead of saying we "Consider a window length..." or "Consider the time ranges..." we could say "Extract features over windows 128 samples in length" and "Extract features over the 2s to 5s time period" -- likewise for the other comments. "Extract features" seems to describe what we're doing more explicitly than "Consider".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explicitly did not do that.
I always found it effective to mention the parameters as Consider ... of ...
To write it as Extract features.. it would have to be a whole sentence like
Extract features over a window length of 128 samples with a window overlap of 0.5
Because two different sentences like
Extract features over a window length of 128
Extract features over a window overlap of 0.5
read like two different instances of feature extractions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always found it effective to mention the parameters as Consider ... of ...
Effective how? In my opinion, the phrasing is unconventional and not explicit. It is also redundant (it is clear from the function call that we are setting window_length
to be 128, no need to comment afterwards # Set/Consider a window length of 128 samples
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always found it effective to mention the parameters as Consider ... of ...
Effective how? In my opinion, the phrasing is unconventional and not explicit. It is also redundant (it is clear from the function call that we are setting
window_length
to be 128, no need to comment afterwards# Set/Consider a window length of 128 samples
).
The many texts I read reference parameters this way. And I always found it better when talking about a function (in our case it is explicitly feature extraction) say Consider the parameter x as 35 etc., You are right about it not being explicit and yes, because I wanted it to read like every other genetic function description. I mean the section is about accelerometer_features
I agree that it looks redundant, but I feel it is a good thing - only shows how good our parameters are named.
vignettes/sensors.Rmd
Outdated
``` | ||
Notice how we now have three extra columns in the feature extraction output namely, window, window_start_time and window_stop_time - which indicate the number of the window in the signal (window 1 : the first window, (0 - window_length) samples), and the start and stop time respectively of that window. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(window 1 : the first window, (0 - window_length) samples)
This confuses me. Can we take it out and still get our meaning across?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, will restructure the sentence to be more coherent
# Consider the frequencies only from 4Hz to 16Hz | ||
``` | ||
|
||
There are also advanced processing techniques that you can apply to the signal like detrending (using loess), or decompose the signal into Intrinsic Mode Functions (IMF) using the Hilbert Huang Transform. `detrend` and `IMF` respectively are the parameters that refer to these techniques. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is detrending advanced? I thought you would always want an unbiased signal. (I'm asking because I honestly don't know, maybe detrending is as fancy a function as decomposing the signal into IMFs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! detrending can be done using a lot of methods. Infact for eg. high pass filtering is one way of detrending the signal, by removing the slow moving trends.
We are making sure the reader understands that the default detrending is done using loess
, and if they want more, they can supply the transformation/pre-processing function to mhealthtools
The IMFs are a result of the Hilbert Huang Transform, they got nothing to do with detrending in our package.
# the default feature extraction pipeline | ||
``` | ||
|
||
## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will reference the extending mhealthtools vignette once I make my changes.
``` | ||
|
||
## PHIL'S INPUT NEEDED HERE (A sample example for using funs and models) | ||
|
||
Please read the function documentation `?accelerometer_features` for more info. | ||
|
||
## Gyroscope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an exact copy paste + replace of the previous section on accelerometer data. Please consolidate these into a single section or write a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is. I thought if somebody skipped accelerometer and went to gyroscope directly, they would still get the whole vignette.
I thought of consolidating, but we do have two different feature extraction functions for accelerometer and gyroscope, which we may make individual changes to down the line. But I worry that if I consolidate the vignette, people will point out why not do so with the functions too? I can summarize the gyroscope section by pointing them to the accelerometer section for further reference, but again I have the same worry.
What is your call on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is that they will worry "what are the differences?" if we have two distinct, but otherwise identical sections for each. There are differences between the two feature functions, but only in how column names of the transformed sensor data are handled, not the data itself.
Redundancy is fine and sometimes even good in documentation, but the redundancy here only conveys "that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data" while obfuscating that same message with many more words. I recommend keeping the gyroscope section, but writing a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is that they will worry "what are the differences?" if we have two distinct, but otherwise identical sections for each. There are differences between the two feature functions, but only in how column names of the transformed sensor data are handled, not the data itself.
Good point Phil, and point made.
Redundancy is fine and sometimes even good in documentation, but the redundancy here only conveys "that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data" while obfuscating that same message with many more words. I recommend keeping the gyroscope section, but writing a short sentence or two stating that the functionality is the same and that the functional/data analogues are gyroscope_features/gyroscope_data.
I will make the gyroscope section concise, we can always add things later on if we make changes
@@ -158,7 +197,7 @@ For more information on how this data was collected please look into the Parkins | |||
|
|||
#### Data format and Sample data | |||
|
|||
`mhealthtools` already comes preloaded with sample tapping data. You can access the data through `data(tap_data)`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract gyroscope features). Where `t` is the timestamp; `x` and `y` are the co-ordinates of the tap, and `buttonid` is to indicate whether the user tapped the left or the right button. | |||
`mhealthtools` already comes preloaded with sample tapping data. You can access the data through `data(tap_data)` or `mhealthtools::tap_data`. The first five rows are shown below, to give you an idea of the structure of the data (this is the input format to the functions that extract gyroscope features). Where `t` is the timestamp; `x` and `y` are the co-ordinates of the tap, and `buttonid` is to indicate whether the user tapped the left or the right button. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github isn't allowing me to comment on the relevant section, but it's not clear to me what depress_threshold
is doing, specifically because I'm not understanding what kind of metric "intertap distance" is supposed to be measuring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will elaborate on both depress_threshold
and 'intertap distance'
@@ -30,19 +30,20 @@ library(mhealthtools) | |||
library(dplyr) | |||
``` | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the front pocket of your pant while you walk
pants is always plural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also referred to as module, task or activity)
You didn't make the changes I requested in #144 about referring to the experimental design as an assay. Also module comes from the software engineering vernacular. It is a different concept from "experimental design".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I learnt about modules as building blocks for entities. The entities can be programs, algorithms, or complex tasks, this is the actual definition.
For eg., look at this https://en.wikipedia.org/wiki/Module_(mathematics)
, has been in use since long before the software engineering vernacular.
BUT, if you think most of our readership would get confused about module, I can remove it. In my defense though, an experiment design is a complex task/activity and a module is its building block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internally, we have only ever talked about modules in a software engineering context. I'm confused why it's mentioned as an alternative term for "Experimental Design" here. If you are using "module" here in its most context agnostic sense, it's like a cookbook saying "Please add the eggs (also known as food)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh.. alright if you put it that way. The point is to get the point across. I will remove module then.
Also that egg analogy is exactly not valid, as food forms a group in which eggs are one class of modules.
vignettes/modules.Rmd
Outdated
@@ -30,19 +30,20 @@ library(mhealthtools) | |||
library(dplyr) | |||
``` | |||
|
|||
#### Walk Module | |||
### Walk Module | |||
|
|||
In this task, the participant is asked to keep the phone in his/her pant's front pocket (if not in the waistband of the pants) and then walk in a straight line for some fixed amount of time (like 30s). For someone with Parkinson's we expect to see symptoms of Dyskinesia and Bradykinesia. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pant's
pants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
|
||
`mhealthtools` comes with preloaded walk data from one of the mpower participants. Let's take a look at the data before we proceed to analyze it. | ||
`mhealthtools` comes with preloaded sample walk data from the mpower study. Let's take a look at the walk data before we proceed to analyze it. | ||
|
||
```{r} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the changes I referenced in #144 and I will do a full review of the modules vignette from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is in reference to the call to data(...)
below as well as other changes that still need to be made in this document).
Added a check to change mean_filter_order to 33, when the sampling rate is low (<32fps)
1. get_walk_features 2. get_kinetic_tremor_features 3. get_tremor_features 4. get_balance_features
Merge pull request Sage-Bionetworks#184 from itismeghasyam/vignette-develop
heartrate_data
. It now has only columns t, red, green and blue and a much better illustrative record. Before we had some record that was not as illustrative and also we had extra columns of brightness, hue and saturation.