Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Fixed memory baviour in hrv calc #837

Merged
merged 11 commits into from
Jun 7, 2023

Conversation

ThomasGebhardt
Copy link

@ThomasGebhardt ThomasGebhardt commented Jun 2, 2023

Hello,

while writing some custom code for nk2, I tried gating the calculation of the time-based, frequency-based and non-linear PPG HRV Parameters. While doing that I noticed that, when running the code with all three types of HRV parameters it worked flawlessly - as it is intended using ppg_analyze.

But setting any of the gates to false, did not work, if they have been true before without restarting the kernel. Looking into the issue by extensive testing, I found the following:

In both functions (_ppg_intervalrelated_hrv(...) and _ppg_intervalrelated_formatinput(...)), the return variable is also in the arguments list for the functions (i.e. output):

def _ppg_intervalrelated_hrv(data,
                             sampling_rate,
                             output={},
                             ):    

    # Sanitize input
    colnames = data.columns.values
    if len([i for i in colnames if "PPG_Peaks" in i]) == 0:
        raise ValueError(
            "NeuroKit error: ppg_intervalrelated(): Wrong input,"
            "we couldn't extract peaks. Please make sure"
            "your DataFrame contains an `PPG_Peaks` column."
        )

    # Transform rpeaks from "signal" format to "info" format.
    peaks = np.where(data["PPG_Peaks"].values)[0]
    peaks = {"PPG_Peaks": peaks}

    results = hrv(peaks, 
                  sampling_rate=sampling_rate)
    
    for column in results.columns:
        output[column] = float(results[column])

    return output

def _ppg_intervalrelated_formatinput(data, 
                                     output={} 
                                     ):

    # Sanitize input
    colnames = data.columns.values
    if len([i for i in colnames if "PPG_Rate" in i]) == 0:
        raise ValueError(
            "NeuroKit error: ppg_intervalrelated(): Wrong input,"
            "we couldn't extract heart rate. Please make sure"
            "your DataFrame contains an `PPG_Rate` column."
        )
    signal = data["PPG_Rate"].values
    output["PPG_Rate_Mean"] = np.mean(signal)

    return output

This leads to some really weird memory behaviour, as it seems the output variable within these functions is not properly destroyed (or at least reset), which is not noticeable if the content of the variable is always the same. It is most noticeable, when the amount of columns / keys within output changes between runs.

The fix is as trivial: just move the initialization of the output dictionary out of the arguments into the actual function body - done.

def _ppg_intervalrelated_hrv(data,
                             sampling_rate ):    

    output = {}
   
    # Sanitize input
    colnames = data.columns.values
    if len([i for i in colnames if "PPG_Peaks" in i]) == 0:
        raise ValueError(
            "NeuroKit error: ppg_intervalrelated(): Wrong input,"
            "we couldn't extract peaks. Please make sure"
            "your DataFrame contains an `PPG_Peaks` column."
        )

    # Transform rpeaks from "signal" format to "info" format.
    peaks = np.where(data["PPG_Peaks"].values)[0]
    peaks = {"PPG_Peaks": peaks}

    results = hrv(peaks, 
                  sampling_rate=sampling_rate)
    
    for column in results.columns:
        output[column] = float(results[column])

    return output

def _ppg_intervalrelated_formatinput(data):

   output={}

    # Sanitize input
    colnames = data.columns.values
    if len([i for i in colnames if "PPG_Rate" in i]) == 0:
        raise ValueError(
            "NeuroKit error: ppg_intervalrelated(): Wrong input,"
            "we couldn't extract heart rate. Please make sure"
            "your DataFrame contains an `PPG_Rate` column."
        )
    signal = data["PPG_Rate"].values
    output["PPG_Rate_Mean"] = np.mean(signal)

    return output

Cheers

Fixed function arguments for lower level functions in
ppg_intervalrelated.py, such that no weird memory behaviour occurs.
@welcome
Copy link

welcome bot commented Jun 2, 2023

Thanks for opening this pull request! We'll make sure it's perfect before merging 🤗 force
Make sure to read the contributing guide. Also, if you think that your contribution is worthy of it, you can consider adding yourself to the Contributors list (feel free to ask us if you have any doubts).

@codecov-commenter
Copy link

codecov-commenter commented Jun 2, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.18 🎉

Comparison is base (5116fb4) 53.88% compared to head (9b953ca) 54.06%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #837      +/-   ##
==========================================
+ Coverage   53.88%   54.06%   +0.18%     
==========================================
  Files         295      295              
  Lines       13811    13809       -2     
==========================================
+ Hits         7442     7466      +24     
+ Misses       6369     6343      -26     
Impacted Files Coverage Δ
neurokit2/ppg/ppg_intervalrelated.py 91.42% <100.00%> (+75.21%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@ThomasGebhardt ThomasGebhardt changed the title Fixed output [Fix] Fixed output in hrv calc Jun 2, 2023
@ThomasGebhardt ThomasGebhardt changed the title [Fix] Fixed output in hrv calc [Fix] Fixed memory baviour in hrv calc Jun 2, 2023
I am expecting this to fail with the suggested change because when a dictionary is passed as input, the "output" argument is used
@pull-request-size pull-request-size bot added size/M and removed size/XS labels Jun 2, 2023
@danibene
Copy link
Collaborator

danibene commented Jun 3, 2023

Hi @ThomasGebhardt , thank you for this! I added a test showing how the suggested changes are incompatible with having a dictionary as input. Would you be able to add modifications so that the test passes?

Thomas Gebhardt added 4 commits June 3, 2023 09:54
Fixed the error message, that too many arguments are passed.
Ran into other problems - changed quite a few things as a consequence.
- Updated internals' checks to more "pythonic" way
- Updated intervals for dicts such that they don't overwrite each other
  anymore (add label info and rate calculation that is)
- Updated such that the weird memory behaviour is gone, when gating hrv
  parameter calculation for performance improvement
- Updated type casting in ppg_intervalrelated internals to utilize
  pandas native functions for performance increase (and easier to read
  code)
- Added type hints to internal functions such that IDEs actually know
  what the function gets as input, enabling suggestions and
  documentation
Previous version created a bug, in which the calculated hrv parameters
are entered as dicts into the overall ppg_intervalls df - which
miraculously did not conflict with any tests.
@ThomasGebhardt
Copy link
Author

Hey @danibene, should be good to go now - hit some road bumps on the way, changed a bit more than intended, but also works now properly, as far as I can see it (at least I didn't find anything why it shouldn't and the tests check out).

@@ -97,27 +95,28 @@ def ppg_intervalrelated(data, sampling_rate=1000):
# =============================================================================


def _ppg_intervalrelated_formatinput(data, output={}):

def _ppg_intervalrelated_formatinput(data: pd.DataFrame):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you added a type annotation here? I do think that type annotations are generally good to have but it's inconsistent with the rest of the code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but curious about that annotation as well.

In general yes, I also think type annotations are good, although NK might not be a priority repo to add them, as in general we are pretty flexible by design regarding the type of input (i.e., we try to accommodate potential user "mistakes")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed more of a convenience for the "deep-diving" user - such that the employed IDE gets what the incoming data type is and thus enables the suggestion features for the variable. Regarding flexibility, the functions were before and still are only written to handle data as a pandas DataFrame as that is what is created from ppg_process beforehand.

Me personally, I love using type hints as it makes using functions clearer and easier - and they also help with getting back into code that I did not work on for some time a lot faster - so maybe I simply added it from a force of habit as well.

If pure functionality and consistency with the rest of the code are the goals, the type hint can be removed without consequence. Let me know, if you want me to drop it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a big experience with type hints, but I can see their value.

I guess either:

  • we remove it here for consistency for now (and eventually someday working on an update by type hitting the whole codebase)
  • We leave it and we implicitly start slowly type-hitting here and there whenever we do some code updates (i.e., the progress will be slow and random).

@danibene what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a big experience with type hints, but I can see their value.

I guess either:

  • we remove it here for consistency for now (and eventually someday working on an update by type hitting the whole codebase)
  • We leave it and we implicitly start slowly type-hitting here and there whenever we do some code updates (i.e., the progress will be slow and random).

@danibene what do you think?

I also don't have much experience with type hints, but I'd err on the side of removing it for now.

One option to start with, to make sure we're not biting off more than we can chew, could be to create separate stub files like pandas https://github.com/pandas-dev/pandas-stubs

I recently started this on my own so I could use mypy in repositories using certain neurokit functions, if you'd like we could add this to the neuropsychology org and request that contributors of new functions contribute there too:
https://github.com/danibene/neurokit2-stubs/tree/dev

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't have much experience with type hints, but I'd err on the side of removing it for now.

Let's do that then

could be to create separate stub files

Oh boy it's the first time I hear about stubs 😵‍💫 Let me do some reading first and then we can see what to do regarding the infrastructure

Copy link
Collaborator

@danibene danibene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ThomasGebhardt ! I just added a small comment on style

@DominiqueMakowski could you also please take a look before merging?

@danibene danibene merged commit cf2d93f into neuropsychology:dev Jun 7, 2023
@welcome
Copy link

welcome bot commented Jun 7, 2023

landing
Congrats on merging your first pull request! 🎉🍾 We're looking forward to your next one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants