[Fix] Fixed memory baviour in hrv calc #837

ThomasGebhardt · 2023-06-02T16:16:13Z

Hello,

while writing some custom code for nk2, I tried gating the calculation of the time-based, frequency-based and non-linear PPG HRV Parameters. While doing that I noticed that, when running the code with all three types of HRV parameters it worked flawlessly - as it is intended using ppg_analyze.

But setting any of the gates to false, did not work, if they have been true before without restarting the kernel. Looking into the issue by extensive testing, I found the following:

In both functions (_ppg_intervalrelated_hrv(...) and _ppg_intervalrelated_formatinput(...)), the return variable is also in the arguments list for the functions (i.e. output):

def _ppg_intervalrelated_hrv(data,
                             sampling_rate,
                             output={},
                             ):    

    # Sanitize input
    colnames = data.columns.values
    if len([i for i in colnames if "PPG_Peaks" in i]) == 0:
        raise ValueError(
            "NeuroKit error: ppg_intervalrelated(): Wrong input,"
            "we couldn't extract peaks. Please make sure"
            "your DataFrame contains an `PPG_Peaks` column."
        )

    # Transform rpeaks from "signal" format to "info" format.
    peaks = np.where(data["PPG_Peaks"].values)[0]
    peaks = {"PPG_Peaks": peaks}

    results = hrv(peaks, 
                  sampling_rate=sampling_rate)
    
    for column in results.columns:
        output[column] = float(results[column])

    return output

def _ppg_intervalrelated_formatinput(data, 
                                     output={} 
                                     ):

    # Sanitize input
    colnames = data.columns.values
    if len([i for i in colnames if "PPG_Rate" in i]) == 0:
        raise ValueError(
            "NeuroKit error: ppg_intervalrelated(): Wrong input,"
            "we couldn't extract heart rate. Please make sure"
            "your DataFrame contains an `PPG_Rate` column."
        )
    signal = data["PPG_Rate"].values
    output["PPG_Rate_Mean"] = np.mean(signal)

    return output

This leads to some really weird memory behaviour, as it seems the output variable within these functions is not properly destroyed (or at least reset), which is not noticeable if the content of the variable is always the same. It is most noticeable, when the amount of columns / keys within output changes between runs.

The fix is as trivial: just move the initialization of the output dictionary out of the arguments into the actual function body - done.

def _ppg_intervalrelated_hrv(data,
                             sampling_rate ):    

    output = {}
   
    # Sanitize input
    colnames = data.columns.values
    if len([i for i in colnames if "PPG_Peaks" in i]) == 0:
        raise ValueError(
            "NeuroKit error: ppg_intervalrelated(): Wrong input,"
            "we couldn't extract peaks. Please make sure"
            "your DataFrame contains an `PPG_Peaks` column."
        )

    # Transform rpeaks from "signal" format to "info" format.
    peaks = np.where(data["PPG_Peaks"].values)[0]
    peaks = {"PPG_Peaks": peaks}

    results = hrv(peaks, 
                  sampling_rate=sampling_rate)
    
    for column in results.columns:
        output[column] = float(results[column])

    return output

def _ppg_intervalrelated_formatinput(data):

   output={}

    # Sanitize input
    colnames = data.columns.values
    if len([i for i in colnames if "PPG_Rate" in i]) == 0:
        raise ValueError(
            "NeuroKit error: ppg_intervalrelated(): Wrong input,"
            "we couldn't extract heart rate. Please make sure"
            "your DataFrame contains an `PPG_Rate` column."
        )
    signal = data["PPG_Rate"].values
    output["PPG_Rate_Mean"] = np.mean(signal)

    return output

Cheers

Fixed function arguments for lower level functions in ppg_intervalrelated.py, such that no weird memory behaviour occurs.

welcome · 2023-06-02T16:16:15Z

Thanks for opening this pull request! We'll make sure it's perfect before merging 🤗
Make sure to read the contributing guide. Also, if you think that your contribution is worthy of it, you can consider adding yourself to the Contributors list (feel free to ask us if you have any doubts).

codecov-commenter · 2023-06-02T16:41:10Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.18 🎉

Comparison is base (5116fb4) 53.88% compared to head (9b953ca) 54.06%.

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #837      +/-   ##
==========================================
+ Coverage   53.88%   54.06%   +0.18%     
==========================================
  Files         295      295              
  Lines       13811    13809       -2     
==========================================
+ Hits         7442     7466      +24     
+ Misses       6369     6343      -26

Impacted Files	Coverage Δ
neurokit2/ppg/ppg_intervalrelated.py	`91.42% <100.00%> (+75.21%)`	⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

I am expecting this to fail with the suggested change because when a dictionary is passed as input, the "output" argument is used

danibene · 2023-06-03T01:40:22Z

Hi @ThomasGebhardt , thank you for this! I added a test showing how the suggested changes are incompatible with having a dictionary as input. Would you be able to add modifications so that the test passes?

Fixed the error message, that too many arguments are passed.

Ran into other problems - changed quite a few things as a consequence. - Updated internals' checks to more "pythonic" way - Updated intervals for dicts such that they don't overwrite each other anymore (add label info and rate calculation that is) - Updated such that the weird memory behaviour is gone, when gating hrv parameter calculation for performance improvement - Updated type casting in ppg_intervalrelated internals to utilize pandas native functions for performance increase (and easier to read code) - Added type hints to internal functions such that IDEs actually know what the function gets as input, enabling suggestions and documentation

Previous version created a bug, in which the calculated hrv parameters are entered as dicts into the overall ppg_intervalls df - which miraculously did not conflict with any tests.

ThomasGebhardt · 2023-06-03T15:20:19Z

Hey @danibene, should be good to go now - hit some road bumps on the way, changed a bit more than intended, but also works now properly, as far as I can see it (at least I didn't find anything why it shouldn't and the tests check out).

danibene · 2023-06-04T21:10:28Z

neurokit2/ppg/ppg_intervalrelated.py

@@ -97,27 +95,28 @@ def ppg_intervalrelated(data, sampling_rate=1000):
 # =============================================================================


-def _ppg_intervalrelated_formatinput(data, output={}):
-
+def _ppg_intervalrelated_formatinput(data: pd.DataFrame):


is there a reason you added a type annotation here? I do think that type annotations are generally good to have but it's inconsistent with the rest of the code

Looks good to me, but curious about that annotation as well.

In general yes, I also think type annotations are good, although NK might not be a priority repo to add them, as in general we are pretty flexible by design regarding the type of input (i.e., we try to accommodate potential user "mistakes")

It is indeed more of a convenience for the "deep-diving" user - such that the employed IDE gets what the incoming data type is and thus enables the suggestion features for the variable. Regarding flexibility, the functions were before and still are only written to handle data as a pandas DataFrame as that is what is created from ppg_process beforehand.

Me personally, I love using type hints as it makes using functions clearer and easier - and they also help with getting back into code that I did not work on for some time a lot faster - so maybe I simply added it from a force of habit as well.

If pure functionality and consistency with the rest of the code are the goals, the type hint can be removed without consequence. Let me know, if you want me to drop it.

I don't have a big experience with type hints, but I can see their value.

I guess either:

we remove it here for consistency for now (and eventually someday working on an update by type hitting the whole codebase)

We leave it and we implicitly start slowly type-hitting here and there whenever we do some code updates (i.e., the progress will be slow and random).

@danibene what do you think?

I don't have a big experience with type hints, but I can see their value.

I guess either:

we remove it here for consistency for now (and eventually someday working on an update by type hitting the whole codebase)

We leave it and we implicitly start slowly type-hitting here and there whenever we do some code updates (i.e., the progress will be slow and random).

@danibene what do you think?

I also don't have much experience with type hints, but I'd err on the side of removing it for now.

One option to start with, to make sure we're not biting off more than we can chew, could be to create separate stub files like pandas https://github.com/pandas-dev/pandas-stubs

I recently started this on my own so I could use mypy in repositories using certain neurokit functions, if you'd like we could add this to the neuropsychology org and request that contributors of new functions contribute there too:
https://github.com/danibene/neurokit2-stubs/tree/dev

I also don't have much experience with type hints, but I'd err on the side of removing it for now.

Let's do that then

could be to create separate stub files

Oh boy it's the first time I hear about stubs 😵‍💫 Let me do some reading first and then we can see what to do regarding the infrastructure

danibene

Thanks @ThomasGebhardt ! I just added a small comment on style

@DominiqueMakowski could you also please take a look before merging?

welcome · 2023-06-07T14:03:29Z

Congrats on merging your first pull request! 🎉🍾 We're looking forward to your next one!

Fixed output

9a08861

Fixed function arguments for lower level functions in ppg_intervalrelated.py, such that no weird memory behaviour occurs.

pull-request-size bot added the size/XS label Jun 2, 2023

ThomasGebhardt changed the title ~~Fixed output~~ [Fix] Fixed output in hrv calc Jun 2, 2023

ThomasGebhardt changed the title ~~[Fix] Fixed output in hrv calc~~ [Fix] Fixed memory baviour in hrv calc Jun 2, 2023

add test that works on current version of neurokit2

e7f9995

I am expecting this to fail with the suggested change because when a dictionary is passed as input, the "output" argument is used

pull-request-size bot added size/M and removed size/XS labels Jun 2, 2023

Merge branch 'dev' into fix_output

d06756e

Thomas Gebhardt added 4 commits June 3, 2023 09:54

Fixed error message for ppg_intervalrelated

1b692c3

Fixed the error message, that too many arguments are passed.

Fixed coding style

16730e0

Fixed df values bug

41f125b

Previous version created a bug, in which the calculated hrv parameters are entered as dicts into the overall ppg_intervalls df - which miraculously did not conflict with any tests.

danibene added 2 commits June 4, 2023 16:42

test dataframe as input

eb20130

format with black

0fc8471

danibene reviewed Jun 4, 2023

View reviewed changes

danibene requested a review from DominiqueMakowski June 4, 2023 21:12

danibene added 2 commits June 7, 2023 08:45

remove type annotations

ac39ec1

Merge branch 'dev' into fix_output

9b953ca

danibene merged commit cf2d93f into neuropsychology:dev Jun 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fix] Fixed memory baviour in hrv calc #837

[Fix] Fixed memory baviour in hrv calc #837

ThomasGebhardt commented Jun 2, 2023 •

edited

Loading

welcome bot commented Jun 2, 2023

codecov-commenter commented Jun 2, 2023 •

edited

Loading

danibene commented Jun 3, 2023

ThomasGebhardt commented Jun 3, 2023

danibene Jun 4, 2023

DominiqueMakowski Jun 5, 2023

ThomasGebhardt Jun 5, 2023

DominiqueMakowski Jun 5, 2023

danibene Jun 5, 2023

DominiqueMakowski Jun 5, 2023

danibene left a comment

welcome bot commented Jun 7, 2023

[Fix] Fixed memory baviour in hrv calc #837

[Fix] Fixed memory baviour in hrv calc #837

Conversation

ThomasGebhardt commented Jun 2, 2023 • edited Loading

welcome bot commented Jun 2, 2023

codecov-commenter commented Jun 2, 2023 • edited Loading

Codecov Report

danibene commented Jun 3, 2023

ThomasGebhardt commented Jun 3, 2023

danibene Jun 4, 2023

Choose a reason for hiding this comment

DominiqueMakowski Jun 5, 2023

Choose a reason for hiding this comment

ThomasGebhardt Jun 5, 2023

Choose a reason for hiding this comment

DominiqueMakowski Jun 5, 2023

Choose a reason for hiding this comment

danibene Jun 5, 2023

Choose a reason for hiding this comment

DominiqueMakowski Jun 5, 2023

Choose a reason for hiding this comment

danibene left a comment

Choose a reason for hiding this comment

welcome bot commented Jun 7, 2023

ThomasGebhardt commented Jun 2, 2023 •

edited

Loading

codecov-commenter commented Jun 2, 2023 •

edited

Loading