Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Modify signal_interpolate for input with NaNs and extrapolation #666

Conversation

danibene
Copy link
Collaborator

Description

Modify the signal_interpolate() function to accept a single array containing NaNs as input and to extrapolate.

(See #651 for original motivation for doing so)

Proposed Changes

I modified the signal_interpolate() function as follows:

  1. Accepting a single array containing NaNs as input (either x_values or y_values can be None), for interpolation of missing values like in Pandas
  2. Adding fill_value as an argument to be passed to the Scipy interpolate function, such that it can be set to "extrapolate"

Checklist

Here are some things to check before creating the PR. If you encounter any issues, do let us know :)

  • I have read the CONTRIBUTING file.
  • My PR is targetted at the dev branch (and not towards the master branch).
  • I ran the CODE CHECKS on the files I added or modified and fixed the errors.
  • I have added the newly added features to News.rst (if applicable)

@codecov-commenter
Copy link

codecov-commenter commented Jun 22, 2022

Codecov Report

Merging #666 (a168ec0) into dev (6b7147d) will decrease coverage by 0.00%.
The diff coverage is 52.17%.

@@            Coverage Diff             @@
##              dev     #666      +/-   ##
==========================================
- Coverage   52.75%   52.75%   -0.01%     
==========================================
  Files         277      277              
  Lines       12615    12634      +19     
==========================================
+ Hits         6655     6665      +10     
- Misses       5960     5969       +9     
Impacted Files Coverage Δ
neurokit2/signal/signal_interpolate.py 64.10% <52.17%> (-20.90%) ⬇️
neurokit2/rsp/rsp_simulate.py 94.64% <0.00%> (+0.89%) ⬆️
neurokit2/eda/eda_eventrelated.py 83.67% <0.00%> (+2.04%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@DominiqueMakowski
Copy link
Member

What's the status here is it good for review/merge or still wip?

@danibene
Copy link
Collaborator Author

danibene commented Jul 5, 2022

@DominiqueMakowski @JanCBrammer I think it's good for review, but let me know if I'm missing anything!

@JanCBrammer
Copy link
Collaborator

Will try to have a look this week. Please @ me here in case you don't hear back.

Co-authored-by: Dominique Makowski <[email protected]>
@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Jul 11, 2022

gentle ping just in case @JanCBrammer

@JanCBrammer
Copy link
Collaborator

@danibene, @DominiqueMakowski, I recall that settling on an API for signal_interpolate() wasn't straightforward: #247. This is why I'm hesitant to change it.

Pandas-style NaN interpolation

The NaN interpolation is a rather big modification of the API: going from mandatory x_values and y_values to being able to omitting one of them. This modification adds complexity and overhead and I'm not sure it's worth it. What do you think @DominiqueMakowski.

custom extrapolation values

I'm not opposed to adding the fill_value parameter. Ideally it should be applied to the monotone_cubic method as well.

I changed the identification of the indices corresponding to the original first and last x values since if the x values are decimals e.g. 0.33, 0.66, ... rounding to the next integer is not very precise.
@DominiqueMakowski
Copy link
Member

@Tam-Pham tell us what you think so we can merge or not and release 2.1

@DominiqueMakowski
Copy link
Member

@Tam-Pham bump

1 similar comment
@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Jul 19, 2022

@Tam-Pham bump

@DominiqueMakowski
Copy link
Member

I'm not sure what to do here, I don't have a clear idea of the risks / benefits balance, and @JanCBrammer comment makes me cautious. Is it indeed a change that will benefit many users and carries no risk?

@danibene
Copy link
Collaborator Author

@DominiqueMakowski

I'm not sure what to do here, I don't have a clear idea of the risks / benefits balance, and @JanCBrammer comment makes me cautious. Is it indeed a change that will benefit many users and carries no risk?

It's hard for me to say - maybe an alternative could be that I remove the change that @JanCBrammer was hesitant about, i.e.:

Accepting a single array containing NaNs as input (either x_values or y_values can be None), for interpolation of missing values like in Pandas

and I could open a separate PR for a separate function that calls signal_interpolate() and takes only one input array? E.g. signal_interpolate_nan()? Then this separate function could eventually be used in signal_fixpeaks() instead of Pandas.

@DominiqueMakowski
Copy link
Member

So if I understand, we want to add the possibility of interpolating arrays with nan pandas-style to bypass the usage of pandas and allow more control over it?

To do so, we have modified signal_interpolate, that previously required x_values and y_values, now if only one is passed, and has Nans, it will interpolate them am I right?

@danibene
Copy link
Collaborator Author

@DominiqueMakowski

So if I understand, we want to add the possibility of interpolating arrays with nan pandas-style to bypass the usage of pandas and allow more control over it?

To do so, we have modified signal_interpolate, that previously required x_values and y_values, now if only one is passed, and has Nans, it will interpolate them am I right?

Yes that's all correct!

Do you think we should keep this addition in this PR or just the changes that allow extrapolation of points outside of the data range?

@DominiqueMakowski
Copy link
Member

i think it's fine, perhaps to make it clearer we can have an internal function called _signal_interpolate_nan() and if y_values is None and x_values contains NaN, we directly call this function (so that it's internally "separated" from the rest of the regular interpolation behavior)

I guess as long as it doesn't break the current behavior it's good

@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 25, 2022
@danibene
Copy link
Collaborator Author

@DominiqueMakowski

i think it's fine, perhaps to make it clearer we can have an internal function called _signal_interpolate_nan() and if y_values is None and x_values contains NaN, we directly call this function (so that it's internally "separated" from the rest of the regular interpolation behavior)

Done, let me know if you'd like me to change anything!

Also, I changed an unrelated line in cd02053

if len(x_values) == len(x_new):
return y_values

It returned the original y values if the length of the new x values matched the length of the original x values, but the length can still match without the values matching, so instead I changed it to check that all the values match.

Here is an example of the difference in behavior in case the x values don't match but their length does

@DominiqueMakowski
Copy link
Member

Thanks a ton Dani for all that work!!

@DominiqueMakowski DominiqueMakowski merged commit 2038b6f into neuropsychology:dev Aug 29, 2022
@danibene danibene deleted the feature/signal_interpolate_nans_extrapolate branch August 29, 2022 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants