-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve automatic periodicity determination #3912
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3912 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 347 347
Lines 36755 36745 -10
=======================================
- Hits 36625 36619 -6
+ Misses 130 126 -4
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some clarification questions and quick fixes but otherwise looks good to me!
|
||
""" | ||
|
||
def _get_rel_max_from_acf(y): | ||
"""Determines the relative maxima of the target's autocorrelation.""" | ||
acf = sm.tsa.acf(y, nlags=np.maximum(400, len(y))) | ||
filter_acf = [acf[i] if (acf[i] > 0) else 0 for i in range(len(acf))] | ||
filter_acf = [acf[i] if (acf[i] > 0.01) else 0 for i in range(len(acf))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why we need this change!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question! We were seeing that datasets with no seasonality were still getting a result due to having some very small autocorrelations. This is to filter those out so we correctly say these datasets have no seasonality. Datasets with a seasonal period will be more on the scale of 0.1-0.6, so we aren't filtering out anything significant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want this 0.01 to be a variable that's defined in the decomposer. I think that's probably a better practice.
evalml/pipelines/components/transformers/preprocessing/decomposer.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor comments! Nice work!!
|
||
""" | ||
|
||
def _get_rel_max_from_acf(y): | ||
"""Determines the relative maxima of the target's autocorrelation.""" | ||
acf = sm.tsa.acf(y, nlags=np.maximum(400, len(y))) | ||
filter_acf = [acf[i] if (acf[i] > 0) else 0 for i in range(len(acf))] | ||
filter_acf = [acf[i] if (acf[i] > 0.01) else 0 for i in range(len(acf))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want this 0.01 to be a variable that's defined in the decomposer. I think that's probably a better practice.
evalml/tests/component_tests/decomposer_tests/test_decomposer.py
Outdated
Show resolved
Hide resolved
evalml/pipelines/components/transformers/preprocessing/decomposer.py
Outdated
Show resolved
Hide resolved
"""Uses a moving average to determine the target's trend and remove it.""" | ||
# A larger moving average will be less likely to remove the seasonal signal | ||
# but we need to make sure we're not passing in a window that's larger than the data | ||
moving_avg = min(51, len(y) // 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 51 and 3 seem a little arbitrary and kind of trigger my "should this be hardcoded" sense, but I think in this case it's probably fine.
Simplifies and improves performance of our
decomposer.determine_period
method by removing the need to pre-fit the model for detrending, using a moving average to detrend instead.