Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve automatic periodicity determination #3912

Merged
merged 19 commits into from
Jan 23, 2023
Merged

Conversation

eccabay
Copy link
Contributor

@eccabay eccabay commented Jan 6, 2023

Simplifies and improves performance of our decomposer.determine_period method by removing the need to pre-fit the model for detrending, using a moving average to detrend instead.

@codecov
Copy link

codecov bot commented Jan 6, 2023

Codecov Report

Merging #3912 (c65da15) into main (62029a0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3912     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        347     347             
  Lines      36755   36745     -10     
=======================================
- Hits       36625   36619      -6     
+ Misses       130     126      -4     
Impacted Files Coverage Δ
...sts/decomposer_tests/test_polynomial_decomposer.py 100.0% <ø> (ø)
...nent_tests/decomposer_tests/test_stl_decomposer.py 100.0% <ø> (ø)
...omponents/transformers/preprocessing/decomposer.py 100.0% <100.0%> (+0.8%) ⬆️
...omponent_tests/decomposer_tests/test_decomposer.py 100.0% <100.0%> (+0.4%) ⬆️
evalml/tests/conftest.py 98.2% <0.0%> (+0.3%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@eccabay eccabay marked this pull request as ready for review January 12, 2023 21:46
Copy link
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some clarification questions and quick fixes but otherwise looks good to me!


"""

def _get_rel_max_from_acf(y):
"""Determines the relative maxima of the target's autocorrelation."""
acf = sm.tsa.acf(y, nlags=np.maximum(400, len(y)))
filter_acf = [acf[i] if (acf[i] > 0) else 0 for i in range(len(acf))]
filter_acf = [acf[i] if (acf[i] > 0.01) else 0 for i in range(len(acf))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why we need this change!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! We were seeing that datasets with no seasonality were still getting a result due to having some very small autocorrelations. This is to filter those out so we correctly say these datasets have no seasonality. Datasets with a seasonal period will be more on the scale of 0.1-0.6, so we aren't filtering out anything significant here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want this 0.01 to be a variable that's defined in the decomposer. I think that's probably a better practice.

Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments! Nice work!!


"""

def _get_rel_max_from_acf(y):
"""Determines the relative maxima of the target's autocorrelation."""
acf = sm.tsa.acf(y, nlags=np.maximum(400, len(y)))
filter_acf = [acf[i] if (acf[i] > 0) else 0 for i in range(len(acf))]
filter_acf = [acf[i] if (acf[i] > 0.01) else 0 for i in range(len(acf))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want this 0.01 to be a variable that's defined in the decomposer. I think that's probably a better practice.

"""Uses a moving average to determine the target's trend and remove it."""
# A larger moving average will be less likely to remove the seasonal signal
# but we need to make sure we're not passing in a window that's larger than the data
moving_avg = min(51, len(y) // 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 51 and 3 seem a little arbitrary and kind of trigger my "should this be hardcoded" sense, but I think in this case it's probably fine.

@eccabay eccabay enabled auto-merge (squash) January 20, 2023 22:35
@eccabay eccabay merged commit 21e0baa into main Jan 23, 2023
@eccabay eccabay deleted the 388_determine_period branch January 23, 2023 19:16
@chukarsten chukarsten mentioned this pull request Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants