Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] StatsForecast AutoETS forecasting model #2988

Merged
merged 23 commits into from
Oct 31, 2022

Conversation

AzulGarza
Copy link
Contributor

@AzulGarza AzulGarza commented Oct 21, 2022

Description

This PR includes a new model for the forecast module. AutoETS selects the best model amongst the exponential smoothing family. Here's a reference for the model. Since StatsForecast is already included in the dependencies, it wasn't necessary to include it.

Here's a working example

image

And its comparison against the theta method,

image

  • Summary of the change / bug fix.
  • Link # issue, if applicable.
  • Screenshot of the feature or the bug before/after fix, if applicable.
  • Relevant motivation and context.
  • List any dependencies that are required for this change.

How has this been tested?

  • Please describe the tests that you ran to verify your changes.
  • Provide instructions so we can reproduce.
  • Please also list any relevant details for your test configuration.
  • Make sure affected commands still run in terminal
  • Ensure the SDK still works
  • Check any related reports

Checklist:

Others

  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.

@DidierRLopes DidierRLopes added feat S Small T-Shirt size Feature HACKTOBERFEST Hacktoberfest labels Oct 21, 2022
model_ets = ETS(
season_length=int(seasonal_periods),
)
fcst = StatsForecast(df=ticker_series, models=[model_ets], freq="B")
Copy link
Contributor

@martinb-ai martinb-ai Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FedericoGarza Will this change in the future when we do not use "B" freq for time series?

Does StatsForecast do an auto-check if it doesn't happen to be "B"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @martinb-bb!

I think it would be better if there were an extra parameter in the models named freq, so that the user can include the frequency of their data.

As far as I know, it is difficult for pandas to infer the frequency "B" because there are missing values in the data (weekends).

Should I include the freq parameter in this PR?

Copy link
Contributor

@martinb-ai martinb-ai Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @FedericoGarza, Just thinking aloud right now..

I would rather not add another kwarg for the time being, since we have so many. Some users are new to this field, and assuming they know frequency right out the gate can be overwhelming.

Instead, the goal is to make the menu robust under the hood. Our current code checks for B right off the bat, and if that fails, then tries to use pandas to assume freq. If that fails as well, we then resort to index of the Dataframe.

found in helpers.py

try:
    ticker_series = TimeSeries.from_dataframe(**filler_kwargs)
except ValueError:
    # remove business days to allow base lib to assume freq
    filler_kwargs.pop("freq")
    ticker_series = TimeSeries.from_dataframe(**filler_kwargs)
except AttributeError:
    # Uses index if no date column
    console.print(
        "[red]Warning: no date found, assuming items evenly spaced.[/red]\n"
    )
    filler_kwargs.pop("time_col")
    ticker_series = TimeSeries.from_dataframe(**filler_kwargs)

Something like this would be ideal. If we include your library with the Darts wrapper, we can then do the same logic and it should work identically to the other models we have. Else, you can add some sort of custom logic in front of the StatsForecast(df=ticker_series, models=[model_ets], freq="B") before you pass in the time series?

In the future, we may revisit the concept on frequency when we start including extraordinary datasets and/or have have feature requests - in which that will require a decent refactor.

Only some thoughts... feel free to comment on any other ideas!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @martinb-bb! Thanks for sharing your thoughts. I agree with you, asking users for freq would be overwhelming.

I've changed the preprocessing step to use helpers.get_series before using StatsForecast,

https://github.com/OpenBB-finance/OpenBBTerminal/blob/ba75a1f80f9fb8d8c0ef18af0494b06d564a9d56/openbb_terminal/forecast/autoets_model.py#L75-L79

This ensures that the same logic is applied to the frequency between all models.🙌

@martinb-ai
Copy link
Contributor

@FedericoGarza Is a progress bar like the other models possible? It allows the user to know whats happening under the hood and not that the terminal is frozen.

@AzulGarza
Copy link
Contributor Author

@FedericoGarza Is a progress bar like the other models possible? It allows the user to know whats happening under the hood and not that the terminal is frozen.

Hey @martinb-bb, I've included verbose=True in 5d1ea53. The verbose argument is only available as of v1.1.3 version of StatsForecast. Should I freeze that requirement in poetry.lock? Since darts installs the most recent version, perhaps it is unnecessary.

@martinb-ai
Copy link
Contributor

martinb-ai commented Oct 25, 2022

Hey @martinb-bb, I've included verbose=True in 5d1ea53. The verbose argument is only available as of v1.1.3 version of StatsForecast. Should I freeze that requirement in poetry.lock? Since darts installs the most recent version, perhaps it is unnecessary.

This is great!
Works nicely to give users some understanding on what is going on.
image

I would say yes, poetry should update the statsforecast to the latest, regardless what darts installs. If you set a requirement in the poetry under the -e prediction tag, that would be easiest.

For ref, we install darts into the conda env and then update the remaining libs through poetry. This will help us in the future when you would like to add in new features to your lib and include them into the terminal.

When you update poetry, I will test on my end to make sure it will play nicely with the other operating systems 😄
Nice work!

@AzulGarza
Copy link
Contributor Author

Hey @martinb-bb, I've included verbose=True in 5d1ea53. The verbose argument is only available as of v1.1.3 version of StatsForecast. Should I freeze that requirement in poetry.lock? Since darts installs the most recent version, perhaps it is unnecessary.

This is great! Works nicely to give users some understanding on what is going on. image

I would say yes, poetry should update the statsforecast to the latest, regardless what darts installs. If you set a requirement in the poetry under the -e prediction tag, that would be easiest.

For ref, we install darts into the conda env and then update the remaining libs through poetry. This will help us in the future when you would like to add in new features to your lib and include them into the terminal.

When you update poetry, I will test on my end to make sure it will play nicely with the other operating systems 😄 Nice work!

Thank you @martinb-bb! I've included statsforecast==1.1.3 in the conda-3-9-env-full.yaml file instead of in poetry to simplify changes. What do you think? :)

@AzulGarza AzulGarza requested review from martinb-ai and removed request for colin99d October 26, 2022 19:01
@martinb-ai
Copy link
Contributor

martinb-ai commented Oct 30, 2022

@FedericoGarza This is looking just about ready to merge from my end! We are currently doing a review of the forecast menu and adding in some UX improvements.

Could you please add a check for dataset size within your model.py to ensure users are running with at least 100 points? This is to prevent them from getting wonky results and expecting some valuable insights when using very small datasets. We have just merged a PR that added this in for Expo and theta as well : #3133

This is useful when they use the API.

@DidierRLopes anything else from a UX perspective that you see that could use a quick modification before we proceed to merge?

@DidierRLopes
Copy link
Collaborator

@FedericoGarza This is looking just about ready to merge from my end! We are currently doing a review of the forecast menu and adding in some UX improvements.

Could you please add a check for dataset size within your model.py to ensure users are running with at least 100 points? This is to prevent them from getting wonky results and expecting some valuable insights when using very small datasets. We have just merged a PR that added this in for Expo and theta as well : #3133

This is useful when they use the API.

@DidierRLopes anything else from a UX perspective that you see that could use a quick modification before we proceed to merge?

Nothing from my end, I created a PR with the UX improvements.

Let's merge this one first, and then I can rectify whatever else is needed. But this is looking great 😊

@jmaslek jmaslek merged commit 826303e into OpenBB-finance:main Oct 31, 2022
hjoaquim pushed a commit to hjoaquim/OpenBBTerminal that referenced this pull request Nov 1, 2022
* feat: add autoets model

* feat: update autoets index

* feat: add verbose option

* feat: freq preprocessing

* feat: improve test size call

* feat: add statsforecast dep to conda env full

Co-authored-by: James Maslek <[email protected]>
Co-authored-by: martinb-bb <[email protected]>
jmaslek added a commit that referenced this pull request Nov 1, 2022
* improved integration tests for forex

* integration test for dashboards

* added more tests to economy

* added test for forecast report

* added eedbt integration test

* added hist tests

* added eval command

* updatedeconometrics tests

* changed order

* test removing warning

* Revert "test removing warning"

This reverts commit 09ca09012156771428b2041b478296dbf2bfc780.

* adding todo

* Fix top categories and sorting not working (#3025)

* Fix categories and sorting not working

* Allow sort in ascending order

* Documented the new flag

* Fixed tests

* Fix bad practises

Co-authored-by: James Maslek <[email protected]>

* [FEAT] StatsForecast AutoETS forecasting model (#2988)

* feat: add autoets model

* feat: update autoets index

* feat: add verbose option

* feat: freq preprocessing

* feat: improve test size call

* feat: add statsforecast dep to conda env full

Co-authored-by: James Maslek <[email protected]>
Co-authored-by: martinb-bb <[email protected]>

* Improve ETF menu (#3124)

* improve etf menu

* improve docs

* fix tests

* should cover more edge cases

Co-authored-by: James Maslek <[email protected]>

* improve export (#3176)

* When renaming funcs to remove source at start, the translating keys were missing update (#3177)

* fix the en.yml key for some funcs

* fix tests

Co-authored-by: James Maslek <[email protected]>

* Adds guide for SDK/Keys (#3111)

* Adds guide for SDK/Keys

* Linter

* Applied requested changes, and improved the examples.

* Removed bullet point

Co-authored-by: Jeroen Bouma <[email protected]>

* Adds Introduction Guide for the Economy SDK Module (#3142)

* Adds Introduction Guide for the Economy SDK Module

* Changes title

Co-authored-by: Jeroen Bouma <[email protected]>

* version check for statforecast (#3227)

* Fix 3188 - UX for `Feature Engineering` functions  (#3224)

* fix depr. warning

* console printout enhancment

* improvement to sto for UX

* average true range UX

* signal UX improv.

* Fix Some Econometrics Bugs (#3210)

* Added suggestions for load

* Fixed a command

* Fixed panel

Co-authored-by: Henrique Joaquim <[email protected]>
Co-authored-by: James Maslek <[email protected]>

* Fixing 6 crypto bugs (#3228)

* fixing 6 crypto bugs

* fixing filter stuff

Co-authored-by: Colin Delahunty <[email protected]>

* Fix SDK plot colors without needing openbb.stocks.candle (#3178)

* fix SDK plot colors without needing openbb.stocks.candle

* lint

* ordering

Co-authored-by: James Maslek <[email protected]>
Co-authored-by: Colin Delahunty <[email protected]>

* Updating crypto integration tests (#3222)

* disc, ov, and base

* more integration tests and fixing previous commited ones

* some more fixes

Co-authored-by: Colin Delahunty <[email protected]>

* removing commented tests

* removing commented tests

Co-authored-by: Gerard <[email protected]>
Co-authored-by: James Maslek <[email protected]>
Co-authored-by: fede <[email protected]>
Co-authored-by: martinb-bb <[email protected]>
Co-authored-by: DidierRLopes <[email protected]>
Co-authored-by: Danglewood <[email protected]>
Co-authored-by: Jeroen Bouma <[email protected]>
Co-authored-by: Colin Delahunty <[email protected]>
Co-authored-by: James Simmons <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat S Small T-Shirt size Feature HACKTOBERFEST Hacktoberfest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants