Add support for Neuralforecast #1115

Merged
merged 27 commits into from
Sep 30, 2023
8f49397
Set the write output column type for forecast functions
xzdandy Sep 13, 2023
043d671
Fix forecast integration test
xzdandy Sep 13, 2023
0977c1f
Move the generic utils test
xzdandy Sep 13, 2023
092c03f
Fix ludwig unittest cases and add unittestcase for normal forecasting
xzdandy Sep 13, 2023
96e40db
Add unitest cases for forecast with rename in binder.
xzdandy Sep 13, 2023
5648371
Add unittest when an expected column is passed to forecasting
xzdandy Sep 13, 2023
8692ff1
Add unittest when required columns are missing in binder
xzdandy Sep 13, 2023
0679200
Merge branch 'staging' into neuralforecast
americast Sep 13, 2023
1fd3c02
Add neuralforecast support
americast Sep 14, 2023
65ed6e1
less horizon no retrain
americast Sep 15, 2023
5fd8af7
Merge branch 'staging' into neuralforecast
americast Sep 24, 2023
be242ee
add support for exogenous variables
americast Sep 25, 2023
583e778
Fix exogenous support; add tests
americast Sep 25, 2023
52c563e
add tests
americast Sep 25, 2023
84a159e
wip: fix test
americast Sep 25, 2023
06a7db0
remove strict column check in test
americast Sep 25, 2023
32a204b
Fix GPU issue with neuralforecast; fixed auto exog veriables
americast Sep 28, 2023
fda2b40
Merge remote-tracking branch 'origin/staging' into neuralforecast
americast Sep 28, 2023
736d9e0
added auto support; updated docs
americast Sep 29, 2023
06fb001
Update forecasting notebook.
xzdandy Sep 29, 2023
a36a1f5
fixes
americast Sep 29, 2023
eee78c9
Merge branch 'neuralforecast' of github.com:georgia-tech-db/evadb int…
americast Sep 29, 2023
09bee12
Fix horizon issue for multi uniqueids
americast Sep 29, 2023
b422000
update docs
americast Sep 29, 2023
e176bd4
fix exogenous for auto; made default
americast Sep 30, 2023
68265d3
turn auto off for neuralforecast test to avoid TLE error
americast Sep 30, 2023
267443d
Update the Notebook
xzdandy Sep 30, 2023
289 changes: 289 additions & 0 deletions data/forecasting/AirPassengersPanel.csv
@@ -0,0 +1,289 @@
ds,unique_id,y,trend,ylagged
1949-01-31,Airline1,112.0,0,112.0
1949-02-28,Airline1,118.0,1,118.0
1949-03-31,Airline1,132.0,2,132.0
1949-04-30,Airline1,129.0,3,129.0
1949-05-31,Airline1,121.0,4,121.0
1949-06-30,Airline1,135.0,5,135.0
1949-07-31,Airline1,148.0,6,148.0
1949-08-31,Airline1,148.0,7,148.0
1949-09-30,Airline1,136.0,8,136.0
1949-10-31,Airline1,119.0,9,119.0
1949-11-30,Airline1,104.0,10,104.0
1949-12-31,Airline1,118.0,11,118.0
1950-01-31,Airline1,115.0,12,112.0
1950-02-28,Airline1,126.0,13,118.0
1950-03-31,Airline1,141.0,14,132.0
1950-04-30,Airline1,135.0,15,129.0
1950-05-31,Airline1,125.0,16,121.0
1950-06-30,Airline1,149.0,17,135.0
1950-07-31,Airline1,170.0,18,148.0
1950-08-31,Airline1,170.0,19,148.0
1950-09-30,Airline1,158.0,20,136.0
1950-10-31,Airline1,133.0,21,119.0
1950-11-30,Airline1,114.0,22,104.0
1950-12-31,Airline1,140.0,23,118.0
1951-01-31,Airline1,145.0,24,115.0
1951-02-28,Airline1,150.0,25,126.0
1951-03-31,Airline1,178.0,26,141.0
1951-04-30,Airline1,163.0,27,135.0
1951-05-31,Airline1,172.0,28,125.0
1951-06-30,Airline1,178.0,29,149.0
1951-07-31,Airline1,199.0,30,170.0
1951-08-31,Airline1,199.0,31,170.0
1951-09-30,Airline1,184.0,32,158.0
1951-10-31,Airline1,162.0,33,133.0
1951-11-30,Airline1,146.0,34,114.0
1951-12-31,Airline1,166.0,35,140.0
1952-01-31,Airline1,171.0,36,145.0
1952-02-29,Airline1,180.0,37,150.0
1952-03-31,Airline1,193.0,38,178.0
1952-04-30,Airline1,181.0,39,163.0
1952-05-31,Airline1,183.0,40,172.0
1952-06-30,Airline1,218.0,41,178.0
1952-07-31,Airline1,230.0,42,199.0
1952-08-31,Airline1,242.0,43,199.0
1952-09-30,Airline1,209.0,44,184.0
1952-10-31,Airline1,191.0,45,162.0
1952-11-30,Airline1,172.0,46,146.0
1952-12-31,Airline1,194.0,47,166.0
1953-01-31,Airline1,196.0,48,171.0
1953-02-28,Airline1,196.0,49,180.0
1953-03-31,Airline1,236.0,50,193.0
1953-04-30,Airline1,235.0,51,181.0
1953-05-31,Airline1,229.0,52,183.0
1953-06-30,Airline1,243.0,53,218.0
1953-07-31,Airline1,264.0,54,230.0
1953-08-31,Airline1,272.0,55,242.0
1953-09-30,Airline1,237.0,56,209.0
1953-10-31,Airline1,211.0,57,191.0
1953-11-30,Airline1,180.0,58,172.0
1953-12-31,Airline1,201.0,59,194.0
1954-01-31,Airline1,204.0,60,196.0
1954-02-28,Airline1,188.0,61,196.0
1954-03-31,Airline1,235.0,62,236.0
1954-04-30,Airline1,227.0,63,235.0
1954-05-31,Airline1,234.0,64,229.0
1954-06-30,Airline1,264.0,65,243.0
1954-07-31,Airline1,302.0,66,264.0
1954-08-31,Airline1,293.0,67,272.0
1954-09-30,Airline1,259.0,68,237.0
1954-10-31,Airline1,229.0,69,211.0
1954-11-30,Airline1,203.0,70,180.0
1954-12-31,Airline1,229.0,71,201.0
1955-01-31,Airline1,242.0,72,204.0
1955-02-28,Airline1,233.0,73,188.0
1955-03-31,Airline1,267.0,74,235.0
1955-04-30,Airline1,269.0,75,227.0
1955-05-31,Airline1,270.0,76,234.0
1955-06-30,Airline1,315.0,77,264.0
1955-07-31,Airline1,364.0,78,302.0
1955-08-31,Airline1,347.0,79,293.0
1955-09-30,Airline1,312.0,80,259.0
1955-10-31,Airline1,274.0,81,229.0
1955-11-30,Airline1,237.0,82,203.0
1955-12-31,Airline1,278.0,83,229.0
1956-01-31,Airline1,284.0,84,242.0
1956-02-29,Airline1,277.0,85,233.0
1956-03-31,Airline1,317.0,86,267.0
1956-04-30,Airline1,313.0,87,269.0
1956-05-31,Airline1,318.0,88,270.0
1956-06-30,Airline1,374.0,89,315.0
1956-07-31,Airline1,413.0,90,364.0
1956-08-31,Airline1,405.0,91,347.0
1956-09-30,Airline1,355.0,92,312.0
1956-10-31,Airline1,306.0,93,274.0
1956-11-30,Airline1,271.0,94,237.0
1956-12-31,Airline1,306.0,95,278.0
1957-01-31,Airline1,315.0,96,284.0
1957-02-28,Airline1,301.0,97,277.0
1957-03-31,Airline1,356.0,98,317.0
1957-04-30,Airline1,348.0,99,313.0
1957-05-31,Airline1,355.0,100,318.0
1957-06-30,Airline1,422.0,101,374.0
1957-07-31,Airline1,465.0,102,413.0
1957-08-31,Airline1,467.0,103,405.0
1957-09-30,Airline1,404.0,104,355.0
1957-10-31,Airline1,347.0,105,306.0
1957-11-30,Airline1,305.0,106,271.0
1957-12-31,Airline1,336.0,107,306.0
1958-01-31,Airline1,340.0,108,315.0
1958-02-28,Airline1,318.0,109,301.0
1958-03-31,Airline1,362.0,110,356.0
1958-04-30,Airline1,348.0,111,348.0
1958-05-31,Airline1,363.0,112,355.0
1958-06-30,Airline1,435.0,113,422.0
1958-07-31,Airline1,491.0,114,465.0
1958-08-31,Airline1,505.0,115,467.0
1958-09-30,Airline1,404.0,116,404.0
1958-10-31,Airline1,359.0,117,347.0
1958-11-30,Airline1,310.0,118,305.0
1958-12-31,Airline1,337.0,119,336.0
1959-01-31,Airline1,360.0,120,340.0
1959-02-28,Airline1,342.0,121,318.0
1959-03-31,Airline1,406.0,122,362.0
1959-04-30,Airline1,396.0,123,348.0
1959-05-31,Airline1,420.0,124,363.0
1959-06-30,Airline1,472.0,125,435.0
1959-07-31,Airline1,548.0,126,491.0
1959-08-31,Airline1,559.0,127,505.0
1959-09-30,Airline1,463.0,128,404.0
1959-10-31,Airline1,407.0,129,359.0
1959-11-30,Airline1,362.0,130,310.0
1959-12-31,Airline1,405.0,131,337.0
1960-01-31,Airline1,417.0,132,360.0
1960-02-29,Airline1,391.0,133,342.0
1960-03-31,Airline1,419.0,134,406.0
1960-04-30,Airline1,461.0,135,396.0
1960-05-31,Airline1,472.0,136,420.0
1960-06-30,Airline1,535.0,137,472.0
1960-07-31,Airline1,622.0,138,548.0
1960-08-31,Airline1,606.0,139,559.0
1960-09-30,Airline1,508.0,140,463.0
1960-10-31,Airline1,461.0,141,407.0
1960-11-30,Airline1,390.0,142,362.0
1960-12-31,Airline1,432.0,143,405.0
1949-01-31,Airline2,412.0,144,412.0
1949-02-28,Airline2,418.0,145,418.0
1949-03-31,Airline2,432.0,146,432.0
1949-04-30,Airline2,429.0,147,429.0
1949-05-31,Airline2,421.0,148,421.0
1949-06-30,Airline2,435.0,149,435.0
1949-07-31,Airline2,448.0,150,448.0
1949-08-31,Airline2,448.0,151,448.0
1949-09-30,Airline2,436.0,152,436.0
1949-10-31,Airline2,419.0,153,419.0
1949-11-30,Airline2,404.0,154,404.0
1949-12-31,Airline2,418.0,155,418.0
1950-01-31,Airline2,415.0,156,412.0
1950-02-28,Airline2,426.0,157,418.0
1950-03-31,Airline2,441.0,158,432.0
1950-04-30,Airline2,435.0,159,429.0
1950-05-31,Airline2,425.0,160,421.0
1950-06-30,Airline2,449.0,161,435.0
1950-07-31,Airline2,470.0,162,448.0
1950-08-31,Airline2,470.0,163,448.0
1950-09-30,Airline2,458.0,164,436.0
1950-10-31,Airline2,433.0,165,419.0
1950-11-30,Airline2,414.0,166,404.0
1950-12-31,Airline2,440.0,167,418.0
1951-01-31,Airline2,445.0,168,415.0
1951-02-28,Airline2,450.0,169,426.0
1951-03-31,Airline2,478.0,170,441.0
1951-04-30,Airline2,463.0,171,435.0
1951-05-31,Airline2,472.0,172,425.0
1951-06-30,Airline2,478.0,173,449.0
1951-07-31,Airline2,499.0,174,470.0
1951-08-31,Airline2,499.0,175,470.0
1951-09-30,Airline2,484.0,176,458.0
1951-10-31,Airline2,462.0,177,433.0
1951-11-30,Airline2,446.0,178,414.0
1951-12-31,Airline2,466.0,179,440.0
1952-01-31,Airline2,471.0,180,445.0
1952-02-29,Airline2,480.0,181,450.0
1952-03-31,Airline2,493.0,182,478.0
1952-04-30,Airline2,481.0,183,463.0
1952-05-31,Airline2,483.0,184,472.0
1952-06-30,Airline2,518.0,185,478.0
1952-07-31,Airline2,530.0,186,499.0
1952-08-31,Airline2,542.0,187,499.0
1952-09-30,Airline2,509.0,188,484.0
1952-10-31,Airline2,491.0,189,462.0
1952-11-30,Airline2,472.0,190,446.0
1952-12-31,Airline2,494.0,191,466.0
1953-01-31,Airline2,496.0,192,471.0
1953-02-28,Airline2,496.0,193,480.0
1953-03-31,Airline2,536.0,194,493.0
1953-04-30,Airline2,535.0,195,481.0
1953-05-31,Airline2,529.0,196,483.0
1953-06-30,Airline2,543.0,197,518.0
1953-07-31,Airline2,564.0,198,530.0
1953-08-31,Airline2,572.0,199,542.0
1953-09-30,Airline2,537.0,200,509.0
1953-10-31,Airline2,511.0,201,491.0
1953-11-30,Airline2,480.0,202,472.0
1953-12-31,Airline2,501.0,203,494.0
1954-01-31,Airline2,504.0,204,496.0
1954-02-28,Airline2,488.0,205,496.0
1954-03-31,Airline2,535.0,206,536.0
1954-04-30,Airline2,527.0,207,535.0
1954-05-31,Airline2,534.0,208,529.0
1954-06-30,Airline2,564.0,209,543.0
1954-07-31,Airline2,602.0,210,564.0
1954-08-31,Airline2,593.0,211,572.0
1954-09-30,Airline2,559.0,212,537.0
1954-10-31,Airline2,529.0,213,511.0
1954-11-30,Airline2,503.0,214,480.0
1954-12-31,Airline2,529.0,215,501.0
1955-01-31,Airline2,542.0,216,504.0
1955-02-28,Airline2,533.0,217,488.0
1955-03-31,Airline2,567.0,218,535.0
1955-04-30,Airline2,569.0,219,527.0
1955-05-31,Airline2,570.0,220,534.0
1955-06-30,Airline2,615.0,221,564.0
1955-07-31,Airline2,664.0,222,602.0
1955-08-31,Airline2,647.0,223,593.0
1955-09-30,Airline2,612.0,224,559.0
1955-10-31,Airline2,574.0,225,529.0
1955-11-30,Airline2,537.0,226,503.0
1955-12-31,Airline2,578.0,227,529.0
1956-01-31,Airline2,584.0,228,542.0
1956-02-29,Airline2,577.0,229,533.0
1956-03-31,Airline2,617.0,230,567.0
1956-04-30,Airline2,613.0,231,569.0
1956-05-31,Airline2,618.0,232,570.0
1956-06-30,Airline2,674.0,233,615.0
1956-07-31,Airline2,713.0,234,664.0
1956-08-31,Airline2,705.0,235,647.0
1956-09-30,Airline2,655.0,236,612.0
1956-10-31,Airline2,606.0,237,574.0
1956-11-30,Airline2,571.0,238,537.0
1956-12-31,Airline2,606.0,239,578.0
1957-01-31,Airline2,615.0,240,584.0
1957-02-28,Airline2,601.0,241,577.0
1957-03-31,Airline2,656.0,242,617.0
1957-04-30,Airline2,648.0,243,613.0
1957-05-31,Airline2,655.0,244,618.0
1957-06-30,Airline2,722.0,245,674.0
1957-07-31,Airline2,765.0,246,713.0
1957-08-31,Airline2,767.0,247,705.0
1957-09-30,Airline2,704.0,248,655.0
1957-10-31,Airline2,647.0,249,606.0
1957-11-30,Airline2,605.0,250,571.0
1957-12-31,Airline2,636.0,251,606.0
1958-01-31,Airline2,640.0,252,615.0
1958-02-28,Airline2,618.0,253,601.0
1958-03-31,Airline2,662.0,254,656.0
1958-04-30,Airline2,648.0,255,648.0
1958-05-31,Airline2,663.0,256,655.0
1958-06-30,Airline2,735.0,257,722.0
1958-07-31,Airline2,791.0,258,765.0
1958-08-31,Airline2,805.0,259,767.0
1958-09-30,Airline2,704.0,260,704.0
1958-10-31,Airline2,659.0,261,647.0
1958-11-30,Airline2,610.0,262,605.0
1958-12-31,Airline2,637.0,263,636.0
1959-01-31,Airline2,660.0,264,640.0
1959-02-28,Airline2,642.0,265,618.0
1959-03-31,Airline2,706.0,266,662.0
1959-04-30,Airline2,696.0,267,648.0
1959-05-31,Airline2,720.0,268,663.0
1959-06-30,Airline2,772.0,269,735.0
1959-07-31,Airline2,848.0,270,791.0
1959-08-31,Airline2,859.0,271,805.0
1959-09-30,Airline2,763.0,272,704.0
1959-10-31,Airline2,707.0,273,659.0
1959-11-30,Airline2,662.0,274,610.0
1959-12-31,Airline2,705.0,275,637.0
1960-01-31,Airline2,717.0,276,660.0
1960-02-29,Airline2,691.0,277,642.0
1960-03-31,Airline2,719.0,278,706.0
1960-04-30,Airline2,761.0,279,696.0
1960-05-31,Airline2,772.0,280,720.0
1960-06-30,Airline2,835.0,281,772.0
1960-07-31,Airline2,922.0,282,848.0
1960-08-31,Airline2,906.0,283,859.0
1960-09-30,Airline2,808.0,284,763.0
1960-10-31,Airline2,761.0,285,707.0
1960-11-30,Airline2,690.0,286,662.0
1960-12-31,Airline2,732.0,287,705.0
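In the rows above, `ylagged` appears to be `y` shifted back by 12 months within a series, with the first 12 rows simply copying `y`. The sketch below checks that reading on the first 14 `Airline1` rows using only the standard library. The helper name `check_lag` is hypothetical, and the 12-step lag is an observation from the data, not something the PR states.

```python
import csv
import io

# First 14 Airline1 rows copied from AirPassengersPanel.csv above.
SAMPLE = """ds,unique_id,y,trend,ylagged
1949-01-31,Airline1,112.0,0,112.0
1949-02-28,Airline1,118.0,1,118.0
1949-03-31,Airline1,132.0,2,132.0
1949-04-30,Airline1,129.0,3,129.0
1949-05-31,Airline1,121.0,4,121.0
1949-06-30,Airline1,135.0,5,135.0
1949-07-31,Airline1,148.0,6,148.0
1949-08-31,Airline1,148.0,7,148.0
1949-09-30,Airline1,136.0,8,136.0
1949-10-31,Airline1,119.0,9,119.0
1949-11-30,Airline1,104.0,10,104.0
1949-12-31,Airline1,118.0,11,118.0
1950-01-31,Airline1,115.0,12,112.0
1950-02-28,Airline1,126.0,13,118.0
"""


def check_lag(rows, lag=12):
    """Return True if ylagged equals y shifted by `lag` rows.

    Assumption: the first `lag` rows have no history, so ylagged copies y.
    """
    ys = [float(row["y"]) for row in rows]
    for i, row in enumerate(rows):
        expected = ys[i - lag] if i >= lag else ys[i]
        if float(row["ylagged"]) != expected:
            return False
    return True


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(check_lag(rows))
```

The same check could be run per `unique_id` on the full file, since each airline is a separate series.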
41 changes: 31 additions & 10 deletions docs/source/reference/ai/model-forecasting.rst
@@ -47,16 +47,24 @@ EvaDB's default forecast framework is `statsforecast <https://nixtla.github.io/s
.. list-table:: Available Parameters
:widths: 25 75

* - PREDICT (**required**)
* - PREDICT (str, required)
- The name of the column we wish to forecast.
* - TIME
- The name of the column that contains the datestamp, wihch should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. Please visit the `pandas documentation <https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html>`_ for details. If not provided, an auto increasing ID column will be used.
* - ID
- The name of column that represents an identifier for the series. If not provided, the whole table is considered as one series of data.
* - MODEL
- We can select one of AutoARIMA, AutoCES, AutoETS, AutoTheta. The default is AutoARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models.
* - Frequency
- A string indicating the frequency of the data. The common used ones are D, W, M, Y, which repestively represents day-, week-, month- and year- end frequency. The default value is M. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies.
* - HORIZON (int, required)
- The number of steps into the future we wish to forecast.
* - TIME (str, default: 'ds')
- The name of the column that contains the datestamp, which should be in a format that pandas can parse, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. Please visit the `pandas documentation <https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html>`_ for details. If the relevant column is not found, an auto-incrementing ID column is used.
* - ID (str, default: 'unique_id')
- The name of the column that represents an identifier for the series. If the relevant column is not found, the whole table is treated as one series of data.
* - LIBRARY (str, default: 'statsforecast')
- We can select one of `statsforecast` (default) or `neuralforecast`. `statsforecast` provides access to statistical forecasting methods, while `neuralforecast` gives access to deep-learning based forecasting methods.
* - MODEL (str, default: 'ARIMA')
- If LIBRARY is `statsforecast`, we can select one of ARIMA, CES, ETS, Theta. The default is ARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models. If LIBRARY is `neuralforecast`, we can select one of NHITS or NBEATS. The default is NBEATS. Check `NBEATS docs <https://nixtla.github.io/neuralforecast/models.nbeats.html>`_ for details.
* - AUTO (str, default: 'T')
- If set to 'T', it enables automatic hyperparameter optimization. Must be set to 'T' for the `statsforecast` library. One may set this parameter to `false` if LIBRARY is `neuralforecast` for faster (but less reliable) results.
* - Frequency (str, default: 'auto')
- A string indicating the frequency of the data. The commonly used ones are D, W, M, Y, which respectively represent day-, week-, month- and year-end frequency. The default value is M. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies. If it is not provided, EvaDB attempts to determine the frequency automatically.

Note: If columns other than the ones listed above are passed while creating the function, they are treated as exogenous variables if LIBRARY is `neuralforecast`. Otherwise, they are ignored.
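The note above amounts to a column-partitioning rule. A minimal sketch of that rule follows; `split_forecast_columns` and its argument names are hypothetical and not part of EvaDB, and only the defaults `ds` and `unique_id` come from the parameter table.

```python
def split_forecast_columns(columns, predict, library="statsforecast",
                           time="ds", series_id="unique_id"):
    """Partition selected column names into core and exogenous columns.

    Hypothetical helper for illustration; not EvaDB's implementation.
    """
    core_names = {predict, time, series_id}
    core = [c for c in columns if c in core_names]
    extra = [c for c in columns if c not in core_names]
    # Per the note: extra columns are exogenous only for neuralforecast
    # and are ignored for statsforecast.
    exogenous = extra if library == "neuralforecast" else []
    return core, exogenous


core, exog = split_forecast_columns(
    ["unique_id", "ds", "y", "trend"], predict="y", library="neuralforecast"
)
print(core)  # ['unique_id', 'ds', 'y']
print(exog)  # ['trend']
```

With `library="statsforecast"` the same call returns an empty exogenous list, which mirrors the "ignored" case in the note.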

Below is an example query specifying the above parameters:

@@ -65,8 +73,21 @@ Below is an example query specifying the above parameters:
CREATE FUNCTION IF NOT EXISTS HomeRentalForecast FROM
(SELECT saledate, ma, type FROM HomeData)
TYPE Forecasting
HORIZON 12
PREDICT 'ma'
TIME 'saledate'
ID 'type'
MODEL 'AutoCES'
Frequency 'W';

Below is an example query that uses `neuralforecast`, treats the `trend` column as an exogenous variable, and disables automatic hyperparameter optimization:

.. code-block:: sql

CREATE FUNCTION AirPanelForecast FROM
(SELECT unique_id, ds, y, trend FROM AirDataPanel)
TYPE Forecasting
HORIZON 12
PREDICT 'y'
LIBRARY 'neuralforecast'
AUTO 'f'
FREQUENCY 'M';
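When such statements are generated from Python, the clauses can be assembled from keyword arguments. The sketch below is a hypothetical helper, not an EvaDB API: `build_forecast_function_sql` is an invented name, and it performs no quoting or escaping beyond wrapping option values in single quotes.

```python
def build_forecast_function_sql(name, select_sql, horizon, predict, **options):
    """Assemble a CREATE FUNCTION ... TYPE Forecasting statement.

    Hypothetical helper for illustration. Option names are upper-cased and
    their values are emitted as single-quoted strings, without escaping.
    """
    lines = [
        f"CREATE FUNCTION IF NOT EXISTS {name} FROM",
        f"  ({select_sql})",
        "TYPE Forecasting",
        f"HORIZON {int(horizon)}",
        f"PREDICT '{predict}'",
    ]
    for key, value in options.items():
        lines.append(f"{key.upper()} '{value}'")
    return "\n".join(lines) + ";"


sql = build_forecast_function_sql(
    "AirPanelForecast",
    "SELECT unique_id, ds, y, trend FROM AirDataPanel",
    horizon=12,
    predict="y",
    library="neuralforecast",
    auto="f",
    frequency="M",
)
print(sql)
```

The resulting string could then be sent through whatever EvaDB client is in use; that step is left out here because it needs a running EvaDB instance.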
4 changes: 0 additions & 4 deletions evadb/binder/statement_binder.py
@@ -126,10 +126,6 @@ def _bind_create_function_statement(self, node: CreateFunctionStatement):
elif column.name == arg_map.get("predict", "y"):
outputs.append(column)
required_columns.remove(column.name)
else:
raise BinderError(
f"Unexpected column {column.name} found for forecasting function."
)
assert (
len(required_columns) == 0
), f"Missing required {required_columns} columns for forecasting function."
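The effect of this hunk is that an unrecognized column no longer raises `BinderError`, while a missing `PREDICT` column still fails the assertion. A simplified stand-in using plain strings is sketched below. The function name is hypothetical, and the `id` and `time` lookups are assumed from the documented defaults since only the `predict` branch is visible in the hunk.

```python
def bind_forecast_columns(column_names, arg_map):
    """Simplified stand-in for the binder check after this change.

    Returns (recognized, extras). Extra columns are collected, not rejected.
    Raises AssertionError if the column to predict is missing.
    """
    recognized_names = (
        arg_map.get("id", "unique_id"),   # assumed default, per the docs
        arg_map.get("time", "ds"),        # assumed default, per the docs
        arg_map.get("predict", "y"),      # shown in the hunk
    )
    required_columns = {arg_map.get("predict", "y")}
    recognized, extras = [], []
    for name in column_names:
        if name in recognized_names:
            recognized.append(name)
            required_columns.discard(name)
        else:
            # Before this PR, this branch raised BinderError.
            extras.append(name)
    assert (
        len(required_columns) == 0
    ), f"Missing required {required_columns} columns for forecasting function."
    return recognized, extras


print(bind_forecast_columns(["unique_id", "ds", "y", "trend"], {}))
```

Passing a column list without the `predict` column, for example `["ds", "trend"]`, triggers the assertion with the same message as in the diff.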