run_test_nbeatsx confusion #12

Open
MC-Dave opened this issue Mar 27, 2023 · 2 comments

@MC-Dave

MC-Dave commented Mar 27, 2023

I am trying to apply this project to my own task. I have gone through the process of tuning a model and finding an optimal configuration.

However, when I forecast on future data, I get a significantly worse MAE than during training/testing.

I am confused about run_test_nbeatsx and its behavior. My understanding is that the exogenous variables for the forecast horizon are not available at prediction time. However, when I change the values in X_df for the forecast window, the forecast values of y change.

In run_test_nbeatsx, the logic seems to imply that all exogenous values must be present and filled in for the forecast window:

# Test dataset and loader, to sample window with the day currently being predicted
# Test mask: 1s for the 24 lead times (hours) being predicted
test_mask = np.zeros(len(Y_df_scaled))
test_mask[-offset:] = 1
test_mask[(len(Y_df_scaled) - offset + mc['output_size']):] = 0

assert test_mask.sum() == mc['output_size'], f'Sum of Test mask must be {mc["output_size"]} not {test_mask.sum()}'

ts_dataset_test = TimeSeriesDataset(Y_df=Y_df_scaled, X_df=X_df_scaled, ts_train_mask=test_mask)
test_ts_loader = TimeSeriesLoader(model='nbeats',
                                  ts_dataset=ts_dataset_test,
                                  window_sampling_limit=mc['window_sampling_limit_multiplier'] * mc['output_size'],
                                  offset=offset - mc['output_size'],  # To bypass leakage protection
                                  input_size=int(mc['input_size_multiplier'] * mc['output_size']),
                                  output_size=int(mc['output_size']),
                                  idx_to_sample_freq=24,
                                  batch_size=int(mc['batch_size']),
                                  is_train_loader=True,
                                  shuffle=False)

...

_, y_hat_split, y_hat_decomposed_split, _ = model.predict(ts_loader=test_ts_loader, return_decomposition=True)

if mc['normalizer_y'] is not None:
    y_hat_split = scaler_y.inv_scale(x=y_hat_split)

print('Prediction: ', y_hat_split)
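To make the test mask concrete, the sketch below re-runs the same three mask lines on made-up sizes. The helper name `build_test_mask` and the sizes (240 hourly points, offset=48, output_size=24) are assumptions for illustration, not values from the repo. The 1s land on the 24 hours that start `offset` steps before the end of the series, so the loader targets that day, and the X_df rows for those hours fall inside the forecast window.

```python
import numpy as np


def build_test_mask(n_timestamps: int, offset: int, output_size: int) -> np.ndarray:
    """Same logic as the quoted snippet: 1s only for the output_size
    timestamps that begin `offset` steps before the end of the series."""
    test_mask = np.zeros(n_timestamps)
    test_mask[-offset:] = 1                                  # everything from the offset onward
    test_mask[(n_timestamps - offset + output_size):] = 0    # keep only the first output_size of those
    return test_mask


# Hypothetical sizes: 10 days of hourly data, predicting the 9th day (indices 192..215).
mask = build_test_mask(n_timestamps=240, offset=48, output_size=24)
selected = np.flatnonzero(mask)
print(selected.min(), selected.max(), int(mask.sum()))  # 192 215 24
```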

Am I misunderstanding something? Why does the test logic forecast the last 24 periods using exogenous data that would otherwise not be available at forecast time?

Thanks in advance for your assistance. This is a great project!

@cchallu
Owner

cchallu commented Mar 27, 2023

Hi @MC-Dave. Yes, we assume the exogenous variables are known for the forecasting window. In EPF (electricity price forecasting), the exogenous variables correspond to forecasts of demand and supply for the forecasting window.
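A toy sketch of why editing X_df for the forecast window changes the output. This is not NBEATSx: the linear model, the shapes, and the random weights are assumptions chosen only to show the dependency. When forecast-window exogenous values are part of the input, the forecast is a function of them, so whatever fills those rows at prediction time should be information genuinely available then, such as the demand forecasts described in the reply.

```python
import numpy as np

# Toy stand-in for a model whose input includes exogenous values for the
# forecast window (lag -1). Weights are random and purely illustrative.
rng = np.random.default_rng(0)
input_size, horizon = 48, 24
W_y = rng.normal(size=(horizon, input_size))  # weights on past y
W_x = rng.normal(size=(horizon, horizon))     # weights on forecast-window exogenous values


def toy_forecast(y_past: np.ndarray, x_future: np.ndarray) -> np.ndarray:
    """Forecast the horizon from past targets plus exogenous values for the horizon."""
    return W_y @ y_past + W_x @ x_future


y_past = rng.normal(size=input_size)
x_future = rng.normal(size=horizon)           # e.g. a demand forecast for the next day
base = toy_forecast(y_past, x_future)
perturbed = toy_forecast(y_past, x_future + 1.0)
print(np.abs(perturbed - base).max() > 0)     # True: shifting X in the horizon moves y_hat
```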

We have a general implementation of the model in our NeuralForecast library (https://github.com/Nixtla/neuralforecast). This implementation supports three types of exogenous variables: static, future temporal (available in the forecasting window), and historic temporal (unavailable for future values). This tutorial shows how to use a model with different types of variables: https://nixtla.github.io/neuralforecast/examples/exogenous_variables.html
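The three categories differ only in how far along the time axis a model may read at forecast origin `t`. A minimal sketch under assumed names (`ModelInputs` and `slice_inputs` are hypothetical, not NeuralForecast's API): historic variables stop at `t`, future variables extend through the horizon, and static variables have no time axis at all.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ModelInputs:
    y_hist: np.ndarray     # target values before the forecast origin
    hist_exog: np.ndarray  # historic temporal: observed only up to the origin
    futr_exog: np.ndarray  # future temporal: known for the input window and the horizon
    stat_exog: np.ndarray  # static: one value per series, no time axis


def slice_inputs(y, hist_x, futr_x, stat_x, t, input_size, horizon):
    """Keep only what is available when forecasting y[t:t + horizon]."""
    start = t - input_size
    if start < 0 or t + horizon > len(futr_x):
        raise ValueError("window does not fit inside the series")
    return ModelInputs(
        y_hist=y[start:t],
        hist_exog=hist_x[start:t],
        futr_exog=futr_x[start:t + horizon],
        stat_exog=stat_x,
    )


# Illustrative data: 96 hourly steps of a single series.
T = 96
y = np.arange(T, dtype=float)
hist_x = np.arange(T, dtype=float) * 10   # e.g. a quantity measured only after the fact
futr_x = np.arange(T, dtype=float) * 100  # e.g. a published demand forecast
stat_x = np.array([1.0])                  # e.g. a market identifier

inputs = slice_inputs(y, hist_x, futr_x, stat_x, t=72, input_size=48, horizon=24)
print(inputs.y_hist.shape, inputs.hist_exog.shape, inputs.futr_exog.shape)  # (48,) (48,) (72,)
```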

@MC-Dave
Author

MC-Dave commented Mar 27, 2023

@cchallu Thank you very much for the quick reply.

I assume there is no support for historic temporal variables in this repo?

I suppose I misunderstood include_var_dict and the meaning of the lag offsets.

The code implies that variables like week_day are known ahead of time, which is why they can be set to -1 (the future window). The other variables, including y, must use -2 or lower, where -2 means the most recent past day.

This comment appears under def run_val_nbeatsx:

# This dictionary will be used to select particular lags as inputs for each y and exogenous variables.
# E.g., -1 will include the future (corresponding to the forecast variables), -2 will add the last
# available day (1 day lag), etc.
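The lag convention in that comment can be sketched as follows, assuming hourly data that starts at midnight with 24 steps per day. `select_day_lags`, `check_no_target_leak`, and the example dictionary are hypothetical illustrations of the convention as read in this thread, not code from the repo. Under this reading, an exogenous variable restricted to lags of -2 or lower would only ever expose past days, much like a historic temporal variable.

```python
import numpy as np

HOURS_PER_DAY = 24  # assumption: hourly data, 24 steps per day


def select_day_lags(series: np.ndarray, target_day: int, lags: list) -> np.ndarray:
    """Concatenate the hourly blocks picked by `lags` for the day being forecast.

    Assumed convention: lag -1 is the target day itself, -2 the previous day,
    and -k the day k - 1 days before the target. `series` starts at hour 0 of day 0.
    """
    blocks = []
    for lag in lags:
        if lag > -1:
            raise ValueError(f"lags must be -1 or lower, got {lag}")
        day = target_day + lag + 1  # -1 -> target_day, -2 -> target_day - 1, ...
        if day < 0 or (day + 1) * HOURS_PER_DAY > len(series):
            raise ValueError(f"lag {lag} falls outside the series")
        blocks.append(series[day * HOURS_PER_DAY:(day + 1) * HOURS_PER_DAY])
    return np.concatenate(blocks)


def check_no_target_leak(include_var_dict: dict) -> None:
    """The target y may only look at past days; lag -1 would expose the answer."""
    if -1 in include_var_dict.get("y", []):
        raise ValueError("'y' cannot use lag -1: that is the day being forecast")


# Hypothetical configuration in the spirit of the comment above.
include_var_dict = {"y": [-2, -3, -8], "week_day": [-1]}
check_no_target_leak(include_var_dict)

hourly_y = np.arange(10 * HOURS_PER_DAY, dtype=float)  # 10 days of toy values
y_inputs = select_day_lags(hourly_y, target_day=9, lags=include_var_dict["y"])
print(y_inputs.shape)                           # (72,)
print(y_inputs[0], y_inputs[24], y_inputs[48])  # 192.0 168.0 48.0
```

Here -8 picks the same weekday one week earlier, which is a common choice for hourly electricity data.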

I have used the NeuralForecast library before working with this project. I tried this project because it gives much greater control over the available parameters and implements a very helpful hyperparameter optimization loop.

Thank you again for your assistance.
