
item_ids in multivariate datasets #3217

Open
satyrmipt opened this issue Sep 11, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@satyrmipt

Description

Not sure if this is a bug or a feature. When using datasets with several item_ids (strings), one per time series, during training I can't find the item ids in the targets and forecasts returned by make_evaluation_predictions. Predictions for the different time series are marked only by integers in [0, number of series), but the order of this numbering is unclear, as is why we even introduce item_ids in the earlier stages of data processing if we lose them at inference.

# imports (gluonts import paths may vary slightly between versions):
import os
import time

import numpy as np
import pandas as pd

from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.multivariate_grouper import MultivariateGrouper
from gluonts.evaluation import MultivariateEvaluator, make_evaluation_predictions
from gluonts.torch.model.i_transformer import ITransformerEstimator

# original pandas dataset:
normal_train_df=pd.read_pickle(os.path.join(ETL_PATH, 'normal_train_df.pkl'))
normal_test_df=pd.read_pickle(os.path.join(ETL_PATH, 'normal_test_df.pkl'))
print(f"Train ds params:\n\t{normal_train_df.shape}\n\t{normal_train_df.columns}\n\tunique item_id: {normal_train_df['item_id'].nunique()}\n\tunique timestamps: {normal_train_df.index.nunique()}")
print(f"Test ds params:\n\t{normal_test_df.shape}\n\t{normal_test_df.columns}\n\tunique item_id: {normal_test_df['item_id'].nunique()}\n\tunique timestamps: {normal_test_df.index.nunique()}")
# Train ds params:
# 	(1213236, 2)
# 	Index(['target', 'item_id'], dtype='object')
# 	unique item_id: 1206
# 	unique timestamps: 1006
# Test ds params:
# 	(1587096, 2)
# 	Index(['target', 'item_id'], dtype='object')
# 	unique item_id: 1206
# 	unique timestamps: 1316

# convert to gluonts datasets:
train_data=PandasDataset.from_long_dataframe(
    dataframe=normal_train_df,
    target='target',
    item_id='item_id',
    freq=FREQ,
    unchecked=False,
    assume_sorted=False 
)

test_data=PandasDataset.from_long_dataframe(
    dataframe=normal_test_df,
    target='target',
    item_id='item_id',
    freq=FREQ,
    unchecked=False,
    assume_sorted=False 
)

# apply grouper and train the model according to itransformer.ipynb example:
train_grouper = MultivariateGrouper(
    max_target_dim=1206
)

test_grouper = MultivariateGrouper(
    num_test_dates=1,      # have no rolling forecasts
    max_target_dim=1206
)

train_gr_data = train_grouper(train_data)
test_gr_data = test_grouper(test_data)

start_time=time.time()
estimator = ITransformerEstimator(
    prediction_length=4,
    context_length=30,
    scaling="std",                  
    nonnegative_pred_samples=False, 
    trainer_kwargs=dict(max_epochs=3)
  )

predictor = estimator.train(
    train_gr_data, 
    cache_data=True, 
    shuffle_buffer_length=1024
  )

# inference  according to itransformer.ipynb example:
evaluator = MultivariateEvaluator(
    quantiles=(np.arange(5) / 5.0)[1:], target_agg_funcs={"sum": np.sum},
    )

forecast, ts = make_evaluation_predictions(
    dataset=test_gr_data, 
    predictor=predictor, 
    num_samples=100
)

forecasts = list(forecast)
targets = list(ts)

Now when I look at forecasts, the beginning of the data structure is

[gluonts.model.forecast.SampleForecast(info=None, item_id=None, samples=array([[[ ........

So item_id=None.
Comparing the data in the original dataset with targets and forecasts, my best guess is that the series are enumerated in the order of sorted(set(normal_test_df.item_id)).
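That guess can be checked without the model: PandasDataset.from_long_dataframe builds one entry per item_id via a pandas groupby, and groupby sorts group keys by default, so the entry order should match the sorted ids. A minimal sketch with a toy dataframe (standing in for normal_test_df):

```python
import pandas as pd

# toy long dataframe standing in for normal_test_df
df = pd.DataFrame({
    "item_id": ["b", "a", "c", "b", "a", "c"],
    "target":  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# pandas groupby sorts group keys by default (sort=True),
# so iteration order matches sorted(set(df["item_id"]))
group_order = list(df.groupby("item_id").groups)
print(group_order)  # ['a', 'b', 'c']
assert group_order == sorted(set(df["item_id"]))
```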

Is there a way to incorporate the item_ids into the trained model and use them at inference, in make_evaluation_predictions and in the evaluator results, instead of a serial number? Right now I have to save the model and sorted(set(normal_test_df.item_id)) as two separate objects in order to use them at inference.
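The workaround I use can be sketched like this. It assumes the sorted ordering above, and that the last axis of a multivariate SampleForecast's samples array (shape num_samples x prediction_length x num_series) indexes the series; the arrays and ids below are toy stand-ins:

```python
import numpy as np

# hypothetical stand-in for one multivariate forecast.samples array:
# shape (num_samples, prediction_length, num_series)
samples = np.zeros((100, 4, 3))
item_ids = sorted({"ts_b", "ts_a", "ts_c"})  # assumed dimension order

# map each dimension of the grouped target back to its item_id
per_series = {
    item_id: samples[..., dim]  # (num_samples, prediction_length)
    for dim, item_id in enumerate(item_ids)
}
print(list(per_series))          # ['ts_a', 'ts_b', 'ts_c']
print(per_series["ts_a"].shape)  # (100, 4)
```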

@satyrmipt satyrmipt added the bug Something isn't working label Sep 11, 2024