PandasDataset.from_long_dataframe does not work with MultivariateGrouper #3220

satyrmipt · 2024-09-17T18:18:56Z

Description

I am looking for the correct way to apply MultivariateGrouper to data built with PandasDataset.from_long_dataframe,
or for a way to convert my custom dataset into an object of the same type as the one returned by get_dataset("electricity_nips", regenerate=False).
Note that train_grouper(train_ds) works, but test_grouper(test_ds) raises an error.

I have studied this example, but it works with a standard dataset, and I see no examples of how to build such a dataset from a long (or any other) dataframe, or how to convert one.

To Reproduce

import pandas as pd
import gluonts
from gluonts.dataset.multivariate_grouper import MultivariateGrouper
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import OffsetSplitter

print(gluonts.__version__)
# Create a long dataframe with two time series of 100 values each
df=pd.DataFrame(
  data={
    'target': [i for i in range(100)]+[i for i in range(100)],
    'item_id': ['var_1' for i in range(100)]+['var_2' for i in range(100)]
    },
    index=[i for i in pd.date_range(start='1970-01-01', periods=100, freq='1D')]+
    [i for i in pd.date_range(start='1970-01-01', periods=100, freq='1D')]
    )
print(df.info())
gluon_ds=PandasDataset.from_long_dataframe(
    dataframe=df,
    target='target',
    item_id='item_id',
    freq='1D'
)

splitter = OffsetSplitter(offset=70)
train_ds, test_template = splitter.split(gluon_ds)

test_ds = test_template.generate_instances(
    prediction_length=1,
    windows=29,
    distance=1
)
print(f"{train_ds=}")
print(f"{test_ds=}")

train_grouper = MultivariateGrouper(
    max_target_dim=2     
)

test_grouper = MultivariateGrouper(
    num_test_dates=29,    
    max_target_dim=2,       
)

train_gr_data = train_grouper(train_ds)
test_gr_data = test_grouper(test_ds)

Error message or code output

0.15.1
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 200 entries, 1970-01-01 to 1970-04-10
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   target   200 non-null    int64 
 1   item_id  200 non-null    object
dtypes: int64(1), object(1)
memory usage: 4.7+ KB
None
train_ds=TrainingDataset(dataset=PandasDataset<size=2, freq=1D, num_feat_dynamic_real=0, num_past_feat_dynamic_real=0, num_feat_static_real=0, num_feat_static_cat=0, static_cardinalities=[]>, splitter=OffsetSplitter(offset=70))
test_ds=TestData(dataset=PandasDataset<size=2, freq=1D, num_feat_dynamic_real=0, num_past_feat_dynamic_real=0, num_feat_static_real=0, num_feat_static_cat=0, static_cardinalities=[]>, splitter=OffsetSplitter(offset=70), prediction_length=1, windows=29, distance=1, max_history=None)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-7778d6fa7e79> in <cell line: 42>()
     40 
     41 train_gr_data = train_grouper(train_ds)
---> 42 test_gr_data = test_grouper(test_ds)

1 frames
/usr/local/lib/python3.10/dist-packages/gluonts/dataset/multivariate_grouper.py in __call__(self, dataset)
     85 
     86     def __call__(self, dataset: Dataset) -> Dataset:
---> 87         self._preprocess(dataset)
     88         return self._group_all(dataset)
     89 

/usr/local/lib/python3.10/dist-packages/gluonts/dataset/multivariate_grouper.py in _preprocess(self, dataset)
     98         """
     99         for data in dataset:
--> 100             timestamp = data[FieldName.START]
    101 
    102             if self.first_timestamp is None:

TypeError: tuple indices must be integers or slices, not str
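
A possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: iterating over the TestData object returned by generate_instances appears to yield (input, label) tuples, while _preprocess indexes each element with a string key (FieldName.START, i.e. "start"). The sketch below reproduces that mismatch with plain dicts and tuples, without importing GluonTS. The names test_pairs, first_start and merge_pair are hypothetical and not part of the library, and the merged entries are only an illustration of one way to get dict-shaped items, not a verified fix for MultivariateGrouper.

```python
# Stand-in for what iterating a GluonTS TestData object appears to yield:
# (input, label) pairs rather than single dict entries. The dict keys mirror
# FieldName.START ("start") and FieldName.TARGET ("target").
test_pairs = [
    (
        {"start": "1970-01-01", "target": [0, 1, 2], "item_id": "var_1"},
        {"start": "1970-01-04", "target": [3], "item_id": "var_1"},
    ),
    (
        {"start": "1970-01-01", "target": [0, 1, 2], "item_id": "var_2"},
        {"start": "1970-01-04", "target": [3], "item_id": "var_2"},
    ),
]


def first_start(dataset):
    """Mimic the failing line: index each element by a string key."""
    for data in dataset:
        return data["start"]


def merge_pair(pair):
    """Hypothetical helper: join an (input, label) pair into one dict entry."""
    entry_input, entry_label = pair
    merged = dict(entry_input)
    # Append the label window to the input history to get the full series.
    merged["target"] = list(entry_input["target"]) + list(entry_label["target"])
    return merged


# Indexing a tuple with a string raises the same kind of TypeError as above.
try:
    first_start(test_pairs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Dict-shaped entries can be indexed by "start" without error.
merged_entries = [merge_pair(pair) for pair in test_pairs]
print(first_start(merged_entries))
```

If this reading is right, test_grouper would need dict entries (for example, ones built from each pair) rather than the TestData object itself. Whether such merged entries match what MultivariateGrouper expects for num_test_dates has not been checked here.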

Environment

Operating system: Google Colab

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Python version: Python 3.10.12

GluonTS version: 0.15.1

MXNet version: no MXNet

satyrmipt added the bug Something isn't working label Sep 17, 2024
