PandasDataset.from_long_dataframe does not work with MultivariateGrouper #3220

Open
satyrmipt opened this issue Sep 17, 2024 · 0 comments
Labels
bug Something isn't working

Description

I am looking for the correct way to apply MultivariateGrouper to data created with PandasDataset.from_long_dataframe,
or for a way to convert my custom dataset into an object like the one returned by get_dataset("electricity_nips", regenerate=False).
Note that train_grouper(train_ds) works, but test_grouper(test_ds) raises an error.

I have studied this example, but it works with a standard dataset, and I see no examples of how to build such a dataset from a long (or any other) dataframe or how to convert one.
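
For context, the goal is a single entry whose target stacks the individual series row by row. A rough, stdlib-only sketch of that layout, assuming all series share the same start and length; `stack_series` is a hypothetical helper and not part of GluonTS, and the real grouper also deals with alignment and padding:

```python
def stack_series(entries):
    """Combine univariate dict entries into one entry with a 2-D target."""
    starts = {entry["start"] for entry in entries}
    lengths = {len(entry["target"]) for entry in entries}
    # Assumption of this sketch: no alignment or padding is attempted.
    if len(starts) != 1 or len(lengths) != 1:
        raise ValueError("this sketch only handles series with equal start and length")
    return {
        "start": entries[0]["start"],
        # One row per original series, in input order.
        "target": [list(entry["target"]) for entry in entries],
    }


# Made-up miniature version of the two series from the issue.
univariate = [
    {"item_id": "var_1", "start": "1970-01-01", "target": [0, 1, 2]},
    {"item_id": "var_2", "start": "1970-01-01", "target": [0, 1, 2]},
]
grouped = stack_series(univariate)
```

Here `grouped["target"]` has one row per series, which is the shape a multivariate model would consume.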

To Reproduce


import pandas as pd

import gluonts
from gluonts.dataset.multivariate_grouper import MultivariateGrouper
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import OffsetSplitter

print(gluonts.__version__)
# Create a long dataframe with two time series of 100 values each.
dates = list(pd.date_range(start='1970-01-01', periods=100, freq='1D'))
df = pd.DataFrame(
    data={
        'target': list(range(100)) * 2,
        'item_id': ['var_1'] * 100 + ['var_2'] * 100,
    },
    index=dates + dates,
)
print(df.info())
gluon_ds=PandasDataset.from_long_dataframe(
    dataframe=df,
    target='target',
    item_id='item_id',
    freq='1D'
)

splitter = OffsetSplitter(offset=70)
train_ds, test_template = splitter.split(gluon_ds)

test_ds = test_template.generate_instances(
    prediction_length=1,
    windows=29,
    distance=1
)
print(f"{train_ds=}")
print(f"{test_ds=}")

train_grouper = MultivariateGrouper(
    max_target_dim=2     
)

test_grouper = MultivariateGrouper(
    num_test_dates=29,    
    max_target_dim=2,       
)

train_gr_data = train_grouper(train_ds)
test_gr_data = test_grouper(test_ds)

Error message or code output


0.15.1
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 200 entries, 1970-01-01 to 1970-04-10
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   target   200 non-null    int64 
 1   item_id  200 non-null    object
dtypes: int64(1), object(1)
memory usage: 4.7+ KB
None
train_ds=TrainingDataset(dataset=PandasDataset<size=2, freq=1D, num_feat_dynamic_real=0, num_past_feat_dynamic_real=0, num_feat_static_real=0, num_feat_static_cat=0, static_cardinalities=[]>, splitter=OffsetSplitter(offset=70))
test_ds=TestData(dataset=PandasDataset<size=2, freq=1D, num_feat_dynamic_real=0, num_past_feat_dynamic_real=0, num_feat_static_real=0, num_feat_static_cat=0, static_cardinalities=[]>, splitter=OffsetSplitter(offset=70), prediction_length=1, windows=29, distance=1, max_history=None)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-7778d6fa7e79> in <cell line: 42>()
     40 
     41 train_gr_data = train_grouper(train_ds)
---> 42 test_gr_data = test_grouper(test_ds)

1 frames
/usr/local/lib/python3.10/dist-packages/gluonts/dataset/multivariate_grouper.py in __call__(self, dataset)
     85 
     86     def __call__(self, dataset: Dataset) -> Dataset:
---> 87         self._preprocess(dataset)
     88         return self._group_all(dataset)
     89 

/usr/local/lib/python3.10/dist-packages/gluonts/dataset/multivariate_grouper.py in _preprocess(self, dataset)
     98         """
     99         for data in dataset:
--> 100             timestamp = data[FieldName.START]
    101 
    102             if self.first_timestamp is None:

TypeError: tuple indices must be integers or slices, not str
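
The traceback suggests that iterating over `test_ds` yields tuples rather than dict entries. A minimal, stdlib-only sketch of that mechanism, assuming each test instance is an `(input, label)` pair of dicts; `fake_test_ds` and `first_start` are hypothetical stand-ins, not GluonTS objects:

```python
# Hypothetical stand-in for a test dataset whose iteration yields
# (input, label) pairs instead of plain dict entries.
fake_test_ds = [
    (
        {"start": "1970-01-01", "target": [0, 1, 2]},
        {"start": "1970-01-04", "target": [3]},
    ),
    (
        {"start": "1970-01-01", "target": [0, 1, 2, 3]},
        {"start": "1970-01-05", "target": [4]},
    ),
]


def first_start(dataset):
    """Mimic a loop that expects dict entries and reads entry["start"]."""
    for data in dataset:
        return data["start"]


try:
    first_start(fake_test_ds)
    error_message = None
except TypeError as exc:
    # Indexing a tuple with a string key fails like the traceback above.
    error_message = str(exc)

# Unpacking each pair first gives the loop the dict it expects.
input_entries = [input_entry for input_entry, _label in fake_test_ds]
start = first_start(input_entries)
```

If that assumption holds, MultivariateGrouper would need dict entries (for example, the input part of each pair) rather than the TestData object itself. Whether the resulting order and lengths match what the grouper expects for num_test_dates is not verified here.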

Environment

Operating system: google colab

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Python version: Python 3.10.12

GluonTS version: 0.15.1

MXNet version: no MXNet


@satyrmipt satyrmipt added the bug Something isn't working label Sep 17, 2024