Skip to content

[BUG] Different handling of nans at the beginning in TimeSeriesImputerTransform #601

Closed
1 task done
Mr-Geekman opened this issue Mar 14, 2022 · 1 comment · Fixed by #634
Closed
1 task done
Assignees
Labels
bug Something isn't working

Comments

@Mr-Geekman
Copy link
Contributor

🐛 Bug Report

Currently different strategies of TimeSeriesImputerTransform handle NaNs at the beginning of the segments differenlty.
Don't touch them:

  • running_mean

Fill them:

  • zero
  • mean
  • forward_fill

I think there should be some consistency.

Expected behavior

There can be two solutions:

  1. Fill nans at running_mean: it doesn't seem right
  2. Don't fill nans in the beginning in other methods: potentially it can lead to errors in some pipelines

Current suggested solution: 2.

How To Reproduce

Script:

from copy import deepcopy

import numpy as np
import pandas as pd

from etna.datasets import TSDataset
from etna.transforms import TimeSeriesImputerTransform


def get_ts() -> TSDataset:
    periods = 100
    df1 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df1["segment"] = "segment_1"
    df1["target"] = np.random.uniform(10, 20, size=periods)

    df2 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df2["segment"] = "segment_2"
    df2["target"] = np.random.uniform(-15, 5, size=periods)

    df = pd.concat([df1, df2]).reset_index(drop=True)
    df = TSDataset.to_dataset(df)
    df.loc[:4, pd.IndexSlice["segment_1", "target"]] = None
    tsds = TSDataset(df, freq="D")
    return tsds


ts = get_ts()
ts.info()

copy_ts = deepcopy(ts)
copy_ts.fit_transform(transforms=[TimeSeriesImputerTransform(strategy="zero")])
print()
print("Zero strategy")
copy_ts.info()

copy_ts = deepcopy(ts)
copy_ts.fit_transform(transforms=[TimeSeriesImputerTransform(strategy="mean")])
print()
print("Mean strategy")
copy_ts.info()

copy_ts = deepcopy(ts)
copy_ts.fit_transform(transforms=[TimeSeriesImputerTransform(strategy="running_mean")])
print()
print("Running mean strategy")
copy_ts.info()

copy_ts = deepcopy(ts)
copy_ts.fit_transform(transforms=[TimeSeriesImputerTransform(strategy="forward_fill")])
print()
print("Forward fill strategy")
copy_ts.info()

You'll see the lengths of the dataset after different imputer strategy.

Environment

No response

Additional context

No response

Checklist

  • Bug appears at the latest library version
@Mr-Geekman Mr-Geekman added the bug Something isn't working label Mar 14, 2022
@Mr-Geekman Mr-Geekman moved this to Specification in etna board Mar 14, 2022
@iKintosh iKintosh moved this from Specification to Todo in etna board Mar 28, 2022
@Mr-Geekman Mr-Geekman self-assigned this Apr 1, 2022
@Mr-Geekman Mr-Geekman moved this from Todo to In Progress in etna board Apr 1, 2022
@Mr-Geekman Mr-Geekman moved this from In Progress to In Review in etna board Apr 1, 2022
@Mr-Geekman Mr-Geekman moved this from In Review to In Progress in etna board Apr 1, 2022
@Mr-Geekman Mr-Geekman moved this from In Progress to In Review in etna board Apr 1, 2022
@martins0n martins0n moved this from In Review to In Progress in etna board Apr 5, 2022
@Mr-Geekman Mr-Geekman moved this from In Progress to In Review in etna board Apr 5, 2022
Repository owner moved this from In Review to Done in etna board Apr 5, 2022
@Vodyanoy17
Copy link

Currently TimeSeriesImputerTransform doesn't handle NaN at the beginning of the segments even when the strategy="zero"
this issue becomes critical when we use
pipeline.backtest with many folds, because we can not control the [start; end] of the fold

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants