Optimize `TSDataset.describe` #1341

Mr-Geekman · 2023-08-01T06:15:34Z

🚀 Feature Request

We should optimize the `TSDataset.describe` method because it can consume up to 30% of the total computation time during a backtest with `NaiveModel` on 10k segments.

Proposal

In the current implementation, the bottleneck is `TSDataset._gather_segments_data`, and it should be optimized. The problem is that it iterates over the segments one at a time.
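To illustrate the kind of per-segment iteration described above, here is a minimal sketch. It is not etna's actual code. It assumes a wide DataFrame with a sorted timestamp index, one target column per segment, and at least one observed value in every segment. The per-segment statistics (`start_timestamp`, `end_timestamp`, `length`, `num_missing`) and the function name `gather_segments_data_loop` are hypothetical, chosen to resemble what `describe` reports. Each segment costs a separate Python iteration plus several pandas calls, so the overhead grows linearly with the number of segments.

```python
import numpy as np
import pandas as pd


def gather_segments_data_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-segment loop over a wide table (one column per segment).

    Assumes a sorted timestamp index and at least one observed value per segment.
    """
    rows = {}
    for segment in df.columns:  # one Python iteration per segment: the slow part
        series = df[segment]
        start = series.first_valid_index()  # first non-missing timestamp
        end = series.last_valid_index()  # last non-missing timestamp
        span = series.loc[start:end]  # label slicing is inclusive on both ends
        rows[segment] = {
            "start_timestamp": start,
            "end_timestamp": end,
            "length": len(span),
            # missing values strictly inside the observed span
            "num_missing": int(span.isna().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


idx = pd.date_range("2023-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, 4.0, 5.0], "b": [np.nan, np.nan, 3.0, 4.0, np.nan]},
    index=idx,
)
print(gather_segments_data_loop(df))
```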

Possible solutions:

- Vectorization
- Optimization of a single iteration
- Rewriting the loop using numba
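A minimal sketch of the vectorization option. It assumes a wide DataFrame with a sorted timestamp index, one column per segment, and at least one observed value in every segment. The function name `gather_segments_data_vectorized` and the statistics it returns are hypothetical, not taken from etna. Instead of looping over segments in Python, it computes everything with a few NumPy operations over the whole table.

```python
import numpy as np
import pandas as pd


def gather_segments_data_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized per-segment statistics, no Python loop over segments.

    Assumes df is wide (sorted timestamp index, one column per segment)
    and that every segment has at least one non-missing value.
    """
    values = df.to_numpy(dtype=float)
    observed = ~np.isnan(values)  # shape: (n_timestamps, n_segments)
    n_rows = values.shape[0]
    # argmax on a boolean array returns the index of the first True
    start_idx = observed.argmax(axis=0)
    # last observed row: search the row-reversed array, then map back
    end_idx = n_rows - 1 - observed[::-1].argmax(axis=0)
    length = end_idx - start_idx + 1
    # every observed value lies inside [start, end], so the gaps inside
    # the span are its length minus the number of observed values
    num_missing = length - observed.sum(axis=0)
    return pd.DataFrame(
        {
            "start_timestamp": df.index[start_idx].to_numpy(),
            "end_timestamp": df.index[end_idx].to_numpy(),
            "length": length,
            "num_missing": num_missing,
        },
        index=df.columns,
    )


idx = pd.date_range("2023-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, 4.0, 5.0], "b": [np.nan, np.nan, 3.0, 4.0, np.nan]},
    index=idx,
)
print(gather_segments_data_vectorized(df))
```

The cost now comes from a constant number of array passes rather than one pandas round trip per segment, which is why this approach scales better when there are thousands of segments.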

As an alternative, we could optimize the places where `TSDataset.describe` is used:

Test cases

Make sure the current tests pass.

Additional context

Connected issues: #1336.


Mr-Geekman added the enhancement New feature or request label Aug 1, 2023

Mr-Geekman added this to etna board Aug 1, 2023

github-project-automation bot moved this to Specification in etna board Aug 1, 2023

Mr-Geekman moved this from Specification to Todo in etna board Aug 1, 2023

Mr-Geekman self-assigned this Aug 2, 2023

Mr-Geekman moved this from Todo to In Progress in etna board Aug 2, 2023

Mr-Geekman mentioned this issue Aug 2, 2023

Optimize TSDataset.describe and TSDataset.info by vectorization #1344 (merged)

Mr-Geekman moved this from In Progress to In Review in etna board Aug 2, 2023

alex-hse-repository closed this as completed in #1344 Aug 4, 2023

github-project-automation bot moved this from In Review to Done in etna board Aug 4, 2023
