This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

[WIP] Added TabularRegressionDataFrameDataSource wrapping TimeSeriesDataSet and TabularRegressionPreprocess stub #588

Conversation

sumanmichael
Contributor

What does this PR do?

Created TabularRegressionPreprocess and TabularRegressionDataFrameDataSource, which wraps TimeSeriesDataSet from pytorch-forecasting.

Here's what I did.

  • Created the TabularRegressionPreprocess stub.
  • Created TabularClassificationPreprocess in the new tabular/classification/data.py and TabularRegressionPreprocess in the new tabular/regression/data.py, both extending TabularPreprocess from tabular/data.py and fixing the respective is_regression bool param.
  • Created TabularClassificationDataFrameDataSource in the new tabular/classification/data.py and TabularRegressionDataFrameDataSource in the new tabular/regression/data.py, extending TabularDataFrameDataSource from tabular/data.py; the regression data source wraps TimeSeriesDataSet from pytorch-forecasting. The intended flow is:
    • TabularRegressionData.from_data_frame(train_data, …, preprocess_kwargs)
    • which calls DataModule.from_data_source("data_frame", …)
    • which creates the TabularRegressionPreprocess with the preprocess_kwargs (including the args for the time series data set)
    • then from_data_source gets the data_frame data source and calls load_data with the data frame;
      in load_data, we return a TimeSeriesDataSet built from the data frame and the args from the init
  • Updated the required __init__.py files.
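The call chain described in the list above can be sketched in plain Python. This is only an illustration of the control flow, not the code in this PR: every class below (FakeTimeSeriesDataSet, DataFrameDataSource, RegressionPreprocess, RegressionData) and the data_source_of_name helper are hypothetical stand-ins, and neither Flash nor pytorch-forecasting is imported.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeTimeSeriesDataSet:
    """Hypothetical stand-in for pytorch_forecasting.TimeSeriesDataSet."""

    data: List[Dict[str, Any]]
    kwargs: Dict[str, Any]


class DataFrameDataSource:
    """Stand-in data source that remembers the time series args from its init."""

    def __init__(self, **time_series_kwargs: Any) -> None:
        self.time_series_kwargs = time_series_kwargs

    def load_data(self, data_frame: List[Dict[str, Any]]) -> FakeTimeSeriesDataSet:
        # Last step of the flow: wrap the data frame with the args from the init.
        return FakeTimeSeriesDataSet(data_frame, self.time_series_kwargs)


class RegressionPreprocess:
    """Stand-in preprocess that registers a "data_frame" data source."""

    def __init__(self, **preprocess_kwargs: Any) -> None:
        self.data_sources = {"data_frame": DataFrameDataSource(**preprocess_kwargs)}

    def data_source_of_name(self, name: str) -> DataFrameDataSource:
        return self.data_sources[name]


class RegressionData:
    """Stand-in data module exposing the two entry points from the description."""

    def __init__(self, train_dataset: FakeTimeSeriesDataSet) -> None:
        self.train_dataset = train_dataset

    @classmethod
    def from_data_source(cls, name: str, train_data, **preprocess_kwargs: Any):
        # Build the preprocess, look up the named data source, then load the data.
        preprocess = RegressionPreprocess(**preprocess_kwargs)
        source = preprocess.data_source_of_name(name)
        return cls(source.load_data(train_data))

    @classmethod
    def from_data_frame(cls, train_data, **preprocess_kwargs: Any):
        # Thin wrapper: delegate to the generic entry point with a fixed name.
        return cls.from_data_source("data_frame", train_data, **preprocess_kwargs)


# A list of dicts plays the role of the data frame in this sketch.
frame = [{"time_idx": 0, "value": 1.0}, {"time_idx": 1, "value": 2.0}]
data = RegressionData.from_data_frame(frame, time_idx="time_idx", target="value")
```

In the sketch the time series arguments travel unchanged from from_data_frame to the data set constructor, which is the property the description relies on.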

Fixes #528 #529

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃


codecov bot commented Jul 14, 2021

Codecov Report

Merging #588 (b147278) into master (27cc06d) will decrease coverage by 12.35%.
The diff coverage is 71.42%.

@@             Coverage Diff             @@
##           master     #588       +/-   ##
===========================================
- Coverage   91.41%   79.06%   -12.36%     
===========================================
  Files         117      132       +15     
  Lines        7200     7829      +629     
===========================================
- Hits         6582     6190      -392     
- Misses        618     1639     +1021     
Flag Coverage Δ
unittests 79.06% <71.42%> (-12.36%) ⬇️
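The percentages in the report follow from the Hits and Lines rows. A minimal sketch of that arithmetic, assuming coverage is simply hits divided by lines (the helper name coverage_percent is made up for this example, and Codecov's exact rounding is not reproduced):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage."""
    return 100.0 * hits / lines


# Values copied from the Hits and Lines rows of the coverage diff.
master = coverage_percent(6582, 7200)
pr = coverage_percent(6190, 7829)
drop = master - pr

print(f"master: {master:.4f}%  PR: {pr:.4f}%  drop: {drop:.4f} points")
```

The drop comes out a little above 12.35 points, so the 12.35 in the summary line and the -12.36 in the table most likely differ only in how the last digit was rounded.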

Impacted Files Coverage Δ
flash/tabular/data.py 93.63% <ø> (ø)
flash/tabular/regression/data.py 66.66% <64.00%> (-33.34%) ⬇️
flash/tabular/classification/data.py 80.00% <76.92%> (-20.00%) ⬇️
flash/tabular/__init__.py 100.00% <100.00%> (ø)
flash/tabular/classification/__init__.py 100.00% <100.00%> (ø)
flash/tabular/regression/__init__.py 100.00% <100.00%> (ø)
flash/image/detection/data.py 29.77% <0.00%> (-65.58%) ⬇️
flash/image/detection/serialization.py 26.92% <0.00%> (-65.53%) ⬇️
flash/core/serve/_compat/cached_property.py 45.23% <0.00%> (-54.77%) ⬇️
flash/image/embedding/model.py 42.10% <0.00%> (-40.36%) ⬇️
... and 52 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 27cc06d...b147278.

else:
    DataFrame = object

from pytorch_forecasting import TimeSeriesDataSet
Contributor

Should check if pytorch_forecasting is available.

from flash.tabular.data import TabularData, TabularDataFrameDataSource, TabularPreprocess


class TabularRegressionDataFrameDataSource(TabularDataFrameDataSource):
Contributor

I find it confusing that TabularRegression is used for time series.
This should be a TimeSeriesForecasting Task using from_data_frame or from_csv function :)

Collaborator

@ethanwharris ethanwharris left a comment

Hi @sumanmichael great work so far! It's definitely getting there. I think it would be simpler to just extend the base Preprocess and DataSource classes for tabular regression since regression and classification don't share many arguments or any functionality.

from flash.tabular.data import TabularData, TabularDataFrameDataSource, TabularPreprocess


class TabularRegressionDataFrameDataSource(TabularDataFrameDataSource):
Collaborator

Maybe this would be better just extending DataSource as it doesn't use any of the functionality from the TabularDataFrameDataSource

)


class TabularRegressionPreprocess(TabularPreprocess):
Collaborator

Same here, maybe should just be a Preprocess? That way you can remove the arguments to init that are not used for regression
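A rough sketch of what that could look like, using a hypothetical stand-in for the base class rather than Flash's real Preprocess (whose constructor may differ). The argument names time_idx, target and group_ids are borrowed from TimeSeriesDataSet; everything else is an assumption made for illustration.

```python
from typing import Any, Dict, List, Optional


class Preprocess:
    """Hypothetical stand-in for the base Preprocess class."""

    def __init__(
        self,
        data_sources: Optional[Dict[str, Any]] = None,
        default_data_source: Optional[str] = None,
    ) -> None:
        self.data_sources = data_sources or {}
        self.default_data_source = default_data_source


class RegressionPreprocess(Preprocess):
    """Extends the base class directly, so no classification-only args are inherited."""

    def __init__(
        self,
        time_idx: str,
        target: str,
        group_ids: List[str],
        **time_series_kwargs: Any,
    ) -> None:
        # Keep only what the time series data set needs.
        self.time_series_kwargs = dict(
            time_idx=time_idx, target=target, group_ids=group_ids, **time_series_kwargs
        )
        super().__init__(
            data_sources={"data_frame": self.time_series_kwargs},
            default_data_source="data_frame",
        )


preprocess = RegressionPreprocess(time_idx="day", target="sales", group_ids=["store"])
```

The init signature then lists only time series arguments, which is the simplification the comment is asking for.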

        num_workers: Optional[int] = None,
        **preprocess_kwargs: Any,
    ):
        super().from_data_frame(
Collaborator

I think this should be super().from_data_source("data_frame", ...

Contributor

tchaton commented Jul 15, 2021

Maybe this structure would be more appropriate:

tabular
    | _ forecasting
            | _ model.py: TabularForecaster

@sumanmichael sumanmichael marked this pull request as draft July 15, 2021 16:55

stale bot commented Sep 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the won't fix This will not be worked on label Sep 14, 2021
Labels
won't fix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create TabularForecastingPreprocess stub
3 participants