feat: Scaler for the `Dataset`. #4

daniel-dodd · 2022-12-20T15:11:20Z

It would be nice to have a Scaler object that scales the inputs and/or outputs of a jaxutils.Dataset and saves the mean and variance, so that test inputs can be scaled later.

import jaxutils
from jaxutils import PyTree

class Scaler(PyTree):
  ...

# the call method scales the data and "fits the scale transform"

train = jaxutils.Dataset(X=..., y=...)
test = jaxutils.Dataset(X=..., y=...)

scaler = Scaler(...)
scaled_train = scaler(train) # learn the transform
scaled_test = scaler(test) # scales the test data, under the learnt transform of the train data
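
To make the proposal concrete, here is a minimal sketch of how such a scaler might behave. It is not an implementation from jaxutils: the `Dataset` dataclass below is a hypothetical stand-in for `jaxutils.Dataset`, plain NumPy is used instead of JAX so the snippet runs on its own, and the rule "fit on the first call, reuse afterwards" is an assumption about the intended semantics.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Dataset:
    """Hypothetical stand-in for jaxutils.Dataset: inputs X and outputs y."""
    X: np.ndarray
    y: np.ndarray


class Scaler:
    """Standardises X and y, saving the mean and variance of the fitting data."""

    def __init__(self):
        self.X_mean = self.X_var = None
        self.y_mean = self.y_var = None

    def fit(self, data):
        # Per-column statistics, computed from the data passed in.
        self.X_mean, self.X_var = data.X.mean(axis=0), data.X.var(axis=0)
        self.y_mean, self.y_var = data.y.mean(axis=0), data.y.var(axis=0)
        return self

    def __call__(self, data):
        # Assumption: the first call learns the transform, later calls reuse it.
        if self.X_mean is None:
            self.fit(data)
        # Note: a constant column (zero variance) would divide by zero here.
        return Dataset(
            X=(data.X - self.X_mean) / np.sqrt(self.X_var),
            y=(data.y - self.y_mean) / np.sqrt(self.y_var),
        )


train = Dataset(X=np.array([[0.0], [2.0], [4.0]]), y=np.array([[1.0], [3.0], [5.0]]))
test = Dataset(X=np.array([[2.0]]), y=np.array([[3.0]]))

scaler = Scaler()
scaled_train = scaler(train)  # learns the transform from the training data
scaled_test = scaler(test)    # reuses the training mean and variance
```

Fitting implicitly on the first call keeps the call-only interface sketched above, but an explicit `fit` step may be less surprising if the same scaler object is ever reused across datasets.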


st-- · 2023-02-23T12:33:31Z

Instead of recoding from scratch, how about interfacing with sklearn's preprocessing tools? That way all of them would become available in one go. Could be simply a wrapper that unbundles the Dataset X/y attributes?
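
A rough sketch of what such a wrapper could look like, under some assumptions: `DatasetTransformer` is a hypothetical name, the `Dataset` dataclass stands in for `jaxutils.Dataset`, and the arrays are plain NumPy (JAX arrays would likely need converting before being handed to scikit-learn). The only scikit-learn calls used are `StandardScaler` and the standard `fit` / `transform` methods.

```python
from dataclasses import dataclass

import numpy as np
from sklearn.preprocessing import StandardScaler


@dataclass
class Dataset:
    """Hypothetical stand-in for jaxutils.Dataset: inputs X and outputs y."""
    X: np.ndarray
    y: np.ndarray


class DatasetTransformer:
    """Hypothetical wrapper: applies one sklearn-style transformer to X, another to y."""

    def __init__(self, X_transformer, y_transformer):
        # Any objects exposing fit(...) and transform(...) should work here.
        self.X_transformer = X_transformer
        self.y_transformer = y_transformer

    def fit(self, data):
        # Unbundle the Dataset and fit each transformer on its own attribute.
        self.X_transformer.fit(data.X)
        self.y_transformer.fit(data.y)
        return self

    def transform(self, data):
        # Rebundle the transformed arrays into a new Dataset.
        return Dataset(
            X=self.X_transformer.transform(data.X),
            y=self.y_transformer.transform(data.y),
        )


train = Dataset(X=np.array([[0.0], [2.0], [4.0]]), y=np.array([[1.0], [3.0], [5.0]]))
test = Dataset(X=np.array([[2.0]]), y=np.array([[3.0]]))

wrapper = DatasetTransformer(StandardScaler(), StandardScaler()).fit(train)
scaled_train = wrapper.transform(train)
scaled_test = wrapper.transform(test)  # uses statistics learnt from train
```

Because the wrapper only relies on `fit` and `transform`, other preprocessing classes such as `MinMaxScaler` could be passed in without changing it.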

daniel-dodd · 2023-02-23T12:36:11Z

Thanks @st--, this is a nice suggestion. I agree, and would love to see this functionality. :)

daniel-dodd added the "enhancement" (New feature or request) and "good first issue" (Good for newcomers) labels on Dec 20, 2022
