As a user, I wish I could train a pipeline on data that may not contain NaNs and therefore has non-nullable types, and then call `predict`, `transform_all_but_final`, or `score` on data that does contain NaNs and therefore has nullable types.
Currently, having nullable types at train time and non-nullable types at test time (or vice versa) causes `ComponentGraph._transform_features` to raise `Input X data types are different from the input types the pipeline was fitted on`. However, apart from whether or not they may contain null values, nullable types and their non-nullable counterparts hold the same kind of data.
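A minimal pandas sketch of the underlying dtype difference: a column with no missing values is inferred as `int64`, while the same integers with one missing value can be stored in pandas' nullable `Int64` dtype. This assumes Woodwork's default inference maps `int64` to `Integer` and `Int64` to `IntegerNullable`; the snippet does not call evalml itself.

```python
import numpy as np
import pandas as pd

# Training column: no missing values, so pandas infers the non-nullable int64 dtype.
train_col = pd.Series([1, 2, 3])

# Scoring column: same kind of data, but with a missing value. The nullable "Int64"
# extension dtype keeps the values as integers instead of upcasting them to float.
test_col = pd.Series([1, np.nan, 3], dtype="Int64")

print(train_col.dtype, test_col.dtype)  # int64 Int64

# Apart from the missing entry, the values are ordinary integers.
print(test_col.dropna().astype("int64").tolist())  # [1, 3]
```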
Once the nullability epic is in place, we may see increased usage of nullable types, which could make this kind of train/test mismatch more common.
I propose we change `_schema_is_equal` to treat the following nullable types interchangeably with their non-nullable counterparts:
- `Integer` and `IntegerNullable`
- `Boolean` and `BooleanNullable`
- `Age` and `AgeNullable`
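A rough sketch of the proposed comparison, written against plain `{column name: logical type name}` dictionaries rather than Woodwork schema objects. The mapping and both function names are hypothetical and are not evalml's actual `_schema_is_equal`; the sketch also assumes that column names must still match exactly.

```python
# Hypothetical mapping from nullable logical type names to their non-nullable counterparts.
NULLABLE_TO_NON_NULLABLE = {
    "IntegerNullable": "Integer",
    "BooleanNullable": "Boolean",
    "AgeNullable": "Age",
}


def normalize_logical_type(type_name: str) -> str:
    """Map a nullable logical type name onto its non-nullable counterpart; leave others unchanged."""
    return NULLABLE_TO_NON_NULLABLE.get(type_name, type_name)


def schemas_equal_ignoring_nullability(train_types: dict, test_types: dict) -> bool:
    """Compare {column: logical type name} dicts, treating each nullable pair as equal."""
    # Column names must still match exactly (an assumption for this sketch).
    if train_types.keys() != test_types.keys():
        return False
    return all(
        normalize_logical_type(train_types[col]) == normalize_logical_type(test_types[col])
        for col in train_types
    )


train = {"age": "Age", "count": "Integer", "flag": "Boolean"}
test = {"age": "AgeNullable", "count": "IntegerNullable", "flag": "BooleanNullable"}
print(schemas_equal_ignoring_nullability(train, test))  # True
print(schemas_equal_ignoring_nullability(train, {**test, "count": "Double"}))  # False
```

Because the mapping only collapses the listed pairs, `Integer` versus `AgeNullable` would still be reported as different.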
There are several things to take into account when implementing this:
- Overall, think about the impact of allowing these types to be used interchangeably.
- Consider requiring that we check for the presence of NaNs before treating nullable and non-nullable types as equivalent. In general, I don't want us to shy away from keeping IntegerNullable columns as such even if no NaNs are present (whether we impute them ourselves or users input them). Those types aren't really meant to imply the presence of NaNs, only that the type can support null values; otherwise they're the same as non-nullable integers. For example, in Featuretools, we might output IntegerNullable from a primitive so that users could pass NaNs in without breaking it.
- Increase test coverage of the different ways data can differ between train and test data, across the different problem types and the `score`, `predict`, and `transform_all_but_final` methods on pipelines.
- Confirm this doesn't change AutoML results before or after the nullable type handling changes.
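If we went with the stricter option from the NaN-check bullet, one possible shape is an opt-in flag that accepts a nullable/non-nullable mismatch only when the affected test column has no missing values. Everything here is hypothetical: the function name, the `require_no_nans` flag, and the dictionary-based stand-in for a schema are illustrative assumptions, not evalml APIs.

```python
import pandas as pd

# Hypothetical mapping from nullable logical type names to their non-nullable counterparts.
NULLABLE_TO_NON_NULLABLE = {
    "IntegerNullable": "Integer",
    "BooleanNullable": "Boolean",
    "AgeNullable": "Age",
}


def schemas_equal(
    train_types: dict, test_types: dict, test_df: pd.DataFrame, require_no_nans: bool = False
) -> bool:
    """Hypothetical check: relax nullability mismatches, optionally only for NaN-free columns."""
    if train_types.keys() != test_types.keys():
        return False
    for col, train_type in train_types.items():
        test_type = test_types[col]
        if train_type == test_type:
            continue
        # Types differ: accept only if both collapse to the same non-nullable type.
        if NULLABLE_TO_NON_NULLABLE.get(train_type, train_type) != NULLABLE_TO_NON_NULLABLE.get(
            test_type, test_type
        ):
            return False
        # Stricter option: reject the mismatch if the test column actually contains nulls.
        if require_no_nans and test_df[col].isna().any():
            return False
    return True


counts = {"count": "Integer"}
nullable_counts = {"count": "IntegerNullable"}
with_nan = pd.DataFrame({"count": pd.Series([1, None, 3], dtype="Int64")})
print(schemas_equal(counts, nullable_counts, with_nan))  # True
print(schemas_equal(counts, nullable_counts, with_nan, require_no_nans=True))  # False
```

Keeping the check opt-in reflects the concern above: an IntegerNullable column without NaNs is still a legitimate type, so the stricter behavior should not be the default.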