Change `_schema_is_equal` to allow nullable and non-nullable types to be used interchangeably between train and test data #4077

tamargrey · 2023-03-14T20:51:26Z

As a user, I wish I could train a pipeline on data that might not have NaNs and therefore has non-nullable types, and then call `predict`, `transform_all_but_final`, or `score` on data that has NaNs and therefore has nullable types.

Currently, having nullable types at train time and non-nullable types at test time (or vice versa) causes `ComponentGraph._transform_features` to fail with the error "Input X data types are different from the input types the pipeline was fitted on". However, apart from whether they may contain null values, nullable types and their non-nullable counterparts contain the same type of data.

Once the nullability epic is in place, we may see increased usage of nullable types, which could result in more instances of the above situation popping up.

I propose we change `_schema_is_equal` to treat the following nullable types interchangeably with their non-nullable counterparts:

Integer - IntegerNullable
Boolean - BooleanNullable
Age - AgeNullable
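
As a rough illustration of the proposal, here is a minimal sketch that compares two schemas after mapping each nullable type to its non-nullable counterpart. It is not the actual `_schema_is_equal` implementation: the function names, the `NULLABLE_TO_BASE` mapping, and the representation of a schema as a plain dict of column name to logical type name are all assumptions made for the example.

```python
# Hypothetical mapping; the type names mirror the pairs listed in this issue.
NULLABLE_TO_BASE = {
    "IntegerNullable": "Integer",
    "BooleanNullable": "Boolean",
    "AgeNullable": "Age",
}


def normalize(logical_type):
    """Map a nullable type name to its non-nullable counterpart, if it has one."""
    return NULLABLE_TO_BASE.get(logical_type, logical_type)


def schema_is_compatible(train_types, test_types):
    """Return True if both schemas match once nullability is ignored.

    Each schema is assumed to be a dict of column name -> logical type name.
    Column order is ignored here; a real check might also compare order.
    """
    if train_types.keys() != test_types.keys():
        return False
    return all(
        normalize(train_types[col]) == normalize(test_types[col])
        for col in train_types
    )


train = {"age": "Age", "clicks": "Integer", "price": "Double"}
test = {"age": "AgeNullable", "clicks": "IntegerNullable", "price": "Double"}
print(schema_is_compatible(train, test))
```

Normalizing both sides makes the check symmetric, so it holds whether the nullable types appear at train time or at test time.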

There are several things to take into account when implementing this:

Overall, think about the impact of allowing these types to be used interchangeably
Consider whether to require that we validate the existence of NaNs before treating nullable and non-nullable types as equivalent. In general, I don't want us to shy away from keeping IntegerNullable columns as such even if no NaNs are present (whether we impute them ourselves or users input them). Those types aren't meant to imply the presence of NaNs, only that the type can support null values; otherwise they're the same as non-nullable integers. For example, in Featuretools, we might output IntegerNullable from a primitive so that users can pass NaNs in without breaking it.
Increase test coverage of the different ways data could differ between train and test data, across the different problem types and the `score`, `predict`, and `transform_all_but_final` methods on pipelines
Confirm this doesn't change AutoML results before or after the nullable type handling changes
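
For the stricter option mentioned above, where nulls are validated before a nullable column is treated as equivalent to its non-nullable counterpart, a minimal sketch could look like the following. The helper names and the use of plain Python lists for column values are assumptions for illustration only; the issue itself leans toward not requiring this check.

```python
import math

# Hypothetical set of nullable logical type names, taken from this issue.
NULLABLE_TYPES = {"IntegerNullable", "BooleanNullable", "AgeNullable"}


def has_nulls(values):
    """Return True if any value is None or a float NaN."""
    return any(
        v is None or (isinstance(v, float) and math.isnan(v)) for v in values
    )


def can_treat_as_non_nullable(logical_type, values):
    """Strict variant: a nullable column only counts as equivalent to its
    non-nullable counterpart when it holds no null values."""
    if logical_type not in NULLABLE_TYPES:
        return True
    return not has_nulls(values)


print(can_treat_as_non_nullable("IntegerNullable", [1, 2, 3]))
print(can_treat_as_non_nullable("IntegerNullable", [1, None, 3]))
```

Under this variant, an IntegerNullable column that happens to contain no nulls would still pass, while one that contains nulls would be rejected against a pipeline fitted on Integer.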

tamargrey mentioned this issue Mar 15, 2023

Use nullable type handling in components' fit, transform, and predict methods #4046

Merged

Mhsh mentioned this issue Apr 6, 2023

Predict does not work because of data type mismatch for same dataframe. #4124

Open

tamargrey mentioned this issue Apr 10, 2023

Change _schema_is_equal check to _schema_is_compatible and use training schema for predict data #4133

Open
