Update NonValueTransformer's Default Setting and Handle Custom Fill Values #199
Description
This PR updates the `NonValueTransformer` class in `sdgx/data_processors/transformers/nan.py`. The `drop_na` attribute now defaults to `False`, so rows with missing values are no longer dropped by default. In addition, the `fit` method now accepts a custom `fill_na_value` passed through `kwargs`. This value must be of type `str`; otherwise a `ValueError` is raised.
Motivation and Context
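To make the behavior described under Description concrete, here is a minimal, self-contained sketch. It is not the sdgx implementation: the class name `NonValueTransformerSketch`, the list-of-dicts row format, the use of `None` for missing values, and the way `convert` applies the fill value are all assumptions made for illustration. Only the `drop_na = False` default, the `fill_na_value` keyword, and the `ValueError` on non-`str` input come from the PR description.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class NonValueTransformerSketch:
    """Hypothetical stand-in for NonValueTransformer (not the sdgx code)."""

    def __init__(self) -> None:
        # Per the PR description: rows with missing values are kept by default.
        self.drop_na: bool = False
        # Assumption: no fill value is set until one is passed to fit().
        self.fill_na_value: Optional[str] = None

    def fit(self, metadata: Optional[object] = None, **kwargs: Any) -> None:
        # Per the PR description: a custom fill value arrives through kwargs
        # and must be a str, otherwise a ValueError is raised.
        if "fill_na_value" in kwargs:
            value = kwargs["fill_na_value"]
            if not isinstance(value, str):
                raise ValueError("fill_na_value must be of type str")
            self.fill_na_value = value

    def convert(self, rows: List[Row]) -> List[Row]:
        # Assumption: missing values are represented as None.
        if self.drop_na:
            return [r for r in rows if all(v is not None for v in r.values())]
        if self.fill_na_value is None:
            return [dict(r) for r in rows]
        return [
            {k: (self.fill_na_value if v is None else v) for k, v in r.items()}
            for r in rows
        ]


transformer = NonValueTransformerSketch()
transformer.fit(fill_na_value="UNKNOWN")
rows = [{"name": "a", "city": None}, {"name": "b", "city": "Oslo"}]
print(transformer.convert(rows))
```

With the default `drop_na = False`, both rows survive and the missing `city` is replaced by the custom string. Passing a non-string such as `fill_na_value=0` to `fit` raises `ValueError` in this sketch.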
This change is required to enhance the flexibility of the `NonValueTransformer` class. By allowing users to specify a custom fill value for missing data, the transformer becomes more versatile and useful in scenarios where specific string values are preferred for filling missing data rather than dropping rows.
How has this been tested?
The changes have been tested by running unit tests that cover the `fit` and `convert` methods of the `NonValueTransformer` class.
Types of changes
Checklist: