Update NonValueTransformer's Default Setting and Handle Custom Fill Values #199

Merged: 4 commits, Jul 11, 2024
15 changes: 11 additions & 4 deletions sdgx/data_processors/transformers/nan.py
@@ -33,11 +33,14 @@ class NonValueTransformer(Transformer):
     If `drop_na` is set to `False`, this value will be used to fill missing values in the data.
     """

-    drop_na = True
+    drop_na = False
     """
     A boolean flag indicating whether to drop rows with missing values or fill them with `fill_na_value`.

-    If `True`, rows with missing values will be dropped. If `False`, missing values will be filled with `fill_na_value`.
+    If `True`, rows with missing values will be dropped.
+    If `False`, missing values will be filled with `fill_na_value`.
+
+    Currently, the default setting is False, which means rows with missing values are not dropped.
     """

     def fit(self, metadata: Metadata | None = None, **kwargs: dict[str, Any]):
@@ -48,9 +51,13 @@ def fit(self, metadata: Metadata | None = None, **kwargs: dict[str, Any]):
         """
         logger.info("NonValueTransformer Fitted.")

-        self.fitted = True
+        for key, value in kwargs.items():
+            if key == "fill_na_value":
+                if not isinstance(value, str):
+                    raise ValueError("fill_na_value must be of type <str>")
+                self.fill_na_value = value

-        return
+        self.fitted = True

     def convert(self, raw_data: DataFrame) -> DataFrame:
         """
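
The change to `fit` and the new `drop_na` default can be sketched in isolation as follows. This is a minimal illustration, not the sdgx implementation: `NonValueTransformerSketch` is a hypothetical stand-in class, the `"NAN_VALUE"` default is an assumed placeholder, and `convert` here works on plain lists of dicts (with `None` marking a missing value) because the body of the real `convert` is not shown in this diff. Only the keyword handling in `fit` mirrors the added lines.

```python
from typing import Any, Dict, List, Optional


class NonValueTransformerSketch:
    """Hypothetical stand-in for NonValueTransformer; not the sdgx class."""

    fill_na_value = "NAN_VALUE"  # assumed placeholder default, not taken from the diff
    drop_na = False  # new default in this PR: fill missing values instead of dropping rows

    def __init__(self) -> None:
        self.fitted = False

    def fit(self, metadata: Optional[object] = None, **kwargs: Any) -> None:
        # Mirrors the added lines: accept a custom fill value, reject non-strings.
        for key, value in kwargs.items():
            if key == "fill_na_value":
                if not isinstance(value, str):
                    raise ValueError("fill_na_value must be of type <str>")
                self.fill_na_value = value
        # fitted is only set once the keyword arguments have been validated.
        self.fitted = True

    def convert(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Assumed behaviour, inferred from the docstring only.
        if self.drop_na:
            return [row for row in rows if None not in row.values()]
        return [
            {k: (self.fill_na_value if v is None else v) for k, v in row.items()}
            for row in rows
        ]


rows = [{"city": "Oslo", "zip": None}, {"city": "Lima", "zip": "15001"}]

# Default path: missing values are filled with the custom string.
filler = NonValueTransformerSketch()
filler.fit(fill_na_value="missing")
filled = filler.convert(rows)

# Opt-in path: rows containing a missing value are dropped.
dropper = NonValueTransformerSketch()
dropper.drop_na = True
dropper.fit()
dropped = dropper.convert(rows)

# A non-string fill value is rejected before the transformer is marked as fitted.
bad = NonValueTransformerSketch()
try:
    bad.fit(fill_na_value=0)
    rejected = False
except ValueError:
    rejected = True
```

Because the validation loop runs before `self.fitted = True`, a rejected fill value leaves the transformer unfitted, which appears to be the reason the assignment was moved below the loop.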