feat: `OneHotEncoder.inverse_transform` now maintains the column order from the original table #195
…ll columns were fitted `OneHotEncoder.inverse_transform` now maintains the column order from the original table; added test for this behaviour (#109)
Codecov Report
@@ Coverage Diff @@
## main #195 +/- ##
==========================================
+ Coverage 93.97% 94.01% +0.04%
==========================================
Files 42 42
Lines 1476 1486 +10
==========================================
+ Hits 1387 1397 +10
Misses 89 89
🦙 MegaLinter status: ✅ SUCCESS
Solving this issue is a little more complicated than that: Say you call `fit` with this table and run the encoder only on column `col1`:
col1 | col2 |
---|---|
"a" | 0 |
"b" | 1 |
You store the schema of this table in `self._original_schema`. Now you call `transform` with the following table:
col2 | col1 |
---|---|
0 | "a" |
1 | "b" |
If you now call `inverse_transform` on the result, you will get
col1 | col2 |
---|---|
"a" | 0 |
"b" | 1 |
because the relative order of columns is now determined by the table the transformer was fitted with. This is not necessarily the column order of the table that was passed to `transform`, though.
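To make the problem concrete, here is a minimal sketch. It assumes, hypothetically, that `inverse_transform` restores the order by sorting column names against the schema stored during `fit`. `reorder_by_fitted_schema` is an illustrative helper, not code from this PR.

```python
def reorder_by_fitted_schema(columns: list[str], fitted_schema: list[str]) -> list[str]:
    """Sort columns by their position in the schema stored during fit (hypothetical helper)."""
    return sorted(columns, key=fitted_schema.index)


fitted_schema = ["col1", "col2"]    # column order of the table passed to fit
transform_input = ["col2", "col1"]  # column order of the table passed to transform

restored = reorder_by_fitted_schema(transform_input, fitted_schema)
print(restored)  # ['col1', 'col2'], which differs from transform_input
```

The restored order follows the fitted table, so the order the caller had before `transform` is lost.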
A more thorough solution would be that `transform` already maintains the order of columns by replacing a column with its one-hot-encoded version instead of appending all new columns at the end. For example, calling `transform` on
col2 | col1 |
---|---|
0 | "a" |
1 | "b" |
should result in
col2 | col1_a | col1_b |
---|---|---|
0 | 1 | 0 |
1 | 0 | 1 |
and calling `transform` on
col1 | col2 |
---|---|
"a" | 0 |
"b" | 1 |
should result in
col1_a | col1_b | col2 |
---|---|---|
1 | 0 | 0 |
0 | 1 | 1 |
Then `inverse_transform` can simply maintain the column order.
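The proposed behaviour of `transform` could be sketched as follows. This is only an illustration under simplifying assumptions: the table is modelled as a plain `dict` of column name to values (which preserves insertion order), new columns are named `<column>_<value>` as in the examples above, and `one_hot_in_place` is a hypothetical name rather than part of the library.

```python
def one_hot_in_place(table: dict[str, list], column: str) -> dict[str, list]:
    """Replace `column` with its one-hot columns at the same position (illustrative sketch)."""
    # Distinct values in order of first appearance.
    values = list(dict.fromkeys(table[column]))

    result: dict[str, list] = {}
    for name, data in table.items():
        if name == column:
            # Insert the one-hot columns where the original column was.
            for value in values:
                result[f"{column}_{value}"] = [1 if item == value else 0 for item in data]
        else:
            # All other columns keep their position and data.
            result[name] = data
    return result


encoded = one_hot_in_place({"col2": [0, 1], "col1": ["a", "b"]}, "col1")
print(list(encoded))  # ['col2', 'col1_a', 'col1_b']
```

Because the encoded columns stay where the original column was, the inverse operation only has to collapse them back at that position.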
# Conflicts: # tests/safeds/data/tabular/transformation/test_one_hot_encoder.py
I've added another test that should work but doesn't.
Since the column name and the value could contain underscores, it's not possible to figure out which original column a column in the transformed table corresponds to. For example, `a_b_c` could correspond to the column called `a_b` (and value `c`) or a column called `a` (and value `b_c`).
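The ambiguity can be reproduced in a few lines. The sketch assumes a `<column>_<value>` naming scheme and a prefix check along the lines of `startswith`; `candidate_origins` is a hypothetical helper, and the actual check in the PR may look different.

```python
def candidate_origins(transformed_name: str, original_columns: list[str]) -> list[str]:
    """Return every original column whose name is a prefix of the transformed column name."""
    return [
        column
        for column in original_columns
        if transformed_name.startswith(column + "_")
    ]


matches = candidate_origins("a_b_c", ["a", "a_b"])
print(matches)  # ['a', 'a_b'], so the origin of 'a_b_c' is ambiguous
```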
Because of this, we'll need to store the mapping from original columns to transformed columns during `fit` (`dict[str, list[str]]`, with original column names as keys and names of corresponding transformed columns as values). This will also be useful for #190 later. During `transform` and `inverse_transform` you can then use this mapping to figure out how the columns need to be sorted instead of relying on the `startswith` check.
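A rough sketch of how such a mapping could drive the column order is shown below. All function names (`fit_mapping`, `transformed_order`, `original_order`) are hypothetical, tables are modelled as plain dicts, and the sketch assumes that transformed column names are unique.

```python
def fit_mapping(table: dict[str, list], columns: list[str]) -> dict[str, list[str]]:
    """Map each encoded original column to the names of its transformed columns."""
    mapping: dict[str, list[str]] = {}
    for column in columns:
        values = list(dict.fromkeys(table[column]))  # distinct values, first appearance order
        mapping[column] = [f"{column}_{value}" for value in values]
    return mapping


def transformed_order(input_columns: list[str], mapping: dict[str, list[str]]) -> list[str]:
    """Column order after transform: each encoded column is replaced in place."""
    order: list[str] = []
    for name in input_columns:
        order.extend(mapping.get(name, [name]))
    return order


def original_order(transformed_columns: list[str], mapping: dict[str, list[str]]) -> list[str]:
    """Column order after inverse_transform: collapse transformed columns back in place."""
    reverse = {new: old for old, news in mapping.items() for new in news}
    order: list[str] = []
    for name in transformed_columns:
        original = reverse.get(name, name)
        if original not in order:  # emit each original column once, at its first position
            order.append(original)
    return order


mapping = fit_mapping({"a_b": ["c", "d"], "n": [0, 1]}, ["a_b"])
after_transform = transformed_order(["n", "a_b"], mapping)
after_inverse = original_order(after_transform, mapping)
print(after_transform)  # ['n', 'a_b_c', 'a_b_d']
print(after_inverse)    # ['n', 'a_b']
```

The lookup is exact, so a column name containing underscores (here `a_b`) no longer causes ambiguity.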
I've created another issue (#201) about conflicting column names when using a |
## [0.11.0](v0.10.0...v0.11.0) (2023-04-21)

### Features

* `OneHotEncoder.inverse_transform` now maintains the column order from the original table ([#195](#195)) ([3ec0041](3ec0041)), closes [#109](#109) [#109](#109)
* add `plot_` prefix back to plotting methods ([#212](#212)) ([e50c3b0](e50c3b0)), closes [#211](#211)
* adjust `Column`, `Schema` and `Table` to changes in `Row` ([#216](#216)) ([ca3eebb](ca3eebb))
* back `Row` by a `polars.DataFrame` ([#214](#214)) ([62ca34d](62ca34d)), closes [#196](#196) [#149](#149)
* clean up `Row` class ([#215](#215)) ([b12fc68](b12fc68))
* convert between `Row` and `dict` ([#206](#206)) ([e98b653](e98b653)), closes [#204](#204)
* convert between a `dict` and a `Table` ([#198](#198)) ([2a5089e](2a5089e)), closes [#197](#197)
* create column types for `polars` data types ([#208](#208)) ([e18b362](e18b362)), closes [#196](#196)
* dataframe interchange protocol ([#200](#200)) ([bea976a](bea976a)), closes [#199](#199)
* move existing ML solutions into `safeds.ml.classical` package ([#213](#213)) ([655f07f](655f07f)), closes [#210](#210)

### Bug Fixes

* `table.keep_only_columns` now maps column names to correct data ([#194](#194)) ([459ab75](459ab75)), closes [#115](#115)
* typo in type hint ([#184](#184)) ([e79727d](e79727d)), closes [#180](#180)
🎉 This PR is included in version 0.11.0 🎉 The release is available on:
Your semantic-release bot 📦🚀 |
Closes #109.
Summary of Changes
* `OneHotEncoder.inverse_transform` now maintains the column order from the original table (#109)
* Fixed a bug where `OneHotEncoder.inverse_transform` did not work if not all columns were fitted
* New feature columns in `OneHotEncoder` will now be inserted where the combined columns were in the original table