-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Table.replace for the in-memory backend #8935
Changes from 26 commits
756dfa9
58da2fd
f2cd94a
0501d05
fcab4e0
f2a58f6
5ed3fed
8caac00
74bc949
69753fa
4cf1309
aae2bd2
d458ed8
f924690
bf142c0
b97841c
b88633e
2a02eb0
2b7ad13
8b7c361
9a084d0
64d43f3
4acb007
df35418
5ac57ce
d01152d
69cdb47
76368e7
4ea9c22
f9aaf71
7ad9723
fe389f8
645805b
02a0d54
80dbc76
5a1b6d0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -1913,7 +1913,7 @@ type Table | |||||||
- If a column that is being updated from the lookup table has a type | ||||||||
that is not compatible with the type of the corresponding column in | ||||||||
this table, a `No_Common_Type` error is raised. | ||||||||
- If a key column contains `Nothing` values, either in the lookup table, | ||||||||
- If a key column contains `Nothing` values in the lookup table, | ||||||||
a `Null_Values_In_Key_Columns` error is raised. | ||||||||
- If `allow_unmatched_rows` is `False` and there are rows in this table | ||||||||
that do not have a matching row in the lookup table, an | ||||||||
|
@@ -1959,6 +1959,109 @@ type Table | |||||||
java_table = LookupJoin.lookupAndReplace java_keys java_descriptions allow_unmatched_rows java_problem_aggregator | ||||||||
Table.Value java_table | ||||||||
|
||||||||
## ALIAS find replace | ||||||||
GROUP Standard.Base.Calculations | ||||||||
ICON join | ||||||||
Replaces values in `column` using `lookup_table` to specify a | ||||||||
mapping from old to new values. | ||||||||
|
||||||||
Arguments: | ||||||||
- lookup_table: the table to use as a mapping from old to new values. A | ||||||||
`Map` can also be used here (in which case `from_column` and | ||||||||
`to_column` are ignored). | ||||||||
- column: the column within `self` to perform the replace on. | ||||||||
- from_column: the column within `lookup_table` to match against `column` | ||||||||
in `self`. | ||||||||
- to_column: the column within `lookup_table` to get new values from. | ||||||||
- allow_unmatched_rows: Specifies how to handle missing rows in the lookup. | ||||||||
If `False` (the default), an `Unmatched_Rows_In_Lookup` error is raised. | ||||||||
If `True`, the unmatched rows are left unchanged. Any new columns will | ||||||||
be filled with `Nothing`. | ||||||||
- on_problems: Specifies how to handle problems if they occur, reporting | ||||||||
them as warnings by default. | ||||||||
|
||||||||
? Result Ordering | ||||||||
|
||||||||
When operating in-memory, this operation preserves the order of rows | ||||||||
from this table (unlike `join`). | ||||||||
In the Database backend, there are no guarantees related to ordering of | ||||||||
results. | ||||||||
|
||||||||
? Error Conditions | ||||||||
|
||||||||
- If this table or the lookup table is lacking any of the columns | ||||||||
specified by `from_column`, `to_column`, or `column`, a | ||||||||
`Missing_Input_Columns` error is raised. | ||||||||
- If a single row is matched by multiple entries in the lookup table, | ||||||||
a `Non_Unique_Key` error is raised. | ||||||||
- If a column that is being updated from the lookup table has a type | ||||||||
that is not compatible with the type of the corresponding column in | ||||||||
this table, a `No_Common_Type` error is raised. | ||||||||
- If a key column contains `Nothing` values in the lookup table, | ||||||||
a `Null_Values_In_Key_Columns` error is raised. | ||||||||
- If `allow_unmatched_rows` is `False` and there are rows in this table | ||||||||
that do not have a matching row in the lookup table, an | ||||||||
`Unmatched_Rows_In_Lookup` error is raised. | ||||||||
- The following problems may be reported according to the `on_problems` | ||||||||
setting: | ||||||||
- If any of the `columns` is a floating-point type, | ||||||||
a `Floating_Point_Equality`. | ||||||||
|
||||||||
> Example | ||||||||
Replace values in column 'x' using a lookup table. | ||||||||
|
||||||||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']], ['z', ['e', 'f', 'g', 'h']]] | ||||||||
# | x | y | z | ||||||||
# ---+---+---+--- | ||||||||
# 0 | 1 | a | e | ||||||||
# 1 | 2 | b | f | ||||||||
# 2 | 3 | c | g | ||||||||
# 3 | 4 | d | h | ||||||||
|
||||||||
lookup_table = table_builder [['x', [1, 2, 3, 4]], ['new_x', [10, 20, 30, 40]]] | ||||||||
# | old_x | new_x | ||||||||
# ---+-------+------- | ||||||||
# 0 | 1 | 10 | ||||||||
# 1 | 2 | 20 | ||||||||
# 2 | 3 | 30 | ||||||||
# 3 | 4 | 40 | ||||||||
|
||||||||
result = table.replace lookup_table 'x' | ||||||||
# | x | y | z | ||||||||
# ---+----+---+--- | ||||||||
# 0 | 10 | a | e | ||||||||
# 1 | 20 | b | f | ||||||||
# 2 | 30 | c | g | ||||||||
# 3 | 40 | d | h | ||||||||
@column Widget_Helpers.make_column_name_selector | ||||||||
@from_column Widget_Helpers.make_column_name_selector | ||||||||
@to_column Widget_Helpers.make_column_name_selector | ||||||||
replace : Table | Map -> (Text | Integer) -> (Text | Integer) -> (Text | Integer) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup | ||||||||
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer)=0 to_column:(Text | Integer)=1 allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning = | ||||||||
case lookup_table of | ||||||||
_ : Map -> | ||||||||
self.replace (map_to_lookup_table lookup_table 'from' 'to') column 'from' 'to' allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems | ||||||||
_ : Table -> | ||||||||
self.select_columns column . if_not_error <| lookup_table.select_columns [from_column, to_column] . if_not_error <| | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please do this:
Suggested change
then you can use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||||||||
unique = self.column_naming_helper.create_unique_name_strategy | ||||||||
unique.mark_used (self.column_names) | ||||||||
unique.mark_used (lookup_table.column_names) | ||||||||
GregoryTravis marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||
|
||||||||
## We perform a `merge` into `column`, using a duplicate of `column` | ||||||||
as the key column to join with `from_column`. | ||||||||
|
||||||||
duplicate_key_column_name = unique.make_unique "duplicate_key" | ||||||||
duplicate_key_column = self.at column . rename duplicate_key_column_name | ||||||||
self_with_duplicate = Table.new ([duplicate_key_column] + self.columns) | ||||||||
GregoryTravis marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||
|
||||||||
## Create a lookup table with just `to_column` and `from_column`, | ||||||||
renamed to match the base table's `column` and its duplicate, | ||||||||
respectively. | ||||||||
lookup_table_renamed = Table.new [lookup_table.at from_column . rename duplicate_key_column_name, lookup_table.at to_column . rename column] | ||||||||
GregoryTravis marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||
|
||||||||
merged = self_with_duplicate.merge lookup_table_renamed duplicate_key_column_name add_new_columns=False allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems | ||||||||
merged.remove_columns duplicate_key_column_name | ||||||||
|
||||||||
## ALIAS join by row position | ||||||||
GROUP Standard.Base.Calculations | ||||||||
ICON dataframes_join | ||||||||
|
@@ -2701,6 +2804,12 @@ concat_columns column_set all_tables result_type result_row_count on_problems = | |||||||
sealed_storage = storage_builder.seal | ||||||||
Column.from_storage column_set.name sealed_storage | ||||||||
|
||||||||
## PRIVATE | ||||||||
A helper that creates a two-column table from a map. | ||||||||
map_to_lookup_table : Map Any Any -> Text -> Text -> Table | ||||||||
map_to_lookup_table map key_column value_column = | ||||||||
Table.new [[key_column, map.keys], [value_column, map.values]] | ||||||||
GregoryTravis marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||
|
||||||||
## PRIVATE | ||||||||
Conversion method to a Table from a Column. | ||||||||
Table.from (that:Column) = that.to_table | ||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
from Standard.Base import all | ||
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument | ||
|
||
from Standard.Table import all | ||
from Standard.Table.Errors import all | ||
|
||
from Standard.Database import all | ||
|
||
from Standard.Test_New import all | ||
|
||
from project.Common_Table_Operations.Util import run_default_backend | ||
import project.Util | ||
|
||
main = run_default_backend add_specs | ||
|
||
type Data | ||
Value ~connection | ||
|
||
setup create_connection_fn = | ||
Data.Value (create_connection_fn Nothing) | ||
|
||
teardown self = self.connection.close | ||
|
||
|
||
add_specs suite_builder setup = | ||
prefix = setup.prefix | ||
create_connection_fn = setup.create_connection_func | ||
suite_builder.group prefix+"Table.replace" group_builder-> | ||
data = Data.setup create_connection_fn | ||
|
||
group_builder.teardown <| | ||
data.teardown | ||
|
||
table_builder cols = | ||
setup.table_builder cols connection=data.connection | ||
|
||
group_builder.specify "should be able to replace values via a lookup table, using from/to column defaults" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]] | ||
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
result = table.replace lookup_table 'x' | ||
result . should_equal expected | ||
|
||
group_builder.specify "should be able to replace values via a lookup table, specifying from/to columns" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]] | ||
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
result = table.replace lookup_table 'x' 'x' 'z' | ||
GregoryTravis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
result . should_equal expected | ||
|
||
group_builder.specify "should be able to replace values via a lookup table provided as a Map" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = Map.from_vector [[2, 20], [1, 10], [4, 40], [3, 30]] | ||
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
result = table.replace lookup_table 'x' | ||
result . should_equal expected | ||
|
||
group_builder.specify "should fail with Illegal_Argument if the specified columns do not exist" <| | ||
GregoryTravis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]] | ||
table.replace lookup_table 'q' 'x' 'z' . should_fail_with Missing_Input_Columns | ||
table.replace lookup_table 'x' 'q' 'z' . should_fail_with Missing_Input_Columns | ||
table.replace lookup_table 'x' 'x' 'q' . should_fail_with Missing_Input_Columns | ||
|
||
group_builder.specify "can allow unmatched rows" <| | ||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]] | ||
expected = table_builder [['x', [10, 2, 30, 40]], ['y', ['a', 'b', 'c', 'd']]] | ||
result = table.replace lookup_table 'x' | ||
result . should_equal expected | ||
|
||
group_builder.specify "fails on unmatched rows" <| | ||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]] | ||
table.replace lookup_table 'x' allow_unmatched_rows=False . should_fail_with Unmatched_Rows_In_Lookup | ||
|
||
group_builder.specify "fails on non-unique keys" <| | ||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 1, 3]], ['z', [20, 10, 40, 11, 30]]] | ||
table.replace lookup_table 'x' . should_fail_with Non_Unique_Key | ||
|
||
group_builder.specify "should avoid name clashes in the generated column name" <| | ||
table = table_builder [['duplicate_key', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]] | ||
expected = table_builder [['duplicate_key', [10, 20, 30, 40]], ['y', ['a', 'b', 'c', 'd']]] | ||
result = table.replace lookup_table 'duplicate_key' | ||
result . should_equal expected | ||
GregoryTravis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
group_builder.specify "(edge-case) should allow lookup with itself" <| | ||
table = table_builder [['x', [2, 1, 4, 3]], ['y', [20, 10, 40, 30]]] | ||
expected = table_builder [['x', [20, 10, 40, 30]], ['y', [20, 10, 40, 30]]] | ||
result = table.replace table 'x' | ||
result . should_equal expected | ||
Comment on lines
+89
to
+93
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cool :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I got that from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. :D |
||
|
||
group_builder.specify "should fail on null key values in lookup table" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = table_builder [['x', [2, 1, Nothing, 3]], ['z', [20, 10, 40, 30]]] | ||
table.replace lookup_table 'x' . should_fail_with Null_Values_In_Key_Columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like to avoid
IMO having arguments that only make sense in one context but are to be ignored in another is not great design. I remember we used to try really hard to avoid issues like that in the initial designs for
Table
, likeselect_columns
back in the day.What if we replace this with
lookup_table
beingLookup | Map
, whereThen, in 'table' mode we'd write:
and if we do a map:
so there's no ignored arguments like now.
A downside is you have to create the nested
Lookup
atom. This can be done in the GUI by widgets, but it is still a bit problematic as then just dragging a lookup table onto it will not work (until we have 'smart' from conversions that somehow allow to add more arguments whenever an implicit conversion fromTable
toLookup
happens).This downside is quite problematic - so I guess maybe we should just keep it as is... Just wanted to think through, because if possible, I think it would be good to avoid such ignored arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is definitely worth thinking through. The approach in enso is to have methods that do things in multiple ways, often with parameters that are only meaningful in some cases, so this will be keep being a question.
The 'smart' conversion could perhaps work like this:
This would be a non-standard way to use
from
since the type would be (Table -> Integer | Text -> Integer | Text -> Lookup).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be an error to supply a
Map
and alsofrom_column
orto_column
? I think maybe it should, so that would change "ignored" to "not allowed".There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it? I think we try to avoid it whenever possible. What methods already do this?
Yes, I think that is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what I should have said is: I have seen this before (I don't remember where) and I expect we will be tempted again. It's definitely something we should avoid if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now throws an error if from/to_column is used with a Map.