-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Table.replace for the in-memory backend #8935
Changes from all commits
756dfa9
58da2fd
f2cd94a
0501d05
fcab4e0
f2a58f6
5ed3fed
8caac00
74bc949
69753fa
4cf1309
aae2bd2
d458ed8
f924690
bf142c0
b97841c
b88633e
2a02eb0
2b7ad13
8b7c361
9a084d0
64d43f3
4acb007
df35418
5ac57ce
d01152d
69cdb47
76368e7
4ea9c22
f9aaf71
7ad9723
fe389f8
645805b
02a0d54
80dbc76
5a1b6d0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1913,7 +1913,7 @@ type Table | |
- If a column that is being updated from the lookup table has a type | ||
that is not compatible with the type of the corresponding column in | ||
this table, a `No_Common_Type` error is raised. | ||
- If a key column contains `Nothing` values, either in the lookup table, | ||
- If a key column contains `Nothing` values in the lookup table, | ||
a `Null_Values_In_Key_Columns` error is raised. | ||
- If `allow_unmatched_rows` is `False` and there are rows in this table | ||
that do not have a matching row in the lookup table, an | ||
|
@@ -1959,6 +1959,112 @@ type Table | |
java_table = LookupJoin.lookupAndReplace java_keys java_descriptions allow_unmatched_rows java_problem_aggregator | ||
Table.Value java_table | ||
|
||
## ALIAS find replace | ||
GROUP Standard.Base.Calculations | ||
ICON join | ||
Replaces values in `column` using `lookup_table` to specify a | ||
mapping from old to new values. | ||
|
||
Arguments: | ||
- lookup_table: the table to use as a mapping from old to new values. A | ||
`Map` can also be used here (in which case passing `from_column` or | ||
`to_column` is disallowed and will throw an `Illegal_Argument` error. | ||
- column: the column within `self` to perform the replace on. | ||
- from_column: the column within `lookup_table` to match against `column` | ||
in `self`. | ||
- to_column: the column within `lookup_table` to get new values from. | ||
- allow_unmatched_rows: Specifies how to handle missing rows in the lookup. | ||
If `False` (the default), an `Unmatched_Rows_In_Lookup` error is raised. | ||
If `True`, the unmatched rows are left unchanged. Any new columns will | ||
be filled with `Nothing`. | ||
- on_problems: Specifies how to handle problems if they occur, reporting | ||
them as warnings by default. | ||
|
||
? Result Ordering | ||
|
||
When operating in-memory, this operation preserves the order of rows | ||
from this table (unlike `join`). | ||
In the Database backend, there are no guarantees related to ordering of | ||
results. | ||
|
||
? Error Conditions | ||
|
||
- If this table or the lookup table is lacking any of the columns | ||
specified by `from_column`, `to_column`, or `column`, a | ||
`Missing_Input_Columns` error is raised. | ||
- If a single row is matched by multiple entries in the lookup table, | ||
a `Non_Unique_Key` error is raised. | ||
- If a column that is being updated from the lookup table has a type | ||
that is not compatible with the type of the corresponding column in | ||
this table, a `No_Common_Type` error is raised. | ||
- If a key column contains `Nothing` values in the lookup table, | ||
a `Null_Values_In_Key_Columns` error is raised. | ||
- If `allow_unmatched_rows` is `False` and there are rows in this table | ||
that do not have a matching row in the lookup table, an | ||
`Unmatched_Rows_In_Lookup` error is raised. | ||
- The following problems may be reported according to the `on_problems` | ||
setting: | ||
- If any of the `columns` is a floating-point type, | ||
a `Floating_Point_Equality`. | ||
|
||
> Example | ||
Replace values in column 'x' using a lookup table. | ||
|
||
table = Table.new [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']], ['z', ['e', 'f', 'g', 'h']]] | ||
# | x | y | z | ||
# ---+---+---+--- | ||
# 0 | 1 | a | e | ||
# 1 | 2 | b | f | ||
# 2 | 3 | c | g | ||
# 3 | 4 | d | h | ||
|
||
lookup_table = Table.new [['x', [1, 2, 3, 4]], ['new_x', [10, 20, 30, 40]]] | ||
# | old_x | new_x | ||
# ---+-------+------- | ||
# 0 | 1 | 10 | ||
# 1 | 2 | 20 | ||
# 2 | 3 | 30 | ||
# 3 | 4 | 40 | ||
|
||
result = table.replace lookup_table 'x' | ||
# | x | y | z | ||
# ---+----+---+--- | ||
# 0 | 10 | a | e | ||
# 1 | 20 | b | f | ||
# 2 | 30 | c | g | ||
# 3 | 40 | d | h | ||
@column Widget_Helpers.make_column_name_selector | ||
@from_column Widget_Helpers.make_column_name_selector | ||
@to_column Widget_Helpers.make_column_name_selector | ||
replace : Table | Map -> (Text | Integer) -> (Text | Integer | Nothing) -> (Text | Integer | Nothing) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup | ||
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer | Nothing)=Nothing to_column:(Text | Integer | Nothing)=Nothing allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning = | ||
case lookup_table of | ||
_ : Map -> | ||
if from_column.is_nothing.not || to_column.is_nothing.not then Error.throw (Illegal_Argument.Error "If a Map is provided as the lookup_table, then from_column and to_column should not also be specified.") else | ||
self.replace (map_to_lookup_table lookup_table 'from' 'to') column 'from' 'to' allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems | ||
_ : Table -> | ||
from_column_resolved = from_column.if_nothing 0 | ||
to_column_resolved = to_column.if_nothing 1 | ||
Comment on lines
+2046
to
+2047
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we use a Map conversion we can avoid needing this. |
||
selected_lookup_columns = lookup_table.select_columns [from_column_resolved, to_column_resolved] | ||
self.select_columns column . if_not_error <| selected_lookup_columns . if_not_error <| | ||
unique = self.column_naming_helper.create_unique_name_strategy | ||
unique.mark_used (self.column_names) | ||
|
||
## We perform a `merge` into `column`, using a duplicate of `column` | ||
as the key column to join with `from_column`. | ||
|
||
duplicate_key_column_name = unique.make_unique "duplicate_key" | ||
duplicate_key_column = self.at column . rename duplicate_key_column_name | ||
self_with_duplicate = self.set duplicate_key_column set_mode=Set_Mode.Add | ||
|
||
## Create a lookup table with just `to_column` and `from_column`, | ||
renamed to match the base table's `column` and its duplicate, | ||
respectively. | ||
lookup_table_renamed = selected_lookup_columns . rename_columns (Map.from_vector [[from_column_resolved, duplicate_key_column_name], [to_column_resolved, column]]) | ||
|
||
merged = self_with_duplicate.merge lookup_table_renamed duplicate_key_column_name add_new_columns=False allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems | ||
merged.remove_columns duplicate_key_column_name | ||
|
||
## ALIAS join by row position | ||
GROUP Standard.Base.Calculations | ||
ICON dataframes_join | ||
|
@@ -2701,6 +2807,13 @@ concat_columns column_set all_tables result_type result_row_count on_problems = | |
sealed_storage = storage_builder.seal | ||
Column.from_storage column_set.name sealed_storage | ||
|
||
## PRIVATE | ||
A helper that creates a two-column table from a map. | ||
map_to_lookup_table : Map Any Any -> Text -> Text -> Table | ||
map_to_lookup_table map key_column value_column = | ||
keys_and_values = map.to_vector | ||
Table.new [[key_column, keys_and_values.map .first], [value_column, keys_and_values.map .second]] | ||
|
||
Comment on lines
+2810
to
+2816
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we can avoid this if we do the conversion approach. |
||
## PRIVATE | ||
Conversion method to a Table from a Column. | ||
Table.from (that:Column) = that.to_table | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
from Standard.Base import all | ||
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument | ||
|
||
from Standard.Table import all | ||
from Standard.Table.Errors import all | ||
|
||
from Standard.Database import all | ||
|
||
from Standard.Test_New import all | ||
|
||
from project.Common_Table_Operations.Util import run_default_backend | ||
import project.Util | ||
|
||
main = run_default_backend add_specs | ||
|
||
type Data | ||
Value ~connection | ||
|
||
setup create_connection_fn = | ||
Data.Value (create_connection_fn Nothing) | ||
|
||
teardown self = self.connection.close | ||
|
||
|
||
add_specs suite_builder setup = | ||
prefix = setup.prefix | ||
create_connection_fn = setup.create_connection_func | ||
suite_builder.group prefix+"Table.replace" group_builder-> | ||
data = Data.setup create_connection_fn | ||
|
||
group_builder.teardown <| | ||
data.teardown | ||
|
||
table_builder cols = | ||
setup.table_builder cols connection=data.connection | ||
|
||
group_builder.specify "should be able to replace values via a lookup table, using from/to column defaults" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]] | ||
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
result = table.replace lookup_table 'x' | ||
result . should_equal expected | ||
|
||
group_builder.specify "should be able to replace values via a lookup table, specifying from/to columns" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = table_builder [['d', [4, 5, 6, 7]], ['x', [2, 1, 4, 3]], ['d2', [5, 6, 7, 8]], ['z', [20, 10, 40, 30]]] | ||
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
result = table.replace lookup_table 'x' 'x' 'z' | ||
result . should_equal expected | ||
|
||
group_builder.specify "should be able to replace values via a lookup table provided as a Map" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = Map.from_vector [[2, 20], [1, 10], [4, 40], [3, 30]] | ||
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
result = table.replace lookup_table 'x' | ||
result . should_equal expected | ||
|
||
group_builder.specify "should fail with Missing_Input_Columns if the specified columns do not exist" <| | ||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]] | ||
table.replace lookup_table 'q' 'x' 'z' . should_fail_with Missing_Input_Columns | ||
table.replace lookup_table 'x' 'q' 'z' . should_fail_with Missing_Input_Columns | ||
table.replace lookup_table 'x' 'x' 'q' . should_fail_with Missing_Input_Columns | ||
|
||
group_builder.specify "can allow unmatched rows" <| | ||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]] | ||
expected = table_builder [['x', [10, 2, 30, 40]], ['y', ['a', 'b', 'c', 'd']]] | ||
result = table.replace lookup_table 'x' | ||
result . should_equal expected | ||
|
||
group_builder.specify "fails on unmatched rows" <| | ||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]] | ||
table.replace lookup_table 'x' allow_unmatched_rows=False . should_fail_with Unmatched_Rows_In_Lookup | ||
|
||
group_builder.specify "fails on non-unique keys" <| | ||
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 1, 3]], ['z', [20, 10, 40, 11, 30]]] | ||
table.replace lookup_table 'x' . should_fail_with Non_Unique_Key | ||
|
||
group_builder.specify "should avoid name clashes in the (internally) generated column name" <| | ||
table = table_builder [['duplicate_key', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]] | ||
expected = table_builder [['duplicate_key', [10, 20, 30, 40]], ['y', ['a', 'b', 'c', 'd']]] | ||
result = table.replace lookup_table 'duplicate_key' | ||
result . should_equal expected | ||
|
||
group_builder.specify "(edge-case) should allow lookup with itself" <| | ||
table = table_builder [['x', [2, 1, 4, 3]], ['y', [20, 10, 40, 30]]] | ||
expected = table_builder [['x', [20, 10, 40, 30]], ['y', [20, 10, 40, 30]]] | ||
result = table.replace table 'x' | ||
result . should_equal expected | ||
Comment on lines
+89
to
+93
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cool :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I got that from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. :D |
||
|
||
group_builder.specify "should not merge columns other than the one specified in the `column` param" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']], ['q', [4, 5, 6, 7, 8]]] | ||
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]], ['q', [40, 50, 60, 70]]] | ||
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']], ['q', [4, 5, 6, 7, 8]]] | ||
result = table.replace lookup_table 'x' | ||
result . should_equal expected | ||
|
||
group_builder.specify "should fail on null key values in lookup table" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = table_builder [['x', [2, 1, Nothing, 3]], ['z', [20, 10, 40, 30]]] | ||
table.replace lookup_table 'x' . should_fail_with Null_Values_In_Key_Columns | ||
|
||
group_builder.specify "should not allow from/to_coumn to specified if the argument is a Map" <| | ||
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]] | ||
lookup_table = Map.from_vector [[2, 20], [1, 10], [4, 40], [3, 30]] | ||
table.replace lookup_table 'x' from_column=8 . should_fail_with Illegal_Argument | ||
table.replace lookup_table 'x' to_column=9 . should_fail_with Illegal_Argument | ||
table.replace lookup_table 'x' from_column=8 to_column=9 . should_fail_with Illegal_Argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add support for
to Table
from aMap
.Then we can take a Table and use conversions.
We can then use 0 for the
from_column
and 1 forto_column
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've enough changes in part 2 (Database Table.replace) that I'd like to change to conversion style after that's done.
#8984