Implement Table.replace for the in-memory backend #8935

Merged: 36 commits, merged on Feb 6, 2024
36 commits
756dfa9
one test
GregoryTravis Jan 30, 2024
58da2fd
hack
GregoryTravis Jan 31, 2024
f2cd94a
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Jan 31, 2024
0501d05
revert hack
GregoryTravis Jan 31, 2024
fcab4e0
use merge
GregoryTravis Jan 31, 2024
f2a58f6
unhack
GregoryTravis Jan 31, 2024
5ed3fed
example
GregoryTravis Jan 31, 2024
8caac00
tests
GregoryTravis Jan 31, 2024
74bc949
tests
GregoryTravis Jan 31, 2024
69753fa
scramble lookup table order in tests
GregoryTravis Jan 31, 2024
4cf1309
duplicate inputs
GregoryTravis Jan 31, 2024
aae2bd2
self-lookup
GregoryTravis Jan 31, 2024
d458ed8
type test
GregoryTravis Jan 31, 2024
f924690
remove incorrect test
GregoryTravis Feb 1, 2024
bf142c0
remove materialize
GregoryTravis Feb 1, 2024
b97841c
db stub
GregoryTravis Feb 1, 2024
b88633e
unused imports, widgets
GregoryTravis Feb 1, 2024
2a02eb0
cleanup
GregoryTravis Feb 1, 2024
2b7ad13
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 1, 2024
8b7c361
rename replace_column to column
GregoryTravis Feb 1, 2024
9a084d0
docs, comment, cleanup
GregoryTravis Feb 1, 2024
64d43f3
comment
GregoryTravis Feb 1, 2024
4acb007
fix docs
GregoryTravis Feb 1, 2024
df35418
convert from Map
GregoryTravis Feb 1, 2024
5ac57ce
changelog
GregoryTravis Feb 1, 2024
d01152d
cleanup
GregoryTravis Feb 1, 2024
69cdb47
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 2, 2024
76368e7
review
GregoryTravis Feb 2, 2024
4ea9c22
review
GregoryTravis Feb 2, 2024
f9aaf71
review
GregoryTravis Feb 5, 2024
7ad9723
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 5, 2024
fe389f8
review
GregoryTravis Feb 5, 2024
645805b
update docs
GregoryTravis Feb 5, 2024
02a0d54
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 6, 2024
80dbc76
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 6, 2024
5a1b6d0
no table_builder
GregoryTravis Feb 6, 2024
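The documented behavior of `Table.replace` in the diff below can be modeled in a few lines of Python. This is a minimal sketch for readers skimming the diff, not the Enso implementation. It makes the following assumptions:

- Tables are plain column dictionaries wrapped in a hypothetical `Table` dataclass.
- A plain `dict` plays the role of an Enso `Map`.
- The exception classes are hypothetical stand-ins for `Missing_Input_Columns`, `Non_Unique_Key`, `Null_Values_In_Key_Columns`, `Unmatched_Rows_In_Lookup` and `Illegal_Argument`.
- Type checks (`No_Common_Type`), negative column indices and `on_problems` handling are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union

ColumnRef = Union[str, int]  # a column name or a 0-based index


# Hypothetical stand-ins for the Enso error types.
class MissingInputColumns(Exception): pass
class NonUniqueKey(Exception): pass
class NullValuesInKeyColumns(Exception): pass
class UnmatchedRowsInLookup(Exception): pass
class IllegalArgument(Exception): pass


@dataclass
class Table:
    columns: Dict[str, List[Any]]  # dict order is the column order

    def resolve(self, ref: ColumnRef) -> str:
        """Turn a column name or non-negative index into a column name."""
        names = list(self.columns)
        if isinstance(ref, int):
            if 0 <= ref < len(names):
                return names[ref]
        elif ref in self.columns:
            return ref
        raise MissingInputColumns(ref)


def replace(table: Table, lookup: Union[Table, dict], column: ColumnRef,
            from_column: Optional[ColumnRef] = None,
            to_column: Optional[ColumnRef] = None,
            allow_unmatched_rows: bool = True) -> Table:
    if isinstance(lookup, dict):
        # A plain mapping becomes a two-column lookup table;
        # naming lookup columns explicitly is then an error.
        if from_column is not None or to_column is not None:
            raise IllegalArgument("from_column/to_column not allowed with a Map")
        lookup = Table({"from": list(lookup), "to": list(lookup.values())})
        from_column, to_column = "from", "to"
    # By default the first lookup column holds old values, the second new ones.
    src = lookup.resolve(0 if from_column is None else from_column)
    dst = lookup.resolve(1 if to_column is None else to_column)
    target = table.resolve(column)

    mapping: Dict[Any, Any] = {}
    duplicated = set()
    for key, value in zip(lookup.columns[src], lookup.columns[dst]):
        if key is None:
            raise NullValuesInKeyColumns(src)
        if key in mapping:
            duplicated.add(key)
        mapping[key] = value

    new_values: List[Any] = []
    for old in table.columns[target]:
        if old in duplicated:
            raise NonUniqueKey(old)  # row matched by several lookup entries
        if old in mapping:
            new_values.append(mapping[old])
        elif allow_unmatched_rows:
            new_values.append(old)  # unmatched rows keep their value
        else:
            raise UnmatchedRowsInLookup(old)
    # Other columns and the row order are left untouched.
    return Table({**table.columns, target: new_values})
```

Only the values in `column` change. As in the in-memory backend, the row order and the other columns are preserved.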
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -611,6 +611,7 @@
`Filter_Condition`.][8865]
- [Added `File_By_Line` type allowing processing a file line by line. New faster
JSON parser based off Jackson.][8719]
- [Implemented `Table.replace` for the in-memory backend.][8935]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -878,6 +879,7 @@
[8816]: https://github.com/enso-org/enso/pull/8816
[8849]: https://github.com/enso-org/enso/pull/8849
[8865]: https://github.com/enso-org/enso/pull/8865
[8935]: https://github.com/enso-org/enso/pull/8935

#### Enso Compiler

@@ -1403,7 +1403,7 @@ type Table
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values, either in the lookup table,
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
@@ -1420,6 +1420,85 @@ type Table
Helpers.ensure_same_connection "table" [self, lookup_table] <|
Lookup_Query_Helper.build_lookup_query self lookup_table key_columns add_new_columns allow_unmatched_rows on_problems

## ALIAS find replace
GROUP Standard.Base.Calculations
ICON join
Replaces values in `column` using `lookup_table` to specify a
mapping from old to new values.

Arguments:
- lookup_table: the table to use as a mapping from old to new values. A
`Map` can also be used here (in which case passing `from_column` or
`to_column` is disallowed and will throw an `Illegal_Argument` error).
- column: the column within `self` to perform the replace on.
- from_column: the column within `lookup_table` to match against `column`
in `self`.
- to_column: the column within `lookup_table` to get new values from.
- allow_unmatched_rows: Specifies how to handle missing rows in the lookup.
If `True` (the default), the unmatched rows are left unchanged. If `False`,
an `Unmatched_Rows_In_Lookup` error is raised.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

? Result Ordering

When operating in-memory, this operation preserves the order of rows
from this table (unlike `join`).
In the Database backend, there are no guarantees related to ordering of
results.

? Error Conditions

- If this table or the lookup table is lacking any of the columns
specified by `from_column`, `to_column`, or `column`, a
`Missing_Input_Columns` error is raised.
- If a single row is matched by multiple entries in the lookup table,
a `Non_Unique_Key` error is raised.
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
`Unmatched_Rows_In_Lookup` error is raised.
- The following problems may be reported according to the `on_problems`
setting:
- If `column` or `from_column` has a floating-point type,
a `Floating_Point_Equality` problem is reported.

> Example
Replace values in column 'x' using a lookup table.

table = Table.new [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']], ['z', ['e', 'f', 'g', 'h']]]
# | x | y | z
# ---+---+---+---
# 0 | 1 | a | e
# 1 | 2 | b | f
# 2 | 3 | c | g
# 3 | 4 | d | h

lookup_table = Table.new [['old_x', [1, 2, 3, 4]], ['new_x', [10, 20, 30, 40]]]
# | old_x | new_x
# ---+-------+-------
# 0 | 1 | 10
# 1 | 2 | 20
# 2 | 3 | 30
# 3 | 4 | 40

result = table.replace lookup_table 'x'
# | x | y | z
# ---+----+---+---
# 0 | 10 | a | e
# 1 | 20 | b | f
# 2 | 30 | c | g
# 3 | 40 | d | h
replace : Table | Map -> (Text | Integer) -> (Text | Integer) -> (Text | Integer) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer)=0 to_column:(Text | Integer)=1 allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning =
_ = [lookup_table, column, from_column, to_column, allow_unmatched_rows, on_problems]
Error.throw (Unsupported_Database_Operation.Error "Table.replace is not implemented yet for the Database backends.")

## ALIAS join by row position
GROUP Standard.Base.Calculations
ICON dataframes_join
115 changes: 114 additions & 1 deletion distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
@@ -1913,7 +1913,7 @@ type Table
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values, either in the lookup table,
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
@@ -1959,6 +1959,112 @@ type Table
java_table = LookupJoin.lookupAndReplace java_keys java_descriptions allow_unmatched_rows java_problem_aggregator
Table.Value java_table

## ALIAS find replace
GROUP Standard.Base.Calculations
ICON join
Replaces values in `column` using `lookup_table` to specify a
mapping from old to new values.

Arguments:
- lookup_table: the table to use as a mapping from old to new values. A
`Map` can also be used here (in which case passing `from_column` or
`to_column` is disallowed and will throw an `Illegal_Argument` error).
- column: the column within `self` to perform the replace on.
- from_column: the column within `lookup_table` to match against `column`
in `self`.
- to_column: the column within `lookup_table` to get new values from.
- allow_unmatched_rows: Specifies how to handle missing rows in the lookup.
If `True` (the default), the unmatched rows are left unchanged. If `False`,
an `Unmatched_Rows_In_Lookup` error is raised.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

? Result Ordering

When operating in-memory, this operation preserves the order of rows
from this table (unlike `join`).
In the Database backend, there are no guarantees related to ordering of
results.

? Error Conditions

- If this table or the lookup table is lacking any of the columns
specified by `from_column`, `to_column`, or `column`, a
`Missing_Input_Columns` error is raised.
- If a single row is matched by multiple entries in the lookup table,
a `Non_Unique_Key` error is raised.
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
`Unmatched_Rows_In_Lookup` error is raised.
- The following problems may be reported according to the `on_problems`
setting:
- If `column` or `from_column` has a floating-point type,
a `Floating_Point_Equality` problem is reported.

> Example
Replace values in column 'x' using a lookup table.

table = Table.new [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']], ['z', ['e', 'f', 'g', 'h']]]
# | x | y | z
# ---+---+---+---
# 0 | 1 | a | e
# 1 | 2 | b | f
# 2 | 3 | c | g
# 3 | 4 | d | h

lookup_table = Table.new [['old_x', [1, 2, 3, 4]], ['new_x', [10, 20, 30, 40]]]
# | old_x | new_x
# ---+-------+-------
# 0 | 1 | 10
# 1 | 2 | 20
# 2 | 3 | 30
# 3 | 4 | 40

result = table.replace lookup_table 'x'
# | x | y | z
# ---+----+---+---
# 0 | 10 | a | e
# 1 | 20 | b | f
# 2 | 30 | c | g
# 3 | 40 | d | h
@column Widget_Helpers.make_column_name_selector
@from_column Widget_Helpers.make_column_name_selector
@to_column Widget_Helpers.make_column_name_selector
replace : Table | Map -> (Text | Integer) -> (Text | Integer | Nothing) -> (Text | Integer | Nothing) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup
Member

Let's add support for a conversion to Table from a Map.

Then we can take a Table and use conversions.

We can then use 0 for the from_column and 1 for to_column.

Contributor Author

I've enough changes in part 2 (Database Table.replace) that I'd like to change to conversion style after that's done.

#8984

replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer | Nothing)=Nothing to_column:(Text | Integer | Nothing)=Nothing allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning =
case lookup_table of
_ : Map ->
if from_column.is_nothing.not || to_column.is_nothing.not then Error.throw (Illegal_Argument.Error "If a Map is provided as the lookup_table, then from_column and to_column should not also be specified.") else
self.replace (map_to_lookup_table lookup_table 'from' 'to') column 'from' 'to' allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems
_ : Table ->
from_column_resolved = from_column.if_nothing 0
to_column_resolved = to_column.if_nothing 1
Comment on lines +2046 to +2047
Member

if we use a Map conversion we can avoid needing this.

selected_lookup_columns = lookup_table.select_columns [from_column_resolved, to_column_resolved]
self.select_columns column . if_not_error <| selected_lookup_columns . if_not_error <|
unique = self.column_naming_helper.create_unique_name_strategy
unique.mark_used (self.column_names)

## We perform a `merge` into `column`, using a duplicate of `column`
as the key column to join with `from_column`.

duplicate_key_column_name = unique.make_unique "duplicate_key"
duplicate_key_column = self.at column . rename duplicate_key_column_name
self_with_duplicate = self.set duplicate_key_column set_mode=Set_Mode.Add

## Create a lookup table with just `to_column` and `from_column`,
renamed to match the base table's `column` and its duplicate,
respectively.
lookup_table_renamed = selected_lookup_columns . rename_columns (Map.from_vector [[from_column_resolved, duplicate_key_column_name], [to_column_resolved, column]])

merged = self_with_duplicate.merge lookup_table_renamed duplicate_key_column_name add_new_columns=False allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems
merged.remove_columns duplicate_key_column_name

## ALIAS join by row position
GROUP Standard.Base.Calculations
ICON dataframes_join
@@ -2701,6 +2807,13 @@ concat_columns column_set all_tables result_type result_row_count on_problems =
sealed_storage = storage_builder.seal
Column.from_storage column_set.name sealed_storage

## PRIVATE
A helper that creates a two-column table from a map.
map_to_lookup_table : Map Any Any -> Text -> Text -> Table
map_to_lookup_table map key_column value_column =
keys_and_values = map.to_vector
Table.new [[key_column, keys_and_values.map .first], [value_column, keys_and_values.map .second]]

Comment on lines +2810 to +2816
Member

we can avoid this if we do the conversion approach.

## PRIVATE
Conversion method to a Table from a Column.
Table.from (that:Column) = that.to_table
@@ -19,7 +19,7 @@ type Data

setup create_connection_fn =
Data.Value (create_connection_fn Nothing)

teardown self = self.connection.close


112 changes: 112 additions & 0 deletions test/Table_Tests/src/Common_Table_Operations/Join/Replace_Spec.enso
@@ -0,0 +1,112 @@
from Standard.Base import all
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

from Standard.Table import all
from Standard.Table.Errors import all

from Standard.Database import all

from Standard.Test_New import all

from project.Common_Table_Operations.Util import run_default_backend
import project.Util

main = run_default_backend add_specs

type Data
Value ~connection

setup create_connection_fn =
Data.Value (create_connection_fn Nothing)

teardown self = self.connection.close


add_specs suite_builder setup =
prefix = setup.prefix
create_connection_fn = setup.create_connection_func
suite_builder.group prefix+"Table.replace" group_builder->
data = Data.setup create_connection_fn

group_builder.teardown <|
data.teardown

table_builder cols =
setup.table_builder cols connection=data.connection

group_builder.specify "should be able to replace values via a lookup table, using from/to column defaults" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x'
result . should_equal expected

group_builder.specify "should be able to replace values via a lookup table, specifying from/to columns" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['d', [4, 5, 6, 7]], ['x', [2, 1, 4, 3]], ['d2', [5, 6, 7, 8]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x' 'x' 'z'
result . should_equal expected

group_builder.specify "should be able to replace values via a lookup table provided as a Map" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = Map.from_vector [[2, 20], [1, 10], [4, 40], [3, 30]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x'
result . should_equal expected

group_builder.specify "should fail with Missing_Input_Columns if the specified columns do not exist" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
table.replace lookup_table 'q' 'x' 'z' . should_fail_with Missing_Input_Columns
table.replace lookup_table 'x' 'q' 'z' . should_fail_with Missing_Input_Columns
table.replace lookup_table 'x' 'x' 'q' . should_fail_with Missing_Input_Columns

group_builder.specify "can allow unmatched rows" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]]
expected = table_builder [['x', [10, 2, 30, 40]], ['y', ['a', 'b', 'c', 'd']]]
result = table.replace lookup_table 'x'
result . should_equal expected

group_builder.specify "fails on unmatched rows" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]]
table.replace lookup_table 'x' allow_unmatched_rows=False . should_fail_with Unmatched_Rows_In_Lookup

group_builder.specify "fails on non-unique keys" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 1, 3]], ['z', [20, 10, 40, 11, 30]]]
table.replace lookup_table 'x' . should_fail_with Non_Unique_Key

group_builder.specify "should avoid name clashes in the (internally) generated column name" <|
table = table_builder [['duplicate_key', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['duplicate_key', [10, 20, 30, 40]], ['y', ['a', 'b', 'c', 'd']]]
result = table.replace lookup_table 'duplicate_key'
result . should_equal expected

group_builder.specify "(edge-case) should allow lookup with itself" <|
table = table_builder [['x', [2, 1, 4, 3]], ['y', [20, 10, 40, 30]]]
expected = table_builder [['x', [20, 10, 40, 30]], ['y', [20, 10, 40, 30]]]
result = table.replace table 'x'
result . should_equal expected
Comment on lines +89 to +93
Member

cool :)

Contributor Author

I got that from Lookup_Spec.enso 😀

Member

:D


group_builder.specify "should not merge columns other than the one specified in the `column` param" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']], ['q', [4, 5, 6, 7, 8]]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]], ['q', [40, 50, 60, 70]]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']], ['q', [4, 5, 6, 7, 8]]]
result = table.replace lookup_table 'x'
result . should_equal expected

group_builder.specify "should fail on null key values in lookup table" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['x', [2, 1, Nothing, 3]], ['z', [20, 10, 40, 30]]]
table.replace lookup_table 'x' . should_fail_with Null_Values_In_Key_Columns

group_builder.specify "should not allow from/to_column to be specified if the argument is a Map" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = Map.from_vector [[2, 20], [1, 10], [4, 40], [3, 30]]
table.replace lookup_table 'x' from_column=8 . should_fail_with Illegal_Argument
table.replace lookup_table 'x' to_column=9 . should_fail_with Illegal_Argument
table.replace lookup_table 'x' from_column=8 to_column=9 . should_fail_with Illegal_Argument
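The in-memory `Table.replace` in this PR does not look values up directly. Instead, it works in four steps:

1. Copy `column` under a freshly generated unique name.
2. Rename the lookup table's `from_column` to that copy's name and `to_column` to `column`.
3. Let `merge` overwrite `column`.
4. Drop the copy.

The Python sketch below models that strategy on plain column dictionaries. It makes the following assumptions:

- `make_unique`, `merge` and `replace_via_merge` are hypothetical helpers, not Enso APIs.
- The `"name 1"` suffix scheme is an assumed stand-in for the Enso unique-name strategy.
- The duplicate-key and null-key checks that the real `merge` performs are left out.

```python
from typing import Any, Dict, List, Set

Columns = Dict[str, List[Any]]  # column name -> values


def make_unique(base: str, used: Set[str]) -> str:
    """Return `base`, or `base` plus a numeric suffix if the name is taken.

    The "name 1", "name 2", ... scheme is an assumption for illustration.
    """
    name, n = base, 1
    while name in used:
        name = f"{base} {n}"
        n += 1
    used.add(name)
    return name


def merge(base: Columns, lookup: Columns, key: str,
          allow_unmatched_rows: bool = True) -> Columns:
    """Overwrite columns of `base` that also exist in `lookup`, matching on `key`.

    Mimics a merge with add_new_columns=False: lookup-only columns are ignored.
    """
    row_of = {k: i for i, k in enumerate(lookup[key])}
    out = {name: list(values) for name, values in base.items()}
    for row, k in enumerate(base[key]):
        if k not in row_of:
            if not allow_unmatched_rows:
                raise LookupError(f"no lookup entry for {k!r}")
            continue  # unmatched row keeps its values
        for name, values in lookup.items():
            if name != key and name in out:
                out[name][row] = values[row_of[k]]
    return out


def replace_via_merge(table: Columns, lookup: Columns, column: str,
                      from_column: str, to_column: str) -> Columns:
    # 1. Copy `column` under a fresh name to act as the join key, so that
    #    `column` itself is free to be overwritten by the merge.
    dup = make_unique("duplicate_key", set(table))
    with_dup = {**table, dup: list(table[column])}
    # 2. Keep only from/to, renamed to the duplicate key and to `column`.
    renamed = {dup: lookup[from_column], column: lookup[to_column]}
    # 3. Merge, then drop the helper key column.
    merged = merge(with_dup, renamed, dup)
    del merged[dup]
    return merged
```

In this model, two edge cases from the tests above both work for the same reason: the helper key column never collides with an existing column name. The first case is a table that already has a `duplicate_key` column. The second is a table used as its own lookup.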