Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Table.replace for the in-memory backend #8935

Merged
merged 36 commits into from
Feb 6, 2024
Merged
Show file tree
Hide file tree
Changes from 26 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
756dfa9
one test
GregoryTravis Jan 30, 2024
58da2fd
hack
GregoryTravis Jan 31, 2024
f2cd94a
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Jan 31, 2024
0501d05
revert hack
GregoryTravis Jan 31, 2024
fcab4e0
use merge
GregoryTravis Jan 31, 2024
f2a58f6
unhack
GregoryTravis Jan 31, 2024
5ed3fed
example
GregoryTravis Jan 31, 2024
8caac00
tests
GregoryTravis Jan 31, 2024
74bc949
tests
GregoryTravis Jan 31, 2024
69753fa
scramble lookup table order in tests
GregoryTravis Jan 31, 2024
4cf1309
duplicate inputs
GregoryTravis Jan 31, 2024
aae2bd2
self-lookup
GregoryTravis Jan 31, 2024
d458ed8
type test
GregoryTravis Jan 31, 2024
f924690
remove incorrect test
GregoryTravis Feb 1, 2024
bf142c0
remove materialize
GregoryTravis Feb 1, 2024
b97841c
db stub
GregoryTravis Feb 1, 2024
b88633e
unused imports, widgets
GregoryTravis Feb 1, 2024
2a02eb0
cleanup
GregoryTravis Feb 1, 2024
2b7ad13
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 1, 2024
8b7c361
rename replace_column to column
GregoryTravis Feb 1, 2024
9a084d0
docs, comment, cleanup
GregoryTravis Feb 1, 2024
64d43f3
comment
GregoryTravis Feb 1, 2024
4acb007
fix docs
GregoryTravis Feb 1, 2024
df35418
convert from Map
GregoryTravis Feb 1, 2024
5ac57ce
changelog
GregoryTravis Feb 1, 2024
d01152d
cleanup
GregoryTravis Feb 1, 2024
69cdb47
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 2, 2024
76368e7
review
GregoryTravis Feb 2, 2024
4ea9c22
review
GregoryTravis Feb 2, 2024
f9aaf71
review
GregoryTravis Feb 5, 2024
7ad9723
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 5, 2024
fe389f8
review
GregoryTravis Feb 5, 2024
645805b
update docs
GregoryTravis Feb 5, 2024
02a0d54
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 6, 2024
80dbc76
Merge branch 'develop' into wip/gmt/8578-Table.replace
GregoryTravis Feb 6, 2024
5a1b6d0
no table_builder
GregoryTravis Feb 6, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -611,6 +611,7 @@
`Filter_Condition`.][8865]
- [Added `File_By_Line` type allowing processing a file line by line. New faster
JSON parser based off Jackson.][8719]
- [Implemented `Table.replace` for the in-memory backend.][8935]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
Expand Down Expand Up @@ -878,6 +879,7 @@
[8816]: https://github.com/enso-org/enso/pull/8816
[8849]: https://github.com/enso-org/enso/pull/8849
[8865]: https://github.com/enso-org/enso/pull/8865
[8935]: https://github.com/enso-org/enso/pull/8935

#### Enso Compiler

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1403,7 +1403,7 @@ type Table
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values, either in the lookup table,
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
Expand All @@ -1420,6 +1420,85 @@ type Table
Helpers.ensure_same_connection "table" [self, lookup_table] <|
Lookup_Query_Helper.build_lookup_query self lookup_table key_columns add_new_columns allow_unmatched_rows on_problems

## ALIAS find replace
GROUP Standard.Base.Calculations
ICON join
Replaces values in `column` using `lookup_table` to specify a
mapping from old to new values.

Arguments:
- lookup_table: the table to use as a mapping from old to new values. A
`Map` can also be used here (in which case `from_column` and
`to_column` are ignored).
- column: the column within `self` to perform the replace on.
- from_column: the column within `lookup_table` to match against `column`
in `self`.
- to_column: the column within `lookup_table` to get new values from.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to avoid

(in which case from_column and to_column are ignored

IMO having arguments that only make sense in one context but are to be ignored in another is not great design. I remember we used to try really hard to avoid issues like that in the initial designs for Table, like select_columns back in the day.

What if we replace this with lookup_table being Lookup | Map, where

type Lookup
    Table table (from : Integer | Text) (to : Integer | Text)

Then, in 'table' mode we'd write:

src_table.replace (Lookup.Table lookup "from" "to")

and if we do a map:

src_table.replace map

so there's no ignored arguments like now.


A downside is you have to create the nested Lookup atom. This can be done in the GUI by widgets, but it is still a bit problematic as then just dragging a lookup table onto it will not work (until we have 'smart' from conversions that somehow allow to add more arguments whenever an implicit conversion from Table to Lookup happens).


This downside is quite problematic - so I guess maybe we should just keep it as is... Just wanted to think through, because if possible, I think it would be good to avoid such ignored arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is definitely worth thinking through. The approach in enso is to have methods that do things in multiple ways, often with parameters that are only meaningful in some cases, so this will be keep being a question.

The 'smart' conversion could perhaps work like this:

type Lookup
    Table table (from : Integer | Text) (to : Integer | Text)
    Map (map: Map)

Lookup.from (table:Table) = Table.table # arguments are not supplied

This would be a non-standard way to use from since the type would be (Table -> Integer | Text -> Integer | Text -> Lookup).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be an error to supply a Map and also from_column or to_column? I think maybe it should, so that would change "ignored" to "not allowed".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach in enso is to have methods that do things in multiple ways, often with parameters that are only meaningful in some cases, so this will be keep being a question.

Is it? I think we try to avoid it whenever possible. What methods already do this?

Should it be an error to supply a Map and also from_column or to_column? I think maybe it should, so that would change "ignored" to "not allowed".

Yes, I think that is a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach in enso is to have methods that do things in multiple ways, often with parameters that are only meaningful in some cases, so this will be keep being a question.

Is it? I think we try to avoid it whenever possible. What methods already do this?

I guess what I should have said is: I have seen this before (I don't remember where) and I expect we will be tempted again. It's definitely something we should avoid if we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now throws an error if from/to_column is used with a Map.

- allow_unmatched_rows: Specifies how to handle missing rows in the lookup.
If `False` (the default), an `Unmatched_Rows_In_Lookup` error is raised.
If `True`, the unmatched rows are left unchanged. Any new columns will
be filled with `Nothing`.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

? Result Ordering

When operating in-memory, this operation preserves the order of rows
from this table (unlike `join`).
In the Database backend, there are no guarantees related to ordering of
results.

? Error Conditions

- If this table or the lookup table is lacking any of the columns
specified by `from_column`, `to_column`, or `column`, a
`Missing_Input_Columns` error is raised.
- If a single row is matched by multiple entries in the lookup table,
a `Non_Unique_Key` error is raised.
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
`Unmatched_Rows_In_Lookup` error is raised.
- The following problems may be reported according to the `on_problems`
setting:
- If any of the `columns` is a floating-point type,
a `Floating_Point_Equality`.

> Example
Replace values in column 'x' using a lookup table.

table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']], ['z', ['e', 'f', 'g', 'h']]]
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved
# | x | y | z
# ---+---+---+---
# 0 | 1 | a | e
# 1 | 2 | b | f
# 2 | 3 | c | g
# 3 | 4 | d | h

lookup_table = table_builder [['x', [1, 2, 3, 4]], ['new_x', [10, 20, 30, 40]]]
# | old_x | new_x
# ---+-------+-------
# 0 | 1 | 10
# 1 | 2 | 20
# 2 | 3 | 30
# 3 | 4 | 40

result = table.replace lookup_table 'x'
# | x | y | z
# ---+----+---+---
# 0 | 10 | a | e
# 1 | 20 | b | f
# 2 | 30 | c | g
# 3 | 40 | d | h
replace : Table | Map -> (Text | Integer) -> (Text | Integer) -> (Text | Integer) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer)=0 to_column:(Text | Integer)=1 allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning =
_ = [lookup_table, column, from_column, to_column, allow_unmatched_rows, on_problems]
Error.throw (Unsupported_Database_Operation.Error "Table.replace is not implemented yet for the Database backends.")

## ALIAS join by row position
GROUP Standard.Base.Calculations
ICON dataframes_join
Expand Down
111 changes: 110 additions & 1 deletion distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
Original file line number Diff line number Diff line change
Expand Up @@ -1913,7 +1913,7 @@ type Table
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values, either in the lookup table,
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
Expand Down Expand Up @@ -1959,6 +1959,109 @@ type Table
java_table = LookupJoin.lookupAndReplace java_keys java_descriptions allow_unmatched_rows java_problem_aggregator
Table.Value java_table

## ALIAS find replace
GROUP Standard.Base.Calculations
ICON join
Replaces values in `column` using `lookup_table` to specify a
mapping from old to new values.

Arguments:
- lookup_table: the table to use as a mapping from old to new values. A
`Map` can also be used here (in which case `from_column` and
`to_column` are ignored).
- column: the column within `self` to perform the replace on.
- from_column: the column within `lookup_table` to match against `column`
in `self`.
- to_column: the column within `lookup_table` to get new values from.
- allow_unmatched_rows: Specifies how to handle missing rows in the lookup.
If `False` (the default), an `Unmatched_Rows_In_Lookup` error is raised.
If `True`, the unmatched rows are left unchanged. Any new columns will
be filled with `Nothing`.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

? Result Ordering

When operating in-memory, this operation preserves the order of rows
from this table (unlike `join`).
In the Database backend, there are no guarantees related to ordering of
results.

? Error Conditions

- If this table or the lookup table is lacking any of the columns
specified by `from_column`, `to_column`, or `column`, a
`Missing_Input_Columns` error is raised.
- If a single row is matched by multiple entries in the lookup table,
a `Non_Unique_Key` error is raised.
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
`Unmatched_Rows_In_Lookup` error is raised.
- The following problems may be reported according to the `on_problems`
setting:
- If any of the `columns` is a floating-point type,
a `Floating_Point_Equality`.

> Example
Replace values in column 'x' using a lookup table.

table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']], ['z', ['e', 'f', 'g', 'h']]]
# | x | y | z
# ---+---+---+---
# 0 | 1 | a | e
# 1 | 2 | b | f
# 2 | 3 | c | g
# 3 | 4 | d | h

lookup_table = table_builder [['x', [1, 2, 3, 4]], ['new_x', [10, 20, 30, 40]]]
# | old_x | new_x
# ---+-------+-------
# 0 | 1 | 10
# 1 | 2 | 20
# 2 | 3 | 30
# 3 | 4 | 40

result = table.replace lookup_table 'x'
# | x | y | z
# ---+----+---+---
# 0 | 10 | a | e
# 1 | 20 | b | f
# 2 | 30 | c | g
# 3 | 40 | d | h
@column Widget_Helpers.make_column_name_selector
@from_column Widget_Helpers.make_column_name_selector
@to_column Widget_Helpers.make_column_name_selector
replace : Table | Map -> (Text | Integer) -> (Text | Integer) -> (Text | Integer) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer)=0 to_column:(Text | Integer)=1 allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning =
case lookup_table of
_ : Map ->
self.replace (map_to_lookup_table lookup_table 'from' 'to') column 'from' 'to' allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems
_ : Table ->
self.select_columns column . if_not_error <| lookup_table.select_columns [from_column, to_column] . if_not_error <|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do this:

Suggested change
self.select_columns column . if_not_error <| lookup_table.select_columns [from_column, to_column] . if_not_error <|
selected_lookup_columns = lookup_table.select_columns [from_column, to_column]
self.select_columns column . if_not_error <| selected_lookup_columns . if_not_error <|

then you can use selected_lookup_columns instead of running the select twice for no need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

unique = self.column_naming_helper.create_unique_name_strategy
unique.mark_used (self.column_names)
unique.mark_used (lookup_table.column_names)
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved

## We perform a `merge` into `column`, using a duplicate of `column`
as the key column to join with `from_column`.

duplicate_key_column_name = unique.make_unique "duplicate_key"
duplicate_key_column = self.at column . rename duplicate_key_column_name
self_with_duplicate = Table.new ([duplicate_key_column] + self.columns)
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved

## Create a lookup table with just `to_column` and `from_column`,
renamed to match the base table's `column` and its duplicate,
respectively.
lookup_table_renamed = Table.new [lookup_table.at from_column . rename duplicate_key_column_name, lookup_table.at to_column . rename column]
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved

merged = self_with_duplicate.merge lookup_table_renamed duplicate_key_column_name add_new_columns=False allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems
merged.remove_columns duplicate_key_column_name

## ALIAS join by row position
GROUP Standard.Base.Calculations
ICON dataframes_join
Expand Down Expand Up @@ -2701,6 +2804,12 @@ concat_columns column_set all_tables result_type result_row_count on_problems =
sealed_storage = storage_builder.seal
Column.from_storage column_set.name sealed_storage

## PRIVATE
A helper that creates a two-column table from a map.
map_to_lookup_table : Map Any Any -> Text -> Text -> Table
map_to_lookup_table map key_column value_column =
Table.new [[key_column, map.keys], [value_column, map.values]]
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved

## PRIVATE
Conversion method to a Table from a Column.
Table.from (that:Column) = that.to_table
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ type Data

setup create_connection_fn =
Data.Value (create_connection_fn Nothing)

teardown self = self.connection.close


Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
from Standard.Base import all
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

from Standard.Table import all
from Standard.Table.Errors import all

from Standard.Database import all

from Standard.Test_New import all

from project.Common_Table_Operations.Util import run_default_backend
import project.Util

main = run_default_backend add_specs

type Data
Value ~connection

setup create_connection_fn =
Data.Value (create_connection_fn Nothing)

teardown self = self.connection.close


add_specs suite_builder setup =
prefix = setup.prefix
create_connection_fn = setup.create_connection_func
suite_builder.group prefix+"Table.replace" group_builder->
data = Data.setup create_connection_fn

group_builder.teardown <|
data.teardown

table_builder cols =
setup.table_builder cols connection=data.connection

group_builder.specify "should be able to replace values via a lookup table, using from/to column defaults" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x'
result . should_equal expected

group_builder.specify "should be able to replace values via a lookup table, specifying from/to columns" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x' 'x' 'z'
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved
result . should_equal expected

group_builder.specify "should be able to replace values via a lookup table provided as a Map" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = Map.from_vector [[2, 20], [1, 10], [4, 40], [3, 30]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x'
result . should_equal expected

group_builder.specify "should fail with Illegal_Argument if the specified columns do not exist" <|
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
table.replace lookup_table 'q' 'x' 'z' . should_fail_with Missing_Input_Columns
table.replace lookup_table 'x' 'q' 'z' . should_fail_with Missing_Input_Columns
table.replace lookup_table 'x' 'x' 'q' . should_fail_with Missing_Input_Columns

group_builder.specify "can allow unmatched rows" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]]
expected = table_builder [['x', [10, 2, 30, 40]], ['y', ['a', 'b', 'c', 'd']]]
result = table.replace lookup_table 'x'
result . should_equal expected

group_builder.specify "fails on unmatched rows" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]]
table.replace lookup_table 'x' allow_unmatched_rows=False . should_fail_with Unmatched_Rows_In_Lookup

group_builder.specify "fails on non-unique keys" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 1, 3]], ['z', [20, 10, 40, 11, 30]]]
table.replace lookup_table 'x' . should_fail_with Non_Unique_Key

group_builder.specify "should avoid name clashes in the generated column name" <|
table = table_builder [['duplicate_key', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['duplicate_key', [10, 20, 30, 40]], ['y', ['a', 'b', 'c', 'd']]]
result = table.replace lookup_table 'duplicate_key'
result . should_equal expected
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved

group_builder.specify "(edge-case) should allow lookup with itself" <|
table = table_builder [['x', [2, 1, 4, 3]], ['y', [20, 10, 40, 30]]]
expected = table_builder [['x', [20, 10, 40, 30]], ['y', [20, 10, 40, 30]]]
result = table.replace table 'x'
result . should_equal expected
Comment on lines +89 to +93
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got that from Lookup_Spec.enso 😀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D


group_builder.specify "should fail on null key values in lookup table" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['x', [2, 1, Nothing, 3]], ['z', [20, 10, 40, 30]]]
table.replace lookup_table 'x' . should_fail_with Null_Values_In_Key_Columns
Loading