Commit

Implement Table.lookup_and_replace for in-memory (#7979)
- Closes #7749 by implementing the in-memory logic.
- Additional complications have surfaced regarding the Database logic, so it has been split off into a separate ticket: #7981
radeusgd authored Oct 10, 2023
1 parent a234e82 commit 6e0bd86
Showing 20 changed files with 900 additions and 68 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -582,6 +582,7 @@
- [Implemented `Table.auto_value_types` for in-memory tables.][7908]
- [Implemented Text.substring to easily select part of a Text field][7913]
- [Implemented basic XML support][7947]
- [Implemented `Table.lookup_and_replace` for the in-memory backend.][7979]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -828,6 +829,7 @@
[7908]: https://github.com/enso-org/enso/pull/7908
[7913]: https://github.com/enso-org/enso/pull/7913
[7947]: https://github.com/enso-org/enso/pull/7947
[7979]: https://github.com/enso-org/enso/pull/7979

#### Enso Compiler

58 changes: 58 additions & 0 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Table.enso
@@ -24,6 +24,7 @@ import Standard.Table.Data.Type.Value_Type_Helpers
import Standard.Table.Internal.Add_Row_Number
import Standard.Table.Internal.Aggregate_Column_Helper
import Standard.Table.Internal.Column_Naming_Helper.Column_Naming_Helper
import Standard.Table.Internal.Lookup_Helpers
import Standard.Table.Internal.Problem_Builder.Problem_Builder
import Standard.Table.Internal.Table_Helpers
import Standard.Table.Internal.Table_Helpers.Table_Column_Helper
@@ -1313,6 +1314,63 @@ type Table
on_problems.attach_problems_before limit_problems <|
self.join_or_cross_join right join_kind=Join_Kind_Cross.Cross on=[] right_prefix on_problems

## Replaces values in this table by values from a lookup table.
New values are looked up in the lookup table based on the `key_columns`.
Columns of this table that are also present in the lookup table have their
values replaced by the matching lookup values. Other columns are left unchanged.
This operation is similar to `Table.update_rows`, but just returns a new
`Table` instance, instead of updating the table in-place (which is only
possible for Database tables).

Arguments:
- lookup_table: The table to use for looking up values.
- key_columns: Specifies the columns to use for correlating rows between
the two tables. Must identify values uniquely within `lookup_table`.
- add_new_columns: Specifies if new columns from the lookup table should
be added to the result. If `False`, an `Unexpected_Extra_Columns`
problem is reported.
- allow_unmatched_rows: Specifies how to handle missing rows in the lookup.
If `False`, an `Unmatched_Rows_In_Lookup` error is raised.
If `True` (the default), the unmatched rows are left unchanged. Any new
columns will be filled with `Nothing` for those rows.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

? Result Ordering

When operating in-memory, this operation preserves the order of rows
from this table (unlike `join`).
In the Database backend, there are no guarantees related to ordering of
results.

? Error Conditions

- If this table or the lookup table is lacking any of the columns
specified in `key_columns`, a `Missing_Input_Columns` error is raised.
- If an empty vector is provided for `key_columns`, a
`No_Input_Columns_Selected` error is raised.
- If the lookup table contains multiple rows with the same values in
the `key_columns`, a `Non_Unique_Key` error is raised.
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
`Unmatched_Rows_In_Lookup` error is raised.
- The following problems may be reported according to the `on_problems`
setting:
- If any of the `key_columns` is a floating-point type,
a `Floating_Point_Equality`.
- If `add_new_columns` is `False` and the lookup table has columns
that are not present in this table, an `Unexpected_Extra_Columns`.
@key_columns Widget_Helpers.make_column_name_vector_selector
lookup_and_replace : Table -> (Vector (Integer | Text | Regex) | Text | Integer | Regex) -> Boolean -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup
lookup_and_replace self lookup_table:Table key_columns:(Vector (Integer | Text | Regex) | Text | Integer | Regex) add_new_columns:Boolean=True allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning =
_ = [lookup_table, key_columns, add_new_columns, allow_unmatched_rows, on_problems]
Error.throw (Unsupported_Database_Operation.Error "Table.lookup_and_replace is not implemented yet for the Database backends.")
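
The documented semantics can be illustrated with a small model. This is a hedged Python sketch, not the Enso implementation: tables are modeled as lists of dicts, the function and the problem tuples are hypothetical, and the type-compatibility and floating-point checks are left out. Treating extra lookup columns as ignored when `add_new_columns` is `False` is an assumption.

```python
def lookup_and_replace(rows, lookup_rows, key_columns,
                       add_new_columns=True, allow_unmatched_rows=True):
    """Toy model: each table is a list of dicts that all share the same keys.

    Returns (new_rows, problems); neither input is mutated.
    """
    if not key_columns:
        raise ValueError("No_Input_Columns_Selected")

    # Index the lookup by key, enforcing the documented key constraints.
    index = {}
    for lookup_row in lookup_rows:
        # A KeyError here plays the role of Missing_Input_Columns.
        key = tuple(lookup_row[c] for c in key_columns)
        if any(value is None for value in key):
            raise ValueError("Null_Values_In_Key_Columns")
        if key in index:
            raise ValueError("Non_Unique_Key")
        index[key] = lookup_row

    problems = []
    base_columns = set(rows[0]) if rows else set()
    lookup_columns = list(lookup_rows[0]) if lookup_rows else []
    extra_columns = [c for c in lookup_columns if c not in base_columns]
    if extra_columns and not add_new_columns:
        problems.append(("Unexpected_Extra_Columns", extra_columns))
        extra_columns = []  # assumption: the extra columns are then ignored

    result = []
    for row in rows:  # iterating `rows` in order preserves this table's row order
        key = tuple(row[c] for c in key_columns)
        match = index.get(key)
        if match is None and not allow_unmatched_rows:
            raise ValueError("Unmatched_Rows_In_Lookup")
        new_row = dict(row)
        for column in lookup_columns:
            if column in base_columns or column in extra_columns:
                if match is not None:
                    new_row[column] = match[column]
                else:
                    # Existing values stay; new columns are filled with None.
                    new_row.setdefault(column, None)
        result.append(new_row)
    return result, problems


orders = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}, {"id": 3, "status": "new"}]
updates = [{"id": 3, "status": "cancelled", "note": "refund"},
           {"id": 1, "status": "shipped", "note": None}]
result, problems = lookup_and_replace(orders, updates, ["id"])
```

In this model the row with `id` 2 has no match, so it keeps its `status` and gets `None` in the new `note` column, while the output keeps the row order of `orders` rather than that of `updates`.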

## ALIAS join by row position
GROUP Standard.Base.Calculations
Joins two tables by zipping rows from both tables together - the
31 changes: 0 additions & 31 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Errors.enso
@@ -147,25 +147,6 @@ type Table_Already_Exists
to_display_text : Text
to_display_text self = "Table " + self.table_name.pretty + " already exists in the database."

type Non_Unique_Primary_Key
## PRIVATE
Indicates that the columns selected for the primary key do not uniquely
identify rows in the table.

Arguments:
- primary_key: The primary key that is not unique.
- clashing_primary_key: The values of an example key that corresponds to
more than one row.
- clashing_example_row_count: The number of rows that correspond to the
example key.
Error (primary_key : Vector Text) (clashing_primary_key : Vector Any) (clashing_example_row_count : Integer)

## PRIVATE
Pretty print the non-unique primary key error.
to_display_text : Text
to_display_text self =
"The primary key " + self.primary_key.to_display_text + " is not unique. The key "+self.clashing_primary_key.to_display_text+" corresponds to "+self.clashing_example_row_count.to_text+" rows."

type Unmatched_Rows
## PRIVATE
Indicates that the `Update` operation encountered input rows that did not
@@ -202,18 +183,6 @@ type Multiple_Target_Rows_Matched_For_Update
to_display_text self =
"The update operation encountered input rows that matched multiple rows in the target table (for example, the key " + self.example_key.to_display_text + " matched " + self.example_count.to_text + " rows). The operation has been rolled back. You may need to use a more specific key for matching."

type Null_Values_In_Key_Columns
## PRIVATE
Indicates that the source table contained NULL values in key columns.
Rows containing NULL values as part of their key will not be correctly
correlated with target rows due to how NULL equality works in SQL.
Error (example_row : Vector Any)

## PRIVATE
to_display_text : Text
to_display_text self =
"The update operation encountered input rows that contained NULL values in key columns (for example, the row " + self.example_row.to_display_text + "). The operation has been rolled back. Due to how NULL equality works in SQL, these rows would not be correctly matched to the target rows. Please use a key that does not contain NULLs."
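
The NULL-equality issue mentioned in this message can be reproduced outside Enso. The following is a small Python sketch using the standard `sqlite3` module; the `source` and `target` tables and their columns are made up for illustration. In SQL, `NULL = NULL` does not evaluate to true, so rows keyed by NULL never join to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (k INTEGER, v TEXT)")
conn.execute("CREATE TABLE source (k INTEGER, v TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old-1"), (None, "old-null")])
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "new-1"), (None, "new-null")])

# Correlate source rows with target rows by key, as an update would.
matched = conn.execute(
    "SELECT s.v, t.v FROM source s JOIN target t ON s.k = t.k ORDER BY s.v"
).fetchall()

# The comparison itself yields NULL (None in Python), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(matched)           # only the k = 1 pair is matched
print(null_equals_null)  # None
```

The row with a NULL key in `source` silently fails to match its counterpart in `target`, which is why such keys are rejected up front.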

type Unsupported_Database_Encoding
## PRIVATE
A warning indicating that the encoding inferred to be used by the
Original file line number Diff line number Diff line change
@@ -38,7 +38,7 @@ from project.Internal.Upload_Table import all
- If the provided primary key columns are not present in the source table,
a `Missing_Input_Columns` error is raised.
- If the selected primary key columns are not unique, a
`Non_Unique_Primary_Key` error is raised.
`Non_Unique_Key` error is raised.
- An `SQL_Error` may be reported if there is a failure on the database
side.

@@ -55,7 +55,7 @@ from project.Internal.Upload_Table import all
More expensive checks, like clashing keys, are performed only on a sample of
rows, so errors may still occur when the output action is enabled.
@primary_key Widget_Helpers.make_column_name_vector_selector
Table.select_into_database_table : Connection -> Text -> Vector Text | Nothing -> Boolean -> Problem_Behavior -> Table ! Table_Already_Exists | Inexact_Type_Coercion | Missing_Input_Columns | Non_Unique_Primary_Key | SQL_Error | Illegal_Argument
Table.select_into_database_table : Connection -> Text -> Vector Text | Nothing -> Boolean -> Problem_Behavior -> Table ! Table_Already_Exists | Inexact_Type_Coercion | Missing_Input_Columns | Non_Unique_Key | SQL_Error | Illegal_Argument
Table.select_into_database_table self connection (table_name : Text) primary_key=[self.columns.first.name] temporary=False on_problems=Problem_Behavior.Report_Warning =
select_into_table_implementation self connection table_name primary_key temporary on_problems

@@ -105,7 +105,7 @@ Table.select_into_database_table self connection (table_name : Text) primary_key
corresponding rows in the target table, an `Unmatched_Rows` error is
raised.
- If the source table contains multiple rows for the same key, a
`Non_Unique_Primary_Key` error is raised.
`Non_Unique_Key` error is raised.
- If a row in the source table matches multiple rows in the target table, a
`Multiple_Target_Rows_Matched_For_Update` error is raised.
- If another database error occurs, an `SQL_Error` is raised.
Original file line number Diff line number Diff line change
@@ -38,7 +38,7 @@ from project.Internal.Upload_Table import all
- If the provided primary key columns are not present in the source table,
a `Missing_Input_Columns` error is raised.
- If the selected primary key columns are not unique, a
`Non_Unique_Primary_Key` error is raised.
`Non_Unique_Key` error is raised.
- An `SQL_Error` may be reported if there is a failure on the database
side.

@@ -55,7 +55,7 @@ from project.Internal.Upload_Table import all
More expensive checks, like clashing keys, are performed only on a sample of
rows, so errors may still occur when the output action is enabled.
@primary_key Widget_Helpers.make_column_name_vector_selector
Table.select_into_database_table : Connection -> Text -> Vector Text | Nothing -> Boolean -> Problem_Behavior -> Database_Table ! Table_Already_Exists | Inexact_Type_Coercion | Missing_Input_Columns | Non_Unique_Primary_Key | SQL_Error | Illegal_Argument
Table.select_into_database_table : Connection -> Text -> Vector Text | Nothing -> Boolean -> Problem_Behavior -> Database_Table ! Table_Already_Exists | Inexact_Type_Coercion | Missing_Input_Columns | Non_Unique_Key | SQL_Error | Illegal_Argument
Table.select_into_database_table self connection (table_name : Text) primary_key=[self.columns.first.name] temporary=False on_problems=Problem_Behavior.Report_Warning =
select_into_table_implementation self connection table_name primary_key temporary on_problems

@@ -96,7 +96,7 @@ Table.select_into_database_table self connection (table_name : Text) primary_key
corresponding rows in the target table, an `Unmatched_Rows` error is
raised.
- If the source table contains multiple rows for the same key, a
`Non_Unique_Primary_Key` error is raised.
`Non_Unique_Key` error is raised.
- If a row in the source table matches multiple rows in the target table, a
`Multiple_Target_Rows_Matched_For_Update` error is raised.
- If another database error occurs, an `SQL_Error` is raised.
@@ -123,7 +123,7 @@ Table.select_into_database_table self connection (table_name : Text) primary_key
Table.update_rows : Database_Table | Table -> Update_Action -> Vector Text | Nothing -> Boolean -> Problem_Behavior -> Database_Table ! Table_Not_Found | Unmatched_Columns | Missing_Input_Columns | Column_Type_Mismatch | SQL_Error | Illegal_Argument
Table.update_rows self (source_table : Database_Table | Table) (update_action : Update_Action = Update_Action.Update_Or_Insert) (key_columns : Vector | Nothing = Nothing) (error_on_missing_columns : Boolean = False) (on_problems : Problem_Behavior = Problem_Behavior.Report_Warning) =
_ = [source_table, update_action, key_columns, error_on_missing_columns, on_problems]
Error.throw (Illegal_Argument.Error "Table.update_rows modifies the underlying table, so it is only supported for Database tables - in-memory tables are immutable.")
Error.throw (Illegal_Argument.Error "Table.update_rows modifies the underlying table, so it is only supported for Database tables - in-memory tables are immutable. Consider using `join` or `lookup_and_replace` for a similar operation that creates a new Table instead.")

## GROUP Standard.Base.Output
Removes rows from a database table.
Original file line number Diff line number Diff line change
@@ -174,30 +174,30 @@ resolve_primary_key structure primary_key = case primary_key of
converted into a proper error in an outer layer.

The special handling is needed, because computing the
`Non_Unique_Primary_Key` error may need to perform a SQL query that must be
`Non_Unique_Key` error may need to perform a SQL query that must be
run outside of the just-failed transaction.
internal_translate_known_upload_errors source_table connection primary_key ~action =
handler caught_panic =
error_mapper = connection.dialect.get_error_mapper
sql_error = caught_panic.payload
case error_mapper.is_primary_key_violation sql_error of
True -> Panic.throw (Non_Unique_Primary_Key_Recipe.Recipe source_table primary_key caught_panic)
True -> Panic.throw (Non_Unique_Key_Recipe.Recipe source_table primary_key caught_panic)
False -> Panic.throw caught_panic
Panic.catch SQL_Error action handler

## PRIVATE
handle_upload_errors ~action =
Panic.catch Non_Unique_Primary_Key_Recipe action caught_panic->
Panic.catch Non_Unique_Key_Recipe action caught_panic->
recipe = caught_panic.payload
raise_duplicated_primary_key_error recipe.source_table recipe.primary_key recipe.original_panic
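
The two-layer handling described above (throw a lightweight "recipe" inside the failed transaction, build the detailed error outside it) can be sketched as follows. This is a Python model under stated assumptions, not the Enso code: every class and function name is a hypothetical mirror of the Enso helpers, and the transaction is reduced to a module-level flag.

```python
from collections import Counter
from contextlib import contextmanager

in_transaction = False  # toy stand-in for real connection state


@contextmanager
def transaction():
    global in_transaction
    in_transaction = True
    try:
        yield
    finally:
        in_transaction = False  # commit or rollback: either way it has ended


class PrimaryKeyViolation(Exception):
    """Stand-in for the SQL error a database reports on a duplicate key."""


class NonUniqueKeyRecipe(Exception):
    """Carries only what is needed to build the detailed error later."""

    def __init__(self, source_rows, primary_key):
        super().__init__("non-unique key, details deferred")
        self.source_rows = source_rows
        self.primary_key = primary_key


class NonUniqueKey(Exception):
    def __init__(self, primary_key, example_key, example_count):
        super().__init__(f"key {example_key} matches {example_count} rows")
        self.primary_key = primary_key
        self.example_key = example_key
        self.example_count = example_count


def insert_rows(source_rows, primary_key):
    seen = set()
    for row in source_rows:
        key = tuple(row[c] for c in primary_key)
        if key in seen:
            raise PrimaryKeyViolation(key)
        seen.add(key)
    return len(seen)


def translate_known_upload_errors(source_rows, primary_key, action):
    # Inner layer: still inside the transaction, so do no extra queries here.
    try:
        return action()
    except PrimaryKeyViolation as error:
        raise NonUniqueKeyRecipe(source_rows, primary_key) from error


def handle_upload_errors(action):
    # Outer layer: the transaction has ended, so the costly lookup is safe.
    try:
        return action()
    except NonUniqueKeyRecipe as recipe:
        assert not in_transaction
        counts = Counter(tuple(row[c] for c in recipe.primary_key)
                         for row in recipe.source_rows)
        example_key, example_count = counts.most_common(1)[0]
        raise NonUniqueKey(recipe.primary_key, list(example_key), example_count) from recipe


def upload(source_rows, primary_key):
    def run_in_transaction():
        with transaction():
            return translate_known_upload_errors(
                source_rows, primary_key,
                lambda: insert_rows(source_rows, primary_key))
    return handle_upload_errors(run_in_transaction)
```

Calling `upload([{"id": 1}, {"id": 2}, {"id": 2}], ["id"])` raises `NonUniqueKey` with the example key `[2]` and a count of 2, and the grouping that finds that example only runs after the transaction block has exited.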

## PRIVATE
type Non_Unique_Primary_Key_Recipe
type Non_Unique_Key_Recipe
## PRIVATE
Recipe source_table primary_key original_panic

## PRIVATE
Creates a `Non_Unique_Primary_Key` error containing information about an
Creates a `Non_Unique_Key` error containing information about an
example group violating the uniqueness constraint.
raise_duplicated_primary_key_error source_table primary_key original_panic =
agg = source_table.aggregate [Aggregate_Column.Count]+(primary_key.map Aggregate_Column.Group_By)
@@ -213,7 +213,7 @@ raise_duplicated_primary_key_error source_table primary_key original_panic =
row = materialized.first_row.to_vector
example_count = row.first
example_entry = row.drop 1
Error.throw (Non_Unique_Primary_Key.Error primary_key example_entry example_count)
Error.throw (Non_Unique_Key.Error primary_key example_entry example_count)

## PRIVATE
align_structure : Connection | Any -> Database_Table | In_Memory_Table | Vector Column_Description -> Vector Column_Description
@@ -610,7 +610,7 @@ check_for_null_keys table key_columns ~continuation =
True -> continuation
False ->
example_key = example.first_row.to_vector
Error.throw (Null_Values_In_Key_Columns.Error example_key)
Error.throw (Null_Values_In_Key_Columns.Error example_key add_sql_suffix=True)

## PRIVATE
check_for_null_keys_if_any_keys_set table key_columns ~continuation =