Table.cast prototype, some tests
radeusgd committed Apr 5, 2023
1 parent 9d882a3 commit 5363b89
Showing 5 changed files with 87 additions and 7 deletions.
@@ -922,6 +922,11 @@ type Column
## UNSTABLE
Cast the column to a specific type.

Arguments:
- value_type: The `Value_Type` to cast the column to.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

TODO [RW] this is a prototype needed for debugging, proper implementation
and testing will come with #6112.

@@ -959,11 +964,6 @@ type Column
If the backend does not support the requested target type, the closest
supported type is chosen and an `Inexact_Type_Coercion` problem is
reported.

Arguments:
- value_type: The `Value_Type` to cast the column to.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
cast : Value_Type -> Problem_Behavior -> Column ! Illegal_Argument | Inexact_Type_Coercion | Lossy_Conversion
cast self value_type=self.value_type on_problems=Problem_Behavior.Report_Warning =
dialect = self.connection.dialect
56 changes: 56 additions & 0 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Table.enso
@@ -1391,6 +1391,62 @@ type Table
msg = "Parsing values is not supported in database tables, the table has to be materialized first with `read`."
Error.throw (Unsupported_Database_Operation.Error msg)

## UNSTABLE
Cast the selected columns to a specific type.

Returns a new table in which the selected columns are replaced with
columns having the new types.

Arguments:
- columns: The selection of columns to cast.
- value_type: The `Value_Type` to cast the selected columns to.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

TODO [RW] this is a prototype needed for debugging, proper implementation
and testing will come with #6112.

In the Database backend, this will boil down to a CAST operation.
In the in-memory backend, a conversion will be performed according to
the following rules:
- Anything can be cast into the `Mixed` type.
- When converting to a `Char` type, the elements of the column will be
converted to text. If the target type is fixed-length, the texts will be
trimmed or padded on the right with spaces to match the desired
length.
- Conversion between numeric types will replace values exceeding the
range of the target type with `Nothing`.
- Booleans may also be converted to numbers, with `True` being converted
to `1` and `False` to `0`. The reverse is not supported - use `iif`
instead.
- A `Date_Time` may be converted into a `Date` or `Time` type - the
resulting value will be truncated to the desired type.
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.

? Conversion Precision

In the in-memory backend, if the conversion is lossy, a
`Lossy_Conversion` warning will be reported. The only exception is
truncating a column which is already a text column: the truncation is
then presumably intended, so it is not reported. If truncation occurs
when converting a non-text column, a warning will still be reported.

Currently, the warning is not reported for Database backends.

? Inexact Target Type

If the backend does not support the requested target type, the closest
supported type is chosen and an `Inexact_Type_Coercion` problem is
reported.
cast : (Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector)) -> Value_Type -> Problem_Behavior -> Table ! Illegal_Argument | Inexact_Type_Coercion | Lossy_Conversion
cast self columns=[0] value_type=Value_Type.Char on_problems=Problem_Behavior.Report_Warning =
selected = self.select_columns columns
selected.columns.fold self table-> column_to_cast->
new_column = column_to_cast.cast value_type on_problems
table.set new_column new_name=column_to_cast.name set_mode=Set_Mode.Update

## ALIAS dropna
ALIAS drop_missing_rows
Remove rows which are all blank or which contain blank values.
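The in-memory conversion rules listed in the `Table.cast` documentation above can be modelled roughly as follows. This is a Python sketch, not the Enso implementation: the function names, the `LossyConversion` class, the `bits` parameter for integer width and the choice to report out-of-range integers as lossy are assumptions made for illustration. The `Mixed` rule (anything is accepted unchanged) is trivial and omitted.

```python
import datetime
from typing import Any, List, Optional


class LossyConversion(Warning):
    """Hypothetical stand-in for Enso's `Lossy_Conversion` problem."""


def to_fixed_char(value: Any, size: int, warnings: List[Warning]) -> Optional[str]:
    """Convert to text, then trim or right-pad with spaces to exactly `size` chars."""
    if value is None:
        return None  # missing values stay missing
    source_is_text = isinstance(value, str)
    text = value if source_is_text else str(value)
    if len(text) > size:
        # Trimming a value that was already text is treated as intended;
        # trimming the text form of a non-text value is reported as lossy.
        if not source_is_text:
            warnings.append(LossyConversion(f"{value!r} truncated to {size} characters"))
        return text[:size]
    return text.ljust(size, " ")


def to_integer(value: Any, bits: int, warnings: List[Warning]) -> Optional[int]:
    """Convert to a signed integer of `bits` width; out-of-range values become None."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool first: in Python, bool is a subclass of int
        return 1 if value else 0
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if low <= value <= high:
        return value
    warnings.append(LossyConversion(f"{value!r} does not fit in {bits} bits"))
    return None


def date_time_to_date(value: datetime.datetime) -> datetime.date:
    return value.date()  # drops the time-of-day part


def date_time_to_time(value: datetime.datetime) -> datetime.time:
    return value.time()  # drops the date part (and any time zone)


def date_to_date_time(value: datetime.date) -> datetime.datetime:
    # Midnight, interpreted in the local (system default) time zone.
    return datetime.datetime.combine(value, datetime.time.min).astimezone()
```

In this commit `Table.cast` is still a prototype, and its test group is marked pending for the in-memory backend, so the eventual behaviour may differ from this model.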
22 changes: 22 additions & 0 deletions test/Table_Tests/src/Common_Table_Operations/Cast_Spec.enso
@@ -32,6 +32,12 @@ spec setup =
c.value_type.is_text . should_be_true
c.to_vector . should_equal ["true", "false", "true"]

Test.specify "should allow to cast a text column to fixed-length" pending=(if setup.test_selection.fixed_length_text_columns.not then "Fixed-length Char columns are not supported by this backend.") <|
t = table_builder [["X", ["a", "DEF", "a slightly longer text"]]]
c = t.at "X" . cast (Value_Type.Char size=3 variable_length=False)
c.value_type . should_equal (Value_Type.Char size=3 variable_length=False)
c.to_vector . should_equal ["a  ", "DEF", "a s"]

Test.specify "should work if the first row is NULL" <|
t = table_builder [["X", [Nothing, 1, 2, 3000]], ["Y", [Nothing, True, False, True]]]

@@ -69,3 +75,19 @@ spec setup =
c4 = c2 + 1000
c4.value_type.is_integer . should_be_true
c4.to_vector . should_equal [Nothing, 1001, 1000, 1001]

Test.group prefix+"Table.cast" pending=(if setup.is_database.not then "Cast is not implemented in the in-memory backend yet.") <|
Test.specify "should cast the columns in-place and not reorder them" <|
t = table_builder [["X", [1, 2, 3000]], ["Y", [4, 5, 6]], ["Z", [7, 8, 9]], ["A", [True, False, True]]]
t2 = t.cast ["Z", "Y"] Value_Type.Char
t2.column_names . should_equal ["X", "Y", "Z", "A"]

t2.at "X" . value_type . is_integer . should_be_true
t2.at "Y" . value_type . is_text . should_be_true
t2.at "Z" . value_type . is_text . should_be_true
t2.at "A" . value_type . is_boolean . should_be_true

t2.at "X" . to_vector . should_equal [1, 2, 3000]
t2.at "Y" . to_vector . should_equal ["4", "5", "6"]
t2.at "Z" . to_vector . should_equal ["7", "8", "9"]
t2.at "A" . to_vector . should_equal [True, False, True]
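The `fold` in `Table.cast` replaces each selected column under its existing name with `Set_Mode.Update`, which is why the test above expects the column order to be unchanged. A minimal Python model of that behaviour, assuming a table represented as an insertion-ordered `dict` from column name to values (a hypothetical representation, not Enso's), could look like this:

```python
from typing import Any, Callable, Dict, List

# Hypothetical table model: column name -> list of values, in column order.
Table = Dict[str, List[Any]]


def cast_columns(table: Table, selected: List[str], convert: Callable[[Any], Any]) -> Table:
    """Return a new table with `selected` columns converted, keeping column order."""
    missing = [name for name in selected if name not in table]
    if missing:
        # This sketch simply fails on unknown columns.
        raise KeyError(f"unknown columns: {missing}")
    result = dict(table)  # shallow copy; the input table is left untouched
    for name in selected:
        # Assigning to an existing key keeps its position in the dict, so the
        # column is replaced in place rather than moved to the end.
        result[name] = [None if v is None else convert(v) for v in table[name]]
    return result


t = {"X": [1, 2, 3000], "Y": [4, 5, 6], "Z": [7, 8, 9], "A": [True, False, True]}
t2 = cast_columns(t, ["Z", "Y"], str)
print(list(t2))  # ['X', 'Y', 'Z', 'A']
print(t2["Y"])   # ['4', '5', '6']
```

Selecting `["Z", "Y"]` in that order does not reorder the result, because positions come from the original table rather than from the selection.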
4 changes: 3 additions & 1 deletion test/Table_Tests/src/Common_Table_Operations/Main.enso
@@ -89,7 +89,9 @@ type Test_Selection
each group. Guaranteed in the in-memory backend, but may not be
supported by all databases.
- date_time: Specifies if the backend supports date/time operations.
Config supports_case_sensitive_columns=True order_by=True natural_ordering=False case_insensitive_ordering=True order_by_unicode_normalization_by_default=False case_insensitive_ascii_only=False take_drop=True allows_mixed_type_comparisons=True supports_unicode_normalization=False is_nan_and_nothing_distinct=True supports_full_join=True distinct_returns_first_row_from_group_if_ordered=True date_time=True
- fixed_length_text_columns: Specifies if the backend supports fixed
length text columns.
Config supports_case_sensitive_columns=True order_by=True natural_ordering=False case_insensitive_ordering=True order_by_unicode_normalization_by_default=False case_insensitive_ascii_only=False take_drop=True allows_mixed_type_comparisons=True supports_unicode_normalization=False is_nan_and_nothing_distinct=True supports_full_join=True distinct_returns_first_row_from_group_if_ordered=True date_time=True fixed_length_text_columns=False

spec setup =
Core_Spec.spec setup
2 changes: 1 addition & 1 deletion test/Table_Tests/src/Database/Postgres_Spec.enso
@@ -181,7 +181,7 @@ run_tests connection db_name =
Common_Spec.spec prefix connection
postgres_specific_spec connection db_name

common_selection = Common_Table_Operations.Main.Test_Selection.Config supports_case_sensitive_columns=True order_by_unicode_normalization_by_default=True take_drop=False allows_mixed_type_comparisons=False
common_selection = Common_Table_Operations.Main.Test_Selection.Config supports_case_sensitive_columns=True order_by_unicode_normalization_by_default=True take_drop=False allows_mixed_type_comparisons=False fixed_length_text_columns=True
aggregate_selection = Common_Table_Operations.Aggregate_Spec.Test_Selection.Config first_last_row_order=False aggregation_problems=False
agg_in_memory_table = (enso_project.data / "data.csv") . read
agg_table = connection.upload_table (Name_Generator.random_name "Agg1") agg_in_memory_table