Commit

Implement Table.format similar to Table.parse allowing to format columns in bulk (#8150)

* doc

* one test

* date tests

* empty and nothing

* ints floats

* bools

* all columns

* regex and index

* locales

* bad formats

* all with one format

* docs

* examples, not impl db

* docs, more errors

* cleanup

* changelog

* check list

* reorder

* clue

* review

* review

* review

* review

* review

* review

* specify time zone
GregoryTravis authored Nov 2, 2023
1 parent 8884852 commit 3c371ad
Showing 4 changed files with 441 additions and 12 deletions.
8 changes: 5 additions & 3 deletions CHANGELOG.md
@@ -581,15 +581,16 @@
- [Added `Table.expand_column` and improved JSON deserialization.][7859]
- [Implemented `Table.auto_value_types` for in-memory tables.][7908]
- [Implemented Text.substring to easily select part of a Text field][7913]
- [Implemented new selector for `when` parameter in `filter_blank_rows`,
`select_blank_columns`, `remove_blank_columns`][7935]
- [Implemented basic XML support][7947]
- [Implemented `Table.lookup_and_replace` for the in-memory backend.][7979]
- [Added `Column_Operation` to `Table.set` allowing for more streamlined flow of
deriving column values in the GUI.][8005]
- [Implemented `Table.expand_to_rows` for the in-memory backend.][8029]
- [Added XML support for `.to Table` and `.expand_column`.][8083]
- [Added `Previous_Value` option to `fill_nothing` and `fill_empty`.][8105]
- [Implemented new selector for `when` parameter in `filter_blank_rows`,
`select_blank_columns`, `remove_blank_columns`][7935]
- [Added `Table.format` for the in-memory backend.][8150]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -835,13 +836,14 @@
[7859]: https://github.com/enso-org/enso/pull/7859
[7908]: https://github.com/enso-org/enso/pull/7908
[7913]: https://github.com/enso-org/enso/pull/7913
[7935]: https://github.com/enso-org/enso/pull/7935
[7947]: https://github.com/enso-org/enso/pull/7947
[7979]: https://github.com/enso-org/enso/pull/7979
[8005]: https://github.com/enso-org/enso/pull/8005
[8029]: https://github.com/enso-org/enso/pull/8029
[8083]: https://github.com/enso-org/enso/pull/8083
[8105]: https://github.com/enso-org/enso/pull/8105
[7935]: https://github.com/enso-org/enso/pull/7935
[8150]: https://github.com/enso-org/enso/pull/8150

#### Enso Compiler

90 changes: 86 additions & 4 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Table.enso
@@ -1,6 +1,7 @@
from Standard.Base import all
import Standard.Base.Data.Array_Proxy.Array_Proxy
import Standard.Base.Data.Filter_Condition as Filter_Condition_Module
import Standard.Base.Data.Time.Errors.Date_Time_Format_Parse_Error
import Standard.Base.Errors.Common.Incomparable_Values
import Standard.Base.Errors.Common.Index_Out_Of_Bounds
import Standard.Base.Errors.Common.Type_Error
@@ -1852,16 +1853,18 @@ type Table
`Nothing` is provided, the default formatting settings of the backend
will be used. `Nothing` is currently the only setting accepted by the
Database backends.
- error_on_missing_columns: if `True` (the default) raises an error if
any column is missing. Otherwise, reported as a problem.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `True`.
- on_problems: Specifies how to handle if a problem occurs, raising as a
warning by default.

! Error Conditions

- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is raised as an error or problem
following the `error_on_missing_columns` rules.
`Missing_Input_Columns` is raised as an error, unless
`error_on_missing_columns` is set to `False`, in which case the
problem is reported according to the `on_problems` setting.
- If a column selected for parsing is not a text column, an
`Invalid_Value_Type` error is raised.
- If no columns have been selected for parsing,
@@ -1883,6 +1886,85 @@ type Table
new_column = column_to_parse.parse type format on_problems
table.set new_column new_name=column_to_parse.name set_mode=Set_Mode.Update

## GROUP Standard.Base.Conversions
Formats `Column`s within a `Table` using a format string,
`Date_Time_Formatter`, or `Column` of format strings.

Arguments:
- columns: The columns to format. The columns can have different types,
but all columns must be compatible with any provided `format` value.
- format: The type-dependent format string to use to format the values.
If `format` is `""` or `Nothing`, `to_text` is used to format the value.
For date/time columns, the format can also be a
`Date_Time_Formatter`. If `format` is a `Column`, it must be a text
column.
- locale: The locale in which the format should be interpreted.
If a `Date_Time_Formatter` is provided for `format` and the `locale` is
set to anything other than `Locale.default`, that locale overrides the
formatter's locale.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `True`.
- on_problems: Specifies how to handle if a problem occurs, raising as a
warning by default.

! Error Conditions

- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is raised as an error, unless
`error_on_missing_columns` is set to `False`, in which case the
problem is reported according to the `on_problems` setting.
- If a provided `format` value is not compatible with all selected
columns, an `Illegal_Argument` error is raised; if the value is a
badly-formed date/time format, a `Date_Time_Format_Parse_Error` is
raised instead.
- If no columns have been selected for formatting, a
`No_Input_Columns_Selected` error is raised.

? Supported Types
- `Value_Type.Date`
- `Value_Type.Date_Time`
- `Value_Type.Time`
- `Value_Type.Integer`
- `Value_Type.Float`
- `Value_Type.Boolean`

? `Value_Type.Date`, `Value_Type.Date_Time`, `Value_Type.Time` format strings

See `Date_Time_Formatter` for more details.

? `Value_Type.Integer`, `Value_Type.Float` format strings

Numeric format strings follow the pattern syntax of the Java `DecimalFormat` class.
See https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html
for a complete format specification.

? `Value_Type.Boolean` format strings

Format strings for `Boolean` consist of two values that represent true
and false, separated by a `|`.

> Example
Format the first and last boolean columns as 'Yes'/'No'.

table.format columns=[0, -1] format="Yes|No"

> Example
Format dates in a column using the format `yyyyMMdd`.

table.format "birthday" "yyyyMMdd"

> Example
Format all columns in the table using the default formatter.

table.format
@columns Widget_Helpers.make_column_name_vector_selector
@locale Locale.default_widget
format : Vector (Text | Integer | Regex) | Text | Integer | Regex -> Text | Date_Time_Formatter | Column | Nothing -> Locale -> Boolean -> Problem_Behavior -> Table ! Date_Time_Format_Parse_Error | Illegal_Argument
format self columns format:(Text | Date_Time_Formatter | Column | Nothing)=Nothing locale=Locale.default error_on_missing_columns=True on_problems=Report_Warning =
    _ = [columns, format, locale, error_on_missing_columns, on_problems]
    Error.throw (Unsupported_Database_Operation.Error "Table.format is not implemented yet for the Database backends.")

## GROUP Standard.Base.Conversions
Splits a column of text into a set of new columns.
The original column will be removed from the table.
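
The `Boolean` format-string convention documented for `Table.format` (a true label and a false label separated by `|`, with `""`/`Nothing` falling back to `to_text`) can be illustrated with a small Python sketch. This is not the Enso implementation: `format_booleans` is a hypothetical helper, Python's `str()` stands in for `to_text`, `ValueError` stands in for `Illegal_Argument`, and passing missing values through unchanged is an assumption of the sketch.

```python
from typing import List, Optional, Sequence


def format_booleans(values: Sequence[Optional[bool]], fmt: Optional[str]) -> List[Optional[str]]:
    """Hypothetical helper: format booleans with a 'true_label|false_label' string."""
    if not fmt:
        # Mirrors the documented fallback for `""`/`Nothing` (Enso uses `to_text`;
        # Python's str() stands in for it here).
        return [None if v is None else str(v) for v in values]
    parts = fmt.split("|")
    if len(parts) != 2:
        # Stand-in for the `Illegal_Argument` error the Enso docs describe.
        raise ValueError(f"expected 'true_label|false_label', got {fmt!r}")
    true_label, false_label = parts
    # Missing values pass through unchanged (an assumption of this sketch).
    return [None if v is None else (true_label if v else false_label) for v in values]


print(format_booleans([True, False, None], "Yes|No"))  # ['Yes', 'No', None]
```

Splitting on exactly one `|` keeps the rule strict: a format such as `"Yes"` or `"Yes|No|Maybe"` is rejected rather than silently guessed at.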
105 changes: 100 additions & 5 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
@@ -2,6 +2,7 @@ from Standard.Base import all
import Standard.Base.Data.Array_Proxy.Array_Proxy
import Standard.Base.Data.Filter_Condition as Filter_Condition_Module
import Standard.Base.Data.Index_Sub_Range as Index_Sub_Range_Module
import Standard.Base.Data.Time.Errors.Date_Time_Format_Parse_Error
import Standard.Base.Errors.Common.Incomparable_Values
import Standard.Base.Errors.Common.Index_Out_Of_Bounds
import Standard.Base.Errors.Common.Out_Of_Memory
@@ -842,7 +843,7 @@ type Table
Table.Value java_table

## GROUP Standard.Base.Conversions
Parses columns within a Table to a specific value type.
Parses columns within a `Table` to a specific value type.
By default, it looks at all `Text` columns and attempts to deduce the
type (columns with other types are not affected).

@@ -868,16 +869,18 @@ type Table
`Nothing` is provided, the default formatting settings of the backend
will be used. `Nothing` is currently the only setting accepted by the
Database backends.
- error_on_missing_columns: if `True` (the default) raises an error if
any column is missing. Otherwise, reported as a problem.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `True`.
- on_problems: Specifies how to handle if a problem occurs, raising as a
warning by default.

! Error Conditions

- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is raised as an error or problem
following the `error_on_missing_columns` rules.
`Missing_Input_Columns` is raised as an error, unless
`error_on_missing_columns` is set to `False`, in which case the
problem is reported according to the `on_problems` setting.
- If a column selected for parsing is not a text column, an
`Invalid_Value_Type` error is raised.
- If no columns have been selected for parsing,
@@ -962,6 +965,98 @@ type Table
Column.Value (Java_Column.new column.name new_storage)
Table.new new_columns

## GROUP Standard.Base.Conversions
Formats `Column`s within a `Table` using a format string,
`Date_Time_Formatter`, or `Column` of format strings.

Arguments:
- columns: The columns to format. The columns can have different types,
but all columns must be compatible with any provided `format` value.
- format: The type-dependent format string to use to format the values.
If `format` is `""` or `Nothing`, `to_text` is used to format the value.
For date/time columns, the format can also be a
`Date_Time_Formatter`. If `format` is a `Column`, it must be a text
column.
- locale: The locale in which the format should be interpreted.
If a `Date_Time_Formatter` is provided for `format` and the `locale` is
set to anything other than `Locale.default`, that locale overrides the
formatter's locale.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `True`.
- on_problems: Specifies how to handle if a problem occurs, raising as a
warning by default.

! Error Conditions

- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is raised as an error, unless
`error_on_missing_columns` is set to `False`, in which case the
problem is reported according to the `on_problems` setting.
- If a provided `format` value is not compatible with all selected
columns, an `Illegal_Argument` error is raised; if the value is a
badly-formed date/time format, a `Date_Time_Format_Parse_Error` is
raised instead.
- If no columns have been selected for formatting, a
`No_Input_Columns_Selected` error is raised.

? Supported Types
- `Value_Type.Date`
- `Value_Type.Date_Time`
- `Value_Type.Time`
- `Value_Type.Integer`
- `Value_Type.Float`
- `Value_Type.Boolean`

? `Value_Type.Date`, `Value_Type.Date_Time`, `Value_Type.Time` format strings

See `Date_Time_Formatter` for more details.

? `Value_Type.Integer`, `Value_Type.Float` format strings

Numeric format strings follow the pattern syntax of the Java `DecimalFormat` class.
See https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html
for a complete format specification.

? `Value_Type.Boolean` format strings

Format strings for `Boolean` consist of two values that represent true
and false, separated by a `|`.

> Example
Format the first and last boolean columns as 'Yes'/'No'.

table.format columns=[0, -1] format="Yes|No"

> Example
Format dates in a column using the format `yyyyMMdd`.

table.format "birthday" "yyyyMMdd"

> Example
Format all columns in the table using the default formatter.

table.format
@columns Widget_Helpers.make_column_name_vector_selector
@locale Locale.default_widget
format : Vector (Text | Integer | Regex) | Text | Integer | Regex -> Text | Date_Time_Formatter | Column | Nothing -> Locale -> Boolean -> Problem_Behavior -> Table ! Date_Time_Format_Parse_Error | Illegal_Argument
format self columns format:(Text | Date_Time_Formatter | Column | Nothing)=Nothing locale=Locale.default error_on_missing_columns=True on_problems=Report_Warning =
    select_problem_builder = Problem_Builder.new error_on_missing_columns=error_on_missing_columns
    selected_columns = self.columns_helper.select_columns_helper columns Case_Sensitivity.Default True select_problem_builder
    select_problem_builder.attach_problems_before on_problems <|
        selected_column_names = case selected_columns.is_empty of
            True ->
                no_columns_problem_behavior = case error_on_missing_columns of
                    True -> Problem_Behavior.Report_Error
                    False -> on_problems
                no_columns_problem_behavior.attach_problem_before No_Input_Columns_Selected Map.empty
            False ->
                Map.from_vector <| selected_columns.map c-> [c.name, True]

        new_columns = self.columns.map column-> if selected_column_names.contains_key column.name . not then column else
            column.format format locale
        Table.new new_columns

## GROUP Standard.Base.Conversions
Cast the selected columns to a specific type.
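
The in-memory `Table.format` implementation in this diff selects columns, formats only the selected ones, and passes every other column through unchanged in its original position, with `error_on_missing_columns` deciding whether missing or empty selections are errors. A rough Python analogue, for illustration only: the table is modelled as a `dict` of lists, selectors are limited to names and (possibly negative) integer indices (regex selectors are omitted), `on_problems` is reduced to a Python warning, skipping missing values is an assumption, and `format_table`, `MissingInputColumns` and `NoInputColumnsSelected` are hypothetical names, not Enso APIs.

```python
import warnings
from typing import Any, Callable, Dict, List, Sequence, Union

Selector = Union[str, int]  # column name or (possibly negative) index


class MissingInputColumns(Exception):
    """Hypothetical stand-in for Enso's `Missing_Input_Columns`."""


class NoInputColumnsSelected(Exception):
    """Hypothetical stand-in for Enso's `No_Input_Columns_Selected`."""


def format_table(
    table: Dict[str, List[Any]],
    columns: Sequence[Selector],
    formatter: Callable[[Any], str],
    error_on_missing_columns: bool = True,
) -> Dict[str, List[Any]]:
    names = list(table)
    selected = set()
    missing = []
    for sel in columns:
        if isinstance(sel, int):
            if -len(names) <= sel < len(names):
                selected.add(names[sel])
            else:
                missing.append(sel)
        elif sel in table:
            selected.add(sel)
        else:
            missing.append(sel)

    if missing:
        if error_on_missing_columns:
            raise MissingInputColumns(missing)
        # Stand-in for reporting the problem via `on_problems` (here: a warning).
        warnings.warn(f"missing input columns: {missing}")

    if not selected:
        if error_on_missing_columns:
            raise NoInputColumnsSelected()
        warnings.warn("no input columns selected")
        return dict(table)

    # Selected columns are formatted; all others pass through unchanged, in order.
    return {
        name: [None if v is None else formatter(v) for v in values] if name in selected else values
        for name, values in table.items()
    }


table = {"flag": [True, False], "n": [1, None], "last": [True, None]}
print(format_table(table, [0, -1], lambda v: "Yes" if v else "No"))
```

Rebuilding the table from the full column list, rather than from the selection, is what keeps unselected columns and the original column order intact, mirroring the `self.columns.map` step in the Enso code.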
