Data analysts should be able to transform a Table using the rename_columns functions #3249

Merged
merged 16 commits into from
Feb 11, 2022
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -32,6 +32,8 @@
operations.][3240]
- [Implemented the `Table.sort_columns` operation.][3250]
- [Fixed `Vector.sort` to handle tail-recursive comparators][3256]
- [Implemented `Range.find`, `Table.rename_columns` and
`Table.use_first_row_as_names` operations][3249]

[3153]: https://github.com/enso-org/enso/pull/3153
[3166]: https://github.com/enso-org/enso/pull/3166
@@ -47,6 +49,7 @@
[3240]: https://github.com/enso-org/enso/pull/3240
[3250]: https://github.com/enso-org/enso/pull/3250
[3256]: https://github.com/enso-org/enso/pull/3256
[3249]: https://github.com/enso-org/enso/pull/3249

#### Enso Compiler

27 changes: 21 additions & 6 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso
Original file line number Diff line number Diff line change
@@ -125,11 +125,7 @@ type Range

1.up_to 100 . exists (> 10)
exists : (Number -> Boolean) -> Boolean
exists predicate =
limit = this.end
go n found = if found || (n >= limit) then found else
@Tail_Call go n+1 (predicate n)
go this.start False
exists predicate = this.find predicate . is_nothing . not

## Checks whether `predicate` is satisfied for any number in this range.

@@ -145,6 +141,26 @@ type Range
any : (Number -> Boolean) -> Boolean
any predicate = this.exists predicate

## Gets the first index in this range for which `predicate` is satisfied.
If no index satisfies the predicate, returns `Nothing`.

Arguments:
- predicate: A function that takes a number from the range and returns a
boolean value that says whether that number satisfies the conditions of
the function.

> Example
Get the first number in the range divisible by 2, 3 and 5.

1.up_to 100 . find i->(i%2==0 && i%3==0 && i%5==0)
find : (Integer -> Boolean) -> Integer | Nothing
find predicate =
limit = this.end
go n = if (n >= limit) then Nothing else
if (predicate n) then n else
@Tail_Call go n+1
go this.start

## Converts the range to a vector containing the numbers in the range.

> Example
@@ -155,4 +171,3 @@ type Range
to_vector =
length = Math.max 0 (this.end - this.start)
Vector.new length (i -> i + this.start)
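
The `Range.find` change above can be modelled outside Enso. The following is a minimal Python sketch, not the Enso implementation: it assumes a half-open range `[start, end)` with step 1, and the function names and `None`-for-`Nothing` convention are choices made for illustration. It also shows how `exists` reduces to "`find` returned something", mirroring `this.find predicate . is_nothing . not`.

```python
from typing import Callable, Optional


def find(start: int, end: int, predicate: Callable[[int], bool]) -> Optional[int]:
    """Return the first n in [start, end) that satisfies predicate, else None."""
    n = start
    while n < end:
        if predicate(n):
            return n  # stop at the first match, like the tail-recursive `go`
        n += 1
    return None  # stands in for Enso's `Nothing`


def exists(start: int, end: int, predicate: Callable[[int], bool]) -> bool:
    """True when some number in the range satisfies predicate."""
    return find(start, end, predicate) is not None
```

With this model, the docstring example `1.up_to 100 . find i->(i%2==0 && i%3==0 && i%5==0)` corresponds to `find(1, 100, lambda i: i % 2 == 0 and i % 3 == 0 and i % 5 == 0)`.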

Original file line number Diff line number Diff line change
@@ -15,6 +15,7 @@ from Standard.Table.Data.Order_Rule as Order_Rule_Module import Order_Rule
from Standard.Table.Data.Column_Selector as Column_Selector_Module import Column_Selector, By_Index
from Standard.Table.Data.Sort_Method as Sort_Method_Module import Sort_Method
from Standard.Base.Error.Problem_Behavior as Problem_Behavior_Module import Problem_Behavior, Report_Warning
import Standard.Table.Data.Column_Mapping
import Standard.Table.Data.Position
import Standard.Base.Error.Warnings

@@ -279,6 +280,41 @@ type Table
new_columns = Table_Helpers.sort_columns internal_columns=this.internal_columns sort_method
this.updated_columns new_columns

## Returns a new table with the columns renamed based on either a mapping
from the old name to the new or a positional list of new names.

Arguments:
- column_map: Mapping from old column names to new.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

The following problems can occur:
- If a column named in `column_map` is not in the input table, a
`Missing_Input_Columns`.
- If duplicate columns, names or indices are provided, a
`Duplicate_Column_Selectors`.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range`.
- If two distinct indices would refer to the same column, a
`Input_Indices_Already_Matched`, indicating that the additional
indices will not introduce additional columns.
- If any of the new names are invalid, an
`Invalid_Output_Column_Names`.
- If any of the new names clash either with existing names or each
other, a `Duplicate_Output_Column_Names`.
- warnings: A `Warning_System` instance specifying how to handle
warnings. This is a temporary workaround to allow for testing the
warning mechanism. Once the proper warning system is implemented, this
argument will become obsolete and will be removed, so user code should
not use it.

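> Example
Rename the first column to "FirstColumn"

table.rename_columns (Column_Mapping.By_Position ["FirstColumn"])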
rename_columns : Column_Mapping -> Problem_Behavior -> Warnings.Warning_System -> Table
rename_columns (column_map=(Column_Mapping.By_Position ["Column"])) (on_problems=Report_Warning) (warnings=Warnings.default) =
new_names = Table_Helpers.rename_columns internal_columns=this.internal_columns mapping=column_map on_problems=on_problems warnings=warnings
if new_names.is_error then new_names else
new_columns = this.internal_columns.map_with_index i->c->(c.rename (new_names.at i))
this.updated_columns new_columns

## PRIVATE

Resolves the column name to a column within this table.
@@ -590,9 +626,9 @@ type Table
right_new_columns_names = new_names.second

# Rename columns to the newly allocated names
new_index = here.rename_columns left_new_meta_index left_new_meta_index_names
left_renamed_columns = here.rename_columns left_new_columns left_new_columns_names
right_renamed_columns = here.rename_columns right_new_columns right_new_columns_names
new_index = here.internal_rename_columns left_new_meta_index left_new_meta_index_names
left_renamed_columns = here.internal_rename_columns left_new_columns left_new_columns_names
right_renamed_columns = here.internal_rename_columns right_new_columns right_new_columns_names
new_columns = left_renamed_columns + right_renamed_columns

on_exprs = left_new_join_index.zip right_new_join_index l-> r->
@@ -993,8 +1029,8 @@ fresh_names used_names preferred_names =
Arguments:
- columns: A vector of columns to rename.
- new_names: The new names for the columns.
rename_columns : Vector Internal_Column -> Vector Text -> Vector Internal_Column
rename_columns columns new_names =
internal_rename_columns : Vector Internal_Column -> Vector Text -> Vector Internal_Column
internal_rename_columns columns new_names =
columns.zip new_names col-> name->
col.rename name

@@ -1012,5 +1048,5 @@ rename_columns columns new_names =
freshen_columns : Vector Text -> Vector Internal_Column -> Vector Internal_Column
freshen_columns used_names columns =
fresh_names = here.fresh_names used_names (columns.map .name)
here.rename_columns columns fresh_names
here.internal_rename_columns columns fresh_names

Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
from Standard.Base import all

from Standard.Table.Data.Matching import Matching_Strategy, Exact

## Specifies a selection of columns from the table and the new names to
give them.
type Column_Mapping

## Selects columns based on their names.

The `matching_strategy` can be used to specify whether the names should be
matched exactly or treated as regular expressions. It also allows
specifying whether the matching should be case-sensitive.
type By_Name (names : Map Text Text) (matching_strategy : Matching_Strategy = Exact True)

## Selects columns by their index.

The index of the first column in the table is 0. If the provided index is
negative, it counts from the end of the table (e.g. -1 refers to the last
column in the table).
type By_Index (indexes : Map Number Text)

## Selects columns having exactly the same names as the columns provided in
the input.

The input columns do not necessarily have to come from the same table, so
this approach can be used to match columns with the same names as a set
of columns of some other table, for example, when preparing for a join.

The Vector should be of the form [[Column, Name], [Column1, Name1], ...]
type By_Column (columns : Vector)

## Selects columns by position, starting at the first column and continuing
until `new_names` is exhausted.
type By_Position (new_names : Vector Text)

## UNSTABLE
A temporary workaround to allow the By_Name constructor to work with default arguments.
By_Name.new : Map Text Text -> Matching_Strategy -> By_Name
By_Name.new names (matching_strategy = Exact.new) = By_Name names matching_strategy
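
To make the `Column_Mapping` variants above concrete, here is a rough Python model of three of them (`By_Position`, `By_Index`, `By_Name` with exact matching). It is a sketch under stated assumptions, not the library's code: the function names are hypothetical, columns are represented only by their names, and all problem reporting (`Too_Many_Column_Names_Provided`, `Column_Indexes_Out_Of_Range`, and so on) is left out, so problematic entries are silently ignored.

```python
from typing import Dict, List


def rename_by_position(columns: List[str], new_names: List[str]) -> List[str]:
    # Pair new names with columns from the left; remaining columns keep their names.
    # Surplus names are ignored in this sketch (the PR reports them as a problem).
    return [new_names[i] if i < len(new_names) else old
            for i, old in enumerate(columns)]


def rename_by_index(columns: List[str], names_by_index: Dict[int, str]) -> List[str]:
    result = list(columns)
    for index, name in names_by_index.items():
        # Negative indexes count from the end, so -1 is the last column.
        position = index + len(columns) if index < 0 else index
        if 0 <= position < len(columns):  # out-of-range indexes are skipped here
            result[position] = name
    return result


def rename_by_name(columns: List[str], names: Dict[str, str]) -> List[str]:
    # Exact, case-sensitive matching only; regex matching is not modelled.
    return [names.get(old, old) for old in columns]
```

For a table with columns `["a", "b", "c"]`, `rename_by_position(columns, ["x"])` renames only the first column, while `rename_by_index(columns, {-1: "z"})` renames only the last.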
49 changes: 30 additions & 19 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Matching.enso
Original file line number Diff line number Diff line change
@@ -7,17 +7,31 @@ from Standard.Base.Error.Warnings import Warning_System

## Strategy for matching names.
type Matching_Strategy
## UNSTABLE
Exact name matching.
## UNSTABLE
Exact name matching.

A name is matched if its exact name is provided.
type Exact (case_sensitivity : (True | Case_Insensitive) = True)
A name is matched if its exact name is provided.
type Exact (case_sensitivity : (True | Case_Insensitive) = True)

## UNSTABLE
Regex-based name matching.
## UNSTABLE
Regex-based name matching.

A name is matched if its name matches the provided regular expression.
type Regex (case_sensitivity : (True | Case_Insensitive) = True)

## ADVANCED
Compiles the regular expression following the Matching_Strategy rules.
compile : Text -> Regex_Module.Pattern
compile criterion =
case this of
Regex _ ->
insensitive = case this.case_sensitivity of
True -> False
Case_Insensitive -> True
re = Regex_Module.compile criterion case_insensitive=insensitive
re
Exact _ -> Error.throw "Invalid Matching_Strategy to compile"

A name is matched if its name matches the provided regular expression.
type Regex (case_sensitivity : (True | Case_Insensitive) = True)

## UNSTABLE
A temporary workaround to allow the `Exact` constructor to work with default
@@ -30,6 +44,7 @@ type Matching_Strategy
Exact.new : (True | Case_Insensitive) -> Exact
Exact.new (case_sensitivity = True) = Exact case_sensitivity


## UNSTABLE
A temporary workaround to allow the `Regex` constructor to work with default
arguments.
@@ -41,6 +56,7 @@ Exact.new (case_sensitivity = True) = Exact case_sensitivity
Regex.new : (True | Case_Insensitive) -> Regex
Regex.new (case_sensitivity = True) = Regex case_sensitivity


## UNSTABLE
Specifies that the operation should ignore case.

@@ -167,14 +183,9 @@ match_criteria objects criteria reorder=False name_mapper=(x->x) matching_strate
Matching.match_single_criterion "Foobar" "f.*" (Regex case_sensitivity=Case_Insensitive) == True
match_single_criterion : Text -> Text -> Matching_Strategy -> Boolean
match_single_criterion name criterion matching_strategy = case matching_strategy of
Exact case_sensitivity -> case case_sensitivity of
True ->
name == criterion
Case_Insensitive ->
name.equals_ignore_case criterion
Regex case_sensitivity ->
insensitive = case case_sensitivity of
True -> False
Case_Insensitive -> True
re = Regex_Module.compile criterion case_insensitive=insensitive
re.matches name
Exact case_sensitivity ->
case case_sensitivity of
True -> name == criterion
Case_Insensitive -> name.equals_ignore_case criterion
Regex _ ->
matching_strategy.compile criterion . matches name
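
The dispatch in `match_single_criterion` can be illustrated with a small Python model. This is only a sketch: the string-valued `strategy` argument and the `case_insensitive` flag are hypothetical stand-ins for `Exact`/`Regex` and `Case_Insensitive`, and it assumes that the regex has to match the whole name (Python's `re.fullmatch`), which is consistent with the `"Foobar"` / `"f.*"` example in the docs but is not confirmed by the diff itself.

```python
import re


def match_single_criterion(name: str, criterion: str,
                           strategy: str = "exact",
                           case_insensitive: bool = False) -> bool:
    if strategy == "exact":
        if case_insensitive:
            # casefold() is used as an approximation of equals_ignore_case
            return name.casefold() == criterion.casefold()
        return name == criterion
    if strategy == "regex":
        flags = re.IGNORECASE if case_insensitive else 0
        # Assumption: the whole name must match the pattern.
        return re.fullmatch(criterion, name, flags) is not None
    raise ValueError("Unknown matching strategy: " + strategy)
```

As in the Enso change, the regex branch is the only one that compiles a pattern; exact matching never touches the regex engine.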
80 changes: 80 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
Original file line number Diff line number Diff line change
@@ -13,6 +13,7 @@ from Standard.Table.Data.Order_Rule as Order_Rule_Module import Order_Rule
from Standard.Table.Data.Column_Selector as Column_Selector_Module import Column_Selector, By_Index
from Standard.Table.Data.Sort_Method as Sort_Method_Module import Sort_Method
from Standard.Base.Error.Problem_Behavior as Problem_Behavior_Module import Problem_Behavior, Report_Warning
import Standard.Table.Data.Column_Mapping
import Standard.Table.Data.Position
import Standard.Base.Error.Warnings

@@ -434,6 +435,85 @@ type Table
new_columns = Table_Helpers.sort_columns internal_columns=this.columns sort_method
here.new new_columns

## Returns a new table with the columns renamed based on either a mapping
from the old name to the new or a positional list of new names.

Arguments:
- column_map: Mapping from old column names to new.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

The following problems can occur:
- If a column named in `column_map` is not in the input table, a
`Missing_Input_Columns`.
- If duplicate columns, names or indices are provided, a
`Duplicate_Column_Selectors`.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range`.
- If two distinct indices would refer to the same column, a
`Input_Indices_Already_Matched`, indicating that the additional
indices will not introduce additional columns.
- If any of the new names are invalid, an
`Invalid_Output_Column_Names`.
- If any of the new names clash either with existing names or each
other, a `Duplicate_Output_Column_Names`.
- warnings: A `Warning_System` instance specifying how to handle
warnings. This is a temporary workaround to allow for testing the
warning mechanism. Once the proper warning system is implemented, this
argument will become obsolete and will be removed, so user code should
not use it.

> Example
Rename the first column to "FirstColumn"

table.rename_columns (Column_Mapping.By_Position ["FirstColumn"])
rename_columns : Column_Mapping -> Problem_Behavior -> Warnings.Warning_System -> Table
rename_columns (column_map=(Column_Mapping.By_Position ["Column"])) (on_problems=Report_Warning) (warnings=Warnings.default) =
new_names = Table_Helpers.rename_columns internal_columns=this.columns mapping=column_map on_problems=on_problems warnings=warnings
if new_names.is_error then new_names else
new_columns = this.columns.map_with_index i->c->(c.rename (new_names.at i))
here.new new_columns

## Returns a new table with the columns renamed based on entries in the
first row.

Arguments:
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

The following problems can occur:
- If a column in columns is not in the input table, a
`Missing_Input_Columns`.
- If duplicate columns, names or indices are provided, a
`Duplicate_Column_Selectors`.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range`.
- If two distinct indices would refer to the same column, a
`Input_Indices_Already_Matched`, indicating that the additional
indices will not introduce additional columns.
- If any of the new names are invalid, an
`Invalid_Output_Column_Names`.
- If any of the new names clash either with existing names or each
other, a `Duplicate_Output_Column_Names`.
- warnings: A `Warning_System` instance specifying how to handle
warnings. This is a temporary workaround to allow for testing the
warning mechanism. Once the proper warning system is implemented, this
argument will become obsolete and will be removed, so user code should
not use it.

> Example
Rename the columns based on the first row

table.use_first_row_as_names
use_first_row_as_names : Problem_Behavior -> Warnings.Warning_System -> Table
use_first_row_as_names (on_problems=Report_Warning) (warnings=Warnings.default) =
mapper = col->
val = col.at 0
case val of
Text -> val
Nothing -> Nothing
_ -> val.to_text
new_names = this.columns.map mapper
this.take_end (this.length - 1) . rename_columns (Column_Mapping.By_Position new_names) on_problems=on_problems warnings=warnings
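
The behavior of `use_first_row_as_names` can be approximated in Python as follows. This is a hedged sketch, not the Enso code: a table is modelled as a list of `(name, values)` pairs, every column is assumed to have at least one row, and a missing first value is passed through as `None` rather than being reported or replaced (in the PR, that handling belongs to the `rename_columns` step).

```python
from typing import Any, List, Optional, Tuple

Column = Tuple[Optional[str], List[Any]]


def use_first_row_as_names(columns: List[Column]) -> List[Column]:
    def to_name(value: Any) -> Optional[str]:
        if value is None:
            return None  # mirrors `Nothing -> Nothing`
        # Text is used as-is; anything else is converted, like `val.to_text`.
        return value if isinstance(value, str) else str(value)

    # The first value becomes the column name and is dropped from the data.
    return [(to_name(values[0]), values[1:]) for _, values in columns]
```

Dropping the first element of each column plays the role of `this.take_end (this.length - 1)` in the Enso version.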

## ALIAS Filter Rows
ALIAS Mask Columns

14 changes: 14 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Error.enso
Original file line number Diff line number Diff line change
@@ -21,12 +21,26 @@ Column_Indexes_Out_Of_Range.to_display_text = case this.indexes.length == 1 of
Can occur when using By_Position.
type Too_Many_Column_Names_Provided (column_names : [Text])

Too_Many_Column_Names_Provided.to_display_text : Text
Too_Many_Column_Names_Provided.to_display_text =
"Too many column names provided. " + (this.column_names.at 0).to_text + " unused."

## One or more column names were invalid during a rename operation.
type Invalid_Output_Column_Names (column_names : [Text])

Invalid_Output_Column_Names.to_display_text : Text
Invalid_Output_Column_Names.to_display_text = case this.column_names.length == 1 of
True -> "The name " + (this.column_names.at 0).to_text + " is invalid."
False -> "The names "+this.column_names.short_display_text+" are invalid."

## One or more column names clashed during a rename operation.
type Duplicate_Output_Column_Names (column_names : [Text])

Duplicate_Output_Column_Names.to_display_text : Text
Duplicate_Output_Column_Names.to_display_text = case this.column_names.length == 1 of
True -> "The name " + (this.column_names.at 0).to_text + " was repeated in the output, so was renamed."
False -> "The names "+this.column_names.short_display_text+" were repeated in the output, and were renamed."

## No columns in the output result.
type No_Output_Columns
