Data analysts should be able to transform a Table using the select_columns function #3230

Merged 28 commits on Feb 2, 2022
2 changes: 2 additions & 0 deletions app/gui/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -27,6 +27,7 @@
`Vector.filter_with_index`. Made `Vector.at` accept negative indices and
ensured it fails with a dataflow error on out of bounds access instead of an
internal Java exception.][3232]
- [Implemented the `Table.select_columns` operation.][3230]

[3153]: https://github.com/enso-org/enso/pull/3153
[3166]: https://github.com/enso-org/enso/pull/3166
@@ -38,6 +39,7 @@
[3229]: https://github.com/enso-org/enso/pull/3229
[3231]: https://github.com/enso-org/enso/pull/3231
[3232]: https://github.com/enso-org/enso/pull/3232
[3230]: https://github.com/enso-org/enso/pull/3230

# Enso 2.0.0-alpha.18 (2021-10-12)

Original file line number Diff line number Diff line change
@@ -308,7 +308,7 @@ type Pattern
Panic.throw Invalid_Bounds_Error
_ -> do_match_mode mode 0 input.length

## ADVANDED
## ADVANCED

Returns `True` if the input matches against the pattern described by
`this`, otherwise `False`.
@@ -327,7 +327,7 @@ type Pattern
input = "aa"
pattern.matches input
matches : Text -> Boolean
matches input = case this.match input mode=Mode.First of
matches input = case this.match input mode=Mode.Full of
Match _ _ _ -> True
Vector.Vector _ -> True
_ -> False
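
The hunk above switches `matches` from `Mode.First` to `Mode.Full`. As a rough illustration of why that matters, here is a minimal Python sketch using the standard `re` module. It is only an analogy, not the Enso regex API: it assumes `Mode.First` accepts the first match found anywhere in the input and `Mode.Full` requires the whole input to match. The function names `matches_first` and `matches_full` are hypothetical.

```python
import re


def matches_first(pattern: str, text: str) -> bool:
    # Analogue of the old behaviour (assumed): succeed as soon as any
    # match is found somewhere in the input.
    return re.search(pattern, text) is not None


def matches_full(pattern: str, text: str) -> bool:
    # Analogue of the new behaviour (assumed): succeed only if the
    # entire input is covered by a single match.
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    print(matches_first("a", "aa"))
    print(matches_full("a", "aa"))
    print(matches_full("a+", "aa"))
```

Under these assumptions, a pattern that matches only a prefix of the input would have been reported as matching before the change, and is rejected after it.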
Original file line number Diff line number Diff line change
@@ -16,42 +16,131 @@ type Problem_Behavior
Report the problem as a dataflow error and abort the operation
type Report_Error

## UNSTABLE
Attaches an error-value to the given value according to the expected problem
behavior.

If the problem behavior is set to Ignore, the value is returned as-is.
If it is set to Report_Warning, the value is returned with the error-value
attached as a warning.
If it is set to Report_Error, the error-value is returned in the form of a
dataflow error.

TODO: the Warning_System argument is temporary, as the warning system is
mocked until the real implementation is shipped. It will be removed soon.
attach_as_needed : Any -> Problem_Behavior -> Vector -> Warning_System -> Any
attach_as_needed decorated_value problem_behavior ~payload warnings=Warnings.default =
case problem_behavior of
## ADVANCED
UNSTABLE
Attaches a problem to the given value according to the expected problem
behavior.

If the problem behavior is set to Ignore, the value is returned as-is.
If it is set to Report_Warning, the value is returned with the problem
attached as a warning, after any warnings that were already attached to
this value.
If it is set to Report_Error, the problem is returned in the form of a
dataflow error. If the value already contained any dataflow error, that
error takes precedence.

TODO [RW] the Warning_System argument is temporary, as the warning system
is mocked until the real implementation is shipped. It will be removed
soon. See: https://www.pivotaltracker.com/story/show/180901472
attach_problem_after : Any -> Any -> Warning_System -> Any
attach_problem_after decorated_value ~problem warnings = case this of
Ignore ->
decorated_value
Report_Warning ->
warnings.attach decorated_value payload
warnings.attach decorated_value problem
Report_Error ->
case decorated_value of
_ -> Error.throw payload

## UNSTABLE
Attaches issues to the given value according to the expected problem
behavior.

If the problem behavior is set to Ignore, the value is returned as-is.
If it is set to Report_Warning, the value is returned with the issues
attached as warnings.
If it is set to Report_Error, the first issue is returned in the form of a
dataflow error.

TODO: the Warning_System argument is temporary, as the warning system is
mocked until the real implementation is shipped. It will be removed soon.
attach_issues_as_needed : Any -> Problem_Behavior -> Vector -> Warning_System -> Any
attach_issues_as_needed decorated_value problem_behavior issues warnings=Warnings.default =
issues.fold decorated_value value-> issue->
here.attach_as_needed value problem_behavior issue warnings=warnings
_ -> Error.throw problem

## ADVANCED
UNSTABLE
Attaches a problem to the given value according to the expected problem
behavior.

If the problem behavior is set to Ignore, the value is returned as-is.
If it is set to Report_Warning, the value is returned with the problem
attached as a warning, before any warnings that were already attached to
this value.

TODO [RW] attaching before a warning is not supported in the mock warning
system; it can only be added once the full Warning system is implemented.
See: https://www.pivotaltracker.com/story/show/180901472

If it is set to Report_Error, the problem is returned in the form of
a dataflow error. The problem takes precedence over any errors that may
have been contained in the value - in this case the `decorated_value` is
not computed at all.

TODO [RW] the Warning_System argument is temporary, as the warning system
is mocked until the real implementation is shipped. It will be removed
soon. See: https://www.pivotaltracker.com/story/show/180901472
attach_problem_before : Any -> Any -> Warning_System -> Any
attach_problem_before problem warnings ~decorated_value = case this of
Ignore ->
decorated_value
Report_Warning ->
# TODO [RW] attach before; see comment above.
warnings.attach decorated_value problem
Report_Error ->
Error.throw problem

## ADVANCED
UNSTABLE
Attaches problems to the given value according to the expected problem
behavior.

If the problem behavior is set to Ignore, the value is returned as-is.
If it is set to Report_Warning, the value is returned with the problems
attached as warnings, before any warnings that were already attached to
this value.
If it is set to Report_Error, the first problem is returned in the form
of a dataflow error. The problem takes precedence over any errors that
may have been contained in the value - in this case the `decorated_value`
is not computed at all.

TODO [RW] the Warning_System argument is temporary, as the warning system
is mocked until the real implementation is shipped. It will be removed
soon. See: https://www.pivotaltracker.com/story/show/180901472

> Example
Perform pre-flight checks and then compute the actual result only if needed.

problems = preflight_checks
problem_behavior.attach_problems_before problems Warnings.default <|
expensive_computation

attach_problems_before : Vector -> Warning_System -> Any -> Any
attach_problems_before problems warnings ~decorated_value = case this of
Ignore ->
decorated_value
Report_Warning ->
# TODO [RW] attach before; see comment above.
warnings.attach_many decorated_value problems
Report_Error ->
if problems.is_empty then decorated_value else
Error.throw problems.first

## ADVANCED
UNSTABLE
Attaches problems to the given value according to the expected problem
behavior.

If the problem behavior is set to Ignore, the value is returned as-is.
If it is set to Report_Warning, the value is returned with the problems
attached as warnings, after any warnings that were already attached to
this value.
If it is set to Report_Error, the first problem is returned in the form
of a dataflow error. If the value already contained any dataflow error,
that error takes precedence.

TODO [RW] the Warning_System argument is temporary, as the warning system
is mocked until the real implementation is shipped. It will be removed
soon. See: https://www.pivotaltracker.com/story/show/180901472

> Example
First compute a result and then, only if the computation has succeeded,
perform any postprocessing checks which may raise warnings/errors.

result = compute_result
# TODO [RW] the underscore will be able to be removed once the `warnings` argument is deprecated, see above.
problem_behavior.attach_problems_after result _ Warnings.default <|
perform_post_process_checks_and_return_problems
attach_problems_after : Any -> Vector -> Warning_System -> Any
attach_problems_after decorated_value ~problems warnings = case this of
Ignore ->
decorated_value
Report_Warning ->
warnings.attach_many decorated_value problems
Report_Error -> case decorated_value of
_ -> if problems.is_empty then decorated_value else
Error.throw problems.first
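
The difference between the `_before` and `_after` variants is mostly about which argument is lazy. The following Python sketch models that behaviour under stated assumptions: Enso's suspended arguments (`~decorated_value`, `~problems`) are modelled as zero-argument callables, a dataflow error is modelled as a raised exception, and the `Warning_System` is reduced to a plain `warn` callback. All names here (`ProblemBehavior`, `DataflowError`, `warn`, `compute`, `find_problems`) are hypothetical stand-ins, not the Enso API.

```python
from enum import Enum
from typing import Any, Callable, Sequence


class DataflowError(Exception):
    """Stand-in for an Enso dataflow error carrying a problem payload."""

    def __init__(self, payload: Any) -> None:
        super().__init__(payload)
        self.payload = payload


class ProblemBehavior(Enum):
    IGNORE = "ignore"
    REPORT_WARNING = "report_warning"
    REPORT_ERROR = "report_error"

    def attach_problems_before(
        self,
        problems: Sequence[Any],
        warn: Callable[[Any], None],
        compute: Callable[[], Any],
    ) -> Any:
        # With REPORT_ERROR the first problem wins and the (possibly
        # expensive) computation is never started.
        if self is ProblemBehavior.REPORT_ERROR and problems:
            raise DataflowError(problems[0])
        value = compute()
        if self is ProblemBehavior.REPORT_WARNING:
            for problem in problems:
                warn(problem)
        return value

    def attach_problems_after(
        self,
        value: Any,
        find_problems: Callable[[], Sequence[Any]],
        warn: Callable[[Any], None],
    ) -> Any:
        # The value is already computed; with IGNORE the post-processing
        # checks are never run at all.
        if self is ProblemBehavior.IGNORE:
            return value
        problems = find_problems()
        if self is ProblemBehavior.REPORT_WARNING:
            for problem in problems:
                warn(problem)
        elif problems:
            raise DataflowError(problems[0])
        return value


if __name__ == "__main__":
    seen = []
    result = ProblemBehavior.REPORT_WARNING.attach_problems_before(
        ["column missing"], seen.append, lambda: 42
    )
    print(result, seen)
```

In this model an error already present in the value of `attach_problems_after` would have been raised before the call was reached, which loosely corresponds to the "that error takes precedence" wording in the doc comment.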
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ from Standard.Base import all
A placeholder for reporting warnings. It should be replaced once the warning
mechanism is designed and implemented.
type Warning_System
type Warning_System (warning_callback : Any -> Nothing)
type Warning_System (warning_callback : Any -> Nothing) (mapping : Any -> Any)

## UNSTABLE
Attaches a warning to a value.
@@ -17,10 +17,69 @@ type Warning_System
_ ->
case warning_payload of
_ ->
this.warning_callback warning_payload
this.warning_callback <| this.mapping warning_payload
decorated_value

## UNSTABLE
Attaches multiple warnings to a value.

If the warning argument holds a dataflow error, the error is also
inherited by the decorated value.
attach_many : Any -> Vector -> Any
attach_many decorated_value warnings =
warnings.fold decorated_value acc-> warning->
this.attach acc warning

## PRIVATE
with_mapping : (Any -> Any) -> Warning_System
with_mapping new_mapping =
Warning_System this.warning_callback (new_mapping << this.mapping)

## PRIVATE
The default implementation of a warning system mock.

To be removed once warnings are implemented.
default : Warning_System
default = Warning_System warning->
IO.println "[WARNING] "+warning.to_display_text
default =
callback = warning->
IO.println "[WARNING] "+warning.to_display_text
Warning_System callback (x->x)

## UNSTABLE
Maps warnings attached to a value.

Currently it is not implemented as the warning system is missing. It just
returns the original value without changes.
map_attached_warnings : (Any -> Any) -> Warning_System -> (Warning_System -> Any) -> Any
map_attached_warnings mapper warnings callback =
new_warnings = warnings.with_mapping mapper
result = callback new_warnings
result

## UNSTABLE
A utility function which applies the mapping function both to any attached
warnings and dataflow errors.

The Warning_System has to be passed through to be able to correctly map the
warnings.

Once the proper Warning system is implemented, the new signature will be:
(Any -> Any) -> Any -> Any

map_warnings_and_errors : (Any -> Any) -> Warning_System -> (Warning_System -> Any) -> Any
map_warnings_and_errors mapper warnings callback =
(here.map_attached_warnings mapper warnings callback) . map_error mapper

## PRIVATE
A temporary helper for testing warnings.

Gets a closure which takes a `Warning_System` instance and runs some action.
It returns a `Pair` containing the result of the closure and a list of
warnings reported when running it.
test_warnings : (Warning_System -> Any) -> Pair Any (Vector Any)
test_warnings closure =
builder = Vector.new_builder
warning_system = Warning_System builder.append (x->x)
result = closure warning_system
warnings = builder.to_vector
Pair result warnings
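
To make the mock warning system easier to follow, here is a small Python model of it. It is a sketch under assumptions, not the Enso implementation: it assumes `<<` composes right to left (so `new_mapping << this.mapping` applies the existing mapping first), and the names `WarningSystem` and `collect_warnings` are hypothetical equivalents of `Warning_System` and `test_warnings`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


def identity(x: Any) -> Any:
    return x


@dataclass(frozen=True)
class WarningSystem:
    warning_callback: Callable[[Any], None]
    mapping: Callable[[Any], Any] = identity

    def attach(self, value: Any, warning: Any) -> Any:
        # The mock does not store the warning on the value; it maps the
        # payload and reports it through the callback.
        self.warning_callback(self.mapping(warning))
        return value

    def attach_many(self, value: Any, warnings: Iterable[Any]) -> Any:
        for warning in warnings:
            value = self.attach(value, warning)
        return value

    def with_mapping(self, new_mapping: Callable[[Any], Any]) -> "WarningSystem":
        old_mapping = self.mapping
        # Assumed composition order: existing mapping first, then the new one.
        return WarningSystem(
            self.warning_callback, lambda w: new_mapping(old_mapping(w))
        )


def collect_warnings(closure: Callable[[WarningSystem], Any]) -> Tuple[Any, List[Any]]:
    # Runs the closure with a recording warning system and returns the
    # result together with every warning reported along the way.
    collected: List[Any] = []
    result = closure(WarningSystem(collected.append))
    return result, collected


if __name__ == "__main__":
    def action(ws: WarningSystem) -> str:
        scoped = ws.with_mapping(lambda w: f"select_columns: {w}")
        return scoped.attach_many("table", ["missing 'foo'", "index 7 out of range"])

    print(collect_warnings(action))
```

Passing the system explicitly is what lets a caller wrap warnings raised deeper in the call chain, which is the role `map_attached_warnings` plays in the diff.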
Original file line number Diff line number Diff line change
@@ -6,11 +6,15 @@ import Standard.Database.Data.Sql
import Standard.Table.Data.Column as Materialized_Column
import Standard.Table.Data.Table as Materialized_Table
import Standard.Table.Internal.Java_Exports
import Standard.Table.Internal.Table_Helpers

from Standard.Database.Data.Column as Column_Module import all
from Standard.Database.Data.Internal.IR import Internal_Column
from Standard.Table.Data.Order_Rule as Order_Rule_Module import Order_Rule
from Standard.Table.Data.Table import No_Such_Column_Error
from Standard.Table.Data.Column_Selector as Column_Selector_Module import all
radeusgd marked this conversation as resolved.
from Standard.Base.Error.Problem_Behavior as Problem_Behavior_Module import all
Review comment (Member):
import Problem_Behavior

Review comment (Member):
Similar thing with "all" imports: we should not pollute the scope this way. The rule in Enso is to import only needed things, not all.

Review comment (Member):
This should be drastically improved after we implement the auto-scoping mechanism in the Engine. Btw, you've been talking with Engine about that already, right?

import Standard.Base.Error.Warnings

polyglot java import java.sql.JDBCType

@@ -77,6 +81,63 @@ type Table
internal = candidates.find (p -> p.name == name)
this.make_column internal . map_error (_ -> No_Such_Column_Error name)

## Returns a new table with a chosen subset of columns, as specified by the
`columns`, from the input table. Any unmatched input columns will be
dropped from the output.

Arguments:
- columns: Column selection criteria.
- reorder: By default, or if set to `False`, columns in the output will
  be in the same order as in the input table. If `True`, the order in the
  output table will match the order in the columns list.
- on_problems: Specifies how to handle problems if they occur, reporting
  them as warnings by default.

  The following problems can occur:
  - If a column in columns is not in the input table, a
    `Missing_Input_Columns`.
  - If duplicate columns, names or indices are provided, a
    `Duplicate_Column_Selectors`.
  - If a column index is out of range, a `Column_Indexes_Out_Of_Range`.
  - If two distinct indices would refer to the same column, a
    `Input_Indices_Already_Matched`, indicating that the additional
    indices will not introduce additional columns.
  - If there are no columns in the output table, a `No_Output_Columns`.
- warnings: A `Warning_System` instance specifying how to handle
  warnings. This is a temporary workaround to allow for testing the
  warning mechanism. Once the proper warning system is implemented, this
  argument will become obsolete and will be removed. No user code should
  use this argument, as it will be removed in the future.

> Example
Select columns by name.

table.select_columns (By_Name ["bar", "foo"] (Matching.Exact True))

## TODO [RW] default arguments do not work on atoms; once this is fixed,
the above should be replaced with just `Matching.Exact`.
See: https://github.com/enso-org/enso/issues/1600


> Example
Select columns matching a regular expression.

table.select_columns (By_Name ["foo.+", "b.*"] (Matching.Regex case_sensitivity=Matching.Case_Insensitive))

> Example
Select the first two columns and the last column, moving the last one to front.

table.select_columns (By_Index [-1, 0, 1]) reorder=True

> Example
Select columns with the same names as the ones provided.

table.select_columns (By_Column [column1, column2])
select_columns : Column_Selector -> Boolean -> Problem_Behavior -> Warnings.Warning_System -> Table
select_columns (columns = By_Index [0]) (reorder = False) (on_problems = Report_Warning) (warnings = Warnings.default) =
new_columns = Table_Helpers.select_columns internal_columns=this.internal_columns selector=columns reorder=reorder on_problems=on_problems warnings=warnings
this.updated_columns new_columns

## PRIVATE

Resolves the column name to a column within this table.
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
from Standard.Base import all

from Standard.Table.Data.Matching import Matching_Strategy, Exact

## Specifies a selection of columns from the table on which an operation is
going to be performed.
type Column_Selector

## Selects columns based on their names.

The `matching_strategy` specifies whether the names should be matched
exactly or treated as regular expressions. It also specifies whether the
matching should be case-sensitive.
type By_Name (names : Vector Text) (matching_strategy : Matching_Strategy = Exact True)

## Selects columns by their index.

The index of the first column in the table is 0. If the provided index is
negative, it counts from the end of the table (e.g. -1 refers to the last
column in the table).
type By_Index (indexes : Vector Number)

## Selects columns having exactly the same names as the columns provided in
the input.

The input columns do not necessarily have to come from the same table, so
this approach can be used to match columns with the same names as a set
of columns of some other table, for example, when preparing for a join.
type By_Column (columns : Vector Column)
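
The `By_Index` rules described in this PR (negative indices, `reorder`, and the listed problems) can be sketched in Python as follows. This is a minimal model based only on the doc comments above; the real logic lives in `Table_Helpers.select_columns`, which is not shown in this diff, so the function name `select_by_index`, the order of the checks, and the way problems are represented as `(name, index)` tuples are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Problem = Tuple[str, Optional[int]]


def select_by_index(
    columns: Sequence[str], indexes: Sequence[int], reorder: bool = False
) -> Tuple[List[str], List[Problem]]:
    n = len(columns)
    problems: List[Problem] = []
    seen_selectors = set()
    positions: List[int] = []  # resolved non-negative positions, in selector order

    for ix in indexes:
        if ix in seen_selectors:
            # The very same index was provided twice.
            problems.append(("Duplicate_Column_Selectors", ix))
            continue
        seen_selectors.add(ix)
        if not -n <= ix < n:
            problems.append(("Column_Indexes_Out_Of_Range", ix))
            continue
        position = ix % n  # -1 resolves to the last column
        if position in positions:
            # Two distinct indices (e.g. 0 and -n) refer to the same column.
            problems.append(("Input_Indices_Already_Matched", ix))
            continue
        positions.append(position)

    if not reorder:
        # Default: keep the order of the input table.
        positions.sort()
    selected = [columns[p] for p in positions]
    if not selected:
        problems.append(("No_Output_Columns", None))
    return selected, problems


if __name__ == "__main__":
    table = ["foo", "bar", "baz", "qux"]
    print(select_by_index(table, [-1, 0, 1], reorder=True))
    print(select_by_index(table, [-1, 0, 1]))
```

With `reorder=True` the call mirrors the "moving the last one to front" example from the doc comment; the returned problems list is what a `Problem_Behavior` would then ignore, attach as warnings, or turn into an error.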