Implement Table.union for Database backend (#6204)
Closes #5235
radeusgd authored Apr 6, 2023
1 parent df4491d commit 83b10a2
Showing 31 changed files with 761 additions and 128 deletions.
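The new capability can be exercised with a short snippet like the one below. It is a minimal sketch, assuming `orders_2022` and `orders_2023` are two Database-backed tables with compatible columns; the table names are illustrative and the default column-matching behaviour is used:

    combined = orders_2022.union orders_2023
    combined.row_count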
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -375,6 +375,7 @@
- [Added support for Date/Time columns in the Postgres backend and added
`year`/`month`/`day` operations to Table columns.][6153]
- [`Text.split` can now take a vector of delimiters.][6156]
- [Implemented `Table.union` for the Database backend.][6204]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -568,6 +569,7 @@
[6150]: https://github.com/enso-org/enso/pull/6150
[6153]: https://github.com/enso-org/enso/pull/6153
[6156]: https://github.com/enso-org/enso/pull/6156
[6204]: https://github.com/enso-org/enso/pull/6204

#### Enso Compiler

@@ -146,7 +146,7 @@ type Connection
Error.throw (Table_Not_Found.Error query sql_error treated_as_query=True)
SQL_Query.Raw_SQL raw_sql -> handle_sql_errors <|
self.jdbc_connection.ensure_query_has_no_holes raw_sql . if_not_error <|
columns = self.jdbc_connection.fetch_columns raw_sql Statement_Setter.null
columns = self.fetch_columns raw_sql Statement_Setter.null
name = if alias == "" then (UUID.randomUUID.to_text) else alias
ctx = Context.for_query raw_sql name
Database_Table_Module.make_table self name columns ctx
@@ -155,7 +155,7 @@
ctx = Context.for_table name (if alias == "" then name else alias)
statement = self.dialect.generate_sql (Query.Select Nothing ctx)
statement_setter = self.dialect.get_statement_setter
columns = self.jdbc_connection.fetch_columns statement statement_setter
columns = self.fetch_columns statement statement_setter
Database_Table_Module.make_table self name columns ctx
result.catch SQL_Error sql_error->
Error.throw (Table_Not_Found.Error name sql_error treated_as_query=False)
@@ -189,6 +189,14 @@
self.jdbc_connection.with_prepared_statement statement statement_setter stmt->
result_set_to_table stmt.executeQuery self.dialect.make_column_fetcher_for_type type_overrides last_row_only

## PRIVATE
Given a prepared statement, gets the column names and types for the
result set.
fetch_columns : Text | SQL_Statement -> Statement_Setter -> Any
fetch_columns self statement statement_setter =
needs_execute_query = self.dialect.needs_execute_query_for_type_inference
self.jdbc_connection.raw_fetch_columns statement needs_execute_query statement_setter

## PRIVATE
ADVANCED

75 changes: 69 additions & 6 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Column.enso
Expand Up @@ -8,7 +8,7 @@ import Standard.Table.Internal.Java_Problems
import Standard.Table.Internal.Problem_Builder.Problem_Builder
import Standard.Table.Internal.Widget_Helpers
from Standard.Table import Sort_Column, Data_Formatter, Value_Type, Auto
from Standard.Table.Errors import Floating_Point_Equality, Inexact_Type_Coercion, Invalid_Value_Type
from Standard.Table.Errors import Floating_Point_Equality, Inexact_Type_Coercion, Invalid_Value_Type, Lossy_Conversion

import project.Data.SQL_Statement.SQL_Statement
import project.Data.SQL_Type.SQL_Type
@@ -77,6 +77,13 @@ type Column
to_table self =
Table.Value self.name self.connection [self.as_internal] self.context

## Returns a Table describing this column's contents.

The table behaves like `Table.info` - it lists the column name, the count
of non-null items and the value type.
info : Table
info self = self.to_table.info

## Returns a materialized column containing rows of this column.

Arguments:
@@ -91,11 +98,10 @@
to_vector self =
self.to_table.read . at self.name . to_vector

## UNSTABLE TODO this is a very early prototype that will be revisited later
This implementation is really just so that we can use the types in
`filter`, it does not provide even a decent approximation of the true
type in many cases. It will be improved when the types work is
implemented.
## Returns the `Value_Type` associated with that column.

The value type determines what type of values the column is storing and
what operations are permitted.
value_type : Value_Type
value_type self =
mapping = self.connection.dialect.get_type_mapping
@@ -901,6 +907,63 @@ type Column
_ = [type, format, on_problems]
Error.throw <| Unsupported_Database_Operation.Error "`Column.parse` is not implemented yet for the Database backends."

## PRIVATE
UNSTABLE
Cast the column to a specific type.

Arguments:
- value_type: The `Value_Type` to cast the column to.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

TODO [RW] this is a prototype needed for debugging, proper implementation
and testing will come with #6112.

In the Database backend, this will boil down to a CAST operation.
In the in-memory backend, a conversion will be performed according to
the following rules:
- Anything can be cast into the `Mixed` type.
- Converting to a `Char` type, the elements of the column will be
converted to text. If it is fixed length, the texts will be trimmed or
padded on the right with the space character to match the desired
length.
- Conversion between numeric types will replace values exceeding the
range of the target type with `Nothing`.
- Booleans may also be converted to numbers, with `True` being converted
to `1` and `False` to `0`. The reverse is not supported - use `iif`
instead.
- A `Date_Time` may be converted into a `Date` or `Time` type - the
resulting value will be truncated to the desired type.
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.

? Conversion Precision

In the in-memory backend, if the conversion is lossy, a
`Lossy_Conversion` warning will be reported. The only exception is when
truncating a column that is already a text column - in that case the
truncation is treated as intended behaviour and is not reported. If
truncation needs to occur when converting a non-text column, a warning
will still be reported.

Currently, the warning is not reported for Database backends.

? Inexact Target Type

If the backend does not support the requested target type, the closest
supported type is chosen and an `Inexact_Type_Coercion` problem is
reported.
cast : Value_Type -> Problem_Behavior -> Column ! Illegal_Argument | Inexact_Type_Coercion | Lossy_Conversion
cast self value_type=self.value_type on_problems=Problem_Behavior.Report_Warning =
dialect = self.connection.dialect
type_mapping = dialect.get_type_mapping
target_sql_type = type_mapping.value_type_to_sql value_type on_problems
target_sql_type.if_not_error <|
infer_from_database new_expression =
SQL_Type_Reference.new self.connection self.context new_expression
new_column = dialect.make_cast self.as_internal target_sql_type infer_from_database
Column.Value new_column.name self.connection new_column.sql_type_reference new_column.expression self.context
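As a usage sketch (the table and column names below are hypothetical), casting a numeric column to text in the Database backend boils down to a SQL `CAST`:

    prices_as_text = table.at "price" . cast Value_Type.Char
    prices_as_text.value_type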

## ALIAS Transform Column

Applies `function` to each item in this column and returns the column
63 changes: 63 additions & 0 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Dialect.enso
@@ -10,14 +10,17 @@ import project.Data.SQL_Statement.SQL_Statement
import project.Data.SQL_Type.SQL_Type
import project.Data.Table.Table
import project.Internal.Column_Fetcher.Column_Fetcher
import project.Internal.IR.Context.Context
import project.Internal.IR.From_Spec.From_Spec
import project.Internal.IR.Internal_Column.Internal_Column
import project.Internal.IR.Order_Descriptor.Order_Descriptor
import project.Internal.IR.Query.Query
import project.Internal.IR.SQL_Expression.SQL_Expression
import project.Internal.Postgres.Postgres_Dialect
import project.Internal.Redshift.Redshift_Dialect
import project.Internal.SQLite.SQLite_Dialect
import project.Internal.SQL_Type_Mapping.SQL_Type_Mapping
import project.Internal.SQL_Type_Reference.SQL_Type_Reference
import project.Internal.Statement_Setter.Statement_Setter
from project.Errors import Unsupported_Database_Operation

@@ -113,6 +116,61 @@
get_statement_setter self =
Unimplemented.throw "This is an interface only."

## PRIVATE
Builds an SQL expression that casts the given expression to the given
target type.

Arguments:
- column: the input column to transform.
- target_type: the target type.
- infer_result_type_from_database_callback: A callback that can be used
to infer the type of the newly built expression from the Database. It
should be used by default, unless an override is chosen.
make_cast : Internal_Column -> SQL_Type -> (SQL_Expression -> SQL_Type_Reference) -> Internal_Column
make_cast self column target_type infer_result_type_from_database_callback =
_ = [column, target_type, infer_result_type_from_database_callback]
Unimplemented.throw "This is an interface only."
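A concrete dialect might satisfy this interface roughly as sketched below; the `"CAST"` operation name and the `Internal_Column.Value` field order are assumptions made for illustration, not the exact code of this commit:

    make_cast self column target_type infer_result_type_from_database_callback =
        # Wrap the column's expression in a CAST targeting the requested SQL type.
        new_expression = SQL_Expression.Operation "CAST" [column.expression, SQL_Expression.Literal target_type.name]
        # Let the database report the resulting type of the new expression.
        new_type_reference = infer_result_type_from_database_callback new_expression
        Internal_Column.Value column.name new_type_reference new_expression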

## PRIVATE
Specifies if the `fetch_columns` operation needs to execute the query to
get the column types.

In most backends, `getMetaData` may be called on a
`PreparedStatement` directly to infer column types without actually
executing the query. In some backends, however, like SQLite, this is
insufficient and will yield incorrect results, so the query needs to be
executed (even though the full results may not need to be streamed).
needs_execute_query_for_type_inference : Boolean
needs_execute_query_for_type_inference self =
Unimplemented.throw "This is an interface only."

## PRIVATE
Specifies if the cast used to reconcile column types should be done after
performing the union. If `False`, the cast will be done before the union.

Most databases that care about column types will want to do the cast
before the union operation to ensure that types are aligned when merging.
For the SQLite workaround to work, however, it is better to do the cast
after the union operation.
cast_after_union : Boolean
cast_after_union self =
Unimplemented.throw "This is an interface only."
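For illustration, a SQLite-style dialect would be expected to override both flags described above along these lines (a sketch; the actual overrides live in the dialect modules touched by this commit):

    # SQLite needs to execute the query to learn result types, and its
    # type-reconciliation workaround requires casting after the union.
    needs_execute_query_for_type_inference self = True
    cast_after_union self = True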

## PRIVATE
Prepares a query that can be used to fetch the type of an expression in
the provided context.

This method may modify the context to optimize the query while preserving
the types. For example, in most databases, it is fine to add
`WHERE FALSE` to the query - ensuring that the engine will not do any
actual work, but the resulting type will still be the same. There are
exceptions though, like SQLite, where the best we can do is add
`LIMIT 1`.
prepare_fetch_types_query : SQL_Expression -> Context -> SQL_Statement
prepare_fetch_types_query self expression context =
_ = [expression, context]
Unimplemented.throw "This is an interface only."

## PRIVATE
Checks if the given aggregate is supported.

@@ -139,3 +197,8 @@ postgres = Postgres_Dialect.postgres
The dialect of Redshift databases.
redshift : Dialect
redshift = Redshift_Dialect.redshift

## PRIVATE
default_fetch_types_query dialect expression context =
empty_context = context.add_where_filters [SQL_Expression.Literal "FALSE"]
dialect.generate_sql (Query.Select [["typed_column", expression]] empty_context)
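A dialect that cannot rely on the `WHERE FALSE` trick (like SQLite) would instead bound the row count. In the sketch below, `set_limit` is a hypothetical helper on the IR context, used only for illustration:

    sqlite_fetch_types_query dialect expression context =
        limited_context = context.set_limit 1
        dialect.generate_sql (Query.Select [["typed_column", expression]] limited_context)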
@@ -18,12 +18,12 @@ type SQL_Type
- precision: For character types, specifies their length.
See `ResultSetMetaData.getPrecision`.
- scale: The scale for fixed precision numeric types. Not applicable for
other types, so it's value is undefined and will usually just be 0.
other types, so its value is undefined.
See `ResultSetMetaData.getScale`.
- nullable: Specifies if the given column is nullable. May be `Nothing`
if that is unknown / irrelevant for the type.
TODO: the precise meaning of this will be revised with #5872.
Value (typeid : Integer) (name : Text) (precision : Nothing | Integer = Nothing) (scale : Integer = 0) (nullable : Boolean | Nothing = Nothing)
Value (typeid : Integer) (name : Text) (precision : Nothing | Integer = Nothing) (scale : Nothing | Integer = Nothing) (nullable : Boolean | Nothing = Nothing)

## PRIVATE
ADVANCED
@@ -40,8 +40,9 @@
0 -> Nothing
p : Integer -> p
scale = metadata.getScale ix
effective_scale = if precision.is_nothing && (scale == 0) then Nothing else scale
nullable_id = metadata.isNullable ix
nullable = if nullable_id == ResultSetMetaData.columnNoNulls then False else
if nullable_id == ResultSetMetaData.columnNullable then True else
Nothing
SQL_Type.Value typeid typename precision scale nullable
SQL_Type.Value typeid typename precision effective_scale nullable
