Implement Table.union for Database backend #6204

Merged on Apr 6, 2023 (28 commits).
Commits:
a06bb15 Adapting tests for DB (radeusgd, Mar 31, 2023)
b9d6304 Prepare a Union query (radeusgd, Apr 1, 2023)
db6a30e Change IR a bit to make more sense, generating SQL for UNION (radeusgd, Apr 1, 2023)
f140891 fixes - 8 tests passing, 9 to go (radeusgd, Apr 1, 2023)
23a2d18 fixes - 12 tests passing, 5 to go (radeusgd, Apr 1, 2023)
d697e8a Basic infrastructure for CAST (radeusgd, Apr 3, 2023)
af9e768 add a de-dup test (radeusgd, Apr 3, 2023)
068a46d Fix varchar casts in Postgres (radeusgd, Apr 3, 2023)
7713d75 Adding a stub for `Column.cast` (radeusgd, Apr 3, 2023)
0f4199c WIP: Column.cast (radeusgd, Apr 3, 2023)
10b6628 Work around SQLite issues to implement basic type casting (radeusgd, Apr 4, 2023)
fe08069 Adding basic tests for Cast. Figured out issues with SQLite. Small fi… (radeusgd, Apr 4, 2023)
1f92395 Make sure SQLite tests pass. (radeusgd, Apr 5, 2023)
3dc0f5f Fix Boolean conversions for Postgres (radeusgd, Apr 5, 2023)
247b478 tests (radeusgd, Apr 5, 2023)
f0754e9 fix (radeusgd, Apr 5, 2023)
3f3cff2 CHANGELOG.md (radeusgd, Apr 5, 2023)
d4e6f8c Workaround for IIF issue. (radeusgd, Apr 5, 2023)
4fc0d64 fix a test (radeusgd, Apr 5, 2023)
a47cd24 update a field (radeusgd, Apr 5, 2023)
8bc6fb6 Checked in IDE and the calls are minimal. Removing print. (radeusgd, Apr 5, 2023)
c38fe13 Table.cast prototype, some tests (radeusgd, Apr 5, 2023)
0da9ce2 Update old comment (radeusgd, Apr 5, 2023)
28f5772 Add some more tests (radeusgd, Apr 5, 2023)
d8307e4 CR (radeusgd, Apr 5, 2023)
bc38ce5 Merge branch 'develop' into wip/radeusgd/union-in-database-5235 (mergify[bot], Apr 5, 2023)
5d781e6 Merge branch 'develop' into wip/radeusgd/union-in-database-5235 (mergify[bot], Apr 5, 2023)
e41768c Merge branch 'develop' into wip/radeusgd/union-in-database-5235 (mergify[bot], Apr 6, 2023)
CHANGELOG.md (2 additions, 0 deletions)
@@ -375,6 +375,7 @@
- [Added support for Date/Time columns in the Postgres backend and added
`year`/`month`/`day` operations to Table columns.][6153]
- [`Text.split` can now take a vector of delimiters.][6156]
+- [Implemented `Table.union` for the Database backend.][6204]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -568,6 +569,7 @@
[6150]: https://github.com/enso-org/enso/pull/6150
[6153]: https://github.com/enso-org/enso/pull/6153
[6156]: https://github.com/enso-org/enso/pull/6156
+[6204]: https://github.com/enso-org/enso/pull/6204

#### Enso Compiler

@@ -146,7 +146,7 @@ type Connection
Error.throw (Table_Not_Found.Error query sql_error treated_as_query=True)
SQL_Query.Raw_SQL raw_sql -> handle_sql_errors <|
self.jdbc_connection.ensure_query_has_no_holes raw_sql . if_not_error <|
-columns = self.jdbc_connection.fetch_columns raw_sql Statement_Setter.null
+columns = self.fetch_columns raw_sql Statement_Setter.null
name = if alias == "" then (UUID.randomUUID.to_text) else alias
ctx = Context.for_query raw_sql name
Database_Table_Module.make_table self name columns ctx
@@ -155,7 +155,7 @@ type Connection
ctx = Context.for_table name (if alias == "" then name else alias)
statement = self.dialect.generate_sql (Query.Select Nothing ctx)
statement_setter = self.dialect.get_statement_setter
-columns = self.jdbc_connection.fetch_columns statement statement_setter
+columns = self.fetch_columns statement statement_setter
Database_Table_Module.make_table self name columns ctx
result.catch SQL_Error sql_error->
Error.throw (Table_Not_Found.Error name sql_error treated_as_query=False)
@@ -189,6 +189,14 @@ type Connection
self.jdbc_connection.with_prepared_statement statement statement_setter stmt->
result_set_to_table stmt.executeQuery self.dialect.make_column_fetcher_for_type type_overrides last_row_only

+## PRIVATE
+Given a prepared statement, gets the column names and types for the
+result set.
+fetch_columns : Text | SQL_Statement -> Statement_Setter -> Any
+fetch_columns self statement statement_setter =
+needs_execute_query = self.dialect.needs_execute_query_for_type_inference
+self.jdbc_connection.raw_fetch_columns statement needs_execute_query statement_setter
+
## PRIVATE
ADVANCED

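The new `needs_execute_query_for_type_inference` flag exists because SQLite attaches types to values, not to result columns. SQLite's C API reports no declared type for a computed expression, so prepared-statement metadata alone says little. The sketch below shows the same limitation with Python's built-in `sqlite3` module. Enso actually uses a JDBC driver, so treat this only as an analogy. In Python, `cursor.description` carries nothing but column names, and a column's storage class only becomes visible once a row is read and passed through SQLite's `typeof()`. The helper name `sqlite_column_types` and the table `t` are hypothetical.

```python
import sqlite3


def sqlite_column_types(conn: sqlite3.Connection, query: str) -> dict:
    """Map each result column of `query` to its SQLite storage class.

    Hypothetical helper. Names come from cursor.description, which needs no
    rows; types can only be read from an actual row via typeof(), so an empty
    result leaves every type as None. Identifier quoting is simplified.
    """
    names = [d[0] for d in conn.execute(f"SELECT * FROM ({query}) LIMIT 0").description]
    probes = ", ".join(f'typeof("{name}")' for name in names)
    row = conn.execute(f"SELECT {probes} FROM ({query}) LIMIT 1").fetchone()
    return dict(zip(names, row)) if row is not None else dict.fromkeys(names)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x')")

query = "SELECT a + 0.5 AS c, b FROM t"
# The DB-API description only fills in the name; the other six fields are None.
type_codes = [d[1] for d in conn.execute(query).description]

print(type_codes)                                      # [None, None]
print(sqlite_column_types(conn, query))                # {'c': 'real', 'b': 'text'}
print(sqlite_column_types(conn, query + " WHERE 0"))   # {'c': None, 'b': None}
```

The last call shows the catch: if the query produces no rows, there is nothing to inspect.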
@@ -8,7 +8,7 @@ import Standard.Table.Internal.Java_Problems
import Standard.Table.Internal.Problem_Builder.Problem_Builder
import Standard.Table.Internal.Widget_Helpers
from Standard.Table import Sort_Column, Data_Formatter, Value_Type, Auto
-from Standard.Table.Errors import Floating_Point_Equality, Inexact_Type_Coercion, Invalid_Value_Type
+from Standard.Table.Errors import Floating_Point_Equality, Inexact_Type_Coercion, Invalid_Value_Type, Lossy_Conversion

import project.Data.SQL_Statement.SQL_Statement
import project.Data.SQL_Type.SQL_Type
@@ -77,6 +77,13 @@ type Column
to_table self =
Table.Value self.name self.connection [self.as_internal] self.context

+## Returns a Table describing this column's contents.
+
+The table behaves like `Table.info` - it lists the column name, the count
+of non-null items and the value type.
+info : Table
+info self = self.to_table.info
+
## Returns a materialized column containing rows of this column.

Arguments:
@@ -91,11 +98,10 @@ type Column
to_vector self =
self.to_table.read . at self.name . to_vector

-## UNSTABLE TODO this is a very early prototype that will be revisited later
-This implementation is really just so that we can use the types in
-`filter`, it does not provide even a decent approximation of the true
-type in many cases. It will be improved when the types work is
-implemented.
+## Returns the `Value_Type` associated with that column.
+
+The value type determines what type of values the column is storing and
+what operations are permitted.
value_type : Value_Type
value_type self =
mapping = self.connection.dialect.get_type_mapping
@@ -901,6 +907,63 @@ type Column
_ = [type, format, on_problems]
Error.throw <| Unsupported_Database_Operation.Error "`Column.parse` is not implemented yet for the Database backends."

+## PRIVATE
+UNSTABLE
+Cast the column to a specific type.
+
+Arguments:
+- value_type: The `Value_Type` to cast the column to.
+- on_problems: Specifies how to handle problems if they occur, reporting
+them as warnings by default.
+
+TODO [RW] this is a prototype needed for debugging, proper implementation
+and testing will come with #6112.
+
+In the Database backend, this will boil down to a CAST operation.
+In the in-memory backend, a conversion will be performed according to
+the following rules:
+- Anything can be cast into the `Mixed` type.
+- When converting to a `Char` type, the elements of the column will be
+converted to text. If the target type is fixed length, the texts will be trimmed or
+padded on the right with the space character to match the desired
+length.
+- Conversion between numeric types will replace values exceeding the
+range of the target type with `Nothing`.
+- Booleans may also be converted to numbers, with `True` being converted
+to `1` and `False` to `0`. The reverse is not supported - use `iif`
+instead.
+- A `Date_Time` may be converted into a `Date` or `Time` type - the
+resulting value will be truncated to the desired type.
+- If a `Date` is to be converted to `Date_Time`, it will be set at
+midnight of the default system timezone.
+
+? Conversion Precision
+
+In the in-memory backend, if the conversion is lossy, a
+`Lossy_Conversion` warning will be reported. The only exception is when
+truncating a column which is already a text column - as then the
+truncation seems like an intended behaviour, so it is not reported. If
+truncating needs to occur when converting a non-text column, a warning
+will still be reported.
+
+Currently, the warning is not reported for Database backends.
+
+? Inexact Target Type
+
+If the backend does not support the requested target type, the closest
+supported type is chosen and an `Inexact_Type_Coercion` problem is
+reported.
+cast : Value_Type -> Problem_Behavior -> Column ! Illegal_Argument | Inexact_Type_Coercion | Lossy_Conversion
+cast self value_type=self.value_type on_problems=Problem_Behavior.Report_Warning =
+dialect = self.connection.dialect
+type_mapping = dialect.get_type_mapping
+target_sql_type = type_mapping.value_type_to_sql value_type on_problems
+target_sql_type.if_not_error <|
+infer_from_database new_expression =
+SQL_Type_Reference.new self.connection self.context new_expression
+new_column = dialect.make_cast self.as_internal target_sql_type infer_from_database
+Column.Value new_column.name self.connection new_column.sql_type_reference new_column.expression self.context
+
## ALIAS Transform Column

Applies `function` to each item in this column and returns the column
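The in-memory conversion rules listed in the `Column.cast` documentation can be made concrete with a small Python sketch. This is not Enso's implementation, and several parts of it are assumptions made for illustration:

- the function names `cast_char`, `cast_integer` and `cast_temporal`;
- the use of Python's `str` in place of Enso's text conversion;
- the restriction to integer and Boolean inputs for numeric casts, because the rules do not say how fractional values are rounded.

```python
from datetime import date, datetime, time
from typing import Any, Optional


def cast_char(value: Any, length: Optional[int] = None, fixed: bool = False) -> Optional[str]:
    """Rule: convert to text; a fixed-length target trims or right-pads with spaces."""
    if value is None:
        return None
    text = str(value)  # stand-in for Enso's own text conversion
    if fixed and length is not None:
        text = text[:length].ljust(length, " ")
    return text


def cast_integer(value: Any, bits: int = 64) -> Optional[int]:
    """Rule: out-of-range values become None; True/False become 1/0.

    Only integer (and Boolean) inputs are handled in this sketch.
    """
    if value is None:
        return None
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return int(value)
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return value if low <= value <= high else None


def cast_temporal(value: Any, target: str) -> Any:
    """Rules: Date_Time -> Date/Time truncates; Date -> Date_Time is local midnight."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        if target == "date":
            return value.date()
        if target == "time":
            return value.time()
        return value
    if isinstance(value, date) and target == "date_time":
        # astimezone() on a naive datetime assumes, and attaches, the system timezone.
        return datetime.combine(value, time.min).astimezone()
    raise ValueError(f"conversion of {value!r} to {target} is not covered by this sketch")


print(repr(cast_char("ab", length=4, fixed=True)))          # 'ab  '
print(cast_integer(2 ** 40, bits=32))                       # None
print(cast_temporal(datetime(2023, 4, 6, 15, 30), "date"))  # 2023-04-06
```

Only Booleans become numbers. The reverse direction is deliberately absent, matching the note that `iif` should be used instead.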
@@ -10,14 +10,17 @@ import project.Data.SQL_Statement.SQL_Statement
import project.Data.SQL_Type.SQL_Type
import project.Data.Table.Table
import project.Internal.Column_Fetcher.Column_Fetcher
import project.Internal.IR.Context.Context
import project.Internal.IR.From_Spec.From_Spec
import project.Internal.IR.Internal_Column.Internal_Column
import project.Internal.IR.Order_Descriptor.Order_Descriptor
import project.Internal.IR.Query.Query
import project.Internal.IR.SQL_Expression.SQL_Expression
import project.Internal.Postgres.Postgres_Dialect
import project.Internal.Redshift.Redshift_Dialect
import project.Internal.SQLite.SQLite_Dialect
import project.Internal.SQL_Type_Mapping.SQL_Type_Mapping
import project.Internal.SQL_Type_Reference.SQL_Type_Reference
import project.Internal.Statement_Setter.Statement_Setter
from project.Errors import Unsupported_Database_Operation

@@ -113,6 +116,61 @@ type Dialect
get_statement_setter self =
Unimplemented.throw "This is an interface only."

+## PRIVATE
+Builds an SQL expression that casts the given expression to the given
+target type.
+
+Arguments:
+- column: the input column to transform.
+- target_type: the target type.
+- infer_result_type_from_database_callback: A callback that can be used
+to infer the type of the newly built expression from the Database. It
+should be used by default, unless an override is chosen.
+make_cast : Internal_Column -> SQL_Type -> (SQL_Expression -> SQL_Type_Reference) -> Internal_Column
+make_cast self column target_type infer_result_type_from_database_callback =
+_ = [column, target_type, infer_result_type_from_database_callback]
+Unimplemented.throw "This is an interface only."
+
+## PRIVATE
+Specifies if the `fetch_columns` operation needs to execute the query to
+get the column types.
+
+In most backends, `getMetaData` may be called on a
+`PreparedStatement` directly, to infer column types without actually
+executing the query. In some, however, like SQLite, this is insufficient
+and will yield incorrect results, so the query needs to be executed (even
+though the full results may not need to be streamed).
+needs_execute_query_for_type_inference : Boolean
+needs_execute_query_for_type_inference self =
+Unimplemented.throw "This is an interface only."
+
+## PRIVATE
+Specifies if the cast used to reconcile column types should be done after
+performing the union. If `False`, the cast will be done before the union.
+
+Most databases that care about column types will want to do the cast
+before the union operation to ensure that types are aligned when merging.
+For an SQLite workaround to work, it's better to do the cast after the
+union operation.
+cast_after_union : Boolean
+cast_after_union self =
+Unimplemented.throw "This is an interface only."
+
+## PRIVATE
+Prepares a query that can be used to fetch the type of an expression in
+the provided context.
+
+This method may modify the context to optimize the query while preserving
+the types. For example, in most databases, it is fine to add
+`WHERE FALSE` to the query - ensuring that the engine will not do any
+actual work, but the resulting type will still be the same. There are
+exceptions though, like SQLite, where the best we can do is add
+`LIMIT 1`.
+prepare_fetch_types_query : SQL_Expression -> Context -> SQL_Statement
+prepare_fetch_types_query self expression context =
+_ = [expression, context]
+Unimplemented.throw "This is an interface only."
+
## PRIVATE
Checks if the given aggregate is supported.

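The `cast_after_union` switch decides where the type-reconciling `CAST` goes relative to the `UNION`. Below is a minimal sketch of the two query shapes, run with Python's `sqlite3`. The helper `union_with_cast`, the tables `a`/`b` and the output alias `v` are all hypothetical. SQLite lets a single column hold values of different storage classes, so both shapes run there. An engine with strict column types such as PostgreSQL would reject the cast-after shape in this example, because its inner `UNION ALL` combines an integer column with a text column. That is presumably why the dialect documentation says most databases cast before the union.

```python
import sqlite3


def union_with_cast(branches, sql_type: str, cast_after_union: bool) -> str:
    """Build a UNION ALL over (table, expression) branches, reconciled with CAST.

    Hypothetical helper: each branch contributes one column, aliased as v.
    """
    if cast_after_union:
        inner = " UNION ALL ".join(f"SELECT {expr} AS v FROM {table}" for table, expr in branches)
        return f"SELECT CAST(v AS {sql_type}) AS v FROM ({inner})"
    return " UNION ALL ".join(
        f"SELECT CAST({expr} AS {sql_type}) AS v FROM {table}" for table, expr in branches
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y TEXT);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES ('3');
""")

branches = [("a", "x"), ("b", "y")]
results = {}
for after in (False, True):
    sql = union_with_cast(branches, "TEXT", cast_after_union=after)
    # Inspect each value together with its runtime storage class.
    results[after] = conn.execute(f"SELECT v, typeof(v) FROM ({sql}) ORDER BY v").fetchall()
    print(after, results[after])
```

In SQLite both shapes produce the same three text values. Replacing `UNION ALL` with `UNION` would additionally remove duplicate rows.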
@@ -139,3 +197,8 @@ postgres = Postgres_Dialect.postgres
The dialect of Redshift databases.
redshift : Dialect
redshift = Redshift_Dialect.redshift
+
+## PRIVATE
+default_fetch_types_query dialect expression context =
+empty_context = context.add_where_filters [SQL_Expression.Literal "FALSE"]
+dialect.generate_sql (Query.Select [["typed_column", expression]] empty_context)
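`default_fetch_types_query` selects the expression as `typed_column` under a `WHERE FALSE` filter. The `prepare_fetch_types_query` documentation, however, notes that SQLite needs `LIMIT 1` instead. The sketch below shows why, again using Python's `sqlite3`. It relies on a hypothetical `fetch_types_query` helper, with a plain `FROM` clause standing in for Enso's `Context`. Under `WHERE FALSE`, no row comes back for `typeof()` to inspect, while `LIMIT 1` returns one. In this sketch `LIMIT 1` still yields nothing on an empty table. The `FALSE` keyword requires SQLite 3.23 or newer.

```python
import sqlite3


def fetch_types_query(expression: str, from_clause: str, use_limit: bool) -> str:
    """Hypothetical helper: build a query whose only purpose is type discovery."""
    base = f"SELECT {expression} AS typed_column FROM {from_clause}"
    # Typical engines: WHERE FALSE keeps the result type but reads no rows.
    # SQLite: types live on values, so at most one real row is requested.
    return f"{base} LIMIT 1" if use_limit else f"{base} WHERE FALSE"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.execute("INSERT INTO t VALUES (7)")

probed = {}
for use_limit in (False, True):
    probe = fetch_types_query("a * 1.5", "t", use_limit)
    probed[use_limit] = conn.execute(f"SELECT typeof(typed_column) FROM ({probe})").fetchone()

print(probed)  # {False: None, True: ('real',)}
```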
@@ -18,12 +18,12 @@ type SQL_Type
- precision: For character types, specifies their length.
See `ResultSetMetaData.getPrecision`.
- scale: The scale for fixed precision numeric types. Not applicable for
-other types, so its value is undefined and will usually just be 0.
+other types, so its value is undefined.
See `ResultSetMetaData.getScale`.
- nullable: Specifies if the given column is nullable. May be `Nothing`
if that is unknown / irrelevant for the type.
TODO: the precise meaning of this will be revised with #5872.
-Value (typeid : Integer) (name : Text) (precision : Nothing | Integer = Nothing) (scale : Integer = 0) (nullable : Boolean | Nothing = Nothing)
+Value (typeid : Integer) (name : Text) (precision : Nothing | Integer = Nothing) (scale : Nothing | Integer = Nothing) (nullable : Boolean | Nothing = Nothing)

## PRIVATE
ADVANCED
@@ -40,8 +40,9 @@ type SQL_Type
0 -> Nothing
p : Integer -> p
scale = metadata.getScale ix
+effective_scale = if precision.is_nothing && (scale == 0) then Nothing else scale
nullable_id = metadata.isNullable ix
nullable = if nullable_id == ResultSetMetaData.columnNoNulls then False else
if nullable_id == ResultSetMetaData.columnNullable then True else
Nothing
-SQL_Type.Value typeid typename precision scale nullable
+SQL_Type.Value typeid typename precision effective_scale nullable
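The `from_metadata` change normalizes what JDBC reports in three ways:

- a precision of `0` becomes `Nothing`;
- a scale of `0` is treated as carrying no information when there is no precision;
- the `isNullable` codes map to `False`, `True` or `Nothing`.

Below is a Python transcription of that logic, for illustration only. The function name `normalize_metadata` is hypothetical. The three constants mirror `java.sql.ResultSetMetaData.columnNoNulls` (0), `columnNullable` (1) and `columnNullableUnknown` (2).

```python
from typing import Optional, Tuple

# Values of the java.sql.ResultSetMetaData nullability constants.
COLUMN_NO_NULLS = 0
COLUMN_NULLABLE = 1
COLUMN_NULLABLE_UNKNOWN = 2


def normalize_metadata(
    precision: int, scale: int, nullable_id: int
) -> Tuple[Optional[int], Optional[int], Optional[bool]]:
    """Mirror of the Enso normalization above; returns (precision, scale, nullable)."""
    # JDBC reports 0 when precision is not applicable.
    effective_precision = None if precision == 0 else precision
    # Without a precision, a scale of 0 is just the default, so treat it as unknown.
    effective_scale = None if effective_precision is None and scale == 0 else scale
    if nullable_id == COLUMN_NO_NULLS:
        nullable = False
    elif nullable_id == COLUMN_NULLABLE:
        nullable = True
    else:  # COLUMN_NULLABLE_UNKNOWN or anything unexpected
        nullable = None
    return effective_precision, effective_scale, nullable


print(normalize_metadata(0, 0, COLUMN_NULLABLE_UNKNOWN))  # (None, None, None)
print(normalize_metadata(10, 2, COLUMN_NULLABLE))         # (10, 2, True)
```

A non-zero scale is kept even when precision is absent, exactly as in the Enso condition.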