Skip to content

Commit

Permalink
WIP: correct handling of fixed-length text types and other unificatio…
Browse files Browse the repository at this point in the history
…n in UNION and IIF/FILL_NOTHING
  • Loading branch information
radeusgd committed Apr 18, 2023
1 parent e83e7b7 commit 61b013c
Show file tree
Hide file tree
Showing 8 changed files with 101 additions and 44 deletions.
18 changes: 8 additions & 10 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Dialect.enso
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
from Standard.Base import all
import Standard.Base.Errors.Unimplemented.Unimplemented

from Standard.Table import Aggregate_Column, Join_Kind
from Standard.Table import Aggregate_Column, Join_Kind, Value_Type
import Standard.Table.Internal.Naming_Helpers.Naming_Helpers
import Standard.Table.Internal.Problem_Builder.Problem_Builder

Expand Down Expand Up @@ -145,15 +145,13 @@ type Dialect
Unimplemented.throw "This is an interface only."

## PRIVATE
Specifies if the cast used to reconcile column types should be done after
performing the union. If `False`, the cast will be done before the union.

Most databases that care about column types will want to do the cast
before the union operation to ensure that types are aligned when merging.
For an SQLite workaround to work, it's better to do the cast after the
union operation.
cast_after_union : Boolean
cast_after_union self =
Performs any transformations on a column resulting from unifying other
columns.

These transformations depend on the dialect. They can be used to align
the result types, for example.
adapt_unified_column : Internal_Column -> Value_Type -> (SQL_Expression -> SQL_Type_Reference) -> Internal_Column
adapt_unified_column self column approximate_result_type infer_result_type_from_database_callback =
Unimplemented.throw "This is an interface only."

## PRIVATE
Expand Down
29 changes: 11 additions & 18 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Table.enso
Original file line number Diff line number Diff line change
Expand Up @@ -1151,29 +1151,26 @@ type Table
sql_type.catch Inexact_Type_Coercion error->
Panic.throw <|
Illegal_State.Error "Unexpected inexact type coercion in Union. The union logic should only operate in types supported by the given backend. This is a bug in the Database library. The coercion was: "+error.to_display_text cause=error
[column_set, sql_type]
[column_set, sql_type, result_type]
good_columns = merged_columns.filter r-> r.is_nothing.not
if good_columns.is_empty then Error.throw No_Output_Columns else
problem_builder.attach_problems_before on_problems <|
cast_after_union = dialect.cast_after_union
queries = all_tables.map_with_index i-> t->
columns_to_select = good_columns.map description->
column_set = description.first
result_type = description.second
sql_type = description.second
column_name = column_set.name
case column_set.column_indices.at i of
Nothing ->
typ = SQL_Type_Reference.from_constant SQL_Type.null
expr = SQL_Expression.Literal "NULL"
null_column = Internal_Column.Value column_name typ expr
if cast_after_union then null_column else
## We assume that the type for this
expression will never be queried - it
is just used internally to build the
Union operation and never exposed
externally.
infer_return_type _ = SQL_Type_Reference.null
dialect.make_cast null_column result_type infer_return_type
## We assume that the type for this
expression will never be queried - it is
just used internally to build the Union
operation and never exposed externally.
infer_return_type _ = SQL_Type_Reference.null
dialect.make_cast null_column sql_type infer_return_type
corresponding_column_index : Integer ->
t.at corresponding_column_index . as_internal . rename column_name
pairs = columns_to_select.map c->
Expand All @@ -1191,15 +1188,11 @@ type Table
SQL_Type_Reference.new self.connection new_ctx expression
new_columns = good_columns.map description->
column_set = description.first
result_type = description.second
result_type = description.at 2
name = column_set.name
expression = SQL_Expression.Column union_alias name
case cast_after_union of
True ->
input_column = Internal_Column.Value name SQL_Type_Reference.null expression
dialect.make_cast input_column result_type infer_return_type
False ->
Internal_Column.Value name (infer_return_type expression) expression
input_column = Internal_Column.Value name (infer_return_type expression) expression
dialect.adapt_unified_column input_column result_type infer_return_type

Table.Value union_alias self.connection new_columns new_ctx

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -153,8 +153,13 @@ type Postgres_Dialect
supports_separate_nan self = True

## PRIVATE
cast_after_union : Boolean
cast_after_union self = False
There is a bug in Postgres type inference, where if we unify two
fixed-length char columns of length N and M, the result type is said to
be a **fixed-length** column of length max_int4. This is wrong, and in
practice the column is just a variable-length text. This method detects
this situations and overrides the type to make it correct.
adapt_unified_column : Internal_Column -> Value_Type -> (SQL_Expression -> SQL_Type_Reference) -> Internal_Column
adapt_unified_column self column approximate_result_type infer_result_type_from_database_callback =

## PRIVATE
prepare_fetch_types_query : SQL_Expression -> Context -> SQL_Statement
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import project.Data.SQL_Type.SQL_Type
import project.Internal.IR.SQL_Expression.SQL_Expression
import project.Internal.SQL_Type_Mapping
import project.Internal.SQL_Type_Reference.SQL_Type_Reference
from project.Errors import Unsupported_Database_Operation

polyglot java import java.sql.Types

Expand Down Expand Up @@ -37,9 +38,11 @@ type Postgres_Type_Mapping
SQL_Type.Value Types.DECIMAL "decimal" precision scale
Value_Type.Char size variable ->
case variable of
True -> case size of
Nothing -> SQL_Type.Value Types.VARCHAR "text"
_ -> SQL_Type.Value Types.VARCHAR "varchar" size
True ->
is_unbounded = size.is_nothing || (size == max_precision)
case is_unbounded of
True -> SQL_Type.Value Types.VARCHAR "text"
False -> SQL_Type.Value Types.VARCHAR "varchar" size
False -> SQL_Type.Value Types.CHAR "char" size
Value_Type.Time ->
SQL_Type.Value Types.TIME "time"
Expand All @@ -51,7 +54,7 @@ type Postgres_Type_Mapping
Value_Type.Binary _ _ ->
SQL_Type.Value Types.BINARY "bytea" precision=max_precision
Value_Type.Mixed ->
Error.throw (Illegal_Argument.Error "Postgres tables do not support Mixed types.")
Error.throw (Unsupported_Database_Operation.Error "Postgres tables do not support Mixed types.")
Value_Type.Unsupported_Data_Type type_name underlying_type ->
underlying_type.if_nothing <|
Error.throw <|
Expand Down Expand Up @@ -115,7 +118,8 @@ complex_types_map = Map.from_vector <|
make_decimal sql_type =
Value_Type.Decimal sql_type.precision sql_type.scale
make_varchar sql_type =
Value_Type.Char size=sql_type.precision variable_length=True
effective_size = if sql_type.precision == max_precision then Nothing else sql_type.precision
Value_Type.Char size=effective_size variable_length=True
make_char sql_type =
Value_Type.Char size=sql_type.precision variable_length=False
make_binary variable sql_type =
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -123,8 +123,10 @@ type Redshift_Dialect
supports_separate_nan self = True

## PRIVATE
cast_after_union : Boolean
cast_after_union self = False
adapt_unified_column : Internal_Column -> Value_Type -> (SQL_Expression -> SQL_Type_Reference) -> Internal_Column
adapt_unified_column self column approximate_result_type infer_result_type_from_database_callback =
_ = [approximate_result_type, infer_result_type_from_database_callback]
column

## PRIVATE
prepare_fetch_types_query : SQL_Expression -> Context -> SQL_Statement
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import Standard.Table.Data.Aggregate_Column.Aggregate_Column
import Standard.Table.Internal.Naming_Helpers.Naming_Helpers
import Standard.Table.Internal.Problem_Builder.Problem_Builder
from Standard.Table.Data.Aggregate_Column.Aggregate_Column import all
from Standard.Table import Value_Type

import project.Connection.Connection.Connection
import project.Data.SQL.Builder
Expand Down Expand Up @@ -157,8 +158,12 @@ type SQLite_Dialect
supports_separate_nan self = False

## PRIVATE
cast_after_union : Boolean
cast_after_union self = True
SQLite allows mixed type columns, but we want our columns to be uniform.
So after unifying columns with mixed types, we add a cast to ensure that.
adapt_unified_column : Internal_Column -> Value_Type -> (SQL_Expression -> SQL_Type_Reference) -> Internal_Column
adapt_unified_column self column approximate_result_type infer_result_type_from_database_callback =
sql_type = type_mapping.value_type_to_sql result_type Problem_Behavior.Ignore
self.make_cast column sql_type infer_result_type_from_database_callback

## PRIVATE
prepare_fetch_types_query : SQL_Expression -> Context -> SQL_Statement
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,33 @@ spec setup =
t3.at "X" . iif (t3.at "Y") (t3.at "Z") . to_vector . should_fail_with No_Common_Type
t3.at "X" . iif (t3.at "Y") "<NA>" . to_vector . should_fail_with No_Common_Type

Test.specify "iif should correctly unify text columns of various lengths" pending=(if setup.test_selection.fixed_length_text_columns.not then "Fixed-length Char columns are not supported by this backend.") <|
t0 = table_builder [["x", [False, True, False]], ["A", ["a", "b", "c"]], ["B", ["xyz", "abc", "def"]]]
t1 = t0
. cast "A" (Value_Type.Char size=1 variable_length=False)
. cast "B" (Value_Type.Char size=3 variable_length=False)

a = t1.at "A"
b = t1.at "B"
a.value_type.should_equal (Value_Type.Char size=1 variable_length=False)
b.value_type.should_equal (Value_Type.Char size=3 variable_length=False)

c = (t.at "x").iif a b
c.to_vector.should_equal ["xyz", "b", "def"]
Test.with_clue "c.value_type="+c.value_type.to_display_text+": " <|
c.value_type.variable_length.should_be_true

d = b.cast (Value_Type.Char size=1 variable_length=False)
e = (t.at "x").iif a d
e.to_vector.should_equal ["x", "b", "d"]
e.value_type.should_equal (Value_Type.Char size=1 variable_length=False)

f = b.cast (Value_Type.Char size=1 variable_length=True)
g = (t.at "x").iif a f
g.to_vector.should_equal ["x", "b", "d"]
Test.with_clue "g.value_type="+g.value_type.to_display_text+": " <|
g.value_type.variable_length.should_be_true

Test.specify "should allow to compute &&, || and not" <|
t = table_builder [["X", [True, False, True]], ["Y", [True, False, False]]]
x = t.at "X"
Expand Down Expand Up @@ -165,6 +192,29 @@ spec setup =
c2 = t.at "X" . fill_empty "<empty>"
c2.to_vector . should_equal ["a", "<empty>", " ", "<empty>", "b"]

Test.specify "fill_nothing should correctly unify text columns of various lengths" pending=(if setup.test_selection.fixed_length_text_columns.not then "Fixed-length Char columns are not supported by this backend.") <|
t0 = table_builder [["A", ["a", Nothing, "c"]], ["B", ["X", "Y", "Z"]], ["C", ["xyz", "abc", "def"]]]
t1 = t0
. cast "A" (Value_Type.Char size=1 variable_length=False)
. cast "B" (Value_Type.Char size=1 variable_length=False)
. cast "C" (Value_Type.Char size=3 variable_length=False)

a = t1.at "A"
b = t1.at "B"
c = t1.at "C"
a.value_type.should_equal (Value_Type.Char size=1 variable_length=False)
b.value_type.should_equal (Value_Type.Char size=1 variable_length=False)
c.value_type.should_equal (Value_Type.Char size=3 variable_length=False)

d = a.fill_nothing b
d.to_vector . should_equal ["a", "Y", "c"]
d.value_type . should_equal (Value_Type.Char size=1 variable_length=False)

e = a.fill_nothing c
e.to_vector . should_equal ["a", "abc", "c"]
Test.with_clue "e.value_type="+e.value_type.to_display_text+": " <|
e.value_type.variable_length.should_be_true

Test.specify "should report a warning if checking equality on floating point columns" <|
t = table_builder [["X", [1.0, 2.1, 3.2]], ["Y", [1.0, 2.0, 3.2]]]

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -123,11 +123,11 @@ spec connection db_name =
t2.at "b" . value_type . should_be_a (Value_Type.Unsupported_Data_Type ...)
t2.at "c" . value_type . should_be_a (Value_Type.Unsupported_Data_Type ...)

Test.specify "should approximate types to the closest supported one" pending="TODO: Table.cast" <|
# TODO this will be tested once the cast operator is implemented
# Binary 10 variable_length=False -> Binary max_int4 variable_length=True
# Byte -> Integer Bits.Bits_16
Nothing
Test.specify "should approximate types to the closest supported one" <|
t = make_table "T" [["b", "INT"]]
t2 = t.cast "b" Value_Type.Byte
t2.at "b" . value_type . should_equal (Value_Type.Integer Bits.Bits_16)
Problems.expect_warning Inexact_Type_Coercion t2

main = Test_Suite.run_main (test_with_connection spec)

Expand Down

0 comments on commit 61b013c

Please sign in to comment.