Improve Non_Unique_Primary_Key error, split file format detection into read/write, improve SQLite format detection #6604

Merged (14 commits) on May 9, 2023
Original file line number Diff line number Diff line change
@@ -438,6 +438,8 @@ type Array
flatten : Vector Any
flatten self = Vector.flatten self

## PRIVATE
ADVANCED
short_display_text : Integer -> Text
short_display_text self max_entries=10 = Vector.short_display_text self max_entries

@@ -641,9 +643,15 @@ type Array
join : Text -> Text -> Text -> Text
join self separator="" prefix="" suffix="" = Vector.join self separator prefix suffix

## PRIVATE
Generates a human-readable text representation of the array.
to_text : Text
to_text self = self.map .to_text . join ", " "[" "]"

## PRIVATE
to_display_text : Text
to_display_text self = self.short_display_text max_entries=40

## Combines all the elements of a non-empty array using a binary operation.
If the array is empty, it returns `if_empty`.

@@ -618,6 +618,10 @@ type Vector a
to_text : Text
to_text self = self.map .to_text . join ", " "[" "]"

## PRIVATE
to_display_text : Text
to_display_text self = self.short_display_text max_entries=40

## PRIVATE
ADVANCED

11 changes: 10 additions & 1 deletion distribution/lib/Standard/Base/0.0.0-dev/src/System/File.enso
@@ -245,7 +245,8 @@ type File
@format format_widget
read : File_Format -> Problem_Behavior -> Any ! File_Error
read self format=Auto_Detect (on_problems=Problem_Behavior.Report_Warning) =
format.read self on_problems
if self.exists.not then Error.throw (File_Error.Not_Found self) else
format.read self on_problems

## ALIAS Load Bytes, Open Bytes
Reads all bytes in this file into a byte vector.
@@ -612,6 +613,14 @@ type File
resource = Managed_Resource.register stream close_stream
Input_Stream.Value self resource

## PRIVATE
Reads the first `n` bytes from the file (or fewer if the file is too small)
and returns a vector of bytes.
read_first_bytes : Integer -> Vector ! File_Error
read_first_bytes self n =
opts = [File_Access.Read]
self.with_input_stream opts (_.read_n_bytes n)

## PRIVATE
Reads the last `n` bytes from the file (or fewer if the file is too small) and
returns a vector of bytes.
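For readers less familiar with Enso, the two `File` changes above can be sketched in Python: `File.read` now reports a missing file before any format is consulted, and `read_first_bytes` returns at most `n` bytes. This is an illustrative transliteration, not the Enso implementation; `read_with_format` and its `reader` callback are hypothetical stand-ins for `format.read self on_problems`.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def read_first_bytes(path: Path, n: int) -> bytes:
    """Return the first `n` bytes of `path`, or fewer if the file is shorter."""
    with open(path, "rb") as f:
        return f.read(n)


def read_with_format(path: Path, reader: Callable[[Path], T]) -> T:
    """Report a missing file up front, before any format-specific reader runs."""
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    return reader(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"abcdef")
        print(read_first_bytes(sample, 3))    # b'abc'
        print(read_first_bytes(sample, 100))  # b'abcdef' (file is shorter than n)
```

Checking existence first matters once detectors may inspect file contents: a content-based detector would otherwise fail with a less specific error on a missing file.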
@@ -58,14 +58,26 @@ type Auto_Detect
Implements the `File.read` for this `File_Format`
read : File -> Problem_Behavior -> Any ! File_Error
read self file on_problems =
reader = Auto_Detect.get_format file
reader = Auto_Detect.get_reading_format file
if reader == Nothing then Error.throw (File_Error.Unsupported_Type file) else
reader.read file on_problems

## PRIVATE
get_format : File -> Any | Nothing
get_format file =
get_format f-> f.for_file file
Finds a matching format for reading the file.

It assumes that `file` already exists.
get_reading_format : File -> Any | Nothing
get_reading_format file =
get_format f-> f.for_file_read file

## PRIVATE
Finds a matching format for writing to the file.

It must not assume that `file` exists, so it may rely only on the file
path (the extension in particular), not on its contents.
get_writing_format : File -> Any | Nothing
get_writing_format file =
get_format f-> f.for_file_write file

## PRIVATE
get_web_parser : Text -> URI -> Any | Nothing
@@ -91,13 +103,18 @@ type Plain_Text_Format

## PRIVATE
If the File_Format supports reading from the file, return a configured instance.
for_file : File -> Plain_Text_Format | Nothing
for_file file =
for_file_read : File -> Plain_Text_Format | Nothing
for_file_read file =
case file.extension of
".txt" -> Plain_Text_Format.Plain_Text
".log" -> Plain_Text_Format.Plain_Text
_ -> Nothing

## PRIVATE
If this File_Format should be used for writing to that file, return a configured instance.
for_file_write : File -> Plain_Text_Format | Nothing
for_file_write file = Plain_Text_Format.for_file_read file

## PRIVATE
If the File_Format supports reading from the web response, return a configured instance.
for_web : Text -> URI -> Plain_Text_Format | Nothing
@@ -127,12 +144,17 @@ type Plain_Text_Format
type Bytes
## PRIVATE
If the File_Format supports reading from the file, return a configured instance.
for_file : File -> Bytes | Nothing
for_file file =
for_file_read : File -> Bytes | Nothing
for_file_read file =
case file.extension of
".dat" -> Bytes
_ -> Nothing

## PRIVATE
If this File_Format should be used for writing to that file, return a configured instance.
for_file_write : File -> Bytes | Nothing
for_file_write file = Bytes.for_file_read file

## PRIVATE
If the File_Format supports reading from the web response, return a configured instance.
As `Bytes` does not support reading from the web, this returns `Nothing`.
@@ -148,13 +170,18 @@ type Bytes
type JSON_Format
## PRIVATE
If the File_Format supports reading from the file, return a configured instance.
for_file : File -> JSON_Format | Nothing
for_file file =
for_file_read : File -> JSON_Format | Nothing
for_file_read file =
case file.extension of
".json" -> JSON_Format
".geojson" -> JSON_Format
_ -> Nothing

## PRIVATE
If this File_Format should be used for writing to that file, return a configured instance.
for_file_write : File -> JSON_Format | Nothing
for_file_write file = JSON_Format.for_file_read file

## PRIVATE
If the File_Format supports reading from the web response, return a configured instance.
for_web : Text -> URI -> JSON_Format | Nothing
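The read/write split in `Auto_Detect` can be illustrated with a small Python sketch of the dispatch: each format contributes a read detector and a write detector, and the first one that returns a result (here anything other than `None`, standing in for `Nothing`) wins. The extension tables mirror the Plain_Text, JSON and Delimited formats in this diff; the function names and string results are hypothetical simplifications of the Enso format instances.

```python
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical stand-ins for the Enso format instances: a detector returns a
# format description, or None when it does not claim the file.
Detector = Callable[[Path], Optional[str]]

PLAIN_TEXT_EXTENSIONS = {".txt", ".log"}
JSON_EXTENSIONS = {".json", ".geojson"}
DELIMITERS = {".csv": ",", ".tab": "\t", ".tsv": "\t"}


def plain_text_for_file_read(path: Path) -> Optional[str]:
    return "Plain_Text" if path.suffix in PLAIN_TEXT_EXTENSIONS else None


def json_for_file_read(path: Path) -> Optional[str]:
    return "JSON_Format" if path.suffix in JSON_EXTENSIONS else None


def delimited_for_file_read(path: Path) -> Optional[str]:
    delimiter = DELIMITERS.get(path.suffix)
    return None if delimiter is None else f"Delimited {delimiter!r}"


READ_DETECTORS: List[Detector] = [
    plain_text_for_file_read,
    json_for_file_read,
    delimited_for_file_read,
]
# Extension-only formats inspect nothing but the path, so writing can reuse
# the read detectors unchanged.
WRITE_DETECTORS: List[Detector] = list(READ_DETECTORS)


def first_match(detectors: List[Detector], path: Path) -> Optional[str]:
    """Return the result of the first detector that recognises `path`."""
    for detect in detectors:
        found = detect(path)
        if found is not None:
            return found
    return None


def get_reading_format(path: Path) -> Optional[str]:
    return first_match(READ_DETECTORS, path)


def get_writing_format(path: Path) -> Optional[str]:
    return first_match(WRITE_DETECTORS, path)
```

Because these formats look only at the extension, their write detectors can simply reuse the read detectors, as `for_file_write file = ... .for_file_read file` does in the diff. The two only need to differ for a format that sniffs file contents.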
@@ -11,8 +11,16 @@ type SQLite_Format

## PRIVATE
If the File_Format supports reading from the file, return a configured instance.
for_file : File -> SQLite_Format | Nothing
for_file file =
for_file_read : File -> SQLite_Format | Nothing
for_file_read file =
expected_header = magic_header_string
got_header = file.read_first_bytes expected_header.length
if got_header == expected_header then SQLite_Format.For_File else Nothing

## PRIVATE
If the File_Format supports writing to the file, return a configured instance.
for_file_write : File -> SQLite_Format | Nothing
for_file_write file =
case file.extension of
".db" -> SQLite_Format.For_File
".sqlite" -> SQLite_Format.For_File
@@ -31,3 +39,8 @@ type SQLite_Format
read self file on_problems =
_ = [on_problems]
Database.connect (SQLite_Details.SQLite file)

## PRIVATE
Based on the File Format definition at: https://www.sqlite.org/fileformat.html
magic_header_string =
"SQLite format 3".utf_8 + [0]
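The SQLite rule above can be sketched in Python with the standard `sqlite3` module: reading checks the 16-byte magic header (`SQLite format 3` followed by a NUL byte, as described on the linked file-format page), while writing can only look at the extension because the target may not exist yet. The `.db`/`.sqlite` list is only the part of the extension match visible in this diff, and the helper names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional

# Every SQLite 3 database file starts with these 16 bytes:
# the ASCII text "SQLite format 3" followed by a NUL byte.
SQLITE_MAGIC = b"SQLite format 3\x00"
# Extensions visible in this diff's `for_file_write`; the full list may be longer.
SQLITE_EXTENSIONS = {".db", ".sqlite"}


def sqlite_for_file_read(path: Path) -> Optional[str]:
    """Reading: the file exists, so detect SQLite by its header, ignoring the name."""
    with open(path, "rb") as f:
        header = f.read(len(SQLITE_MAGIC))  # fewer bytes if the file is short
    return "SQLite" if header == SQLITE_MAGIC else None


def sqlite_for_file_write(path: Path) -> Optional[str]:
    """Writing: the file may not exist yet, so only the extension can be used."""
    return "SQLite" if path.suffix in SQLITE_EXTENSIONS else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        real_db = Path(tmp) / "data.bin"  # SQLite content, unusual extension
        conn = sqlite3.connect(str(real_db))
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
        conn.close()

        fake_db = Path(tmp) / "notes.db"  # .db extension, plain-text content
        fake_db.write_text("not a database")

        print(sqlite_for_file_read(real_db))                # SQLite
        print(sqlite_for_file_read(fake_db))                # None
        print(sqlite_for_file_write(Path(tmp) / "new.db"))  # SQLite
```

With this split, a `.db` file that is not actually a database is no longer opened as SQLite for reading, while a real database stored under another extension is still recognised.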
@@ -155,10 +155,14 @@ type Non_Unique_Primary_Key

Arguments:
- primary_key: The primary key that is not unique.
Error (primary_key : Vector Text)
- clashing_primary_key: The values of an example key that corresponds to
more than one row.
- clashing_example_row_count: The number of rows that correspond to the
example key.
Error (primary_key : Vector Text) (clashing_primary_key : Vector Any) (clashing_example_row_count : Integer)

## PRIVATE
Pretty print the non-unique primary key error.
to_display_text : Text
to_display_text self =
"The primary key " + self.primary_key.to_display_text + " is not unique."
"The primary key " + self.primary_key.to_display_text + " is not unique. The key "+self.clashing_primary_key.to_display_text+" corresponds to "+self.clashing_example_row_count.to_text+" rows."
@@ -62,7 +62,7 @@ In_Memory_Table.create_database_table self connection table_name=Nothing primary
continue. Otherwise, they could 'leak' to `Panic.rethrow` and be wrongly
raised as panics.
upload_status = create_table_statement.if_not_error <|
translate_known_upload_errors connection resolved_primary_key <|
translate_known_upload_errors self connection resolved_primary_key <|
connection.jdbc_connection.run_within_transaction <|
Panic.rethrow <| connection.execute_update create_table_statement
if structure_only.not then
@@ -119,7 +119,7 @@ Database_Table.create_database_table self connection table_name=Nothing primary_
Error.throw (Unsupported_Database_Operation.Error "The Database table to be uploaded must be coming from the same connection as the connection on which the new table is being created. Cross-connection uploads are currently not supported. To work around this, you can first `.read` the table into memory and then upload it from memory to a different connection.")

upload_status = connection_check.if_not_error <| create_table_statement.if_not_error <|
translate_known_upload_errors connection resolved_primary_key <|
translate_known_upload_errors self connection resolved_primary_key <|
connection.jdbc_connection.run_within_transaction <|
Panic.rethrow <| connection.execute_update create_table_statement
if structure_only.not then
@@ -144,15 +144,35 @@ resolve_primary_key table primary_key = case primary_key of
## PRIVATE
Inspects any `SQL_Error` thrown and replaces it with a more precise error
type when available.
translate_known_upload_errors connection primary_key ~action =
translate_known_upload_errors source_table connection primary_key ~action =
handler caught_panic =
error_mapper = connection.dialect.get_error_mapper
sql_error = caught_panic.payload
case error_mapper.is_primary_key_violation sql_error of
True -> Error.throw (Non_Unique_Primary_Key.Error primary_key)
True -> raise_duplicated_primary_key_error source_table primary_key caught_panic
False -> Panic.throw caught_panic
Panic.catch SQL_Error action handler

## PRIVATE
Creates a `Non_Unique_Primary_Key` error containing information about an
example group violating the uniqueness constraint.
raise_duplicated_primary_key_error source_table primary_key original_panic =
agg = source_table.aggregate [Aggregate_Column.Count]+(primary_key.map Aggregate_Column.Group_By)
filtered = agg.filter column=0 (Filter_Condition.Greater than=1)
materialized = filtered.read max_rows=1
case materialized.row_count == 0 of
## If we couldn't find a duplicated key, we give up the translation and
rethrow the original panic containing the SQL error. This could
happen if the constraint violation is on some non-trivial key, like a
case-insensitive one.
True -> Panic.throw original_panic
False ->
row = materialized.first_row.to_vector
example_count = row.first
example_entry = row.drop 1
Error.throw (Non_Unique_Primary_Key.Error primary_key example_entry example_count)
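The translation performed by `translate_known_upload_errors` and `raise_duplicated_primary_key_error` can be sketched in Python with the standard `sqlite3` module. This is a simplified analogue under stated assumptions: the primary-key check matches SQLite's `UNIQUE constraint failed` message in place of the dialect's error mapper, `collections.Counter` replaces the `Count`/`Group_By` aggregation, and `upload`, `find_duplicate_key` and `NonUniquePrimaryKey` are hypothetical names. As in the Enso code, the original SQL error is re-raised when no exact duplicate can be found.

```python
import sqlite3
from collections import Counter
from typing import Any, List, Optional, Sequence, Tuple

Row = Tuple[Any, ...]


class NonUniquePrimaryKey(Exception):
    """Hypothetical Python analogue of `Non_Unique_Primary_Key.Error`."""

    def __init__(self, primary_key: Sequence[str], clashing_key: Sequence[Any], count: int):
        self.primary_key = list(primary_key)
        self.clashing_key = list(clashing_key)
        self.count = count
        super().__init__(
            f"The primary key {self.primary_key} is not unique. "
            f"The key {self.clashing_key} corresponds to {count} rows."
        )


def find_duplicate_key(
    rows: Sequence[Row], columns: List[str], primary_key: List[str]
) -> Optional[Tuple[Row, int]]:
    """Group rows by the key columns and return one (key, count) with count > 1."""
    positions = [columns.index(name) for name in primary_key]
    counts = Counter(tuple(row[i] for i in positions) for row in rows)
    for key, count in counts.items():
        if count > 1:
            return key, count
    return None  # no exact duplicate, e.g. the database compares keys differently


def upload(conn: sqlite3.Connection, table: str, columns: List[str],
           primary_key: List[str], rows: List[Row]) -> None:
    """Create `table` and insert `rows` in one transaction (identifiers are not escaped)."""
    create = (
        f"CREATE TABLE {table} ({', '.join(columns)}, "
        f"PRIMARY KEY ({', '.join(primary_key)}))"
    )
    insert = f"INSERT INTO {table} VALUES ({', '.join('?' for _ in columns)})"
    conn.execute("BEGIN")
    try:
        conn.execute(create)
        conn.executemany(insert, rows)
        conn.execute("COMMIT")
    except sqlite3.IntegrityError as error:
        conn.execute("ROLLBACK")  # also undoes the CREATE TABLE
        if "UNIQUE constraint failed" not in str(error):
            raise  # some other constraint: keep the original SQL error
        duplicate = find_duplicate_key(rows, columns, primary_key)
        if duplicate is None:
            raise  # no example found: keep the original SQL error
        key, count = duplicate
        raise NonUniquePrimaryKey(primary_key, key, count) from error


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
    try:
        upload(conn, "people", ["id", "name"], ["id"], [(1, "a"), (2, "b"), (1, "c")])
    except NonUniquePrimaryKey as err:
        print(err)  # The primary key ['id'] is not unique. The key [1] corresponds to 2 rows.
```

The connection is opened with `isolation_level=None` so that the explicit `BEGIN`/`COMMIT`/`ROLLBACK` statements alone control the transaction, mirroring `run_within_transaction` in the Enso code.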


## PRIVATE
Creates a statement that will create a table with structure determined by the
provided columns.
@@ -14,11 +14,16 @@ type Image_File_Format

## PRIVATE
If the File_Format supports reading from the file, return a configured instance.
for_file : File -> Image_File_Format | Nothing
for_file file =
for_file_read : File -> Image_File_Format | Nothing
for_file_read file =
extension = file.extension
if supported.contains extension then Image_File_Format.For_File else Nothing

## PRIVATE
If this File_Format should be used for writing to that file, return a configured instance.
for_file_write : File -> Image_File_Format | Nothing
for_file_write file = Image_File_Format.for_file_read file

## PRIVATE
If the File_Format supports reading from the web response, return a configured instance.
for_web : Text -> URI -> Image_File_Format | Nothing
17 changes: 16 additions & 1 deletion distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
@@ -1660,6 +1660,21 @@ type Table
row_count : Integer
row_count self = self.java_table.rowCount

## Returns a materialized dataframe containing rows of this table.

In the in-memory backend, this returns the same table, truncated to
`max_rows`. This is only kept for API compatibility between database and
in-memory tables. The `read` operation can be used to ensure that the
table is now in-memory, regardless of its origin.

Arguments:
- max_rows: specifies the maximum number of rows to fetch; if not set, all
available rows are fetched.
read : (Integer | Nothing) -> Table
read self max_rows=Nothing = case max_rows of
Nothing -> self
_ : Integer -> self.take (First max_rows)

## Returns a Table describing this table's contents.

The table lists all columns, counts of non-null items and value types of
@@ -1913,7 +1928,7 @@ type Table
file = File.new path
case format of
_ : Auto_Detect ->
base_format = format.get_format file
base_format = format.get_writing_format file
if base_format == Nothing then Error.throw (File_Error.Unsupported_Output_Type file Table) else
self.write file format=base_format on_existing_file match_columns on_problems
_ ->
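The docs describe the new in-memory `Table.read` as existing for API compatibility with database tables; in this diff that lets `raise_duplicated_primary_key_error` call `.read max_rows=1` on either kind of table. A Python sketch of that parity, with hypothetical `InMemoryTable` and `DatabaseTable` classes and SQLite standing in for the database backend:

```python
import sqlite3
from typing import Any, List, Optional, Tuple

Row = Tuple[Any, ...]


class InMemoryTable:
    """Hypothetical in-memory table: rows are already materialized."""

    def __init__(self, rows: List[Row]):
        self.rows = rows

    def read(self, max_rows: Optional[int] = None) -> "InMemoryTable":
        # Nothing to fetch: return the same table, truncated when asked.
        if max_rows is None:
            return self
        return InMemoryTable(self.rows[:max_rows])


class DatabaseTable:
    """Hypothetical database-backed table defined by a SELECT query."""

    def __init__(self, conn: sqlite3.Connection, query: str):
        self.conn = conn
        self.query = query

    def read(self, max_rows: Optional[int] = None) -> InMemoryTable:
        # Materialize the query, letting the database apply the row limit.
        sql = self.query
        if max_rows is not None:
            sql = f"SELECT * FROM ({self.query}) LIMIT {int(max_rows)}"
        return InMemoryTable(self.conn.execute(sql).fetchall())


def first_row(table) -> Optional[Row]:
    """Backend-agnostic: relies only on `read(max_rows=...)`."""
    materialized = table.read(max_rows=1)
    return materialized.rows[0] if materialized.rows else None
```

Returning `self` when no limit is given matches the documented behaviour that the in-memory `read` returns the same table rather than a copy.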
@@ -54,14 +54,19 @@ type Delimited_Format
## PRIVATE
ADVANCED
If the File_Format supports reading from the file, return a configured instance.
for_file : File -> Delimited_Format | Nothing
for_file file =
for_file_read : File -> Delimited_Format | Nothing
for_file_read file =
case file.extension of
".csv" -> Delimited_Format.Delimited ','
".tab" -> Delimited_Format.Delimited '\t'
".tsv" -> Delimited_Format.Delimited '\t'
_ -> Nothing

## PRIVATE
If this File_Format should be used for writing to that file, return a configured instance.
for_file_write : File -> Delimited_Format | Nothing
for_file_write file = Delimited_Format.for_file_read file

## PRIVATE
ADVANCED
If the File_Format supports reading from the web response, return a configured instance.
7 changes: 7 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Errors.enso
@@ -12,6 +12,13 @@ polyglot java import org.enso.table.error.EmptySheetException
type Missing_Input_Columns
## PRIVATE
One or more columns not found in the input table.

Arguments:
- criteria: the names of the columns or regular expressions that did not
have any matches.
- where: an optional text describing which object this error relates to
(for example, in a join, whether the reported error is for the left or
right table).
Error (criteria : [Text]) (where:Text|Nothing = Nothing)

## PRIVATE
@@ -48,12 +48,17 @@ type Excel_Format
## PRIVATE
ADVANCED
If the File_Format supports reading from the file, return a configured instance.
for_file : File -> Excel_Format | Nothing
for_file file =
for_file_read : File -> Excel_Format | Nothing
for_file_read file =
is_xls = should_treat_as_xls_format Infer file
if is_xls.is_error then Nothing else
Excel_Format.Excel xls_format=is_xls

## PRIVATE
If this File_Format should be used for writing to that file, return a configured instance.
for_file_write : File -> Excel_Format | Nothing
for_file_write file = Excel_Format.for_file_read file

## PRIVATE
ADVANCED
If the File_Format supports reading from the web response, return a configured instance.
@@ -284,7 +284,8 @@ Error.should_succeed self frames_to_skip=0 =

## Handles an unexpected dataflow error.
Error.should_be_a : Integer -> Any
Error.should_be_a self frames_to_skip=0 =
Error.should_be_a self typ frames_to_skip=0 =
_ = typ
Test.fail_match_on_unexpected_error self 1+frames_to_skip

## Asserts that the given `Boolean` is `True`
6 changes: 3 additions & 3 deletions test/Image_Tests/src/Image_Read_Write_Spec.enso
@@ -69,9 +69,9 @@ spec =

Test.group "Image File_Format" <|
Test.specify "should recognise image files" <|
Auto_Detect.get_format (enso_project.data / "data.jpg") . should_be_a Image_File_Format
Auto_Detect.get_format (enso_project.data / "data.png") . should_be_a Image_File_Format
Auto_Detect.get_format (enso_project.data / "data.bmp") . should_be_a Image_File_Format
Auto_Detect.get_reading_format (enso_project.data / "data.jpg") . should_be_a Image_File_Format
Auto_Detect.get_reading_format (enso_project.data / "data.png") . should_be_a Image_File_Format
Auto_Detect.get_reading_format (enso_project.data / "data.bmp") . should_be_a Image_File_Format

Test.specify "should allow reading an Image" <|
img = Data.read rgba_file