-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Destination acceptance tests: verify that a destination is able to write any ISO8601-compliant date string #5712
Comments
STEP 1: ✅
STEP 2: ✅ Identified + made the
|
Format | Data Type | JsonSchema Type | JsonSchema Format | Example |
---|---|---|---|---|
YYYY-M-D |
string |
string |
date |
"2021-1-1" |
YYYY-M-D hh:mm:ss |
string |
string |
date-time |
"2021-1-1 01:01:01" |
YYYY-M-D hh:mm:ss Z |
string |
string |
date-time |
"2021-1-1 01:01:01 +1" |
YYYY-M-DThh:mm:ss Z |
string |
string |
date-time |
"2021-1-1T01:01:01 +1:00" |
======= | ======= | ======= | ======= | ======= |
YYYY-MM-DD |
string |
string |
date |
"2021-01-01" |
YYYY-MM-DD hh:mm:ss |
string |
string |
date-time |
"2021-01-01 01:01:01" |
YYYY-MM-DD hh:mm:ss Z |
string |
string |
date-time |
"2021-01-01 01:01:01 +0000" |
YYYY-MM-DDThh:mm:ssZ |
string |
string |
date-time |
"2021-01-01T01:01:01Z" "2021-01-01T01:01:01-01:00" |
YYYY-MM-DDThh:mm:ss Z |
string |
string |
date-time |
"2021-01-01T01:01:01 +1:00" |
INVALID DATE-TIME formats
Table with INVALID Date-Time formats
for JsonSchema
This examples will lead to validation error
in json schema, as well as Timestamp 'your_timestamp' is not recognized
inside Snowflake
as a destination.
Format | Data Type | JsonSchema Type | JsonSchema Format | Example |
---|---|---|---|---|
YYYY-MM-DD hh:mm:ss Z |
string |
string |
date-time |
"2021-01-01 01:01:01 UTC" |
YYYY-MM-DDThh:mm:ss Z |
string |
string |
date-time |
"2021-01-01T01:01:01 UTC" "2021-01-01T01:01:01 +1" "2021-01-01T01:01:01 +0000" |
YYYY-MM-DDThh:mm:ssZ |
string |
string |
date-time |
"2021-01-01T01:01:01+0000" "2021-01-01T01:01:01UTC" "2021-01-01T01:01:01+1" |
TESTS ON Snowflake Destination
The tests are made with the single conversion statement:
SELECT to_timestamp_tz('input_string_with_possible_datetime')
and the output is the result of the test, indicated in the tables bellow.
Table with tests on Snowflake
as a destination
VALID DATE-TIME formats
Input String | Format | Snowflake Output |
---|---|---|
"2021-1-1" |
YYYY-M-D |
2021-01-01 00:00:00.000 -0800 |
"2021-1-1 01:01:01" |
YYYY-M-D hh:mm:ss |
2021-01-01 01:01:01.000 -0800 |
"2021-1-1 01:01:01 +1" |
YYYY-M-D hh:mm:ss Z |
2021-01-01 01:01:01.000 +0100 |
"2021-1-1T01:01:01 +1:00" |
YYYY-M-DThh:mm:ss Z |
2021-01-01 01:01:01.000 +0100 |
"2021-01-01" |
YYYY-MM-DD |
2021-01-01 00:00:00.000 -0800 |
"2021-01-01 01:01:01" |
YYYY-MM-DD hh:mm:ss |
2021-01-01 01:01:01.000 -0800 |
"2021-01-01 01:01:01 +0000" |
YYYY-MM-DD hh:mm:ss Z |
2021-01-01 01:01:01.000 +0000 |
"2021-01-01T01:01:01Z" |
YYYY-MM-DDThh:mm:ssZ |
2021-01-01 01:01:01.000 +0000 |
"2021-01-01T01:01:01-01:00" |
YYYY-MM-DDThh:mm:ssZ |
2021-01-01 01:01:01.000 -0100 |
"2021-01-01T01:01:01 +1:00" |
YYYY-MM-DDThh:mm:ss Z |
2021-01-01 01:01:01.000 +0100 |
INVALID DATE-TIME formats
Input String | Format | Snowflake Output |
---|---|---|
"2021-01-01 01:01:01 UTC" |
YYYY-MM-DD hh:mm:ss Z |
Timestamp '2021-01-01 01:01:01 UTC' is not recognized |
"2021-01-01T01:01:01 UTC" |
YYYY-MM-DDThh:mm:ss Z |
Timestamp '2021-01-01T01:01:01 UTC' is not recognized |
"2021-01-01T01:01:01 +1" |
YYYY-MM-DDThh:mm:ss Z |
Timestamp '2021-01-01T01:01:01 +1' is not recognized |
"2021-01-01T01:01:01 +0000" |
YYYY-MM-DDThh:mm:ss Z |
Timestamp '2021-01-01T01:01:01 +0000' is not recognized |
"2021-01-01T01:01:01+0000" |
YYYY-MM-DDThh:mm:ssZ |
Timestamp '2021-01-01T01:01:01+0000' is not recognized |
"2021-01-01T01:01:01UTC" |
YYYY-MM-DDThh:mm:ssZ |
Timestamp '2021-01-01T01:01:01UTC' is not recognized |
"2021-01-01T01:01:01+1" |
YYYY-MM-DDThh:mm:ssZ |
Timestamp '2021-01-01T01:01:01+1' is not recognized |
STEP 3, 4, 5: ❌
-
Snowflake
DestinationAcceptanceTest
fails ondate
assertion with the following messages:SnowflakeInsertDestination:
SnowflakeInsertDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json FAILED org.opentest4j.AssertionFailedError: expected: <"2021-01-01"> but was: <"2021-01-01T00:00:00Z">
SnowflakeGcsCopyDestination:
SnowflakeGcsCopyDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json FAILED org.opentest4j.AssertionFailedError: expected: <"2021-01-01"> but was: <"2021-01-01T00:00:00Z">
SnowflakeS3CopyDestination:
SnowflakeS3CopyDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json FAILED org.opentest4j.AssertionFailedError: expected: <"2021-01-01"> but was: <"2021-01-01T00:00:00Z">
Which corresponds to the input of:
"2021-01-01"
or "2021-1-1"
of format: "date"
Perhaps, the underlying convertion tries to use Snowflakes to_timestamp_tz()
method, instead of to_date()
in this case or it's something else inside of DestinationAcceptanceTest
.
-
Normalisation
on Snowflake fails with the following messages:SnowflakeInsertDestination:
SnowflakeInsertDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json FAILED io.airbyte.workers.WorkerException: Normalization Failed.
SnowflakeGcsCopyDestination:
SnowflakeGcsCopyDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json FAILED io.airbyte.workers.WorkerException: Normalization Failed.
SnowflakeS3CopyDestination:
SnowflakeS3CopyDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json FAILED io.airbyte.workers.WorkerException: Normalization Failed.
THE FULL HTML FILE
WITH COLLECTED LOGS IS HERE:
SnowflakeDestination_IntegrationTest.zip
STEP 6: ✅ PR is here: #5814
STEP 7: In progress.
The following integration tests are failed on the normalization phase with error like: invalid datetime '1/1/2021':
These two tests are failed with the message like: Invalid format: "20210101T0930" is malformed at "30". It should be fixed on the Java destination side (
|
Are we sure that all the valid formats listed here are actually valid? For example: When we add these datetime strings back, please make sure that there is a Java or Python parser that can correctly distinguish between the valid and invalid ones. By the way @VitaliiMaltsev already implemented one that is capable to handle lots of them in here with various tests. We should have something similar in the main repo. |
Implementation hints
the destination acceptance test suite, among other things, inserts pre-defined records into each destination to verify that it works correctly. See this directory for the records it inserts and the catalogs that define the records/stream schemas. As part of this, we want to enumerate all compliant ISO8601 formats and insert a few records which conform to each formats (one or two per format is fine). Then we should expect destinations which aren't able to handle all formats (like snowflake) to start failing acceptance tests. To do this we'll probably want to do something like the following:
./gradlew :airbyte-integrations:connectors:destination-snowflake:integrationTest
(you'll need to place the config in lastpass in thesecrets/
directory for this to work)date
anddate-time
date string formats (i.e; what are all the acceptable string patterns for each of these formats?). It is probably helpful to list them in a text file in the directory linked above, just for reference e.g: just write out all the formats in aformats_reference.txt
file.edge_cases_catalog
which has two string fields whoseformat
fields are set todate
anddate-time
respectively.The text was updated successfully, but these errors were encountered: