-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: SQL TIME columns causing jsonschema ValidationError Failed validating 'format': 'time' #1457
Comments
@BuzzCutNorman I think the EDIT: $ check-jsonschema --schemafile <(echo '{"type":"string","format":"time"}') <(echo '"20:20:39.001000"')
Schema validation errors were encountered.
/dev/fd/12::$: '20:20:39.001000' is not a 'time'
$ check-jsonschema --schemafile <(echo '{"type":"string","format":"time"}') <(echo '"20:20:39"')
Schema validation errors were encountered.
/dev/fd/12::$: '20:20:39' is not a 'time'
$ check-jsonschema --schemafile <(echo '{"type":"string","format":"time"}') <(echo '"20:20:39.001000+00:00"')
ok -- validation done
$ check-jsonschema --schemafile <(echo '{"type":"string","format":"time"}') <(echo '"20:20:39.001000Z"')
ok -- validation done |
@edgarrmondragon Thank you for the very handy command and site for validating JSON Schema. I just tired adding the
|
@BuzzCutNorman hmm it may have to do with the version of the |
@edgarrmondragon I am using BuzzCutNorman target-mssql and target-postgres. I will give MeltanoLabs target-postgres a try and see if I get the same result. |
@edgarrmondragon just installed the MeltanoLabs target-postgres and got the same results.
|
@BuzzCutNorman Ok I think I know what's going on. I was testing schemas with the with suppress(ImportError):
from rfc3339_validator import validate_rfc3339
@_checks_drafts(name="date-time")
def is_datetime(instance: object) -> bool:
if not isinstance(instance, str):
return True
return validate_rfc3339(instance.upper())
@_checks_drafts(
draft7="time",
draft201909="time",
draft202012="time",
)
def is_time(instance: object) -> bool:
if not isinstance(instance, str):
return True
return is_datetime("1970-01-01T" + instance) but without the extras, only a naive check is performed: @_checks_drafts(draft3="time", raises=ValueError)
def is_draft3_time(instance: object) -> bool:
if not isinstance(instance, str):
return True
return bool(datetime.datetime.strptime(instance, "%H:%M:%S")) The latter is what's causing validation errors for you. So, can you try installing pip_url: git+https://github.com/MeltanoLabs/target-postgres.git jsonschema[format] should do it. |
@edgarrmondragon I added
|
@BuzzCutNorman We're looking for With original pip URL$ .meltano/loaders/target-postgres/venv/bin/pip list
Package Version
--------------------------- --------------------
appdirs 1.4.4
attrs 22.2.0
backoff 2.2.1
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.0.1
click 8.1.3
cryptography 39.0.1
decorator 5.1.1
fs 2.4.16
greenlet 2.0.2
idna 3.4
inflection 0.5.1
joblib 1.2.0
jsonpath-ng 1.5.3
jsonschema 4.17.3
meltanolabs-target-postgres 0.0.4.post10+1859883
memoization 0.4.0
pendulum 2.1.2
pip 23.0.1
ply 3.11
psycopg2-binary 2.9.5
pycparser 2.21
PyJWT 2.6.0
pyrsistent 0.19.3
python-dateutil 2.8.2
python-dotenv 0.21.1
pytz 2022.7.1
pytzdata 2020.1
PyYAML 6.0
requests 2.28.2
setuptools 57.5.0
simplejson 3.18.3
singer-sdk 0.19.0
six 1.16.0
SQLAlchemy 1.4.46
typing_extensions 4.5.0
urllib3 1.26.14
wheel 0.38.4 With jsonschema extras$ .meltano/loaders/target-postgres/venv/bin/pip list
Package Version
--------------------------- --------------------
appdirs 1.4.4
arrow 1.2.3
attrs 22.2.0
backoff 2.2.1
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.0.1
click 8.1.3
cryptography 39.0.1
decorator 5.1.1
fqdn 1.5.1
fs 2.4.16
greenlet 2.0.2
idna 3.4
inflection 0.5.1
isoduration 20.11.0
joblib 1.2.0
jsonpath-ng 1.5.3
jsonpointer 2.3
jsonschema 4.17.3
meltanolabs-target-postgres 0.0.4.post10+1859883
memoization 0.4.0
pendulum 2.1.2
pip 23.0.1
ply 3.11
psycopg2-binary 2.9.5
pycparser 2.21
PyJWT 2.6.0
pyrsistent 0.19.3
python-dateutil 2.8.2
python-dotenv 0.21.1
pytz 2022.7.1
pytzdata 2020.1
PyYAML 6.0
requests 2.28.2
rfc3339-validator 0.1.4
rfc3987 1.3.8
setuptools 57.5.0
simplejson 3.18.3
singer-sdk 0.19.0
six 1.16.0
SQLAlchemy 1.4.46
typing_extensions 4.5.0
uri-template 1.2.0
urllib3 1.26.14
webcolors 1.12
wheel 0.38.4 |
@edgarrmondragon I can confirm that the target-postgres pip list
Original
+00:00 Version
Z Version
|
@BuzzCutNorman thanks for the detailed tracebacks. I think I know what's going on. We need to do change the validator instance: - Draft7Validator(schema, format_checker=FormatChecker())
+ Draft7Validator(schema, format_checker=Draft7Validator.FORMAT_CHECKER) for Problem is, that then fails for I think we instead need to use a custom formatter or even let developers override the formatters for |
@edgarrmondragon that is a bit of a pickle. I am a little confused on what is a tap responsible for and what is a target responsible for when it comes to conforming records message data fields. Is it all SDK taps should give properly formatted JSON Schema data and all SDK targets are responsible for conforming JSON schema format to an acceptable target format if necessary? tap pull -> tap conform -> tap pass to target -> target ingest -> target validate -> target conform -> target load EDIT:
I would adjust the custom formatter to allow Does this match up with what you were suggesting above? |
@edgarrmondragon I added an import for the
I have checked all four time formats :
by running them through the following validator, validator format checker combos.
All four time formats passed validation for each combo listed above 😮💨🎉. Thank you 🙏 for taking time to track down the source of this issue.
I agree an easier way for developers to override would be a nice touch. |
@BuzzCutNorman Thanks for sharing your findings! This is great insight so I started a draft PR to support both custom validators and format checkers: |
@edgarrmondragon I tested the draft PR out with |
These examples give me pause. It makes sense that devs would like pairing validators and checkers like that, and my current implementation forces them to override both methods. I'll give it some thought over the weekend. |
Singer SDK Version
0.19.0
Is this a regression?
Python Version
3.9
Bug scope
Taps (catalog, state, etc.)
Operating System
Windows
Description
Error Info
Issue with
TIME
columns returning time format which is not passing JSON schema validation when sent to a target. This error was found runningmeltano run tap-mssql target-mssql
when the tap is reading from thetestdata
database tableTestTable
. The columnTestColumn1
which is a MS SQL data type ofTIME
is the offending column.The TestTable was created with the following script. It contains only one row and one column.
findings
If I edit the value in the TestColumn1 to be
'20:20:39'
validation passes. I also ran into this issue when generating batch messages. The fix I chose to resolve the issue for batch message was to extend the default JSONEncoder used byjson.dump()
with the following codeI attempted to overload the
_conform_primitive_property
method but did not successfully accomplish a working overload.There is a little more info in the issue I raised in the
buzzcutnorman--tap-mssql
repository.BuzzCutNorman/tap-mssql#25
Code
The text was updated successfully, but these errors were encountered: