
Fix decimal precision (decimal.InvalidOperation decimal.DivisionImpossible error) #207

Open · wants to merge 5 commits into master
Conversation

@rabidaudio commented Aug 16, 2021

Problem

Error message (this is from target-snowflake but the problem is in the shared code):

target-snowflake   | CRITICAL [<class 'decimal.DivisionImpossible'>]
target-snowflake   | INFO MillisLoggingCursor: 72 millis spent executing: ROLLBACK
target-snowflake   | Traceback (most recent call last):
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
target-snowflake   |     sys.exit(cli())
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_snowflake/__init__.py", line 57, in cli
target-snowflake   |     main(args.config)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_snowflake/__init__.py", line 51, in main
target-snowflake   |     target_tools.main(target)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 28, in main
target-snowflake   |     stream_to_target(input_stream, target, config=config)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-snowflake   |     raise e
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 64, in stream_to_target
target-snowflake   |     line
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 141, in _line_handler
target-snowflake   |     state_tracker.handle_record_message(line_data['stream'], line_data)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 63, in handle_record_message
target-snowflake   |     self.streams[stream].add_record_message(line_data)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/singer_stream.py", line 145, in add_record_message
target-snowflake   |     self.validator.validate(record_message['record'])
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 129, in validate
target-snowflake   |     for error in self.iter_errors(*args, **kwargs):
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 105, in iter_errors
target-snowflake   |     for error in errors:
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/_validators.py", line 304, in properties_draft4
target-snowflake   |     schema_path=property,
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 121, in descend
target-snowflake   |     for error in self.iter_errors(instance, schema):
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 105, in iter_errors
target-snowflake   |     for error in errors:
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/_validators.py", line 127, in multipleOf
target-snowflake   |     failed = instance % dB
target-snowflake   | decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]
meltano            | Loading failed (1): decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]
meltano            | ELT could not be completed: Loader failed
ELT could not be completed: Loader failed

The problem is that Singer's official tap-postgres uses minimum, maximum, and multipleOf to effectively report the scale of the column.

For example,

{
  "type": "SCHEMA",
  "stream": "foobar",
  "schema": {
    "type": "object",
    "properties": {
      "id": {
        "type": ["integer"],
        "minimum": -2147483648,
        "maximum": 2147483647
      },
      "sample_numeric": {
        "type": ["null", "number"],
        "exclusiveMaximum": true,
        "maximum": 100000000000000000000000000000000000000000000000000000000000000,
        "multipleOf": 1E-38,
        "exclusiveMinimum": true,
        "minimum": -100000000000000000000000000000000000000000000000000000000000000
      }
    }
  },
  "key_properties": ["id"],
  "bookmark_properties": []
}

{
  "type": "RECORD",
  "stream": "foobar",
  "record": {
    "id": 1,
    "sample_numeric": 0.000913808181253534
  },
  "version": 1596488217992,
  "time_extracted": "2020-08-03T20:56:57.992931Z"
}

This breaks JSON Schema validation: when checking multipleOf, jsonschema evaluates Decimal('0.000913808181253534') % Decimal('1E-38'), which raises decimal.InvalidOperation because Python's default decimal context precision (28 significant digits) is too small for that operation.
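The failure can be reproduced directly with the standard-library decimal module, using the values from the RECORD and SCHEMA messages above (a minimal sketch, independent of the target's actual code):

```python
from decimal import Decimal, getcontext, InvalidOperation

value = Decimal('0.000913808181253534')
step = Decimal('1E-38')  # the schema's multipleOf

# With the default context precision of 28 significant digits, the
# integer part of value / step needs about 35 digits, so the remainder
# operation raises InvalidOperation (DivisionImpossible).
try:
    value % step
except InvalidOperation as exc:
    print('raised:', exc)

# Raising the context precision lets the same operation succeed.
getcontext().prec = 40
print(value % step == 0)  # True: the value is an exact multiple
```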

Solution

This is a well-known problem that has bitten several targets, and there are several possible solutions. One is simply to set the precision to something arbitrarily high, like 40. Another, which pipelinewise-target-postgres takes, is to cap the allowed precision at some threshold.

Here, I ported a solution I wrote for meltano/target-postgres, which raises Python's decimal precision to whatever the schema requires. This solution was later ported to meltano/target-snowflake. Open to other solutions though. I would request that any solution also be ported to datamill-co/target-snowflake.
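The idea can be sketched as follows. This is a simplified illustration, not the PR's actual diff, and `ensure_sufficient_precision` is a hypothetical helper name:

```python
from decimal import Decimal, getcontext

def ensure_sufficient_precision(prop_schema):
    """Grow the global decimal precision so that a multipleOf check
    against this property's range cannot raise DivisionImpossible.
    Hypothetical helper illustrating the approach, not the PR's code."""
    multiple_of = prop_schema.get('multipleOf')
    maximum = prop_schema.get('maximum')
    if multiple_of is None or maximum is None:
        return
    # Significant digits needed: digits left of the decimal point for
    # the largest allowed value, plus the scale implied by multipleOf.
    # Decimal.adjusted() gives the exponent of the most significant digit.
    needed = (Decimal(str(maximum)).adjusted() + 1) - Decimal(str(multiple_of)).adjusted()
    ctx = getcontext()
    if needed > ctx.prec:
        ctx.prec = needed

# Using the sample_numeric property from the schema above:
ensure_sufficient_precision({'multipleOf': 1e-38, 'maximum': 10**62})
print(getcontext().prec)  # at least 101 (63 integer digits + 38 of scale)
```

Because the precision only ever grows, a later schema with a larger range simply raises it again; validation of previously seen streams is unaffected.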

@ghost commented Aug 17, 2021

Why not make the default precision configurable in the target configuration?
Does this PR make sure that the column precision is changed, e.g. when a new tap version sends data with a higher precision?

@rabidaudio (Author)

Why not make the default precision configurable in the target configuration?

That would be a fine solution too. I can send an updated PR for this approach if you like.

Does this PR make sure that the column precision is changed, e.g. when a new tap version sends data with a higher precision?

This doesn't affect the database column types, only the in-Python JSON Schema validation of the incoming data. But if the schema is updated with a new precision, this code will recognize it.

@ghost commented Aug 20, 2021

A PR for a config option would be great 👍

@rabidaudio (Author)

@hz-lschick done 👍

@r-nyq commented Mar 7, 2022

I'm hitting the same decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>] error.
Is there a plan to merge this?

@ericboucher (Collaborator)

Hi @r-nyq, we are looking into it and will try to merge it soon.
