Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loader is unable to cope with JSON column from postgres #39

Open
lukeasrodgers opened this issue May 4, 2023 · 4 comments
Open

Loader is unable to cope with JSON column from postgres #39

lukeasrodgers opened this issue May 4, 2023 · 4 comments

Comments

@lukeasrodgers
Copy link

lukeasrodgers commented May 4, 2023

I am trying to use this target (via meltano, using most recent releases of both) to process data from postgres (https://github.com/transferwise/pipelinewise-target-postgres), specifically from a table that has two JSON columns, object and object_changes. The JSON in these columns will always be an object, but otherwise the structure can vary drastically.

I've tried to get this table loaded into bigquery using both the denormalized and generate_view options separately, but both approaches fail in different ways. If I set both denormalized and generate_view to false, I'm able to load the data fine. I can of course generate my own view, or transform the data in dbt downstream, but would really like to figure out what the issue is here.

When using generate_view I get this failure:

2023-05-04T02:04:00.706411Z [info     ] 2023-05-04 02:04:00,706 | INFO     | target-bigquery      | Using thread-based parallelism cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:00.707149Z [info     ] 2023-05-04 02:04:00,706 | INFO     | target-bigquery      | Target 'target-bigquery' is listening for input from tap. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:00.707611Z [info     ] 2023-05-04 02:04:00,706 | INFO     | target-bigquery      | Initializing 'target-bigquery' target sink... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:00.708052Z [info     ] 2023-05-04 02:04:00,706 | INFO     | target-bigquery      | Initializing target sink for stream 'public-versions'... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.961363Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.962093Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/bin/target-bigquery", line 8, in <module> cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.962556Z [info     ]     sys.exit(TargetBigQuery.cli()) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.962999Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__ cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.963615Z [info     ]     return self.main(*args, **kwargs) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.964072Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1055, in main cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.964468Z [info     ]     rv = self.invoke(ctx)      cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.964854Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.965254Z [info     ]     return ctx.invoke(self.callback, **ctx.params) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.965644Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.966083Z [info     ]     return __callback(*args, **kwargs) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.966501Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 578, in cli cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.967015Z [info     ]     target.listen(file_input)  cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.967543Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/io_base.py", line 34, in listen cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.967970Z [info     ]     self._process_lines(file_input) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.968360Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 278, in _process_lines cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.968765Z [info     ]     counter = super()._process_lines(file_input) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.969151Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/io_base.py", line 78, in _process_lines cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.969534Z [info     ]     self._process_schema_message(line_dict) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.969959Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 378, in _process_schema_message cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.970344Z [info     ]     _ = self.get_sink(         cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.970739Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/target.py", line 482, in get_sink cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.971335Z [info     ]     return self.add_sink(stream_name, schema, key_properties) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.971723Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 240, in add_sink cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.972130Z [info     ]     sink = sink_class(         cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.972509Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/storage_write.py", line 270, in __init__ cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.972892Z [info     ]     super().__init__(target, stream_name, schema, key_properties) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.973376Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/core.py", line 303, in __init__ cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.973762Z [info     ]     self.create_target(key_properties=key_properties) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.974171Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.974874Z [info     ]     return self(f, *args, **kw) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.975252Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__ cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.975631Z [info     ]     do = self.iter(retry_state=retry_state) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.976025Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.976451Z [info     ]     return fut.result()        cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.977003Z [info     ]   File "/usr/lib/python3.9/concurrent/futures/_base.py", line 439, in result cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.977393Z [info     ]     return self.__get_result() cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.977812Z [info     ]   File "/usr/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.978215Z [info     ]     raise self._exception      cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.978792Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__ cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.979256Z [info     ]     result = fn(*args, **kwargs) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.979698Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/core.py", line 471, in create_target cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.980238Z [info     ]     self.client.query(         cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.980656Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1520, in result cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.981057Z [info     ]     do_get_result()            cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.981451Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.981888Z [info     ]     return retry_target(       cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.982290Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/api_core/retry.py", line 191, in retry_target cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.982706Z [info     ]     return target()            cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.983104Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1510, in do_get_result cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.983496Z [info     ]     super(QueryJob, self).result(retry=retry, timeout=timeout) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.983916Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py", line 911, in result cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.984311Z [info     ]     return super(_AsyncJob, self).result(timeout=timeout, **kwargs) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.984747Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/api_core/future/polling.py", line 261, in result cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.985153Z [info     ]     raise self._exception      cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.985536Z [info     ] google.api_core.exceptions.BadRequest: 400 Invalid cast from STRING to JSON at [21:24] cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.985953Z [info     ]                                cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.986332Z [info     ] Location: US                   cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.986733Z [info     ] Job ID: b3f85e6c-6d4a-42c0-b1da-c60b00cdd63e cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:04:06.987114Z [info     ]                                cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery

When using denormalized I get this failure:

2023-05-04T02:10:41.607175Z [info     ] 2023-05-04 02:10:41,606 | INFO     | target-bigquery      | Using thread-based parallelism cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:41.607936Z [info     ] 2023-05-04 02:10:41,607 | INFO     | target-bigquery      | Target 'target-bigquery' is listening for input from tap. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:41.608446Z [info     ] 2023-05-04 02:10:41,607 | INFO     | target-bigquery      | Initializing 'target-bigquery' target sink... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:41.608919Z [info     ] 2023-05-04 02:10:41,607 | INFO     | target-bigquery      | Initializing target sink for stream 'public-versions'... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.483950Z [info     ] 2023-05-04 02:10:47,483 | INFO     | target-bigquery      | Setting up public-versions cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.484296Z [info     ] 2023-05-04 02:10:47,483 | WARNING  | target-bigquery      | ACTIVATE_VERSION message received but not implemented by this target. Ignoring. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.484673Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.485154Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/protobuf/json_format.py", line 595, in _ConvertFieldValuePair cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.485679Z [info     ]     raise ParseError('repeated field {0} must be in [] which is ' cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.486162Z [info     ] google.protobuf.json_format.ParseError: repeated field object must be in [] which is {'id': 91, 'created_at': '2015-05-22T17:21:04.739Z', 'updated_at': '2015-05-22T17:21:04.739Z', 'confirmed_by': None, 'dispatched_at': None, 'dispatcher_id': None, 'subscription_id': 28, 'confirmation_token': None, 'schedu
led_send_date': '2015-06-21', 'confirmation_sent_at': None, 'confirmed_delivered_at': None, 'confirmation_received_at': None} at AnonymousProto_94efc93b560275747c1d98a9eb9c2e6a45baa220 cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.486626Z [info     ]                                cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.487030Z [info     ] The above exception was the direct cause of the following exception: cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.487403Z [info     ]                                cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.487777Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.488149Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/bin/target-bigquery", line 8, in <module> cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.488542Z [info     ]     sys.exit(TargetBigQuery.cli()) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.488907Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__ cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.489275Z [info     ]     return self.main(*args, **kwargs) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.489657Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1055, in main cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.490070Z [info     ]     rv = self.invoke(ctx)      cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.490462Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.490856Z [info     ]     return ctx.invoke(self.callback, **ctx.params) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.491258Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.491661Z [info     ]     return __callback(*args, **kwargs) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.492102Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 578, in cli cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.492550Z [info     ]     target.listen(file_input)  cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.493121Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/io_base.py", line 34, in listen cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.493505Z [info     ]     self._process_lines(file_input) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.493926Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 278, in _process_lines cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.494303Z [info     ]     counter = super()._process_lines(file_input) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.494725Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/io_base.py", line 81, in _process_lines cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.495119Z [info     ]     self._process_record_message(line_dict) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.495494Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 326, in _process_record_message cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.495865Z [info     ]     sink.process_record(transformed_record, context) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.496237Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/storage_write.py", line 299, in process_record cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.496619Z [info     ]     json_format.ParseDict(record, self.proto_schema()).SerializeToString() cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.496986Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/protobuf/json_format.py", line 471, in ParseDict cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.497353Z [info     ]     parser.ConvertMessage(js_dict, message, '') cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.497724Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/protobuf/json_format.py", line 512, in ConvertMessage cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.498126Z [info     ]     self._ConvertFieldValuePair(value, message, path) cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.498486Z [info     ]   File "/home/ubuntu/meltano-elt/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/protobuf/json_format.py", line 637, in _ConvertFieldValuePair cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.498847Z [info     ]     raise ParseError(          cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-04T02:10:47.499220Z [info     ] google.protobuf.json_format.ParseError: Failed to parse object field: repeated field object must be in [] which is {'id': 91, 'created_at': '2015-05-22T17:21:04.739Z', 'updated_at': '2015-05-22T17:21:04.739Z', 'confirmed_by': None, 'dispatched_at': None, 'dispatcher_id': None, 'subscription_id': 28, 'conf
irmation_token': None, 'scheduled_send_date': '2015-06-21', 'confirmation_sent_at': None, 'confirmed_delivered_at': None, 'confirmation_received_at': None} at AnonymousProto_94efc93b560275747c1d98a9eb9c2e6a45baa220. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery

I have deleted the bigquery table and meltano state/runs data between invocations, for a clean slate.

I may be misunderstanding the limits of these two different approaches, but it does seem like it should be possible to load postgres JSON columns -> bigquery JSON columns.

Perhaps relatedly, when using denormalized the schema that is created in bigquery specifies the mode of both the JSON columns as REPEATED when I think it should be NULLABLE, but see no way to configure this. I manually edited the schema in bigquery, changing their mode to NULLABLE but this didn't fix the problem this loader is encountering - probably because the issue is between this library and the protobuf library, not specifically when hitting the bigquery API.

@davert0
Copy link

davert0 commented Jul 7, 2023

I faced the same Issue

@z3z1ma
Copy link
Owner

z3z1ma commented Jul 7, 2023

We should update the jsonschema type translation to properly handle object types. We inherited the translation from the old target bigquery.

The only tricky part is that some load methods will load json from a complex object, other methods need the contents of the key to be stringified.

@thijsbrouwers
Copy link

I'm facing the same issue. Any update or fix on this? Thanks!

@buu-nguyen
Copy link
Contributor

buu-nguyen commented Jan 22, 2024

After investigating, I believe the main issue stems from the JSON type column being recognized as both object and array by tap-postgres (source).
This ambiguity leads to a conflict in our jsonschema type translation, which checks array first, resulting in the JSON field being treated with a REPEATED mode.
Therefore, I suggest modifying the fallback SchemaField of array in target-bigquery from REPEATED mode to NULLABLE. This change makes sense because BigQuery's JSON already supports arrays, and JSON fields can handle semi-structured loading as outlined here.
What are your thoughts on this, @z3z1ma ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants