Asset Inventory : google.api_core.exceptions.BadRequest: 400 Too many fields: 10090 #533

Closed
jf-marquis-Adeo opened this issue Aug 10, 2020 · 11 comments

Comments

@jf-marquis-Adeo

Hello,
The Dataflow pipeline has been broken since August 6th due to a quota error: Too many fields: 10090!

{
    "insertId": "2411828167005042233:27398:0:130255",
    "jsonPayload": {
      "job": "2020-08-09_12_00_10-9423103018968181521",
      "logger": "root:batchworker.py:do_work",
      "thread": "69:140260486854400",
      "exception": "Traceback (most recent call last):\n  File \"apache_beam/runners/common.py\", line 806, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method\n  File \"apache_beam/runners/common.py\", line 398, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle\n  File \"apache_beam/runners/common.py\", line 402, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle\n  File \"./asset_inventory/import_pipeline.py\", line 422, in finish_bundle\n    raise e\n  File \"./asset_inventory/import_pipeline.py\", line 419, in finish_bundle\n    load_job.result()\n  File \"/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py\", line 733, in result\n    return super(_AsyncJob, self).result(timeout=timeout)\n  File \"/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py\", line 127, in result\n    raise self._exception\ngoogle.api_core.exceptions.BadRequest: 400 Too many fields: 10090\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py\", line 650, in do_work\n    work_executor.execute()\n  File \"/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py\", line 178, in execute\n    op.finish()\n  File \"apache_beam/runners/worker/operations.py\", line 611, in apache_beam.runners.worker.operations.DoOperation.finish\n  File \"apache_beam/runners/worker/operations.py\", line 612, in apache_beam.runners.worker.operations.DoOperation.finish\n  File \"apache_beam/runners/worker/operations.py\", line 613, in apache_beam.runners.worker.operations.DoOperation.finish\n  File \"apache_beam/runners/common.py\", line 824, in apache_beam.runners.common.DoFnRunner.finish\n  File \"apache_beam/runners/common.py\", line 808, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method\n  File \"apache_beam/runners/common.py\", line 849, in apache_beam.runners.common.DoFnRunner._reraise_augmented\n  File \"/usr/local/lib/python3.7/site-packages/future/utils/__init__.py\", line 421, in raise_with_traceback\n    raise exc.with_traceback(traceback)\n  File \"apache_beam/runners/common.py\", line 806, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method\n  File \"apache_beam/runners/common.py\", line 398, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle\n  File \"apache_beam/runners/common.py\", line 402, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle\n  File \"./asset_inventory/import_pipeline.py\", line 422, in finish_bundle\n    raise e\n  File \"./asset_inventory/import_pipeline.py\", line 419, in finish_bundle\n    load_job.result()\n  File \"/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py\", line 733, in result\n    return super(_AsyncJob, self).result(timeout=timeout)\n  File \"/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py\", line 127, in result\n    raise self._exception\ngoogle.api_core.exceptions.BadRequest: 400 Too many fields: 10090 [while running 'load_to_bigquery/load_to_bigquery']\n",
      "worker": "adeo-dfty-cloud-asset-imp-08091200-95r9-harness-nvn5",
      "message": "An exception was raised when trying to execute the workitem 5510000210188050724 : Traceback (most recent call last):\n  File \"apache_beam/runners/common.py\", line 806, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method\n  File \"apache_beam/runners/common.py\", line 398, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle\n  File \"apache_beam/runners/common.py\", line 402, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle\n  File \"./asset_inventory/import_pipeline.py\", line 422, in finish_bundle\n    raise e\n  File \"./asset_inventory/import_pipeline.py\", line 419, in finish_bundle\n    load_job.result()\n  File \"/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py\", line 733, in result\n    return super(_AsyncJob, self).result(timeout=timeout)\n  File \"/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py\", line 127, in result\n    raise self._exception\ngoogle.api_core.exceptions.BadRequest: 400 Too many fields: 10090\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py\", line 650, in do_work\n    work_executor.execute()\n  File \"/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py\", line 178, in execute\n    op.finish()\n  File \"apache_beam/runners/worker/operations.py\", line 611, in apache_beam.runners.worker.operations.DoOperation.finish\n  File \"apache_beam/runners/worker/operations.py\", line 612, in apache_beam.runners.worker.operations.DoOperation.finish\n  File \"apache_beam/runners/worker/operations.py\", line 613, in apache_beam.runners.worker.operations.DoOperation.finish\n  File \"apache_beam/runners/common.py\", line 824, in apache_beam.runners.common.DoFnRunner.finish\n  File \"apache_beam/runners/common.py\", line 808, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method\n  File \"apache_beam/runners/common.py\", line 849, in apache_beam.runners.common.DoFnRunner._reraise_augmented\n  File \"/usr/local/lib/python3.7/site-packages/future/utils/__init__.py\", line 421, in raise_with_traceback\n    raise exc.with_traceback(traceback)\n  File \"apache_beam/runners/common.py\", line 806, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method\n  File \"apache_beam/runners/common.py\", line 398, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle\n  File \"apache_beam/runners/common.py\", line 402, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle\n  File \"./asset_inventory/import_pipeline.py\", line 422, in finish_bundle\n    raise e\n  File \"./asset_inventory/import_pipeline.py\", line 419, in finish_bundle\n    load_job.result()\n  File \"/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py\", line 733, in result\n    return super(_AsyncJob, self).result(timeout=timeout)\n  File \"/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py\", line 127, in result\n    raise self._exception\ngoogle.api_core.exceptions.BadRequest: 400 Too many fields: 10090 [while running 'load_to_bigquery/load_to_bigquery']\n",
      "stage": "s18",
      "step": "group_written_objects_by_key/Read"
    },

What's wrong, and how can I correct this?

@jf-marquis-Adeo
Author

The update failed on the following asset types:

  • datafusion_googleapis_com_Instance
  • appengine_googleapis_com_Service
  • bigtableadmin_googleapis_com_Instance
  • bigtableadmin_googleapis_com_Cluster

@ryanmcdowell
Contributor

@bmenasha can you take a look?

@bmenasha
Contributor

bmenasha commented Aug 10, 2020

Thanks for reporting.

This should be fixed by #535; the production Dataflow template has been updated with this change. Let me know if it doesn't resolve the issue.
-b

@jf-marquis-Adeo
Author

jf-marquis-Adeo commented Aug 11, 2020

I have relaunched the pipeline with these parameters:

Launching Dataflow for today's dump file
Start Listing objects in Bucket adeo-resource-inventory-dump
File Match ExportAsset_Ressource-2020-08-11.dump 2020-08-11 14:23:48.899000+00:00
Found File to load into BQ ExportAsset_Ressource-2020-08-11.dump
File to Load gs://adeo-resource-inventory-dump/ExportAsset_Ressource-2020-08-11.dump
<oauth2client.contrib.gce.AppAssertionCredentials object at 0x3ed970902810>
launching template gs://professional-services-tools-asset-inventory/latest/import_pipeline In dfdp-sre-data:europe-west1 with {'environment': {'network': 'dfdp-sre-data-default-network',
'subnetwork': 'https://www.googleapis.com/compute/v1/projects/dfdp-sre-data/regions/europe-west1/subnetworks/dfdp-sre-data-default-subnetwork-west1',
'zone': 'europe-west1-b'},
'jobName': 'adeo-dfty-cloud-asset-import-2020-08-11t14-25-22z',
'parameters': {'dataset': 'all_adeo_asset_inventory',
'group_by': 'ASSET_TYPE',
'input': 'gs://adeo-resource-inventory-dump/ExportAsset_Ressource-2020-08-11.dump',
'load_time': '2020-08-11T14:25:22Z',
'stage': 'gs://dfty-cai2bq/gcpAdeoComTenantInventory/Inventory/stage',
'write_disposition': 'WRITE_APPEND'}}
Attempting refresh to obtain Initial access_token
waiting on pipeline : {'job': {'createTime': '2020-08-11T14:25:27.207918Z',
'currentStateTime': '1970-01-01T00:00:00Z',
'id': '2020-08-11_07_25_25-4477138383482487997',
'location': 'europe-west1',
'name': 'adeo-dfty-cloud-asset-import-2020-08-11t14-25-22z',
'projectId': 'dfdp-sre-data',
'startTime': '2020-08-11T14:25:27.207918Z',
'type': 'JOB_TYPE_BATCH'}}

and the same problem occurs.
Can you tell me what's wrong?
Thanks a lot.

@bmenasha
Contributor

Marquis, can you please try running this Dataflow template: gs://professional-services-tools-asset-inventory/test/import_pipeline ?
Thanks.

@jf-marquis-Adeo
Author

jf-marquis-Adeo commented Aug 11, 2020

Done. The Dataflow job was launched with these parameters:

gcloud dataflow jobs run testimport \
  --gcs-location gs://professional-services-tools-asset-inventory/test/import_pipeline \
  --region europe-west1 \
  --network=dfdp-sre-data-default-network \
  --subnetwork=https://www.googleapis.com/compute/v1/projects/dfdp-sre-data/regions/europe-west1/subnetworks/dfdp-sre-data-default-subnetwork-west1 \
  --staging-location gs://dataflow-staging-europe-west1-36683006236 \
  --parameters load_time=2020-08-11T14:25:22,input=gs://adeo-resource-inventory-dump/ExportAsset_Ressource-2020-08-11.dump,stage=gs://dfty-cai2bq/gcpAdeoComTenantInventory/Inventory/stage,group_by=ASSET_TYPE,write_disposition=WRITE_APPEND,dataset=all_adeo_asset_inventory

Waiting for the result, about one hour...

@jf-marquis-Adeo
Author

Same result:
{
insertId: "1gqa5ehc67d"
labels: {…}
logName: "projects/dfdp-sre-data/logs/dataflow.googleapis.com%2Fjob-message"
receiveTimestamp: "2020-08-11T18:10:04.533492864Z"
resource: {…}
severity: "ERROR"
textPayload: "Error message from worker: Traceback (most recent call last):
File "apache_beam/runners/common.py", line 806, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
File "apache_beam/runners/common.py", line 398, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
File "apache_beam/runners/common.py", line 402, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
File "./asset_inventory/import_pipeline.py", line 422, in finish_bundle
raise e
File "./asset_inventory/import_pipeline.py", line 419, in finish_bundle
load_job.result()
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 733, in result
return super(_AsyncJob, self).result(timeout=timeout)
File "/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py", line 127, in result
raise self._exception
google.api_core.exceptions.BadRequest: 400 Too many fields: 10136

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 650, in do_work
work_executor.execute()
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 178, in execute
op.finish()
File "apache_beam/runners/worker/operations.py", line 611, in apache_beam.runners.worker.operations.DoOperation.finish
File "apache_beam/runners/worker/operations.py", line 612, in apache_beam.runners.worker.operations.DoOperation.finish
File "apache_beam/runners/worker/operations.py", line 613, in apache_beam.runners.worker.operations.DoOperation.finish
File "apache_beam/runners/common.py", line 824, in apache_beam.runners.common.DoFnRunner.finish
File "apache_beam/runners/common.py", line 808, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
File "apache_beam/runners/common.py", line 849, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "/usr/local/lib/python3.7/site-packages/future/utils/init.py", line 421, in raise_with_traceback
raise exc.with_traceback(traceback)
File "apache_beam/runners/common.py", line 806, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
File "apache_beam/runners/common.py", line 398, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
File "apache_beam/runners/common.py", line 402, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
File "./asset_inventory/import_pipeline.py", line 422, in finish_bundle
raise e
File "./asset_inventory/import_pipeline.py", line 419, in finish_bundle
load_job.result()
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 733, in result
return super(_AsyncJob, self).result(timeout=timeout)
File "/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py", line 127, in result
raise self._exception
google.api_core.exceptions.BadRequest: 400 Too many fields: 10136 [while running 'load_to_bigquery/load_to_bigquery']

@bmenasha
Contributor

Thanks Marquis. I think this is caused by WRITE_APPEND combining the new columns with those already on the existing table, putting us over the 10k column limit. The fix that was submitted only keeps the current snapshot under 10k.

You should be able to complete a successful import with write_disposition set to WRITE_EMPTY, but this will delete your old tables, keeping only the most recent snapshot.
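
For reference, a sketch of that workaround: the same gcloud command posted earlier in the thread, with only the write_disposition parameter changed (keep in mind it retains only the most recent snapshot):

gcloud dataflow jobs run testimport \
  --gcs-location gs://professional-services-tools-asset-inventory/test/import_pipeline \
  --region europe-west1 \
  --network=dfdp-sre-data-default-network \
  --subnetwork=https://www.googleapis.com/compute/v1/projects/dfdp-sre-data/regions/europe-west1/subnetworks/dfdp-sre-data-default-subnetwork-west1 \
  --staging-location gs://dataflow-staging-europe-west1-36683006236 \
  --parameters load_time=2020-08-11T14:25:22,input=gs://adeo-resource-inventory-dump/ExportAsset_Ressource-2020-08-11.dump,stage=gs://dfty-cai2bq/gcpAdeoComTenantInventory/Inventory/stage,group_by=ASSET_TYPE,write_disposition=WRITE_EMPTY,dataset=all_adeo_asset_inventory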

To support WRITE_APPEND, the import pipeline will need to read the existing schema, merge it with the current schema, and truncate any new columns on the data to import once it goes over 10k. This should be doable and I'll work to complete it, but it will take a while longer.
Thanks.
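
Not the pipeline's actual code, just a minimal sketch of that merge-and-truncate idea using the google-cloud-bigquery client (the table name and the snapshot schema below are made-up examples; the real limit appears to count nested fields too, so a full implementation would count leaf fields rather than top-level columns):

from google.cloud import bigquery

MAX_BQ_COLUMNS = 10000  # BigQuery's per-table column limit


def merge_and_cap(existing_fields, new_fields, limit=MAX_BQ_COLUMNS):
    """Keep all existing top-level fields, then append new ones until the limit.

    For brevity this sketch only counts top-level columns.
    """
    merged = list(existing_fields)
    seen = {f.name for f in existing_fields}
    for field in new_fields:
        if field.name in seen:
            continue   # column already exists on the table
        if len(merged) >= limit:
            break      # truncate new columns that would exceed the limit
        merged.append(field)
        seen.add(field.name)
    return merged


client = bigquery.Client()
table = client.get_table("all_adeo_asset_inventory.compute_googleapis_com_Instance")  # example table
snapshot_schema = [  # schema derived from the snapshot being imported (example values)
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("brand_new_property", "STRING"),
]
table.schema = merge_and_cap(table.schema, snapshot_schema)
client.update_table(table, ["schema"])  # widen the table schema before running the load job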

@bmenasha
Contributor

bmenasha commented Aug 12, 2020

10,000 columns also seems like a lot. Can you send me the schema for one of the failing tables? ([email protected]) Thanks.

bq show all_adeo_asset_inventory.[the-failing-table]

The logs should also show the BigQuery job id of the load job that failed, and it too will show the schema. Or you can list failed load job ids with

bq ls -j -a | grep -v SUCCESS | grep load

and for the failing job, run

bq show --format json -j [the-failing-job-id]

I'm just curious what these 10,000 columns are!
Thanks.
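
If it helps, one rough way to count the fields in a table's schema (assuming jq is installed; this counts nested fields as well, since every field object in the schema JSON has a name):

bq show --format=json all_adeo_asset_inventory.[the-failing-table] | \
  jq '[.schema.fields | .. | objects | select(has("name"))] | length'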

@bmenasha
Contributor

Thanks Marquis. The latest change should resolve this, but it's not backwards compatible with existing imports and will require importing into a new dataset. I deployed the fix to the test pipeline and will promote it to the production one in a few days. Yes, it will break any WRITE_APPEND imports, but those just broke recently anyway due to a Cloud Run API change, so it's as good a time as any, I guess.

@jf-marquis-Adeo
Author

Solved by starting with a new dataset.
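
For anyone landing here, a sketch of that fix (the new dataset name is hypothetical): create a fresh dataset and point the pipeline's dataset parameter at it, for example

bq mk --dataset dfdp-sre-data:all_adeo_asset_inventory_v2

and rerun the import with dataset=all_adeo_asset_inventory_v2 in --parameters.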
