BadRequestError: failed to parse field [indexed_document_volume] of type [integer] #735
Comments
cc @wangch079
Hi @prashant-elastic, may I know which version/branch you are running?
We checked on
Also, please note that the issue occurs when we are working with larger data [~10 GB & ~49000 objects].
Can I get a copy of the data set?
Shared with you on Slack 1:1.
@wangch079 Is there any update on this issue?
I can reproduce this on 8.7.0 against a large-ish MySQL dataset here.
@danajuratoni It would be nice to address this one before GA.
I think we're overflowing the integer field for `indexed_document_volume`.
As @artem-shelkovnikov pointed out, we store `indexed_document_volume` in an `integer` field. cc @danajuratoni: this will make any connector trying to sync any source with more than 2 GB of data fail in Ruby (since 8.6) and Python (since 8.7.1). Do you think we should document it? 8.7.1 is not released yet, but I don't think this can be considered a blocker. We could fix it in 8.8.
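For illustration, a minimal Python sketch of the arithmetic behind the failure; the byte count is the one that appears in the traceback below, and `INT32_MAX` is the ceiling of Elasticsearch's `integer` field type:

```python
# Minimal sketch: Elasticsearch's `integer` field is a signed 32-bit value,
# so any byte count above 2**31 - 1 (~2 GiB) is rejected by the mapper.
INT32_MAX = 2**31 - 1              # 2_147_483_647

reported_volume = 13_167_265_024   # bytes, from the traceback below (~12.3 GiB)

print(reported_volume > INT32_MAX)  # True -> mapper_parsing_exception on the sync-job update
```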
For the issue @ppf2 reported, this is because the job does not see any update for more than 60 seconds (it is supposed to receive a heartbeat every 10 seconds) and is marked as idle. This can happen when the job reporting task gets no chance to run for more than 60 seconds, which is rare. I will look into this issue separately.
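A hedged sketch of the idle-detection behaviour described above; the class and method names are hypothetical, not the actual connectors framework code:

```python
import asyncio
import time

HEARTBEAT_INTERVAL = 10  # seconds between heartbeats, per the comment above
IDLE_THRESHOLD = 60      # seconds without a heartbeat before the job is treated as idle

class JobWatchdog:
    """Hypothetical illustration of the mechanism, not the connectors implementation."""

    def __init__(self) -> None:
        self.last_seen = time.monotonic()

    async def heartbeat_loop(self) -> None:
        # The reporting task refreshes the timestamp every HEARTBEAT_INTERVAL seconds.
        # If the event loop is starved and this coroutine never runs, last_seen goes
        # stale and the job is flagged as idle even though it is still working.
        while True:
            self.last_seen = time.monotonic()
            await asyncio.sleep(HEARTBEAT_INTERVAL)

    def is_idle(self) -> bool:
        return time.monotonic() - self.last_seen > IDLE_THRESHOLD
```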
## Summary

### Part of elastic/connectors#735

The field `indexed_document_volume` (in bytes) in `.elastic-connectors-sync-jobs` is of type `integer`, which can hold a maximum value of `2^31-1`, equivalent to roughly 2 GB. This PR changes it to `unsigned_long`, which can hold a maximum value of `2^64-1`, equivalent to roughly 18 exabytes (1 exabyte = 1000 PB).

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

### For maintainers

- [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
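As a hedged illustration of the mapping change (not the actual Kibana/connectors index template): an existing field's type cannot be changed in place, so the wider type only applies to newly created indices or after a reindex. Index name and client setup below are placeholders.

```python
from elasticsearch import Elasticsearch  # assumes the official Python client is installed

# Target mapping fragment only; not the full sync-jobs index template.
SYNC_JOB_STATS_MAPPING = {
    "properties": {
        # was "integer" (max 2**31 - 1); "unsigned_long" holds values up to 2**64 - 1
        "indexed_document_volume": {"type": "unsigned_long"},
    }
}

client = Elasticsearch("http://localhost:9200")
client.indices.create(index="sync-jobs-demo", mappings=SYNC_JOB_STATS_MAPPING)
```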
Based on this slack thread, I'm reverting the above 3 PRs.
Instead, in a separate set of PRs, Chenhui and I will change these fields from representing bytes to representing MiB.
Regarding this issue: #735 (comment), I tested locally but I can't reproduce it. I guess the sync was somehow stuck for more than 60 seconds, causing the job to be marked as idle.
## Part of elastic/connectors#735

## Summary

The field type for `indexed_document_volume` is `integer`, which can only represent about 2 GB worth of bytes. To support syncing larger datasets, `indexed_document_volume` is updated to store the size in mebibytes (MiB). This PR makes sure the size is rendered correctly in the UI.

### For maintainers

- [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
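A hedged sketch of the byte-to-MiB conversion implied above; the helper name is illustrative, not the actual connectors or Kibana code:

```python
def bytes_to_mib(size_in_bytes: int) -> int:
    """Convert a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_in_bytes // (1024 * 1024)

# The failing value from the traceback fits comfortably once stored as MiB:
print(bytes_to_mib(13_167_265_024))  # 12557, far below the 2**31 - 1 integer ceiling
```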
Closing this issue as all the fixes have been merged.
Bug Description
BadRequestError: 400, 'mapper_parsing_exception' failed to parse field [indexed_document_volume] of type [integer]
To Reproduce
Steps to reproduce the behavior:
Expected behavior
All SharePoint documents should be successfully indexed in Elasticsearch.
Actual behavior
BadRequestError: 400, 'mapper_parsing_exception' failed to parse field [indexed_document_volume] of type [integer]
Screenshots
Environment
Additional context
[FMWK][12:16:55][INFO] Fetcher <create: 49099 |update: 0 |delete: 0>
Exception in callback ConcurrentTasks._callback(result_callback=None)(<Task finishe...tatus': 400})>)
handle: <Handle ConcurrentTasks._callback(result_callback=None)(<Task finishe...tatus': 400})>)>
Traceback (most recent call last):
File "/home/ubuntu/es-connectors/connectors/sync_job_runner.py", line 131, in execute
await self._sync_done(sync_status=sync_status, sync_error=fetch_error)
File "/home/ubuntu/es-connectors/connectors/sync_job_runner.py", line 170, in _sync_done
await self.sync_job.done(ingestion_stats=ingestion_stats)
File "/home/ubuntu/es-connectors/connectors/byoc.py", line 237, in done
await self._terminate(
File "/home/ubuntu/es-connectors/connectors/byoc.py", line 275, in _terminate
await self.index.update(doc_id=self.id, doc=doc)
File "/home/ubuntu/es-connectors/connectors/es/index.py", line 72, in update
await self.client.update(
File "/home/ubuntu/es-connectors/lib/python3.10/site-packages/elasticsearch/_async/client/init.py", line 4513, in update
return await self.perform_request( # type: ignore[return-value]
File "/home/ubuntu/es-connectors/lib/python3.10/site-packages/elasticsearch/_async/client/_base.py", line 321, in perform_request
raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.BadRequestError: BadRequestError(400, 'mapper_parsing_exception', "failed to parse field [indexed_document_volume] of type [integer] in document with id 'XmfoS4cBVSm7nRdw6PPq'. Preview of field's value: '13167265024'")
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/home/ubuntu/es-connectors/connectors/utils.py", line 318, in _callback
raise task.exception()
File "/home/ubuntu/es-connectors/connectors/sync_job_runner.py", line 135, in execute
await self._sync_done(sync_status=JobStatus.ERROR, sync_error=e)
File "/home/ubuntu/es-connectors/connectors/sync_job_runner.py", line 164, in _sync_done
await self.sync_job.fail(sync_error, ingestion_stats=ingestion_stats)
File "/home/ubuntu/es-connectors/connectors/byoc.py", line 242, in fail
await self._terminate(
File "/home/ubuntu/es-connectors/connectors/byoc.py", line 275, in _terminate
await self.index.update(doc_id=self.id, doc=doc)
File "/home/ubuntu/es-connectors/connectors/es/index.py", line 72, in update
await self.client.update(
File "/home/ubuntu/es-connectors/lib/python3.10/site-packages/elasticsearch/_async/client/init.py", line 4513, in update
return await self.perform_request( # type: ignore[return-value]
File "/home/ubuntu/es-connectors/lib/python3.10/site-packages/elasticsearch/_async/client/_base.py", line 321, in perform_request
raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.BadRequestError: BadRequestError(400, 'mapper_parsing_exception', "failed to parse field [indexed_document_volume] of type [integer] in document with id 'XmfoS4cBVSm7nRdw6PPq'. Preview of field's value: '13167265024'")