Possible to prevent sorting by '_id' in Document Table visualization? #271

gplechuck · 2022-05-17T12:06:54Z

Very nice plugin. I have been using the 'Document Table' visualization for some time and am very happy with it as a lightweight alternative to the 'Data Table' visualization. However, I have noticed a problem when querying large datasets.

It seems that when 'Max Hits' is set higher than 10000, a sort on the '_id' field is added to the resulting query. Sorting by '_id' is not recommended by Elasticsearch and can lead to a large amount of cached _id field data occupying the heap (I think use of _id field data will actually be disabled by default in future versions, see elastic/elasticsearch#64511). I recently ran a query over a couple of years' worth of our data, and overall heap usage spiked by 91GB, almost exclusively _id field data.

Use of the '_id' field does not appear to be configurable in the Document Table 'Query Parameters'.

I had a very quick look at the source files and can see where the _id sort is introduced:

# From file .data_load/enhanced_table_request_handler.js, line 50 (plugin enhanced-table-1.13.0_7.6.1)

  if ((visParams.hitsSize !== undefined && visParams.hitsSize > MAX_HITS_SIZE) || visParams.csvFullExport) {
    searchSourceFields.sort.push({'_id': {'order': 'asc', 'unmapped_type': 'keyword'}});
  }

Is this behaviour essential for anything, and would there be any adverse effects from removing that snippet of code from the plugin as a workaround, so that we could continue to use the table for large datasets?

Cheers

fbaligand · 2022-05-18T07:51:57Z
Maintainer

Hi @gplechuck,
Thanks for opening an issue about this problem, and thanks for your detailed explanation of why it is important to fix it.
To answer your question, there is no real need to sort by id specifically.
The real need is to be able to scroll through the data across several queries when the requested size is over 10000 hits, and to avoid pagination problems (duplicates or misses).
An efficient way to do that should be to sort by “_doc”, which seems to be the recommendation according to the Elastic documentation.
What do you think about this change?
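
For illustration only, here is a minimal sketch of what the proposed change could look like, keeping the condition from the snippet quoted above and swapping the "_id" clause for "_doc". The function name `withPagingSort` is hypothetical, and `MAX_HITS_SIZE` is redefined here as an assumption based on the 10000-hit threshold mentioned in the thread. It has not been tested against the plugin or a live cluster, and whether "_doc" gives stable paging depends on how the plugin pages through results, which this thread does not spell out.

```javascript
// Assumption: stand-in for the plugin's constant; the thread only mentions a 10000-hit threshold.
const MAX_HITS_SIZE = 10000;

// Hypothetical helper: returns a copy of the sort clauses and, under the same
// condition as the quoted snippet, appends a "_doc" clause instead of "_id".
function withPagingSort(visParams, sort) {
  const result = sort.slice(); // do not mutate the caller's array
  const needsPaging =
    (visParams.hitsSize !== undefined && visParams.hitsSize > MAX_HITS_SIZE) ||
    Boolean(visParams.csvFullExport);
  if (needsPaging) {
    // "_doc" sorts by index order, so no _id field data has to be loaded.
    result.push({ _doc: { order: 'asc' } });
  }
  return result;
}

// Example call with a user-defined sort on a hypothetical "@timestamp" field.
const sort = withPagingSort(
  { hitsSize: 20000 },
  [{ '@timestamp': { order: 'desc' } }]
);
console.log(JSON.stringify(sort));
```

Below the threshold, and without a full CSV export, the sort clauses are returned unchanged, which matches the behaviour of the original condition.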

fbaligand · 2022-05-18T07:52:46Z
Maintainer

By the way, I think this is more of an issue than a discussion ;)

1 reply

gplechuck May 19, 2022
Author

Moved to an issue, thanks!
#274