[Search] The `csv_searchsource` export type requests all the fields when just a few are needed #123187

dokmic · 2022-01-17T18:59:45Z

The original issue popped up in the reporting for CSV export type.

The current implementation of the CSV generator is reusing the request body of the search source. The body always contains stored_fields: ['*'] and a field with * in fields. That is causing the excessive gathering of all the available fields even though they are not exported.

That problem significantly slows down the CSV generation on large datasets and increases memory consumption by the reporting.

In the stored_fields documentation, it is recommended to load those fields selectively.

Should we fix that in the search_source object to return a request body from getSearchRequestBody to fetch only selected fields? Namely:

Can we remove { field: '*' } from the fields?
Can we provide a stored_fields value different from ['*']?

The text was updated successfully, but these errors were encountered:

elasticmachine · 2022-01-17T18:59:47Z

Pinging @elastic/kibana-reporting-services (Team:Reporting Services)

elasticmachine · 2022-01-17T18:59:47Z

Pinging @elastic/kibana-app-services (Team:AppServicesUx)

lukasolson · 2022-01-18T18:56:19Z

I don't believe we should be reusing this behavior from Discover's search source configuration. Currently, Discover sets the fields property to '*' by design, since Discover shows all of the fields in the field list on the left, which is populated from the fields returned in the documents.

It seems that, instead, when generating the CSV, we should look at the list of columns configured and use searchSource.setField('fields', [... the list of fields]). This way, we would only request the fields in the table rather than all fields. This will also enforce that stored_fields will only contain those which are requested as well.

Related: #82383, #85162

tsullivan · 2022-01-19T00:30:26Z

Thanks @lukasolson

CSV Export gets its search source configuration, and a separate list of columns, from here:

kibana/src/plugins/discover/public/utils/get_sharing_data.ts

Line 57 in 467d111

return {

It seems that, instead, when generating the CSV, we should look at the list of columns configured and use searchSource.setField('fields', [... the list of fields])

That is possible, but there is not always a list of columns configured (default "new search"). When there is, it comes from the Discover app state.

When the user has not selected any fields, CSV export should export all of the fields. It sounds like we can't optimize that scenario very well, but is still worth discussion. Maybe we can make the list available in getSearchSource and use that as the columns data when the app state has no selected columns.

lukasolson · 2022-01-19T15:21:59Z

When the user has not selected any fields, CSV export should export all of the fields. It sounds like we can't optimize that scenario very well, but is still worth discussion. Maybe we can make the list available in getSearchSource and use that as the columns data when the app state has no selected columns.

Do we have access to the ~~index pattern~~ data view here? If not, we could probably get it from searchSource.getField('index'). The data view should have a getFields method we can use to get the list of fields. There might be some caveats there, so maybe @mattkime can weight in on using that method to get the list of fields to populate for CSV export when no specific columns are configured.

mattkime · 2022-01-19T21:09:52Z

@lukasolson I can't add to your faultless advice. In so far as the export is created from a data view context, using the data view will provide the most consistent experience.

tsullivan · 2022-01-20T20:01:02Z

@lukasolson

It seems that, instead, when generating the CSV, we should look at the list of columns configured and use searchSource.setField('fields', [... the list of fields]). This way, we would only request the fields in the table rather than all fields. This will also enforce that stored_fields will only contain those which are requested as well.

With your help, I took at stab and implementing this on the server side before CSV Export gets the search body: https://github.com/elastic/kibana/pull/123412/files#diff-a86296d9011252ff17c1007bb3de66763c43af633f7daff1207151da34838e7fR309

I floated this change by @dokmic and he convinced me this feels pretty hacky.

I don't believe we should be reusing this behavior from Discover's search source configuration. Currently, Discover sets the fields property to '*' by design, since Discover shows all of the fields in the field list on the left, which is populated from the fields returned in the documents.

This seems to be the reason why Reporting has to add on a hack to "fix" the search source. The function that you linked to describes itself as a Helper function to update the given searchSource before fetching/sharing/persisting. It seems Discover should maybe not modify fields in this function, because Reporting has to undo what is done there. What are your thoughts?

Edit: it looks like Discover does not have access to the fields/columns within the updateSearchSource function, so my suggestion perhaps won't work.

tsullivan · 2022-02-08T22:58:57Z

Closed in #123412

dokmic added bug Fixes for quality problems that affect the customer experience triage_needed (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead Team:Reporting Services labels Jan 17, 2022

dokmic changed the title ~~[Reporting] The csv_searchsource export type requests all the fields when just a few are needed~~ [Search] The csv_searchsource export type requests all the fields when just a few are needed Jan 18, 2022

tsullivan mentioned this issue Jan 19, 2022

[Reporting] Set Fields into the SearchSource in Discover's getSharingData #123412

Merged

2 tasks

tsullivan closed this as completed Feb 8, 2022

sophiec20 added the Feature:Reporting:CSV Reporting issues pertaining to CSV file export label Aug 21, 2024

botelastic bot added the needs-team Issues missing a team label label Aug 21, 2024

sophiec20 removed the (Deprecated) Team:Reporting Services label Aug 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Search] The `csv_searchsource` export type requests all the fields when just a few are needed #123187

[Search] The `csv_searchsource` export type requests all the fields when just a few are needed #123187

dokmic commented Jan 17, 2022 •

edited

Loading

elasticmachine commented Jan 17, 2022

elasticmachine commented Jan 17, 2022

lukasolson commented Jan 18, 2022

tsullivan commented Jan 19, 2022 •

edited

Loading

lukasolson commented Jan 19, 2022

mattkime commented Jan 19, 2022

tsullivan commented Jan 20, 2022 •

edited

Loading

tsullivan commented Feb 8, 2022

[Search] The csv_searchsource export type requests all the fields when just a few are needed #123187

[Search] The csv_searchsource export type requests all the fields when just a few are needed #123187

Comments

dokmic commented Jan 17, 2022 • edited Loading

elasticmachine commented Jan 17, 2022

elasticmachine commented Jan 17, 2022

lukasolson commented Jan 18, 2022

tsullivan commented Jan 19, 2022 • edited Loading

lukasolson commented Jan 19, 2022

mattkime commented Jan 19, 2022

tsullivan commented Jan 20, 2022 • edited Loading

tsullivan commented Feb 8, 2022

[Search] The `csv_searchsource` export type requests all the fields when just a few are needed #123187

[Search] The `csv_searchsource` export type requests all the fields when just a few are needed #123187

dokmic commented Jan 17, 2022 •

edited

Loading

tsullivan commented Jan 19, 2022 •

edited

Loading

tsullivan commented Jan 20, 2022 •

edited

Loading