Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Search] The csv_searchsource export type requests all the fields when just a few are needed #123187

Closed
dokmic opened this issue Jan 17, 2022 · 8 comments
Labels
bug Fixes for quality problems that affect the customer experience (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead Feature:Reporting:CSV Reporting issues pertaining to CSV file export impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. loe:medium Medium Level of Effort needs-team Issues missing a team label triage_needed

Comments

@dokmic
Copy link
Contributor

dokmic commented Jan 17, 2022

The original issue popped up in the reporting for CSV export type.

The current implementation of the CSV generator is reusing the request body of the search source. The body always contains stored_fields: ['*'] and a field with * in fields. That is causing the excessive gathering of all the available fields even though they are not exported.

That problem significantly slows down the CSV generation on large datasets and increases memory consumption by the reporting.

In the stored_fields documentation, it is recommended to load those fields selectively.

Should we fix that in the search_source object to return a request body from getSearchRequestBody to fetch only selected fields? Namely:

  • Can we remove { field: '*' } from the fields?
  • Can we provide a stored_fields value different from ['*']?
@dokmic dokmic added bug Fixes for quality problems that affect the customer experience triage_needed (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead Team:Reporting Services labels Jan 17, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-reporting-services (Team:Reporting Services)

@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesUx)

@dokmic dokmic changed the title [Reporting] The csv_searchsource export type requests all the fields when just a few are needed [Search] The csv_searchsource export type requests all the fields when just a few are needed Jan 18, 2022
@lukasolson
Copy link
Member

I don't believe we should be reusing this behavior from Discover's search source configuration. Currently, Discover sets the fields property to '*' by design, since Discover shows all of the fields in the field list on the left, which is populated from the fields returned in the documents.

It seems that, instead, when generating the CSV, we should look at the list of columns configured and use searchSource.setField('fields', [... the list of fields]). This way, we would only request the fields in the table rather than all fields. This will also enforce that stored_fields will only contain those which are requested as well.

Related: #82383, #85162

@tsullivan
Copy link
Member

tsullivan commented Jan 19, 2022

Thanks @lukasolson

CSV Export gets its search source configuration, and a separate list of columns, from here:

It seems that, instead, when generating the CSV, we should look at the list of columns configured and use searchSource.setField('fields', [... the list of fields])

That is possible, but there is not always a list of columns configured (default "new search"). When there is, it comes from the Discover app state.

When the user has not selected any fields, CSV export should export all of the fields. It sounds like we can't optimize that scenario very well, but is still worth discussion. Maybe we can make the list available in getSearchSource and use that as the columns data when the app state has no selected columns.

@lukasolson
Copy link
Member

When the user has not selected any fields, CSV export should export all of the fields. It sounds like we can't optimize that scenario very well, but is still worth discussion. Maybe we can make the list available in getSearchSource and use that as the columns data when the app state has no selected columns.

Do we have access to the index pattern data view here? If not, we could probably get it from searchSource.getField('index'). The data view should have a getFields method we can use to get the list of fields. There might be some caveats there, so maybe @mattkime can weight in on using that method to get the list of fields to populate for CSV export when no specific columns are configured.

@mattkime
Copy link
Contributor

@lukasolson I can't add to your faultless advice. In so far as the export is created from a data view context, using the data view will provide the most consistent experience.

@tsullivan
Copy link
Member

tsullivan commented Jan 20, 2022

@lukasolson

It seems that, instead, when generating the CSV, we should look at the list of columns configured and use searchSource.setField('fields', [... the list of fields]). This way, we would only request the fields in the table rather than all fields. This will also enforce that stored_fields will only contain those which are requested as well.

With your help, I took at stab and implementing this on the server side before CSV Export gets the search body: https://github.com/elastic/kibana/pull/123412/files#diff-a86296d9011252ff17c1007bb3de66763c43af633f7daff1207151da34838e7fR309

I floated this change by @dokmic and he convinced me this feels pretty hacky.

I don't believe we should be reusing this behavior from Discover's search source configuration. Currently, Discover sets the fields property to '*' by design, since Discover shows all of the fields in the field list on the left, which is populated from the fields returned in the documents.

This seems to be the reason why Reporting has to add on a hack to "fix" the search source. The function that you linked to describes itself as a Helper function to update the given searchSource before fetching/sharing/persisting. It seems Discover should maybe not modify fields in this function, because Reporting has to undo what is done there. What are your thoughts?

Edit: it looks like Discover does not have access to the fields/columns within the updateSearchSource function, so my suggestion perhaps won't work.

@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:medium Medium Level of Effort impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. labels Jan 24, 2022
@tsullivan
Copy link
Member

Closed in #123412

@sophiec20 sophiec20 added the Feature:Reporting:CSV Reporting issues pertaining to CSV file export label Aug 21, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead Feature:Reporting:CSV Reporting issues pertaining to CSV file export impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. loe:medium Medium Level of Effort needs-team Issues missing a team label triage_needed
Projects
None yet
Development

No branches or pull requests

6 participants