Expose more information and stats about a data stream #58316

Closed
martijnvg opened this issue Jun 18, 2020 · 5 comments
Labels
:Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team Team:Deployment Management Meta label for Management Experience - Deployment Management team

Comments

@martijnvg
Member

The get data stream API currently returns the following properties for each data stream:

  • name of the data stream
  • timestamp_field of the data stream
  • list of indices that are part of the data stream
  • data stream generation

The data stream UI and telemetry require more information about a data stream in order to become more useful. There is interest in adding the following data stream related information:

  • The name of the composable index template that matches a data stream.
  • The ILM policy used for a data stream. I think multiple policies may need to be returned, because if an ILM policy changes, the historic backing indices will probably still use the old ILM policy?
  • The health status of a data stream. If at least one backing index is red or yellow, then the data stream should report that too.
  • The accumulated used disk space of a data stream.
  • The timestamp of the last indexed document.
  • (tbd)
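
To make the health and disk space items above concrete, here is a minimal sketch of how the per-index values might be rolled up into one value per data stream. It is an illustration only, not the Elasticsearch implementation: the function names, the green < yellow < red severity ordering, and the choice to report green for a stream with no backing indices are all assumptions.

```python
from typing import Iterable

# Assumed severity order for this sketch: green < yellow < red.
_SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def rollup_health(backing_index_statuses: Iterable[str]) -> str:
    """Return the worst health status found among the backing indices.

    Hypothetical helper: an empty input yields "green", which is an
    assumption of this sketch rather than documented behavior.
    """
    worst = "green"
    for status in backing_index_statuses:
        if _SEVERITY[status] > _SEVERITY[worst]:
            worst = status
    return worst


def total_store_size(backing_index_sizes_in_bytes: Iterable[int]) -> int:
    """Accumulate the disk space used by all backing indices (hypothetical helper)."""
    return sum(backing_index_sizes_in_bytes)
```

With this roll-up, a single yellow or red backing index is enough to change the status reported for the whole data stream.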

Some of these stats are relatively expensive to compute compared to the information returned by the get data streams API (which reads from the cluster state). I therefore think we should add the expensive stats to a new, yet-to-be-added get data stream stats API.

Relates to #53100

@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Data streams)

@martijnvg
Member Author

@cjcenizal If we add a data stream stats API, then the data stream overview would need to make two API calls. I know that invoking a single API would be the easiest way to build the data stream overview, but computing the more expensive stats completely changes the runtime characteristics of the get data stream API, so I wonder whether it would be OK if we add the stats to a separate data stream stats API? Alternatively, we can also add the information returned by the get data stream API to the data stream stats API, since those bits of information don't affect how data stream stats is executed, but then there is overlap between these two APIs.

@cjcenizal
Contributor

Thanks @martijnvg! As long as the stats API supports wildcards, I think it's OK for the UI to make two API calls to retrieve the data it needs. We'll retrieve all data streams and stats for all data streams with two parallel API requests, and then merge this information together so we can present it in the table.

> Alternatively we can add the information returned by the get data stream also to the data stream stats api, since those bits of information don't affect how data stream stats is executed, but then there is overlap between these two apis.

Thanks for mentioning this possibility as well. I'm comfortable with moving forward with the implementation I mentioned above, and then exploring merging data stream info into the stats API if we discover a concrete need for that level of optimization.
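
The two-request approach described in this comment could look roughly like the sketch below: fetch the data stream list and the stats in parallel, then join them by data stream name. Everything here is an assumption made for illustration: the `fetch_streams` and `fetch_stats` callables stand in for the real HTTP calls, and the response shape (a list of dicts keyed by `name`) and the stats field name are hypothetical, not the actual API responses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def merge_streams_with_stats(streams: List[Row], stats: List[Row]) -> List[Row]:
    """Join stats onto data streams by name.

    A stream without a matching stats entry is kept unchanged.
    """
    stats_by_name = {entry["name"]: entry for entry in stats}
    merged = []
    for stream in streams:
        row = dict(stream)  # copy so the inputs are not mutated
        row.update(stats_by_name.get(stream["name"], {}))
        merged.append(row)
    return merged


def load_overview(
    fetch_streams: Callable[[], List[Row]],
    fetch_stats: Callable[[], List[Row]],
) -> List[Row]:
    """Issue both (hypothetical) requests in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        streams_future = pool.submit(fetch_streams)
        stats_future = pool.submit(fetch_stats)
        return merge_streams_with_stats(
            streams_future.result(), stats_future.result()
        )


# Usage with stubbed fetchers in place of real API calls.
rows = load_overview(
    lambda: [{"name": "logs", "generation": 2}, {"name": "metrics", "generation": 1}],
    lambda: [{"name": "logs", "store_size_bytes": 1024}],
)
```

Keeping the merge on the client side leaves the cheap cluster-state read and the expensive stats computation as separate requests, which matches the trade-off discussed above.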

@martijnvg
Member Author

Part of this has been implemented via #59128 and the other part will be implemented via #58707.

@danhermann
Contributor

Completed via #59128 and #58707.


4 participants