Add initial support for indexing to data streams #4409
Conversation
💔 Tests Failed
Introduce `apm-server.data_streams.enabled` config, which will be used by idxmgmt to route events to data streams based on data_stream.* fields that are expected to be in each published event.
When transforming model objects into beat.Events, set the data_stream.{type,dataset} fields. We add data_stream.namespace elsewhere, using an event processor.
Handle the new `apm-server.data_streams.enabled` config:
- when enabled, we add `data_stream.namespace` to all published events
- when disabled, we remove `data_stream.*` fields from all published events

For now we just set `data_stream.namespace` to "default". Later this will be based on the config received from Fleet.
There is a hack in here to inject data_stream.namespace in all published events, since the tests do not use the standard libbeat pipeline code.
Force-pushed from b4135f3 to 6b5c23f.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #4409      +/-   ##
==========================================
+ Coverage   76.11%   76.17%   +0.05%
==========================================
  Files         156      158       +2
  Lines        9713     9782      +69
==========================================
+ Hits         7393     7451      +58
- Misses       2320     2331      +11
==========================================
```
Generally looks good to me, left a couple of comments.
I realized that the `apm-*` templates are still set up when data streams are enabled, even though they wouldn't be applied any more. This doesn't have to be changed in this PR, but we could disable template setup in that case.
Out of curiosity: is there a Kibana/APM issue already to adapt the UI to the data stream changes? (Right now it is of course broken when data streams are enabled.)
Existing integrations disable this by injecting config, e.g. https://github.com/elastic/beats/blob/b6896ee6892da8ce4567d63ee0fa962512fb8701/x-pack/elastic-agent/spec/filebeat.yml#L3. I don't mind disabling it by default when enabling data streams, though; that's probably a bit neater.
Not yet, AFAIK. It's broken by default, but you can load a template and update the APM app settings in Kibana to fetch transactions and spans from …
model/transaction.go (Outdated)

```diff
@@ -111,7 +112,13 @@ func (e *Transaction) fields() common.MapStr {
 func (e *Transaction) Transform(_ context.Context, _ *transform.Config) []beat.Event {
 	transactionTransformations.Inc()

+	// Transactions are stored in a "traces" data stream along with spans.
+	dataset := datastreams.NormalizeServiceName(e.Metadata.Service.Name)
```
The assumption that we don't need to differentiate spans from transactions is strong, but what if we are wrong? Isn't it cheap and harmless to add an `apm.span` or `apm.transaction` dataset prefix, just in case?
> Isn't it cheap and harmless to add an `apm.span` or `apm.transaction` dataset prefix, just in case?
It's not necessarily harmless, as more datasets will lead to more shards. We can always revisit this in the future.
So how much would it hurt?
I guess it's fine, but a similar argument can be made the other way around: by separating spans and transactions into different indices, it is likely that some queries in the APM UI could be changed to faster alternatives.
We would not get this benefit if we revisit this decision in the future, because the UI would then have to query on both index name and index content.
The opposite is not true: if we start with separate span and transaction data streams, and later find they don't add value (performance- or flexibility-wise), we can optimize for fewer shards without any drawbacks.
> So how much would it hurt?
We won't know for sure until we test.
This was discussed in the proposal, and the discussion concluded with this comment (from me):
> It seems to me that there's no clear answer here. In order to make progress I propose that we start out conservatively w.r.t. sharding, and have a combined data stream as described. This will go out as experimental to start with, and we can adjust as necessary. We can replay some real data through the options to compare performance.
> I guess it's fine, but a similar argument can be made the other way around: by separating spans and transactions into different indices, it is likely that some queries in the APM UI could be changed to faster alternatives.
Span docs are currently used in two places in the UI, AFAIK:
1. displaying an individual trace waterfall
2. walking traces for the service map

I'm confident (1) is going to be cheap either way. We know (2) is a sub-optimal approach to building the service map, and we're investigating options to move away from it.
> We would not get this benefit if we revisit this decision in the future because the UI would have to query for both index name and index content...
Not sure I follow. We don't query spans by themselves; we only ever query spans and transactions together. If we did want to query only spans in the future, we could split the datasets and make `processor.event` a `constant_keyword`; that would mean only the span data streams would be considered in the query. This can be done without changing the index pattern, and without breaking backwards compatibility.
Anyway, as mentioned above: let's go with this for now and test it at scale. We can use an ingest node processor to split the transactions and spans into separate streams for testing.
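The ingest-node experiment mentioned above could look roughly like the pipeline below: a `set` processor that reroutes span documents to a separate data stream by rewriting `data_stream.dataset` based on `processor.event`. This is a hypothetical fragment for testing, not part of this PR; the `apm.span.` dataset prefix is an assumption borrowed from the earlier discussion:

```json
{
  "description": "Experimental: split spans out of the combined traces data stream",
  "processors": [
    {
      "set": {
        "if": "ctx?.processor?.event == 'span'",
        "field": "data_stream.dataset",
        "value": "apm.span.{{service.name}}"
      }
    }
  ]
}
```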
Use a common "apm." prefix, and place the service name last.
jenkins run the tests please
jenkins run the tests please
jenkins run the tests please
Something very wrong is going on in the integration tests...
(I'm going to try reproducing this locally, so I can debug...)
The same error is occurring in #4445...
The integration tests have been failing for a while, and are unrelated to the server as far as I can see. I've captured my findings in elastic/apm-integration-testing#987.
* idxmgmt: add support for data streams

  Introduce `apm-server.data_streams.enabled` config, which will be used by idxmgmt to route events to data streams based on `data_stream.*` fields that are expected to be in each published event.

* _meta/fields.common.yml: add `data_stream.*` fields
* model: add `data_stream.{type,dataset}` fields

  When transforming model objects into beat.Events, set the `data_stream.{type,dataset}` fields. We add `data_stream.namespace` elsewhere, using an event processor.

* beater: handle `apm-server.data_streams.enabled`

  Handle the new `apm-server.data_streams.enabled` config:
  - when enabled, we add `data_stream.namespace` to all published events
  - when disabled, we remove `data_stream.*` fields from all published events

  For now we just set `data_stream.namespace` to "default". Later this will be based on the config received from Fleet.

* processor/stream/package_tests: update tests

  There is a hack in here to inject `data_stream.namespace` in all published events, since the tests do not use the standard libbeat pipeline code.

* Update approvals
* Add changelog entry
* datastreams: address review comments
* changelogs: clarify feature is experimental
* _meta: specify allowed values for `data_stream.type`
* model: update datasets

  Use a common "apm." prefix, and place the service name last.
Tested with BC1: sent event data for errors, spans, transactions and metrics; everything ended up in the expected data streams.
Motivation/summary

This PR introduces initial support for indexing events into data streams, with the new `apm-server.data_streams.enabled` config. When this is set, the server will:
- add `data_stream.{type,dataset,namespace}` fields to each published event
- route events to the corresponding data stream (`$type-$dataset-$namespace`)

Still TODO, in followups:
- enable `apm-server.data_streams.enabled` when in Fleet mode
- make `data_stream.namespace` configurable (using config received from Elastic Agent)
- ensure `output.elasticsearch.indices` cannot be set in conjunction with `apm-server.data_streams.enabled`

Checklist

I have considered changes for:
- [ ] documentation (later maybe; more likely this will be undocumented and set indirectly by using Fleet)
- [ ] logging (add log lines, choose appropriate log selector, etc.)
- [ ] metrics and monitoring (create issue for Kibana team to add metrics to visualizations, e.g. Kibana#44001)
- [ ] Elasticsearch Service (https://cloud.elastic.co)
- [ ] Elastic Cloud Enterprise (https://www.elastic.co/products/ece)
- [ ] Elastic Cloud on Kubernetes (https://www.elastic.co/elastic-cloud-kubernetes)

How to test these changes

Run apm-server with `-E apm-server.data_streams.enabled=true`; traces should be indexed into `traces-*`, errors into `logs-*`, and runtime metrics into `metrics-*`.

Related issues

Partially addresses #4378