[ML] Add structured tags to ML anomaly data points to make it possible to query for them #67180
Pinging @elastic/ml-ui (:ml)
It looks like the proposed change would need to go in the job config and so should be an Elasticsearch issue. @droberts195 would you agree?
Yes, certainly part of the request is on the Elasticsearch side. It's asking for extra fields in every result written by the anomaly detector. There is another side to this though, which is that once we complete the "ML in spaces" project it won't be desirable for Kibana apps to search the ML results index directly; instead they should go through APIs in the ML UI. In the example of searching results by tag, no job ID is specified. So that implies the ML UI would provide a space-aware results endpoint that could search for results by tag while taking into account which jobs are visible in the current space. So this functionality is non-trivial on both the Elasticsearch side and the Kibana side.
Maybe job groups could achieve what is required here. It's getting late in my day, but another day we should think through more carefully how job groups could be used instead of adding more functionality that does something quite similar. If the job groups feature doesn't work as it stands, then it may be better to meet this requirement by enhancing job groups rather than adding new overlapping functionality and then having someone in the future ask why we have both tags and job groups.
We discussed this on a Zoom call. It turns out there shouldn't be a need to aggregate different values of We already add tags for a job's "by" and "partition" fields. Therefore we agreed the requirement can be met by configuring It will then be possible to do terms aggregations or terms filtering on documents with
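As a rough illustration (not taken from the call notes, and assuming `service.name` ends up as the partition field, which is where the later comments in this thread land), a terms aggregation over the results index could then break anomaly records down per partition value:

```
# Count anomaly records per service, assuming service.name is the partition field
GET .ml-anomalies-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "record" } },
        { "term": { "partition_field_name": "service.name" } },
        { "range": { "record_score": { "gte": 75 } } }
      ]
    }
  },
  "aggs": {
    "per_service": {
      "terms": { "field": "partition_field_value" }
    }
  }
}
```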
@droberts195 Is there any difference between:
vs
Asking because `"dimensions": ["service.name", "transaction.type"]`
@sqren "By" and "partition" fields behave differently in how the results are aggregated up the results hierarchy. With the Based on what we know about your data, it makes more sense for the config
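To make the two options concrete, here is a minimal sketch of a detector that uses both kinds of split; the job ID, bucket span and field names are illustrative and are not the actual prototype config being discussed:

```
# Illustrative only: partition on service.name, split further by transaction.type
PUT _ml/anomaly_detectors/apm-tx-duration-example
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "high_mean",
        "field_name": "transaction.duration.us",
        "partition_field_name": "service.name",
        "by_field_name": "transaction.type"
      }
    ],
    "influencers": [ "service.name", "transaction.type" ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```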
Thanks for the background, @sophiec20.
So something like this would intuitively make more sense to me: `by_field_name: ["service.name", "transaction.type"]`
Is this opbeans data, or APM data in general? Just wondering if we are optimizing for sample data instead of real customer data.
The anomaly detection modelling is complex (see https://github.com/elastic/ml-cpp), so fundamental changes to the way jobs are configured and data is modelled are not trivial. It is not a visualisation of an aggregation, and there are significant backwards-compatibility implications for both the modelling and the Elasticsearch APIs. Some bulk APM data was made available to @blaklaybul last week and we are now working through the prototype job configurations discussed above. It is always preferred to optimise against real customer data, provided this usage of the data is permitted. We are working with the data provided to us. Once these prototype job configurations are ready we can walk through and explain the results against data examples and show how they can support the stated requirement regarding labelled results.
Okay, I just want to make sure we are on the same page. What we are interested in is very much the same behaviour we get today by starting individual jobs. To simplify the experience for users it would be beneficial if we could start a single job where anomalies are separated by a number of dimensions ( Do you see
The prototype @blaklaybul has made goes a long way by promoting We've briefly talked about adding
My mistake, I'm thinking of something else.
For 7.9, APM will use the existing
This However it is acknowledged that this solution is not ideal, so as part of the ongoing project to make ML jobs space-aware, work will start in 7.10 to store ML jobs as Kibana saved objects, which will allow us to store metadata, such as 'system tags', as part of the saved object. This has the advantages of:
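For context on the interim 7.9 approach mentioned above (the exact setting name referenced did not survive in this thread): Elasticsearch anomaly detection jobs already accept arbitrary per-job metadata in a `custom_settings` object, so labels can be attached to the job itself along these lines (job ID, bucket span and values are illustrative):

```
# Illustrative job with per-job metadata attached via custom_settings
PUT _ml/anomaly_detectors/opbeans-node-request-high_mean_response_time
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "high_mean", "field_name": "transaction.duration.us" }
    ]
  },
  "data_description": { "time_field": "@timestamp" },
  "custom_settings": {
    "service.name": "opbeans-node",
    "transaction.type": "request"
  }
}
```

Note that `custom_settings` lives on the job config rather than on the individual result documents, which is why consumers still need a job lookup step before querying results.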
This sounds great, @peteharverson!
Good question, @sqren. Yes, we are planning on adding a number of checks around the Spaces / Saved Objects on start-up when the user upgrades, and this should definitely include checking for the
^^ My ER from 3 years ago now has a chance! :)
@sqren With ML jobs being made space-aware from 7.11, we are now creating saved objects to act as wrappers around the Elasticsearch job object. I wondered if the Tags functionality for Kibana saved objects might be a way to meet your requirements, but on first look I don't think it will be sufficient for your use case here, as it only allows a name to be attached to a saved object e.g. If the saved object tags don't look like a solution here, we can investigate adding a
with this replacing your current use of the I would appreciate your thoughts, @sqren, on whether you think the Kibana saved object tags would be suitable for your use case, or if you think adding a new
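Purely to illustrate the gap being discussed (none of these attribute names are confirmed, and the real saved object schema may differ): a Kibana saved object tag only attaches a label such as `apm` to an object, whereas the APM use case needs key/value pairs, which a dedicated field on the ML job wrapper saved object could hold, for example:

```
{
  "type": "ml-job",
  "id": "anomaly-detector-opbeans-java-request",
  "attributes": {
    "job_id": "opbeans-java-request-high_mean_response_time",
    "tags": {
      "service.name": "opbeans-java",
      "transaction.type": "request",
      "environment": "production"
    }
  }
}
```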
@peteharverson I can't answer for @sqren, but IMHO it would be ideal if we had tags on anomalies, not just jobs. I'm looking into an issue where we are seeing the ML calls slow a request down to 7s (from about 1.5s without them). One reason for this is that we do a capabilities call, then one to get the job ids, then another one to get the anomaly data for those jobs. Ideally we could just do one request: get all the anomaly data with tag "service.name:opbeans-java".
The problem is that the ML jobs are now space-aware, so every call needs to be checked against the space(s) that the job is in.
Is there a breakdown of how much of that 7s goes on each call? Maybe there is an inefficiency that can be addressed in a different way than adding tags. I think the first step in deciding what to do is to break that 7s down between the 3 ML API calls, and then again between the underlying APIs that those 3 ML APIs are calling, and look for opportunities for efficiencies. Although it would be possible to add tags to the ML jobs that got copied into every single ML result, it would be a large piece of work because it would affect all of the different ML result classes in the Java code, and, with the "ML in Spaces" project, it's important to realise that this wouldn't allow results to be retrieved simply by searching the ML results index, because all ML APIs now need to be checked against the job's space membership. So we should start by making sure we understand where exactly the time is going today.
Why does it need to be checked? AFAICT, spaces are for organising things rather than securing things. Is that incorrect?
I don't have that breakdown yet. But I'll send you a link on Slack to a screenshot (I haven't scrubbed out potentially sensitive information).
Currently it is only possible to query for anomaly data points by `job_id`. The problem with the `job_id` is that it's not easy to query for specific attributes, and mostly we have to parse the `job_id` on the client to determine what service or transaction type the data point represents.

Example
A job id might be `opbeans-node-request-high_mean_response_time`. We can make a helper function that extracts the service name (opbeans-node) and transaction type (request). But a job could span all transaction types and would therefore not include the transaction type: `opbeans-node-high_mean_response_time`. Additionally, we are soon going to add support for jobs per environment: `opbeans-node-production-high_mean_response_time` (where "production" is the env). This makes the `job_id` fragile.

Instead I propose that ML data points should contain user-defined tags. This is how I'd like to be able to query for anomaly data:
Get anomaly data:
GET .ml-anomalies-*/_search
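(The request body from the original issue did not survive extraction; the following is a sketch of the kind of tag-filtered search being proposed, where the `tags` field on the anomaly documents is the proposed, not yet existing, addition.)

```
# Hypothetical: "tags" does not exist on anomaly documents today
GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "record" } },
        { "term": { "tags.service.name": "opbeans-node" } },
        { "term": { "tags.transaction.type": "request" } }
      ]
    }
  }
}
```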
Create ML job
This is how I propose the API for creating an ML job should look:
POST /api/ml/modules/setup/apm_transaction
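(Again, the original request body was lost; below is a sketch of how such a setup call might carry user-defined tags. The `tags` object is hypothetical, while the other parameters are meant to follow the existing ML modules setup API.)

```
# Hypothetical: "tags" is the proposed addition
POST /api/ml/modules/setup/apm_transaction
{
  "prefix": "opbeans-node-",
  "indexPatternName": "apm-*-transaction-*",
  "tags": {
    "service.name": "opbeans-node",
    "transaction.type": "request",
    "environment": "production"
  }
}
```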