
[Logs UI] Add ML job results APIs #42356

Merged


weltenwort
Member

@weltenwort weltenwort commented Jul 31, 2019

Summary

This PR adds a route that can be used to fetch the log entry rate anomaly job results when a corresponding job has been set up.

closes #42057

New Routes

POST /api/infra/log_analysis/results/log_entry_rate

This route grants access to the log rate anomaly detection results within a given time interval.

interface GetLogEntryRateRequest {
  data: {
    // length of a bucket in milliseconds, which should be a multiple of the ml job's bucket span
    bucketDuration: number;
    // the id of the source this job belongs to
    sourceId: string;
    timeRange: {
      // start of the requested time interval as an epoch timestamp in milliseconds
      startTime: number;
      // end of the requested time interval as an epoch timestamp in milliseconds
      endTime: number;
    };
  };
}
interface GetLogEntryRateSuccessResponse {
  data: {
    // length of a bucket in milliseconds
    bucketDuration: number;
    // a sequence of non-overlapping time buckets in ascending order
    histogramBuckets: Array<{
      // a set of anomalies found within this bucket
      anomalies: Array<{
        // the number of log entries actually found
        actualLogEntryRate: number;
        // a relative measure of the anomalousness
        anomalyScore: number;
        // duration of the anomaly in milliseconds
        duration: number;
        // start of the anomaly as an epoch timestamp in milliseconds
        startTime: number;
        // the number of log entries typically found according to the model
        typicalLogEntryRate: number;
      }>;
      // length of the bucket in milliseconds
      duration: number;
      // the statistical characteristics of the log entry rate in this bucket
      logEntryRateStats: {
        avg: number | null; // null if count === 0
        count: number;
        max: number | null; // null if count === 0
        min: number | null; // null if count === 0
        sum: number;
      };
      // the statistical characteristics of the model's lower bound for the log entry rate in this bucket
      modelLowerBoundStats: {
        avg: number | null; // null if count === 0
        count: number;
        max: number | null; // null if count === 0
        min: number | null; // null if count === 0
        sum: number;
      };
      // the statistical characteristics of the model's upper bound for the log entry rate in this bucket
      modelUpperBoundStats: {
        avg: number | null; // null if count === 0
        count: number;
        max: number | null; // null if count === 0
        min: number | null; // null if count === 0
        sum: number;
      };
      // start of the bucket as an epoch timestamp in milliseconds
      startTime: number;
    }>;
  };
}
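As an illustration of how a client might consume this response, the following sketch collects the buckets that contain anomalies above a score threshold. `summarizeAnomalousBuckets` is a hypothetical helper, not part of this PR, and it re-declares only the response fields it actually reads.

```typescript
// Local copies of the response shapes above, reduced to the fields used here.
interface LogEntryRateAnomaly {
  actualLogEntryRate: number;
  anomalyScore: number;
  duration: number;
  startTime: number;
  typicalLogEntryRate: number;
}

interface LogEntryRateHistogramBucket {
  anomalies: LogEntryRateAnomaly[];
  duration: number;
  startTime: number;
}

interface LogEntryRateResults {
  bucketDuration: number;
  histogramBuckets: LogEntryRateHistogramBucket[];
}

interface AnomalousBucketSummary {
  startTime: number;
  endTime: number;
  maxAnomalyScore: number;
}

// Hypothetical helper: returns every bucket that has at least one anomaly
// scoring >= minScore, together with the highest qualifying score.
function summarizeAnomalousBuckets(
  results: LogEntryRateResults,
  minScore: number
): AnomalousBucketSummary[] {
  const summaries: AnomalousBucketSummary[] = [];
  for (const bucket of results.histogramBuckets) {
    const scores = bucket.anomalies
      .map(anomaly => anomaly.anomalyScore)
      .filter(score => score >= minScore);
    if (scores.length > 0) {
      summaries.push({
        startTime: bucket.startTime,
        // buckets are non-overlapping, so the end is start + duration
        endTime: bucket.startTime + bucket.duration,
        maxAnomalyScore: Math.max(...scores),
      });
    }
  }
  return summaries;
}
```

Because the buckets arrive in ascending order, the summaries keep that order, which makes them easy to render as highlighted ranges on a timeline.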

Failure conditions:

  • no log entry rate job configured for this source: Not Found
  • insufficient permissions: Forbidden
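A minimal sketch of preparing a request and interpreting the response status. It assumes that the two failure conditions above surface as HTTP 404 and 403 responses and that the default space is served without the /s/<spaceId> prefix; `prepareLogEntryRateRequest` and `classifyLogEntryRateStatus` are hypothetical names. The kbn-xsrf header is included because Kibana rejects non-GET API requests that lack it.

```typescript
interface GetLogEntryRateRequestBody {
  data: {
    bucketDuration: number;
    sourceId: string;
    timeRange: { startTime: number; endTime: number };
  };
}

interface PreparedRequest {
  path: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

const LOG_ENTRY_RATE_PATH = "/api/infra/log_analysis/results/log_entry_rate";

// Assumption: non-default spaces are addressed via /s/<spaceId>, the
// default space via the unprefixed path.
function prepareLogEntryRateRequest(
  spaceId: string,
  body: GetLogEntryRateRequestBody
): PreparedRequest {
  const spacePrefix = spaceId === "default" ? "" : `/s/${encodeURIComponent(spaceId)}`;
  return {
    path: `${spacePrefix}${LOG_ENTRY_RATE_PATH}`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        // required by Kibana's XSRF protection for non-GET requests
        "kbn-xsrf": "true",
      },
      body: JSON.stringify(body),
    },
  };
}

type LogEntryRateOutcome = "ok" | "job-not-found" | "forbidden" | "unexpected-error";

// Maps an HTTP status code onto the failure conditions listed above.
function classifyLogEntryRateStatus(status: number): LogEntryRateOutcome {
  if (status >= 200 && status < 300) {
    return "ok";
  }
  if (status === 404) {
    return "job-not-found";
  }
  if (status === 403) {
    return "forbidden";
  }
  return "unexpected-error";
}
```

In a browser, the prepared values could be passed directly as `fetch(path, init)`, and the resulting `response.status` fed to `classifyLogEntryRateStatus`.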

Implementation Notes

  • The io-ts runtime types used to validate and type the request and response payloads on both server- and client-side are located in common/http_api/log_analysis/results.
  • The route assumes that the job id is derived from the space id and the source id. A single source of truth for the job ids has therefore been implemented in common/log_analysis/job_parameters.ts.
  • The result histogram data include information about the model bounds as well as the actual values. The route therefore assumes that the underlying job is configured with model plot enabled (see the testing hints below).
  • In the discussion with the ML team, I learned that 15 minutes would be a reasonable bucket span for the underlying job. Specifying bucket durations smaller than the job's bucket span in the route parameters will yield empty buckets and should therefore be prevented by the requesting UI. Additionally, the requested bucket duration should ideally be a multiple of the job's bucket span so that the bucket boundaries can be aligned.
  • The PR includes an example container in public/containers/logs/log_analysis that can be built upon in later PRs.
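The job id convention and the bucket duration constraints described in these notes could be expressed roughly as follows. This is a hypothetical sketch, not the contents of common/log_analysis/job_parameters.ts; the helper names are made up, and the 15 minute bucket span is only assumed as a default.

```typescript
// Hypothetical helper following the id pattern from the testing hints:
// kibana-logs-ui-${spaceId}-${sourceId}-${jobType}
function getLogAnalysisJobId(spaceId: string, sourceId: string, jobType: string): string {
  return `kibana-logs-ui-${spaceId}-${sourceId}-${jobType}`;
}

// Assumed job bucket span (15 minutes, in milliseconds).
const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;

// True if the requested duration spans at least one job bucket and is an
// exact multiple of the bucket span, so histogram buckets line up with the
// job's result buckets.
function isAlignedBucketDuration(
  bucketDuration: number,
  jobBucketSpan: number = FIFTEEN_MINUTES_MS
): boolean {
  return bucketDuration >= jobBucketSpan && bucketDuration % jobBucketSpan === 0;
}

// Rounds a requested duration up to the next multiple of the bucket span
// (never below one span), which a UI could do before calling the route.
function alignBucketDuration(
  bucketDuration: number,
  jobBucketSpan: number = FIFTEEN_MINUTES_MS
): number {
  return Math.max(1, Math.ceil(bucketDuration / jobBucketSpan)) * jobBucketSpan;
}
```

Rounding up rather than down avoids ever requesting a duration smaller than the bucket span, which would produce the empty buckets mentioned above.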

Testing Hints

  • As mentioned in the implementation notes, the route assumes a specific job id for the log rate anomaly results, which is kibana-logs-ui-${spaceId}-${sourceId}-${jobType}.
  • In order to receive results from the route, a job with that id must therefore be created. The job and datafeed configurations could look something like the following:
    {
      "job_id": "kibana-logs-ui-testspace-default-log-entry-rate",
      "analysis_config": {
        "bucket_span": "15m",
        "summary_count_field_name": "doc_count",
        "detectors": [
          {
            "detector_description": "count",
            "function": "count",
            "detector_index": 0
          }
        ],
        "influencers": []
      },
      "data_description": {
        "time_field": "@timestamp",
        "time_format": "epoch_ms"
      },
      "model_plot_config": {
        "enabled": true
      }
    }
    {
      "datafeed_id": "datafeed-kibana-logs-ui-testspace-default-log-entry-rate",
      "job_id": "kibana-logs-ui-testspace-default-log-entry-rate",
      "indexes": ["filebeat-*"],
      "aggregations": {
        "buckets": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "900000ms"
          },
          "aggregations": {
            "@timestamp": {
              "max": {
                "field": "@timestamp"
              }
            }
          }
        }
      }
    }
    This assumes that log data with a @timestamp field are present in the cluster.
  • The URL to query the route (via POST) would then be /s/testspace/api/infra/log_analysis/results/log_entry_rate.
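To create and run the job and datafeed above, the configurations could be sent to the Elasticsearch ML APIs in this order: create the job, create the datafeed, open the job, start the datafeed. The sketch below only assembles those requests; `buildJobSetupRequests` is a hypothetical name. The ids are moved into the URL paths, and the datafeed body keeps its job_id because the datafeed API needs it. Removing job_id from the job body is a precaution of this sketch, not a documented requirement.

```typescript
interface EsRequest {
  method: "PUT" | "POST";
  path: string;
  body?: object;
}

// Builds the ML API calls for the job and datafeed configs shown above.
function buildJobSetupRequests(
  jobConfig: { job_id: string } & Record<string, unknown>,
  datafeedConfig: { datafeed_id: string; job_id: string } & Record<string, unknown>
): EsRequest[] {
  // The job id goes into the path; the rest of the config is the body.
  const { job_id: jobId, ...jobBody } = jobConfig;
  // The datafeed id goes into the path; job_id stays in the body.
  const { datafeed_id: datafeedId, ...datafeedBody } = datafeedConfig;
  return [
    { method: "PUT", path: `/_ml/anomaly_detectors/${jobId}`, body: jobBody },
    { method: "PUT", path: `/_ml/datafeeds/${datafeedId}`, body: datafeedBody },
    { method: "POST", path: `/_ml/anomaly_detectors/${jobId}/_open` },
    { method: "POST", path: `/_ml/datafeeds/${datafeedId}/_start` },
  ];
}
```

Each entry corresponds to a request that could be issued from the Kibana Dev Tools console or through an Elasticsearch client; the job has to be open before its datafeed can be started.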


@weltenwort weltenwort added the v8.0.0, Feature:Logs UI, Team:Infra Monitoring UI - DEPRECATED and v7.4.0 labels Jul 31, 2019
@weltenwort weltenwort self-assigned this Jul 31, 2019
@elasticmachine
Contributor

Pinging @elastic/infra-logs-ui

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💚 Build Succeeded

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💚 Build Succeeded

@weltenwort weltenwort marked this pull request as ready for review August 6, 2019 14:54
@weltenwort weltenwort requested a review from a team as a code owner August 6, 2019 14:54
@weltenwort weltenwort added the release_note:skip label Aug 6, 2019
@Kerry350 Kerry350 self-requested a review August 6, 2019 16:03
@elasticmachine
Contributor

💔 Build Failed

@weltenwort
Member Author

that's a not-so-nice way to learn that we have firefox smoke tests... jenkins, test this again

@elasticmachine
Contributor

💚 Build Succeeded

@weltenwort
Member Author

and flaky smoke tests at that... sorry for the noise

@Kerry350
Contributor

Kerry350 commented Aug 8, 2019

Functionally this works great after playing around with the API via curl 👍

One thing I did want to do - and isn't explicitly linked to this API as it's not responsible for the job setup - is try to set up the job and datafeed via the appropriate APIs using the configuration in the "Testing Hints".

I hit a snag doing that against my locally running cluster - I got the following error: cannot retrieve field [doc_count] because it has no mappings, which had been specified for summary_count_field_name. My Filebeat index templates had been loaded / set up. That said, when I queried GET /filebeat-*/_mapping, there was nothing for doc_count.

Out of interest - is there something I'm missing there? I know it's not directly related to this work, but I want to make sure I'm fully understanding the job setup portion that pairs with these results.

This works great against the shared cluster using the pre-existing kibana-logs-ui-default-default-log-entry-rate job though 👌

Will go through the code now.

@weltenwort
Member Author

Sorry, that was my mistake. 🙈 The datafeed definition is incomplete because it's missing the histogram aggregation that produces buckets with doc_count values:

{
  "datafeed_id": "datafeed-kibana-logs-ui-testspace-default-log-entry-rate",
  "job_id": "kibana-logs-ui-testspace-default-log-entry-rate",
  "indexes": ["filebeat-*"],
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "900000ms"
      },
      "aggregations": {
        "@timestamp": {
          "max": {
            "field": "@timestamp"
          }
        }
      }
    }
  }
}
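The error above comes from summary_count_field_name being set to doc_count while the datafeed had no aggregation producing doc_count values, so Elasticsearch looked for a mapped doc_count field in the source indices. A hypothetical pre-flight check (`findDatafeedProblems`, not part of Kibana or Elasticsearch) could catch this before the datafeed is created. It also checks that the date_histogram interval divides the bucket span, a rule assumed here from the Elasticsearch datafeed aggregation requirements. The duration parser is deliberately limited to the `m` and `ms` units used in these configs.

```typescript
interface JobLike {
  analysis_config: { bucket_span: string; summary_count_field_name?: string };
}

interface DatafeedLike {
  aggregations?: {
    [name: string]: {
      date_histogram?: { fixed_interval?: string; [key: string]: unknown };
      [key: string]: unknown;
    };
  };
}

// Only understands "<n>m" and "<n>ms", e.g. "15m" or "900000ms".
function parseDurationMs(value: string): number {
  const match = /^(\d+)(ms|m)$/.exec(value);
  if (!match) {
    throw new Error(`unsupported duration: ${value}`);
  }
  const amount = Number(match[1]);
  return match[2] === "m" ? amount * 60000 : amount;
}

// Returns human-readable problems; an empty array means no issue was found.
function findDatafeedProblems(job: JobLike, datafeed: DatafeedLike): string[] {
  const problems: string[] = [];
  const aggregations = datafeed.aggregations;
  if (job.analysis_config.summary_count_field_name === "doc_count" && !aggregations) {
    problems.push("summary_count_field_name is doc_count but the datafeed defines no aggregations");
  }
  if (aggregations) {
    const bucketSpanMs = parseDurationMs(job.analysis_config.bucket_span);
    for (const name of Object.keys(aggregations)) {
      const aggregation = aggregations[name];
      const histogram = aggregation ? aggregation.date_histogram : undefined;
      const interval = histogram ? histogram.fixed_interval : undefined;
      if (interval !== undefined && bucketSpanMs % parseDurationMs(interval) !== 0) {
        problems.push(`fixed_interval ${interval} of aggregation ${name} does not divide bucket_span ${job.analysis_config.bucket_span}`);
      }
    }
  }
  return problems;
}
```

With the corrected datafeed above, 900000ms equals the 15m bucket span, so the check reports no problems.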

@Kerry350
Contributor

Kerry350 commented Aug 8, 2019

Sorry, that was my mistake. 🙈 The datafeed definition is incomplete because it's missing the histogram aggregation that produces buckets with doc_count values

Ah, nice! That's much better than what I thought, which is that I'd fundamentally misunderstood something, especially re: how doc_counts work 😄

Contributor

@Kerry350 Kerry350 left a comment


Nice work! 🎉

As we're still evaluating io-ts and our general "simple HTTP API" approach, I'll add that the code here was easy for me to follow (the types, encoding, decoding etc. all made sense).

@weltenwort weltenwort merged commit b306d76 into elastic:master Aug 8, 2019
@weltenwort weltenwort deleted the logs-ui-ml-integration-job-results-api branch August 8, 2019 12:38
weltenwort added a commit to weltenwort/kibana that referenced this pull request Aug 8, 2019
This PR adds a route that can be used to fetch the log entry rate anomaly job results when a corresponding job has been set up.
weltenwort added a commit that referenced this pull request Aug 8, 2019
Backports the following commits to 7.x:
 - [Logs UI] Add ML job results APIs (#42356)
Labels
Feature:Logs UI, release_note:skip, review, Team:Infra Monitoring UI - DEPRECATED, v7.4.0, v8.0.0
Development

Successfully merging this pull request may close these issues.

[Logs UI] Create API route to access log rate analysis results
3 participants