Skip to content

Latest commit

 

History

History
241 lines (195 loc) · 9.31 KB

cluster-logging-es-rollover-data-design.md

File metadata and controls

241 lines (195 loc) · 9.31 KB
title authors reviewers approvers creation-date last-updated status
cluster-logging-es-rollover-data-design
@jcantril
@bparees
@ewolinetz
@richm
@portante
@pavolloffay
@sichvoge
@ewolinetz
@richm
2019-11-04
2019-11-04
implementable

Cluster Logging Elasticsearch Rollover Data Design

Release Signoff Checklist

  • Enhancement is implementable
  • Design details are appropriately documented from clear requirements
  • Test plan is defined
  • Graduation criteria for dev preview, tech preview, GA
  • User-facing documentation is created in [openshift/docs]

Summary

This proposal alters the data design for storing logs in Elasticsearch to co-locate logs to fewer indices. It additionally leverages the Elasticsearch rollover api to help maintain the number of indices and shards in order to align with Elastic's performance and scaling recommendations.

Motivation

The initial data design for Cluster Logging segments logs by OpenShift namespace in order to facilitate multi-tenant support and data curation. This choice was made because index level security was the only feature available from an open source library. It additionally facilitated curation as full indicies could be removed when the retention period expired. This means, however, at any one time, there are at least "$noOfNamespaces * $daysRetained" number of indices maintained by the Elasticsearch server. Each index can additionally be sharded to spread the load across the Elasticsearch nodes. The end result of this shard explosion is the Elasticsearch cluster performance is not optimized and is not capable of efficiently processing and storing logs.

Each index adds load and overhead (e.g. mapping, metadata) to the Elasticsearch cluster that needs to be tracked beyond the actual data. Elasticsearch has recommendations for maximum shard size and the number of shards per node per allocated gigabyte of heap. Cluster logging typically exceeds these recommendations for any OpenShift clusters that has significant log traffic.

Goals

The goals of this proposals are:

  • Utilize a data design that aligns data schema with Elasticsearch's recommendations.
  • Expose data management policy as API in the cluster-logging-operator and elasticsearch-operator in support of Cluster Logging's mission to gather log sources
  • Migrate indicies from the previous schema into the new. Migrated inidices will be governed by the data management policy exposed by this proposal

Non-Goals

This change will not:

  • Provide a general data management policy API that fully exposes Elasticsearch's rollover API

Proposal

This proposal introduces two specific changes to achieve its goals:

Logs of a given type (e.g. app container, infra) are separated by index. The Cluster Logging collector writes logs to a well-known alias established by the cluster-logging-operator. The ClusterLogging CR instance specifies the management policy for each log type index. This controls:

  • the maximum age (e.g. 7 days).

This policy is passed to the Elasticsearch CR for the elasticsearch-operator to manage the rollover policy. The details of the policy are specified by the cluster-logging-operator and are based on the guidelines suggested by Elasticsearch.

Implementation Details

Assumptions

  • ClusterLogging will expose the minimal needed set of the Elasticsearch CR rollover policy management API in order to achieve the previously described goals
  • ClusterLogging will manage rollover as Elasticsearch index management is either restricted by Elastic licensing or not available in the opensource version of OpenDistro (OpenDistro Index Management requires Elasticsearch 7.x and higher)
  • Security will be addressed by using the OpenDistro security plugin and document level security (DLS). Details TBD.

Data Model

Logs of a given type are co-located to the following indices:

Log Type Read Alias Write Alias Initial Index
Infra (logs-infra) infra,logs-infra infra-write infra-00001
Application Container (logs-app) app,logs-app app-write app-000001
Audit (logs-audit) audit,logs-audit audit-write audit-000001

Note: Log types are further defined in LogForwarding.

ClusterLogging API

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
spec:
  logStore:
    retentionPolicy:
      application:
        maxAge: 7d
      infra
        maxAge: 7d

Cluster Logging Operator

The cluster-logging-operator will utilize the ClusterLogging CR retention policy to spec the desired aforementioned Elasticsearch CR indexManagement.

Elasticsearch API

apiVersion: "logging.openshift.io/v1"
kind: "Elasticsearch"
metadata:
  name: "elasticsearch"
spec:
  indexManagement:
    policies:
    - name: infra-policy
      pollInterval: 5m
      phases:
        hot:
          actions:
            rollover:
              maxAge:   3d
        delete:
          minAge: 7d
    mappings:
    - name:  infra               #creates infra-00001 aliased infra-write
      policyRef: infra-policy         #policy applies to index patterns infra*
      aliases:
      - infra

Elasticsearch Operator

The elasticsearch-operator will be modified to:

  • Expose the index management API
  • Create and seed index templates to support the policy
  • Create the initial indices as needed
  • Block data ingestion if needed until initial index is seeded
  • Deploy curation CronJob for each mapping to rollover using the defined policy

Collector

Example configuration:

  ....
   <elasticsearch_index_name>
     tag "**"
     name_type static
     static_index_name 'app-write'
   </elasticsearch_index_name>

Deprecations

  • The Curator Cronjob deployed by the cluster-logging-operator will be deprecated and eventually removed. The responsibilities for curation will be subsumed by implementation of Elasticsearch rollover management.
  • Curation configuration by namespace is no longer configurable and is restricted to cluster wide settings associated with log type

Design Details

Test Plan

  • Regression tests will be executed to confirm no regressions from previous releases

Unit Tests

  • Unit tests will be modified to account for the change in data design

Integration Tests

  • e2e tests will be modified to account for the change in data design

Upgrade / Downgrade Strategy

Upgrade

The elasticsearch-operator will migrate existing log indices to work with the new data design by:

  • Indices beginning with project.* are aliased to: app
  • Indices beginning with .operations.* are aliased to infra
  • Migrated indices are deleted after migration

Note: The cluster-logging-operator will leave the deployed curation CronJob to manage indices from the older data schema. These indices will be curated as previously and, eventually, removed from the cluster. The curation CronJob will be removed in fugure releases.

Downgrade

Downgrades should be discouraged unless we know for certain the Elasticsearch version managed by cluster logging is the same version. There is risk that Elasticsearch may have migrated data that is unreadable by an older version.

Implementation History

release Description
4.4 GA release of rollover data design

Drawbacks

The drawback to not implementing this change is that Cluster Logging will:

  • Continue to experience performance and scaling issues directly related to a less then optimal data schema

Alternatives

There are no current alternatives

References

Commands

Create Template

curl http://localhost:9200/_template/app_logs?pretty -HContent-Type:application/json -XPUT -d '{
  "index_patterns": ["app*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "aliases": {
    "app": {}
   }
}'

Initialize Index

curl http://localhost:9200/app.container-000001?pretty -HContent-Type:application/json -XPUT -d '{
  "aliases": {
    "app-write": {"is_write_index": true},
    "logs-app": {}
   }
}'

Insert Data

curl http://localhost:9200/app-write/_doc/0?pretty \
  -HContent-Type:application/json -XPOST -d '{"value":"1"}'

Retrieve Data

curl http://localhost:9200/logs-app/_search?pretty \
  -HContent-Type:application/json

Rollover Index

curl http://localhost:9200/app-write/_rollover?pretty \
  -HContent-Type:application/json -XPOST -d '{"conditions": {"max_docs": 1}}'