title | authors | reviewers | approvers | creation-date | last-updated | status |
---|---|---|---|---|---|---|
cluster-logging-es-rollover-data-design | | | | 2019-11-04 | 2019-11-04 | implementable |
- Enhancement is `implementable`
- Design details are appropriately documented from clear requirements
- Test plan is defined
- Graduation criteria for dev preview, tech preview, GA
- User-facing documentation is created in [openshift/docs]
This proposal alters the data design for storing logs in Elasticsearch to co-locate logs in fewer indices. It additionally leverages the Elasticsearch rollover API to keep the number of indices and shards aligned with Elastic's performance and scaling recommendations.
The initial data design for Cluster Logging segments logs by OpenShift namespace in order to facilitate multi-tenant support and data curation. This choice was made because index-level security was the only feature available from an open source library. It additionally facilitated curation, as full indices could be removed when the retention period expired. It means, however, that at any one time there are at least `$noOfNamespaces * $daysRetained` indices maintained by the Elasticsearch server; for example, 100 namespaces retained for 7 days yields at least 700 indices before sharding. Each index can additionally be sharded to spread the load across the Elasticsearch nodes. The end result of this shard explosion is that Elasticsearch cluster performance is not optimized and the cluster is not capable of efficiently processing and storing logs.
Each index adds load and overhead (e.g. mapping, metadata) to the Elasticsearch cluster that must be tracked beyond the actual data. Elasticsearch has recommendations for maximum shard size and for the number of shards per node per allocated gigabyte of heap. Cluster logging typically exceeds these recommendations on any OpenShift cluster that has significant log traffic.
The goals of this proposal are:
- Utilize a data design that aligns data schema with Elasticsearch's recommendations.
- Expose data management policy as API in the `cluster-logging-operator` and `elasticsearch-operator` in support of Cluster Logging's mission to gather log sources
- Migrate indices from the previous schema into the new one. Migrated indices will be governed by the data management policy exposed by this proposal
This change will not:
- Provide a general data management policy API that fully exposes Elasticsearch's rollover API
This proposal introduces two specific changes to achieve its goals:
- Co-located data in a few opinionated indices
- Index management using rollover index API
Logs of a given type (e.g. app container, infra) are separated by index. The Cluster Logging collector writes logs to a well-known alias established by the `cluster-logging-operator`. The ClusterLogging CR instance specifies the management policy for each log type index. This controls:

- the maximum age (e.g. 7 days)

This policy is passed to the Elasticsearch CR for the `elasticsearch-operator` to manage the rollover policy. The details of the policy are specified by the `cluster-logging-operator` and are based on the guidelines suggested by Elasticsearch.
- ClusterLogging will expose the minimal set of the Elasticsearch CR rollover policy management API needed to achieve the previously described goals
- ClusterLogging will manage rollover because Elasticsearch index management is either restricted by Elastic licensing or not available in the open source version of OpenDistro (OpenDistro Index Management requires Elasticsearch 7.x or higher)
- Security will be addressed by using the OpenDistro security plugin and document level security (DLS). Details TBD.
Logs of a given type are co-located to the following indices:
Log Type | Read Alias | Write Alias | Initial Index |
---|---|---|---|
Infra (`logs-infra`) | infra, logs-infra | infra-write | infra-000001 |
Application Container (`logs-app`) | app, logs-app | app-write | app-000001 |
Audit (`logs-audit`) | audit, logs-audit | audit-write | audit-000001 |
Note: Log types are further defined in LogForwarding.
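For illustration, one way to verify this alias wiring after the initial indices are seeded is Elasticsearch's `_cat/aliases` API; this is a sketch only, and the endpoint below is a placeholder for the cluster's Elasticsearch service:

```sh
# List the aliases attached to the app indices; the write alias should mark
# only the newest generation with is_write_index=true, while the read aliases
# span every generation.
curl "http://localhost:9200/_cat/aliases/app,logs-app,app-write?v"
```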
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
name: "instance"
spec:
logStore:
retentionPolicy:
application:
maxAge: 7d
infra
maxAge: 7d
The `cluster-logging-operator` will use the ClusterLogging CR retention policy to populate the desired `indexManagement` spec of the Elasticsearch CR:
apiVersion: "logging.openshift.io/v1"
kind: "Elasticsearch"
metadata:
name: "elasticsearch"
spec:
indexManagement:
policies:
- name: infra-policy
pollInterval: 5m
phases:
hot:
actions:
rollover:
maxAge: 3d
delete:
minAge: 7d
mappings:
- name: infra #creates infra-00001 aliased infra-write
policyRef: infra-policy #policy applies to index patterns infra*
aliases:
- infra
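A rough sketch of what the seeded template and initial index for the `infra` mapping could look like, expressed here as direct Elasticsearch calls purely for illustration (the template name and endpoint are placeholders, not part of this proposal):

```sh
# Template: every index matching infra* picks up the infra read alias
curl -XPUT "http://localhost:9200/_template/infra_logs?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"index_patterns": ["infra*"], "aliases": {"infra": {}}}'

# Initial index: infra-000001 is the sole write target behind infra-write
curl -XPUT "http://localhost:9200/infra-000001?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"aliases": {"infra-write": {"is_write_index": true}}}'
```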
The `elasticsearch-operator` will be modified to:

- Expose the index management API
- Create and seed index templates to support the policy
- Create the initial indices as needed
- Block data ingestion, if needed, until the initial index is seeded
- Deploy a curation CronJob for each mapping to roll over using the defined policy (a sketch of such a job follows the collector configuration example below)
- Update the `fluent-plugin-viaq-data-model` to allow defining a static index to write logs
Example configuration:

```
<elasticsearch_index_name>
  tag "**"
  name_type static
  static_index_name 'app-write'
</elasticsearch_index_name>
```
- The Curator CronJob deployed by the `cluster-logging-operator` will be deprecated and eventually removed. The responsibilities for curation will be subsumed by the implementation of Elasticsearch rollover management.
- Curation is no longer configurable per namespace and is restricted to cluster-wide settings associated with log type
- Regression tests will be executed to confirm no regressions from previous releases
- Unit tests will be modified to account for the change in data design
- e2e tests will be modified to account for the change in data design
The `elasticsearch-operator` will migrate existing log indices to work with the new data design by:

- Indices beginning with `project.*` are aliased to `app`
- Indices beginning with `.operations.*` are aliased to `infra`
- Migrated indices are deleted after migration
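As an illustration of the aliasing steps above, and assuming the migration uses the standard Elasticsearch `_aliases` API (the operator's actual implementation may differ), the legacy indices could be attached to the new read aliases in a single request:

```sh
# Attach the legacy indices to the new read aliases so queries against
# `app` and `infra` also return pre-migration documents.
curl -XPOST "http://localhost:9200/_aliases?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
        "actions": [
          {"add": {"index": "project.*", "alias": "app"}},
          {"add": {"index": ".operations.*", "alias": "infra"}}
        ]
      }'
```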
Note: The `cluster-logging-operator` will leave the deployed curation CronJob in place to manage indices from the older data schema. These indices will be curated as before and, eventually, removed from the cluster. The curation CronJob will be removed in future releases.
Downgrades should be discouraged unless it is known for certain that the Elasticsearch version managed by cluster logging is the same version. There is a risk that Elasticsearch may have migrated data that is unreadable by an older version.
Release | Description |
---|---|
4.4 | GA release of rollover data design |
The drawback to not implementing this change is that Cluster Logging will:
- Continue to experience performance and scaling issues directly related to a less than optimal data schema
There are currently no alternatives.
```sh
# Create an index template that applies the app read alias and shard settings
# to any index matching app*
curl http://localhost:9200/_template/app_logs?pretty -HContent-Type:application/json -XPUT -d '{
  "index_patterns": ["app*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "aliases": {
    "app": {}
  }
}'

# Create the initial index with the write alias and the logs-app read alias
curl http://localhost:9200/app.container-000001?pretty -HContent-Type:application/json -XPUT -d '{
  "aliases": {
    "app-write": {"is_write_index": true},
    "logs-app": {}
  }
}'

# Write a document through the write alias
curl http://localhost:9200/app-write/_doc/0?pretty \
  -HContent-Type:application/json -XPOST -d '{"value":"1"}'

# Read documents back through the read alias
curl http://localhost:9200/logs-app/_search?pretty \
  -HContent-Type:application/json

# Roll the write alias over once it holds at least one document
curl http://localhost:9200/app-write/_rollover?pretty \
  -HContent-Type:application/json -XPOST -d '{"conditions": {"max_docs": 1}}'
```