elasticsearch/log exporter dropping data + context deadline exceeded from aks to elastic cloud es #34564

Closed
iamp3 opened this issue Aug 9, 2024 · 5 comments
Labels: bug (Something isn't working) · exporter/elasticsearch · needs triage (New item requiring triage)

Comments

@iamp3

iamp3 commented Aug 9, 2024

Component(s)

exporter/elasticsearch

What happened?

Description

My environment is hosted on an AKS 1.28.9 cluster, and Elasticsearch 8.13.4 runs on Elastic Cloud. I'm using the elasticsearch exporter to send logs to Elastic Cloud, and I'm getting errors about dropping data and context deadline exceeded. I don't see any other issue.

Steps to Reproduce

Set up the AKS + Elastic Cloud integration on Azure.
Deploy otel/opentelemetry-collector-contrib:0.106.1 with a ConfigMap built from the OpenTelemetry Collector configuration field below.

Expected Result

The elasticsearch exporter can send pod/container logs from my AKS cluster to Elastic Cloud.

Actual Result

Exporting failed. Dropping data.

Collector version

0.106.1

Environment information

Environment

AKS cluster version: 1.28.9

OpenTelemetry Collector configuration

receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2000
  batch: {}

exporters:
  debug:
    verbosity: basic

  elasticsearch/log:
    endpoints: ["https://*****.azure.elastic-cloud.com:443"] 
    logs_index: test-logs-index
    timeout: 2m
    api_key: "***"
    tls:
      insecure_skip_verify: true
    discover:
      on_start: true
    flush:
      bytes: 10485760
    retry:
      max_requests: 5
    sending_queue:
      enabled: true  

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch, memory_limiter]
      exporters: [debug, elasticsearch/log]

Log output

2024-08-09T13:01:46.334Z        error   [email protected]/bulkindexer.go:226       bulk indexer flush error        {"kind": "exporter", "data_type": "logs", "name": "elasticsearch/log", "error": "failed to execute the request: context deadline exceeded"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).flush
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/bulkindexer.go:226
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).run
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/bulkindexer.go:211
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.newAsyncBulkIndexer.func1
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/bulkindexer.go:109
2024-08-09T13:01:48.829Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 7, "log records": 147}
2024-08-09T13:01:49.030Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 65}
2024-08-09T13:01:56.347Z        error   exporterhelper/queue_sender.go:92       Exporting failed. Dropping data.        {"kind": "exporter", "data_type": "logs", "name": "elasticsearch/log", "error": "context deadline exceeded", "dropped_items": 24}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
        go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:92
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
        go.opentelemetry.io/collector/[email protected]/internal/queue/bounded_memory_queue.go:52
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
        go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43


Additional context

I tried with an older image (0.96.1) and the otlp/elastic exporter, but got the same dropping data issue as well.
iamp3 added the bug and needs triage labels on Aug 9, 2024
Contributor

github-actions bot commented Aug 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@carsonip
Contributor

  • Do you see any logs in ES from the collector? Did the collector drop 100% of the logs or just a subset of them?
  • Can you double-check that you have the right ES endpoint configured in the collector? If you browse the endpoint directly in a browser or with curl, you should get something like
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":["Basic realm=\"security\" charset=\"UTF-8\"","Bearer realm=\"security\"","ApiKey"]}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":["Basic realm=\"security\" charset=\"UTF-8\"","Bearer realm=\"security\"","ApiKey"]}},"status":401}
  • Although it isn't related to the root cause, do you mind sending without the sending queue, i.e. sending_queue::enabled=false (a sketch follows this list)? The Elasticsearch exporter bulk indexer already sends asynchronously, so a sending queue wouldn't help anyway. Disabling it would reduce noise in the collector logs.
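
For reference, a minimal sketch of the exporter section from the original configuration with the sending queue disabled (the endpoint and api_key values are redacted placeholders, as in the issue):

exporters:
  elasticsearch/log:
    endpoints: ["https://*****.azure.elastic-cloud.com:443"]
    logs_index: test-logs-index
    api_key: "***"
    sending_queue:
      enabled: false   # the bulk indexer already sends asynchronously, so the queue only adds log noise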

@iamp3
Author

iamp3 commented Aug 12, 2024

  • Do you see any logs in ES from the collector? Did the collector drop 100% of the logs or just a subset of them?
  • Can you double-check that you have the right ES endpoint configured in the collector? If you browse the endpoint directly in a browser or with curl, you should get something like
  • Can you double check if you have the right ES endpoint configured in the collector? If you browse the endpoint directly in browser / curl, you should get something like
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":["Basic realm=\"security\" charset=\"UTF-8\"","Bearer realm=\"security\"","ApiKey"]}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":["Basic realm=\"security\" charset=\"UTF-8\"","Bearer realm=\"security\"","ApiKey"]}},"status":401}
  • Although it isn't related to the root cause, do you mind sending without the sending queue, i.e. sending_queue::enabled=false? The Elasticsearch exporter bulk indexer already sends asynchronously, so a sending queue wouldn't help anyway. Disabling it would reduce noise in the collector logs.

Looks like it was a configuration problem. Here is the final version of the config that works well for a deployment with 1 replica, but on a DaemonSet setup I periodically see dropping data errors with the context deadline exceeded message:

receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      - /var/log/pods/*/*/filelog-*.log
    operators:
      - id: container-parser
        type: container

exporters:
  debug:
    verbosity: basic
  elasticsearch:
    endpoint: ${env:ELASTICSEARCH_URL}
    logs_index: filelog-logs
    api_key: "${env:FILELOG_SECRET_KEY}"
    tls:
      insecure_skip_verify: false
      ca_file: "/etc/ssl/certs/elk.cer"
    retry:
      enabled: true
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 50000
service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: []
      exporters: [debug, elasticsearch]

@carsonip Could you please tell me how I could configure dynamic creation of indices in the filelog-logs-2024-12 format, for example? I found logs_dynamic_index, but I couldn't find how to create indices with a %Y-%M timestamp.

@carsonip
Contributor

carsonip commented Aug 12, 2024

My recommendation would be to use data streams instead of indices.

If you perform all the steps below:

  1. Set:
elasticsearch:
    logs_dynamic_index:
        enabled: true
  2. Remove logs_index, or set it to something that complies with the data stream naming convention, e.g. logs-my_application-default.
  3. (optional) Set the mapping mode to ecs to map fields to Elastic Common Schema (ECS):
elasticsearch:
    mapping:
        mode: ecs

The logs will then be sent to the configured data stream, and since anything under logs-*-* should have a default index template, it will be rolled over as configured. You may configure more specific index templates to control how often backing indices are created for a data stream.
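
Putting these steps together, a minimal sketch of what the exporter section might look like (endpoint and api_key reuse the environment variables from the working config above; logs-my_application-default is just the example name):

exporters:
  elasticsearch:
    endpoint: ${env:ELASTICSEARCH_URL}
    api_key: "${env:FILELOG_SECRET_KEY}"
    logs_index: logs-my_application-default   # complies with the data stream naming convention, per step 2
    logs_dynamic_index:
      enabled: true
    mapping:
      mode: ecs   # optional: map fields to Elastic Common Schema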

@iamp3
Author

iamp3 commented Aug 15, 2024

The working configuration is written above; it turned out to be a configuration problem. I also added logs_dynamic_index instead of a dedicated index, for convenience. Thanks for the tips @carsonip

iamp3 closed this as completed on Aug 15, 2024