
Loki query-scheduler is not working in ipv6 kubernetes clusters #10884

Closed
01cj opened this issue Oct 12, 2023 · 2 comments · Fixed by #11121
Labels
type/bug Something is not working as expected

Comments

@01cj

01cj commented Oct 12, 2023

Describe the bug
Loki query-scheduler is not working in ipv6 kubernetes clusters

To Reproduce
Steps to reproduce the behavior:

  1. Install the latest loki-distributed Helm chart in an IPv6 kind or EKS cluster.
  2. Check that all services are running fine, including the query-scheduler.
  3. Query the data: no labels get listed in the Grafana Explore UI.
  4. Disable the query-scheduler in the loki-distributed Helm chart and it starts working again.

This works fine in IPv4 clusters without any issues.

Note: the query-scheduler always starts without any errors in the IPv6 EKS cluster, and I notice no issues in the logs.

Expected behavior
The query-scheduler works just like it does in IPv4 clusters.

Environment:

  • Infrastructure: laptop, AWS EKS
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output
querier logs:

ts=2023-10-12T18:47:44.229117762Z caller=spanlogger.go:86 user=fake level=debug metric=logs
ts=2023-10-12T18:47:44.23102972Z caller=spanlogger.go:86 user=fake level=debug Ingester.TotalReached=0 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.PostFilterLInes=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.PostFilterLInes=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2023-10-12T18:47:44.231066929Z caller=spanlogger.go:86 user=fake level=debug Cache.Chunk.Requests=0 Cache.Chunk.EntriesRequested=0 Cache.Chunk.EntriesFound=0 Cache.Chunk.EntriesStored=0 Cache.Chunk.BytesSent="0 B" Cache.Chunk.BytesReceived="0 B" Cache.Chunk.DownloadTime=0s Cache.Index.Requests=0 Cache.Index.EntriesRequested=0 Cache.Index.EntriesFound=0 Cache.Index.EntriesStored=0 Cache.Index.BytesSent="0 B" Cache.Index.BytesReceived="0 B" Cache.Index.DownloadTime=0s Cache.StatsResult.Requests=0 Cache.StatsResult.EntriesRequested=0 Cache.StatsResult.EntriesFound=0 Cache.StatsResult.EntriesStored=0 Cache.StatsResult.BytesSent="0 B" Cache.StatsResult.BytesReceived="0 B" Cache.Result.DownloadTime=0s Cache.Result.Requests=0 Cache.Result.EntriesRequested=0 Cache.Result.EntriesFound=0 Cache.Result.EntriesStored=0 Cache.Result.BytesSent="0 B" Cache.Result.BytesReceived="0 B" Cache.Result.DownloadTime=0s
ts=2023-10-12T18:47:44.23108697Z caller=spanlogger.go:86 user=fake level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.PostFilterLines=0 Summary.ExecTime=2.072333ms Summary.QueueTime=0s
ts=2023-10-12T18:47:44.231102637Z caller=spanlogger.go:86 user=fake level=info org_id=fake traceID=58d796b4a1472ff4 latency=fast query_type=labels length=6h0m0s duration=2.072333ms status=200 label= query= splits=0 throughput=0B total_bytes=0B total_entries=10
level=error ts=2023-10-12T18:47:44.231558512Z caller=scheduler_processor.go:208 org_id=fake traceID=58d796b4a1472ff4 frontend=127.0.0.1:9095 msg="error notifying frontend about finished query" err="rpc error: code = Unimplemented desc = unknown service frontendv2pb.FrontendForQuerier"
level=error ts=2023-10-12T18:47:44.421006512Z caller=scheduler_processor.go:208 org_id=fake traceID=58d796b4a1472ff4 frontend=127.0.0.1:9095 msg="error notifying frontend about finished query" err="rpc error: code = Unimplemented desc = unknown service frontendv2pb.FrontendForQuerier"
level=error ts=2023-10-12T18:47:44.667192554Z caller=scheduler_processor.go:208 org_id=fake traceID=58d796b4a1472ff4 frontend=127.0.0.1:9095 msg="error notifying frontend about finished query" err="rpc error: code = Unimplemented desc = unknown service frontendv2pb.FrontendForQuerier"
level=error ts=2023-10-12T18:47:45.181733638Z caller=scheduler_processor.go:208 org_id=fake traceID=58d796b4a1472ff4 frontend=127.0.0.1:9095 msg="error notifying frontend about finished query" err="rpc error: code = Unimplemented desc = unknown service frontendv2pb.FrontendForQuerier"
level=error ts=2023-10-12T18:47:46.396227596Z caller=scheduler_processor.go:208 org_id=fake traceID=58d796b4a1472ff4 frontend=127.0.0.1:9095 msg="error notifying frontend about finished query" err="rpc error: code = Unimplemented desc = unknown service frontendv2pb.FrontendForQuerier"

Frontend config

frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  scheduler_address: loki-loki-distributed-query-scheduler:9095
  tail_proxy_url: http://loki-loki-distributed-querier:3100
frontend_worker:
  scheduler_address: loki-loki-distributed-query-scheduler:9095

@hainenber
Contributor

Can you try whether the new loki Helm chart solves your case? There was a closed PR that attempted to add IPv6 support to the loki-distributed Helm chart, but the effort was redirected to the loki Helm chart.

@01cj
Author

01cj commented Oct 15, 2023

I was able to set up Loki in an EKS IPv6 cluster using the loki-distributed Helm chart.

Sample structuredConfig used:

loki:
  structuredConfig:
    server:
      log_level: info
      grpc_server_max_recv_msg_size: 104857600
      grpc_server_max_send_msg_size: 104857600
      http_server_read_timeout: 430s
      http_server_write_timeout: 430s
    common:
      ring:
        kvstore:
          store: memberlist
        instance_enable_ipv6: true
    compactor:
      compaction_interval: 10m
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
      retention_enabled: true
      shared_store: s3
      working_directory: /var/loki/
      compactor_ring:
        instance_enable_ipv6: true 
    distributor:
      ring:
        kvstore:
          store: memberlist
        instance_enable_ipv6: true
    ingester:
      chunk_idle_period: 2h
      chunk_target_size: 1572864
      max_chunk_age: 2h
      max_transfer_retries: 0
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
        enable_inet6: true
    limits_config:
      max_query_length: 15d
      max_query_series: 100000
      max_global_streams_per_user: 100000
      max_query_parallelism: 128
      ingestion_burst_size_mb: 32
      ingestion_rate_mb: 24
      per_stream_rate_limit: 9437184
      reject_old_samples: true
      reject_old_samples_max_age: 15d
      retention_period: 15d
      split_queries_by_interval: 15m
      query_timeout: 5m
    memberlist:
      cluster_label: loki-distributed
      bind_addr:
        - '::'
    querier:
      max_concurrent: 32
    query_scheduler:
      scheduler_ring:
        instance_enable_ipv6: true
    schema_config:
      configs:
      - from: "2022-03-15"
        index:
          period: 24h
          prefix: index_
        object_store: s3
        schema: v12
        store: tsdb
    storage_config:
      aws:
        s3: s3_bucket 
        s3forcepathstyle: true
      tsdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        index_gateway_client:
          server_address: dns:///loki-distributed-index-gateway:9095
        shared_store: s3

All Loki components start fine over IPv6 using the above config.

But when I enable the query-scheduler, the querier just forwards the request to localhost for some reason:

level=error ts=2023-10-12T18:47:46.396227596Z caller=scheduler_processor.go:208 org_id=fake traceID=58d796b4a1472ff4 frontend=127.0.0.1:9095 msg="error notifying frontend about finished query" err="rpc error: code = Unimplemented desc = unknown service frontendv2pb.FrontendForQuerier"

Once I disable the query-scheduler everything starts working again.
