Log message sometimes returned and sometimes not #2668

Closed
markfermor opened this issue Sep 24, 2020 · 5 comments

Comments

@markfermor

Describe the bug
We were running a search in Grafana Explore and would sometimes find a log entry and sometimes not. I've since moved to using logcli and can reproduce the same results (ruling out Grafana). We were running Loki 1.3.0, where the message seemed to disappear completely once it moved out of memory and into GCS-backed storage. I've since upgraded everything to Loki 1.6.1 (we use the Docker image) and a similar problem still occurs: the log message is sometimes returned and sometimes not, without any change to the query or the start/end times. We're currently running Loki in single-binary mode, but don't have many logs in this staging environment (the same problem occurs in our production environment as well, which is a separate Loki set-up).

I've not found anything obvious in the Loki logs themselves (in debug mode) as to why this might be happening.

To Reproduce
Steps to reproduce the behavior:

  1. Started Loki (1.6.1)
  2. Started Promtail (1.6.1) on a K8s host. Triggered an error in one of the applications on that host, which produces a single ~13 KB JSON log entry. I mention the size of the entry in case it is relevant to the problem, as we've not noticed other situations where log messages only occasionally come back, but equally our log entries aren't usually this large...
  3. Query: go/bin/logcli --addr=http://XX:3100 query --stats --from="2020-09-24T13:00:00Z" --to="2020-09-24T15:00:00Z" '{appName="XX"} |= "uniqueIdForTheSpecificLogEntry"' (a replay-loop sketch for reproducing the flakiness follows below)
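
Because the failure is intermittent, replaying the exact same query in a loop makes it easier to reproduce. A rough sketch, assuming logcli's --quiet and --output=raw flags to suppress query metadata so that only matching log lines are counted (adjust flags to your logcli version):

# Replay the identical query 20 times and report how many lines each run returns
for i in $(seq 1 20); do
  lines=$(go/bin/logcli --addr=http://XX:3100 query --quiet --output=raw \
    --from="2020-09-24T13:00:00Z" --to="2020-09-24T15:00:00Z" \
    '{appName="XX"} |= "uniqueIdForTheSpecificLogEntry"' | wc -l)
  echo "run $i: $lines line(s) returned"
done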

Expected behavior
Every time we execute the query, the same result set should be returned. In practice, it seems completely random whether the entry will be returned or not.

Environment:

  • Infrastructure: Kubernetes (GKE)
  • Deployment tool:
    - Promtail is deployed as a DaemonSet from manually edited YAML
    - Loki runs on a single server (not in K8s) in Docker, with the container command /usr/bin/loki -config.file=/etc/loki/loki-config.yaml -log.level=debug (a rough docker run sketch follows below)
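
For reference, a minimal sketch of what that could look like as a docker run invocation (the image tag, host paths, and port mapping are assumptions inferred from the config below, not the exact command used):

docker run -d --name loki \
  -p 3100:3100 \
  -v /host/path/loki-config.yaml:/etc/loki/loki-config.yaml \
  -v /host/path/loki-data:/data/loki \
  grafana/loki:1.6.1 \
  -config.file=/etc/loki/loki-config.yaml -log.level=debug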

Screenshots, Promtail config, or terminal output
If applicable, add any output to help explain your problem.

Loki Config (/etc/loki/loki-config.yaml):

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

schema_config:
  configs:
  - from: 2018-04-15
    store: boltdb
    object_store: gcs
    schema: v11
    index:
      prefix: index_
      period: 168h
  - from: 2020-09-24
    store: boltdb
    object_store: gcs
    schema: v11
    index:
      prefix: index_
      period: 24h

storage_config:
  gcs:
    bucket_name: XX-Bucket-XX
    request_timeout: 300s

  boltdb:
    directory: /data/loki/index

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  chunk_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  index_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  retention_deletes_enabled: false
  retention_period: 0s
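
A note on the ingester settings above: with chunk_idle_period: 5m and chunk_retain_period: 30s, idle chunks are flushed quickly and dropped from ingester memory 30 seconds later, so queries over recent data soon depend entirely on the store and on how fresh the boltdb index is. Purely as a hedged illustration (not a recommendation made in this thread; the eventual fix was the #2624 workaround mentioned further down), retaining flushed chunks in the ingester for longer would look like:

ingester:
  chunk_idle_period: 5m
  # illustrative value: keep flushed chunks queryable from the ingester
  # while the store/index catches up
  chunk_retain_period: 15m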

logcli output: the first run of the command finds and returns the log entry; running the same command again immediately afterwards returns no log entry:

========> go/bin/logcli --addr=http://XX:3100 query --stats --from="2020-09-24T13:00:00Z"  --to="2020-09-24T15:00:00Z" '{appName="XX"} |= "43b8cc2ac8044d07"'
http://XX:3100/loki/api/v1/query_range?direction=BACKWARD&end=1600959600000000000&limit=30&query=%7BappName%3D%22XX%22%7D+%7C%3D+%2243b8cc2ac8044d07%22&start=1600952400000000000
Ingester.TotalReached 		 1
Ingester.TotalChunksMatched 	 0
Ingester.TotalBatches 		 0
Ingester.TotalLinesSent 	 0
Ingester.HeadChunkBytes 	 0 B
Ingester.HeadChunkLines 	 0
Ingester.DecompressedBytes 	 0 B
Ingester.DecompressedLines 	 0
Ingester.CompressedBytes 	 0 B
Ingester.TotalDuplicates 	 0
Store.TotalChunksRef 		 2
Store.TotalChunksDownloaded 	 2
Store.ChunksDownloadTime 	 27.167144ms
Store.HeadChunkBytes 		 0 B
Store.HeadChunkLines 		 0
Store.DecompressedBytes 	 13 kB
Store.DecompressedLines 	 2
Store.CompressedBytes 		 3.5 kB
Store.TotalDuplicates 		 0
Summary.BytesProcessedPerSecond 	 259 kB
Summary.LinesProcessedPerSecond 	 39
Summary.TotalBytesProcessed 		 13 kB
Summary.TotalLinesProcessed 		 2
Summary.ExecTime 			 50.352218ms
Common labels: {appName="XX", applicationLabel="XX-pod", container_name="XX-po", env="staging", filename="/var/log/pods/staging_XX-deploy-69d6fcc86b-cjc6k_8032c291-ddcd-4f31-8312-79b323a8bd7c/XX-po/0.log", level="50", namespace="staging", node_name="gke-XXXX-other-default2-preemptiv-6fbb8862-a9zj", pod_name="XX-deploy-69d6fcc86b-cjc6k", stream="stderr", teamOwner="XX"}
2020-09-24T14:08:30+01:00 {} {"level":50,"time":"2020-09-24T13:08:30.603Z","msg":"inbound http error","pid":55,"hostname":"XX-deploy-69d6fcc86b-cjc6k","service":"XX","env":"staging",...........Removed JSON..........,"traceId":"43b8cc2ac8044d07","v":1}
http://XX:3100/loki/api/v1/query_range?direction=BACKWARD&end=1600952910606450856&limit=30&query=%7BappName%3D%22XX%22%7D+%7C%3D+%2243b8cc2ac8044d07%22&start=1600952400000000000
Ingester.TotalReached 		 1
Ingester.TotalChunksMatched 	 0
Ingester.TotalBatches 		 0
Ingester.TotalLinesSent 	 0
Ingester.HeadChunkBytes 	 0 B
Ingester.HeadChunkLines 	 0
Ingester.DecompressedBytes 	 0 B
Ingester.DecompressedLines 	 0
Ingester.CompressedBytes 	 0 B
Ingester.TotalDuplicates 	 0
Store.TotalChunksRef 		 2
Store.TotalChunksDownloaded 	 2
Store.ChunksDownloadTime 	 31.332223ms
Store.HeadChunkBytes 		 0 B
Store.HeadChunkLines 		 0
Store.DecompressedBytes 	 13 kB
Store.DecompressedLines 	 2
Store.CompressedBytes 		 3.5 kB
Store.TotalDuplicates 		 0
Summary.BytesProcessedPerSecond 	 203 kB
Summary.LinesProcessedPerSecond 	 31
Summary.TotalBytesProcessed 		 13 kB
Summary.TotalLinesProcessed 		 2
Summary.ExecTime 			 64.156005ms
Common labels: {appName="XX", applicationLabel="XX-pod", container_name="XX-po", env="staging", filename="/var/log/pods/staging_XX-deploy-69d6fcc86b-cjc6k_8032c291-ddcd-4f31-8312-79b323a8bd7c/XX-po/0.log", level="50", namespace="staging", node_name="gke-XXXX-other-default2-preemptiv-6fbb8862-a9zj", pod_name="XX-deploy-69d6fcc86b-cjc6k", stream="stderr", teamOwner="XXX"}

=========> go/bin/logcli --addr=http://XX:3100 query --stats --from="2020-09-24T13:00:00Z"  --to="2020-09-24T15:00:00Z" '{appName="XX"} |= "43b8cc2ac8044d07"'
http://XX:3100/loki/api/v1/query_range?direction=BACKWARD&end=1600959600000000000&limit=30&query=%7BappName%3D%22XX%22%7D+%7C%3D+%2243b8cc2ac8044d07%22&start=1600952400000000000
Ingester.TotalReached 		 1
Ingester.TotalChunksMatched 	 0
Ingester.TotalBatches 		 0
Ingester.TotalLinesSent 	 0
Ingester.HeadChunkBytes 	 0 B
Ingester.HeadChunkLines 	 0
Ingester.DecompressedBytes 	 0 B
Ingester.DecompressedLines 	 0
Ingester.CompressedBytes 	 0 B
Ingester.TotalDuplicates 	 0
Store.TotalChunksRef 		 2
Store.TotalChunksDownloaded 	 2
Store.ChunksDownloadTime 	 33.825779ms
Store.HeadChunkBytes 		 0 B
Store.HeadChunkLines 		 0
Store.DecompressedBytes 	 0 B
Store.DecompressedLines 	 0
Store.CompressedBytes 		 0 B
Store.TotalDuplicates 		 0
Summary.BytesProcessedPerSecond 	 0 B
Summary.LinesProcessedPerSecond 	 0
Summary.TotalBytesProcessed 		 0 B
Summary.TotalLinesProcessed 		 0
Summary.ExecTime 			 64.28988ms
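
The telling difference between the two runs above: both reference and download the same two chunks (Store.TotalChunksRef and Store.TotalChunksDownloaded are 2 each time), yet the second run decompresses 0 B and processes 0 lines. To rule out logcli and Grafana entirely, the raw query_range response can be compared across runs; a rough sketch reusing the URL logcli prints above (jq is assumed to be available):

# Count returned streams and total lines straight from the HTTP API
curl -s 'http://XX:3100/loki/api/v1/query_range?direction=BACKWARD&end=1600959600000000000&limit=30&query=%7BappName%3D%22XX%22%7D+%7C%3D+%2243b8cc2ac8044d07%22&start=1600952400000000000' \
  | jq '{streams: (.data.result | length), lines: ([.data.result[].values | length] | add)}'
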
@pcbl

pcbl commented Sep 25, 2020

Hi,

Would that have any similarity with this issue?
https://github.com/grafana/loki/issues/2616

In my case I get no data when I provide certain intervals (with 1s I get data, with 100ms I do not)...

BR
Pedro

@markfermor
Author

Would that have any similarity with this issue?
#2616

It could be, but it sounds slightly different. I can run the exact same query and sometimes get results and sometimes not. It doesn't seem to be a problem only when returning large log messages; even smaller log messages go missing.

@markfermor
Author

The problem for us was #2624
It caused this problem where results would sometimes be returned and other times not, and in some cases it also caused problems looking back at data older than an hour. Applying the workaround suggested there appears to have solved this issue.

@pcbl

pcbl commented Sep 29, 2020

OK.
In my case I am always using the same timespan with different intervals (1s or 100ms, for example), and in one case it returns data and in the other it does not... My issue is still open; hopefully someone can give me a hint, as I am running out of options... :-|

cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021
* do not invalidate chunks cache when series are deleted

Signed-off-by: Sandeep Sukhani <[email protected]>

* add comment regarding exclusion of chunksCache from invalidating cache for series deletion

Signed-off-by: Sandeep Sukhani <[email protected]>
@zzswang

zzswang commented Aug 1, 2023

Ran into the same issue.

  1. Sometimes logs from 2 days ago simply could not be queried.
  2. After redeploying Loki, the logs came back.
