
error fetching chunks & failed to get s3 object #5221

Closed
firstsko opened this issue Jan 24, 2022 · 28 comments · Fixed by #6360

Comments

@firstsko commented Jan 24, 2022

I found an error in the logs. When I run a QueryRange request covering more than 1 hour, the error below occurs. Does anyone have any ideas?


```
level=error ts=2022-01-24T10:57:05.273404635Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=error ts=2022-01-24T10:57:05.273434964Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=warn ts=2022-01-24T10:57:05.273457211Z caller=grpc_logging.go:55 method=/logproto.Querier/Query duration=16.524312361s err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled" msg="gRPC\n"
level=error ts=2022-01-24T10:57:05.273486368Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=error ts=2022-01-24T10:57:05.273786914Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled"
level=error ts=2022-01-24T10:57:05.274246531Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled"
level=warn ts=2022-01-24T10:57:05.274385302Z caller=grpc_logging.go:55 method=/logproto.Querier/Query duration=29.542213769s err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled" msg="gRPC\n"
```

Here is my querier instance config:

```yaml
querier:
  engine:
    timeout: 5m
  query_ingesters_within: 3h
  query_timeout: 5m
```

Here is my query frontend instance config:

```yaml
query_range:
  # make queries more cache-able by aligning them with their step intervals
  align_queries_with_step: true
  max_retries: 5
  parallelise_shardable_queries: true
  cache_results: true
  split_queries_by_interval: 5m
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        size: 2048
        validity: 24h
```
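For reference, my understanding is that `split_queries_by_interval: 5m` makes the frontend split the requested range into 5-minute subqueries before fanning them out, so a range query of more than 1 hour becomes at least 12 downstream queries, each of which may be retried up to `max_retries` times. A rough sketch of that arithmetic (illustrative only, not Loki code):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical numbers matching the config above.
	queryRange := time.Hour          // the range that triggers the errors
	splitInterval := 5 * time.Minute // split_queries_by_interval: 5m
	maxRetries := 5                  // max_retries: 5

	splits := int(queryRange / splitInterval)
	fmt.Printf("subqueries: %d (each may be retried up to %d times)\n", splits, maxRetries)
	// Output: subqueries: 12 (each may be retried up to 5 times)
}
```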

```yaml
frontend:
  log_queries_longer_than: 5s
  compress_responses: true
  max_outstanding_per_tenant: 2048
```

My S3 Config:

```yaml
storage_config:
  boltdb_shipper:
    active_index_directory: ./data/boltdb-shipper-active
    cache_location: ./data/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: s3
  filesystem:
    directory: ./data/chunks
  aws:
    s3: http://loki:[email protected]:9000/loki-data
    s3forcepathstyle: true
    region: loki-test
```
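For anyone puzzling over the `aws.s3` URL form above: it packs credentials, endpoint, and bucket into a single string, and `s3forcepathstyle: true` enables the path-style addressing that MinIO expects. Roughly speaking it corresponds to an aws-sdk-go v1 client configured like the sketch below (illustrative only, with placeholder endpoint and credentials, not Loki's actual storage client code):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Placeholder values standing in for the parts of the s3: URL above.
	cfg := aws.NewConfig().
		WithRegion("loki-test").
		WithEndpoint("http://minio.example:9000").                               // host:port from the URL
		WithS3ForcePathStyle(true).                                              // s3forcepathstyle: true
		WithCredentials(credentials.NewStaticCredentials("loki", "SECRET", "")) // user:password from the URL

	sess, err := session.NewSession(cfg)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.New(sess)
	_ = client // chunk objects would then be read from the "loki-data" bucket via GetObject
}
```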

The MinIO object storage is working and its objects can be listed:

| Object | Last modified | Size |
| --- | --- | --- |
| 103a51ec5321702e:17e8b407c48:17e8b40de3f:46cb9fb5 | Mon Jan 24 2022 16:43:31 GMT+0800 | 679 KB |
| 103a51ec5321702e:17e8b40de3e:17e8b40e841:e723ea12 | Mon Jan 24 2022 16:43:31 GMT+0800 | 678 KB |
| 103a51ec5321702e:17e8b40e841:17e8b40f215:c964c4f5 | Mon Jan 24 2022 16:43:31 GMT+0800 | 678 KB |
| 103a51ec5321702e:17e8b40f214:17e8b40fb12:c59ef872 | Mon Jan 24 2022 16:43:31 GMT+0800 | 677 KB |
| 103a51ec5321702e:17e8b40fb11:17e8b410477:2eb32d1d | Mon Jan 24 2022 16:43:32 GMT+0800 | 678 KB |
| 103a51ec5321702e:17e8b410476:17e8b410e04:158423d5 | Mon Jan 24 2022 16:43:32 GMT+0800 | 677 KB |
| 103a51ec5321702e:17e8b410e03:17e8b411726:71315af6 | Mon Jan 24 2022 16:43:32 GMT+0800 | 678 KB |
| 103a51ec5321702e:17e8b411725:17e8b41208f:d2fd006f | Mon Jan 24 2022 16:43:32 GMT+0800 | 678 KB |
| 103a51ec5321702e:17e8b41208e:17e8b4129d8:40d7d18d | Mon Jan 24 2022 16:43:32 GMT+0800 | 678 KB |
| 103a51ec5321702e:17e8b4129d7:17e8b41340f:54d47763 | Mon Jan 24 2022 16:44:01 GMT+0800 | 678 KB |
| 103a51ec5321702e:17e8b41340e:17e8b413d5d:4ba55c3e | Mon Jan 24 2022 16:44:01 GMT+0800 | 677 KB |


@liguozhong (Contributor)

Is this a gRPC timeout?

vendor/github.com/grafana/dskit/grpcclient/grpcclient.go:98

level=warn ts=2022-01-24T10:57:05.273457211Z caller=grpc_logging.go:55 method=/logproto.Querier/Query duration=16.524312361s err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled" msg="gRPC\n"

@twix14 commented Jan 31, 2022

Same issue here ✋

@Syphixs commented Feb 14, 2022

Sadly I have the same issue. Any updates yet?

@firstsko (Author)

Should queries be made more cache-able by aligning them with their step intervals? I already have `align_queries_with_step: true` and `split_queries_by_interval: 5m` set.

Anything else?

@DolevAlgam

Same here ✋

@danielserrao

Same here

@chthtlo commented Mar 22, 2022

Hey guys, I have the same issue. Any update, please?

@marcusteixeira (Contributor)

Same error with gcs backend.

@liguozhong (Contributor)

This error is an expected log. When the limit is 1000, the context is automatically canceled after 1000 log lines have been queried.
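To make that concrete, here is a minimal sketch of the pattern described above (illustrative only, not Loki's actual code): once the requested number of lines has been returned, the query's context is canceled, and any chunk fetches still in flight surface as "context canceled" errors from the storage client:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fetchChunk stands in for a storage GET (e.g. an S3 object read) that
// honours context cancellation. Purely illustrative.
func fetchChunk(ctx context.Context, id int) error {
	select {
	case <-ctx.Done():
		return fmt.Errorf("failed to get s3 object: %w", ctx.Err())
	default:
		return nil
	}
}

func main() {
	const limit = 1000 // the limit=1000 mentioned above

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	returned := 0
	for i := 0; i < 1200; i++ {
		if err := fetchChunk(ctx, i); err != nil {
			// This is where the "error fetching chunks ... context canceled" line comes from.
			fmt.Println("error fetching chunks:", err, "| is cancellation:", errors.Is(err, context.Canceled))
			break
		}
		returned++
		if returned == limit {
			cancel() // enough lines returned; abandon the remaining fetches
		}
	}
	fmt.Println("lines returned:", returned)
}
```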

@manu756 commented Apr 3, 2022

Same issue here! Any update?

@step-baby

Same issue here!

@primeroz (Contributor) commented May 6, 2022

@liguozhong

> This error is an expected log. When the limit is 1000, the context is automatically canceled after 1000 log lines have been queried.

Do you have any reference for this? While it makes sense, I would not expect to get an error in this case, since it is a clean close of the context.

@rickysos

Same error with gcs backend.

@ninetyninenines

Same error with GCS backend as well.

@alexandre1984rj (Contributor)

Same issue using S3

@joaoluchetti

Same issue with the GCS backend as well. Is there any setting we can use to override the default 1000 limit?

@akshaya-msd

Same issue with the Azure blob store as well.

@sflanker (Contributor) commented Jun 1, 2022

So, there are probably a few hundred reasons why context canceled might be reported as the cause of a storage operation failing (which is probably why there are so many "me too" comments above). That said, here is one possibility and the journey I took to find it:

TL;DR: my network config was such that my EC2 instance needed the PUT response hop limit increased, and unfortunately the HTTP request sent by the golang aws-sdk to the EC2 instance metadata service in order to get the API token for accessing S3 as the instance IAM role simply hangs indefinitely when there are more network hops than allowed. See this aws-sdk-go bug.

Figuring this out was quite the chore. The logging here was so obtuse, even with debug enabled, that I wound up forking and customizing Loki to add a bunch more logging. Mostly this just helped me determine that my configuration was being applied correctly and the S3 operations were being attempted as expected.

Additionally, I verified that my EC2 instance and the Kubernetes pods running on it could all access the S3 bucket in question (tested via curl and the aws CLI). I also verified that they could perform GET requests against the EC2 metadata service (which plays an important role in IAM-based access to AWS resources from an EC2 instance). However, I was only testing GETs, so this didn't turn up any problems.

The critical diagnostic step for me was to alter the aws-sdk's s3 configuration to have it output debug logs as well (see s3_storage_client.go):

```go
// Emit aws-sdk-go debug logs (including HTTP request/response bodies) and
// verbose credential-chain errors.
s3Config = s3Config.WithLogLevel(*aws.LogLevel(aws.LogDebugWithHTTPBody))
s3Config = s3Config.WithCredentialsChainVerboseErrors(true)
```

This is what clued me in to the fact that the HTTP PUT to the EC2 metadata service's /latest/api/token endpoint was hanging, and that was the root cause.
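If anyone else suspects the same root cause, a quick way to reproduce the hang outside of Loki is to issue the IMDSv2 token request yourself with a short timeout. A minimal sketch (the endpoint and header are the standard EC2 instance metadata service API; the timeout value is arbitrary):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// The IMDSv2 token request that aws-sdk-go makes before talking to S3
	// with an instance role. If the PUT response hop limit is too low for
	// your network setup, this request never completes.
	req, err := http.NewRequest(http.MethodPut,
		"http://169.254.169.254/latest/api/token", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-aws-ec2-metadata-token-ttl-seconds", "21600")

	client := &http.Client{Timeout: 5 * time.Second} // fail fast instead of hanging
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("token request failed (hop limit too low?):", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("token request succeeded:", resp.Status)
}
```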

In conclusion, this was a horrible debugging experience, but I think it is as much the aws-sdk-go library's fault as it is loki's. More debug logs around storage would be good, and a way to enable the underlying storage SDK's logging would be nice too (or maybe it could be magically enabled if the log level is set to debug).

@wkshare commented Jun 6, 2022

> This error is an expected log. When the limit is 1000, the context is automatically canceled after 1000 log lines have been queried.

Where is this behavior documented?

@alexandre1984rj (Contributor)

Hi there @liguozhong. If this is an expected log, is there any discussion about updating the error to a warning or changing the message?

@liguozhong (Contributor)

> Hi there @liguozhong. If this is an expected log, is there any discussion about updating the error to a warning or changing the message?

Good idea.

@wkshare commented Jun 12, 2022

I changed replication_factor from 3 to 1 and the problem went away, but I don't know why.

@LinTechSo (Contributor)

Hi, any updates?

@homie-du

Same issue using S3

@saidamef

Same issue. Is there any way around this?

@dansl1982

+1

MichelHollands pushed a commit that referenced this issue Nov 9, 2022
…6360)

**What this PR does / why we need it**:
```
level=error ts=2022-01-24T10:57:05.273404635Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=error ts=2022-01-24T10:57:05.273434964Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=warn ts=2022-01-24T10:57:05.273457211Z caller=grpc_logging.go:55 method=/logproto.Querier/Query duration=16.524312361s err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled" msg="gRPC\n"
level=error ts=2022-01-24T10:57:05.273486368Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=error ts=2022-01-24T10:57:05.273786914Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled"
level=error ts=2022-01-24T10:57:05.274246531Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled"
level=warn ts=2022-01-24T10:57:05.274385302Z caller=grpc_logging.go:55 method=/logproto.Querier/Query duration=29.542213769s err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled" msg="gRPC\n"
```
When a LogQL query times out, do not print these logs. Since we retry, we don't want to show intermittent timeouts; we only care about the last error.

**Which issue(s) this PR fixes**:
Fixes #5221
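For readers of the thread, the general shape of the fix described above looks something like this sketch (the pattern only, not the actual diff in #6360): treat context cancellation as a non-event at the chunk-fetch log site and let the retry layer report the final error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// logFetchError is a sketch of "don't log intermittent cancellations":
// context.Canceled is swallowed here because the retry layer will surface
// the last, real error if the whole query ultimately fails.
func logFetchError(err error) {
	if err == nil || errors.Is(err, context.Canceled) {
		return
	}
	fmt.Println(`level=error msg="error fetching chunks" err=`, err)
}

func main() {
	logFetchError(fmt.Errorf("failed to get s3 object: %w", context.Canceled)) // silenced
	logFetchError(errors.New("failed to get s3 object: access denied"))        // still logged
}
```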
Abuelodelanada pushed a commit to canonical/loki that referenced this issue Dec 1, 2022 (the same change as #6360 above).
@git-aliuge

Same here ✋

@ryuseongryong commented Feb 16, 2024

This error also seems to occur when the chunks bucket does not exist in S3. In my case the compactor was starting from scratch, so it looks like I needed to create the bucket first.
