Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receive: Unify error handling #2899

Merged
merged 6 commits into from
Dec 18, 2020

Conversation

kakkoyun
Copy link
Member

@kakkoyun kakkoyun commented Jul 15, 2020

The Receiver has a couple of similar error modes that could be originated from different resources (i.e local tsdb, forward requests for other peers in the ring), this PR attempts to unify how errors are returned from those paths and how the errors handled, especially for the nested errors, in order to provide consistent fail mode behaviour and make the code easier to follow.

One other reason was, for some cases because of the wrapped errors, it was wrongly categorising the root cause of the errors, this PR also attempts to fix it.

Signed-off-by: Kemal Akkoyun [email protected]

  • Change is not relevant to the end-user.

Changes

  • Unify error handling paths for fan-out forward requests.

Verification

  • make test-local

@kakkoyun kakkoyun requested review from brancz, bwplotka and squat July 15, 2020 16:11
@kakkoyun kakkoyun force-pushed the unify_receive_error_handling branch 2 times, most recently from f3f2cc5 to 02f2957 Compare July 15, 2020 16:49
Copy link
Member

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kakkoyun this PR introduces pretty invasive changes to how errors are counted and handled and it seems like more than just a unification IMO. In general, I am worried about such large changes to the core of the receive forwarding as this could easily introduce regressions. I would like to be very cautious about changing this logic as significantly as this PR unless there is a strong need for it. I would find it really helpful for the commit message to have a summary describing what problem(s) this PR is hoping to address to that we can see if maybe there is a simpler way to do it.

pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler_test.go Show resolved Hide resolved
@kakkoyun kakkoyun marked this pull request as draft July 16, 2020 07:00
@kakkoyun kakkoyun force-pushed the unify_receive_error_handling branch 2 times, most recently from cb2cf99 to 44ebc68 Compare July 20, 2020 08:10
@kakkoyun
Copy link
Member Author

@squat I have removed the nested counting, it should be good now.

@kakkoyun kakkoyun marked this pull request as ready for review July 20, 2020 08:29
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like the direction, but I would work on those helpers more. I think we could work on simplifying them more, it might be hard to follow.

pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler_test.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
@kakkoyun kakkoyun force-pushed the unify_receive_error_handling branch from f0f8259 to 4554982 Compare July 27, 2020 11:47
Copy link
Member

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this refactoring a lot! I think we can do a bit better on naming so it's easier to understand what the functions do from a caller perspective.

pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
@kakkoyun
Copy link
Member Author

@bwplotka @brancz I did my best to come up with a name. I'm too bad at naming things.

Copy link
Member

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it

@kakkoyun kakkoyun requested a review from bwplotka July 31, 2020 09:21
@kakkoyun kakkoyun requested a review from squat August 11, 2020 07:06
@kakkoyun
Copy link
Member Author

@squat @bwplotka Could you please have another look at this?

@kakkoyun kakkoyun force-pushed the unify_receive_error_handling branch from 6d6695e to e754ad7 Compare September 1, 2020 12:05
@kakkoyun
Copy link
Member Author

kakkoyun commented Sep 1, 2020

@squat @bwplotka Could you please have another look at this?

@squat @bwplotka friendly ping!

@kakkoyun kakkoyun requested a review from brancz September 2, 2020 11:27
Copy link
Member

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but I think I would like another pair of eyes on this

@kakkoyun
Copy link
Member Author

@bwplotka I think I had addressed your requests for this one. Could you have another look?

@kakkoyun kakkoyun force-pushed the unify_receive_error_handling branch from e754ad7 to a4beeaf Compare December 9, 2020 12:19
@kakkoyun
Copy link
Member Author

kakkoyun commented Dec 9, 2020

@bwplotka friendly ping.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just small nits 👍🏽

pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
pkg/receive/handler.go Outdated Show resolved Hide resolved
Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Kemal Akkoyun <[email protected]>
@kakkoyun kakkoyun force-pushed the unify_receive_error_handling branch 4 times, most recently from 8039222 to 2fcb946 Compare December 17, 2020 08:36
Signed-off-by: Kemal Akkoyun <[email protected]>
@kakkoyun kakkoyun force-pushed the unify_receive_error_handling branch from 2fcb946 to 8ed993c Compare December 17, 2020 12:29
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 💪🏽

@bwplotka bwplotka merged commit 4723967 into thanos-io:master Dec 18, 2020
@kakkoyun kakkoyun deleted the unify_receive_error_handling branch December 18, 2020 10:23
brancz added a commit to Chans321/thanos that referenced this pull request Feb 3, 2021
yashrsharma44 pushed a commit to yashrsharma44/thanos that referenced this pull request Mar 10, 2021
Signed-off-by: Chans321 <[email protected]>

fixes in e2e test for new receieve architecture - more fixes pending

fixes in receive and e2e test of receive

Signed-off-by: Chans321 <[email protected]>

fixed connection reset by peer error in receive e2e tests. all e2e tests pass

Signed-off-by: Chans321 <[email protected]>

pkg/route: Fix unit tests

route: Make hashring only work on labels

route: Incorporate thanos-io#2899

fixed import issues and changed the query test signature

Signed-off-by: Yash Sharma <[email protected]>

made some corrections while rebasing receiver changes

Signed-off-by: Yash Sharma <[email protected]>

Added a method Error() to implement golang error interface

Signed-off-by: Yash Sharma <[email protected]>

linting

Signed-off-by: Yash Sharma <[email protected]>

use the new promauto package

Signed-off-by: Yash Sharma <[email protected]>

removed unused params

Signed-off-by: Yash Sharma <[email protected]>

rename method and add comments

Signed-off-by: Yash Sharma <[email protected]>

removed unused flags in receiver service in e2e tests

Signed-off-by: Yash Sharma <[email protected]>

replaced promauto with prometheus, as earlier was causing unnecessary panic

Signed-off-by: Yash Sharma <[email protected]>

changes for debug

Signed-off-by: Yash Sharma <[email protected]>

removed entry for receive test for e2e

Signed-off-by: Yash Sharma <[email protected]>

added a readiness probe

Signed-off-by: Yash Sharma <[email protected]>

fix E2E tests for receive router

Signed-off-by: yeya24 <[email protected]>
yashrsharma44 pushed a commit to yashrsharma44/thanos that referenced this pull request Mar 14, 2021
Signed-off-by: Chans321 <[email protected]>

fixes in e2e test for new receieve architecture - more fixes pending

fixes in receive and e2e test of receive

Signed-off-by: Chans321 <[email protected]>

fixed connection reset by peer error in receive e2e tests. all e2e tests pass

Signed-off-by: Chans321 <[email protected]>

pkg/route: Fix unit tests

route: Make hashring only work on labels

route: Incorporate thanos-io#2899

fixed import issues and changed the query test signature

Signed-off-by: Yash Sharma <[email protected]>

made some corrections while rebasing receiver changes

Signed-off-by: Yash Sharma <[email protected]>

Added a method Error() to implement golang error interface

Signed-off-by: Yash Sharma <[email protected]>

linting

Signed-off-by: Yash Sharma <[email protected]>

use the new promauto package

Signed-off-by: Yash Sharma <[email protected]>

removed unused params

Signed-off-by: Yash Sharma <[email protected]>

rename method and add comments

Signed-off-by: Yash Sharma <[email protected]>

removed unused flags in receiver service in e2e tests

Signed-off-by: Yash Sharma <[email protected]>

replaced promauto with prometheus, as earlier was causing unnecessary panic

Signed-off-by: Yash Sharma <[email protected]>

changes for debug

Signed-off-by: Yash Sharma <[email protected]>

removed entry for receive test for e2e

Signed-off-by: Yash Sharma <[email protected]>

added a readiness probe

Signed-off-by: Yash Sharma <[email protected]>

fix E2E tests for receive router

Signed-off-by: yeya24 <[email protected]>
yashrsharma44 pushed a commit to yashrsharma44/thanos that referenced this pull request Mar 23, 2021
Signed-off-by: Chans321 <[email protected]>

fixes in e2e test for new receieve architecture - more fixes pending

fixes in receive and e2e test of receive

Signed-off-by: Chans321 <[email protected]>

fixed connection reset by peer error in receive e2e tests. all e2e tests pass

Signed-off-by: Chans321 <[email protected]>

pkg/route: Fix unit tests

route: Make hashring only work on labels

route: Incorporate thanos-io#2899

fixed import issues and changed the query test signature

Signed-off-by: Yash Sharma <[email protected]>

made some corrections while rebasing receiver changes

Signed-off-by: Yash Sharma <[email protected]>

Added a method Error() to implement golang error interface

Signed-off-by: Yash Sharma <[email protected]>

linting

Signed-off-by: Yash Sharma <[email protected]>

use the new promauto package

Signed-off-by: Yash Sharma <[email protected]>

removed unused params

Signed-off-by: Yash Sharma <[email protected]>

rename method and add comments

Signed-off-by: Yash Sharma <[email protected]>

removed unused flags in receiver service in e2e tests

Signed-off-by: Yash Sharma <[email protected]>

replaced promauto with prometheus, as earlier was causing unnecessary panic

Signed-off-by: Yash Sharma <[email protected]>

changes for debug

Signed-off-by: Yash Sharma <[email protected]>

removed entry for receive test for e2e

Signed-off-by: Yash Sharma <[email protected]>

added a readiness probe

Signed-off-by: Yash Sharma <[email protected]>

fix E2E tests for receive router

Signed-off-by: yeya24 <[email protected]>

revert changes for cmd/main

Signed-off-by: Yash Sharma <[email protected]>

nitpicks

Signed-off-by: Yash Sharma <[email protected]>

removed redundant health check

Signed-off-by: Yash Sharma <[email protected]>

Add receive router registration

Signed-off-by: Yash Sharma <[email protected]>

not to be commited

Signed-off-by: Yash Sharma <[email protected]>

modified pkg/receive/handler.go

Signed-off-by: Yash Sharma <[email protected]>

modified pkg/receive/multitsdb.go

Signed-off-by: Yash Sharma <[email protected]>

modified pkg/route

Signed-off-by: Yash Sharma <[email protected]>

modified cmd/thanos/receive_route.go

Signed-off-by: Yash Sharma <[email protected]>

replaced MultiError with NonNilMultiError

Signed-off-by: Yash Sharma <[email protected]>

changed to promauto

Signed-off-by: Yash Sharma <[email protected]>

removed unused variables

Signed-off-by: Yash Sharma <[email protected]>

changed labelpb.Zlabel to timeseries

Signed-off-by: Yash Sharma <[email protected]>

whitespace removal and revert the labels

Signed-off-by: Yash Sharma <[email protected]>

use the main branch's implementation of MultiError

Signed-off-by: Yash Sharma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants