Switch vendored etcd to v3.5.6 #114403

Closed
wants to merge 1 commit into from

Conversation

dims
Member

@dims dims commented Dec 11, 2022

We updated the etcd server version to v3.5.6 in #114093.

In this PR, we update the vendored dependency as well.

Related to #104007: this also gets rid of one more redundant dependency, form3tech-oss/jwt-go.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: Davanum Srinivas <davanum@gmail.com>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 11, 2022
@k8s-ci-robot
Contributor

@dims: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 11, 2022
@dims
Member Author

dims commented Dec 11, 2022

/sig architecture

@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 11, 2022
@dims
Member Author

dims commented Dec 11, 2022

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 11, 2022
@dims
Member Author

dims commented Dec 11, 2022

/assign @liggitt

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/dependency Issues or PRs related to dependency changes sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. labels Dec 11, 2022
@dims
Member Author

dims commented Dec 11, 2022

ONE more down!!

@@ -1,6 +1,6 @@
 {
-	"directDependencies": 212,
-	"transitiveDependencies": 234,
-	"totalDependencies": 285,
+	"directDependencies": 211,
+	"transitiveDependencies": 233,
+	"totalDependencies": 284,
 	"maxDepthOfDependencies": 26
 }
\ No newline at end of file

@dims
Member Author

dims commented Dec 12, 2022

/assign @serathius

@serathius
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 12, 2022
Comment on lines +722 to +723
return (strings.Compare(cancelReasonError.Error(), v3rpc.ErrGRPCInvalidAuthToken.Error()) == 0) ||
(strings.Compare(cancelReasonError.Error(), v3rpc.ErrGRPCAuthOldRevision.Error()) == 0)
Member

this is stringifying the parameter error twice when it doesn't need to, and is stringifying constant API errors twice when they will never change

it's also unclear that exact string comparisons are correct for these... is there zero chance they will get wrapped by other errors that will add prefixes or suffixes? I'm surprised this isn't checking errors in a typed way

Member

@ahrtr ahrtr Dec 12, 2022

> this is stringifying the parameter error twice

Indeed, there is no need to stringify the error twice. I just submitted a PR to improve this: etcd-io/etcd#14935

Thanks @liggitt for pointing this out.

> it's also unclear that exact string comparisons are correct for these.

It's correct here. etcd defines a set of gRPC errors on the server side, and each error has to be translated to a string before it is sent to the client.

The client needs to retry on certain error types (ErrGRPCInvalidAuthToken and ErrGRPCAuthOldRevision), so it has to compare the error message. Note that the error messages are strictly defined; there is no wrapping or unwrapping.

Member

> etcd defines a set of gRPC errors on server side. We need to translate each error to a string before responding to the client side.

I'm really surprised the client is expected to compare error message strings to drive behavior. Isn't that fragile with skewed client and server versions?

Member

> I'm really surprised the client is expected to compare error message strings to drive behavior. Isn't that fragile with skewed client and server versions?

This is a valid concern, and it should be treated as technical debt. We may consider defining client-side codes in the future.

Member

I will kick off a discussion in the etcd community later and CC you.

Member Author

FYI. etcd-io/etcd#14992

@ahrtr will you be doing a v3.5.x release with that change?

Member

@dims Yes, will do it for 3.5 today.

Comment on lines +592 to +593
cancelReasonError := v3rpc.Error(errors.New(pbresp.CancelReason))
if shouldRetryWatch(cancelReasonError) {
Member

this looks really weird... is this handling a canceled flow in the created case? is that correct?

even if that's correct, it looks like this is triggering even in non-error cases (pbresp.CancelReason == "") - it's always creating a new error, a new rpc error, and calling shouldRetryWatch (I also commented in that method that stringification is happening more than it should)

Member

See my comment above #114403 (comment)

Member

if pbresp.CancelReason is empty, shouldn't we avoid all of this added code?
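
The guard suggested here can be sketched as follows. `watchResponse`, `retryOnCreated`, and the counter are hypothetical names for illustration, and the error message is not etcd's real text; the point is only that an empty `CancelReason` returns early without building any error value.

```go
package main

import (
	"errors"
	"fmt"
)

// watchResponse is a hypothetical stand-in for the protobuf watch response;
// only the fields discussed in the review are modeled.
type watchResponse struct {
	Created      bool
	CancelReason string
}

// errorsBuilt counts allocated error values to make the guard's effect visible.
var errorsBuilt int

// shouldRetryWatch compares against an illustrative message only.
func shouldRetryWatch(err error) bool {
	return err != nil && err.Error() == "example: invalid auth token"
}

// retryOnCreated skips all error handling when there is no cancel reason.
func retryOnCreated(resp watchResponse) bool {
	if resp.CancelReason == "" {
		return false // non-error case: nothing to construct or compare
	}
	errorsBuilt++
	return shouldRetryWatch(errors.New(resp.CancelReason))
}

func main() {
	fmt.Println(retryOnCreated(watchResponse{Created: true}))
	fmt.Println(retryOnCreated(watchResponse{
		Created:      true,
		CancelReason: "example: invalid auth token",
	}))
	fmt.Println("errors built:", errorsBuilt)
}
```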

Member

Resolved.

@liggitt
Member

liggitt commented Dec 12, 2022

/hold for a question on the etcd client change to the watch Created handling path

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 12, 2022
@dims
Member Author

dims commented Dec 12, 2022

/retest

@ahrtr
Member

ahrtr commented Dec 12, 2022

/lgtm
/approve

Thanks @dims

@k8s-ci-robot
Contributor

@ahrtr: changing LGTM is restricted to collaborators

In response to this:

/lgtm
/approve

Thanks @dims

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, dims, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alexzielenski
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Dec 13, 2022
ahrtr added a commit to ahrtr/etcd that referenced this pull request Dec 14, 2022
In order to fix etcd-io#12385,
PR etcd-io#14322 introduced a change
in which the client side may retry based on the error message
returned from the server side.

This is not good: it is too fragile, and it also changes the
protocol between client and server. Please see the discussion
in kubernetes/kubernetes#114403

Note: The issue etcd-io#12385 only
happens when auth is enabled and the client side reuses the same client
to watch.

So we decided to roll back the change on 3.5, for these reasons:
1. K8s doesn't enable auth at all, so there is no impact on K8s.
2. It's very easy for a client application to work around the issue.
   The client just needs to create a new client each time before watching.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
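
The workaround named in the commit message (a new client for each watch) can be sketched in the abstract. The `client` interface, `fakeClient`, and `watchWithFreshClient` below are hypothetical stand-ins and deliberately not the etcd `clientv3` API; with a real client, the factory would be whatever constructor the application already uses.

```go
package main

import "fmt"

// client is a hypothetical, simplified stand-in for an etcd client.
type client interface {
	Watch(key string) []string // returns already-collected events, for brevity
	Close() error
}

// fakeClient lets the sketch run without a server.
type fakeClient struct {
	id     int
	closed bool
}

func (c *fakeClient) Watch(key string) []string {
	return []string{fmt.Sprintf("client %d: event on %s", c.id, key)}
}

func (c *fakeClient) Close() error {
	c.closed = true
	return nil
}

// watchWithFreshClient creates a client for this one watch and closes it
// afterwards, so a watch never reuses an earlier client.
func watchWithFreshClient(newClient func() (client, error), key string) ([]string, error) {
	c, err := newClient()
	if err != nil {
		return nil, err
	}
	defer c.Close()
	return c.Watch(key), nil
}

func main() {
	created := 0
	factory := func() (client, error) {
		created++
		return &fakeClient{id: created}, nil
	}
	for _, key := range []string{"/a", "/b"} {
		events, err := watchWithFreshClient(factory, key)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(events)
	}
	fmt.Println("clients created:", created)
}
```

The cost of this approach is one connection setup per watch, which is presumably acceptable only when watches are infrequent.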
@liggitt
Member

liggitt commented Jan 10, 2023

/test pull-kubernetes-dependencies
to run check added in #114952

@k8s-ci-robot
Contributor

@dims: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-dependencies | 472db10 | link | true | /test pull-kubernetes-dependencies |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@k8s-ci-robot
Contributor

@dims: PR needs rebase.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 12, 2023
@dims dims closed this Jan 29, 2023
tjungblu pushed a commit to tjungblu/etcd that referenced this pull request Jul 26, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/dependency Issues or PRs related to dependency changes cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants