Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.5] gRPC health server sets serving status to NOT_SERVING on defrag #17914

Merged
merged 2 commits into from
May 7, 2024

Conversation

tjungblu
Copy link
Contributor

@tjungblu tjungblu commented Apr 30, 2024

gRPC health server sets serving status to NOT_SERVING on defrag.

Backported from 3.6 in #16278


would be awesome to have this available in 3.5 already, we ran into this recently with a performance test when running etcd with 38 gigs.

@tjungblu tjungblu changed the title gRPC health server sets serving status to NOT_SERVING on defrag [3.5] gRPC health server sets serving status to NOT_SERVING on defrag Apr 30, 2024
gRPC health server sets serving status to NOT_SERVING on defrag
Backport from 3.6 in etcd-io#16278

Co-authored-by: Chao Chen <[email protected]>
Signed-off-by: Thomas Jungblut <[email protected]>
@tjungblu tjungblu force-pushed the backport_35_grpc_defragserv branch from 1794125 to 750bc0b Compare April 30, 2024 13:09
@tjungblu
Copy link
Contributor Author

the grpcProxy failure seems to be #16475 which wasn't backported to 3.5

Copy link
Member

@chaochn47 chaochn47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tjungblu, the backport looks clean.

@chaochn47
Copy link
Member

chaochn47 commented May 3, 2024

Click approve too soon: there is a follow up PR #16836 which improves code readability. Could you please also take that one?

There is another in flight PR fixing the insecure / secure RPC servers set up #16959.. Although this is not a blocker of this PR since this is protected by an experimental feature flag and still improves the availability, we should document that in the help messages clearly.

@tjungblu
Copy link
Contributor Author

tjungblu commented May 6, 2024

Thanks @chaochn47, I'll pick the other PR too and let you know

Backport from 3.6 in etcd-io#16836

Co-authored-by: Chao Chen <[email protected]>
Signed-off-by: Thomas Jungblut <[email protected]>
@tjungblu
Copy link
Contributor Author

tjungblu commented May 6, 2024

@chaochn47 please take a look again with your refactoring applied

otherwise @ahrtr and @serathius PTAL

@@ -301,6 +301,7 @@ func newConfig() *config {
fs.DurationVar(&cfg.ec.ExperimentalWarningApplyDuration, "experimental-warning-apply-duration", cfg.ec.ExperimentalWarningApplyDuration, "Time duration after which a warning is generated if request takes more time.")
fs.BoolVar(&cfg.ec.ExperimentalMemoryMlock, "experimental-memory-mlock", cfg.ec.ExperimentalMemoryMlock, "Enable to enforce etcd pages (in particular bbolt) to stay in RAM.")
fs.BoolVar(&cfg.ec.ExperimentalTxnModeWriteWithSharedBuffer, "experimental-txn-mode-write-with-shared-buffer", true, "Enable the write transaction to use a shared buffer in its readonly check operations.")
fs.BoolVar(&cfg.ec.ExperimentalStopGRPCServiceOnDefrag, "experimental-stop-grpc-service-on-defrag", cfg.ec.ExperimentalStopGRPCServiceOnDefrag, "Enable etcd gRPC service to stop serving client requests on defragmentation.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding experimental prefix and removing it in future is a breaking change as we discussed in #17657. Also see #17657

Are we still happy to add prefix experimental for an flag?

I know this PR just backports changes from main to 3.5, so the suggestion (adding experimental in description only) is actually for the main branch (update main firstly). Do we want to follow the suggestion for now? Or do we have an agreement on #17657 and #17657 before we add any new flags? @fuweid @jmhbnz @serathius @siyuanfoundation @spzala

Suggested change
fs.BoolVar(&cfg.ec.ExperimentalStopGRPCServiceOnDefrag, "experimental-stop-grpc-service-on-defrag", cfg.ec.ExperimentalStopGRPCServiceOnDefrag, "Enable etcd gRPC service to stop serving client requests on defragmentation.")
fs.BoolVar(&cfg.ec.ExperimentalStopGRPCServiceOnDefrag, "stop-grpc-service-on-defrag", cfg.ec.ExperimentalStopGRPCServiceOnDefrag, "(Experimental) Enable etcd gRPC service to stop serving client requests on defragmentation.")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of #17657 we will still need to support migration from the old flag naming scheme (with prefix) to the new one. One more flag should not change much as we already have other flags to migrate. So I don't think we need to block on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy for this to proceed using older approach. We are still working on the KEP for feature flags so it will be a while off and agree with @serathius above we don't want to block this work.

@chaochn47
Copy link
Member

@chaochn47 please take a look again with your refactoring applied

LGTM, thanks @tjungblu !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants