Clone from k8s services instead of individual hosts #422
Conversation
This change adds the SidecarServerPort to the MasterService and introduces one new service, HealthyReplicasService, so that we can try to clone from replicas first, then fall back to master.

* Added a test suite for RunCloneCommand logic, along with a mock backup server.
* Added checks for service availability when cloning.
* Added "fail fast" logic when unexpected errors occur during cloning/download.
* Added dataDir cleanup code so that interrupted cloning does not leave dataDir in an inconsistent state.
* Added e2e test demonstrating cloning failure when PVC is removed and pod recreated.
* Changed the connect timeout from the default of 30s to 5s so that an empty k8s service will not cause cloning attempts to hang unnecessarily for 30s.
pkg/sidecar/server.go
return &http.Transport{
	Proxy: http.ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout: time.Duration(dialTimeout) * time.Second,
I could not find a better way to adjust the timeout for the initial connect to an empty service. The problem we are trying to solve: when a k8s service is empty, the connect operation hangs until the timeout is reached. The default of 30s meant that we had to wait 30s before trying the next service.
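The idea can be sketched as a small helper that builds an `http.Transport` with a short dial timeout; `newTransport` and its parameter name are illustrative, not necessarily the PR's exact code.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newTransport builds an http.Transport whose TCP connect phase gives up
// after dialTimeout seconds instead of Go's default of 30s. When a k8s
// service has no endpoints, the connect hangs for the full dial timeout,
// so a short timeout lets the caller fall through to the next clone
// source quickly.
func newTransport(dialTimeout int) *http.Transport {
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout: time.Duration(dialTimeout) * time.Second,
		}).DialContext,
	}
}

func main() {
	// Only the dial (connect) phase is bounded here; per-request timeouts
	// would be configured separately on the http.Client.
	client := &http.Client{Transport: newTransport(5)}
	fmt.Println(client.Transport != nil)
}
```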
// nolint: gosec
xtbkCmd := exec.Command("xtrabackup", "--prepare",
// nolint: gosec
xtbkCmd := exec.Command(xtrabackupCommand, "--prepare",
By replacing these with a variable, I can disable them as needed in unit tests. For example, when we pass a valid xbstream which is not necessarily also a valid xtrabackup directory.
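The pattern being described is a package-level command-name variable that tests can override; `prepareBackup` is a hypothetical wrapper added here for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
)

// xtrabackupCommand is a variable rather than a string literal so that
// unit tests can swap in a harmless command when the test fixture is a
// valid xbstream but not a full xtrabackup directory.
var xtrabackupCommand = "xtrabackup"

// prepareBackup builds (but does not run) the prepare step against dataDir.
func prepareBackup(dataDir string) *exec.Cmd {
	// nolint: gosec
	return exec.Command(xtrabackupCommand, "--prepare", "--target-dir="+dataDir)
}

func main() {
	// In a unit test one would set: xtrabackupCommand = "true"
	cmd := prepareBackup("/var/lib/mysql")
	fmt.Println(cmd.Args)
}
```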
return os.RemoveAll(lfPath)
}

func cleanDataDir() {
This was done to avoid leaving the dataDir in an unknown or partially restored state. The case we ran into was a large backup (hundreds of GBs) being interrupted midstream due to a network blip. On subsequent pod restarts, mysql would continually fail to start because the PVC held only a partial backup.
I will try to get the linter issues resolved today.
@AMecea I've managed to get the linter errors fixed and tests to pass in drone, but please consider this a WIP for now. We are still working out some issues in our own environment, so there will likely be a few more tweaks to this before it should be considered for review/merging. I'll ping you again when we feel like this is ready for your review.
When no services are available, we want to die and let k8s re-initialize the pod, *unless* this is the initial pod, in which case it is valid to allow initialization to continue without cloning from a service or bucket.
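One common way to detect "the initial pod" in a StatefulSet is by its ordinal, since pod hostnames end in `-<ordinal>`. This is an assumption about the detection mechanism, sketched for illustration; the PR may identify the first pod differently.

```go
package main

import (
	"fmt"
	"strings"
)

// isInitialPod reports whether hostname names the first pod of a
// StatefulSet (ordinal 0). Pod 0 is allowed to initialize empty, with no
// clone source; any other pod should fail so k8s recreates it.
func isInitialPod(hostname string) bool {
	return strings.HasSuffix(hostname, "-0")
}

func main() {
	fmt.Println(isInitialPod("my-cluster-mysql-0")) // true
	fmt.Println(isInitialPod("my-cluster-mysql-2")) // false
}
```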
Looks good overall and we'll include the changes in the upcoming 0.4 release. Thanks for your contribution!
Sounds great! Thank you @calind! 👍
Hi @calind I just noticed that I never re-requested your review after I made the requested changes last month. I'm re-submitting for your review now. Let me know if you want any other changes and I'll be happy to make them. Thanks!
Hi @dougfales, I'm sorry for the delay, we were focused on something else lately 😞 I will do a few more tests and maybe some code cleanup, and I will do a new release that contains those changes. Thank you for this PR, you did a great job 👍 !
Thanks for merging @AMecea! It was a pleasure working on this PR! 😄 👍 We'll keep an eye out for the next release.
This PR is an attempt to make the cloning process, especially during/after failover, more resilient by using k8s services to locate a source to clone from.
The Problem
The issue that sparked this is #412. In short, if you delete the master's PVC and delete the master pod, then the master will never be re-cloned and can never re-join the cluster.
Solution: Clone From a Service
Our solution is based on the idea that we can use k8s services to point to the sidecar backup service on the master and on healthy replicas. To do that, we simply added the `SidecarServerPort` to the existing master service. We had to add one new service, `HealthyReplicasSVC`, to point to healthy replicas only (excluding the master). The logic for cloning then becomes:

1. Try to clone from a healthy replica via `HealthyReplicasSVC`.
2. If no healthy replica is available, fall back to the master service.
3. If neither service is available, fall back to the init bucket URL, if one is configured.
If during the course of these attempts there is an error, we return with the error immediately. This causes the `clone-and-init` command to `exit(1)`, which results in the pod initialization being retried later by k8s. This "fail fast" approach is intentional; we did not want to embed a bunch of retry logic in this function when k8s is already there to try again in a much more visible way.

Tests
We created an e2e test case which demonstrates the failure described in the issue here.
We have also added unit tests for the functionality in `RunCloneCommand`. These tests rely on a mock backup server, defined in `appclone_fakeserver_test.go`. This mock server runs on two separate ports because that was the simplest way we could think of to emulate two separate k8s services in a unit test environment.

Caveats/Questions
We are still testing this approach ourselves, but so far it fixes the original issue and seems not to cause any new ones.
However, looking at the cloning logic has raised a few questions for us about what the operator should do in some cases. For instance:
If for some reason both replica and master services are temporarily unavailable, and the cluster has an init bucket URL, we will clone from that bucket. This may not always make sense, e.g. if someone has a very old init bucket URL. This might be something to consider for future documentation or improvement.
It seems possible that, at a specific point during a failover, the healthy replica service could be empty while the master service is not (the window is small, but it could happen). In that case, is it acceptable that this code would fall back to the master service, putting extra strain on the master from `xtrabackup`? We think this is acceptable for now, but I'd like to make this scenario less likely in the future.

If these services were unavailable and there is no init bucket URL configured, an empty replica might be initialized when what we really wanted was a cloned replica. This replica would then likely fail to replicate from the master, and I assume it would be killed and recreated eventually by the operator. This is probably a scenario to cover with an e2e test case eventually.
Notes
I will attempt to leave some comments throughout the code as some of the changes are much clearer in context.
Thanks again for considering this PR @AMecea. Please let us know if you want any changes or if you think there's a better way to go about this solution.
Thanks to @stankevich and @eik3 for helping put this together.