Clone from k8s services instead of individual hosts #422

Merged · 13 commits into bitpoke:master · Jan 9, 2020

Conversation

dougfales (Contributor)

This PR is an attempt to make the cloning process, especially during/after failover, more resilient by using k8s services to locate a source to clone from.

The Problem

The issue that sparked this is #412. In short, if you delete the master's PVC and then delete the master pod, the master will never be re-cloned and can never rejoin the cluster.

Solution: Clone From a Service

Our solution is based on the idea that we could just use k8s services to point to the sidecar backup service on the master and on healthy replicas. To do that, we simply added the SidecarServerPort to the existing master service. We had to add one new service, HealthyReplicasSVC, to point to healthy replicas only (excluding the master).
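To make the shape of that change concrete, here is a minimal sketch of a healthy-replicas service exposing the sidecar port; this is illustrative only, not the operator's actual syncer code, and the names, selector keys, and port number are placeholders:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// sidecarServerPort is a placeholder; the real port is defined by the operator.
const sidecarServerPort = 8080

// healthyReplicasService sketches a Service that selects only healthy,
// non-master pods and exposes the sidecar backup port.
func healthyReplicasService(cluster, namespace string) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      cluster + "-mysql-replicas",
			Namespace: namespace,
		},
		Spec: corev1.ServiceSpec{
			Selector: map[string]string{
				"app":     "mysql",
				"cluster": cluster,
				"role":    "replica", // excludes the master
				"healthy": "yes",     // only pods passing health checks
			},
			Ports: []corev1.ServicePort{{
				Name:       "sidecar",
				Port:       sidecarServerPort,
				TargetPort: intstr.FromInt(sidecarServerPort),
			}},
		},
	}
}
```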

The logic for cloning then becomes:

  1. Try to clone from the healthy replica service.
  2. If that service appears unavailable, try cloning from the master service.
  3. If that service is also unavailable, try cloning from the init bucket url.
  4. If there is no init bucket, return without cloning (as will happen when creating a new empty cluster).

If during the course of these attempts there is an error, we return with the error immediately. This causes the clone-and-init command to exit(1), which results in the pod initialization being retried later by k8s. This "fail fast" approach is intentional; we did not want to embed a bunch of retry logic in this function when k8s is already there to try again in a much more visible way.
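A minimal sketch of that fallback order, assuming hypothetical helpers for the availability checks and clone calls (none of these names are the operator's actual identifiers):

```go
package example

import "fmt"

// cloneSource stands in for one place we can clone from: the healthy-replicas
// service, the master service, or the init bucket.
type cloneSource struct {
	name      string
	available func() bool
	clone     func() error
}

// runCloneFallback tries each source in order and fails fast on any clone
// error so k8s can retry the pod initialization.
func runCloneFallback(sources ...cloneSource) error {
	for _, src := range sources {
		if src.available == nil || !src.available() {
			continue // this source appears unavailable, try the next one
		}
		if err := src.clone(); err != nil {
			// Fail fast: the clone-and-init command will exit(1) and k8s
			// will retry pod initialization later.
			return fmt.Errorf("cloning from %s: %w", src.name, err)
		}
		return nil
	}
	// No source available and no init bucket: return without cloning, which
	// is valid only when bootstrapping a new, empty cluster.
	return nil
}
```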

Tests

We created an e2e test case which demonstrates the failure described in that issue.

We have also added unit tests for the functionality in RunCloneCommand. These tests rely on a mock backup server, defined in appclone_fakeserver_test.go. This mock server runs on two separate ports because that was the simplest way we could think of to emulate two separate k8s services in a unit test environment.
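The sketch below is not the actual appclone_fakeserver_test.go, just an illustration of the idea: two httptest servers on distinct local ports stand in for the two k8s services.

```go
package sidecar_test

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// newFakeBackupServer returns a test server that answers every request with
// a canned payload, emulating the sidecar backup endpoint.
func newFakeBackupServer(t *testing.T, payload []byte) *httptest.Server {
	t.Helper()
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write(payload)
	}))
}

func TestCloneSourcesOnSeparatePorts(t *testing.T) {
	replicaSvc := newFakeBackupServer(t, []byte("backup-from-replica"))
	defer replicaSvc.Close()
	masterSvc := newFakeBackupServer(t, []byte("backup-from-master"))
	defer masterSvc.Close()

	// replicaSvc.URL and masterSvc.URL listen on two different ports, which
	// is enough to emulate two separate k8s services in a unit test.
	if replicaSvc.URL == masterSvc.URL {
		t.Fatal("expected the fake services to listen on distinct ports")
	}
}
```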

Caveats/Questions

We are still testing this approach ourselves, but so far it fixes the original issue and seems not to cause any new ones.

However, looking at the cloning logic has raised a few questions for us about what the operator should do in some cases. For instance:

  1. If for some reason both replica and master services are temporarily unavailable, and the cluster has an init bucket URL, we will clone from that bucket. This may not always make sense, e.g. if someone has a very old init bucket URL. This might be something to consider for future documentation or improvement.

  2. It seems possible that, at a specific point during a failover, the healthy replica service could be empty while the master service is not (the window is small, but it could happen). In that case, is it acceptable that this code would fall back to the master service, putting extra strain on the master from xtrabackup? We think this is acceptable for now, but I'd like to make this scenario less likely in the future.

  3. If these services were unavailable and there is no init bucket URL configured, an empty replica might be initialized when what we really wanted was a cloned replica. This replica would then likely fail to replicate from the master, and I assume it would be killed and recreated eventually by the operator. This is probably a scenario to cover with an e2e test case eventually.

Notes

I will attempt to leave some comments throughout the code as some of the changes are much clearer in context.

Thanks again for considering this PR @AMecea. Please let us know if you want any changes or if you think there's a better way to go about this solution.

Thanks to @stankevich and @eik3 for helping put this together.

This change adds the SidecarServerPort to the MasterService and introduces one new service, HealthyReplicasService, so that we can try to clone from replicas first, then fall back to master.

* Added a test suite for RunCloneCommand logic, along with a mock backup server.
* Added checks for service availability when cloning.
* Added "fail fast" logic when unexpected errors occur during cloning/download.
* Added dataDir cleanup code so that interrupted cloning does not leave dataDir in an inconsistent state.
* Added an e2e test demonstrating cloning failure when the PVC is removed and the pod recreated.
* Changed the connect timeout from the default of 30s to 5s so that an empty k8s service will not cause cloning attempts to hang unnecessarily for 30s.
```go
return &http.Transport{
	Proxy: http.ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout: time.Duration(dialTimeout) * time.Second,
	}).DialContext,
}
```

@dougfales (Contributor Author)


I could not find a better way to adjust the timeout for the initial connect to an empty service. The problem we are trying to solve: when a k8s service is empty, the connect operation will hang until the timeout is reached. Since the default is 30s, we had to wait 30s before trying the next service.
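A self-contained sketch of how the shorter dial timeout gets wired into an HTTP client; the 5-second value comes from this change, while the helper names are illustrative:

```go
package example

import (
	"net"
	"net/http"
	"time"
)

// transportWithDialTimeout builds a transport whose TCP connect gives up
// after dialTimeout seconds instead of the previous 30s default.
func transportWithDialTimeout(dialTimeout int) *http.Transport {
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout: time.Duration(dialTimeout) * time.Second,
		}).DialContext,
	}
}

// newCloneClient returns a client for clone requests: when a k8s service has
// no endpoints, the connect fails after ~5s so the caller can move on to the
// next source instead of waiting 30s.
func newCloneClient() *http.Client {
	return &http.Client{Transport: transportWithDialTimeout(5)}
}
```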

```diff
 // nolint: gosec
-xtbkCmd := exec.Command("xtrabackup", "--prepare",
+xtbkCmd := exec.Command(xtrabackupCommand, "--prepare",
```

@dougfales (Contributor Author)


By replacing these with a variable, I can disable them as needed in unit tests, for example when we pass a valid xbstream which is not necessarily also a valid xtrabackup directory.
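A sketch of the pattern: the variable name xtrabackupCommand appears in the diff above, while the surrounding function and the test override described afterwards are illustrative assumptions.

```go
package sidecar

import "os/exec"

// xtrabackupCommand is a variable rather than a string literal so unit tests
// can swap the binary out.
var xtrabackupCommand = "xtrabackup"

// prepareBackup runs xtrabackup --prepare against dataDir.
func prepareBackup(dataDir string) error {
	// nolint: gosec
	xtbkCmd := exec.Command(xtrabackupCommand, "--prepare", "--target-dir="+dataDir)
	return xtbkCmd.Run()
}
```

A test that feeds a valid xbstream which is not a full xtrabackup directory can then point `xtrabackupCommand` at a no-op binary, effectively disabling the prepare step for that case.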

```go
	return os.RemoveAll(lfPath)
}

func cleanDataDir() {
```

@dougfales (Contributor Author)


This was done to avoid leaving the dataDir in an unknown or partially restored state. The case we ran into was a large backup (hundreds of GBs) being interrupted midstream due to a network blip. On subsequent pod restarts, mysql would continually fail to start because the PVC held only a partial backup.
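A rough sketch of the cleanup idea (not the operator's exact implementation; the data directory path and the error-returning signature here are assumptions):

```go
package sidecar

import (
	"os"
	"path/filepath"
)

// dataDir is assumed here; the operator derives the real path from its config.
const dataDir = "/var/lib/mysql"

// cleanDataDir removes everything inside dataDir (but not the mount point
// itself) so an interrupted clone never leaves a partial backup behind.
func cleanDataDir() error {
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		if err := os.RemoveAll(filepath.Join(dataDir, entry.Name())); err != nil {
			return err
		}
	}
	return nil
}
```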

@dougfales (Contributor Author)

I will try to get the linter issues resolved today.

@dougfales (Contributor Author)

@AMecea I've managed to get the linter errors fixed and tests to pass in drone, but please consider this a WIP for now. We are still working out some issues in our own environment, so there will likely be a few more tweaks to this before it should be considered for review/merging. I'll ping you again when we feel like this is ready for your review.

@dougfales dougfales changed the title Clone from k8s services instead of individual hosts [WIP] Clone from k8s services instead of individual hosts Nov 7, 2019
When no services are available, we want to die and let k8s re-initialize the pod, *unless* this is the initial pod, in which case it is valid to allow initialization to continue without cloning from a service or bucket.
@dougfales dougfales changed the title [WIP] Clone from k8s services instead of individual hosts Clone from k8s services instead of individual hosts Nov 12, 2019
@calind (Member) left a comment

Looks good overall and we'll include the changes in the upcoming 0.4 release. Thanks for your contribution!

Review thread on pkg/sidecar/appclone_test.go (outdated, resolved)
@dougfales (Contributor Author)

Sounds great! Thank you @calind! 👍

@dougfales dougfales requested a review from calind December 9, 2019 20:56
@dougfales (Contributor Author)

Hi @calind I just noticed that I never re-requested your review after I made the requested changes last month. I'm re-submitting for your review now. Let me know if you want any other changes and I'll be happy to make them. Thanks!

@dougfales (Contributor Author)

Hey @calind @AMecea I was wondering if there's anything else I can do here to help this get merged? Thanks!

@AMecea AMecea merged commit fcbb5c3 into bitpoke:master Jan 9, 2020
@AMecea (Contributor) commented Jan 9, 2020

Hi @dougfales,

I'm sorry for the delay; we have been focused on other things lately 😞

I will do a few more tests and maybe some code cleanup, and then I will cut a new release, v0.4.0, containing these changes.

Thank you for this PR, you did a great job 👍 !

@dougfales (Contributor Author)

Thanks for merging @AMecea! It was a pleasure working on this PR! 😄 👍

We'll keep an eye out for v0.4.0. Thanks!

@dougfales dougfales deleted the use-services-for-cloning branch April 6, 2020 19:29