Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy: Added a connectivity test on startup to test for bad routing #360

Conversation

hilts-vaughan
Copy link
Contributor

Fixes #348

It's just a basic test for now. Maybe we should put this behind a flag. This will help customers fail fast instead of it being "ready for connections" but being unable to dial into the DB at all, which is obviously pretty important.

@kurtisvg
Copy link
Contributor

kurtisvg commented Mar 7, 2020

Definitely want this behavior flagged. Currently leaning towards disabled by default.

@kurtisvg kurtisvg added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Mar 7, 2020
@hilts-vaughan
Copy link
Contributor Author

Sure, I'll get some flags behind it.

Should we include the examples to have it?

return fmt.Errorf("Performed connectivity test to %v and got an error: %v", instanceName, err)
}

conn.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be deferred (and called after L328) so it always happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work if conn was null in the case of an error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go doesn't have null, it has nil. The main difference is that nil is a "zero value" and safe to use for things like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it; I need to read me some more Go docs.

conn, err := client.Dial(instanceName)

if err != nil {
return fmt.Errorf("Performed connectivity test to %v and got an error: %v", instanceName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should say something like "Failed to connect successfully to using IP %s using instance %s: ". Ideally it should indicate if it's using public/private as well, since this would be helpful for debugging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, it would be helpful to find out if there is a simple way to verify we have VPC access to rule out some troubleshooting (perhaps there is a metadata server we can query?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Ack, it looks like getting the IP is a more complex since right now the proxy client abstracts all that way. I will look into what we can do there.

  2. Looks like it possible to get the private vs. public information from the Admin API. Is the right thing to add that information to the instanceConfig struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: getting the address; the Go errors already include this actually. Not sure if it's OK to depend on that implementation detail or not but since it's just diagnostic information maybe that's OK for the sake of simplicity.

cmd/cloud_sql_proxy/cloud_sql_proxy.go Outdated Show resolved Hide resolved
@kurtisvg
Copy link
Contributor

kurtisvg commented Mar 7, 2020

Sure, I'll get some flags behind it.

Should we include the examples to have it?

One thing I've been meaning to add is a section with troubleshooting suggestions. Perhaps this would be a good place to highlight it?

@hilts-vaughan
Copy link
Contributor Author

Sure, that sounds good. Had a couple more customer cases come through about this; wouldn't be bad to have something to help people troubleshoot this themselves.

@hilts-vaughan
Copy link
Contributor Author

https://cloud.google.com/sql/docs/mysql/sql-proxy#troubleshooting

I can add the flag to this page once we have decided on the details; I think that's the best bet?

@kurtisvg
Copy link
Contributor

kurtisvg commented Apr 8, 2020

@hilts-vaughan can you rebase this? I'm planning on a release soon and I'd like to include this.

@kurtisvg kurtisvg closed this Apr 8, 2020
@kurtisvg kurtisvg reopened this Apr 8, 2020
@kurtisvg kurtisvg assigned kurtisvg and unassigned jsimonweb Apr 8, 2020
}
}

logging.Infof("Ready for new connections")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe accept connections while the connectivity test is happening?

And if there's a reason not to do that, explain that in the flag doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest reason I had was that the messaging was confusing since "Ready for new connections" was throwing a lot of customers off, thinking things were good to go.

Without knowing the results of the test, it could be misleading. We could as a compromise allow the connections but not log anything out until we know for sure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that's what we're doing here looking back I think. We're accepting connections but we don't tell the user until we know for sure. I could revise the messaging though (or maybe even add a new log before and after to make it more clear.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems right to me - it makes sense to wait to say "Ready for new connections" until we've definitely reached a stable state.

for _, instanceConfig := range cfgs {
err := checkInstanceConnectivity(instanceConfig.Instance, instanceConfig.IsPublic, proxyClient);
if err != nil {
log.Fatal(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the logging package?

followed by
os.Exit(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use fatal in the code all over. I count almost no errors and os.Exit, especially not used that way.

Could you let me know if you think that's the best way to proceed still?

@hilts-vaughan
Copy link
Contributor Author

Thanks for the review. I'll look at getting some of those changes integrated today. I'll rebase it too while I'm at it.

Would you prefer one commit kvg?

@kurtisvg
Copy link
Contributor

Thanks for the review. I'll look at getting some of those changes integrated today. I'll rebase it too while I'm at it.

Would you prefer one commit kvg?

What ever makes sense is fine. I usually do larger changes in separate commits but I also often group misc feedback into a single "Addresses feedback." commit.

@hilts-vaughan hilts-vaughan force-pushed the mst/connectivity-test-cloudsql branch from 41a7b3c to be060b7 Compare April 15, 2020 20:30
@hilts-vaughan hilts-vaughan force-pushed the mst/connectivity-test-cloudsql branch from e72c6e8 to ca0e769 Compare April 15, 2020 20:50
@hilts-vaughan
Copy link
Contributor Author

I had to remove the defer since it was trying to operate on the nil and was dereferencing it. It works fine under the new model.

Comment on lines +37 to +40
* `-perform_connectivity_tests`: Performs connectivity tests on startup to the
databases and exits the proxy if one of them fails.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flag name probably needs some work:

Suggested change
* `-perform_connectivity_tests`: Performs connectivity tests on startup to the
databases and exits the proxy if one of them fails.
* `-verify_connect`: Verifies the proxy has a valid connection path to the instance on startup. Exits if unable to connect for each instance.

Maybe other options:

  • -verify_connect_on_start
  • -verify_connection_path

@broady - thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll defer to you both on the flag name. I I think either is good.

cmd/cloud_sql_proxy/cloud_sql_proxy.go Outdated Show resolved Hide resolved
cmd/cloud_sql_proxy/cloud_sql_proxy.go Outdated Show resolved Hide resolved
// and nothing more. This will help rule out basic network connectivity problems, though
// such as firewalls and the like.
conn, err := client.Dial(instanceName)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nix newline

}
}

logging.Infof("Ready for new connections")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems right to me - it makes sense to wait to say "Ready for new connections" until we've definitely reached a stable state.

// and mark that on the config
for _, mapping := range inst.IpAddresses {
if mapping.Type == "PRIMARY" {
ret.IsPublic = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if they've specified private as a preferred IP type? Will this still return public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow. For private instances, this should be false since that's the default value in Go from what I can tell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Cloud SQL instance can have both a public and private ip. By default, the proxy uses a PUBLIC ip if it exists, otherwise a PRIVATE one. Users can override this behavior by specifying a different preference using [-ip_address_types]. If I'm reading this correctly, the current logic doesn't take this into account, meaning it will mark an instance as public even if the private ip is used to dial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you mean. I'm going to add another flag and try and reconcile it with a nicer error message.

Not sure what else; we could try and provide both? But the user may not care about checking both perse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Two queries:

  1. Is this OK or should we do something else with this?
  2. Is there an easy way for me to write some unit tests for this?

Copy link
Contributor

@kurtisvg kurtisvg Apr 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this approach is probably the right way either. I think probably some better approaches would be one of the following:

  1. Reimplement similar logic to findIpAddr here in cmd, except returns the IP type used rather than the addr (findIpType?).

  2. Modify findIpAddr to return addr and type, and then move the "test connectivity" function into the client so that it has access to it.

Option 2 is probably better since we don't have to reimplement logic and don't run the risk that one changes but not the other, but might be somewhat invasive. I'm not sure without taking a closer look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no obvious reason that findIpAddr has to be in certs either from what I can tell since it is free most of dependencies inside of that module. You could probably split it out into something else.

That being said, at what cost? Does the user really need to know public / private? They're going to get the IP in the actual exception anyway so they could go check. Do you think there is enough value add here?

The hairy part is probably the "cert source" structures and splitting those out.

Thoughts?

Copy link
Contributor

@kurtisvg kurtisvg Apr 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, we definitely want to do this. One of the bigger issues with the proxy is that folks don't understand it's not a VPN, and doesn't grant a connection to an instance via Private IP if it didn't have it before. As some point I'd like to be able to provide troubleshooting steps specific to the IP type to make the error actionable (Using private IP and can't connect - Is the firewall egress rules allow for port 3307? Are you on a VPC?) and maybe even add some diagnostic info (like checking for a VPC metadata server to confirm connectivity).

That said, having this info included as part of this PR is "nice to have" but not required - we can always expand upon it later. If you want to leave it out for now, let's go back to just letting the user know we weren't able to reach their instance via a given IP.

@hilts-vaughan hilts-vaughan requested a review from kurtisvg April 29, 2020 01:48
@kurtisvg
Copy link
Contributor

Closing this PR since it's gone stale.

@kurtisvg kurtisvg closed this Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add connectivity tests on startup to the actual instances
7 participants