proxy: Added a connectivity test on startup to test for bad routing #360
Definitely want this behavior flagged. Currently leaning towards disabled by default.
Sure, I'll get some flags behind it. Should we include the examples to have it?
	return fmt.Errorf("Performed connectivity test to %v and got an error: %v", instanceName, err)
}

conn.Close()
This should be deferred (and called after L328) so it always happens.
Will this work if `conn` was null in the case of an error?
Go doesn't have `null`, it has `nil`. The main difference is that `nil` is a "zero value" and safe to use for things like this.
Got it; I need to read me some more Go docs.
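The ordering discussed in this thread can be sketched as follows. This is a minimal illustration and not the PR's code: `dialFunc`, `checkConnectivity`, `failingDial` and `workingDial` are hypothetical names, with `dialFunc` standing in for the proxy client's `Dial` method. One caveat worth keeping in mind: calling `Close` on a nil `net.Conn` interface value panics, so the sketch places the `defer` after the error check, where `conn` is known to be non-nil.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// dialFunc is a hypothetical stand-in for the proxy client's Dial method.
type dialFunc func(instance string) (net.Conn, error)

// checkConnectivity dials the instance once and closes the connection again.
func checkConnectivity(instance string, dial dialFunc) error {
	conn, err := dial(instance)
	if err != nil {
		// conn is nil on this path, so there is nothing to close.
		return fmt.Errorf("connectivity test to %v failed: %v", instance, err)
	}
	// Deferred only after the error check, so conn is non-nil here.
	defer conn.Close()
	return nil
}

// failingDial simulates an unreachable instance.
func failingDial(string) (net.Conn, error) {
	return nil, errors.New("no route to host")
}

// workingDial simulates a reachable instance using an in-memory pipe.
func workingDial(string) (net.Conn, error) {
	client, server := net.Pipe()
	server.Close()
	return client, nil
}

func main() {
	fmt.Println(checkConnectivity("my-project:my-region:my-instance", failingDial))
	fmt.Println(checkConnectivity("my-project:my-region:my-instance", workingDial))
}
```

With this shape the connection is always closed on the success path, and the failure path never touches a nil connection.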
conn, err := client.Dial(instanceName)

if err != nil {
	return fmt.Errorf("Performed connectivity test to %v and got an error: %v", instanceName, err)
Should say something like "Failed to connect successfully to using IP %s using instance %s: ". Ideally it should indicate if it's using public/private as well, since this would be helpful for debugging.
Additionally, it would be helpful to find out if there is a simple way to verify we have VPC access to rule out some troubleshooting (perhaps there is a metadata server we can query?)
- Ack, it looks like getting the IP is more complex, since right now the proxy client abstracts all of that away. I will look into what we can do there.
- Looks like it is possible to get the private vs. public information from the Admin API. Is the right thing to add that information to the `instanceConfig` struct?
re: getting the address; the Go errors already include this actually. Not sure if it's OK to depend on that implementation detail or not but since it's just diagnostic information maybe that's OK for the sake of simplicity.
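As a rough illustration of that point, the sketch below uses only the standard library's `net` package (not the proxy client) to show that a failed TCP dial returns a `*net.OpError` that records the remote address. The helper names `closedAddr`, `dialError` and `addrFromError` are hypothetical, and whether the proxy's own errors wrap a `*net.OpError` in the same way is an assumption to verify.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// closedAddr returns a loopback address that nothing is listening on,
// by opening a listener on a free port and closing it again.
func closedAddr() (string, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	addr := l.Addr().String()
	return addr, l.Close()
}

// dialError dials addr and returns the resulting error, or nil on success.
func dialError(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}

// addrFromError extracts the remote address from a dial error, if present.
func addrFromError(err error) (string, bool) {
	var opErr *net.OpError
	if errors.As(err, &opErr) && opErr.Addr != nil {
		return opErr.Addr.String(), true
	}
	return "", false
}

func main() {
	addr, err := closedAddr()
	if err != nil {
		fmt.Println("could not reserve a port:", err)
		return
	}
	dialErr := dialError(addr)
	fmt.Println("dial error:", dialErr)
	if got, ok := addrFromError(dialErr); ok {
		fmt.Println("address recorded in error:", got)
	}
}
```

Reading the address through `errors.As` avoids parsing the error text, which is the part that would be an implementation detail.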
One thing I've been meaning to add is a section with troubleshooting suggestions. Perhaps this would be a good place to highlight it?
Sure, that sounds good. Had a couple more customer cases come through about this; wouldn't be bad to have something to help people troubleshoot this themselves.
https://cloud.google.com/sql/docs/mysql/sql-proxy#troubleshooting I can add the flag to this page once we have decided on the details; I think that's the best bet?
@hilts-vaughan can you rebase this? I'm planning on a release soon and I'd like to include this.
}
}

logging.Infof("Ready for new connections")
Maybe accept connections while the connectivity test is happening?
And if there's a reason not to do that, explain that in the flag doc.
I think the biggest reason I had was that the messaging was confusing since "Ready for new connections" was throwing a lot of customers off, thinking things were good to go.
Without knowing the results of the test, it could be misleading. We could as a compromise allow the connections but not log anything out until we know for sure?
Actually, that's what we're doing here looking back I think. We're accepting connections but we don't tell the user until we know for sure. I could revise the messaging though (or maybe even add a new log before and after to make it more clear.)
This seems right to me - it makes sense to wait to say "Ready for new connections" until we've definitely reached a stable state.
for _, instanceConfig := range cfgs {
	err := checkInstanceConnectivity(instanceConfig.Instance, instanceConfig.IsPublic, proxyClient)
	if err != nil {
		log.Fatal(err)
Use the `logging` package? Followed by `os.Exit(...)`.
We use `fatal` in the code all over. I count almost no errors and `os.Exit`, especially not used that way.
Could you let me know if you think that's the best way to proceed still?
Thanks for the review. I'll look at getting some of those changes integrated today. I'll rebase it too while I'm at it. Would you prefer one commit, kvg?
Whatever makes sense is fine. I usually do larger changes in separate commits but I also often group misc feedback into a single "Addresses feedback." commit.
41a7b3c to be060b7
…tances are reachable
e72c6e8 to ca0e769
I had to remove the |
* `-perform_connectivity_tests`: Performs connectivity tests on startup to the
  databases and exits the proxy if one of them fails.
Flag name probably needs some work:
* `-verify_connect`: Verifies the proxy has a valid connection path to the instance on startup. Exits if unable to connect for each instance.
Maybe other options:
- `-verify_connect_on_start`
- `-verify_connection_path`
@broady - thoughts?
Sure, I'll defer to you both on the flag name. I think either is good.
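For reference, a boolean flag of this kind could be wired up with the standard `flag` package roughly as below. This is only a sketch: it uses `-verify_connect`, one of the names proposed in this thread, defaults to disabled as suggested earlier in the conversation, and `parseVerifyConnect` is a hypothetical helper rather than anything in the proxy.

```go
package main

import (
	"flag"
	"fmt"
)

// parseVerifyConnect parses args and reports whether the startup
// connectivity check was requested. The flag name is one of the
// candidates from the review and is not final.
func parseVerifyConnect(args []string) (bool, error) {
	fs := flag.NewFlagSet("cloud_sql_proxy", flag.ContinueOnError)
	verify := fs.Bool("verify_connect", false,
		"Verifies the proxy has a valid connection path to the instance on startup.")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *verify, nil
}

func main() {
	enabled, err := parseVerifyConnect([]string{"-verify_connect"})
	fmt.Println(enabled, err)

	enabled, err = parseVerifyConnect(nil)
	fmt.Println(enabled, err)
}
```

Using a dedicated `FlagSet` keeps the sketch testable without touching the process-wide flags.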
// and nothing more. This will help rule out basic network connectivity problems, though
// such as firewalls and the like.
conn, err := client.Dial(instanceName)

nit: nix newline
cmd/cloud_sql_proxy/proxy.go
Outdated
// and mark that on the config
for _, mapping := range inst.IpAddresses {
	if mapping.Type == "PRIMARY" {
		ret.IsPublic = true
What if they've specified private as a preferred IP type? Will this still return public?
Not sure I follow. For private instances, this should be false since that's the default value in Go from what I can tell.
A Cloud SQL instance can have both a public and a private IP. By default, the proxy uses a PUBLIC IP if it exists, otherwise a PRIVATE one. Users can override this behavior by specifying a different preference using `-ip_address_types`. If I'm reading this correctly, the current logic doesn't take this into account, meaning it will mark an instance as public even if the private IP is used to dial.
I get what you mean. I'm going to add another flag and try and reconcile it with a nicer error message.
Not sure what else; we could try and provide both? But the user may not care about checking both per se.
Done. Two queries:
- Is this OK or should we do something else with this?
- Is there an easy way for me to write some unit tests for this?
I don't think this approach is the right way either. I think a better approach would be one of the following:
- Reimplement logic similar to `findIpAddr` here in cmd, except that it returns the IP type used rather than the addr (`findIpType`?).
- Modify `findIpAddr` to return addr and type, and then move the "test connectivity" function into the client so that it has access to it.
Option 2 is probably better since we don't have to reimplement the logic and don't run the risk that one changes but not the other, but it might be somewhat invasive. I'm not sure without taking a closer look.
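To make Option 2 concrete, here is a minimal sketch of a lookup that honors the preference order and returns both the address and the type that was matched. It is not the real `findIpAddr`: the `ipMapping` struct and the name `findIPAddrAndType` are hypothetical, the `"PRIMARY"` type string is taken from the snippet under review, and `"PRIVATE"` as the other type string is an assumption.

```go
package main

import "fmt"

// ipMapping is a hypothetical, trimmed-down view of an instance IP mapping.
type ipMapping struct {
	Type      string // "PRIMARY" (public) per the PR snippet; "PRIVATE" is assumed
	IPAddress string
}

// findIPAddrAndType walks the preferred types in order and returns the first
// matching address together with the type that matched.
func findIPAddrAndType(prefs []string, ips []ipMapping) (addr, ipType string, err error) {
	for _, want := range prefs {
		for _, m := range ips {
			if m.Type == want {
				return m.IPAddress, m.Type, nil
			}
		}
	}
	return "", "", fmt.Errorf("no IP address matching preferences %v", prefs)
}

func main() {
	ips := []ipMapping{
		{Type: "PRIMARY", IPAddress: "203.0.113.10"},
		{Type: "PRIVATE", IPAddress: "10.0.0.5"},
	}
	// Default-like preference: public first, then private.
	fmt.Println(findIPAddrAndType([]string{"PRIMARY", "PRIVATE"}, ips))
	// User prefers private only, even though a public IP exists.
	fmt.Println(findIPAddrAndType([]string{"PRIVATE"}, ips))
}
```

Because the type comes from the same loop that picks the address, the error message can never report "public" for a connection that was actually dialed over the private IP.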
There is no obvious reason that `findIpAddr` has to be in `certs` either, from what I can tell, since it is free of most of the dependencies inside that module. You could probably split it out into something else.
That being said, at what cost? Does the user really need to know public / private? They're going to get the IP in the actual exception anyway so they could go check. Do you think there is enough value add here?
The hairy part is probably the "cert source" structures and splitting those out.
Thoughts?
At some point, we definitely want to do this. One of the bigger issues with the proxy is that folks don't understand it's not a VPN, and doesn't grant a connection to an instance via Private IP if it didn't have it before. At some point I'd like to be able to provide troubleshooting steps specific to the IP type to make the error actionable (Using private IP and can't connect: do the firewall egress rules allow port 3307? Are you on a VPC?) and maybe even add some diagnostic info (like checking for a VPC metadata server to confirm connectivity).
That said, having this info included as part of this PR is "nice to have" but not required - we can always expand upon it later. If you want to leave it out for now, let's go back to just letting the user know we weren't able to reach their instance via a given IP.
Co-Authored-By: Kurtis Van Gent
…ilts-vaughan/cloudsql-proxy into mst/connectivity-test-cloudsql
…have something that is private
Closing this PR since it's gone stale.
Fixes #348
It's just a basic test for now. Maybe we should put this behind a flag. This will help customers fail fast instead of it being "ready for connections" but being unable to dial into the DB at all, which is obviously pretty important.