GCR image missing shell in v1.16 #317
Seems like ash has gone from the latest container version. We used it in Cloud Build for the trick which allowed us to not close the connection until a `touch /cloudsql/stop` command from another step. Now there is no ash or bash in the container, and it has become impossible to use it in Cloud Build in a simple manner.
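A minimal sketch of that kind of cloudbuild.yaml setup (the instance connection name, volume name, and step IDs below are placeholders, not the original config):

```yaml
steps:
# Proxy step: keeps running until another step creates /cloudsql/stop.
# Steps that need the database must not wait on this step (use waitFor on
# other steps), otherwise the build would block forever.
- id: proxy
  name: gcr.io/cloudsql-docker/gce-proxy:1.15
  entrypoint: sh
  args:
    - -c
    - '/cloud_sql_proxy -dir=/cloudsql -instances=[INSTANCE_CONNECTION_NAME] & while [ ! -f /cloudsql/stop ]; do sleep 2; done'
  volumes:
    - name: cloudsql
      path: /cloudsql
  waitFor: ['-']

# Final step: signal the proxy step to exit.
- id: stop-proxy
  name: alpine:3.10
  entrypoint: sh
  args: ['-c', 'touch /cloudsql/stop']
  volumes:
    - name: cloudsql
      path: /cloudsql
  waitFor: ['[YOUR-LAST-DB-STEP-ID]']
```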
Hey folks, apologies if this broke any builds. This was done intentionally: we switched from using alpine to distroless as a base container to improve security. (You can see the new build in our Dockerfile.) I think we'll need to evaluate if we want to offer a second image for folks that need a shell. I'm not sure of the best way to introduce a shell into a distroless container, but it looks like you can use the above Dockerfile and update the distroless base image to one that includes a shell. |
Thank you for the answer. |
Is version 1.15 built somewhere else/differently? The commit referenced seems to also be in 1.15? |
Yeah, this broke my CD pipeline. Absolutely, fundamentally my fault for copy/pasting the above Stack Overflow article and for using the latest tag; changing to a pinned version fixed it. |
I have the same issue. I don't really mind that there is no shell per se, but using 1.16 currently breaks the deployment of this image in our GKE Kubernetes clusters, as our deployment relied on a shell being available in the image. Switching to distroless is quite a radical change and should have been tagged as a new major release, version 2.0.0, IMHO. That would make it clear to everyone that fundamental changes have been made that might very well break builds. |
Hey folks, apologies for any inconvenience this caused. I would strongly recommend avoiding the use of the latest tag for this image. We're not planning on guaranteeing the existence of any shells or other tools in the image other than the proxy itself. If we were to define an API for this image, it would only contain the path the proxy is located at (/cloud_sql_proxy). For using the proxy with other tools, I would recommend downloading a release binary into an environment that has the correct versions of those tools. We have versioned release links on the releases page that can be used. For Cloud Build, I believe the correct way to connect to Cloud SQL would be to download the proxy into /workspace in an earlier build step. |
No need for apologies; it's great and easy-to-use software, and pinning the image tag is not a big deal, at least for me.
One issue: that same build step can also lack bash, curl, or wget, and that forces us to create additional steps and containers. For example, having the ability to stop the proxy from another build step would make bash unnecessary. |
We're not using |
Any chance the release notes could mention the fact that you changed to distroless? That seems like a big change worth mentioning there. |
I was using a similar solution (running a shell script in the gce-proxy container), and it broke for me as well. Here is the workaround I have come up with: use the docker builder to send a TERM signal to the gce-proxy container.
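A rough sketch of that approach, assuming the proxy is started as a named container on Cloud Build's "cloudbuild" docker network and stopped with the docker builder (names, tags, and ports below are placeholders):

```yaml
steps:
# Start the proxy as a named, detached container so later steps can reach it
# over the shared network and stop it when they are done.
- id: start-proxy
  name: gcr.io/cloud-builders/docker
  args:
    - run
    - -d
    - --name=cloudsql-proxy
    - --network=cloudbuild
    - --entrypoint=/cloud_sql_proxy
    - gcr.io/cloudsql-docker/gce-proxy:1.16
    - -instances=[INSTANCE_CONNECTION_NAME]=tcp:0.0.0.0:5432

# ... build/test steps that connect to cloudsql-proxy:5432 go here ...

# Send a TERM signal to the proxy container so the build can finish cleanly.
- id: stop-proxy
  name: gcr.io/cloud-builders/docker
  args: ['kill', '--signal=TERM', 'cloudsql-proxy']
```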
|
Super!!! Thank you! |
Alternatively, here is another solution that keeps the execution of the proxy contained to a single step:

```yaml
- id: cmd-with-proxy
  name: [YOUR-CONTAINER-HERE]
  timeout: 100s
  entrypoint: sh
  args:
    - -c
    - '(/workspace/cloud_sql_proxy -dir=/workspace -instances=[INSTANCE_CONNECTION_NAME] & sleep 2) && [YOUR-COMMAND-HERE]'
```

This starts the proxy in the background, waits 2 seconds for it to start up, and then executes the specified command. Since the proxy runs as a background process, it will automatically exit when the step's main process completes. It does require that the proxy binary is in the /workspace directory, which can be done with a step like this:

```yaml
- id: proxy-install
  name: alpine:3.10
  entrypoint: sh
  args:
    - -c
    - 'wget -O /workspace/cloud_sql_proxy https://storage.googleapis.com/cloudsql-proxy/v1.16/cloud_sql_proxy.linux.386 && chmod +x /workspace/cloud_sql_proxy'
  waitFor: ['-']
```

I verified earlier today that this works, and it shows the step as failed if either the proxy fails to start or the script exits with a non-zero code. Here are the steps:
```yaml
- id: proxy-install
  name: alpine:3.10
  entrypoint: sh
  args:
    - -c
    - 'wget -O /workspace/cloud_sql_proxy https://storage.googleapis.com/cloudsql-proxy/v1.16/cloud_sql_proxy.linux.386 && chmod +x /workspace/cloud_sql_proxy'
  waitFor: ['-']

- id: execute-with-proxy
  name: python:3.7
  timeout: 100s
  entrypoint: sh
  args:
    - -c
    - '(/workspace/cloud_sql_proxy -dir=/workspace -instances=[INSTANCE_CONNECTION_NAME] & sleep 2) && (pip install -r requirements.txt && python test_sql.py)'
  waitFor: ['proxy-install']
```
And here's the test script (test_sql.py):

```python
import sqlalchemy

# db_user, db_pass, db_name and cloud_sql_connection_name are assumed to be
# defined elsewhere (e.g. read from environment variables).

db = sqlalchemy.create_engine(
    # Equivalent URL:
    # mysql+pymysql://<db_user>:<db_pass>@/<db_name>?unix_socket=/cloudsql/<cloud_sql_instance_name>
    sqlalchemy.engine.url.URL(
        drivername='mysql+pymysql',
        username=db_user,
        password=db_pass,
        database=db_name,
        query={
            'unix_socket': '/workspace/{}'.format(cloud_sql_connection_name)})
)

try:
    with db.connect() as conn:
        now = conn.execute('SELECT NOW() as now').fetchone()
        print('Connection successful.')
except Exception as ex:
    print('Connection not successful: {}'.format(ex))
    exit(1)
```

I believe this should also work if the proxy were listening on a local TCP port, since the proxy is now running in the same container (but I haven't verified this). |
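For reference, a sketch of what that TCP variant might look like (the port and placeholders are assumptions, not verified here):

```yaml
- id: execute-with-proxy-tcp
  name: python:3.7
  timeout: 100s
  entrypoint: sh
  args:
    - -c
    # Listen on a local TCP port instead of a unix socket directory;
    # the application would then connect to 127.0.0.1:3306 rather than a socket file.
    - '(/workspace/cloud_sql_proxy -instances=[INSTANCE_CONNECTION_NAME]=tcp:3306 & sleep 2) && (pip install -r requirements.txt && python test_sql.py)'
  waitFor: ['proxy-install']
```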
For Kubernetes deployments, the provided Kubernetes doc gives a lifecycle preStop hook calling "sleep": https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/master/Kubernetes.md#creating-the-cloud-sql-proxy-deployment
This may be a silly question, but is this preStop sleep hook still expected to work now that the image no longer has a shell? |
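For context, such a preStop hook typically looks roughly like this (values are illustrative, not taken from the linked doc):

```yaml
# Requires a sleep binary in the image, which the alpine-based 1.15 image has
# but the distroless 1.16 image does not.
containers:
  - name: cloudsql-proxy
    image: gcr.io/cloudsql-docker/gce-proxy:1.15
    command: ["/cloud_sql_proxy", "-dir=/cloudsql", "-instances=project:region:database"]
    lifecycle:
      preStop:
        exec:
          # Runs before SIGTERM is sent, delaying proxy shutdown so in-flight
          # requests can still reach the database.
          command: ["sleep", "30"]
```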
@cjrh Not a stupid question. Sleep won't work, since there isn't a shell inside the proxy image to execute it. The -term_timeout flag is meant to cover this case instead. It also looks like our example has a typo in it. Can you try the following and see if it works without the lifecycle hook?

```yaml
command:
  - /cloud_sql_proxy
  - -dir=/cloudsql
  - -instances=project:database1=tcp:0.0.0.0:3306,project:database2=tcp:0.0.0.0:3307
  - -credential_file=/credentials/credentials.json
  - -term_timeout=10s
```

If so, please let me know and I'll update it accordingly. |
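For reference, a sketch of roughly where that command block sits in a Deployment spec (the grace period value is an assumption, chosen to exceed -term_timeout):

```yaml
spec:
  # Give the pod at least as long as -term_timeout to finish draining connections.
  terminationGracePeriodSeconds: 30
  containers:
    - name: cloudsql-proxy
      image: gcr.io/cloudsql-docker/gce-proxy:1.16
      command:
        - /cloud_sql_proxy
        - -dir=/cloudsql
        - -instances=project:database1=tcp:0.0.0.0:3306,project:database2=tcp:0.0.0.0:3307
        - -credential_file=/credentials/credentials.json
        - -term_timeout=10s
```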
@kurtisvg Thanks for your reply! Yep, it should be -term_timeout. It seems to me there is a subtle but important difference between the effect of (1) using -term_timeout and (2) using a preStop sleep hook.

However, for (1) it seems that the pod is not immediately removed from the ingress, and in practice there is an awkward delay where I still see new connections being made to my app container for a second or two, until the pod is removed from the LB. My goal is to also handle these requests (until the pod is removed from the LB and no longer receives traffic), and handling them typically requires DB access via the SQL proxy. If these requests are not handled correctly, errors are returned to the calling service. From my admittedly minimal testing, it seems like the SQL proxy will not allow any new connections in the time window after SIGTERM but before the existing open connections are closed according to the -term_timeout setting.

I apologise if I'm not making sense, as I have minimal experience with k8s. The TL;DR is that I'm trying to keep serving requests, including new DB connections, during that shutdown window. Perhaps an alternative design might be to just return some kind of specific error back to the upstream service and have it do retries, but that's messy. What is best practice for dealing with that tiny window of time between the pod being marked TERMINATING and it being removed from the LB? Right now I am using version 1.15 of the SQL proxy because it still has a shell, and the "connection draining" behaviour I've described above is working well, so I'm unsure how to proceed with 1.16 and beyond. If it matters, my app uses gunicorn. |
I would consider this a bug - IIRC the intended behavior was to continue accepting new connections until the -term_timeout has elapsed. |
Reproduction isn't necessary to disprove this; the shutdown code only waits for existing connections to close after SIGTERM and does not accept new ones during that window.
The sleeping for-loop will terminate if either the specified timeout elapses or the number of open connections drops to zero. Should I make a new issue to request that this be changed so that new connections are still accepted until the timeout expires? |
@cjrh - Yes, please open a new issue. According to this comment on the PR for the feature, the intention was to allow connections to resume during shutdown. |
The problem is with https://github.com/GoogleContainerTools/distroless; this base image does not include a shell. The cloudsql-proxy Dockerfile (https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/master/Dockerfile#L27 or https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/1.16/Dockerfile#L27) uses distroless as the base image for the build. A possible solution is to add the entry below:

```diff
diff --git a/base/base.bzl b/base/base.bzl
index acde679..7ee5739 100644
--- a/base/base.bzl
+++ b/base/base.bzl
@@ -29,6 +29,7 @@ def distro_components(distro_suffix):
         debs = [
             DISTRO_PACKAGES[distro_suffix]["base-files"],
             DISTRO_PACKAGES[distro_suffix]["netbase"],
+            DISTRO_PACKAGES[distro_suffix]["sh"],
             DISTRO_PACKAGES[distro_suffix]["tzdata"],
         ],
         env = {
```
|
@kurtisvg My cloudbuild.yaml is:
Do you have any idea where I should start looking to fix this? |
For the Cloud SQL proxy authors: what do you think about adding a command to the proxy itself so it can perform the healthcheck, i.e. a built-in way to verify the proxy is up and accepting connections? |
@kurtisvg can you give some context on the outcome of this issue? There you link an alpine-based image and an ubuntu-based image. Which one is it based on now? I came here looking for a way to implement a healthcheck like:

```yaml
database:
  image: gcr.io/cloudsql-docker/gce-proxy:1.17
  # ...
  healthcheck:
    test: pg_isready --dbname=${DB_NAME} --host=${DB_HOST} --port=${DB_PORT} --username=${DB_USER}
```

But I'm not sure of the outcome of the discussion. Can you please give me some context on how to access this shell? I suppose that Ubuntu does have a shell while alpine does not, so which one is it built against now? Thanks! |
|
@kurtisvg any ETA for those buster/alpine images? |
I would suggest this issue could be closed, since the buster and alpine image variants are now available. |
@CarlQLange Looks already closed to me :) |
Oh my goodness, what a brain fart! Sorry about that! |
This is great. It would be even better if this was incorporated into an official GCP document that shows how to deploy from Cloud Build while keeping the Cloud SQL proxy running in the background. If such a doc already exists, it's very hard to find. Please provide a link for future reference. |