cloudsql-proxy keeps active connections to DB instance even after client disconnects #147

Closed
innocode-devops opened this issue Feb 8, 2018 · 36 comments
Labels
priority: p1 - Important issue which blocks shipping the next release. Will be fixed prior to next release.
type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@innocode-devops

Hello, we've noticed that the proxy sometimes keeps active connections to the DB after the K8s workload that requested those connections has died. Observed on PostgreSQL instances.

For example, we have a deployment that spawns 1 pod, and the application in this pod opens 10 connections. If we go to the CloudSQL dashboard for this instance, we'll see 10 new active connections.

If an autoscaling event occurs, the deployment spawns one more pod with 10 more connections, and we can see that reflected on the dashboard. But after the application load decreases and the deployment scales back down from 2 pods to 1, we can still see 20 active connections to the DB. If we delete the cloudsql-proxy pod and its deployment spawns a new one, the application reconnects and we are back to the expected 10 active connections.

This does not happen 100% of the time, but it is usually related to scaling events. It becomes a problem when these zombie connections add up to a significant fraction of the maximum active connections allowed on that DB.

@sigxcpu76
Contributor

We lost a node in GKE and all the connections from the lost pods are still alive from cloud_sql_proxy's point of view, so it keeps the server-side connections open too. This quickly adds up against the PostgreSQL instance's connection limit.

@Carrotman42
Contributor

Carrotman42 commented Mar 22, 2018 via email

@sigxcpu76
Contributor

I am pretty sure that the Cloud SQL Proxy never notices the disconnects. If the client is abnormally terminated, no TCP RST is sent to the proxy, so it never notices that the client connection went down.
Also, the kernel's TCP keepalive mechanism doesn't seem to kick in.
I think the best approach would be to implement TCP socket keepalive on client connections and close the ones that are dead.
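
For illustration, here is a minimal Go sketch of that approach, assuming the proxy hands accepted client connections around as plain net.Conn values; the package name, function name, and one-minute period are illustrative, not the proxy's actual code:

package proxyutil

import (
    "net"
    "time"
)

// EnableClientKeepAlive turns on kernel TCP keepalive probes for an accepted
// client connection, so a peer that vanished without ever sending a FIN or RST
// is eventually detected, the connection errors out, and the proxy can close
// the matching server-side connection.
func EnableClientKeepAlive(conn net.Conn) error {
    tcpConn, ok := conn.(*net.TCPConn)
    if !ok {
        return nil // not a TCP connection (e.g. a unix socket); nothing to do
    }
    if err := tcpConn.SetKeepAlive(true); err != nil {
        return err
    }
    // Probe once per minute; after enough unanswered probes the kernel
    // reports the connection as dead on the next read or write.
    return tcpConn.SetKeepAlivePeriod(1 * time.Minute)
}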

@innocode-devops
Author

@Carrotman42 is there any traction on this issue? The higher the load, the more drastic the effect. Take a look at this screenshot:

[screenshot: Cloud SQL active connections graph]

When this happens in the middle of the night, it may use up the entire active-connections quota, and the SQL instance might go into failover mode, which degrades the app's performance. Also, throwing CPUs at the SQL instance just to increase the active-connection limit is not cost-effective :'(

@mwmanley

+1 to this, as we have been experiencing the same issues with open Postgres proxy connections. Since Google does not expose a method for increasing connection pool sizes within CloudSQL, which is an annoying deficiency, it would be nice if this issue were fixed sooner rather than later.

@kYann

kYann commented Jul 2, 2018

Hi!
Could we have a new release with this fix?

Thanks,

Yann

@kurtisvg added the type: bug and 🚨 needs triage 🚨 labels on Jul 16, 2018
@kurtisvg
Contributor

@hfwang Could you take a look at this?

@pudo

pudo commented Aug 7, 2018

We've got the same issue, to the tune of 130 connections on an instance with a 200-connection limit. Is a new release planned?

@hfwang
Contributor

hfwang commented Aug 14, 2018

Sorry for the long delay. The change looks good, I'm going to pull it into our internal repo, and let it soak in our test environment for the remainder of this week. Assuming all goes well, I'll merge it in.

@kurtisvg added the priority: p1 and Status: In Progress labels and removed the 🚨 needs triage 🚨 label on Aug 16, 2018
@hfwang closed this as completed in 2ec72f4 on Aug 22, 2018
@ekimia

ekimia commented Aug 28, 2018

@hfwang is this in the active version?

@mikesoares

@hfwang We have been experiencing the same issue. We built the cloud_sql_proxy binary from source (latest from master with your fix), deployed this to a new Docker image (based on 1.11), then spun up new pods which use it. We are still experiencing the same issue.

For more background and context, we have a Django app deployed with the CONN_MAX_AGE setting set to 30. The default behavior of closing connections immediately after use works without issue, but obviously there's some overhead associated with doing this (hence our desire to reuse connections).

Interestingly, this only seems to be an issue when using cloud_sql_proxy in GCP. When we set this up in a separate AWS deployment (using the 1.11 release) to point to a CloudSQL instance with the CONN_MAX_AGE setup described above, we see no issue whatsoever, and actually see the connections getting closed accordingly after 30 seconds. This seems to imply your fix is not sufficient or the issue lies elsewhere. Thoughts?

@Carrotman42
Contributor

Carrotman42 commented Aug 29, 2018 via email

@mikesoares

mikesoares commented Aug 29, 2018

@Carrotman42

We're running Debian Stretch across the board. Only some slight differences, listed below, including uname output.

On AWS, our Django app is not containerized and runs on a regular EC2 instance.

$ uname -a
Linux *************** 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux

On AWS, the cloud_sql_proxy binary is running on its own instance as well.

$ uname -a
Linux *************** 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64 GNU/Linux

On GCP, we're using a Kubernetes pod with cloud_sql_proxy running alongside as recommended.

# uname -a
Linux *************** 4.4.111+ #1 SMP Sat May 5 12:48:47 PDT 2018 x86_64 GNU/Linux

@mikesoares

@Carrotman42 @hfwang Any new hunches or findings?

@ayushin

ayushin commented Dec 21, 2018

We have the same problem, has anybody found a solution?

@kurtisvg
Contributor

kurtisvg commented Jan 3, 2019

Reopening issue since the fix doesn't appear to have resolved the issue.

@kurtisvg reopened this on Jan 3, 2019
@lychung83
Contributor

Hello, I'm trying to reproduce this problem. Would you mind giving me some detailed instructions to do so?

@sigxcpu76
Contributor

No. cloudsql-proxy keeps connections open because it never knows that the client disappeared (e.g. when the Kubernetes node that hosted the client crashed).

@mariusstaicu

We have 5 identical Kubernetes pods of a Spring Boot app connecting through cloudsql-proxy to a Postgres database in GCP. Each pod contains one container with the app and one container with cloudsql-proxy. We're using Hikari for connection pooling. Once every 6-8 days our HikariCP logs report connection leaks on all the instances, approximately 10 ms apart:

Time message instance
July 9th 2019, 13:06:07.836 Connection leak detection triggered for org.postgresql.jdbc.PgConnection@10b3ef79 on thread http-nio-8080-exec-7, stack trace follows shop-service-5b8bf45b85-5vszs
July 9th 2019, 13:06:07.834 Connection leak detection triggered for org.postgresql.jdbc.PgConnection@399af6ab on thread http-nio-8080-exec-2, stack trace follows shop-service-5b8bf45b85-5vszs
July 9th 2019, 13:06:07.833 Connection leak detection triggered for org.postgresql.jdbc.PgConnection@642d532b on thread http-nio-8080-exec-10, stack trace follows shop-service-5b8bf45b85-p6ttq
July 9th 2019, 13:06:07.831 Connection leak detection triggered for org.postgresql.jdbc.PgConnection@5c8c19ed on thread http-nio-8080-exec-4, stack trace follows shop-service-5b8bf45b85-9fd8q
July 9th 2019, 13:06:07.825 Connection leak detection triggered for org.postgresql.jdbc.PgConnection@1482764c on thread http-nio-8080-exec-9, stack trace follows shop-service-5b8bf45b85-xgrfn
July 9th 2019, 13:06:07.825 Connection leak detection triggered for org.postgresql.jdbc.PgConnection@65417777 on thread http-nio-8080-exec-9, stack trace follows shop-service-5b8bf45b85-ncgnb

Maybe this is an indication of something happening to all the proxy instances at the same time (maybe the connections to Cloud SQL being closed and reopened?), after which the connections are not released by cloudsql-proxy?

@kurtisvg
Contributor

kurtisvg commented Jul 9, 2019

@mariusstaicu - If the leak is being detected in Hikari, the leak is being caused by your application and not by cloud-sql-proxy. You can check out our page on Managing Database Connections and take care that connections are always closed, even when an error occurs.

@mariusstaicu

Usually yes, but then how come it occurs on multiple instances at the same time?

@kurtisvg
Contributor

kurtisvg commented Jul 9, 2019

That would depend on the application, so I'm not able to answer. It could be some scheduled or recurring event that isn't freeing the connections. If Hikari is able to detect the leaks, it means the leak is client-side and the proxy is working as intended (it should hold connections to the database open as long as the client does).

@gustavovalverde

Just following up on this one. I had to use pgbouncer in front of cloudsql proxy because of this issue. Any ETA for this fix?

@kurtisvg
Contributor

@gustavovalverde We've been unable to replicate this at all since #161 was released. I've tried to replicate it on more than one occasion, but have been unable to do so. If you have steps that can replicate it somewhat consistently, we'd like to hear them.

@c4tz

c4tz commented Jan 13, 2020

We found a way to reproduce this behaviour locally:

$ docker run -d -v $HOME/.config/gcloud:/root/.config/gcloud -p 5432:5432 --name proxy gcr.io/cloudsql-docker/gce-proxy:1.11 /cloud_sql_proxy -instances=PROJECTID:ZONE:INSTANCE=tcp:0.0.0.0:5432
2020/01/13 11:22:30 Listening on 0.0.0.0:5432 for PROJECTID:ZONE:INSTANCE
2020/01/13 11:22:30 Ready for new connections
$ docker run -it --link proxy postgres:12.1-alpine psql -h proxy -U USERNAME
Password for user USERNAME:
psql (12.1, server 9.6.15)
Type "help" for help.

(leave the connection open)
$ kill -9 CLIENT_PID

This will make cloudsql_proxy keep the connection to the cloudsql instance, because it never gets a closing signal from the psql process running in the other Docker container.

We encountered the problem described in this thread only with preemptible nodes in our k8s clusters. This makes perfect sense, as they might even get an ACPI G3 Mechanical Off (as described here) and thus might not be able to send a signal to the proxy to close the connection anymore.

As shown in the example, we use v1.11, which should already contain the changes from #161.

@kurtisvg self-assigned this on Jan 13, 2020
@kurtisvg
Contributor

@c4tz - #161 isn't listed in the patch notes until v1.12

@c4tz

c4tz commented Jan 14, 2020

Good catch, but unfortunately I can reproduce the problem with v1.16 locally, too. The following graph shows the increase of connections while I started and killed the client's processes. It went back down only after I shut down the cloudsql_proxy container.

[graph: active connection count rising as client processes were started and killed]

@devilleweppenaar

devilleweppenaar commented May 29, 2020

Can we maybe have an ETA of when someone will be able to look at this issue?

It wouldn't have been a major issue for us if there was an alternative version based on an image with a shell (#330, #371), which would allow us to configure a livenessProbe, like so:

livenessProbe:
  exec:
    command: ["nc", "-nz", "127.0.0.1", "3306"]
  initialDelaySeconds: 5
  timeoutSeconds: 3

But as it stands, this causes a system outage whenever our CloudSQL instance performs any maintenance that closes client connections.

We're running v1.16.

@kurtisvg
Contributor

Hi @devillexio - this is currently on my plate, so it's my fault the ball was dropped here.

Can we maybe have an ETA of when someone will be able to look at this issue?

I actually did spend 2-3 days looking into this again a month or so ago, but neglected to follow up here. As it currently stands, I'm still unable to replicate this issue, even following the above instructions. As soon as I close psql, the connection seems to disappear.

I've tried using it in a couple of different languages/frameworks, I have tried interrupting connections in the middle of long-running statements, and every other weird edge case I was able to come up with. On 2 separate occasions I've even sat down with customers reporting similar issues to try to debug, and both times it amounted to a misunderstanding of how their connection pool library was handling connections.

I currently suspect that this issue is indeed fixed, and that continued reports have been a misunderstanding. However, I intended to write a new e2e test to verify, but ended up falling down the rabbit hole on a couple of other efforts and haven't made time to circle back yet. That's my bad, and I'm sorry I haven't followed up sooner.

It wouldn't have been a major issue for us if there was an alternative version based on an image with a shell (#330, #371)

As a suitable workaround, you can pretty easily create your own Dockerfile where you curl the latest version of the Cloud SQL proxy right from our releases page. That said, we are planning on having those two issues closed out before the end of this quarter.

which would allow us to configure a livenessProbe

Tangentially, I wouldn't really recommend relying too heavily on a livenessProbe like this. This check verifies that the proxy is listening, but not necessarily that it's connecting successfully. It's usually more thorough to have a pod check that verifies your application is able to successfully reach your database, and restart the whole pod if necessary.

this causes a system outage whenever our CloudSQL instance performs any maintenance that closes client connections.

Could you explain why this is? This might be a different issue unrelated to this one - the current problem has been described as the proxy holding on to client-side connections. If the server terminates the connection, the Cloud SQL proxy would have to be actively attempting to reconnect in order to tie up more server connection slots (which I don't think is the case).

@devilleweppenaar

Hi @kurtisvg. Thank you for the prompt, honest and detailed response! 🙌

I fully appreciate that this is a difficult issue to reproduce consistently, and would happily share any information that would help in your investigation.

I apologize for the length of this reply, but there's a lot of information to cover.

What we have to go on is basically the symptoms of SQL connectivity issues we've experienced a number of times that always seem to coincide with our CloudSQL instance performing maintenance that results in active connections being closed, and new connection attempts timing out.

During the latest incident, two days ago, we noticed the following:

  • Our CloudSQL instance completed an update
  • During and after the update, our applications started logging fatal DB timeout errors, which resulted in the application pod restarting (thanks to the container liveness probe)
  • These errors (and resulting application pod restarts) continued until we restarted the whole deployment by issuing a restart using: kubectl rollout restart deploy/<application-deployment>
  • After the deployment restart, the application and sidecar proxy pod came back up, no further errors were logged or application pod restarts were observed, and processing resumed as per usual.

During our investigation, we also discovered our CloudSQL reported 39 Aborted_connects and 5337 Aborted_clients since it last started up. It reported there were 0 Connection_errors.

  • Aborted_clients - The number of connections that were aborted because the client died without closing the connection properly. See Section B.4.2.10, “Communication Errors and Aborted Connections”.
  • Aborted_connects - The number of failed attempts to connect to the MySQL server. See Section B.4.2.10, “Communication Errors and Aborted Connections”. For additional connection-related information, check the Connection_errors_xxx status variables and the host_cache table. As of MySQL 5.7.3, Aborted_connects is not visible in the embedded server because for that server it is not updated and is not meaningful.
  • Connection_errors_xxx - These variables provide information about errors that occur during the client connection process. They are global only and represent error counts aggregated across connections from all hosts. These variables track errors not accounted for by the host cache (see Section 5.1.11.2, “DNS Lookups and the Host Cache”), such as errors that are not associated with TCP connections, occur very early in the connection process (even before an IP address is known), or are not specific to any particular IP address (such as out-of-memory conditions).

Naturally, we were immediately drawn to the highest number, Aborted_clients. Following the link, we found: If a client is unable even to connect, the server increments the Aborted_connects status variable.

Unsuccessful connection attempts can occur for the following reasons:

  • A client attempts to access a database but has no privileges for it.
  • A client uses an incorrect password.
  • A connection packet does not contain the right information.
  • It takes more than connect_timeout seconds to obtain a connect packet. See Section 5.1.7, “Server System Variables”.

We also noticed that after the DB restart, the number of MySQL connections did not rise up to the same number as before the restart:
[graph: MySQL connection count before and after the DB restart]

As mentioned, we've noticed a similar series of events on at least the last 3 occasions when our CloudSQL instance performed maintenance.

Given what was observed, we thought that a livenessProbe on the cloudsql-proxy sidecar container would be a suitable workaround. A number of failed attempts led us to this ticket and the discovery that our workaround needs a shell.

Thank you for the suggestion of rolling our own Docker image, but I would honestly prefer using a more official one maintained by the community, if possible.

Tangentially, I wouldn't really recommend relying too heavily on a livenessProbe like this. This check verifies that the proxy is listening, but not necessarily that it's connecting successfully. It's usually more thorough to have a pod check that verifies your application is able to successfully reach your database, and restart the whole pod if necessary.

I agree completely that relying on this workaround isn't ideal. That being said, I'm not sure if the end result of the proposed suggestion would be significantly different from what we currently have, i.e. a database timeout error encountered by our application is considered fatal, which then restarts the application pod.

I'm happy to provide any additional information that would help.

For reference, our deployment details are as follows:

  • application
    • image: node:lts-slim; currently node v12.17.0 (npm v6.14.4)
    • database driver: sidorares/node-mysql2 (v2.1.0)
    • We have handlers for both uncaughtException and unhandledRejection on the node process, and both of these result in a process exit, and pod restart, after logging the error.
  • cloudsql-proxy
    • image: gcr.io/cloudsql-docker/gce-proxy:1.16
    • sidecar to application

I'd also like to thank my colleagues @theclive and @AndreMocke91 for their help in the investigation.

@kurtisvg
Contributor

kurtisvg commented Jun 2, 2020

Hey @devillexio,

Thanks for following up in such detail - it's super helpful when digging into issues like these. There is quite a bit to comment on, but I'll try to break it up into 3 main thoughts:

  1. It sounds like there are two independent issues (failing to recover from maintenance and leaking connections)
  2. A drop in connections after a cluster-wide restart doesn't necessarily (or even usually) indicate that the proxy is leaking connections
  3. I don't think the nc health check you are describing will solve either problem (also, the current pattern for your application is potentially problematic)

First, what you have described definitely sounds like two independent issues. The first is that when your database undergoes maintenance, your application fails to reach your database until the proxy is restarted, while the second (the one this issue is describing) is that when a client drops a connection, the proxy somehow keeps the corresponding connection to the Cloud SQL instance alive. I would appreciate it if you would open a new issue, so that we can track the investigation into that separately.

Second, a drop in connections after a cluster-wide restart doesn't necessarily indicate that the Cloud SQL proxy is leaving connections dangling. Checking the documentation for node-mysql2, there is the following detail:

"The pool does not create all connections upfront but creates them on demand until the connection limit is reached"

This is a pretty common pattern across connection pool libraries - while the API varies by implementation, there is almost always a "min" and "max" that the connection pool count sits between. Especially when the pool is oversized, new connections will only be created if there isn't already a connection immediately available. Additionally, this driver doesn't appear to support a "max lifetime" argument, so there isn't an easy way (from the client's perspective) to limit how long a connection can sit in the pool, even if idle. As such, I'd totally expect there to be a slow creep upwards of connections. This is why I think this problem has actually been fixed - #161 introduced a 1-minute TCP keepalive. The MySQL errors you are describing (Aborted_clients or Aborted_connects) will never occur unless the proxy has already done (or at least started) its job and created the TCP connection to the server.
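
To illustrate the kind of knob I mean, here is a hedged Go sketch using database/sql through the proxy's local listener (the DSN and the numbers are placeholders, not a recommendation): a max connection lifetime guarantees that even idle pooled connections are eventually recycled instead of sitting in the pool indefinitely.

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql" // driver registered under the name "mysql"
)

func main() {
    // Placeholder DSN pointing at the Cloud SQL proxy's local listener.
    db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    db.SetMaxOpenConns(10)                  // hard cap on concurrent connections ("max")
    db.SetMaxIdleConns(5)                   // how many idle connections may be kept in the pool
    db.SetConnMaxLifetime(30 * time.Minute) // recycle every connection, idle or not, after 30 minutes

    if err := db.Ping(); err != nil {
        log.Fatal(err)
    }
}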

The best way we can verify this problem still exists is a reproducible example of a client connecting to the proxy, dropping the connection, and the proxy continuing to keep the server-side connection alive for more than a minute. (This is why the example in this comment was interesting - it implied that perhaps the pkill -9 was required, meaning an immediate process exit, perhaps before the connection could send a FIN packet. However, as I mentioned above, this didn't seem reproducible on a recent version of the proxy.)

Third, I don't think that a nc health-check will address these issues for you. The Cloud SQL proxy only operates at the TCP level - it doesn't touch the MySQL/Postgres/SqlServer protocol. If you run nc -nz 127.0.0.1 3306, it will successfully make a connection to the proxy, even if the proxy is unable to connect successfully to your instance. You can test this by starting the proxy in a situation where it can never connect (e.g., to an instance with a private IP from a machine that can't access it, or set up your firewall to block outgoing connections to 3307). Since nc only cares if the TCP connection to the proxy was formed, you are only measuring if the proxy itself is receiving connections (which is not a particularly useful metric, as it doesn't mean your application can connect).

Additionally, the pattern you've described (process exit on connection timeout) could potentially be causing a high number of Aborted_clients. If your application has multiple connections to the instance, are you ensuring they are all closed before your process exits? If not, those connections could be terminating improperly and counting towards your high Aborted_clients count. I actually wouldn't recommend that anyone use a pattern like this - networking is a fickle creature, and it's pretty much expected that something will go wrong at some point. Often connection pools retry the underlying connections several times (I don't know about this driver in particular), but if not, a single networking hiccup can restart your entire application and possibly interrupt instances with successful connections.

Before describing my recommendation in more detail, let me clarify some terms (apologies if you are already familiar, but I noticed what I thought might be inconsistencies, so I want to make sure we are both on the same page):
Container - A specific instance of a process in an isolated environment. The app and proxy each have separate containers.
Pod - Atomic unit of k8s. Each pod can contain multiple containers (always on the same node)
Sidecar pattern - The pattern of having a pod with two (or more) containers: a "main car" container with your application, and a "side car" container with the proxy.

From what you specified above, I get the impression that when your app exits, only the main container restarts. My suggestion would instead be to introduce a health check at the pod level, but based on the health of your application. There are likely many different ways to implement this, but you'll want to include some check that verifies the app is able to connect and authenticate to the database successfully. If this check fails 3-4 times in a row (or maybe X% over a specific timeframe), you should restart the entire pod (not just the container). If the issue is related to the proxy, this will give it a fresh start (and hopefully correct the problem without a manual redeploy).
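
As a rough sketch of what such a check could call - Go with database/sql here since the idea carries over to any stack; the /healthz path, DSN, port, and timeout are all placeholders - the important part is that it performs a full driver-level ping (TCP + handshake + authentication), so it fails when the proxy can no longer reach the instance, unlike a plain TCP probe such as nc against the proxy's listener:

package main

import (
    "context"
    "database/sql"
    "log"
    "net/http"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    // Placeholder DSN pointing at the proxy sidecar's local listener.
    db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname")
    if err != nil {
        log.Fatal(err)
    }

    // /healthz succeeds only if a real, authenticated round trip to the
    // database works within the timeout.
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
        defer cancel()
        if err := db.PingContext(ctx); err != nil {
            http.Error(w, "database unreachable: "+err.Error(), http.StatusServiceUnavailable)
            return
        }
        w.Write([]byte("ok"))
    })

    log.Fatal(http.ListenAndServe(":8081", nil))
}

A probe or external check could then hit this endpoint and trigger a restart of the whole pod after several consecutive failures.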

Apologies for the novel, but hopefully it's all helpful for you.

@devilleweppenaar

Thanks, @kurtisvg! This is really useful!

Your suggested solution to have the check at the pod level instead of the container level should resolve our issue. I didn't even know this was possible. 😄

Also, I've logged a separate issue (#400) as you suggested for the instances where the application cannot reconnect to the database instance after maintenance.

@devilleweppenaar

@kurtisvg I've spent some time scouring the k8s documentation to find a way to add a livenessProbe to the Pod, but it doesn't currently seem possible. It seems to be a feature request on a long-standing ticket: kubernetes/kubernetes#40908.

Do you perhaps have an example config where you've configured a probe that results in the pod being restarted, instead of individual containers in the pod?

@kurtisvg
Contributor

kurtisvg commented Jun 3, 2020

@devillexio Yeah, I alluded above there might be a few different ways to do it. I'm not really an expert on k8s, so there might already be a better or more established pattern.

My approach would probably be to have a sentinel file in a shared volume to tell the proxy when to restart. Assuming you are already sharing /cloudsql between the containers, you could have the sidecar use a custom entrypoint script:

rm -f /cloudsql/restart_proxy   # remove any existing sentinel
./cloud_sql_proxy ... &
PID=$!
# Sleep until the sentinel file appears or the proxy exits on its own.
while [ ! -f /cloudsql/restart_proxy ] && kill -0 "$PID" 2>/dev/null; do
    sleep 1
done
kill "$PID" 2>/dev/null
wait "$PID"
exit $?

Then on the main container, you could add a preStop hook, which should run after the primary container fails the livenessProbe:

spec:
  containers:
  - name: your-container-name
    image: your-image-name
    lifecycle:
      preStop:
        exec:
          command: ["touch", "/cloudsql/restart_proxy"]

As an alternative to preStop, you could potentially have your application drop the file just before it exits on SIGTERM.

@devilleweppenaar

Brilliant. Thanks, @kurtisvg. My bad for missing it the first time around. I'll definitely be giving this a try.

@kurtisvg
Contributor

Hey folks,

I spent some time trying to add this as a permanent test run in our CI, but ran into some issues getting an accurate connection count from the instance side without any guarantee that different instances of the test aren't running in parallel.

Instead I have had this bash script running on an isolated instance for a couple of weeks straight now, but haven't been able to replicate this with v1.18.0 of the proxy:
[graph: active connections on the test instance over several weeks]

As you can see from the image, the connections all terminate and don't linger. It's worth pointing out that at higher durations the graph does some kind of averaging over the window that is somewhat deceiving: the connections do in fact reach zero, but it isn't reflected properly in the graph.

I'm going to go ahead and close this, and lock it since it was a popular issue (and popular issues have a habit of attracting similar but unrelated question seekers). If anyone feels that they've encountered this issue again in some way, shape, or form, please open a new issue and make sure to include detailed reproduction steps.

Thanks!

@GoogleCloudPlatform locked as resolved and limited conversation to collaborators on Oct 28, 2020