Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlplugins/mysql: default interpolateParams in DSN #5425

Closed

Conversation

mattrobenolt
Copy link

@mattrobenolt mattrobenolt commented Feb 14, 2024

By default, the go-sql-driver/mysql driver wraps every query in a prepared statement, and this extra prepared statement means every single query is effectively two round trips. A request of ComPrepare to prepare the query, then ComStmtExecute with the statement id and the parameters.

In theory, this would be be fine if Temporal was explicitly aware of this and prepared their own statements explicitly and reused the prepared statements over and over through the lifetime.

Short of that, wrapping every query in a prepared statement just effectively doubles the queries and double the latency.

I had originally considered exposing this as a configuration knob, since this can be set through connectAttributes manually, but I don't think there's value in having this be configurable and we can provide a better default.

See: https://github.com/go-sql-driver/mysql#interpolateparams

How did you test it?

I have tested locally, and we are starting to tell our customers (PlanetScale) to set this manually in their connectAttributes.

Potential risks

There's a potential for some reason, the behavior to change and not work with client side interpolation, but it's unknown currently and I think would be considered a bug in the go-sql-driver/mysql library if anything is uncovered.

Is hotfix candidate?

I'd say Yes, but I personally have little experience with the Temporal community. :)

@mattrobenolt mattrobenolt requested a review from a team as a code owner February 14, 2024 21:15
@CLAassistant
Copy link

CLAassistant commented Feb 14, 2024

CLA assistant check
All committers have signed the CLA.

By default, the `go-sql-driver/mysql` driver wraps every query in a
prepared statement, and this extra prepared statement means every single
query is effectively two round trips. A request of ComPrepare to prepare
the query, then ComStmtExecute with the statement id and the parameters.

In theory, this would be be fine if Temporal was explicitly aware of
this and prepared their own statements explicitly and reused the
prepared statements over and over through the lifetime.

Short of that, wrapping every query in a prepared statement just
effectively doubles the queries and double the latency.

I had originally considered exposing this as a configuration knob, since
this can be set through `connectAttributes` manually, but I don't think
there's value in having this be configurable and we can provide a better
default.

See: https://github.com/go-sql-driver/mysql#interpolateparams
@mattrobenolt
Copy link
Author

I guess there are a few failing test cases, so this might be tricky for me to debug. :/ But if we can make this work, or at minimum exposed more easily as a config option, that'd be great.

@tdeebswihart tdeebswihart self-assigned this Feb 15, 2024
@tdeebswihart
Copy link
Contributor

I guess there are a few failing test cases, so this might be tricky for me to debug. :/ But if we can make this work, or at minimum exposed more easily as a config option, that'd be great.

I'll take a look at it and see if I can help you resolve those. I saw in our history that we've had other issues with ? and mysql8 and this seems related

@mattrobenolt
Copy link
Author

I saw in our history that we've had other issues with ? and mysql8 and this seems related

I'd be interested in hearing about it.

For context, I'm sorta a middleman here trying to get a better experience for mutual users. :)

I'm quite familiar with mysql protocol and the mysql driver side of things, and I know how this behavior works in PlanetScale and vitess. We have some mutual customers that would benefit quite drastically from either, using prepared statements explicitly so they are saved and reued for the lifetime, or avoid them and stick to client side interpolation. It's just a lot of wasted CPU cycles on the server to handle the extra Prepare + Execute for every query.

My assumption is that Temporal didn't explicitly choose this behavior, since it's the default of the mysql driver, and I think from the mysql driver, they are erring on the side of caution. We've recommended for most applications to set ?interpolateParams=true and I've been doing this for many years for performance things without any issues.

I'm not actually sure what query patterns Temporal is testing and if it's a side effect of an explicit test, or if it's really a bug we'll need to work around and figure out how to do a query differently.

I'm willing to offer up my help how I can, but not being a Temporal user puts me in a slightly challenging position here. This would definitely be mutually beneficial to avoid the overhead of 2 round trip per query if we can make it happen.

@tdeebswihart
Copy link
Contributor

I saw in our history that we've had other issues with ? and mysql8 and this seems related

I'd be interested in hearing about it.

For context, I'm sorta a middleman here trying to get a better experience for mutual users. :)

I'm quite familiar with mysql protocol and the mysql driver side of things, and I know how this behavior works in PlanetScale and vitess. We have some mutual customers that would benefit quite drastically from either, using prepared statements explicitly so they are saved and reued for the lifetime, or avoid them and stick to client side interpolation. It's just a lot of wasted CPU cycles on the server to handle the extra Prepare + Execute for every query.

My assumption is that Temporal didn't explicitly choose this behavior, since it's the default of the mysql driver, and I think from the mysql driver, they are erring on the side of caution. We've recommended for most applications to set ?interpolateParams=true and I've been doing this for many years for performance things without any issues.

I'm not actually sure what query patterns Temporal is testing and if it's a side effect of an explicit test, or if it's really a bug we'll need to work around and figure out how to do a query differently.

I'm willing to offer up my help how I can, but not being a Temporal user puts me in a slightly challenging position here. This would definitely be mutually beneficial to avoid the overhead of 2 round trip per query if we can make it happen.

I saw we have a comment at https://github.com/temporalio/temporal/pull/3886/files#diff-068f89497865ae54e0047289cd7c911f3afc746c9a920bd57b4aae4c1bdabac6R32 about this behavior but I'm not sure why that's the case. I'll let you know if I figure anything out.

I'm assuming you haven't had anyone running mysql8 try this change yet, as it'd have broken otherwise

@tdeebswihart
Copy link
Contributor

It looks like only our visibility code is failing which is interesting, hrm.

@mattrobenolt
Copy link
Author

I'm assuming you haven't had anyone running mysql8 try this change yet, as it'd have broken otherwise

We run all mysql8, but we don't have a lot of experience with this yet. I'm just did some basic testing and we're working to solicit feedback from customers. The challenge is getting them to adopt this config when it's very hard to set with the default docker-compose setups and whatnot, it's not exposed as an option.

As for the ? stuff, so that in theory shouldn't even be sent over the wire. That's the point of the interpolateParams=true option, it converts the ? and escapes the values in place. So maybe in this case, you need to go back to ? instead of using %v to let the driver do the escaping properly. But ? shouldn't be going over the wire.

The client side interpolation is looking for exactly that: https://github.com/go-sql-driver/mysql/blob/v1.7.1/connection.go#L198-L295

@tdeebswihart
Copy link
Contributor

I'm assuming you haven't had anyone running mysql8 try this change yet, as it'd have broken otherwise

We run all mysql8, but we don't have a lot of experience with this yet. I'm just did some basic testing and we're working to solicit feedback from customers. The challenge is getting them to adopt this config when it's very hard to set with the default docker-compose setups and whatnot, it's not exposed as an option.

As for the ? stuff, so that in theory shouldn't even be sent over the wire. That's the point of the interpolateParams=true option, it converts the ? and escapes the values in place. So maybe in this case, you need to go back to ? instead of using %v to let the driver do the escaping properly. But ? shouldn't be going over the wire.

The client side interpolation is looking for exactly that: https://github.com/go-sql-driver/mysql/blob/v1.7.1/connection.go#L198-L295

Yep. I'll need to dig into our mysql8 visibility code to figure out which query is failing and trace it back

@mattrobenolt
Copy link
Author

From my understanding, I know there are 2 databases in play here. I think from my understanding, the visibility database is much less volume than the main temporal database?

If that's correct, it might be a safer first step to apply this setting as a default only for temporal and not visibility as an intermediate step since that'd be arguably the biggest win.

@tdeebswihart
Copy link
Contributor

Aha, it's go-sql-driver/mysql#819. Our visibility schema embeds the memo as a []byte which can't be used with interpolateParams=true for a JSON field.

These queries are hard-coded and can be trivially prepared; I'll hop on that this afternoon; it shouldn't be a big change and will let us enable that option for you

@mattrobenolt
Copy link
Author

Aha, yeah, doing an explicit prepare then for that sounds like a winner. :)

@tdeebswihart
Copy link
Contributor

The simplest approach is actually to track the Memo as json.RawMessage since the mysql driver handles that directly. That's simpler than manually preparing all our statements while should still let folks enable this.

As for setting this as our default: I'm curious as to whether there are downsides to using interpolateParams=true. The biggest I can think of is if the driver's interpolation code isn't entirely correct and could let SQL injection sneak in.

@mattrobenolt
Copy link
Author

As for setting this as our default: I'm curious as to whether there are downsides to using interpolateParams=true. The biggest I can think of is if the driver's interpolation code isn't entirely correct and could let SQL injection sneak in.

I think there's a risk with any client side interpolation, but every ORM and whatnot does it. Given that this mysql driver is the most popular in Go that I'm aware of, I would suspect most cases have been figured out by now, given nobody at large scale would want to run with their default behavior. We encourage setting this on for our customers, etc.

They have one use case:

This can not be used together with the multibyte encodings BIG5, CP932, GB2312, GBK or SJIS. These are rejected as they may introduce a SQL injection vulnerability!

And it sounds like they guard against it by just disallowing it for those encodings. I'd be curious if you had any other cases that you're aware of that would cause an issue that the driver doesn't outright reject.

One benefit of MySQL in this case is their types are quite a bit simpler and escaping is a bit easier than in Postgres.

@tdeebswihart
Copy link
Contributor

We're on the same page there. I'm asking around our team just in case someone can think of something but I'm testing a patch that'll allow this to work at the same time

@tdeebswihart
Copy link
Contributor

I put up a new PR at #5428 that contains your change plus the one other change required to support it (and some cleanup while I was here). I'll keep iterating on it as tests finish, and will discuss with the team whether this is hotfix-worthy

@mattrobenolt
Copy link
Author

tyty it makes sense that encoding JSON as binary would be incorrect in this context since JSON is explicitly UTF-8.

I really appreciate carrying this through. Good open source response. 🫶

@tdeebswihart
Copy link
Contributor

tyty it makes sense that encoding JSON as binary would be incorrect in this context since JSON is explicitly UTF-8.

Yeah. I'm seeing reports in other MySQL drivers and forums about this happening with the utf8mb4 character set specifically (which our mysql8 plugin uses). It looks like there's still a few more issues I have to fix but I should have it all done early next week.

I really appreciate carrying this through. Good open source response. 🫶

Of course! I'm happy to deliver an easy speedup for customers using our MySQL backend 😄

@tdeebswihart
Copy link
Contributor

Enabling interpolateParams is causing more problems than I expected; I'll keep poking away at this next week but we have a long weekend this weekend.

If you have customers running mysql8 with this set make sure they don't enable it for mysql8-based visibility stores while I figure out what's wrong

@tdeebswihart
Copy link
Contributor

Do you have benchmarks that measure the performance improvement with this set to true, by the way?

@mattrobenolt
Copy link
Author

Do you have benchmarks that measure the performance improvement with this set to true, by the way?

I don't, other than it's objectively doing less than half the work and half the requests against Vitess. I can't easily quantify what the tangible savings is, but that's a lot of round trip and extra CPU on both sides being spent.

@tdeebswihart
Copy link
Contributor

I'm going to close this in favor of #5441 and #5428 as this PR will cause errors for folks using MySQL 8 visibility stores.

We can continue discussion here however

@mattrobenolt
Copy link
Author

As I mentioned earlier, I think keeping this off entirely from visibility store is more than fine. From my limited understanding of Temporal, it seems a vast majority of the workload is in the main temporal database, not the visibility one. I might be misinterpreting that though.

@emmercm
Copy link

emmercm commented Apr 2, 2024

I'm a customer of @mattrobenolt's and I wanted to add some voice here. Prepared statements are a scaling concern for my self-hosted Temporal v1.20.4 instance. Here are some quick measures:

  • The cluster has a throughput of 300-350 workflows per second, roughly 1,500-1,800 activities per second
  • The history service maintains a rate of 22-27k persistence requests per second, which translates to 150-190k queries per second as measured by the DB

I have been needing to scale my number of history services significantly recently, I believe in part due to the query multiplier that prepared statements adds. We have been seeing a lot of logs that indicate DB timeouts, but the DB is healthy, so we think the connection pool may be too small and that there's contention within the history service. The history service isn't under significant CPU or memory pressure. We have 32 instances each with 60 connections right now, which feels significant.

@tdeebswihart it looks like #5428 should be close to landing?

tdeebswihart added a commit that referenced this pull request Apr 2, 2024
## What changed?

We now specify `interpolateParams` by default when using MySQL for our
main.

While we don't currently support specifying `interpolateParams` for
MySQL visibility databases, I changed the type returned when serializing
`VisibilitySearchAttributes` as a first step towards doing so (MySQL 8
won't allow binary data within JSON columns, which makes sense).

While we could also have explicitly prepared and reused our statements
that requires far more work and it may not be the better approach
anyways.

## Why?

@mattrobenolt did a good job of explaining why in #5425 but I'll
summarize it here anyways.

When `interpolateParams` is false (the default) the driver will prepare
parameterized statements before executing them, meaning we need two
round-trips to the database for each query. By setting
`interpolateParams` to true the DB driver will handle interpolation and
send the query just once to the database, halving the number of round
trips necessary. This should improve the performance of all clusters
using MySQL.

## How did you test it?
Existing tests.

## Potential risks
No

## Is hotfix candidate?
This is up for discussion
@tdeebswihart
Copy link
Contributor

I'm a customer of @mattrobenolt's and I wanted to add some voice here. Prepared statements are a scaling concern for my self-hosted Temporal v1.20.4 instance. Here are some quick measures:

* The cluster has a throughput of 300-350 workflows per second, roughly 1,500-1,800 activities per second

* The history service maintains a rate of 22-27k persistence requests per second, which translates to 150-190k queries per second as measured by the DB

I have been needing to scale my number of history services significantly recently, I believe in part due to the query multiplier that prepared statements adds. We have been seeing a lot of logs that indicate DB timeouts, but the DB is healthy, so we think the connection pool may be too small and that there's contention within the history service. The history service isn't under significant CPU or memory pressure. We have 32 instances each with 60 connections right now, which feels significant.

@tdeebswihart it looks like #5428 should be close to landing?

It's as close as "I need to hit the merge button" and I just did, thank you for the reminder! I'd become caught up in other work, pardon the lateness.

@emmercm
Copy link

emmercm commented Apr 3, 2024

@tdeebswihart that's the best kind of close! I appreciate the work that you and @mattrobenolt have done here, I'm hoping this is a massive win for self-hosted MySQL.

@tdeebswihart
Copy link
Contributor

tdeebswihart commented Apr 3, 2024

As far as releasing this goes: we're happy to backport this to the 1.23 release (and make 1.23.1) but don't have the bandwidth to test this functionality ourselves right now. Would y'all (@mattrobenolt) be comfortable with testing it?

@mattrobenolt
Copy link
Author

I think @emmercm might be in a better position to test this with us. My knowledge of Temporal is only surface level. Reach out and let's see how it goes. :)

@emmercm
Copy link

emmercm commented Apr 3, 2024

No need to necessarily backport, I'm on an older Temporal version because I haven't invested the time to do the v1.21 and v1.22 schema upgrades yet - not for any specific reason. I need to spend the time upgrading before I'll be able to effectively test this.

@tdeebswihart
Copy link
Contributor

Alright, then this will be included as part of our next release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants