-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload: add querylog workload generator #35672
Conversation
This seems similar in spirit to pkg/workload/querybench, could they be merged? |
Initially, we thought of embedding this |
Can you elaborate on why? I haven't had a chance to dig into the nuance between them, so I may very well be missing something, but if it's easy to have one of these that reads queries from a file and issues them, I definitely prefer that. |
Here is what we have at the moment:
I said it was "relatively easy" to embed the former into the latter because However, that file format is not "query log" per se, so at least the way I see it it no longer would be query-log based workload. We would be creating a bigger, less comprehensible code that attempts to address too many goals at once. I think that if we get other query logs from other users/customers, we will need to "teach" |
Okay, that makes sense. This conversation makes me realize that long-term we should probably be thinking of a more generic/powerful input format (something like https://github.com/memsql/dbbench maybe) and then simple tooling to convert all these other query text file formats into that one format. But of course I'm not going to ask you to pick that off in this PR. Carry on, thanks for answering my questions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I am guessing this will need continued tweaking as we see more complicated and differing query files, but let's merge it and edit as necessary.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @danhhz, @jordanlewis, and @yuzefovich)
pkg/workload/querylog/querylog.go, line 110 at r1 (raw file):
`the placeholders. Say it is 0.9, then with 0.1 probability the values will be generated randomly.`) g.flags.Int64Var(&g.seed, `seed`, 1, `Random number generator seed.`) g.flags.Int64Var(&g.stmtTimeoutSeconds, `statement-timeout`, 0, `Sets session's statement_timeout setting (in seconds).'`)
This would be a great variable to move to the main workload generator as every workload could probably use such a thing.
pkg/workload/querylog/querylog.go, line 236 at r1 (raw file):
if err != nil { if w.config.verbose { log.Infof(ctx, "Encountered an error %s while deducing column names corresponding to the placeholders\n", err.Error())
nit: You don't need an \n in Infof messages, and same throughout.
6babab9
to
7196345
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. I believe it will be evolving constantly to support different kinds of queries.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @danhhz, and @jordanlewis)
pkg/workload/querylog/querylog.go, line 110 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
This would be a great variable to move to the main workload generator as every workload could probably use such a thing.
I attempted to do that, but it seems tricky - statement_timeout
is a session variable, so it must be set on each connection, and it seems to me that with the current implementation of workload only workers (those in QueryLoad.WorkerFns
) will always have access to that, so I currently don't see an easy way of moving this parameter to the main generator. Possibly I'm missing something though.
pkg/workload/querylog/querylog.go, line 236 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
nit: You don't need an \n in Infof messages, and same throughout.
Thanks, done.
If it wasn't easy to fix the thing with the session timeout, don't worry about it - perhaps @danhhz could weigh in on a better way to do that. But I would love to see this integrated into all workloads in a future commit, as it would make it much easier to test client cancellation, which is a common real-world workload type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of these days I want to move the workload run impl to be a set of composable pieces that generators can fit together to do what they want. Maybe it'd be possible in that world, but not now.
FYI, this isn't guaranteed to work as written. Session variables are basically impossible to use with go's sql abstraction because there's no hook to run something when the connection pool opens a new connection. If you want to be sure this flag works, either set it on the connection string, or use something like pgx and store a pgx.Conn
on each worker.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @danhhz, and @jordanlewis)
Thanks @danhhz. Could you take a quick look whether I'm setting up and using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IANAPgxExpert but seems fine mod one comment
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @jordanlewis, and @yuzefovich)
pkg/workload/querylog/querylog.go, line 196 at r2 (raw file):
} connCfg, err := pgx.ParseConnectionString(strings.Join(urls, ` `))
Is joining with
the right thing to do here? Pretty sure this works above because of https://github.com/cockroachdb/cockroach/blob/43bfafb30e90b6873bbbe78c9bc92dd7bc4e0ce7/pkg/workload/driver.go, which you'd have to replicate here. Could also leave it as a TODO for now and error if len(urls) != 1
pkg/workload/querylog/querylog.go, line 273 at r2 (raw file):
continue } // TODO(yuzefovich): do we care about the returned rows?
Not 100% sure on this, but I think we stop incrementally evaluating the query if the results buffer fills up (and is not emptied). Either way, probably want to iterate the query results to make sure it's doing everything it normally would.
Adds a querylog workload generator that generates and executes the queries based on the provided query log (according to the frequency of the queries found in the log). It assumes that the corresponding data has already been loaded and attempts to simulate the workload that was captured by the log. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @danhhz, and @jordanlewis)
pkg/workload/querylog/querylog.go, line 196 at r2 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
Is joining with
the right thing to do here? Pretty sure this works above because of https://github.com/cockroachdb/cockroach/blob/43bfafb30e90b6873bbbe78c9bc92dd7bc4e0ce7/pkg/workload/driver.go, which you'd have to replicate here. Could also leave it as a TODO for now and error if
len(urls) != 1
Good point, I was blindly following the pattern. Left a TODO.
pkg/workload/querylog/querylog.go, line 273 at r2 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
Not 100% sure on this, but I think we stop incrementally evaluating the query if the results buffer fills up (and is not emptied). Either way, probably want to iterate the query results to make sure it's doing everything it normally would.
Thanks, done.
Thanks for the reviews and suggestions! I'll go ahead and merge this, and we'll iterate on it as necessary. bors r+ |
35672: workload: add querylog workload generator r=yuzefovich a=yuzefovich Adds a querylog workload generator that generates and executes the queries based on the provided query log (according to the frequency of the queries found in the log). It assumes that the corresponding data has already been loaded and attempts to simulate the workload that was captured by the log. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>
Build succeeded |
Adds a querylog workload generator that generates and executes the
queries based on the provided query log (according to the frequency
of the queries found in the log). It assumes that the corresponding
data has already been loaded and attempts to simulate the workload
that was captured by the log.
Release note: None