-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload/movr: better fake data generation #38203
Conversation
@tbg feels like the faker stuff could use some tests, any opinions on what would be most useful? I think I'd mostly like to verify the lazy initialization (the movr workload is built into the cockroach binary and I want to avoid the allocs at startup if you're not using the workload), but not entirely sure how to do that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing, it'd be good to get some very basic coverage (i.e. just set up a deterministic rng and call each gen 3 times, verifying the outcome). No good idea about testing the absence of allocs. I'd just avoid any global vars in that package and put some comments in the right place (dict.go
?) mentioning why things are set up without globals. That is, if there isn't a good reason to have that singleton-init'ed global that I'm unaware of.
Reviewed 6 of 6 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @danhhz)
pkg/workload/faker/address.go, line 35 at r1 (raw file):
func (f *addressFaker) buildingNumber(rng *rand.Rand) string { return strconv.Itoa(randInt(rng, 1000, 99999))
Interesting that you don't dip below 1k.
pkg/workload/faker/faker.go, line 27 at r1 (raw file):
var faker Faker // GetFaker lazily initializes a singleton Faker and returns it.
Is it worth making it a singleton? It does hold on to a bunch of data but not very much. Are we calling this a lot? I sort of figured we'd only need one of them during workload (per node), but maybe I'm missing a bunch of context.
pkg/workload/faker/lorem.go, line 59 at r1 (raw file):
func firstToUpper(s string) string { isFirst := true
I didn't see you actually use any unicode testdata, but I suppose that's the "correct" way of upper-casing the first letter of a string
.
pkg/workload/faker/lorem.go, line 77 at r1 (raw file):
f := loremFaker{} f.words = makeWeightedEntries( `a`, 1.0,
Mind pulling those dicts out into a separate file. For example dict.cc
which contains a handful of methods such as
func words() *weightedEntries {
return makeWeightedEntries(/* ... */)
}
pkg/workload/faker/name.go, line 530 at r1 (raw file):
`Dalton`, 0.000615113, `Damon`, 0.00034308, `Dan`, 0.000388496,
Who would've thought
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/workload/faker/address.go, line 35 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Interesting that you don't dip below 1k.
Just porting the faker logic
pkg/workload/faker/faker.go, line 27 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Is it worth making it a singleton? It does hold on to a bunch of data but not very much. Are we calling this a lot? I sort of figured we'd only need one of them during workload (per node), but maybe I'm missing a bunch of context.
Good call, it's really not that much data
pkg/workload/faker/lorem.go, line 59 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I didn't see you actually use any unicode testdata, but I suppose that's the "correct" way of upper-casing the first letter of a
string
.
Added tests
pkg/workload/faker/lorem.go, line 77 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Mind pulling those dicts out into a separate file. For example
dict.cc
which contains a handful of methods such asfunc words() *weightedEntries { return makeWeightedEntries(/* ... */) }
Done except I didn't port it to c++ : - p
pkg/workload/faker/name.go, line 530 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Who would've thought
my last name is in there too. looks like if you generate O(10 million) users you'd expect to get me at some point 😬
This includes a partial port of the faker python library to the workload/faker package. Release note: None
bors r=tbg |
38203: workload/movr: better fake data generation r=tbg a=danhhz This includes a partial port of the faker python library to the workload/faker package. Release note: None Co-authored-by: Daniel Harrison <[email protected]>
Build succeeded |
This includes a partial port of the faker python library to the
workload/faker package.
Release note: None