Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload/movr: better fake data generation #38203

Merged
merged 1 commit into from
Jun 18, 2019

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Jun 17, 2019

This includes a partial port of the faker python library to the
workload/faker package.

Release note: None

@danhhz danhhz requested a review from tbg June 17, 2019 14:43
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@danhhz
Copy link
Contributor Author

danhhz commented Jun 17, 2019

@tbg feels like the faker stuff could use some tests, any opinions on what would be most useful? I think I'd mostly like to verify the lazy initialization (the movr workload is built into the cockroach binary and I want to avoid the allocs at startup if you're not using the workload), but not entirely sure how to do that

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

For testing, it'd be good to get some very basic coverage (i.e. just set up a deterministic rng and call each gen 3 times, verifying the outcome). No good idea about testing the absence of allocs. I'd just avoid any global vars in that package and put some comments in the right place (dict.go?) mentioning why things are set up without globals. That is, if there isn't a good reason to have that singleton-init'ed global that I'm unaware of.

Reviewed 6 of 6 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz)


pkg/workload/faker/address.go, line 35 at r1 (raw file):

func (f *addressFaker) buildingNumber(rng *rand.Rand) string {
	return strconv.Itoa(randInt(rng, 1000, 99999))

Interesting that you don't dip below 1k.


pkg/workload/faker/faker.go, line 27 at r1 (raw file):

var faker Faker

// GetFaker lazily initializes a singleton Faker and returns it.

Is it worth making it a singleton? It does hold on to a bunch of data but not very much. Are we calling this a lot? I sort of figured we'd only need one of them during workload (per node), but maybe I'm missing a bunch of context.


pkg/workload/faker/lorem.go, line 59 at r1 (raw file):

func firstToUpper(s string) string {
	isFirst := true

I didn't see you actually use any unicode testdata, but I suppose that's the "correct" way of upper-casing the first letter of a string.


pkg/workload/faker/lorem.go, line 77 at r1 (raw file):

	f := loremFaker{}
	f.words = makeWeightedEntries(
		`a`, 1.0,

Mind pulling those dicts out into a separate file. For example dict.cc which contains a handful of methods such as

func words() *weightedEntries {
   return makeWeightedEntries(/* ... */)
}

pkg/workload/faker/name.go, line 530 at r1 (raw file):

		`Dalton`, 0.000615113,
		`Damon`, 0.00034308,
		`Dan`, 0.000388496,

Who would've thought

Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)


pkg/workload/faker/address.go, line 35 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Interesting that you don't dip below 1k.

Just porting the faker logic


pkg/workload/faker/faker.go, line 27 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Is it worth making it a singleton? It does hold on to a bunch of data but not very much. Are we calling this a lot? I sort of figured we'd only need one of them during workload (per node), but maybe I'm missing a bunch of context.

Good call, it's really not that much data


pkg/workload/faker/lorem.go, line 59 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I didn't see you actually use any unicode testdata, but I suppose that's the "correct" way of upper-casing the first letter of a string.

Added tests


pkg/workload/faker/lorem.go, line 77 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Mind pulling those dicts out into a separate file. For example dict.cc which contains a handful of methods such as

func words() *weightedEntries {
   return makeWeightedEntries(/* ... */)
}

Done except I didn't port it to c++ : - p


pkg/workload/faker/name.go, line 530 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Who would've thought

my last name is in there too. looks like if you generate O(10 million) users you'd expect to get me at some point 😬

This includes a partial port of the faker python library to the
workload/faker package.

Release note: None
@danhhz
Copy link
Contributor Author

danhhz commented Jun 18, 2019

bors r=tbg

craig bot pushed a commit that referenced this pull request Jun 18, 2019
38203: workload/movr: better fake data generation r=tbg a=danhhz

This includes a partial port of the faker python library to the
workload/faker package.

Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jun 18, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants