Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload: fix TPC-H generator to generate correct values for p_name and ps_supplycost #93814

Merged
merged 2 commits into from
Dec 19, 2022

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Dec 16, 2022

workload: fix TPC-H generator to generate correct values for p_name

The TPC-H generator creates values for p_name in the part table by selecting 5 elements from a list of words called randPartNames. It is supposed to select 5 random unique elements from the full list, but prior to this commit, it only selected a permutation of the first 5 elements. This commit fixes the logic and adds a unit test.

Epic: None
Release note: None

workload: fix TPC-H generator value of ps_supplycost

Prior to this commit, the TPC-H generator was generating
a value between 0 and 10 for ps_supplycost in the partsupp
table. However, the spec calls for a random value in the range
[1.00 .. 1,000.00]. This commit fixes the oversight.

Epic: None
Release note: None

@rytaft rytaft requested review from jordanlewis and a team December 16, 2022 21:04
@rytaft rytaft requested a review from a team as a code owner December 16, 2022 21:04
@rytaft rytaft requested review from smg260 and renatolabs and removed request for a team December 16, 2022 21:04
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rytaft rytaft changed the title workload: fix TPC-H generator to generate correct values for p_name workload: fix TPC-H generator to generate correct values for p_name and ps_supplycost Dec 16, 2022
@@ -213,7 +213,7 @@ func (w *tpch) tpchPartSuppInitialRowBatch(
// PS_AVAILQTY random value [1 .. 9,999].
availQtyCol[i] = int16(randInt(rng, 1, 9999))
// PS_SUPPLYCOST random value [1.00 .. 1,000.00].
supplyCostCol[i] = float64(randFloat(rng, 1, 1000, 100))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems out of sync with the comment - should the comment be 10,000 or should the value stay 1,000? Or does this function do something odd?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is correct. randFloat does do something a bit odd:

func randFloat(rng *rand.Rand, x, y, shift int) float32 {
	return float32(randInt(rng, x, y)) / float32(shift)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that I needed to change the first value too -- done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

namePerm[i] = namePerm[j]
namePerm[j] = i
j := rng.Intn(len(namePerm))
namePerm[i], namePerm[j] = namePerm[j], namePerm[i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦
How did you notice this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed some weird values while working on a stats PR over a year ago (this has been on my todo list since then... haha)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, woohoo!

@@ -213,7 +213,7 @@ func (w *tpch) tpchPartSuppInitialRowBatch(
// PS_AVAILQTY random value [1 .. 9,999].
availQtyCol[i] = int16(randInt(rng, 1, 9999))
// PS_SUPPLYCOST random value [1.00 .. 1,000.00].
supplyCostCol[i] = float64(randFloat(rng, 1, 1000, 100))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

namePerm[i] = namePerm[j]
namePerm[j] = i
j := rng.Intn(len(namePerm))
namePerm[i], namePerm[j] = namePerm[j], namePerm[i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, woohoo!

The TPC-H generator creates values for p_name in the part table by selecting
5 elements from a list of words called randPartNames. It is supposed to select
5 random unique elements from the full list, but prior to this commit, it only
selected a permutation of the first 5 elements. This commit fixes the logic and
adds a unit test.

Epic: None
Release note: None
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs and @smg260)

@craig
Copy link
Contributor

craig bot commented Dec 17, 2022

Build failed (retrying...):

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @rytaft, and @smg260)


pkg/workload/tpch/random_test.go line 46 at r4 (raw file):

	// We can't guarantee much about the global distribution of names,
	// but we should make sure that we're not always using the same 5

I think we could safely make a stronger assertion. 92 choose 5 gives 4,9177,128 unique combinations. After 100 shuffles, the probability of seeing the same combination is astronomically low.


pkg/workload/tpch/random_test.go line 49 at r4 (raw file):

	// names. Run up to 100 times before failing.
	for i := 0; i < 100; i++ {
		if len(seen) > 5 {

nPartNames instead of the magic number.

@craig
Copy link
Contributor

craig bot commented Dec 17, 2022

Build failed:

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @renatolabs, @rytaft, and @smg260)


pkg/workload/tpch/random.go line 133 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

haha, woohoo!

I remember talking to Lina about this bug (which she shared with me) at some point this Summer, but we forgot to follow up.

Prior to this commit, the TPC-H generator was generating
a value between 0 and 10 for ps_supplycost in the partsupp
table. However, the spec calls for a random value in the range
[1.00 .. 1,000.00]. This commit fixes the oversight.

Epic: None
Release note: None
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! Last failure is an unrelated flake: #93460.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @renatolabs, @smg260, and @srosenberg)


pkg/workload/tpch/random.go line 133 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I remember talking to Lina about this bug (which she shared with me) at some point this Summer, but we forgot to follow up.

👍


pkg/workload/tpch/random_test.go line 46 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

I think we could safely make a stronger assertion. 92 choose 5 gives 4,9177,128 unique combinations. After 100 shuffles, the probability of seeing the same combination is astronomically low.

Done.


pkg/workload/tpch/random_test.go line 49 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

nPartNames instead of the magic number.

Done.

@craig
Copy link
Contributor

craig bot commented Dec 19, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 19, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 19, 2022

Build succeeded:

@blathers-crl
Copy link

blathers-crl bot commented Dec 19, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 7c9dd1b to blathers/backport-release-22.1-93814: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants