-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload: fix TPC-H generator to generate correct values for p_name and ps_supplycost #93814
Conversation
@@ -213,7 +213,7 @@ func (w *tpch) tpchPartSuppInitialRowBatch( | |||
// PS_AVAILQTY random value [1 .. 9,999]. | |||
availQtyCol[i] = int16(randInt(rng, 1, 9999)) | |||
// PS_SUPPLYCOST random value [1.00 .. 1,000.00]. | |||
supplyCostCol[i] = float64(randFloat(rng, 1, 1000, 100)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems out of sync with the comment - should the comment be 10,000 or should the value stay 1,000? Or does this function do something odd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is correct. randFloat
does do something a bit odd:
func randFloat(rng *rand.Rand, x, y, shift int) float32 {
return float32(randInt(rng, x, y)) / float32(shift)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that I needed to change the first value too -- done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
namePerm[i] = namePerm[j] | ||
namePerm[j] = i | ||
j := rng.Intn(len(namePerm)) | ||
namePerm[i], namePerm[j] = namePerm[j], namePerm[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦
How did you notice this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed some weird values while working on a stats PR over a year ago (this has been on my todo list since then... haha)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, woohoo!
@@ -213,7 +213,7 @@ func (w *tpch) tpchPartSuppInitialRowBatch( | |||
// PS_AVAILQTY random value [1 .. 9,999]. | |||
availQtyCol[i] = int16(randInt(rng, 1, 9999)) | |||
// PS_SUPPLYCOST random value [1.00 .. 1,000.00]. | |||
supplyCostCol[i] = float64(randFloat(rng, 1, 1000, 100)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
namePerm[i] = namePerm[j] | ||
namePerm[j] = i | ||
j := rng.Intn(len(namePerm)) | ||
namePerm[i], namePerm[j] = namePerm[j], namePerm[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, woohoo!
The TPC-H generator creates values for p_name in the part table by selecting 5 elements from a list of words called randPartNames. It is supposed to select 5 random unique elements from the full list, but prior to this commit, it only selected a permutation of the first 5 elements. This commit fixes the logic and adds a unit test. Epic: None Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs and @smg260)
Build failed (retrying...): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @rytaft, and @smg260)
pkg/workload/tpch/random_test.go
line 46 at r4 (raw file):
// We can't guarantee much about the global distribution of names, // but we should make sure that we're not always using the same 5
I think we could safely make a stronger assertion. 92 choose 5 gives 4,9177,128 unique combinations. After 100 shuffles, the probability of seeing the same combination is astronomically low.
pkg/workload/tpch/random_test.go
line 49 at r4 (raw file):
// names. Run up to 100 times before failing. for i := 0; i < 100; i++ { if len(seen) > 5 {
nPartNames
instead of the magic number.
Build failed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @renatolabs, @rytaft, and @smg260)
pkg/workload/tpch/random.go
line 133 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
haha, woohoo!
I remember talking to Lina about this bug (which she shared with me) at some point this Summer, but we forgot to follow up.
Prior to this commit, the TPC-H generator was generating a value between 0 and 10 for ps_supplycost in the partsupp table. However, the spec calls for a random value in the range [1.00 .. 1,000.00]. This commit fixes the oversight. Epic: None Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs! Last failure is an unrelated flake: #93460.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @renatolabs, @smg260, and @srosenberg)
pkg/workload/tpch/random.go
line 133 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I remember talking to Lina about this bug (which she shared with me) at some point this Summer, but we forgot to follow up.
👍
pkg/workload/tpch/random_test.go
line 46 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
I think we could safely make a stronger assertion. 92 choose 5 gives 4,9177,128 unique combinations. After 100 shuffles, the probability of seeing the same combination is astronomically low.
Done.
pkg/workload/tpch/random_test.go
line 49 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
nPartNames
instead of the magic number.
Done.
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 7c9dd1b to blathers/backport-release-22.1-93814: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
workload: fix TPC-H generator to generate correct values for p_name
The TPC-H generator creates values for
p_name
in thepart
table by selecting 5 elements from a list of words calledrandPartNames
. It is supposed to select 5 random unique elements from the full list, but prior to this commit, it only selected a permutation of the first 5 elements. This commit fixes the logic and adds a unit test.Epic: None
Release note: None
workload: fix TPC-H generator value of ps_supplycost
Prior to this commit, the TPC-H generator was generating
a value between 0 and 10 for
ps_supplycost
in thepartsupp
table. However, the spec calls for a random value in the range
[1.00 .. 1,000.00]
. This commit fixes the oversight.Epic: None
Release note: None