-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
randgen: add PopulateTableWithRandData #75677
Conversation
5ba2007
to
bf834a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks useful! Just some drive-by comments.
bf834a2
to
dd227d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 6 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msbutler and @stevendanna)
bazel-out, line 1 at r3 (raw file):
/private/var/tmp/_bazel_michaelbutler/13ba282fa2b19539d0c969c1113bb37c/execroot/cockroach/bazel-out
Is this file necessary, or did you commit it by error?
pkg/sql/randgen/schema.go, line 221 at r3 (raw file):
if colTypes[j].Family() == types.OidFamily { // choose 0 or 1 as the OID value as the value must be less than the // number of tables in the database.
As far as I can tell, there is only a limit on REGTYPE, not any other types in the OID family (OID, REGCLASS, REGPROC, REGPROCEDURE, REGROLE, REGNAMESPACE).
pkg/sql/randgen/schema.go, line 231 at r3 (raw file):
} valBuilder.WriteString(")") return valBuilder
nit: might as well return a string
instead of the builder
pkg/sql/randgen/schema.go, line 237 at r3 (raw file):
func PopulateTableWithRandData(rng *rand.Rand, db *gosql.DB, tableName string, numRows int) error { var ignored, createStmtSQL string res := db.QueryRow(fmt.Sprintf("SHOW CREATE TABLE %s", tableName))
nit: You don't need ignored
if you change the query to SELECT create_statement FROM [SHOW CREATE TABLE %]
.
pkg/sql/randgen/schema.go, line 284 at r3 (raw file):
} else { nullable = append(nullable, false) }
nit: this can be simplified to nullable = append(nullable col.Nullable.Nullability == tree.Null)
.
pkg/sql/randgen/schema.go, line 295 at r3 (raw file):
fail int // number of failed insert statements ) maxTries := numRows * 10
I'm a little worried that trying only 10 times per row will cause the tests to flake. Have you tried stressing the test to see how likely it is to fail?
pkg/sql/randgen/schema.go, line 309 at r3 (raw file):
if fail > maxTries { // This could mean PopulateTableWithRandomData or RandDatum couldn't // handle this table's schema. consider filing a bug.
nit: consider => Consider
pkg/sql/randgen/schema.go, line 310 at r3 (raw file):
// This could mean PopulateTableWithRandomData or RandDatum couldn't // handle this table's schema. consider filing a bug. return errors.Errorf(`could not populate %d rows for table with schema \n \t %s \n--
This function isn't being used yet, so I'm not sure what the usage pattern will be, but I'm wondering if it would make more sense to return no error in this case. Perhaps a better API would be something like:
func PopulateTableWithRandData(
rng *rand.Rand,
db *gosql.DB,
tableName string,
numInserts int,
) (numRowsInserted int, err error)
The function would attempt exactly numInserts
INSERT statements, none of which are guaranteed to succeed. It would return the number of rows successfully inserted. An error is returned if no inserts could be attempted (the table doesn't exist, or the create statement failed to parse). I think this would make the behavior of this function more predictable and let callers decide how many inserts they want to perform and how to proceed given the number of successfully inserted rows.
dd227d8
to
f3a0ce3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @stevendanna)
pkg/sql/randgen/schema.go, line 221 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
As far as I can tell, there is only a limit on REGTYPE, not any other types in the OID family (OID, REGCLASS, REGPROC, REGPROCEDURE, REGROLE, REGNAMESPACE).
Done.
pkg/sql/randgen/schema.go, line 295 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I'm a little worried that trying only 10 times per row will cause the tests to flake. Have you tried stressing the test to see how likely it is to fail?
I got it to pass the stress test a few times with a multiplier of 10, though I'll bump it to 100. In a previous iteration of this test, I was hitting OOM errors on the stress test, which compelled me to lower the multiplier, but a lot has changed since I was OOMing.
pkg/sql/randgen/schema.go, line 310 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This function isn't being used yet, so I'm not sure what the usage pattern will be, but I'm wondering if it would make more sense to return no error in this case. Perhaps a better API would be something like:
func PopulateTableWithRandData( rng *rand.Rand, db *gosql.DB, tableName string, numInserts int, ) (numRowsInserted int, err error)The function would attempt exactly
numInserts
INSERT statements, none of which are guaranteed to succeed. It would return the number of rows successfully inserted. An error is returned if no inserts could be attempted (the table doesn't exist, or the create statement failed to parse). I think this would make the behavior of this function more predictable and let callers decide how many inserts they want to perform and how to proceed given the number of successfully inserted rows.
I like this idea! Then, schema_test will pass if at least one of the 10 randomly generated tables had a numRowsInserted>0.
bazel-out, line 1 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is this file necessary, or did you commit it by error?
whoops. removed.
PopulateRandTable populates the caller's table with random data. This helper function aims to make it easier for engineers to develop randomized tests that leverage randgen / sqlsmith. I considered adding random insert statements into sqlsmith's randtables setup, however the high probably of a faulty insert statement would cause the whole setup to fail. See cockroachdb#75159 In the future, I'd like to develop a new helper function PopulateDatabaseWithRandData which calls PopulateTableWithRandData on each table in the order of the fk dependency graph. Informs cockroachdb#72345 Release note: None
f3a0ce3
to
6236b0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @stevendanna)
TFTR!!! bors r=mgartner |
Build failed: |
bors r=mgartner |
Build succeeded: |
PopulateTableWithRandData populates the caller's table with random data. This helper
function aims to make it easier for engineers to develop randomized tests that
leverage randgen / sqlsmith.
Informs #72345
Release note: None