Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix inefficient memory usage for SHOW CREATE ALL TABLES #61127

Conversation

RichardJCai
Copy link
Contributor

@RichardJCai RichardJCai commented Feb 25, 2021

crdb_internal.show_create_all_tables is now a generator that returns
create / alter statements row by row.

The id sort is now done on only the table ids.
The topological sort is done using a map of int64 -> int64
meaning the tableMetadata is no longer kept in ram during the sort.

Memory is now proportional to the number of tables in the database.
The amount of memory used in this builtin for storing table ids is
the same as the amount of memory needed when generating the underlying
crdb_internal.create_statements. This is technically still not constrained
so we should make it clear that the command should not be run on dbs with
an excessive number of tables.

Release justification: fix for inefficient memory usage, low risk, builtin
currently not in release version of crdb

Release note (sql change): SHOW CREATE ALL TABLES is updated to be more
memory efficient, however this should still not be run on a database with
an excessive number of tables. Users should not run this on a
database with more than 10000 tables (arbitrary but tested number)

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch 2 times, most recently from 5e69021 to 710671a Compare February 25, 2021 20:02
@RichardJCai RichardJCai changed the title WIP fix to show_create_all_tables builtin sql: fix to show_create_all_tables builtin Feb 25, 2021
@RichardJCai RichardJCai changed the title sql: fix to show_create_all_tables builtin sql: fix inefficient memory usage for SHOW CREATE ALL TABLES Feb 25, 2021
@RichardJCai RichardJCai requested review from rafiss, knz and a team February 25, 2021 20:06
@RichardJCai RichardJCai marked this pull request as ready for review February 25, 2021 20:06
@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from 710671a to 901b314 Compare February 25, 2021 20:51
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @RichardJCai)


pkg/sql/sem/builtins/generator_builtins.go, line 359 at r1 (raw file):

		makeGeneratorOverload(
			tree.ArgTypes{
				{"dbName", types.String},

database_name


pkg/sql/sem/builtins/generator_builtins.go, line 1581 at r1 (raw file):

// Start implements the tree.ValueGenerator interface.
func (s *showCreateAllTablesGenerator) Start(ctx context.Context, txn *kv.Txn) error {
	ids, err := getTopologicallySortedTableIDs(ctx, s.ie, txn, s.dbName, s.timestamp)

That's still not cutting it though -- you're accumulating all the identifiers in RAM before starting anything.

What is needed is to open the internal executor iterator in Start and then consume one row from that per invocation of Next

Then close the iterator in the SRF close function.


pkg/sql/sem/builtins/generator_builtins.go, line 1613 at r1 (raw file):

		}
		createStmtStr := string(tree.MustBeDString(createStmt))
		s.curr = tree.NewDString(fmt.Sprintf("%s;", createStmtStr))

createStmtStr + ";" is more efficient and also clearer


pkg/sql/sem/builtins/generator_builtins.go, line 1621 at r1 (raw file):

			alterStmt := tree.MustBeDString(s.alterArr[s.alterArrIdx])
			s.curr = tree.NewDString(
				fmt.Sprintf("%s;", alterStmt),

ditto

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from 901b314 to 74b4744 Compare March 2, 2021 00:03
@RichardJCai RichardJCai requested a review from a team as a code owner March 2, 2021 00:03
Copy link
Contributor Author

@RichardJCai RichardJCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)


pkg/sql/sem/builtins/generator_builtins.go, line 359 at r1 (raw file):

Previously, knz (kena) wrote…

database_name

Done.


pkg/sql/sem/builtins/generator_builtins.go, line 1581 at r1 (raw file):

Previously, knz (kena) wrote…

That's still not cutting it though -- you're accumulating all the identifiers in RAM before starting anything.

What is needed is to open the internal executor iterator in Start and then consume one row from that per invocation of Next

Then close the iterator in the SRF close function.

As discussed offline, I didn't change this and it still holds the ids in ram. We use the same amount of memory to hold the ids / descriptors in ram when creating the vtable for crdb_internal.create_statements.

Added comments about the reasoning here. I did try to add some memory accounting, although I'm not sure if I'm doing it correctly.

The second commit runs into a panic with the memory accounting because of Close() not being called.

The panic happens when testing for local-vec-off due to what seems like ValuesGenerator.Close() is not being called. Not familiar with the execution part of the code base but it seems like in project_set.go ConsumerClosed is not being called for the local-vec-off.

Will keep investigating.


pkg/sql/sem/builtins/generator_builtins.go, line 1613 at r1 (raw file):

Previously, knz (kena) wrote…

createStmtStr + ";" is more efficient and also clearer

Done.


pkg/sql/sem/builtins/generator_builtins.go, line 1621 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.

@RichardJCai RichardJCai requested a review from a team March 2, 2021 00:09
@RichardJCai
Copy link
Contributor Author

Requesting review from @cockroachdb/sql-execution as well since y'all are probably more familiar with the memory accounting and may be able to read more into why local-vec-off is panicking due to the aforementioned:

The panic happens when testing for local-vec-off due to what seems like ValuesGenerator.Close() is not being called. Not familiar with the execution part of the code base but it seems like in project_set.go ConsumerClosed is not being called for the local-vec-off.

@RichardJCai RichardJCai requested a review from knz March 2, 2021 00:23
@yuzefovich
Copy link
Member

Try the following diff:

diff --git a/pkg/sql/rowexec/project_set.go b/pkg/sql/rowexec/project_set.go
index 92186759d2..6e3ba6664e 100644
--- a/pkg/sql/rowexec/project_set.go
+++ b/pkg/sql/rowexec/project_set.go
@@ -92,7 +92,13 @@ func newProjectSetProcessor(
                processorID,
                output,
                nil, /* memMonitor */
-               execinfra.ProcStateOpts{InputsToDrain: []execinfra.RowSource{ps.input}},
+               execinfra.ProcStateOpts{
+                       InputsToDrain: []execinfra.RowSource{ps.input},
+                       TrailingMetaCallback: func(context.Context) []execinfrapb.ProducerMetadata {
+                               ps.close()
+                               return nil
+                       },
+               },
        ); err != nil {
                return nil, err
        }
@@ -278,14 +284,19 @@ func (ps *projectSetProcessor) toEncDatum(d tree.Datum, colIdx int) rowenc.EncDa
        return rowenc.DatumToEncDatum(ctyp, d)
 }
 
-// ConsumerClosed is part of the RowSource interface.
-func (ps *projectSetProcessor) ConsumerClosed() {
-       for _, gen := range ps.gens {
-               gen.Close(ps.Ctx)
+func (ps *projectSetProcessor) close() {
+       if ps.InternalClose() {
+               for _, gen := range ps.gens {
+                       if gen != nil {
+                               gen.Close(ps.Ctx)
+                       }
+               }
        }
+}
 
-       // The consumer is done, Next() will not be called again.
-       ps.InternalClose()
+// ConsumerClosed is part of the RowSource interface.
+func (ps *projectSetProcessor) ConsumerClosed() {
+       ps.close()
 }
 
 // ChildCount is part of the execinfra.OpNode interface.

It fixed the panic I observed on make test PKG=./pkg/sql TESTS=TestCreateStatementType.

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from 74b4744 to 51b4582 Compare March 2, 2021 02:32
@RichardJCai
Copy link
Contributor Author

Try the following diff:

diff --git a/pkg/sql/rowexec/project_set.go b/pkg/sql/rowexec/project_set.go
index 92186759d2..6e3ba6664e 100644
--- a/pkg/sql/rowexec/project_set.go
+++ b/pkg/sql/rowexec/project_set.go
@@ -92,7 +92,13 @@ func newProjectSetProcessor(
                processorID,
                output,
                nil, /* memMonitor */
-               execinfra.ProcStateOpts{InputsToDrain: []execinfra.RowSource{ps.input}},
+               execinfra.ProcStateOpts{
+                       InputsToDrain: []execinfra.RowSource{ps.input},
+                       TrailingMetaCallback: func(context.Context) []execinfrapb.ProducerMetadata {
+                               ps.close()
+                               return nil
+                       },
+               },
        ); err != nil {
                return nil, err
        }
@@ -278,14 +284,19 @@ func (ps *projectSetProcessor) toEncDatum(d tree.Datum, colIdx int) rowenc.EncDa
        return rowenc.DatumToEncDatum(ctyp, d)
 }
 
-// ConsumerClosed is part of the RowSource interface.
-func (ps *projectSetProcessor) ConsumerClosed() {
-       for _, gen := range ps.gens {
-               gen.Close(ps.Ctx)
+func (ps *projectSetProcessor) close() {
+       if ps.InternalClose() {
+               for _, gen := range ps.gens {
+                       if gen != nil {
+                               gen.Close(ps.Ctx)
+                       }
+               }
        }
+}
 
-       // The consumer is done, Next() will not be called again.
-       ps.InternalClose()
+// ConsumerClosed is part of the RowSource interface.
+func (ps *projectSetProcessor) ConsumerClosed() {
+       ps.close()
 }
 
 // ChildCount is part of the execinfra.OpNode interface.

It fixed the panic I observed on make test PKG=./pkg/sql TESTS=TestCreateStatementType.

Thanks that seems to have fixed it.

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from 51b4582 to e8221fc Compare March 4, 2021 15:53
@RichardJCai
Copy link
Contributor Author

@yuzefovich Can you double check that the changes I made to project_set.go are valid? Thanks

Also @rafiss / @knz can I get another look at this? Specifically the memory monitoring stuff is worth a good look since I'm not familiar with it.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in project_set.go LGTM. Also, FYI Raphael is on vacation for week and a half.

Reviewed 1 of 3 files at r2, 1 of 8 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @rafiss, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/generator_builtins.go, line 1789 at r3 (raw file):

// Close implements the tree.ValueGenerator interface.
func (s *showCreateAllTablesGenerator) Close(ctx context.Context) {
	s.acc.Clear(ctx)

This should be s.acc.Close(ctx). The reasoning is that we instantiate a new memory account in the constructor, so showCreateAllTablesGenerator is its owner and should close the account once itself is being closed.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 99 at r3 (raw file):

		// Account for memory of map.
		sizeOfKeyValue := int64(unsafe.Sizeof(tid)) + int64(len(refs))*sizeOfInt64
		sizeOfMap += sizeOfKeyValue

Just a note here that using map has an overhead. At some point I did some experiments to measure that overhead and came down to approximately 64 bytes per entry in the map:

// hashAggregatorSizeOfBucketsItem is a guess on how much space (in bytes)
// each item added to 'buckets' map of hashAggregator takes up in the map
// (i.e. it is memory internal to the map, orthogonal to "key-value" pair
// that we're adding to the map).
hashAggregatorSizeOfBucketsItem = 64
.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 119 at r3 (raw file):

	// The sort relies on creating a new array for the ids.
	acc.Grow(ctx, int64(len(ids))*sizeOfInt64)

Grow will return an error when the memory budget has been exceeded, so we should its return value here and in topologicalSort.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 128 at r3 (raw file):

	// the two arrays should have the same length.
	if len(ids) != len(topologicallyOrderedIDs) {
		errors.AssertionFailedf("show_create_all_tables_builtin failed. "+

Seems like a missing return.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 152 at r3 (raw file):

		AS OF SYSTEM TIME %s
		WHERE database_name = $1 
			AND is_virtual = FALSE

nit: uneven indentation (likely spaces vs tabs)

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from e8221fc to 24c9d44 Compare March 4, 2021 17:23
Copy link
Contributor Author

@RichardJCai RichardJCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @rafiss, and @yuzefovich)


pkg/sql/sem/builtins/generator_builtins.go, line 1789 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This should be s.acc.Close(ctx). The reasoning is that we instantiate a new memory account in the constructor, so showCreateAllTablesGenerator is its owner and should close the account once itself is being closed.

Done


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 99 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Just a note here that using map has an overhead. At some point I did some experiments to measure that overhead and came down to approximately 64 bytes per entry in the map:

// hashAggregatorSizeOfBucketsItem is a guess on how much space (in bytes)
// each item added to 'buckets' map of hashAggregator takes up in the map
// (i.e. it is memory internal to the map, orthogonal to "key-value" pair
// that we're adding to the map).
hashAggregatorSizeOfBucketsItem = 64
.

Added a var for extra overhead per item, just to double check, is 64 bytes per entry correct? Seems a bit high.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 119 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Grow will return an error when the memory budget has been exceeded, so we should its return value here and in topologicalSort.

Done.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 128 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Seems like a missing return.

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @rafiss, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 99 at r3 (raw file):

Previously, RichardJCai (Richard Cai) wrote…

Added a var for extra overhead per item, just to double check, is 64 bytes per entry correct? Seems a bit high.

Yes, in my experiments 64 bytes overhead was per each k-v entry to the map, and 64 bytes doesn't include the memory of k or v. Well, maps are expensive, and that's why we implemented a vectorized hash table :)

It is possible that the number is higher than it actually is (I think I expected it to be more like 32 bytes), and 64 bytes value was making the memory account (as reported by mon.BoundAccount) to be what I was observing in the heap profile, but we tend to forget to account for some things, so possibly the "extra" overhead hides missing accounting somewhere else.

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from 24c9d44 to e974b23 Compare March 4, 2021 19:17
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool! just have some small comments

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/generator_builtins.go, line 378 at r4 (raw file):

			showCreateAllTablesGeneratorType,
			makeShowCreateAllTablesGenerator,
			`Returns a rows CREATE table statements followed by 

nit: the grammar is mistaken for "Returns a rows CREATE table statements..." missing word maybe?


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 137 at r4 (raw file):

		}
	}
	acc.Shrink(ctx, memoryUsed)

we don't actually clear out the seen map here right? why are we shrinking here? (the mem accounting stuff is also a bit unfamiliar to me, so just asking)

i wonder if it would be cleaner if topologicalSort returned a cleanup() function callback, and the code here could just call the callback rather than having to know about how the topologicalSort uses memory internally. then we wouldn't need the memoryUsed counter i think


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 146 at r4 (raw file):

			len(ids), len(topologicallyOrderedIDs))
	}
	acc.Shrink(ctx, int64(len(ids))*sizeOfInt64)

maybe there should be a comment saying what is shrinking here?


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 210 at r4 (raw file):

	tid int64,
	dependsOnIDs map[int64][]int64,
	seen map[int64]bool,

this would use less memory as a map[int64]struct{} (and using _, isPresent := seen[tid]) to check if a key is present.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 249 at r4 (raw file):

func getCreateStatement(
	ctx context.Context, ie sqlutil.InternalExecutor, txn *kv.Txn, id int64, ts string, dbName string,
) (tree.Datum, error) {

is it safer to return a (tree.DString, error) here and return an error if the Datum is the wrong type? maybe that's just extraneous though -- if the existing approach is conventional for using internalExecutor we can stick with it


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 282 at r4 (raw file):

	dbName string,
	statementType string,
) (tree.Datum, error) {

same comment about DString

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from e974b23 to 5108e57 Compare March 5, 2021 18:38
Copy link
Contributor Author

@RichardJCai RichardJCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @rafiss, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 137 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

we don't actually clear out the seen map here right? why are we shrinking here? (the mem accounting stuff is also a bit unfamiliar to me, so just asking)

i wonder if it would be cleaner if topologicalSort returned a cleanup() function callback, and the code here could just call the callback rather than having to know about how the topologicalSort uses memory internally. then we wouldn't need the memoryUsed counter i think

I'm working under the assumption here that the seen map gets GC'ed after the call the topologicalSort above, so I thought it would be alright to shrink it after it gets GC'ed. Let me know your thoughts on this.

IMO adding a cleanup() function callback wouldn't be clean because of how the function calls itself recursively. If we added another function that called the recursive function that returned a callback, I think the recursive function for topologicalSort would still have to return or use a int ptr to keep track of how much memory it uses.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 146 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

maybe there should be a comment saying what is shrinking here?

Done.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 210 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

this would use less memory as a map[int64]struct{} (and using _, isPresent := seen[tid]) to check if a key is present.

Good call, done.


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 249 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

is it safer to return a (tree.DString, error) here and return an error if the Datum is the wrong type? maybe that's just extraneous though -- if the existing approach is conventional for using internalExecutor we can stick with it

I did some digging into this, it seems that the conventional approach for using IE is to use tree.MustBeDString and panic when it isn't the case. Not sure what the reasoning is here for panicking instead of returning an error though

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one final comment then lgtm!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 137 at r4 (raw file):

Previously, RichardJCai (Richard Cai) wrote…

I'm working under the assumption here that the seen map gets GC'ed after the call the topologicalSort above, so I thought it would be alright to shrink it after it gets GC'ed. Let me know your thoughts on this.

IMO adding a cleanup() function callback wouldn't be clean because of how the function calls itself recursively. If we added another function that called the recursive function that returned a callback, I think the recursive function for topologicalSort would still have to return or use a int ptr to keep track of how much memory it uses.

well i think technically the seen map is on the heap until the end of this getTopologicallySortedTableIDs function. if you want the memory to be available to GC any earlier than that, you'd need to do seen = make(map[int64]struct{}) again to remove the reference to the old map.

also, i just realized, why do we need memoryUsed? can't we just look at the size of seen after the topological sort is done and shrink by that amount?

Copy link
Contributor Author

@RichardJCai RichardJCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 137 at r4 (raw file):

technically

Oops yep you're right for point one, I'll update seen to nil to remove the old reference.

For the second point on memoryUsed, the amount we're growing for the map is not actually the exact amount the map uses.

The reason being, we want to allocate and track memory for each entry we are adding when we add it to the map, however we can't exactly estimate the overhead so we roughly estimate it. @yuzefovich came up with a 64 byte overhead per entry.

When we shrink we needa shrink by the exact amount we allocated which will not be the exact amount we allocated to seen.

Let me know if this logic makes sense to you.

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from 5108e57 to b15e9cb Compare March 5, 2021 19:12
Copy link
Contributor Author

@RichardJCai RichardJCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 137 at r4 (raw file):

Previously, RichardJCai (Richard Cai) wrote…

technically

Oops yep you're right for point one, I'll update seen to nil to remove the old reference.

For the second point on memoryUsed, the amount we're growing for the map is not actually the exact amount the map uses.

The reason being, we want to allocate and track memory for each entry we are adding when we add it to the map, however we can't exactly estimate the overhead so we roughly estimate it. @yuzefovich came up with a 64 byte overhead per entry.

When we shrink we needa shrink by the exact amount we allocated which will not be the exact amount we allocated to seen.

Let me know if this logic makes sense to you.

Nvm I misunderstood what you meant, updated.

@RichardJCai RichardJCai requested a review from rafiss March 5, 2021 19:56
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on comments but lgtm! nice stuff

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 209 at r5 (raw file):

// be enough.
// memoryUsed is used to determine how much memory is used by the seen map.
// memoryUsed should be Shrunk after the sort is finished.

nit: remove the memoryUsed comment now

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from b15e9cb to 87cc164 Compare March 5, 2021 20:42
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 138 at r6 (raw file):

	// Clear memory used by the seen map.
	seen = nil

Not sure if you're done updating this stuff, but len(seen) will return 0 after seen = nil.

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from 87cc164 to f84f70a Compare March 5, 2021 21:06
Copy link
Contributor Author

@RichardJCai RichardJCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 138 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Not sure if you're done updating this stuff, but len(seen) will return 0 after seen = nil.

Oops my brain turned off for a moment

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 140 at r7 (raw file):

	sizeOfSeen := len(seen)
	seen = nil
	acc.Shrink(ctx, int64(sizeOfSeen)*sizeOfInt64)

not quite sure, but should this be acc.Shrink(ctx, int64(sizeOfSeen)*(sizeOfInt64+mapEntryOverhead))?

since on line 229 we are growing by sizeOfInt64+mapEntryOverhead for each entry that is added.

@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from f84f70a to 7e5e400 Compare March 6, 2021 20:46
Copy link
Contributor Author

@RichardJCai RichardJCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @rafiss, @RichardJCai, and @yuzefovich)


pkg/sql/sem/builtins/show_create_all_tables_builtin.go, line 140 at r7 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

not quite sure, but should this be acc.Shrink(ctx, int64(sizeOfSeen)*(sizeOfInt64+mapEntryOverhead))?

since on line 229 we are growing by sizeOfInt64+mapEntryOverhead for each entry that is added.

yep it should🤦

crdb_internal.show_create_all_tables is now a generator that returns
create / alter statements row by row.

The id sort is now done on only the table ids.
The topological sort is done using a map of int64 -> int64
meaning the tableMetadata is no longer kept in ram during the sort.

Memory is now proportional to the number of tables in the database.
The amount of memory used in this builtin for storing table ids is
the same as the amount of memory needed when generating the underlying
crdb_internal.create_statements. This is technically still not constrained
so we should make it clear that the command should not be run on dbs with
an excessive number of tables.

Release justification: fix for inefficient memory usage, low risk, builtin
currently not in release version of crdb

Release note (sql change): SHOW CREATE ALL TABLES is updated to be more
memory efficient, however this should still not be run on a database with
an excessive number of tables. Users should not run this on a
database with more than 10000 tables (arbitrary but tested number)..
Release justification: None

Release note: None
@RichardJCai RichardJCai force-pushed the fix_show_create_all_tables_builtin_02252021 branch from 7e5e400 to 6141e97 Compare March 7, 2021 02:29
@RichardJCai
Copy link
Contributor Author

Thanks for the reviews!

bors r=rafiss

@craig
Copy link
Contributor

craig bot commented Mar 7, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants