
importccl: move stub stats logic outside publish txn #70447

Merged
Merged 1 commit into cockroachdb:master on Sep 21, 2021

Conversation

adityamaru
Contributor

This change pulls the logic related to inserting table statistic
stubs for new tables created during the import, out of the txn
closure that sets the tables to public from offline.
This was motivated by the failure mode observed in #70410 where
we failed to write the stats stub for a particular table and ended
up with a descriptor published in the offline state.

Fixes: #70410

Release note: None
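The shape of the change can be sketched as follows. This is a minimal, self-contained illustration with stand-in types, not the actual CockroachDB descriptor or executor API: the publish closure only flips table state, and the best-effort stats-stub write happens afterwards, so a (simulated) stats failure can no longer interfere with publishing.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative stand-ins only: the real code uses CockroachDB's descriptor
// collections and internal executor, not these types.
type table struct {
	name  string
	state string
}

// publishTables is the txn closure: after this change it only flips table
// descriptors from offline to public, nothing else.
func publishTables(tables []*table) error {
	for _, t := range tables {
		t.state = "public"
	}
	return nil
}

// insertStatsStubs runs outside the publish txn. It is best effort: a
// failure (simulated here) must not affect the already-published tables.
func insertStatsStubs(tables []*table) error {
	return errors.New("simulated stats-stub write failure")
}

func main() {
	tables := []*table{{name: "t1", state: "offline"}}
	if err := publishTables(tables); err != nil {
		panic(err)
	}
	if err := insertStatsStubs(tables); err != nil {
		// Swallow the error, as the import job does for stats stubs.
		fmt.Println("stats stubs skipped:", err)
	}
	fmt.Println(tables[0].state)
}
```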

@adityamaru adityamaru requested review from dt, ajwerner, michae2 and a team September 20, 2021 21:52
@cockroach-teamcity
Member

This change is Reviewable

@adityamaru
Contributor Author

adityamaru commented Sep 20, 2021

I don't have a clear RCA for the associated issue yet, so I'm hoping that this proposed solution might get someone to see something I missed. Regardless, I feel more comfortable keeping the table publishing closure as small as possible, especially since the stats stubbing is best effort, and swallowing the error seems to be tickling something.

Another open question: does this refactor mean we should add a `details.StatsInserted` bool to prevent us from inserting multiple stubs across resumptions? Does it matter?

@adityamaru adityamaru removed the request for review from a team September 20, 2021 21:54
Contributor

@ajwerner ajwerner left a comment


:/ I want to reach enlightenment on this.

Nevertheless, the change itself :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @michae2)


pkg/ccl/importccl/import_stmt.go, line 2371 at r1 (raw file):

```go
					statistic.NullCount = nullCount
				}
				err = stats.InsertNewStats(ctx, execCfg.InternalExecutor, nil /* txn */, statistics)
```

when I look at this, I feel sad about how sequential it is. Hope this isn't a global cluster with hecka columns.

Contributor

@ajwerner ajwerner left a comment


wait, actually, there's probably a reason it was in the txn. We probably want the stats published at or before we make the tables public, no? If we're going to do it, maybe do it before?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @michae2)

Collaborator

@michae2 michae2 left a comment


Thanks for looking at this. Original motivation for same txn was to write stub stats atomically with table publishing, but before publishing would also be fine. After publishing would leave a gap during which queries would be planned without stats. (Granted, it's best-effort anyway, so not a huge issue, but might as well not have a gap.)

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt)
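The ordering argument above (write stats at or before publishing, to avoid a window where a public table has no stats) can be sketched like this. Types and method names are illustrative stand-ins, not the real catalog API:

```go
package main

import "fmt"

// Stand-in catalog; fields and methods are illustrative, not CockroachDB's API.
type catalog struct {
	stats     map[string]bool
	published map[string]bool
}

func (c *catalog) insertStatsStub(tbl string) { c.stats[tbl] = true }
func (c *catalog) publish(tbl string)         { c.published[tbl] = true }

// publishWithStats writes the stub first, then publishes, preserving the
// invariant that a public table already has (stub) statistics, so queries
// are never planned against a public table with no stats at all.
func (c *catalog) publishWithStats(tbl string) {
	c.insertStatsStub(tbl)
	c.publish(tbl)
}

func main() {
	c := &catalog{stats: map[string]bool{}, published: map[string]bool{}}
	c.publishWithStats("t1")
	fmt.Println(c.stats["t1"] && c.published["t1"])
}
```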


pkg/ccl/importccl/import_stmt.go, line 2371 at r1 (raw file):

Previously, ajwerner wrote…

> when I look at this, I feel sad about how sequential it is. Hope this isn't a global cluster with hecka columns.

Fair point. I'm starting work on distinct and null counts. @adityamaru if you add something like TODO(michae2): parallelize somewhere I can take a shot at it.
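One possible shape for that TODO: fan the per-table inserts out to goroutines instead of looping serially. This is a hypothetical sketch using the standard library's `sync.WaitGroup`; the real code would more likely use CockroachDB's ctxgroup/errgroup helpers, bound the concurrency, and call the actual `stats.InsertNewStats` signature:

```go
package main

import (
	"fmt"
	"sync"
)

// insertStats stands in for writing one table's stub statistics; the real
// stats.InsertNewStats call signature differs.
func insertStats(table string) error {
	return nil
}

// insertStatsParallel runs the per-table inserts concurrently rather than
// serially. The first error (if any) is surfaced; stats are best effort.
func insertStatsParallel(tables []string) error {
	var wg sync.WaitGroup
	errCh := make(chan error, len(tables))
	for _, tbl := range tables {
		wg.Add(1)
		go func(tbl string) {
			defer wg.Done()
			if err := insertStats(tbl); err != nil {
				errCh <- err
			}
		}(tbl)
	}
	wg.Wait()
	close(errCh)
	for err := range errCh {
		return err
	}
	return nil
}

func main() {
	fmt.Println(insertStatsParallel([]string{"t1", "t2", "t3"}))
}
```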

Collaborator

@michae2 michae2 left a comment


> I don't have a clear RCA for the associated issue yet, so I'm hoping that this proposed solution might get someone to see something I missed. Regardless, I feel more comfortable keeping the table publishing closure as small as possible, especially since the stats stubbing is best effort, and swallowing the error seems to be tickling something.

I'm not sure exactly what TransactionAbortedError(ABORT_REASON_CLIENT_REJECT): "unnamed" means yet, but it seems bad in general to keep using the same txn after the abort. So this is definitely an improvement.

> Another open question: does this refactor mean we should add a `details.StatsInserted` bool to prevent us from inserting multiple stubs across resumptions? Does it matter?

I don't think it matters for query planning (the optimizer will simply use the latest stats). But as @ajwerner points out writing the statistics is currently serial, so if the import has many tables or columns, writing redundant stats will add some extra latency to the IMPORT. Adding the bool sounds nice, if it's not too much trouble.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt and @michae2)
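The `details.StatsInserted` idea discussed above could look roughly like this. The field and helper names are hypothetical, not existing CockroachDB job-details fields; the point is just that a resumed job skips the redundant serial writes:

```go
package main

import "fmt"

// importDetails mimics the job's details record; StatsInserted is the
// hypothetical flag discussed above, not an existing field.
type importDetails struct {
	StatsInserted bool
}

var stubWrites int

func insertStatsStubs() { stubWrites++ }

// maybeInsertStats skips the (serial, potentially slow) stub writes if a
// previous resumption of the job already performed them.
func maybeInsertStats(d *importDetails) {
	if d.StatsInserted {
		return
	}
	insertStatsStubs()
	d.StatsInserted = true // the real job would persist this in its details
}

func main() {
	d := &importDetails{}
	maybeInsertStats(d) // first resumption writes the stubs
	maybeInsertStats(d) // a later resumption is a no-op
	fmt.Println(stubWrites)
}
```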

Collaborator

@michae2 michae2 left a comment


:lgtm: This is fine, too.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @dt and @michae2)

@adityamaru
Contributor Author

TFTR! Bazel flake is unrelated.

bors r=ajwerner,michae2

@craig
Contributor

craig bot commented Sep 21, 2021

This PR was included in a batch that was canceled; it will be automatically retried.

@adityamaru
Contributor Author

For posterity, the RCA from Nathan in the related issue is consistent with what we were observing. I feel better about this change now 🙂

@craig
Contributor

craig bot commented Sep 21, 2021

Build succeeded:

@craig craig bot merged commit 16d935d into cockroachdb:master Sep 21, 2021

Successfully merging this pull request may close these issues.

roachtest: tpccbench/nodes=9/cpu=4/multi-region failed