Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/schemachanger: add back TXN_DROPPED, stop using OFFLINE #86639

Merged
merged 6 commits into from
Sep 1, 2022

Conversation

ajwerner
Copy link
Contributor

@ajwerner ajwerner commented Aug 23, 2022

The first two commits enable synthetic descriptors to be used during transaction
execution to influence the virtual tables.

We added the OFFLINE and performed write operations during drops to move the
descriptor into that state in order to deal with the fact that synthetically
dropped descriptors would not appear so in virtual tables. Fortunately, this
shortcoming has been rectified. We now revive the TXN_DROPPED state. This
proves to be important because the OFFLINE state is not backwards compatible,
and the TXN_DROPPED state is. Ideally we'd find a way to move the descriptors
straight to DROPPED in the pre-commit case for all the cases we today support,
but doing that would require an overhaul of the dependency rules which, at this
point in the cycle, feels risky.

Fixes #86626
Fixes #77148
Fixes #86691

Release justification: Fixes a release-blocking backwards incompatibility.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/add-back-txn-dropped branch from c3c22c6 to 5e13723 Compare August 23, 2022 13:51
@ajwerner
Copy link
Contributor Author

This turns out to be insufficient; we need to make it such that the bulk of the work in the drop cases does happen during pre-commit. Otherwise, we're stuck with incompatibilities.

@ajwerner ajwerner changed the title sql/catalog/descs: combine synthetic descriptors with stored descriptors sql/schemachanger: add back TXN_DROPPED, stop using OFFLINE Aug 23, 2022
@ajwerner ajwerner force-pushed the ajwerner/add-back-txn-dropped branch from 5e13723 to 1f1ee74 Compare August 25, 2022 17:29
@Xiang-Gu Xiang-Gu force-pushed the ajwerner/add-back-txn-dropped branch 2 times, most recently from f3e316b to ab3df2b Compare August 29, 2022 16:51
@Xiang-Gu
Copy link
Contributor

I fixed a few build errors, rebase on Master, and ran --rewrite on affected tests in the above push.

CI now reports test failures; we will need to look into them next.

descriptor.typeFilter(IsDescriptor),
descriptor.joinTarget(),
joinOnDescIDUntyped(descriptor.el, element, "id"),
descriptor.targetStatus(scpb.ToAbsent),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this line be descriptor.targetStatus(scpb.ToPublic)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we want this rule to unify if we are dropping the descriptor. The NotJoin implies negation where it is used if everything inside the rule unifies.

from.typeFilter(IsDescriptor),
from.el.AttrEqVar(screl.DescID, "_"),
from.el.AttrEqVar(rel.Self, to.el),
to.el.AttrEqVar(rel.Self, to.el),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this line always true and hence redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed

registerDepRule(
"descriptor txn_drop before descriptor drop",
scgraph.PreviousStagePrecedence,
"descriptor", "dependent",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this rule is about making sure one descriptor reached TXN_DROPPED before reaching DROPPED in a previous stage, then we probably should change the name of the second variable dependent to, maybe, descriptor_self.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed

type statusFunc = func(
from nodeVars, fromStatus scpb.Status, to nodeVars, toStatus scpb.Status,
) rel.Clause
addRules := func(c depElement, targetStatus scpb.TargetStatus, sf statusFunc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems sf statusFunc is not used. Did I miss something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're right. Removed.

Comment on lines 99 to 100
// PreviousTransactionPrecedence is like PreviousStagePrecedence but does
// not allow the transition to occur if the current phase is not PostCommit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be elobarate a bit more on the comment? Does it meanfrom must be reached before to in a previous stage, and both stages must be in PostCommit phase? If so, what do you think of the name PreviousStageInPostCommitPrecendence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that from must be reached before to in a previous stage and if from was reached in StatementPhase than to cannot be reached in PreCommitPhase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to both better naming and a better comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not allow the transition to occur unless the current phase is at least PostCommitPhase, because StatementPhase and PreCommitPhase are special in that they take place in the same transaction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ajwerner ajwerner force-pushed the ajwerner/add-back-txn-dropped branch from ab3df2b to ea5ba71 Compare August 30, 2022 17:22
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look and fixing the issues. The validation things are interesting. I'll get to the bottom of them one way or another. I think I addressed the basic comments, but did it on the old version of the branch. I'll find and add back your changes.

Comment on lines 99 to 100
// PreviousTransactionPrecedence is like PreviousStagePrecedence but does
// not allow the transition to occur if the current phase is not PostCommit.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that from must be reached before to in a previous stage and if from was reached in StatementPhase than to cannot be reached in PreCommitPhase.

Comment on lines 99 to 100
// PreviousTransactionPrecedence is like PreviousStagePrecedence but does
// not allow the transition to occur if the current phase is not PostCommit.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to both better naming and a better comment.

type statusFunc = func(
from nodeVars, fromStatus scpb.Status, to nodeVars, toStatus scpb.Status,
) rel.Clause
addRules := func(c depElement, targetStatus scpb.TargetStatus, sf statusFunc) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're right. Removed.

registerDepRule(
"descriptor txn_drop before descriptor drop",
scgraph.PreviousStagePrecedence,
"descriptor", "dependent",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed

from.typeFilter(IsDescriptor),
from.el.AttrEqVar(screl.DescID, "_"),
from.el.AttrEqVar(rel.Self, to.el),
to.el.AttrEqVar(rel.Self, to.el),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed

descriptor.typeFilter(IsDescriptor),
descriptor.joinTarget(),
joinOnDescIDUntyped(descriptor.el, element, "id"),
descriptor.targetStatus(scpb.ToAbsent),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we want this rule to unify if we are dropping the descriptor. The NotJoin implies negation where it is used if everything inside the rule unifies.

@ajwerner ajwerner force-pushed the ajwerner/add-back-txn-dropped branch 3 times, most recently from 8763145 to cff1905 Compare August 30, 2022 20:47
Copy link
Contributor

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming along nicely!

@@ -560,9 +560,18 @@ func (s *TestState) MustReadDescriptor(ctx context.Context, id descpb.ID) catalo
return desc
}

// AddSyntheticDescriptor is part of the scexec.Catalog interface.
func (s *TestState) AddSyntheticDescriptor(desc catalog.MutableDescriptor) {
s.LogSideEffectf("added synthetic descriptor: %v", desc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead print the yaml diff between the synthetic descriptor and the existing immutable descriptor if it exists?

@@ -117,4 +117,6 @@ type MutationVisitorStateUpdater interface {

// RefreshStats refresh stats for a given descriptor.
RefreshStats(id descpb.ID)
MarkDescriptorSynthetic(id descpb.ID)
// MarkDescriptorSynthetic(id descpb.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detritus

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -322,7 +322,7 @@ func (sb stageBuilder) isOutgoingOpEdgeAllowed(e *scgraph.OpEdge) bool {
if e.Type() != sb.opType {
return false
}
if !e.IsPhaseSatisfied(sb.bs.phase) {
if !e.IsPhaseSatisfied(sb.bs.phase) && !sb.bc.g.IsNoOp(e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. By the way, you probably should remove hasOpEdgeWithOps and the logic around it in this PR. Search for // TODO(postamar): uphold the 2-version invariant using dep rules instead.

func makeRandomVars(n int) (ret []Var) {
ret = make([]Var, n)
for i := range ret {
ret[i] = Var(hex.EncodeToString(uuid.MakeV4().GetBytes()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to hurt legibility when debugging. Can't we instead use a reserved prefix followed by a counter? You probably can assert that this reserved prefix isn't used in vars outside of rel.

// if we have a not-join, then we need to find the slots for the inputs
// and we have to build the sub-query, which is a whole new query, and
// we have to then figure out its depth. In general, its depth should be
// the current join depth, so the current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incomplete comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved into processNotJoin


}

// notJoinOnNodeWithStatusIn is a cache to memoize getNotJoinOnNodeWithStatusIn. a not-join rule which will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incomplete comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the tail

// IsPhaseSatisfied returns true iff the operations can run in the given phase.
func (oe *OpEdge) IsPhaseSatisfied(phase scop.Phase) bool {
return phase >= oe.minPhase
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good riddance.

@@ -361,7 +358,7 @@ func (sb stageBuilder) nextTargetState(t currentTargetState) currentTargetState
} else {
next.hasOpEdgeWithOps = true
}
}
}*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just get rid of this, as suggested in a previous comment, along with hasOpEdgeWithOps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if sb.isUnmetInboundDep(de) {
ret = true
ret = sb.isUnmetInboundDep(de)
if ret {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is this really more readable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there was some other code in an earlier iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I combined it.

return !fromIsFulfilled ||
(sb.bs.phase <= scop.PreCommitPhase &&
// If it has been fulfilled implicitly because it's the initial
// status, then it doesn't matter the current phase, let it go.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then the current phase doesn't matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are just nits, so LGTM.

We added the `OFFLINE` and performed write operations during drops to move the
descriptor into that state in order to deal with the fact that synthetically
dropped descriptors would not appear so in virtual tables. Fortunately, this
shortcoming has been rectified. We now revive the TXN_DROPPED state. This
proves to be important because the `OFFLINE` state is not backwards compatible,
and the `TXN_DROPPED` state is. Ideally we'd find a way to move the descriptors
straight to `DROPPED` in the pre-commit case for all the cases we today
support, but doing that would require an overhaul of the dependency rules
which,at this point in the cycle, feels risky.

Fixes cockroachdb#86626

Release justification: Fixes a release-blocking backwards incompatibility.

Release note: None
…reCommit

We can schedule non-revertible operations into PreCommit so long as the next
stage is PostCommitNonRevertible.

Release justification: part of bug fix

Release note: None
This is just to avoid bugs later.

Release justification: cleanup

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/add-back-txn-dropped branch from a6c8876 to 6a7477d Compare September 1, 2022 03:14
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took your changes and addressed the comments.

return !fromIsFulfilled ||
(sb.bs.phase <= scop.PreCommitPhase &&
// If it has been fulfilled implicitly because it's the initial
// status, then it doesn't matter the current phase, let it go.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if sb.isUnmetInboundDep(de) {
ret = true
ret = sb.isUnmetInboundDep(de)
if ret {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there was some other code in an earlier iteration.

if sb.isUnmetInboundDep(de) {
ret = true
ret = sb.isUnmetInboundDep(de)
if ret {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I combined it.

@@ -361,7 +358,7 @@ func (sb stageBuilder) nextTargetState(t currentTargetState) currentTargetState
} else {
next.hasOpEdgeWithOps = true
}
}
}*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 99 to 100
// PreviousTransactionPrecedence is like PreviousStagePrecedence but does
// not allow the transition to occur if the current phase is not PostCommit.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


}

// notJoinOnNodeWithStatusIn is a cache to memoize getNotJoinOnNodeWithStatusIn. a not-join rule which will
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the tail

panic(err)
}
}
_ = screl.ForEachElementType(func(el scpb.Element) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, deleted the new one.

// if we have a not-join, then we need to find the slots for the inputs
// and we have to build the sub-query, which is a whole new query, and
// we have to then figure out its depth. In general, its depth should be
// the current join depth, so the current
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved into processNotJoin

@@ -117,4 +117,6 @@ type MutationVisitorStateUpdater interface {

// RefreshStats refresh stats for a given descriptor.
RefreshStats(id descpb.ID)
MarkDescriptorSynthetic(id descpb.ID)
// MarkDescriptorSynthetic(id descpb.ID)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ajwerner ajwerner marked this pull request as ready for review September 1, 2022 03:35
@ajwerner ajwerner requested review from a team and miretskiy September 1, 2022 03:35
@ajwerner ajwerner force-pushed the ajwerner/add-back-txn-dropped branch 4 times, most recently from e67e63e to 2ada0c2 Compare September 1, 2022 06:34
@postamar
Copy link
Contributor

postamar commented Sep 1, 2022

CI is failing due to a compilation error:
pkg/sql/schemachanger/rel/query_build.go:323:3: unnecessary assignment to the blank identifier (S1005)

Let's 🚢 🇮🇹. My very minor concern about the var generator can wait another day.

@ajwerner ajwerner force-pushed the ajwerner/add-back-txn-dropped branch from 2ada0c2 to 1051bf0 Compare September 1, 2022 12:52
This patch endows rel with the ability to define not-join rules which allow
extra variables to be introduced inside the clauses. When a not-join rule is
invoked, it leads to the outer query finding a contradiction if the rule fully
unifies.

Fixes cockroachdb#77148

Release justification: part of a broader bug-fix

Release note: None
…ules

Previously we had implicit rules in opgen which were used to enforce
sequencing between transitions. This was brittle and made it hard to
collapse certain edges. Now that we can represent a not-join, we can
add the rules we want selectively, only when descriptors are not being
dropped.

Fixes cockroachdb#86691
Fixes cockroachdb#86626

Release justification: fixes a bug

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/add-back-txn-dropped branch from 1051bf0 to affdfee Compare September 1, 2022 12:53
@ajwerner
Copy link
Contributor Author

ajwerner commented Sep 1, 2022

TFTR!

bors r=postamar

@craig
Copy link
Contributor

craig bot commented Sep 1, 2022

Build succeeded:

@craig craig bot merged commit 9911916 into cockroachdb:master Sep 1, 2022
@ajwerner ajwerner deleted the ajwerner/add-back-txn-dropped branch September 1, 2022 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants