Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: scan doesn't return row added in the same txn #40487

Closed
RaduBerinde opened this issue Sep 4, 2019 · 20 comments
Closed

kv: scan doesn't return row added in the same txn #40487

RaduBerinde opened this issue Sep 4, 2019 · 20 comments
Assignees
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-investigation Further steps needed to qualify. C-label will change.

Comments

@RaduBerinde
Copy link
Member

RaduBerinde commented Sep 4, 2019

I've been experimenting with TPCC and the new foreign key path. When I made an improvement that runs multiple FK checks in parallel, I started getting occasional violation errors. I was able to reproduce this locally and capture some traces.

Repro steps:

  • set up a local cluster using the origin/fk-violation-repro branch; it feels like the error happens only when the server is under heavy load so I used GOMAXPROCS=4.
GOMAXPROCS=4 ./cockroach start --insecure --logtostderr --max-sql-memory 4GB
  • import TPCC fixtures with 100 warehouses
bin/workload fixtures import tpcc --warehouses 100
  • run
bin/workload run tpcc --warehouses=100 --duration=10m --split --wait=false --method=simple --mix=newOrder=100

The error is always the same:

Error: error in newOrder: ERROR: insert on table "order_line" violates foreign key constraint "fk_ol_w_id_ref_order" (SQLSTATE 23503)

This is the parent row in the order table which should have been inserted in the same transaction.

I set up a local zipkin and below is the span for the join reader. You can find the relevant span by looking for the omfg tag ("annotation query" in zipkin UI). All 9 rows reference the same order row (57, 2, 3029) so we scan 1 span. The KV request returns no rows (0 loop iterations).

image

This is the corresponding distsender span:
image

Earlier spans in the transaction confirm that we did in fact insert a row with these values.

@awoods187 awoods187 added the C-investigation Further steps needed to qualify. C-label will change. label Sep 4, 2019
@awoods187 awoods187 added the A-kv Anything in KV that doesn't belong in a more specific category. label Sep 4, 2019
@nvanbenschoten
Copy link
Member

Could you expand on

made an improvement that runs multiple FK checks in parallel

Are you running multiple KV checks in the same batch or are you issuing multiple batches in parallel?

@RaduBerinde
Copy link
Member Author

Different batches, in parallel. When the batches are run serially I don't see the issue. They are checking different parent tables so it's a bit puzzling.

@irfansharif irfansharif self-assigned this Sep 5, 2019
@irfansharif
Copy link
Contributor

irfansharif commented Sep 11, 2019

diff --git i/pkg/kv/txn_coord_sender.go w/pkg/kv/txn_coord_sender.go
index 5a14058eda..44beb1bb4d 100644
--- i/pkg/kv/txn_coord_sender.go
+++ w/pkg/kv/txn_coord_sender.go
@@ -453,6 +453,7 @@ func NewTxnCoordSenderFactory(
 	if tcf.heartbeatInterval == 0 {
 		tcf.heartbeatInterval = base.DefaultTxnHeartbeatInterval
 	}
+	tcf.heartbeatInterval = 5 * base.DefaultTxnHeartbeatInterval
 	if tcf.metrics == (TxnMetrics{}) {
 		tcf.metrics = MakeTxnMetrics(metric.TestSampleInterval)
 	}

Discovered this yesterday. Increasing the heartbeat interval to trigger more aborted transactions drastically reduces how long it takes for the consistency error to manifest (from taking upwards of 60s to 7-9s, consistently). This points to our current flow cancellation procedure: when the root txn is aborted/abandoned, we set it up to not return results to the client so as to not miss seeing our own writes. Suspiciously we avoid doing this if the flow is local and is using the root txn, both of which apply here.

@RaduBerinde
Copy link
Member Author

CC @andreimatei

@andreimatei
Copy link
Contributor

Suspiciously we avoid doing this if the flow is local and is using the root txn

The local can't miss to see its own writes; when using the RootTxn, the transaction cleanup process (that cleans up intents, which is how one might miss to see its own writes) is synchronized with the sending and receiving of batches, and so a client is not supposed to get results back after this cleanup. At least in theory.

@irfansharif, are you looking at this? Would you like me to take over? And if so, are Radu's instructions still the best repro?

@irfansharif
Copy link
Contributor

Banged my head over this all morning again to no avail, happy to have you take over. The best repro is with the patch I added above. The missing key error only ever occurs after a flurry of aborted transactions now to be retried. Naively this diff seemed awfully suspect, though I'm not familiar enough with pkg/sql to understand the reasoning as stated here.

@irfansharif
Copy link
Contributor

I was also going to pull on this thread next, though our usage of planner seems fine despite parallelizing the two FK checks.

@RaduBerinde
Copy link
Member Author

I think that comment refers to executing different RETURNING NOTHING statements in parallel which we don't do anymore.

@jordanlewis
Copy link
Member

Yeah, that was about RETURNING NOTHING.

@jordanlewis
Copy link
Member

I think the root/leaf stuff might be a red herring. None of the queries in question get run on remote nodes. The main query is a mutation, which can't be distributed. The postqueries contain nodes that prevent distribution (errorIfRowsNode, scanBufferNode).

@andreimatei
Copy link
Contributor

Will look at this tomorrow. The money in this cubicle is on some variation of
#25329 and #22615

@andreimatei
Copy link
Contributor

So I think this is simply #25329; one can't have concurrent uses of a root transaction. And so I understand that the distsql union processor is probably broken.

I think there's also funky something funky here:

if r.txn != nil && r.txn.Type() == client.LeafTxn {

That check was changed to only affect leaf transactions, but I think it's useless for leaves.

More tomorrow.

@RaduBerinde
Copy link
Member Author

Weird.. I assume you can have multiple scans in parallel in a Leaf txn (or distsql would be badly broken)? We could maybe set up a Leaf txn even if it's all local (?)

@andreimatei
Copy link
Contributor

Yeah, Leaf transactions are fine. That's part of their point. But you can't do writes in leaf transactions.

So, we have a couple of requirements:

  1. mutations need to use the root
  2. roots don't support concurrent use
  3. processors that might execute concurrently with others need to use leaves

Concurrency is introduced by processors inputs with more than one stream, which require synchronizers (or simply a RowChannel if for an "unordered synchronizer"). We're helped, however, by the fact that a mutation forces everything to be planned on the gateway. When everything is forced on the gateway, only a union can introduce a synchronizer (I hope).

So the plan I've discussed with @jordanlewis is the following:

  1. Disallow mutations under unions. This means that everything under a union can use a leaf.
  2. Have different processors in a flow use different transactions: everything from the top of the plan above a synchronizer will use the Root (which satisfies the mutations), and everything below will use the leaf (which satisfies the need for concurrency for unions, or generally for other TableReaders when combined with remote flows).

@RaduBerinde
Copy link
Member Author

To clarify requirement 2: is it allowed to have read-only queries on leaves concurrently with a root mutation?

@andreimatei
Copy link
Contributor

is it allowed to have read-only queries on leaves concurrently with a root mutation?

yes

@andreimatei
Copy link
Contributor

I've worked on this and made progress. Have yet to deal with the columnar execution world, but @jordanlewis helped me figure out how stuff works there.

@RaduBerinde , do you want to disallow mutations under a union? You seem like the right guy for the job. And also, maybe you can think about whether there's other cases where one might end up with a mutation under a synchronizer (which synchronizer introduces concurrent execution).

@RaduBerinde
Copy link
Member Author

Yeah, very easy to do from opt. Will think about other cases.

@irfansharif irfansharif removed their assignment Sep 24, 2019
@andreimatei
Copy link
Contributor

I've made good progress on this, but I've now hit a something I wasn't prepared for: I naively thought that all vectorizedFlows have exactly one processor - a Materializer - under which a bunch of other Operators lie. And so I thought that, by starting from that root Materializer, I can visit the tree of Operators/OpNodes and tell each one what transaction they can run in.
But lo and behold, you can also have vectorizedFlows with no processors at all. You can have outboxes and routers under which there's a line of Operators too. These roots are represented in the Flow as Startables - added here.
So now I'm thinking how best to propagate the transaction through these guys. Should I pass the transaction through Startable.Start() and let each such columnar startable pass it along to its child Operators? Or should I make the vectorized flow setup process keep track of what operators need the txn regardless of where they'll end up in the final flow? All operators are created by NewColOperator, so I could have that guy maintain a list of operators that need the transaction...

@andreimatei
Copy link
Contributor

So here's where we are on this:
Originally I thought that we have the following problems within this issue:
a) concurrent mutations can end up committing unintended data. That's what Radu fixed in #40975
b) concurrent reads on the gateway can miss seeing previous writes from the same txn. Among other consequences, this can cause the surprising failure to validate FKs that was reported here. I have #41102 open to fix that. However, such reads would be retried, so... that somewhat limits the consequences.

But I've recently realized that there's also another problem, worse than b).
c) possible write skew. Described in #41173. This has to do with races with operations done by the RootTxn (on the gateway) and remote flows. So only queries that were not fully planned on the gateway were affected.

Unfortunately, my fix for b) in #41102 I believe actually makes c) more likely. Because the PR introduces uses of the LeafTxn on the gateway too, now queries that execute fully on the gateway but are not completely "fused" are also susceptible. This means queries with a union in them. And also more weird queries with an outbox on the gateway; these would be queries that are distributed, but previously the part of the gateway flow leading into the outbox would not contribute to the problem.

In other words, I've lied when I answered @RaduBerinde's question:

is it allowed to have read-only queries on leaves concurrently with a root mutation?

I said yes. I now believe the answer is no. Except the question was implicitly scoped with concurrent uses of the root and leaves on the gateway. I believe even (existing) uses of remote leaves are unsafe.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Sep 30, 2019
Before this patch, a DistSQL flow running on its gateway node would use
the RootTxn for all its processors for row-based flows / all of its
operators for vectorized flows. Some of these processors/operator can
execute concurrently with one another. RootTxns don't support concurrent
requests (see cockroachdb#25329), resulting in some reads possibly missing to see
the transaction's own writes.

This patch rectifies the situation by being discriminate about what
parts of the gateway flow use the RootTxn and what parts use LeafTxns.
LeafTxns can be used concurrently (and they can be used concurrently
with a RootTxn too). Some things need the RootTxn - mutations.
Since cockroachdb#40975, mutations can no longer run concurrently with other
mutations. So, we use the RootTxn for all processors/operators fused
with the DistSQLReceiver, and we use LeafTxns for all others.

Processors and operators no longer use the txn from the FlowCtx (that
field goes away). The exact details on how we determine what runs in
what transactions depend on the type of flow.

i) For row-based flows, the "head processor" and everything fused with
it will use the RootTxn. All the other processors will use the Leaf.
This is done in FlowBase.startInternal().
Processors get the txn to use through Run() and Start() (if they
implement RowSource).

ii) For vectorized flows, I went a different route. These guys don't
have Run() method (they have an Init() but that's more inconvenient to
pass a txn to). They do, however, have facilities for visiting trees of
OpNodes. So, the operators that need a txn declare that by implementing
a new KVOp interface. At flow setup time we visit all the operators and
give them the right txn - in colexec.SetTxn().

Fixes cockroachdb#40487

Release justification: bug fix

Release note (bug fix): Fix the possibility of confusing error messages
caused by transactions that are about to abort missing to see previous
writes performed by the same transaction.
andreimatei added a commit that referenced this issue Oct 4, 2019
This patch makes it so that we can optionally fuse the inputs into
unordered and ordered sync with the sync (and also with the sync's
consumer).

Before this patch, the fusing logic was bailing on fusing ordered syncs
for no reason (other than code complexity). This patch embraces the
complexity.

Before this patch, producers of unordered syncs were always run in
parallel. This patch makes it so that they're optionally serialized
(i.e. each source is consumed before moving on to the next source). The
option is taken in case a query ends up running entirely on the gateway
(either because it was forced to run on the gateway or because all the
data happens to be on the gateway). If that's the case, then the
concurrency was not giving us much. Dissallowing concurrency in these
cases fixes #40487: concurrent use of a Root txn is not kosher and some
queries planned entirely on the gateway need to use a Root (i.e.
mutations).

The serialization of row-based unordered syncs is done by implementing
them through an orderedSync with no ordering. For vectorized ones, I've
created a new SerialUnorderedSynchronizer.

Fixes #40487

Release justification: Fixes bug.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 5, 2019
This patch makes it so that we can optionally fuse the inputs into
unordered and ordered sync with the sync (and also with the sync's
consumer).

Before this patch, the fusing logic was bailing on fusing ordered syncs
for no reason (other than code complexity). This patch embraces the
complexity.

Before this patch, producers of unordered syncs were always run in
parallel. This patch makes it so that they're optionally serialized
(i.e. each source is consumed before moving on to the next source). The
option is taken in case a query ends up running entirely on the gateway
(either because it was forced to run on the gateway or because all the
data happens to be on the gateway). If that's the case, then the
concurrency was not giving us much. Dissallowing concurrency in these
cases fixes cockroachdb#40487 for "regular queries": concurrent use of a Root txn
is not kosher and some queries planned entirely on the gateway need to
use a Root (i.e.  mutations). Everything going through the "normal"
planning process is now asserted to not result in any concurrency if it
resulted in a single flow (on the gateway).

The serialization of row-based unordered syncs is done by implementing
them through an orderedSync with no ordering. For vectorized ones, I've
created a new SerialUnorderedSynchronizer.

Release justification: Fixes bug.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 5, 2019
Before this patch, a DistSQL flow running on its gateway node would use
the RootTxn for all its processors for row-based flows / all of its
operators for vectorized flows if there are no remote flows. Some of
these processors/operator can execute concurrently with one another.
RootTxns don't support concurrent requests (see cockroachdb#25329), resulting in
some reads possibly missing to see the transaction's own writes.

This patch fixes things by using a LeafTxn on the gateway in case
there's concurrency on the gateway or if there's any remote flows. In
other words, the Root is used only if there's no remote flows and no
concurrency. This is sufficient for supporting mutations (which need the
Root), because mutations force everything to be planned on the gateway
and so, thanks to the previous commit, there's no concurrency if that's
the case.

Fixes cockroachdb#40487
Touches cockroachdb#24798

Release justification: Fixes bad bugs.

Release note: Fix a bug possibly leading to transactions missing to see
their own previous writes (cockroachdb#40487).
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 5, 2019
Before this patch, a DistSQL flow running on its gateway node would use
the RootTxn for all its processors for row-based flows / all of its
operators for vectorized flows if there are no remote flows. Some of
these processors/operator can execute concurrently with one another.
RootTxns don't support concurrent requests (see cockroachdb#25329), resulting in
some reads possibly missing to see the transaction's own writes.

This patch fixes things by using a LeafTxn on the gateway in case
there's concurrency on the gateway or if there's any remote flows. In
other words, the Root is used only if there's no remote flows and no
concurrency. This is sufficient for supporting mutations (which need the
Root), because mutations force everything to be planned on the gateway
and so, thanks to the previous commit, there's no concurrency if that's
the case.

Fixes cockroachdb#40487
Touches cockroachdb#24798

Release justification: Fixes bad bugs.

Release note: Fix a bug possibly leading to transactions missing to see
their own previous writes (cockroachdb#40487).
@craig craig bot closed this as completed in 995e850 Oct 5, 2019
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 9, 2019
This patch makes it so that we can optionally fuse the inputs into
unordered and ordered sync with the sync (and also with the sync's
consumer).

Before this patch, the fusing logic was bailing on fusing ordered syncs
for no reason (other than code complexity). This patch embraces the
complexity.

Before this patch, producers of unordered syncs were always run in
parallel. This patch makes it so that they're optionally serialized
(i.e. each source is consumed before moving on to the next source). The
option is taken in case a query ends up running entirely on the gateway
(either because it was forced to run on the gateway or because all the
data happens to be on the gateway). If that's the case, then the
concurrency was not giving us much. Dissallowing concurrency in these
cases fixes cockroachdb#40487 for "regular queries": concurrent use of a Root txn
is not kosher and some queries planned entirely on the gateway need to
use a Root (i.e.  mutations). Everything going through the "normal"
planning process is now asserted to not result in any concurrency if it
resulted in a single flow (on the gateway).

The serialization of row-based unordered syncs is done by implementing
them through an orderedSync with no ordering. For vectorized ones, I've
created a new SerialUnorderedSynchronizer.

Release justification: Fixes bug.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 9, 2019
Before this patch, a DistSQL flow running on its gateway node would use
the RootTxn for all its processors for row-based flows / all of its
operators for vectorized flows if there are no remote flows. Some of
these processors/operator can execute concurrently with one another.
RootTxns don't support concurrent requests (see cockroachdb#25329), resulting in
some reads possibly missing to see the transaction's own writes.

This patch fixes things by using a LeafTxn on the gateway in case
there's concurrency on the gateway or if there's any remote flows. In
other words, the Root is used only if there's no remote flows and no
concurrency. This is sufficient for supporting mutations (which need the
Root), because mutations force everything to be planned on the gateway
and so, thanks to the previous commit, there's no concurrency if that's
the case.

Fixes cockroachdb#40487
Touches cockroachdb#24798

Release justification: Fixes bad bugs.

Release note: Fix a bug possibly leading to transactions missing to see
their own previous writes (cockroachdb#40487).
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 9, 2019
This patch makes it so that we can optionally fuse the inputs into
unordered and ordered sync with the sync (and also with the sync's
consumer).

Before this patch, the fusing logic was bailing on fusing ordered syncs
for no reason (other than code complexity). This patch embraces the
complexity.

Before this patch, producers of unordered syncs were always run in
parallel. This patch makes it so that they're optionally serialized
(i.e. each source is consumed before moving on to the next source). The
option is taken in case a query ends up running entirely on the gateway
(either because it was forced to run on the gateway or because all the
data happens to be on the gateway). If that's the case, then the
concurrency was not giving us much. Dissallowing concurrency in these
cases fixes cockroachdb#40487 for "regular queries": concurrent use of a Root txn
is not kosher and some queries planned entirely on the gateway need to
use a Root (i.e.  mutations). Everything going through the "normal"
planning process is now asserted to not result in any concurrency if it
resulted in a single flow (on the gateway).

The serialization of row-based unordered syncs is done by implementing
them through an orderedSync with no ordering. For vectorized ones, I've
created a new SerialUnorderedSynchronizer.

Release justification: Fixes bug.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 9, 2019
Before this patch, a DistSQL flow running on its gateway node would use
the RootTxn for all its processors for row-based flows / all of its
operators for vectorized flows if there are no remote flows. Some of
these processors/operator can execute concurrently with one another.
RootTxns don't support concurrent requests (see cockroachdb#25329), resulting in
some reads possibly missing to see the transaction's own writes.

This patch fixes things by using a LeafTxn on the gateway in case
there's concurrency on the gateway or if there's any remote flows. In
other words, the Root is used only if there's no remote flows and no
concurrency. This is sufficient for supporting mutations (which need the
Root), because mutations force everything to be planned on the gateway
and so, thanks to the previous commit, there's no concurrency if that's
the case.

Fixes cockroachdb#40487
Touches cockroachdb#24798

Release justification: Fixes bad bugs.

Release note: Fix a bug possibly leading to transactions missing to see
their own previous writes (cockroachdb#40487).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-investigation Further steps needed to qualify. C-label will change.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants