
Fixing transaction_timed_out errors with long running reads. #182

Merged (4 commits, Apr 26, 2019)

Conversation

Contributor

@apkar apkar commented Apr 25, 2019

This patch attempts to fix transaction_timed_out errors in the read path:

  • Remove the wait() in the choose-when block of doNonIsolatedRO; instead, add another when block that expresses the same logic.
  • Run the delay that controls the transaction-split logic at a higher priority.
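
Based on the diff fragments quoted in this review, the restructured loop can be sketched roughly like this (Flow actor pseudocode; the buffering of `(doc, outerLock->take())` pairs and the dedicated timeout block follow the quoted hunks, but this is a simplified sketch, not the exact patch):

```cpp
// Sketch (Flow actor DSL, not plain C++). Instead of wait()-ing on the lock
// inside the first when block (which blocks the whole choose and starves the
// timeout), each document is buffered together with a pending
// outerLock->take(); separate when blocks then handle lock grants and the
// timeout.
loop choose {
    when(Reference<ScanReturnedContext> doc = waitNext(docs)) {
        // throws end_of_stream when totally finished
        bufferedDocs.push_back(std::make_pair(doc, outerLock->take()));
    }
    when(Void _ = wait(bufferedDocs.empty() ? Never() : bufferedDocs.front().second)) {
        // Front document's lock was granted; this body is assumed here, based
        // on the drain loop quoted later in the review.
        output.send(bufferedDocs.front().first);
        bufferedDocs.pop_front();
    }
    when(Void _ = wait(timeout)) { break; }
}
```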

Contributor

@dongxinEric dongxinEric left a comment


LGTM

first = false;
try {
loop choose {
when(Reference<ScanReturnedContext> doc = waitNext(docs)) {
Contributor

What is the priority at which this stream is yielding results?

Contributor Author

The Doc Layer code that sends documents to this stream does not set any priority explicitly. The source of the data is the FDB Flow bindings, which set the priority to TaskDefaultOnMainThread (7500); the default delay priority is TaskDefaultDelay (7010). I tried this with TaskProxyGRVTimer (8510) instead of max priority, and that too seems to fix the issue. Shall I go with it?
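
For reference, the priorities quoted above line up as follows (values taken from this comment; illustrative only, not the actual FDB Flow header):

```cpp
// Higher value = scheduled first.
// TaskProxyGRVTimer       = 8510  // tried as the delay priority; also fixes the issue
// TaskDefaultOnMainThread = 7500  // priority at which the flow bindings deliver documents
// TaskDefaultDelay        = 7010  // default delay() priority: below the document
//                                 // stream, so the timeout can be starved
```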

loop choose {
when(state Reference<ScanReturnedContext> doc = waitNext(docs)) {
// throws end_of_stream when totally finished
Void _ = wait(outerLock->take());
Contributor

Would it be equivalent and simpler to just do Void _ = wait( outerLock->take() || timeout ); instead of splitting this into two functions?

Contributor Author
@apkar apkar Apr 26, 2019

If I am thinking right, it would look like this:

loop choose {
	when(state Reference<ScanReturnedContext> doc = waitNext(docs)) {
		// throws end_of_stream when totally finished
		Void _ = wait(outerLock->take() || timeout);
		if (!timeout.isReady()) {
			.....
		}
	}
	when(Void _ = wait(timeout)) { break; }
}

// squeeze remaining documents from the stream

// checkpoint planner

This will need special handling with !timeout.isReady().

Also, due to the way the query planner checkpoint scheme works, it is important that there are no documents pending in the document stream when the checkpoint is called, which happens below this loop. I guess I could add extra code before the checkpoint to drain the documents left in the stream.

If so, having three when blocks probably makes for clearer code.

innerCheckpoint = innerCheckpoint->stopAndCheckpoint();

while (!bufferedDocs.empty()) {
Void _ = wait(bufferedDocs.front().second);
output.send(bufferedDocs.front().first);
Contributor

Do you not need to innerLock->release(); here also?

Contributor Author

Because of the checkpoint at line 730, innerLock is not valid anymore; no one is waiting on it.

src/QLPlan.actor.cpp (outdated review comment, resolved)
// throws end_of_stream when totally finished
bufferedDocs.push_back(std::make_pair(doc, outerLock->take()));
}
when(Void _ = wait(bufferedDocs.empty() ? Never() : bufferedDocs.front().second)) {
Contributor

You're taking outerLock, but releasing innerLock, so what's making sure that you actually process documents as you add them?

loop choose {
when(Reference<ScanReturnedContext> doc = waitNext(docs)) {
// throws end_of_stream when totally finished
bufferedDocs.push_back(std::make_pair(doc, outerLock->take()));
Contributor

Do you need to pass a priority to take() as well?

@apkar apkar self-assigned this Apr 26, 2019
@@ -699,22 +699,23 @@ ACTOR static Future<Void> doNonIsolatedRO(PlanCheckpoint* outerCheckpoint,
state FutureStream<Reference<ScanReturnedContext>> docs = subPlan->execute(innerCheckpoint.getPtr(), dtr);
state FlowLock* innerLock = innerCheckpoint->getDocumentFinishedLock();
state bool first = true;
-	state Future<Void> timeout = delay(3.0, TaskMaxPriority);
+	state Future<Void> timeout = delay(3.0, g_network->getCurrentTask() + 1);
Contributor

g_network->getCurrentTask() + 1 could still give you a task priority lower than TaskDefaultOnMainThread, right?

Contributor Author

The idea is that the current task is running at TaskDefaultOnMainThread, as that is the one sending to the promise stream.
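
A minimal sketch of that reasoning (assuming, as stated above, that the actor is resumed by the task that feeds the promise stream):

```cpp
// If this actor is resumed at TaskDefaultOnMainThread (7500) because the flow
// bindings deliver documents at that priority, then
state Future<Void> timeout = delay(3.0, g_network->getCurrentTask() + 1);
// schedules the timeout at 7501, one step above the document stream, so the
// timeout fires even while documents keep arriving back-to-back.
```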

@apkar apkar merged commit 9835d44 into FoundationDB:master Apr 26, 2019
3 participants