Fixing transaction_timed_out errors with long running reads. #182

Merged: 4 commits, Apr 26, 2019
58 changes: 39 additions & 19 deletions src/QLPlan.actor.cpp
@@ -687,38 +687,59 @@ ACTOR static Future<Void> doNonIsolatedRO(PlanCheckpoint* outerCheckpoint,
Reference<MetadataManager> mm) {
if (!dtr)
dtr = self->newTransaction();
state double startt = now();
state Reference<PlanCheckpoint> innerCheckpoint(new PlanCheckpoint);
state int nTransactions = 1;
state int64_t nResults = 0;
state FlowLock* outerLock = outerCheckpoint->getDocumentFinishedLock();
state Deque<std::pair<Reference<ScanReturnedContext>, Future<Void>>> bufferedDocs;

try {
state uint64_t metadataVersion = wait(cx->bindCollectionContext(dtr)->getMetadataVersion());
loop {
state FutureStream<Reference<ScanReturnedContext>> docs = subPlan->execute(innerCheckpoint.getPtr(), dtr);
state FlowLock* innerLock = innerCheckpoint->getDocumentFinishedLock();
state bool first = true;
state Future<Void> timeout = delay(3.0);
state Future<Void> timeout = delay(3.0, g_network->getCurrentTask() + 1);

Contributor:
g_network->getCurrentTask() + 1 could still give you a task priority lower than TaskDefaultOnMainThread right?

Contributor Author:
The idea is that the current task is running at TaskDefaultOnMainThread, as that's the task setting the promise stream.
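
(Illustrative fragment, not part of this PR: a minimal Flow-style sketch of that idea, assuming FDB Flow's delay(seconds, priority), choose/when, and FutureStream. Documents are delivered at the current task's priority, so scheduling the timeout one level higher means that once the timer is due, its branch wins the race against further deliveries, and the loop can break, checkpoint, and continue on a fresh transaction instead of hitting transaction_timed_out.)

state Future<Void> timeout = delay(3.0, g_network->getCurrentTask() + 1);
loop choose {
    when(Reference<ScanReturnedContext> doc = waitNext(docs)) {
        // Deliveries run at getCurrentTask(); a ready timeout scheduled at
        // getCurrentTask() + 1 is picked up before the next delivery.
        output.send(doc);
    }
    when(Void _ = wait(timeout)) {
        // Fires as soon as it is due, so the read can be checkpointed and
        // resumed on a new transaction.
        break;
    }
}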

state bool finished = false;

loop choose {
when(state Reference<ScanReturnedContext> doc = waitNext(docs)) {
// throws end_of_stream when totally finished
Void _ = wait(outerLock->take());

Contributor:
Would it be equivalent and simpler to just do Void _ = wait( outerLock->take() || timeout ); instead of splitting this into two functions?

Contributor Author (@apkar, Apr 26, 2019):
If I am thinking right, it would look like this

loop choose {
    when(state Reference<ScanReturnedContext> doc = waitNext(docs)) {
        // throws end_of_stream when totally finished
        Void _ = wait(outerLock->take() || timeout);
        if (!timeout.isReady()) {
            .....
        }
    }
    when(Void _ = wait(timeout)) { break; }
}

// squeeze remaining documents from the stream

// checkpoint planner

This will need special handling with !timeout.isReady().

And due to the way the query planner checkpoint scheme works, it's important that there are no documents pending in the document stream when the checkpoint is called, which would be below this loop. I guess I can add extra code before the checkpoint to squeeze out the documents left in the stream.

If so, having 3 when blocks is probably clearer code.

dongxinEric marked this conversation as resolved.
innerLock->release();
output.send(doc);
++nResults;
if (first) {
timeout = delay(DOCLAYER_KNOBS->NONISOLATED_INTERNAL_TIMEOUT);
first = false;
try {
loop choose {
when(Reference<ScanReturnedContext> doc = waitNext(docs)) {

Contributor:
What is the priority at which this stream is yielding results?

Contributor Author:
Doc Layer code that is sending documents to this stream is not setting any priority explicitly. The source of the data is the FDB flow bindings. The flow bindings set the priority to TaskDefaultOnMainThread (7500), and the default delay priority is TaskDefaultDelay (7010). I tried this with TaskProxyGRVTimer (8510) instead of max priority, and that too seems to fix the issue. Shall I go with it?
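
(To make those numbers concrete, a toy standalone C++ model, not FDB code: it mimics a run loop that always pops the highest-numbered ready task first, using the priorities quoted above. With the default delay priority a ready timeout keeps losing to document deliveries; any priority above 7500, whether getCurrentTask() + 1 or TaskProxyGRVTimer, lets the timeout run as soon as it is due.)

#include <cstdio>
#include <queue>
#include <string>
#include <utility>

// Toy model of a priority-ordered run loop; larger number runs first.
static void run(int timeoutPriority) {
    std::priority_queue<std::pair<int, std::string>> ready;
    ready.push({7500, "deliver document"}); // TaskDefaultOnMainThread
    ready.push({7500, "deliver document"});
    ready.push({timeoutPriority, "internal timeout"});
    std::printf("timeout at %d ->", timeoutPriority);
    while (!ready.empty()) {
        std::printf(" [%s]", ready.top().second.c_str());
        ready.pop();
    }
    std::printf("\n");
}

int main() {
    run(7010); // TaskDefaultDelay: document tasks keep winning over a ready timeout
    run(7501); // above TaskDefaultOnMainThread: the timeout is processed first
    return 0;
}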

// throws end_of_stream when totally finished
bufferedDocs.push_back(std::make_pair(doc, outerLock->take(g_network->getCurrentTask() + 1)));
}
when(Void _ = wait(bufferedDocs.empty() ? Never() : bufferedDocs.front().second)) {

Contributor:
You're taking outerLock, but releasing innerLock, so what's making sure that you actually process documents as you add them?

innerLock->release();
output.send(bufferedDocs.front().first);
bufferedDocs.pop_front();
++nResults;
if (first) {
timeout =
delay(DOCLAYER_KNOBS->NONISOLATED_INTERNAL_TIMEOUT, g_network->getCurrentTask() + 1);
first = false;
}
}
when(Void _ = wait(timeout)) { break; }
}
when(Void _ = wait(timeout)) { break; }
ASSERT(!docs.isReady());
} catch (Error& e) {
if (e.code() != error_code_end_of_stream)
throw;
finished = true;
}

ASSERT(!docs.isReady());

innerCheckpoint = innerCheckpoint->stopAndCheckpoint();

while (!bufferedDocs.empty()) {
Void _ = wait(bufferedDocs.front().second);
output.send(bufferedDocs.front().first);

Contributor:
Do you not need to innerLock->release(); here also?

Contributor Author:
Because of the checkpoint at line 730, innerLock is not valid anymore; no one is waiting on it.

bufferedDocs.pop_front();
++nResults;
}

if (finished)
throw end_of_stream();

dtr = self->newTransaction();
state uint64_t newMetadataVersion = wait(cx->bindCollectionContext(dtr)->getMetadataVersion());
if (newMetadataVersion != metadataVersion) {
@@ -763,7 +784,7 @@ ACTOR static Future<Void> doNonIsolatedRW(PlanCheckpoint* outerCheckpoint,
state FlowLock* innerLock = innerCheckpoint->getDocumentFinishedLock();
state bool first = true;
state bool finished = false;
state Future<Void> timeout = delay(3.0);
state Future<Void> timeout = delay(3.0, g_network->getCurrentTask() + 1);
state Deque<std::pair<Reference<ScanReturnedContext>, Future<Void>>> committingDocs;
state Deque<Reference<ScanReturnedContext>> bufferedDocs;

@@ -781,7 +802,8 @@ ACTOR static Future<Void> doNonIsolatedRW(PlanCheckpoint* outerCheckpoint,
waitNext(docs)) { // throws end_of_stream when totally finished
committingDocs.push_back(std::make_pair(doc, doc->commitChanges()));
if (first) {
timeout = delay(DOCLAYER_KNOBS->NONISOLATED_INTERNAL_TIMEOUT);
timeout = delay(DOCLAYER_KNOBS->NONISOLATED_INTERNAL_TIMEOUT,
g_network->getCurrentTask() + 1);
first = false;
}
}
@@ -1148,10 +1170,8 @@ ACTOR static Future<Void> findAndModify(PlanCheckpoint* outerCheckpoint,
PromiseStream<Reference<ScanReturnedContext>> output) {
if (!dtr)
dtr = NonIsolatedPlan::newTransaction(database);
state double startt = now();
state Reference<PlanCheckpoint> innerCheckpoint(new PlanCheckpoint);
state int nTransactions = 1;
state int64_t nResults = 0;
state FlowLock* outerLock = outerCheckpoint->getDocumentFinishedLock();
state Reference<ScanReturnedContext> firstDoc;
state bool any = false;