Skip to content

Commit

Permalink
Close translog writer if exception on write channel (#29401)
Browse files Browse the repository at this point in the history
Today we close the translog write tragically if we experience any I/O
exception on a write. These tragic closes lead to use closing the
translog and failing the engine. Yet, there is one case that is missed
which is when we touch the write channel during a read (checking if
reading from the writer would put us past what has been flushed). This
commit addresses this by closing the writer tragically if we encounter
an I/O exception on the write channel while reading. This becomes
interesting when we consider that this method is invoked from the engine
through the translog as part of getting a document from the
translog. This means we have to consider closing the translog here as
well which will cascade up into us finally failing the engine.

Note that there is no semantic change to, for example, primary/replica
resync and recovery. These actions will take a snapshot of the translog
which syncs the translog to disk. If an I/O exception occurs during the
sync we already close the writer tragically and once we have synced we
do not ever read past the position that was synced while taking the
snapshot.
  • Loading branch information
jasontedor committed Apr 6, 2018
1 parent 3ef753e commit b720bd6
Show file tree
Hide file tree
Showing 2 changed files with 27 additions and 9 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -584,7 +584,16 @@ public Operation readOperation(Location location) throws IOException {
if (current.generation == location.generation) {
// no need to fsync here the read operation will ensure that buffers are written to disk
// if they are still in RAM and we are reading onto that position
return current.read(location);
try {
return current.read(location);
} catch (final IOException e) {
try {
closeOnTragicEvent(e);
} catch (final Exception inner) {
e.addSuppressed(inner);
}
throw e;
}
} else {
// read backwards - it's likely we need to read on that is recent
for (int i = readers.size() - 1; i >= 0; i--) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -380,16 +380,25 @@ public boolean syncUpTo(long offset) throws IOException {

@Override
protected void readBytes(ByteBuffer targetBuffer, long position) throws IOException {
if (position + targetBuffer.remaining() > getWrittenOffset()) {
synchronized (this) {
// we only flush here if it's really really needed - try to minimize the impact of the read operation
// in some cases ie. a tragic event we might still be able to read the relevant value
// which is not really important in production but some test can make most strict assumptions
// if we don't fail in this call unless absolutely necessary.
if (position + targetBuffer.remaining() > getWrittenOffset()) {
outputStream.flush();
try {
if (position + targetBuffer.remaining() > getWrittenOffset()) {
synchronized (this) {
// we only flush here if it's really really needed - try to minimize the impact of the read operation
// in some cases ie. a tragic event we might still be able to read the relevant value
// which is not really important in production but some test can make most strict assumptions
// if we don't fail in this call unless absolutely necessary.
if (position + targetBuffer.remaining() > getWrittenOffset()) {
outputStream.flush();
}
}
}
} catch (final IOException e) {
try {
closeWithTragicEvent(e);
} catch (final IOException inner) {
e.addSuppressed(inner);
}
throw e;
}
// we don't have to have a lock here because we only write ahead to the file, so all writes has been complete
// for the requested location.
Expand Down

0 comments on commit b720bd6

Please sign in to comment.