Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add primary term to translog header #29227

Merged
merged 18 commits into from
Apr 12, 2018
Merged

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Mar 23, 2018

This commit adds the current primary term to the header of the current translog file. Having a term in a translog header is a prerequisite step that allows us to trim translog operations given the max valid seq# for that term.

This commit also updates tests to conform the primary term invariant which guarantees that all translog operations in a translog file have its terms at most the term stored in the translog header.

This commit adds the current primary term to the header of the current
translog file. Having a term in a translog file allows us to trim
translog operations in that file given the max valid seq# for that term.

This commit also updates tests to conform the primary term invariant which
guarantees that all translog operations in a translog file have its terms at
most the term stored in the translog header.
@dnhatn dnhatn added >enhancement v7.0.0 v6.3.0 :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Mar 23, 2018
@dnhatn dnhatn requested review from s1monw and bleskes March 23, 2018 19:49
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@dnhatn
Copy link
Member Author

dnhatn commented Mar 25, 2018

@bleskes Please let me know if I need to split this into two separate PRs.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some initial comments. looks good

// Write primary term
out.writeLong(primaryTerm);
// Checksum header
out.writeInt((int) out.getChecksum());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should call out.flush() here to ensure everything is written to the channel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


/**
* Writes this header with the latest format into the file channel
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if you need to put a SuppressWarning here since we don't close the channel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I added a SuppressWarning for both Eclipse and Intellij.

try {
InputStreamStreamInput headerStream = new InputStreamStreamInput(java.nio.channels.Channels.newInputStream(channel),
channel.size()); // don't close
// Lucene's CodecUtil writes a magic number of 0x3FD76C17 with the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR removes this LUCENE_CODEC_HEADER_BYTE check etc. since we don't support it, correct. Maybe we should keep it and fail with a descriptive error message instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restored this.

dnhatn added 3 commits March 26, 2018 11:25
# Conflicts:
#	server/src/test/java/org/elasticsearch/index/engine/EngineDiskUtilsTests.java
#	server/src/test/java/org/elasticsearch/index/shard/RefreshListenersTests.java
@dnhatn
Copy link
Member Author

dnhatn commented Mar 26, 2018

@s1monw I've addressed your comments. Would you please take another look? Thank you.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I left some nits.

@@ -140,10 +141,11 @@
*/
public Translog(
final TranslogConfig config, final String translogUUID, TranslogDeletionPolicy deletionPolicy,
final LongSupplier globalCheckpointSupplier) throws IOException {
final LongSupplier globalCheckpointSupplier, final LongSupplier primaryTermSupplier) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add java docs for the term supplier ? also describe what the term means here - i.e., it is sampled when a generation is rolled and the translog will reject operations with a higher term until rolled again.

@@ -1723,7 +1728,7 @@ static String createEmptyTranslog(Path location, long initialGlobalCheckpoint, S
final String translogUUID = UUIDs.randomBase64UUID();
TranslogWriter writer = TranslogWriter.create(shardId, translogUUID, 1, location.resolve(getFilename(1)), channelFactory,
new ByteSizeValue(10), 1, initialGlobalCheckpoint,
() -> { throw new UnsupportedOperationException(); }, () -> { throw new UnsupportedOperationException(); }
() -> { throw new UnsupportedOperationException(); }, () -> { throw new UnsupportedOperationException(); }, 0L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make people pass the term to this method? If it's difficult, it's not a big deal but I feel it will be good to have the right term on the file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed ad5a212

private final long primaryTerm;
private final int headerSizeInBytes;

TranslogHeader(String translogUUID, long primaryTerm) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add java docs about what the uuid and term mean? i.e., what do they enforce?

return headerSizeInBytes;
}

static int defaultSizeInBytes(String translogUUID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is used in the constructor so it's not really default, maybe just headerSizeInBytes?

// 0x00 => version 0 of the translog
final byte b1 = Channels.readFromFileChannel(channel, 0, 1)[0];
if (b1 == 0x3f) { // LUCENE_CODEC_HEADER_BYTE
throw new TranslogCorruptedException("translog looks like version 1 or later, but has corrupted header. path:" + path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify what you mean with version 1 or later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that the translog was created in ES 1.4 or later but corrupted. Moved this from

throw new TranslogCorruptedException("translog looks like version 1 or later, but has corrupted header. path:" + path);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K. Thanks

assert primaryTerm >= 0 : "Primary term must be non-negative [" + primaryTerm + "]; translog path [" + path + "]";
} else {
assert version == VERSION_CHECKPOINTS : "Unknown header version [" + version + "]";
primaryTerm = UNKNOWN_PRIMARY_TERM;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNKOWN_PRIMARY_TERM is -1. I wonder if we should 0 as a default here. Primary terms are always non negative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can use 0L as the default. I pushed d50d7ed

}
// Verify the checksum
if (version >= VERSION_PRIMARY_TERM) {
Translog.verifyChecksum(in);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@dnhatn dnhatn requested a review from s1monw April 11, 2018 14:07
@s1monw
Copy link
Contributor

s1monw commented Apr 11, 2018

@dnhatn this has many many unrelated changes I think some merge was messed up?

@s1monw
Copy link
Contributor

s1monw commented Apr 11, 2018

@dnhatn nervermind I think github messed something up when I check what changed

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/translog/Translog.java
#	server/src/test/java/org/elasticsearch/action/resync/ResyncReplicationRequestTests.java
#	server/src/test/java/org/elasticsearch/index/IndexModuleTests.java
#	server/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java
#	server/src/test/java/org/elasticsearch/index/shard/IndexingOperationListenerTests.java
#	server/src/test/java/org/elasticsearch/index/translog/TranslogTests.java
#	server/src/test/java/org/elasticsearch/indices/recovery/RecoverySourceHandlerTests.java
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn dnhatn requested a review from bleskes April 11, 2018 15:34
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @dnhatn

// 0x00 => version 0 of the translog
final byte b1 = Channels.readFromFileChannel(channel, 0, 1)[0];
if (b1 == 0x3f) { // LUCENE_CODEC_HEADER_BYTE
throw new TranslogCorruptedException("translog looks like version 1 or later, but has corrupted header. path:" + path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K. Thanks

TranslogWriter.writeHeader(out, translogRef);
fc.force(true);
try (FileChannel fc = FileChannel.open(filename, StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
TranslogHeader header = new TranslogHeader(translogUUID, 0L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use the DEFAULT_PRIMARY_TERM?

@dnhatn
Copy link
Member Author

dnhatn commented Apr 12, 2018

Thanks @s1monw and @bleskes for reviewing.

@dnhatn dnhatn merged commit f96e00b into elastic:master Apr 12, 2018
@dnhatn dnhatn deleted the add-term-tlog-header branch April 12, 2018 17:58
dnhatn added a commit that referenced this pull request Apr 13, 2018
This change adds the current primary term to the header of the current
translog file. Having a term in a translog header is a prerequisite step
that allows us to trim translog operations given the max valid seq# for
that term.

This commit also updates tests to conform the primary term invariant
which guarantees that all translog operations in a translog file have
its terms at most the term stored in the translog header.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 13, 2018
* master:
  Enable skipping fetching latest for BWC builds (elastic#29497)
  Add remote cluster client (elastic#29495)
  Ensure flush happens on shard idle
  Adds SpanGapQueryBuilder in the query DSL (elastic#28636)
  Control max size and count of warning headers (elastic#28427)
  Make index APIs work without types. (elastic#29479)
  Deprecate filtering on `_type`. (elastic#29468)
  Fix auto-generated ID example format (elastic#29461)
  Fix typo in max number of threads check docs (elastic#29469)
  Add primary term to translog header (elastic#29227)
  Add a helper method to get a random java.util.TimeZone (elastic#29487)
  Move TimeValue into elasticsearch-core project (elastic#29486)
martijnvg added a commit that referenced this pull request Apr 13, 2018
* es/master:
  Add remote cluster client (#29495)
  Ensure flush happens on shard idle
  Adds SpanGapQueryBuilder in the query DSL (#28636)
  Control max size and count of warning headers (#28427)
  Make index APIs work without types. (#29479)
  Deprecate filtering on `_type`. (#29468)
  Fix auto-generated ID example format (#29461)
  Fix typo in max number of threads check docs (#29469)
  Add primary term to translog header (#29227)
  Add a helper method to get a random java.util.TimeZone (#29487)
  Move TimeValue into elasticsearch-core project (#29486)
  Fix NPE in InternalGeoCentroidTests#testReduceRandom (#29481)
  Build: introduce keystoreFile for cluster config (#29491)
  test: Index more docs, so that it is less likely the search request does not time out.
martijnvg added a commit that referenced this pull request Apr 13, 2018
* es/6.x:
  Enable skipping fetching latest for BWC builds (#29497)
  Add remote cluster client (#29495)
  Ensure flush happens on shard idle
  Adds SpanGapQueryBuilder in the query DSL (#28636)
  Fix auto-generated ID example format (#29461)
  Fix typo in max number of threads check docs (#29469)
  Add primary term to translog header (#29227)
  Add a helper method to get a random java.util.TimeZone (#29487)
  Move TimeValue into elasticsearch-core project (#29486)
  Fix NPE in InternalGeoCentroidTests#testReduceRandom (#29481)
  Build: introduce keystoreFile for cluster config (#29491)
  test: Index more docs, so that it is less likely the search request does not time out.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants