LUCENE-9662: CheckIndex should be concurrent - parallelizing index check across segments #128
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,6 +31,13 @@ | |
import java.util.List; | ||
import java.util.Locale; | ||
import java.util.Map; | ||
import java.util.concurrent.Callable; | ||
import java.util.concurrent.CompletableFuture; | ||
import java.util.concurrent.CompletionException; | ||
import java.util.concurrent.ExecutorService; | ||
import java.util.concurrent.Executors; | ||
import java.util.concurrent.TimeUnit; | ||
import java.util.function.Supplier; | ||
import org.apache.lucene.codecs.Codec; | ||
import org.apache.lucene.codecs.DocValuesProducer; | ||
import org.apache.lucene.codecs.NormsProducer; | ||
|
@@ -60,6 +67,7 @@ | |
import org.apache.lucene.util.FixedBitSet; | ||
import org.apache.lucene.util.IOUtils; | ||
import org.apache.lucene.util.LongBitSet; | ||
import org.apache.lucene.util.NamedThreadFactory; | ||
import org.apache.lucene.util.StringHelper; | ||
import org.apache.lucene.util.SuppressForbidden; | ||
import org.apache.lucene.util.Version; | ||
|
@@ -450,6 +458,13 @@ public void setChecksumsOnly(boolean v) { | |
|
||
private boolean checksumsOnly; | ||
|
||
/** Set threadCount used for parallelizing index integrity checking. */ | ||
public void setThreadCount(int tc) { | ||
threadCount = tc; | ||
} | ||
|
||
private int threadCount = Runtime.getRuntime().availableProcessors(); | ||
|
||
/** | ||
* Set infoStream where messages should go. If null, no messages are printed. If verbose is true | ||
* then more details are printed. | ||
|
@@ -488,8 +503,35 @@ public Status checkIndex() throws IOException { | |
* quite a long time to run. | ||
*/ | ||
public Status checkIndex(List<String> onlySegments) throws IOException { | ||
Please don't overload it here; force callers to pass an executor. I am still worried that tests are using multiple threads here, which must not happen: on an 8-core machine we don't want to use 8 JVMs * 8 threads each. I am also concerned about newly created synchronization issues here (e.g. with output). If CheckIndex fails, it is really important that we can read this output. All these changes leave me concerned about whether this is the right way to go. At a minimum, if we are going to make these changes to CheckIndex, then it needs to be done in a more cautious/defensive way (e.g. default threadCount to 1 instead of the number of CPUs).

I took a quick look at these methods' current usage.
Oh, I didn't realize it would spawn multiple JVM processes. Or are you suggesting that if this gets run in multiple JVM processes for some reason, the overall overhead will be high (which I agree with)? Shall we do something like
Yeah, definitely agree with this. For the output message synchronization issue, I put in some nocommit comments earlier and proposed to also include the segment / part id in these messages, as a cheap way to allow manual sorting / grepping after the messages are written, essentially working around this issue. Not sure if this will be good enough?

Maybe post the

This is just how Lucene's test infrastructure runs tests: it spawns multiple JVMs, each of which runs one Lucene test at a time, but that test may use (often uses?) its own threads, including here in
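One way to address the output concern raised in this thread, beyond tagging each message with a segment id, is to give every segment its own in-memory buffer and print the buffers in segment order once the work has finished. The following is only a minimal sketch of that idea, not what the PR does: `BufferedSegmentOutput`, `checkAll`, and the message strings are hypothetical, and the "check" is a stand-in that just prints two lines. It assumes Java 10 or later for the `Charset` overloads.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; none of these names exist in CheckIndex. */
final class BufferedSegmentOutput {

  /** Runs one stand-in "check" per segment and returns all output in segment order. */
  static String checkAll(List<String> segmentNames, ExecutorService executor) {
    List<ByteArrayOutputStream> buffers = new ArrayList<>();
    List<CompletableFuture<Void>> futures = new ArrayList<>();
    for (String name : segmentNames) {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      buffers.add(buffer);
      futures.add(
          CompletableFuture.runAsync(
              () -> {
                // Each task owns its stream, so no locking is needed while it runs.
                PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8);
                out.println("segment " + name + ": test: check live docs.....OK");
                out.println("segment " + name + ": test: field infos.........OK");
                out.flush();
              },
              executor));
    }
    StringBuilder combined = new StringBuilder();
    for (int i = 0; i < futures.size(); i++) {
      futures.get(i).join(); // wait for this segment before emitting its output
      combined.append(buffers.get(i).toString(StandardCharsets.UTF_8));
    }
    return combined.toString();
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      System.out.print(checkAll(List.of("_0", "_1", "_2"), executor));
    } finally {
      executor.shutdown();
    }
  }
}
```

The trade-off is that nothing is printed for a segment until it has finished, so a long-running check gives no incremental feedback; in exchange, the messages of one segment are never interleaved with those of another.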
||
ExecutorService executorService = | ||
Executors.newFixedThreadPool(threadCount, new NamedThreadFactory("async-check-index")); | ||
try { | ||
return checkIndex(onlySegments, executorService); | ||
} finally { | ||
executorService.shutdown(); | ||
try { | ||
executorService.awaitTermination(5, TimeUnit.SECONDS); | ||
} catch ( | ||
@SuppressWarnings("unused") | ||
Why ignore? Seems like something went wrong there; log it at least?

Yeah, agreed. Updated.
||
InterruptedException e) { | ||
} finally { | ||
executorService.shutdownNow(); | ||
} | ||
} | ||
} | ||
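The review comment on the swallowed `InterruptedException` suggests at least logging it. A minimal sketch of a shutdown helper that reports the problem and also restores the thread's interrupt flag might look like the following; `ExecutorShutdown` and `shutdownAndAwait` are hypothetical names, and the `PrintStream` parameter merely stands in for whatever logging CheckIndex would actually use.

```java
import java.io.PrintStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; not part of the PR. */
final class ExecutorShutdown {

  /** Returns true only if every submitted task finished within the timeout. */
  static boolean shutdownAndAwait(
      ExecutorService executor, long timeout, TimeUnit unit, PrintStream log) {
    executor.shutdown(); // stop accepting new tasks, let queued ones finish
    try {
      if (executor.awaitTermination(timeout, unit)) {
        return true;
      }
      log.println("WARNING: executor did not terminate within " + timeout + " " + unit);
    } catch (InterruptedException e) {
      log.println("WARNING: interrupted while waiting for executor to terminate: " + e);
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
    }
    executor.shutdownNow(); // cancel whatever is still running
    return false;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    executor.submit(() -> System.out.println("checking..."));
    boolean clean = shutdownAndAwait(executor, 5, TimeUnit.SECONDS, System.err);
    System.out.println("clean shutdown: " + clean);
  }
}
```

Re-setting the interrupt flag lets code further up the call stack notice the interruption instead of silently losing it.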
|
||
/** | ||
* Returns a {@link Status} instance detailing the state of the index. | ||
* | ||
* <p>This method allows caller to pass in customized ExecutorService to speed up the check. | ||
* | ||
* <p><b>WARNING</b>: make sure you only call this when the index is not opened by any writer. | ||
*/ | ||
public Status checkIndex(List<String> onlySegments, ExecutorService executorService) | ||
throws IOException { | ||
ensureOpen(); | ||
long startNS = System.nanoTime(); | ||
|
||
NumberFormat nf = NumberFormat.getInstance(Locale.ROOT); | ||
SegmentInfos sis = null; | ||
Status result = new Status(); | ||
|
@@ -605,6 +647,11 @@ public Status checkIndex(List<String> onlySegments) throws IOException { | |
result.newSegments.clear(); | ||
result.maxSegmentName = -1; | ||
|
||
// nocommit the msg statements with infoStream inside this loop (as well as inside each | ||
// index part checking methods such as testLiveDocs) may be | ||
// interleaved when this section of code run concurrently. Is it ok to not synchronize for | ||
// these msg statements and instead have each msg content to include segment id information for | ||
// identification? | ||
for (int i = 0; i < numSegments; i++) { | ||
final SegmentCommitInfo info = sis.info(i); | ||
long segmentName = Long.parseLong(info.info.name.substring(1), Character.MAX_RADIX); | ||
|
@@ -639,6 +686,8 @@ public Status checkIndex(List<String> onlySegments) throws IOException { | |
SegmentReader reader = null; | ||
|
||
try { | ||
// nocommit these msg statements may require synchronization if printed without segment | ||
// identifier | ||
msg(infoStream, " version=" + (version == null ? "3.0" : version)); | ||
msg(infoStream, " id=" + StringHelper.idToString(info.info.getId())); | ||
final Codec codec = info.info.getCodec(); | ||
|
@@ -731,40 +780,170 @@ public Status checkIndex(List<String> onlySegments) throws IOException { | |
} | ||
|
||
if (checksumsOnly == false) { | ||
// This redundant assignment is done to make compiler happy | ||
SegmentReader finalReader = reader; | ||
|
||
// Test Livedocs | ||
segInfoStat.liveDocStatus = testLiveDocs(reader, infoStream, failFast); | ||
CompletableFuture<Void> testliveDocs = | ||
CompletableFuture.supplyAsync( | ||
callableToSupplier(() -> testLiveDocs(finalReader, infoStream, failFast)), | ||
executorService) | ||
.thenAccept( | ||
liveDocStatus -> { | ||
segInfoStat.liveDocStatus = liveDocStatus; | ||
}); | ||
|
||
// Test Fieldinfos | ||
segInfoStat.fieldInfoStatus = testFieldInfos(reader, infoStream, failFast); | ||
CompletableFuture<Void> testFieldInfos = | ||
CompletableFuture.supplyAsync( | ||
callableToSupplier(() -> testFieldInfos(finalReader, infoStream, failFast)), | ||
executorService) | ||
.thenAccept( | ||
fieldInfoStatus -> { | ||
segInfoStat.fieldInfoStatus = fieldInfoStatus; | ||
}); | ||
|
||
// Test Field Norms | ||
segInfoStat.fieldNormStatus = testFieldNorms(reader, infoStream, failFast); | ||
CompletableFuture<Void> testFieldNorms = | ||
CompletableFuture.supplyAsync( | ||
callableToSupplier(() -> testFieldNorms(finalReader, infoStream, failFast)), | ||
executorService) | ||
.thenAccept( | ||
fieldNormStatus -> { | ||
segInfoStat.fieldNormStatus = fieldNormStatus; | ||
}); | ||
|
||
// Test the Term Index | ||
segInfoStat.termIndexStatus = | ||
testPostings(reader, infoStream, verbose, doSlowChecks, failFast); | ||
CompletableFuture<Void> testTermIndex = | ||
CompletableFuture.supplyAsync( | ||
() -> { | ||
try { | ||
return testPostings( | ||
finalReader, infoStream, verbose, doSlowChecks, failFast); | ||
} catch (IOException e) { | ||
throw new CompletionException(e); | ||
} | ||
}, | ||
executorService) | ||
.thenAccept( | ||
termIndexStatus -> { | ||
segInfoStat.termIndexStatus = termIndexStatus; | ||
}); | ||
|
||
// Test Stored Fields | ||
segInfoStat.storedFieldStatus = testStoredFields(reader, infoStream, failFast); | ||
CompletableFuture<Void> testStoredFields = | ||
CompletableFuture.supplyAsync( | ||
() -> { | ||
try { | ||
return testStoredFields(finalReader, infoStream, failFast); | ||
} catch (IOException e) { | ||
throw new CompletionException(e); | ||
} | ||
}, | ||
executorService) | ||
.thenAccept( | ||
storedFieldStatus -> { | ||
segInfoStat.storedFieldStatus = storedFieldStatus; | ||
}); | ||
|
||
// Test Term Vectors | ||
segInfoStat.termVectorStatus = | ||
testTermVectors(reader, infoStream, verbose, doSlowChecks, failFast); | ||
CompletableFuture<Void> testTermVectors = | ||
CompletableFuture.supplyAsync( | ||
() -> { | ||
try { | ||
return testTermVectors( | ||
finalReader, infoStream, verbose, doSlowChecks, failFast); | ||
} catch (IOException e) { | ||
throw new CompletionException(e); | ||
} | ||
}, | ||
executorService) | ||
.thenAccept( | ||
termVectorStatus -> { | ||
segInfoStat.termVectorStatus = termVectorStatus; | ||
}); | ||
|
||
// Test Docvalues | ||
segInfoStat.docValuesStatus = testDocValues(reader, infoStream, failFast); | ||
CompletableFuture<Void> testDocValues = | ||
CompletableFuture.supplyAsync( | ||
() -> { | ||
try { | ||
return testDocValues(finalReader, infoStream, failFast); | ||
} catch (IOException e) { | ||
throw new CompletionException(e); | ||
} | ||
}, | ||
executorService) | ||
.thenAccept( | ||
docValuesStatus -> { | ||
segInfoStat.docValuesStatus = docValuesStatus; | ||
}); | ||
|
||
// Test PointValues | ||
segInfoStat.pointsStatus = testPoints(reader, infoStream, failFast); | ||
CompletableFuture<Void> testPointvalues = | ||
CompletableFuture.supplyAsync( | ||
() -> { | ||
Maybe add a utility method that accepts a callable and wraps in a completion exception? This is a very repetitive block.

I also feel this is quite repetitive and cumbersome to look at. I gave it some more tries, but it seems to be a bit difficult actually. The main issue here is the
Maybe some refactoring in those For the code in

I added a commit that shows what I meant (and replaces three calls). Sorry for not replacing all of them. Maybe it can be made even less verbose if you fold all those blocks into a single function that accepts an executor, a first callable, and a follow-up callable, and just returns a completable future from the composition of these.

Ah OK, I see. I think I encountered the issue earlier when I tried to use
||
try { | ||
return testPoints(finalReader, infoStream, failFast); | ||
} catch (IOException e) { | ||
throw new CompletionException(e); | ||
} | ||
}, | ||
executorService) | ||
.thenAccept( | ||
pointsStatus -> { | ||
segInfoStat.pointsStatus = pointsStatus; | ||
}); | ||
|
||
// Test VectorValues | ||
segInfoStat.vectorValuesStatus = testVectors(reader, infoStream, failFast); | ||
CompletableFuture<Void> testVectors = | ||
CompletableFuture.supplyAsync( | ||
() -> { | ||
try { | ||
return testVectors(finalReader, infoStream, failFast); | ||
} catch (IOException e) { | ||
throw new CompletionException(e); | ||
} | ||
}, | ||
executorService) | ||
.thenAccept( | ||
vectorValuesStatus -> { | ||
segInfoStat.vectorValuesStatus = vectorValuesStatus; | ||
}); | ||
|
||
// Test index sort | ||
segInfoStat.indexSortStatus = testSort(reader, indexSort, infoStream, failFast); | ||
CompletableFuture<Void> testSort = | ||
CompletableFuture.supplyAsync( | ||
() -> { | ||
try { | ||
return testSort(finalReader, indexSort, infoStream, failFast); | ||
} catch (IOException e) { | ||
throw new CompletionException(e); | ||
} | ||
}, | ||
executorService) | ||
.thenAccept( | ||
indexSortStatus -> { | ||
segInfoStat.indexSortStatus = indexSortStatus; | ||
}); | ||
|
||
// Rethrow the first exception we encountered | ||
// This will cause stats for failed segments to be incremented properly | ||
// nocommit The error != null check ordering below requires sequencing the above async | ||
// calls. | ||
// Does the order really matter here or can be done differently? | ||
CompletableFuture.allOf( | ||
testliveDocs, | ||
testFieldInfos, | ||
testFieldNorms, | ||
testTermIndex, | ||
testStoredFields, | ||
testTermVectors, | ||
testDocValues, | ||
testPointvalues, | ||
testVectors, | ||
testSort) | ||
.join(); | ||
if (segInfoStat.liveDocStatus.error != null) { | ||
throw new RuntimeException("Live docs test failed"); | ||
} else if (segInfoStat.fieldInfoStatus.error != null) { | ||
|
@@ -783,6 +962,8 @@ public Status checkIndex(List<String> onlySegments) throws IOException { | |
throw new RuntimeException("Points test failed"); | ||
} | ||
} | ||
|
||
// nocommit parallelize this as well? | ||
final String softDeletesField = reader.getFieldInfos().getSoftDeletesField(); | ||
if (softDeletesField != null) { | ||
checkSoftDeletes(softDeletesField, info, reader, infoStream, failFast); | ||
|
@@ -843,6 +1024,18 @@ public Status checkIndex(List<String> onlySegments) throws IOException { | |
return result; | ||
} | ||
|
||
private <T> Supplier<T> callableToSupplier(Callable<T> callable) { | ||
return () -> { | ||
try { | ||
return callable.call(); | ||
} catch (RuntimeException | Error e) { | ||
throw e; | ||
} catch (Throwable e) { | ||
throw new CompletionException(e); | ||
} | ||
}; | ||
} | ||
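The review suggestion to fold each `supplyAsync(...).thenAccept(...)` pair into a single function can be sketched as below. This is an illustration under assumptions, not the code that was committed: `AsyncChecks` and `runCheck` are hypothetical names, and the follow-up step is modelled as a `Consumer` rather than a second callable.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical helper; not part of the PR. */
final class AsyncChecks {

  /**
   * Runs {@code check} on {@code executor} and hands its result to {@code onResult}. Checked
   * exceptions are wrapped in CompletionException so they surface from join().
   */
  static <T> CompletableFuture<Void> runCheck(
      ExecutorService executor, Callable<T> check, Consumer<T> onResult) {
    return CompletableFuture.supplyAsync(
            () -> {
              try {
                return check.call();
              } catch (RuntimeException | Error e) {
                throw e; // unchecked failures propagate unchanged
              } catch (Throwable e) {
                throw new CompletionException(e);
              }
            },
            executor)
        .thenAccept(onResult);
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      String[] status = new String[1];
      runCheck(executor, () -> "live docs OK", s -> status[0] = s).join();
      System.out.println(status[0]);

      try {
        AsyncChecks.<String>runCheck(
                executor,
                () -> {
                  throw new IOException("simulated failure");
                },
                s -> {})
            .join();
      } catch (CompletionException e) {
        System.out.println("check failed: " + e.getCause());
      }
    } finally {
      executor.shutdown();
    }
  }
}
```

With such a helper, each per-part block in the diff would shrink to a single call that passes the check and the field assignment, at the cost of one more level of indirection.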
|
||
/** | ||
* Tests index sort order. | ||
* | ||
|
@@ -931,6 +1124,8 @@ public static Status.LiveDocStatus testLiveDocs( | |
final Status.LiveDocStatus status = new Status.LiveDocStatus(); | ||
|
||
try { | ||
// nocommit these msg statements may require synchronization if printed without segment | ||
// identifier | ||
if (infoStream != null) infoStream.print(" test: check live docs....."); | ||
final int numDocs = reader.numDocs(); | ||
if (reader.hasDeletions()) { | ||
|
@@ -969,6 +1164,8 @@ public static Status.LiveDocStatus testLiveDocs( | |
} | ||
} | ||
} | ||
// nocommit these msg statements may require synchronization if printed without segment | ||
// identifier | ||
msg( | ||
infoStream, | ||
String.format( | ||
|
@@ -3622,6 +3819,7 @@ public static class Options { | |
boolean doSlowChecks = false; | ||
[Pre-existing] We could remove all these

I've created a spin-off issue, available here: https://issues.apache.org/jira/browse/LUCENE-10074.
||
boolean verbose = false; | ||
boolean doChecksumsOnly = false; | ||
int threadCount; | ||
List<String> onlySegments = new ArrayList<>(); | ||
String indexPath = null; | ||
String dirImpl = null; | ||
|
@@ -3720,6 +3918,12 @@ public static Options parseOptions(String[] args) { | |
} | ||
i++; | ||
opts.dirImpl = args[i]; | ||
} else if ("-threadCount".equals(arg)) { | ||
if (i == args.length - 1) { | ||
throw new IllegalArgumentException("-threadCount requires a following number"); | ||
} | ||
i++; | ||
opts.threadCount = Integer.parseInt(args[i]); | ||
} else { | ||
if (opts.indexPath != null) { | ||
throw new IllegalArgumentException("ERROR: unexpected extra argument '" + args[i] + "'"); | ||
|
@@ -3787,8 +3991,10 @@ public int doCheck(Options opts) throws IOException, InterruptedException { | |
setDoSlowChecks(opts.doSlowChecks); | ||
setChecksumsOnly(opts.doChecksumsOnly); | ||
setInfoStream(opts.out, opts.verbose); | ||
setThreadCount(opts.threadCount); | ||
|
||
Status result = checkIndex(opts.onlySegments); | ||
|
||
if (result.missingSegments) { | ||
return 1; | ||
} | ||
|
Maybe validate the argument? It must be >= 1 at least?
Done.
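For context on why the validation matters: `Executors.newFixedThreadPool` throws `IllegalArgumentException` for a thread count below 1, and in the diff as shown `Options.threadCount` has no initializer, so it would be 0 whenever `-threadCount` is omitted. A minimal sketch of the kind of check discussed here, with a hypothetical `ThreadCountArg.parse` helper that is not taken from the PR, could be:

```java
/** Hypothetical helper; not part of the PR. */
final class ThreadCountArg {

  /** Parses the value that follows -threadCount and rejects anything below 1. */
  static int parse(String value) {
    int count;
    try {
      count = Integer.parseInt(value);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "-threadCount requires a following number, got '" + value + "'", e);
    }
    if (count < 1) {
      throw new IllegalArgumentException("-threadCount requires a number >= 1, got " + count);
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(parse("4"));
    try {
      parse("0");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```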