Use IndexInput#prefetch for terms dictionary lookups. #13359
@@ -17,6 +17,7 @@
 package org.apache.lucene.index;

 import java.io.IOException;
+import org.apache.lucene.store.IndexInput;
 import org.apache.lucene.util.AttributeSource;
 import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.BytesRefIterator;

@@ -61,6 +62,15 @@ public enum SeekStatus {
    */
   public abstract boolean seekExact(BytesRef text) throws IOException;

+  /**
+   * Prepare a future call to {@link #seekExact}. This typically calls {@link IndexInput#prefetch}
+   * on the right range of bytes under the hood so that the next call to {@link #seekExact} is
+   * faster. This can be used to parallelize I/O across multiple terms by calling {@link
+   * #prepareSeekExact} on multiple terms enums before calling {@link #seekExact(BytesRef)} on the
+   * same {@link TermsEnum}s.
+   */
+  public void prepareSeekExact(BytesRef text) throws IOException {}
Review comment: Can we look into subclasses such as
+
   /**
    * Seeks to the specified term, if it exists, or to the next (ceiling) term. Returns SeekStatus to
    * indicate whether exact term was found, a different term was found, or EOF was hit. The target
@@ -19,6 +19,7 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
+import java.util.function.Supplier;
Review comment: Should we open a spinoff issue to maybe add prefetch to
Reply: Well, only

 import org.apache.lucene.index.IndexReaderContext;
 import org.apache.lucene.index.LeafReaderContext;
 import org.apache.lucene.index.Term;

@@ -316,7 +317,11 @@ private static TermStates adjustFrequencies(
     List<LeafReaderContext> leaves = readerContext.leaves();
     TermStates newCtx = new TermStates(readerContext);
     for (int i = 0; i < leaves.size(); ++i) {
-      TermState termState = ctx.get(leaves.get(i));
+      Supplier<TermState> supplier = ctx.get(leaves.get(i));
+      if (supplier == null) {
+        continue;
+      }
+      TermState termState = supplier.get();
       if (termState == null) {
         continue;
       }
Review comment: This sort of reminds me of two-phase commit, except at read time rather than write time: we now break these IO-heavy read APIs into two phases. Step 1 declares the intention to get X soon, which allows prefetch to happen concurrently across the N different Xs we want to retrieve, not just in the background of the calling thread. Step 2 then blocks on the IO to retrieve each of the N Xs. Two-phase reads!
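To make the two-phase read idea concrete, here is a minimal, self-contained sketch in plain Java. It does not use Lucene: `TwoPhaseReadSketch`, `SlowTermsLookup`, and `seekAll` are hypothetical names invented for illustration, and only the method names `prepareSeekExact` and `seekExact` are borrowed from the diff. The background task merely simulates the effect of a prefetch and makes no claim about how `IndexInput#prefetch` works internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Illustration only: every type here is hypothetical and unrelated to Lucene's classes. */
final class TwoPhaseReadSketch {

  /** Stand-in for one segment's terms dictionary, with an artificially slow lookup. */
  static final class SlowTermsLookup {
    private final Set<String> terms;
    private final long latencyMillis;
    // Lookups that were announced in step 1 and may still be in flight.
    private final Map<String, CompletableFuture<Boolean>> pending = new ConcurrentHashMap<>();

    SlowTermsLookup(Set<String> terms, long latencyMillis) {
      this.terms = terms;
      this.latencyMillis = latencyMillis;
    }

    /** Simulates a blocking read from storage. */
    private boolean blockingLookup(String term) {
      try {
        Thread.sleep(latencyMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
      return terms.contains(term);
    }

    /** Step 1: declare the intent to look up a term; returns without waiting for the read. */
    void prepareSeekExact(String term) {
      pending.computeIfAbsent(
          term, t -> CompletableFuture.supplyAsync(() -> blockingLookup(t)));
    }

    /** Step 2: block until the answer is known, reusing prepared work when there is any. */
    boolean seekExact(String term) {
      CompletableFuture<Boolean> prepared = pending.remove(term);
      return prepared != null ? prepared.join() : blockingLookup(term);
    }
  }

  /** Announces the lookup on all N segments first, then blocks on each one in turn. */
  static List<Boolean> seekAll(List<SlowTermsLookup> segments, String term) {
    for (SlowTermsLookup segment : segments) {
      segment.prepareSeekExact(term); // step 1 for every segment
    }
    List<Boolean> found = new ArrayList<>();
    for (SlowTermsLookup segment : segments) {
      found.add(segment.seekExact(term)); // step 2 for every segment
    }
    return found;
  }

  public static void main(String[] args) {
    List<SlowTermsLookup> segments =
        List.of(
            new SlowTermsLookup(Set.of("lucene", "index"), 50),
            new SlowTermsLookup(Set.of("search"), 50));
    System.out.println(seekAll(segments, "lucene"));
  }
}
```

Because step 1 runs for every segment before any step 2 call blocks, the simulated reads can overlap instead of running back to back. A caller that skips step 1 still gets a correct answer from `seekExact`, which mirrors the no-op default of `prepareSeekExact` in the diff.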