Use IndexInput#prefetch for terms dictionary lookups. #13359
Conversation
This introduces `TermsEnum#prepareSeekExact`, which essentially calls `IndexInput#prefetch` at the right offset for the given term. Then it takes advantage of the fact that `BooleanQuery` already calls `Weight#scorerSupplier` on all clauses, before later calling `ScorerSupplier#get` on all clauses. So `TermQuery` now calls `TermsEnum#prepareSeekExact` in `Weight#scorerSupplier` (if scores are not needed), which in turn means that the I/O for terms dictionary lookups gets parallelized across all term queries of a `BooleanQuery` on a given segment (intra-segment parallelism).
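To make the call pattern concrete, here is a minimal sketch (simplified, not the actual `TermWeight` code; scorer construction is omitted and the `term`/`context` variables are assumed to be available as in `TermQuery`'s weight):

```java
void twoPhaseTermLookup(LeafReaderContext context, Term term) throws IOException {
  Terms terms = context.reader().terms(term.field());
  if (terms == null) {
    return; // field not indexed in this segment
  }
  TermsEnum termsEnum = terms.iterator();

  // Phase 1, at Weight#scorerSupplier time: announce the upcoming lookup so the
  // terms dictionary can issue IndexInput#prefetch at the right offset. No
  // blocking I/O is expected here.
  termsEnum.prepareSeekExact(term.bytes());

  // ... BooleanQuery calls Weight#scorerSupplier on its other clauses here, so
  // several prefetches can be in flight before any blocking read happens ...

  // Phase 2, at ScorerSupplier#get time: the actual seek, hopefully served from
  // pages that the earlier prefetch already pulled into the page cache.
  if (termsEnum.seekExact(term.bytes())) {
    PostingsEnum postings = termsEnum.postings(null, PostingsEnum.NONE);
    // build the scorer from `postings` (omitted in this sketch)
  }
}
```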
This is a draft as I need to do more work on tests and on making sure that this new method cannot corrupt the state of the terms enum. But I created a benchmark that starts looking like running a Lucene query, and the results are encouraging. It creates an index with many terms that have very short postings lists, so that running boolean queries on these terms is heavy on terms dictionary lookups rather than on reading postings. Then it manually runs a disjunction over 3 terms (some of these terms may not exist in the index as they are created randomly), computing how long it takes to evaluate all hits. To work properly when running a query, we'd need to move `#bulkScorer` from `Weight` to `ScorerSupplier`, which I intend to do as a follow-up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.ScorerSupplier;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class TermsEnumPrefetchBench {
private static final int NUM_TERMS = 3;
public static int DUMMY;
public static void main(String[] args) throws Exception {
Path dirPath = Paths.get(args[0]);
Directory dir = FSDirectory.open(dirPath);
if (DirectoryReader.indexExists(dir) == false) {
TieredMergePolicy mp = new TieredMergePolicy();
mp.setSegmentsPerTier(100);
mp.setMaxMergeAtOnce(100);
mp.setMaxMergedSegmentMB(1024);
try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig()
.setMergePolicy(mp)
.setRAMBufferSizeMB(1024))) {
ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
AtomicLong indexed = new AtomicLong(0);
for (int task = 0; task < 1000; ++task) {
executor.execute(() -> {
Random r = ThreadLocalRandom.current();
for (int i = 0; i < 1_000; ++i) {
Document doc = new Document();
for (int j = 0; j < 10_000; ++j) {
doc.add(new StringField("f", Long.toString(r.nextLong(20_000_000_000L)), Store.NO));
}
try {
w.addDocument(doc);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
final long actualIndexed = indexed.incrementAndGet();
if (actualIndexed % 10_000 == 0) {
System.out.println("Indexed: " + actualIndexed);
}
}
});
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.DAYS);
w.commit();
System.out.println("Start force merging");
w.forceMerge(1);
System.out.println("Done force merging");
w.commit();
}
}
List<Long> latencies = new ArrayList<>();
try (IndexReader reader = DirectoryReader.open(dir)) {
IndexSearcher searcher = new IndexSearcher(reader);
Random r = ThreadLocalRandom.current();
for (int i = 0; i < 10_000; ++i) {
long start = System.nanoTime();
BooleanQuery.Builder query = new BooleanQuery.Builder();
for (int t = 0; t < NUM_TERMS; ++t) {
query.add(new TermQuery(new Term("f", Long.toString(r.nextLong(20_000_000_000L)))), Occur.SHOULD);
}
Weight weight = searcher.createWeight(searcher.rewrite(query.build()), ScoreMode.COMPLETE_NO_SCORES, 1f);
ScorerSupplier ss = weight.scorerSupplier(reader.leaves().get(0));
if (ss != null) {
Scorer scorer = ss.get(Long.MAX_VALUE);
DocIdSetIterator iter = scorer.iterator();
for (int d = iter.nextDoc(); d != DocIdSetIterator.NO_MORE_DOCS; d = iter.nextDoc()) {
DUMMY++;
}
long end = System.nanoTime();
latencies.add((end - start) / 1000);
}
}
}
latencies.sort(null);
System.out.println("P50: " + latencies.get(latencies.size() / 2));
System.out.println("P90: " + latencies.get(latencies.size() * 9 / 10));
System.out.println("P99: " + latencies.get(latencies.size() * 99 / 100));
}
}
```

Without the change:
With the change:
```java
final long fp = in.getFilePointer();
in.seek(fpSeek);
in.prefetch(1); // TODO: could we know the length of the block?
in.seek(fp); // TODO: do we actually need to do this?
```
I really don't like these calls to `seek()` just to prefetch data. Since it is just prefetching, I'd prefer if this "dance" was an impl detail, if needed.

It would make the code simpler to just pass a parameter to prefetch rather than do this. Then it is clear that the default implementation won't cause harm (unnecessary I/O) for any directory subclasses.

So I think prefetch should take the location as an argument? It is just a hint and not real I/O by the thread. It's intentionally not sequential, and a sequential API for it only hurts.
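In other words, something like the following; the second form is what the updated diff further down in this thread ends up using (see #13363):

```java
// Current draft: reposition the input just to issue the hint, then restore it.
long fp = in.getFilePointer();
in.seek(fpSeek);
in.prefetch(1);
in.seek(fp);

// Suggested shape: the hint carries the location itself and never moves the
// file pointer, so no seek "dance" is needed and it stays a pure hint.
in.prefetch(fpSeek, 1);
```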
I opened #13363.
I iterated a bit on this change:
whew terms stuff is always heavy, i left one concern
```java
 * #prepareSeekExact} on multiple terms enums before calling {@link #seekExact(BytesRef)} on the
 * same {@link TermsEnum}s.
 */
public void prepareSeekExact(BytesRef text) throws IOException {}
```
Can we look into subclasses such as `FilterLeafReader.FilterTermsEnum` to make sure this new method behaves correctly?
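For instance, a delegating enum would presumably need to forward the hint so it reaches the real on-disk implementation; a sketch, assuming the usual `in` delegate field of `FilterTermsEnum`:

```java
@Override
public void prepareSeekExact(BytesRef text) throws IOException {
  // Without an override like this, the default no-op would silently drop the prefetch hint.
  in.prepareSeekExact(text);
}
```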
Was this with a forced-cold index?
It creates a 50GB terms dictionary while my machine only has ~28GB of RAM for the page cache, so many terms dictionary lookups result in page faults.
```java
// TODO: should we try to reuse the current state of this terms enum when applicable?
BytesRefFSTEnum<BytesRef> indexEnum = new BytesRefFSTEnum<>(fr.index);
InputOutput<BytesRef> output = indexEnum.seekFloor(target);
if (output != null) { // should never be null since we already checked against fr.getMin()?
```
Indeed -- maybe change to an `assert`?
```java
        output.output.bytes, output.output.offset, output.output.length));
final long fpSeek = code >>> Lucene90BlockTreeTermsReader.OUTPUT_FLAGS_NUM_BITS;
initIndexInput();
in.prefetch(fpSeek, 1); // TODO: could we know the length of the block?
```
I think you can do `indexEnum.next()` and if that is non-null (it will be null if you are on the very last block -- we could handle that case as well maybe) then get the fp for that next block and subtract the two?
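Roughly the following, though as discussed just below the terms index file pointers turn out not to be monotonic, so this estimate does not hold up; `decodeFp` here is a hypothetical stand-in for the existing `code >>> OUTPUT_FLAGS_NUM_BITS` decoding, and `output` is the result of `indexEnum.seekFloor(target)` from the diff above:

```java
long fpSeek = decodeFp(output.output);        // block that seekFloor(target) landed on
long length = 1;                              // fall back to the current 1-byte hint
BytesRefFSTEnum.InputOutput<BytesRef> next = indexEnum.next();
if (next != null) {                           // null when we are on the very last block
  // Only meaningful if block file pointers increased monotonically along the
  // terms index -- the rest of this thread explains why they do not.
  length = Math.max(1, decodeFp(next.output) - fpSeek);
}
in.prefetch(fpSeek, length);
```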
I tried this but this doesn't work as expected, it sometimes gives me blocks that have prior offsets and I'm not intimate enough with this terms dictionary to understand why, maybe you do?
Egads, that is really weird -- I would not expect those `fp` to go backwards on `.next()` -- I thought the FST index was a depth-first traversal of all on-disk (leaf) blocks. I will need to mull some more about this :)
Maybe @vsop-479 has some insight?
Anyway, I don't think we need to hold up this nice PR for this -- we can try to improve this later. PnP!
> it sometimes gives me blocks that have prior offsets
>
> I would not expect those fp to go backwards on .next()
I haven't looked into the whole context.
But this will happen when we finish a `Block` and go back to read its parent block's `term`.
e.g.
We have terms like: "regular", "request1", "request2", "rest". Set minTermBlockSize to 2, maxTermBlockSize to 3. We will get blocks: b1: ["re"](root), b2: ["gular", "quest", "st"], b3: ["1", "2"].
Then as we call `next`, we will get the (`term`: `fp`) pairs like:
"regular": b2,
"request1": b3,
"request2": b3,
"rest": b2.
> But this will happen when we finish a `Block` and go back to read its parent block's `term`.
Yeah I agree that the `rest` term will be back in the `b2` block, but the `FSTEnum` we are talking about is the in-memory terms index that holds the file pointer offset to the start of these blocks ... I would have expected the FST to have e.g. in your example:
reg -> b2
req -> b3
rez -> b4
Or so, with `b4 > b3 > b2` block file pointers ...
OK! This was nagging at me so I dug into it, printing the `FSTEnum` iteration output on a nightly Lucene benchy index ... I now understand why the pointers indeed go backwards. I think this was the point you were making above @vsop-479 -- sorry I misunderstood at first ;)
It's because when writing the blocks we write "bottoms up" on a depth-first traversal through the terms, and only write a node when it is finished / returned from. Leaf blocks will be written immediately / in order since they are started, terms come out, finished. But for non-leaf blocks, first all leaf blocks under them are written (in order), and THEN the non-leaf block is written only when we are done with all those recursions and writing any straggler terms that live in the non-leaf block.
But the prefixes are added to the index FST in the correct (term sorted) order. So this means the file pointer can indeed go backwards when iterating the terms index. I'll mull some more about whether we could (efficiently) know the term block length ...
Actually I was mistaken, I thought it was `SegmentTermsEnum` we are talking about -- sorry about that ;)
> But for non-leaf blocks, first all leaf blocks under them are written (in order), and THEN the non-leaf block is written only when we are done with all those recursions and writing any straggler terms that live in the non-leaf block.
This means if we subtract the `fp` of a non-leaf block and its next, we will get its sub blocks' total length?
> But for non-leaf blocks, first all leaf blocks under them are written (in order), and THEN the non-leaf block is written only when we are done with all those recursions and writing any straggler terms that live in the non-leaf block.
>
> This means if we subtract the `fp` of a non-leaf block and its next, we will get its sub blocks' total length?
It's tricky. I think if you do that, you'll get the total length of the next block's sub blocks? Because each non-leaf block is written at the end of the recursive (depth-first) visit of all of its sub blocks.
I'm still not sure how to cleanly/efficiently get the total byte length of a leaf block by looking solely at the FST terms index. So we should proceed with the hint as is (pre-fetch 1 byte from position X) -- "typically" the terms block will fit into a single IO page (512 or 4096 bytes), plus any further readahead the IO system does.
Thanks @jpountz -- this is a nice change -- two phased reads!
```java
BytesRef term;
while ((term = termsEnum.next()) != null) {
  // ...
  // This is the term vectors:
  postings = termsEnum.postings(postings, PostingsEnum.ALL);
  assert postings != null;
  // ...
  if ((seekExactCounter++ & 0xFF) == 0) {
    postingsTermsEnum.prepareSeekExact(term);
```
Nice -- this is to make sure we are exercising the API?
Correct.
```java
 * #prepareSeekExact} on multiple terms enums before calling {@link #seekExact(BytesRef)} on the
 * same {@link TermsEnum}s.
 *
 * <p><b>NOTE</b>: The terms enum is unpositioned after calling this method.
```
Hmm is it really that it is unpositioned, or, that this method does not alter the `TermsEnum`'s positioned state?
I.e. if I position it to some term, then call this method, won't it still be positioned on that same (prior) term?
Or are we trying to reserve the future right in the API to break the positioning, even though this first impl preserves it?
Indeed, I'm trying to reserve the right to update the state of the terms enum through this API in the future. See also `AssertingTermsEnum#prepareSeekExact`.
```java
 *
 * <p><b>NOTE</b>: It is not necessary to call this method before calling {@link
 * #seekExact(BytesRef, TermState)}. {@link TermsEnum} implementations are expected to implement
 * this method in an I/O-free fashion.
```
`this method` -> `that method`? Since I think you mean `seekExact(BytesRef, TermState)` when you say `this method` here (but the previous `this method` two lines up is referring to `prepareSeekExact`)? Pronouns are hard!
English as a whole is hard. :) I'll fix.
Hmm actually I mean `prepareSeekExact` when I say `this method`. I'll replace `this method` with `prepareSeekExact` to avoid ambiguity.
Oh, sorry I was confused, you meant the second occurrence of "this method"!
LOL!! Merely communicating about an English sentence, in English, is ESPECIALLY HARD!!
```diff
   */
-  public TermState get(LeafReaderContext ctx) throws IOException {
+  public Supplier<TermState> get(LeafReaderContext ctx) throws IOException {
```
This sort of reminds me of two-phase commit, except at read-time not write-time: we now break up these IO heavy read APIs into two phases, now, where step 1 is the intention to get X soon (allowing prefetch to happen, especially concurrently not just in the background of the calling thread, but, across the N different Xs we want to retrieve). Step 2 is to then go and block on the IO to retrieve each of the N Xs. Two phased reads!
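A sketch of that pattern from the caller's side (variable names like `termStatesPerClause` and `leafContext` are illustrative, not the actual `BooleanQuery` code):

```java
// Phase 1: ask every clause for its Supplier. Under the hood this can call
// TermsEnum#prepareSeekExact, which schedules IndexInput#prefetch, so the I/O
// for all clauses overlaps instead of happening one term at a time.
List<Supplier<TermState>> pending = new ArrayList<>();
for (TermStates termStates : termStatesPerClause) {
  pending.add(termStates.get(leafContext));
}

// Phase 2: resolve them. Each get() may still block on the terms dictionary
// seek, but the pages are hopefully already cached thanks to phase 1.
List<TermState> resolved = new ArrayList<>();
for (Supplier<TermState> supplier : pending) {
  resolved.add(supplier.get());
}
```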
```diff
@@ -19,6 +19,7 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
+import java.util.function.Supplier;
```
Should we open a spinoff issue to maybe add prefetch to `TermInSetQuery` too?
Well, only `TermQuery` and `SynonymQuery` are handled so we could open an issue for every other query, I'm not sure we should open these issues? But indeed, let's think of how to take advantage of prefetching in `PointRangeQuery`, `TermInSetQuery`, `FeatureQuery`, etc.
```diff
@@ -150,7 +170,12 @@ public Scorer get(long leadCost) throws IOException {

   @Override
   public long cost() {
-    return docFreq;
+    try {
+      TermsEnum te = getTermsEnum();
```
Hmm this getter got more costly. It's too bad `TermState` is so opaque -- under the hood it (`BlockTermState`) is already storing `docFreq`.
Note that we were already getting a `TermsEnum` to be able to get the cost before, it just happened before creating the `ScorerSupplier`. So the additional cost here is the `if (termsEnum != null)` check under `getTermsEnum()`.
Agreed that it's a pity to pull a terms enum only to get a cost, which is already encapsulated in the term state. Though I don't expect it to be a major issue in practice.
This relates to #13359: we want to take advantage of the `Weight#scorerSupplier` call to start scheduling some I/O in the background, in parallel across clauses. For this to work properly with top-level disjunctions, we need to move `#bulkScorer()` from `Weight` to `ScorerSupplier` as well, so that the disjunctive `BooleanQuery` first performs a call to `Weight#scorerSupplier()` on all inner clauses, and then `ScorerSupplier#bulkScorer` on all inner clauses. `ScorerSupplier#get` and `ScorerSupplier#bulkScorer` only support being called once. This forced me to fix some inefficiencies in `bulkScorer()` implementations where we would pull scorers and then throw them away after realizing that the strategy we were planning on using was not optimal. This is why e.g. `ReqExclBulkScorer` now also supports prohibited clauses that produce a two-phase iterator.
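For illustration, the evaluation order this enables for a top-level disjunction looks roughly like this (a sketch of the driver logic, not the actual `BooleanQuery` code; `clauseWeights` and `leafContext` are assumed):

```java
// Pass 1: create a ScorerSupplier per clause. With the TermsEnum change above,
// this is where each TermQuery issues its terms-dictionary prefetch.
List<ScorerSupplier> suppliers = new ArrayList<>();
for (Weight clauseWeight : clauseWeights) {
  ScorerSupplier supplier = clauseWeight.scorerSupplier(leafContext);
  if (supplier != null) {
    suppliers.add(supplier);
  }
}

// Pass 2: only now materialize the bulk scorers, so the blocking terms
// dictionary reads can be served by the prefetches issued in pass 1.
List<BulkScorer> bulkScorers = new ArrayList<>();
for (ScorerSupplier supplier : suppliers) {
  bulkScorers.add(supplier.bulkScorer());
}
```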
Now that #13408 has been merged, I could update the benchmark to simply call `IndexSearcher#search`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class TermsEnumPrefetchBench {
private static final int NUM_TERMS = 3;
public static int DUMMY;
public static void main(String[] args) throws Exception {
Path dirPath = Paths.get(args[0]);
Directory dir = FSDirectory.open(dirPath);
if (DirectoryReader.indexExists(dir) == false) {
TieredMergePolicy mp = new TieredMergePolicy();
mp.setSegmentsPerTier(100);
mp.setMaxMergeAtOnce(100);
mp.setMaxMergedSegmentMB(1024);
try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig()
.setMergePolicy(mp)
.setRAMBufferSizeMB(1024))) {
ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
AtomicLong indexed = new AtomicLong(0);
for (int task = 0; task < 1000; ++task) {
executor.execute(() -> {
Random r = ThreadLocalRandom.current();
for (int i = 0; i < 1_000; ++i) {
Document doc = new Document();
for (int j = 0; j < 10_000; ++j) {
doc.add(new StringField("f", Long.toString(r.nextLong(20_000_000_000L)), Store.NO));
}
try {
w.addDocument(doc);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
final long actualIndexed = indexed.incrementAndGet();
if (actualIndexed % 10_000 == 0) {
System.out.println("Indexed: " + actualIndexed);
}
}
});
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.DAYS);
w.commit();
System.out.println("Start force merging");
w.forceMerge(1);
System.out.println("Done force merging");
w.commit();
}
}
List<Long> latencies = new ArrayList<>();
try (IndexReader reader = DirectoryReader.open(dir)) {
IndexSearcher searcher = new IndexSearcher(reader);
Random r = ThreadLocalRandom.current();
for (int i = 0; i < 10_000; ++i) {
long start = System.nanoTime();
BooleanQuery.Builder query = new BooleanQuery.Builder();
for (int t = 0; t < NUM_TERMS; ++t) {
query.add(new TermQuery(new Term("f", Long.toString(r.nextLong(20_000_000_000L)))), Occur.SHOULD);
}
DUMMY += searcher.search(query.build(), 1, Sort.INDEXORDER).totalHits.value;
long end = System.nanoTime();
latencies.add((end - start) / 1000);
}
}
latencies.sort(null);
System.out.println("P50: " + latencies.get(latencies.size() / 2));
System.out.println("P90: " + latencies.get(latencies.size() * 9 / 10));
System.out.println("P99: " + latencies.get(latencies.size() * 99 / 100));
}
}
```

Results still look good.
Before the change:
After the change:
I pushed a new approach. Instead of
The benchmark still reports similar numbers:
Without the change:
With the change:
I will merge soon if there are no objections.
In apache/lucene#13359 a new "prepareSeekExact" method was added that can improve seeking on TermsEnum implementations. Two of our own subclasses of TermsEnum don't seem to support seeking for text, so we can safely throw an UOE there. The third (FilterableTermsEnum) changes to simply returning a Supplier for the actual "seek" method for now.