Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce IOContext.LOAD #11930

Merged
merged 12 commits into from
Nov 15, 2022
Merged

Introduce IOContext.LOAD #11930

merged 12 commits into from
Nov 15, 2022

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Nov 14, 2022

This PR introduces a new LOAD IOContext for files that are a small fraction of the total index size and are expected to be accessed in a random-access fashion. This may be leveraged by some Directory implementations to load such files into physical memory (e.g. Java heap) in order to provide stronger guarantees on query latency.

The default codec has a number of small and hot files, that actually used to be
fully loaded in memory before we moved them off-heap. In the general case,
these files are expected to fully fit into the page cache for things to work
well. Should we give control over preloading to codecs? This is what this
commit does for the following files:
 - Terms index (`tip`)
 - Points index (`kdi`)
 - Stored fields index (`fdx`)
 - Terms vector index (`tvx`)

This only has an effect on `MMapDirectory`.
Copy link
Contributor

@uschindler uschindler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine.

Maybe we should still add the the BiPredicate<String,IOContext> constant to MMapDirectory in this PR (but not eanble it by default). This would require that #11929 is merged first.

* choose to load such files into physical memory (e.g. Java heap) as a way to provide stronger
* guarantees on query latency.
*/
public final boolean load;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description looks good.

@@ -82,7 +83,7 @@ final class Lucene90DocValuesProducer extends DocValuesProducer {
merging = false;

// read in the entries from the metadata file.
try (ChecksumIndexInput in = state.directory.openChecksumInput(metaName, state.context)) {
try (ChecksumIndexInput in = state.directory.openChecksumInput(metaName, IOContext.READONCE)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about this earlier, but it is really nice you fixed this,too!

@uschindler
Copy link
Contributor

As we started over here: My suggestion would be to let this go in without changing the default in MMapDirectory.

If we want to change the default we can make a followup PR on MMapDirectory. But at moment due to disagreement if preload is enough, we should just add the option to enable this in MMapDir. If we really want to stick the small files and load them to heap, maybe a NRTCachingDirectory like approach would be best.

@jpountz
Copy link
Contributor Author

jpountz commented Nov 15, 2022

Agreed Uwe.

@jpountz jpountz merged commit 729dc2b into apache:main Nov 15, 2022
@jpountz jpountz deleted the add_iocontext_load branch November 15, 2022 12:59
jpountz added a commit that referenced this pull request Nov 15, 2022
The default codec has a number of small and hot files, that actually used to be
fully loaded in memory before we moved them off-heap. In the general case,
these files are expected to fully fit into the page cache for things to work
well. Should we give control over preloading to codecs? This is what this
commit does for the following files:
 - Terms index (`tip`)
 - Points index (`kdi`)
 - Stored fields index (`fdx`)
 - Terms vector index (`tvx`)

This only has an effect on `MMapDirectory`.
@uschindler
Copy link
Contributor

I think a CHANGES.txt entry would be good.

jpountz added a commit that referenced this pull request Nov 15, 2022
jpountz added a commit that referenced this pull request Nov 15, 2022
@rmuir rmuir added this to the 9.5.0 milestone Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants