Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move anonymous Weight implementation in PointRangeQuery to named class #13711

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jainankitk
Copy link
Contributor

Description

Moves the anonymous Weight implementation in PointRangeQuery#createWeight to named class for better extensibility and resusability.

Signed-off-by: Ankit Jain <[email protected]>
@jpountz
Copy link
Contributor

jpountz commented Sep 3, 2024

I see from the linked issue that you would like to extend PointRangeQuery, but in general we don't like to think of our queries as being extensible. I wonder if you could do what you need through composition rather than extension?

@jainankitk
Copy link
Contributor Author

I see from the linked issue that you would like to extend PointRangeQuery, but in general we don't like to think of our queries as being extensible. I wonder if you could do what you need through composition rather than extension?

@jpountz - Thanks for your feedback. While I agree with using composition rather than extension for PointRangeQuery itself, Weight implementation itself within PointRangeQuery will be more reusable (via composition instead of extension) by being a non-anonymous class.

@jpountz
Copy link
Contributor

jpountz commented Sep 3, 2024

Sorry, I don't think we should make Lucene's Weight implementations public.

I looked up the OpenSearch issue, if I understand correctly, the problem you're trying to solve is that it's wasteful for PointRangeQuery to evaluate the whole range when it's only asked for the first 10 doc IDs that match the range query. I agree it's wasteful. We have the same problem on nightly benchmarks and the IntNRQ task. I wonder if there are better ways to do what you're after, e.g. adding a TopDocs Weight#topk(int n, int totalHitsThreshold) API that would default to collecting hits, and that some classes such as PointRangeQuery could override. As I'm writing this, I'm not convinced that it's actually a good idea. Using sparse indexing would likely be a better approach, especially if the index can be sorted, as this would produce good iterators that don't have this huge up-front cost of evaluating the query against the entire segment.

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants