Lucene & limit #123

Closed
silvanheller opened this issue May 17, 2022 · 7 comments

@silvanheller
Member

Lucene currently does not propagate a limit to the underlying IndexSearcher, see:

private val results = this.searcher.search(this.query, Integer.MAX_VALUE)

This was never noticed because Cineast did not pass a limit to Cottontail DB, see: vitrivr/cineast#312
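
To make the request concrete, here is a minimal sketch of how the number of hits requested from Lucene could be derived from the query's LIMIT. The helper `hitsToRequest` and the flag `pushDownSafe` are hypothetical names that do not exist in Cottontail DB. The sketch only assumes that the planner can tell whether any operator between the index scan and the LIMIT reorders or filters rows.

```kotlin
/**
 * Hypothetical helper (not part of Cottontail DB): picks the number of hits
 * to request from the Lucene index.
 *
 * @param limit The LIMIT of the query, or null if the query has none.
 * @param pushDownSafe Assumed planner flag: true only if nothing between the
 *                     index scan and the LIMIT reorders or filters rows.
 */
fun hitsToRequest(limit: Long?, pushDownSafe: Boolean): Int {
    // Without a limit, or when push-down is unsafe, keep the current behaviour.
    if (limit == null || !pushDownSafe) return Int.MAX_VALUE
    // Clamp to at least 1 and at most Int.MAX_VALUE before converting to Int.
    return limit.coerceIn(1L, Int.MAX_VALUE.toLong()).toInt()
}

fun main() {
    println(hitsToRequest(limit = 500L, pushDownSafe = true))  // LIMIT is pushed down
    println(hitsToRequest(limit = 500L, pushDownSafe = false)) // falls back to Int.MAX_VALUE
    println(hitsToRequest(limit = null, pushDownSafe = true))  // no LIMIT in the query
}
```

The returned value would take the place of Integer.MAX_VALUE in the call to this.searcher.search(this.query, ...) quoted above.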

@ppanopticon
Member

ppanopticon commented May 17, 2022

This is intended because a LIMIT expressed in a query to Cottontail DB cannot always be pushed down to Lucene, e.g., if we use Lucene to satisfy a predicate and then sort the result by another column that the index does not cover.

The question is: Is the LIMIT respected or is it ignored completely?
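
A small, self-contained sketch of why the push-down is not always valid. The `Row` type and its `year` column are made up for illustration and have nothing to do with the actual Cineast schema. The sketch assumes Lucene would return its top hits by score when given a limit.

```kotlin
// Hypothetical row: a Lucene relevance score plus a column the index does not cover.
data class Row(val id: Int, val score: Double, val year: Int)

// Correct plan: take all index matches, sort by the other column, then apply LIMIT.
fun sortThenLimit(matches: List<Row>, limit: Int): List<Row> =
    matches.sortedBy { it.year }.take(limit)

// Incorrect plan: the index keeps only the top `limit` hits by score, then we sort.
fun limitThenSort(matches: List<Row>, limit: Int): List<Row> =
    matches.sortedByDescending { it.score }.take(limit).sortedBy { it.year }

fun main() {
    val matches = listOf(
        Row(id = 1, score = 0.9, year = 2020),
        Row(id = 2, score = 0.8, year = 2015),
        Row(id = 3, score = 0.1, year = 1999),
    )
    println(sortThenLimit(matches, 2).map { it.id }) // [3, 2]
    println(limitThenSort(matches, 2).map { it.id }) // [2, 1]
}
```

The two plans disagree because the row with the lowest score is the one the sort would have placed first, so cutting off at the index loses it.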

@silvanheller
Member Author

The limit is respected but not pushed down to the Lucene index, even for simple queries where it would make sense.

@lucaro
Member

lucaro commented May 17, 2022

So this means the result of the query as a whole is correct, but Lucene returns far more results than needed at an intermediate stage, which then have to be discarded?

@silvanheller
Member Author

Yes, this is visible in the logs:

2022-05-17 15:45:44 [DefaultDispatcher-worker-2] DEBUG TransactionalGrpcService:114 - [3414, q-cin-ft-rows-4f8] Preparation of Select completed successfully in 32.095876ms.
2022-05-17 15:45:52 [cottontaildb-query-worker-4] DEBUG IndexScanOperator:70 - Read 428892 entries from warren.cineast.features_audiotranscription.idx_feature_lucene.
2022-05-17 15:45:52 [DefaultDispatcher-worker-2] INFO  TransactionalGrpcService:152 - [3414, q-cin-ft-rows-4f8] Execution of Select completed successfully in 8.392559588s.

@ppanopticon
Member

So this is a performance issue, I take it? I'll have a look!

@ppanopticon
Member

I have made some changes on master. They do not address the raised issue directly but iron out a few glitches, which I suspect to be the culprit.

Could you quickly test whether there is a performance improvement using that version?

@silvanheller
Member Author

Much faster, thanks!
