How do you speed up an "ORDER BY rank" FTS5 query? #10
Comments
So assuming you ran both […]

You can play with some indexing options, such as adding stemming with […]

I've hit the same problem. The issue is that without ordering by rank, SQLite just returns the first matching rows it finds, while with ranking it has to look at all of them (so it should be less bad if your query is more specific). It's made worse because the SQLite ranking algorithm uses the number of occurrences of a term in a document to do the ranking (like most full-text search), but as far as I understand from the documentation, it doesn't actually store that information ([…])

You can't prevent FTS5 from storing all that position information without losing the ranking ability altogether (using […])

I'm trying to make tantivy work in wasm the same way as SQLite right now, which should be much more efficient for FTS.
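The point that ranking has to look at every match, while an unranked `LIMIT` can stop early, can be shown with a toy experiment. This is a minimal sketch, assuming Python's `sqlite3` module is built with FTS5. The `score` UDF and the `count_score_calls` helper are hypothetical stand-ins for FTS5's built-in `rank`, used only to count how many rows get scored with and without an `ORDER BY`.

```python
import sqlite3


def count_score_calls(n_matches: int = 1000, limit: int = 7) -> tuple[int, int]:
    """Return (calls without ORDER BY, calls with ORDER BY) for a stand-in scorer.

    score() is a hypothetical Python UDF standing in for FTS5's bm25 rank;
    it only counts its own invocations.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    # Every row contains the search term, so all n_matches rows match.
    conn.executemany(
        "INSERT INTO docs(body) VALUES (?)",
        [(f"sqlite note number {i}",) for i in range(n_matches)],
    )

    calls = {"n": 0}

    def score(rowid: int) -> int:
        calls["n"] += 1
        return -rowid  # arbitrary value; only the call count matters here

    conn.create_function("score", 1, score)

    # No ORDER BY: SQLite can stop after emitting the first `limit` matches,
    # so score() runs once per returned row.
    calls["n"] = 0
    conn.execute(
        "SELECT rowid, score(rowid) FROM docs WHERE docs MATCH 'sqlite' LIMIT ?",
        (limit,),
    ).fetchall()
    unordered = calls["n"]

    # ORDER BY score(): every matching row must be scored before the
    # top `limit` rows are known.
    calls["n"] = 0
    conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH 'sqlite' "
        "ORDER BY score(rowid) LIMIT ?",
        (limit,),
    ).fetchall()
    ordered = calls["n"]

    conn.close()
    return unordered, ordered


if __name__ == "__main__":
    unordered, ordered = count_score_calls()
    print(f"score() calls without ORDER BY: {unordered}")
    print(f"score() calls with ORDER BY:    {ordered}")
```

With 1,000 matching rows and `LIMIT 7`, the unordered query scores only the 7 rows it returns, while the ordered query scores all 1,000. Under sql.js-httpvfs, touching every match rather than the first few is consistent with the much larger download described in this issue.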
Thank you for the thoughtful follow-up. I did insert […]

I tried to work around the performance issue with another FTS5 table called […]

I'll try using "trigram" on the […]

Side note: have you seen other search-in-a-box solutions like https://github.com/typesense/typesense or https://github.com/meilisearch/MeiliSearch? They might be interesting to look at if tantivy turns out to be a bit difficult.
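For reference, FTS5's built-in `trigram` tokenizer (SQLite 3.34.0 and later) indexes overlapping three-character sequences, so a MATCH behaves like a case-insensitive substring search. This is a minimal sketch, assuming Python's `sqlite3` is linked against a new-enough SQLite with FTS5. The table name `title_index` and the helper `trigram_search` are hypothetical and are not the table mentioned in the comment.

```python
import sqlite3


def trigram_search(titles: list[str], pattern: str) -> list[str]:
    """Substring-search `titles` through a hypothetical FTS5 trigram index."""
    if sqlite3.sqlite_version_info < (3, 34, 0):
        raise RuntimeError(
            f"trigram tokenizer needs SQLite >= 3.34.0, found {sqlite3.sqlite_version}"
        )
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE title_index USING fts5(title, tokenize='trigram')")
    conn.executemany("INSERT INTO title_index(title) VALUES (?)", [(t,) for t in titles])
    # Wrap the pattern in double quotes (doubling any embedded quotes) so FTS5
    # treats it as one phrase rather than parsing it as query syntax.
    phrase = '"' + pattern.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT title FROM title_index WHERE title_index MATCH ? ORDER BY rowid",
        (phrase,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    titles = ["Tantivy in wasm", "SQLite FTS5 ranking", "Anti-patterns in search"]
    print(trigram_search(titles, "anti"))
```

Here `"anti"` matches both "Tantivy in wasm" and "Anti-patterns in search", because the trigram tokenizer folds case by default. Per the FTS5 documentation, patterns shorter than three characters do not match any rows in a full-text query.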
I would absolutely love this. I'm about to abuse sql.js-httpvfs to add search to my gh-pages-hosted log archive, and I would be delighted to switch that out for tantivy. Is there anywhere I can follow along and/or donate to help out? Edit: I see there's https://github.com/phiresky/tantivy-wasm, which unfortunately doesn't have quite enough information for me to get it working, but I'll keep poking at it.
News about that will probably appear here: quickwit-oss/tantivy#1067. It already works and doesn't require much special setup (running […])
Do you still see tantivy as the future for this type of feature? It doesn't look like you've touched it in over a year. Did tantivy end up not being as fast or reliable as SQLite?
SQLite now officially supports WASM, so it might be much more efficient than sql.js. You can read more about it here and here.
I've switched to pagefind. Much simpler and it supports everything sql.js aims to someday support. |
Thank you for writing this library. It's really neat!
I'm currently trying to debug a performance issue with querying an FTS5 virtual table. This table is populated with 6 million rows of articles that have title and text columns. The query I'm using on this table looks like this:
```sql
select * from articles where articles match ? order by rank LIMIT 7
```
Right now, when that query is executed by the sql.js-httpvfs worker, it fires a continuous slew of network requests, taking 20+ seconds to download tens of MB of data.
I found that removing the "order by rank" clause reduces the query's latency to a few seconds and the data it downloads to less than 1 MB. Unfortunately, the results are also much less useful: they're returned in alphabetical order, not in order of relevance.
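The relevance difference between the two queries can be reproduced on a toy table. This is a minimal sketch, assuming Python's `sqlite3` with FTS5. The five sample articles and the `compare_orderings` helper are made up for illustration. In this toy table, the query without `ORDER BY rank` returns matches in rowid (insertion) order. With `ORDER BY rank`, the default `rank` (bm25) puts the article that uses the term most often, in the shortest row, first.

```python
import sqlite3

# Hypothetical sample data: only rows 1 and 3 mention "sqlite".
ARTICLES = [
    ("Passing mention", "a long note that mentions sqlite once among many other unrelated words"),
    ("Gardening", "tomatoes need sun and water"),
    ("All about SQLite", "sqlite sqlite sqlite"),
    ("Cooking", "boil the pasta in salted water"),
    ("Travel", "trains are a relaxing way to travel"),
]


def compare_orderings(query: str = "sqlite", limit: int = 7) -> tuple[list[int], list[int]]:
    """Return matching rowids without and with ORDER BY rank."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, text)")
    conn.executemany("INSERT INTO articles(title, text) VALUES (?, ?)", ARTICLES)
    unranked = [
        row[0]
        for row in conn.execute(
            "SELECT rowid FROM articles WHERE articles MATCH ? LIMIT ?", (query, limit)
        )
    ]
    # rank defaults to bm25(), which is more negative for better matches,
    # so ascending order puts the most relevant row first.
    ranked = [
        row[0]
        for row in conn.execute(
            "SELECT rowid FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        )
    ]
    conn.close()
    return unranked, ranked


if __name__ == "__main__":
    print(compare_orderings())
```

Here `compare_orderings()` returns `([1, 3], [3, 1])`. The unranked query lists the passing mention first simply because it has the lower rowid.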
Here are the query plan comparisons between the two:
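To produce plans like these for both variants side by side, SQLite's `EXPLAIN QUERY PLAN` can be run on each query. This is a minimal sketch, assuming Python's `sqlite3` and an `articles` FTS5 table shaped like the one described above. The demo creates an empty stand-in table, and `query_plans` is a hypothetical helper. The plan text differs between SQLite versions, so no specific output is claimed here.

```python
import sqlite3


def query_plans(conn: sqlite3.Connection, match: str) -> dict[str, list[str]]:
    """Return the EXPLAIN QUERY PLAN 'detail' strings for both query variants."""
    queries = {
        "without rank": "SELECT * FROM articles WHERE articles MATCH ? LIMIT 7",
        "with rank": "SELECT * FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT 7",
    }
    plans: dict[str, list[str]] = {}
    for label, sql in queries.items():
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (match,)).fetchall()
        # Each row is (id, parent, notused, detail); keep only the detail text.
        plans[label] = [row[3] for row in rows]
    return plans


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Empty stand-in for the real 6-million-row table.
    conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, text)")
    for label, details in query_plans(conn, "sqlite").items():
        print(f"{label}: {details}")
    conn.close()
```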
What can I do to speed this kind of query up without losing the relevancy that "ORDER BY rank" provides?