Incremental index and lazy loading? #306

chrisspen · 2022-07-10T20:52:51Z

chrisspen
Jul 10, 2022

Hi, I'm looking for a way to make roughly 1 million text articles searchable on the client side. To save costs, I can't host an FTS server, and because of the number of documents, I can't load a large index in the client, nor can I regenerate the index from scratch every time it's updated, which happens daily.

Would Stork be a possible solution?

From the docs on building an index, it looks like it doesn't support incremental index creation, forcing you to rebuild the entire index after any minor data update. Is this correct?

Also, it looks like there's no support for splitting the index into smaller slices for lazy loading. For example, if someone searches for "dog", it doesn't make sense to load the entire index containing all search terms. It should only load the parts of the index that include the keyword "dog". Does Stork also not support this functionality?
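To make the two requirements concrete, here is a minimal sketch of a term-sharded inverted index. It is not Stork's API or index format; `NUM_SHARDS`, `shard_id`, `add_document`, and `search` are hypothetical names chosen for illustration. Each term is hashed to a fixed shard, so a lookup for "dog" reads one shard, and adding a document only changes the shards its terms hash to.

```python
import hashlib
import re
from collections import defaultdict

NUM_SHARDS = 256  # assumed shard count; tune so each shard stays small

# shard number -> {term: [doc ids]}
Shards = dict[int, dict[str, list[str]]]


def shard_id(term: str) -> int:
    """Map a term to a shard number that is stable across runs (unlike hash())."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def add_document(shards: Shards, doc_id: str, text: str) -> set[int]:
    """Add one document and return the shard numbers that changed."""
    touched = set()
    for term in set(tokenize(text)):
        sid = shard_id(term)
        postings = shards[sid].setdefault(term, [])
        if doc_id not in postings:
            postings.append(doc_id)
            touched.add(sid)
    return touched


def search(shards: Shards, term: str) -> list[str]:
    """Look up a single term by reading only the shard it hashes to."""
    term = term.lower()
    # In a browser this step would be one HTTP request for one shard file.
    shard = shards.get(shard_id(term), {})
    return shard.get(term, [])


shards: Shards = defaultdict(dict)
changed = add_document(shards, "a1", "The dog barked")
print(search(shards, "dog"), sorted(changed))
```

Only the shards returned by `add_document` would need to be re-serialized and re-uploaded after an update. A long article has many distinct terms, so a daily batch would likely touch a large share of the shards, but each one stays small and the untouched ones remain cacheable.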

jameslittle230 · 2022-07-12T12:53:44Z

jameslittle230
Jul 12, 2022
Maintainer

Hi @chrisspen - thanks for asking.

I don't think Stork will work for you today. You're correct that indexes cannot be added to incrementally, and cannot be sliced/sharded/split.

I'm looking into solving for both of these use cases, so it's not out of the question that Stork would be the right solution for you in the future, but I don't think you'll get what you're looking for from Stork today.

Out of curiosity, if you're comfortable sharing, what are you working on that needs full-text search of 1M text articles?

Best, James


chrisspen · 2022-07-13T03:45:12Z

chrisspen
Jul 13, 2022
Author

Thanks for the overview.

I have an application that generates transcripts for audio that I publish, and I'm looking for an inexpensive way to make them searchable online without a dedicated FTS server. I'm currently using a very rudimentary distributed static index that maps keywords to transcript IDs, so it's not a proper tf-idf index, but it's better than nothing.
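For readers unfamiliar with the distinction, the following is a small sketch of what tf-idf ranking adds over a plain keyword-to-ID map. It is an illustration only, not the index described above; `build_index` and `rank` are hypothetical names, and the scoring omits document-length normalization for brevity. The postings store a term count per transcript, and rarer terms get a higher weight.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> tuple[dict[str, dict[str, int]], int]:
    """Return (postings, doc_count), where postings[term][doc_id] is a term count."""
    postings: dict[str, dict[str, int]] = {}
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            postings.setdefault(term, {})[doc_id] = count
    return postings, len(docs)


def rank(postings: dict[str, dict[str, int]], doc_count: int, query: str) -> list[str]:
    """Return matching doc ids, highest tf-idf score first."""
    scores: Counter = Counter()
    for term in tokenize(query):
        matches = postings.get(term, {})
        if not matches:
            continue
        # Rare terms weigh more; the +1 keeps very common terms from scoring zero.
        idf = math.log(doc_count / len(matches)) + 1.0
        for doc_id, tf in matches.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]


transcripts = {"t1": "dog dog dog cat", "t2": "dog bird", "t3": "fish"}
postings, doc_count = build_index(transcripts)
print(rank(postings, doc_count, "dog"))
```

A keyword-to-ID map can only say which transcripts contain a term. Storing counts as well is what allows the results to be ordered by relevance.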


whyboris · 2022-10-28T18:26:58Z

whyboris
Oct 28, 2022

I hope it's not rude to share an alternative (competitor) product, but consider Pagefind - which breaks up the entire index into tiny chunks (~30k) that are swiftly downloaded as the user searches.

2 replies

jameslittle230 Oct 30, 2022
Maintainer

I just came across Pagefind the other day, and it looks like a really cool project that shares a lot of the same values as Stork! In Stork 2.0.0 I've been working on a similar chunking feature, but I'm excited that another project figured it out too.

chrisspen Nov 4, 2022
Author

I hope it's not rude to share an alternative (competitor) product, but consider Pagefind - which breaks up the entire index into tiny chunks (~30k) that are swiftly downloaded as the user searches.

Definitely not rude, and much appreciated. Unfortunately, it doesn't look like Pagefind supports gzip-compressed input, so it can't see any of my HTML files. To use it, I'd first have to decompress a hundred thousand files, and that's not going to happen...
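As a rough sketch of the alternative to decompressing everything onto disk, gzip-compressed pages can be read one at a time in memory using Python's standard `gzip` module. This assumes the files are named `*.html.gz` and are UTF-8 encoded, neither of which is stated above, and `iter_html` is a hypothetical helper. It only helps with an indexer that accepts text directly rather than scanning a directory of plain HTML files.

```python
import gzip
from pathlib import Path
from typing import Iterator


def iter_html(root: Path) -> Iterator[tuple[Path, str]]:
    """Yield (path, html_text) for each *.html.gz under root, one file at a time."""
    # Assumption: compressed pages use the .html.gz suffix and UTF-8 text.
    for path in sorted(root.rglob("*.html.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield path, handle.read()
```

Because this is a generator, only one decompressed page is held in memory at a time, and nothing is written back to disk.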
