-
Follow-up: this test site is (obviously) served by GitHub Pages, which applies gzip compression only to certain media types, and the search index's default extension isn't one of them. So, in the GitHub action I added a step so that the index is served as a media type GitHub Pages will compress. This reduces the download from 6.0 MB to 2.3 MB; we are definitely moving in the right direction. But the Lighthouse (mobile) report remains essentially the same. Brotli can compress this further, to about 1.6 MB, but that's not a configurable option for GitHub Pages. My next step is a Netlify test. It would be convenient if the default index extension corresponded to a media type that is compressed by default on Apache, NGINX, etc.
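For anyone who wants to reproduce the size figures locally before touching the deployment, here is a minimal sketch (the index filename is a placeholder, and `brotli` is a third-party package, not part of the standard library):

```python
import gzip
from pathlib import Path

import brotli  # third-party: pip install brotli

index_path = Path("federalist.st")  # placeholder: point this at your generated Stork index
raw = index_path.read_bytes()

gz = gzip.compress(raw, compresslevel=9)   # roughly what a gzip-capable host serves
br = brotli.compress(raw, quality=11)      # best-case offline Brotli

def mb(n: int) -> str:
    return f"{n / 1_000_000:.1f} MB"

print(f"raw:    {mb(len(raw))}")
print(f"gzip:   {mb(len(gz))}")
print(f"brotli: {mb(len(br))}")
```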
-
Thanks for this investigation, and I'm excited to see how you're using Stork! I'll pick out pieces and respond individually:
Definitely something I want to look at - it's been on my list for a while. I think this change would reduce index size by about 20%.
This will provide the most meaningful size reduction, since much of an index file is made up of the mapping between words and results. I'll play with some different ways to reduce the number of words indexed in the output file -- I'm excited by some of the ideas in the listed issue but want to think more about how they'd affect the configuration API.
Definitely something I want to work on more. At minimum, the indexer should gzip the bag of bytes before saving the file, and the WASM module should unzip it upon registration. Getting the server & browser to do this automatically by saving the file as …
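To make the "gzip before save, unzip on registration" idea concrete, here is a minimal sketch of the round-trip in Python (Stork's indexer and WASM loader are Rust, so this only illustrates the idea, not the actual implementation):

```python
import gzip

def save_index(index_bytes: bytes, path: str) -> None:
    # Indexer side: compress the serialized "bag of bytes" before writing it out.
    with open(path, "wb") as f:
        f.write(gzip.compress(index_bytes, compresslevel=9))

def load_index(path: str) -> bytes:
    # Loader side: decompress transparently when the index is registered.
    with open(path, "rb") as f:
        return gzip.decompress(f.read())
```

The appeal of baking compression into the file format is that the index stays small even on hosts that won't compress unfamiliar media types; the trade-off is that decompression happens in the WASM/JS layer instead of the browser's built-in Content-Encoding handling.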
-
Update: the test site is now available on both GitHub Pages and Netlify. Although Brotli is capable of reducing the index file from 6.0 MB to 1.5 MB, Netlify's use of Brotli is less aggressive, producing a 2.1 MB file. That's only about a 10% improvement over gzip compression. I've opened a related topic on the Netlify forum: https://answers.netlify.com/t/serving-pre-compressed-brotli-files/53515

The Lighthouse (mobile) report for the site served by Netlify is essentially the same as what I am seeing for the site served by GitHub, so I'm not going to post the results. No need to respond to this; I just wanted to provide an update.
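For context on why on-the-fly Brotli can be larger than an offline build: Brotli's quality level trades CPU time per request for compression ratio, and hosts that compress per-request generally pick a lower level than the offline maximum of 11. A small sketch to see the spread on a given index (the filename is a placeholder, and `brotli` is a third-party package):

```python
import brotli  # third-party: pip install brotli

raw = open("federalist.st", "rb").read()  # placeholder: your generated index

# Lower qualities are common for per-request compression; 11 is the offline maximum.
for quality in (4, 7, 11):
    size = len(brotli.compress(raw, quality=quality))
    print(f"quality {quality:2d}: {size / 1_000_000:.1f} MB")
```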
-
I was curious whether splitting a large file into chunks would improve performance (downloading the chunks in parallel), so I ran some rudimentary tests. Short answer: chunking appears to hurt rather than help. Details here:
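For anyone who wants to repeat the experiment, the shape of such a test is roughly the sketch below (URLs are placeholders; real results depend heavily on HTTP/2 multiplexing and connection reuse, which likely explains why chunking doesn't help):

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

WHOLE = "https://example.com/index.st"  # placeholder URL
CHUNKS = [f"https://example.com/index.st.{i:03d}" for i in range(8)]  # placeholder pre-split parts

def fetch(url: str) -> bytes:
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def timed(label: str, fn) -> None:
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.2f} s")

timed("single file", lambda: fetch(WHOLE))
with ThreadPoolExecutor(max_workers=8) as pool:
    timed("8 parallel chunks", lambda: list(pool.map(fetch, CHUNKS)))
```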
-
For the test site referenced above, v1.4.1 and v1.4.2 produce different index files (as expected), though the file size is identical. @jameslittle230, does this make sense to you?

The index produced by v1.4.2 does, however, have better potential for compression: gzipped, it is about 14% smaller than the gzipped v1.4.1 index, and the reduction with Brotli compression is comparable.
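A quick way to reproduce that comparison between the two builds (the index filenames are placeholders):

```python
import gzip
from pathlib import Path

def gzipped_size(path: str) -> int:
    return len(gzip.compress(Path(path).read_bytes(), compresslevel=9))

old = gzipped_size("index-1.4.1.st")  # placeholder filenames for the two builds
new = gzipped_size("index-1.4.2.st")
print(f"v1.4.1: {old} B  v1.4.2: {new} B  reduction: {1 - new / old:.1%}")
```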
-
I think it would make sense to add the size of the gzipped index to the benchmarks.
-
I recognize that I am pressing the limits, but I wanted to understand where Stork is a good fit today, and where it might be a good fit tomorrow. My starting point was, "The client publishes one short article per week. Where will they be in 5 years? Let's double that and test Stork."
Live site: https://jmooring.github.io/hugo-stork
Source: https://github.com/jmooring/hugo-stork (published site is in the gh-pages branch)
There are 500 articles, with an average of about 520 words per article. Once the site is loaded, the search is ~~fast~~ ~~really fast~~ instantaneous.

But as you might guess, with a 6 MB index file, the site doesn't load as fast as I might like, and the Lighthouse (mobile) report reflects that. Yeah, I don't trust Lighthouse that much either. But clients do.
Does anyone have any tips for how to intelligently decrease the size of the index with the tools we have today?
In the future, it seems like #250 would decrease the index size, though I don't know if it would produce a meaningful reduction.
As I understand it, the index includes the full text of each file it indexes. This is great because you have context when displaying the search results, but on this site that's about 1.5 million characters. Perhaps there's an opportunity for improving compression, or an option to index the entire file but only display a short summary in the search results (the summary would be a separate element in the file object).
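Rough arithmetic on the figures above (the characters-per-word value is my assumption) puts the stored full text at roughly a quarter of the 6 MB index, which is why both better compression and a summary-only display option seem worth pursuing:

```python
# Back-of-the-envelope estimate of the full-text share of the index.
articles = 500
words_per_article = 520          # average, per the figures above
chars_per_word = 6               # assumption, including a trailing space

full_text_chars = articles * words_per_article * chars_per_word
index_bytes = 6_000_000          # observed index size

print(f"~{full_text_chars / 1e6:.1f} million characters of stored text")
print(f"roughly {full_text_chars / index_bytes:.0%} of the index")
```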
Any advice would be appreciated.