Full-search implementation concerns #661

Open
evil-shrike opened this issue Sep 1, 2016 · 13 comments
Labels
search Search content for a static site
@evil-shrike

evil-shrike commented Sep 1, 2016

First of all, full-search is awesome. Really cool. But let me criticize it a bit.

  1. Why do we need search-stopwords.json?
    lunr already contains built-in stop words for English. But you remove the default stopWordFilter, then load a separate stop words file and generate a filter from it. Why?
    That search-stopwords.json contains the same stop words as the default built-in filter!
    Moreover, the lunr language add-ons (from https://github.com/MihaiValentin/lunr-languages) contain their own stop words.
  2. Why not build the index at build time? Why do you instead load the JSON at run time and then add items one by one to the index? This can be done (and usually is done) at build time. Then at run time we can just load an index file:
    $.getJSON("index.json", function (data) { engine = lunr.Index.load(data); })
    That's all.
    I understand that you enrich search results with the title and keywords, which are absent from lunr.search's result. But that can be done via an additional index file.
  3. No i18n
    The index should be built with other languages in mind. lunr natively supports only English. To support additional languages we need to add add-ons (from https://github.com/MihaiValentin/lunr-languages):
    At build time:

    var lunr = require('lunr');
    // Register the stemmer support, the Russian add-on, and the multi-language add-on
    require('./lunr.stemmer.support.js')(lunr);
    require('./lunr.ru.js')(lunr);
    require('./lunr.multi.js')(lunr);
    var lunrIdx = lunr(function() {
      this.use(lunr.multiLanguage('en', 'ru'));
      // config ref/fields
    });

    At run time:

    // Register the same languages before loading the serialized index
    lunr.multiLanguage('en', 'ru');
    engine = lunr.Index.load(data);

I can create a template to customize index building, but I think it should be possible without template customization. Also please see #650: there are problems with the encoding of extracted keywords for indexing.
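The build-time approach from point 2 can be sketched without lunr at all. What follows is a simplified stand-in, not lunr's index format or API: `buildIndex`, `loadIndex`, `search`, and the document fields `href`, `title`, and `keywords` are hypothetical names chosen for illustration. It shows stop words being applied once during the build, the index being serialized to JSON, and a separate metadata file supplying titles for the results.

```javascript
// Simplified stand-in for a search index; NOT lunr's real data structure.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to"]);

function tokenize(text) {
  // Split on anything that is not a Unicode letter or digit.
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Build step: would run once, e.g. from a Node script after the site is generated.
function buildIndex(docs, stopWords = STOP_WORDS) {
  const postings = new Map(); // term -> array of document refs
  const meta = {};            // ref -> { title }, kept apart for result display
  for (const doc of docs) {
    meta[doc.href] = { title: doc.title };
    const terms = new Set(tokenize(doc.title + " " + doc.keywords));
    for (const term of terms) {
      if (stopWords.has(term)) continue; // stop words never reach the index
      if (!postings.has(term)) postings.set(term, []);
      postings.get(term).push(doc.href);
    }
  }
  // Two JSON strings, standing in for two generated files.
  return { index: JSON.stringify([...postings]), meta: JSON.stringify(meta) };
}

// Run-time step: no per-document work, just deserialize.
function loadIndex(serialized) {
  return new Map(JSON.parse(serialized));
}

// Returns documents containing every query term, enriched with their titles.
function search(index, meta, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let refs = index.get(terms[0]) || [];
  for (const term of terms.slice(1)) {
    const next = new Set(index.get(term) || []);
    refs = refs.filter((ref) => next.has(ref));
  }
  return refs.map((ref) => ({ href: ref, title: meta[ref].title }));
}

const exampleDocs = [
  { href: "intro.html", title: "Getting Started", keywords: "install the tool" },
  { href: "search.html", title: "Search", keywords: "index install" },
];
const files = buildIndex(exampleDocs);  // build time
const engine = loadIndex(files.index);  // run time
const titles = JSON.parse(files.meta);  // run time, the additional file
console.log(search(engine, titles, "install")); // matches both pages
```

With real lunr the same split would presumably be a build script that ends in a `toJSON` call and a page script that calls `lunr.Index.load`, as described above.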

@qinezh
Contributor

qinezh commented Sep 3, 2016

Thanks for your comments, @evil-shrike; they're quite reasonable and insightful. I'm glad to share some background:

  1. For the first question: it's actually a way to solve issue #279 ("Search result not so great on DocFX"), so that users can customize the stop words to avoid the problem.
  2. For the second one, the index.json generated by DocFX is not the kind of serialized data that lunr.Index.load needs. And wouldn't an additional index file make the process more complicated?
  3. For the third one, I agree with you: we should support other languages.

Thanks, @evil-shrike. Feel free to share here if you have more concerns.

@evil-shrike
Author

evil-shrike commented Sep 5, 2016

the index.json generated by DocFX is not the kind of serialized data that lunr.Index.load needs.

Sure, we have to build it, much as it's currently built at run time, and just call index.toJSON at the end; then we'll have the index JSON for Index.load.
The need for an additional index file would be offset by the fact that we won't need stopwords.json (it'll be used at build time and embedded into the generated index).

@qinezh
Copy link
Contributor

qinezh commented Sep 7, 2016

If no stopwords.json exists, how can users customize the stop words? For example, what if a user wants to search for the word 'let', which is included in the default lunr.js stop words?
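To make the question concrete, here is a minimal sketch of a customizable stop word filter. It is a stand-in written in plain JavaScript rather than lunr's own implementation; `makeStopWordFilter` is a hypothetical helper, and `defaultStopWords` is a shortened illustrative list, not lunr's real default list.

```javascript
// Returns a pipeline-style filter: it passes a token through unchanged,
// or returns undefined to drop it. Stand-in only, not lunr's implementation.
function makeStopWordFilter(stopWords) {
  const blocked = new Set(stopWords.map((w) => w.toLowerCase()));
  return (token) => (blocked.has(token.toLowerCase()) ? undefined : token);
}

// Hypothetical shortened default list; the real lunr list is much longer.
const defaultStopWords = ["a", "and", "let", "the", "to"];

// Customization: remove "let" from the list so that it becomes searchable.
const customStopWords = defaultStopWords.filter((w) => w !== "let");

const keep = makeStopWordFilter(customStopWords);
const tokens = ["let", "the", "compiler", "infer"].filter(
  (t) => keep(t) !== undefined
);
console.log(tokens); // [ 'let', 'compiler', 'infer' ]
```

Whether such a list is read from a JSON file in the browser or consumed by a build script is exactly the design choice under discussion in this thread.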

@evil-shrike
Author

evil-shrike commented Sep 7, 2016

I understand. I meant that we don't need it at run time (loading a file from the server) if the index is built at build time (with custom stop words).

@vicancy vicancy added template The stock site template enhancement labels May 9, 2017
@vicancy vicancy added this to the Near term milestone May 9, 2017
@oyshan

oyshan commented Sep 29, 2017

We're experiencing issues with the second point.
It seems the search index is built every time you load and/or navigate the page. This causes problems for docs sites with a medium or large index.json file. It takes almost 10 seconds for the lunr search index to be built, i.e. 10 seconds during which search does not work.
I agree that the lunr index should be built at build time and only loaded at run time.

@nonno

nonno commented Oct 17, 2017

I'm trying to customize search-stopwords.json so I can filter Italian stop words (we are writing documentation in Italian), but without success. I tried overriding the file inside my custom template and also setting the array directly inside search-worker.js, but apparently nothing happens and the index.json in the _site root is always huge. Could anybody explain to me how I can do it?

@Shazwazza

The performance of this runtime index processing isn't so good. The doc site I have has an index.json file of about 8.5 MB. This means that search isn't available for a minute or two while it's being processed.

It's mentioned above that it might be possible to do this processing at build time instead of at run time in the browser. If that is possible, does anyone know how I can achieve that?

@scionwest

I would like to know how to go about this as well. We are concerned about processing large index.json files at runtime. Any updates on this? @qinezh

@scottcurrie

Checking back in on build-time indexes. Our search takes over a minute for the index to be built on most desktops. It looks like search is just broken, because users give up before the index is created.

@superyyrrzz superyyrrzz added search Search content for a static site and removed P2 template The stock site template labels May 9, 2019
@yufeih yufeih removed this from the Near term milestone Dec 21, 2022
@Unnvaldr

Unnvaldr commented Jun 28, 2023

If somebody is still eager for a solution, I wrote one for DocFx v2 where the index generation is moved to build-time.
https://github.com/Unnvaldr/DocFx.Plugins.ExtractSearchIndex

@paulushub

paulushub commented Jun 29, 2023

@Unnvaldr It is a year old now; how about a readme file explaining the features, limitations, etc., and, for many users, license information?

@Unnvaldr

@Unnvaldr It is a year old now; how about a readme file explaining the features, limitations, etc., and, for many users, license information?

The project was private during that time; I only recently decided to share it. Most of what you mentioned will be added in the coming days.

@yufeih
Contributor

yufeih commented Oct 4, 2023

Added a browser cache to speed up page load speed for subsequent visits. The first page view still builds the index in the browser and is slow.
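The thread does not describe how this cache is implemented, so the following is only a sketch of the general idea under stated assumptions: the built index is stored under a key derived from a version of index.json, and the expensive build is skipped when that key is already present. `MemoryStorage`, `getOrBuildIndex`, and the version strings are hypothetical; `MemoryStorage` is an in-memory stand-in with the `getItem`/`setItem` shape of the Web Storage API so that the example runs outside a browser.

```javascript
// In-memory stand-in with the getItem/setItem shape of the Web Storage API.
class MemoryStorage {
  constructor() {
    this.items = new Map();
  }
  getItem(key) {
    return this.items.has(key) ? this.items.get(key) : null;
  }
  setItem(key, value) {
    this.items.set(key, String(value));
  }
}

// buildFn is the expensive step; version identifies the current index.json
// (for example a content hash, which is an assumption of this sketch).
function getOrBuildIndex(storage, version, buildFn) {
  const key = "search-index:" + version;
  const cached = storage.getItem(key);
  if (cached !== null) {
    return { index: JSON.parse(cached), fromCache: true };
  }
  const index = buildFn();
  storage.setItem(key, JSON.stringify(index));
  return { index, fromCache: false };
}

const storage = new MemoryStorage();
let builds = 0;
const build = () => {
  builds += 1;
  return { terms: { search: ["a.html"] } };
};

getOrBuildIndex(storage, "v1", build); // first visit: builds the index
getOrBuildIndex(storage, "v1", build); // later visit: served from the cache
getOrBuildIndex(storage, "v2", build); // content changed: builds again
console.log(builds); // 2
```

As noted above, a cache of this kind only helps repeat visits; the first page view still pays the full cost of building the index.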

Multilanguage support requires building the index at build time, since some languages such as zh have native dependencies that are not available in the browser.

@yufeih yufeih mentioned this issue Oct 26, 2023
15 tasks