Async index updates (PUT, DELETE) #497

Closed
aguynamedben opened this issue Jun 9, 2020 · 14 comments
Labels
QUESTION Its always ok to ask a question, even if it is only loosely related to search-index

Comments

@aguynamedben

aguynamedben commented Jun 9, 2020

Hi Fergie, I just discovered this library and think it's super interesting. Thank you for the work you've done in making this a public resource for people to learn from.

I have a question about how the JavaScript event loop is impacted during high-volume updates to the index. Is there any built-in way to run updates to the index asynchronously so that the JavaScript event loop (and the UI of my app/page) isn't locking up during index updates?

As an analogue, the flexsearch library (which I'm also not familiar with) claims to have an async option you can use when creating an index that will apparently spin up Web Workers to perform updates, so the UI/event loop isn't locked up during updates.

My app sometimes inserts 10,000 documents into its index at once. It's an Electron app, so we currently manage the index in a 2nd process to prevent the UI from blocking during large index updates. But we're considering switching to a Web Workers approach so that we can run a single process and handle index updates within a Web Worker.

Fun with event loop languages!

@eklem
Collaborator

eklem commented Jun 11, 2020

But, these are asynchronous, aren't they, since they are promise based?

@eklem added the QUESTION label on Jun 11, 2020

@aguynamedben
Author

@eklem Well yes, they are asynchronous, you're right, but a high volume of async activity could still (potentially?) tie up the event loop.
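A minimal sketch of that point, not taken from search-index: CPU-bound work inside an `async` function still runs on the main thread, so a 0 ms timer (standing in for UI work) cannot fire until it finishes. `busyWork`, `fakeIndexUpdate` and `measureTimerDelay` are made-up names for illustration only.

```javascript
// Simulate CPU-bound work such as tokenizing documents or writing index entries.
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else on this thread can run in the meantime
  }
}

// Hypothetical stand-in for an index update. It returns a promise, but the
// body still executes synchronously up to the first real await.
async function fakeIndexUpdate(ms) {
  busyWork(ms);
}

// Schedule a 0 ms timer, run the "update", and report how late the timer fired.
async function measureTimerDelay(workMs) {
  const start = Date.now();
  const timerFired = new Promise((resolve) => {
    setTimeout(() => resolve(Date.now() - start), 0);
  });
  await fakeIndexUpdate(workMs);
  return timerFired; // resolves with the observed delay in milliseconds
}
```

Calling `measureTimerDelay(200)` should resolve with a delay of at least 200 ms, because the timer callback has to wait for the busy loop even though the update is "asynchronous".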

I think I may have worded parts of my question poorly...

Is there any built-in way to run updates to the index asynchronously so that the JavaScript event loop

I think I should have asked: Is there any built-in way to manage the index "offline" (i.e. in a Web Worker or child process) so that the "main" JavaScript event loop the app is running on isn't impacted?

Because search-index is built for Node.js or the Browser, I think the only real ways to do this would be:

  • Node.js - Have a child process manage the index. Query and update the index via IPC calls.
  • Browser - Have a Web Worker manage the index. Query and update it via the worker.postMessage API

If you get the index off the "main" JavaScript event loop, there are pros and cons:

  • Pros: JavaScript event loop for the main app is freed up, even if you're doing a lot of updates to the index
  • Cons: The index is now further away from the UI, you have to be careful with doing too much IPC
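The worker-side half of such a setup could be sketched roughly as below. This is only an illustration under stated assumptions: the `{ id, type, payload }` message shape and the `put` / `delete` / `query` method names on `index` are placeholders invented for this sketch, not the search-index API.

```javascript
// Dispatch one incoming message to the index and build a reply object.
// `index` is any object exposing async put/delete/query methods (hypothetical
// names); `message` is assumed to look like { id, type, payload }.
async function handleIndexMessage(index, message) {
  const { id, type, payload } = message;
  try {
    switch (type) {
      case 'put':
        return { id, ok: true, result: await index.put(payload) };
      case 'delete':
        return { id, ok: true, result: await index.delete(payload) };
      case 'query':
        return { id, ok: true, result: await index.query(payload) };
      default:
        return { id, ok: false, error: `unknown message type: ${type}` };
    }
  } catch (err) {
    // Send errors back as plain strings so the reply is safe to post across
    // the worker or process boundary.
    return { id, ok: false, error: String(err && err.message ? err.message : err) };
  }
}

// Inside a Web Worker this could be wired up roughly as:
//   self.onmessage = async (event) => {
//     self.postMessage(await handleIndexMessage(index, event.data));
//   };
// In a forked Node.js child process, process.on('message', ...) and
// process.send(...) play the same roles.
```

Keeping the dispatcher separate from the transport means the same function can serve both the Web Worker and the child-process variant, and every request costs exactly one message each way, which keeps the IPC overhead mentioned above easy to reason about.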

This is probably a feature request, and it might not even be needed... I haven't tested search-index in our application yet. I'm totally new to search-index and just thinking through the problem from an architecture standpoint.

For ideas on how to test: our internal index framework, which was simplistic in the early days but is now based mostly on LokiJS, has had moments where we saw:

  • Slowness during a big insert, e.g. we get 10,000 docs from a web service, insert them all at once, and the UI locks up a bit during that insert
  • Slowness during searches, e.g. docs are being inserted while the user is doing a search.

We've worked around these problems to make our app fast; I'm just curious about what other ideas are out there. This looks like a great piece of search software and I hope to test it soon to get to know it better!

@eklem
Collaborator

eklem commented Jun 11, 2020

I'll see if I can test it, as is, with a lot of documents, running in the browser, and see if it becomes sluggish. No built-in methods to handle this that I know of =)

@fergiemcdowall
Owner

Hi @aguynamedben and sorry for the late reply! (It seems that I need to fix my email alerts)

This is an interesting question. For reasons of simplicity, I tend to work on smallish indexes, and there is definitely a lot of work to do on performance. There are of course loads of ways to solve this type of problem, each with its own trade-offs.

To start work on this we need to create a test case that reproduces the performance issues. Then it should be fairly simple to start seeing which approaches solve, or at least alleviate, these problems.

Personally I am not married to the idea that search-index needs to be both for Node.js and for the browser. Maybe we need to create separate versions in order to solve problems like this.

(BTW- v2 of search-index is just around the corner 🙂)

@eklem
Collaborator

eklem commented Jul 23, 2020

I made a quick-and-dirty example app based on the search-index demo, indexing almost 10,000 documents (they are small, so together they add up to a little over 3 MB of JSON).

It becomes unresponsive for some seconds, and then there are no results for another 5-10 seconds until everything is indexed.

Here's the repository: https://github.com/eklem/idx-tests
And here is the demo: https://eklem.github.io/idx-tests/async-index-and-search/

Try searching for e.g. Rioja, Chateau or Riesling. All-lowercase queries don't work.

Any suggestions on how we can make it more responsive? Index data in smaller chunks?
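One way the "smaller chunks" idea could look, as a rough sketch rather than a tested fix: split the documents into batches and yield to the event loop between batches. The `put` argument is a caller-supplied async function (for instance a thin wrapper around the index's own PUT call), and the default batch size of 500 is an arbitrary guess; both are assumptions of this sketch.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Index documents batch by batch, yielding to the event loop between batches.
// `put` is a hypothetical async function that indexes one batch of documents.
async function putInChunks(put, docs, chunkSize = 500) {
  let indexed = 0;
  for (const batch of chunk(docs, chunkSize)) {
    await put(batch);
    indexed += batch.length;
    // A macrotask boundary gives pending input events and rendering a chance
    // to run before the next batch starts.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return indexed; // number of documents handed to `put`
}
```

This only helps if a single batch is short enough not to be noticeable on its own, and total indexing time will likely go up a little. It does not move any work off the main thread, so it complements a web worker rather than replacing one.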

@eklem
Collaborator

eklem commented Jul 23, 2020

Code that is interesting:

@eklem
Collaborator

eklem commented Jul 23, 2020

So, at least in this simple example it's an issue. I haven't tested web workers before, but I'll give it a try.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers

@aguynamedben
Author

Cool, thanks for the updates. To be clear, I was asking out of curiosity and to start a discussion to learn. Please don't think I'm sending you on an errand to implement a feature request. :)

For context, I'm building an Electron app that can index a large number of documents (100k-1m). We're using our own logic built on top of LokiJS btrees. We have parts of our indexing/updating happening in the same process that displays the UI, so we have to be careful when doing batch operations. That's why I was asking about this.

We'll probably move more of the indexing to a separate process, so that the UI thread isn't doing any CPU-bound work. This is similar to the concept of using Web Workers. The UI would send [query or batch update] to the [Web Worker or background process] and not be locked up during the response.
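On the UI side, that request/response flow could be sketched as a small promise wrapper around a message port. This is an illustration only: `createIndexClient` is a made-up helper, and the `{ id, type, payload }` request and `{ id, ok, result, error }` reply shapes are assumptions of this sketch. It expects a port with `postMessage` and an `onmessage` handler, which is how a browser `Worker` behaves; a Node.js child process uses `send` and `on('message')` instead and would need a small adapter.

```javascript
// Wrap a message port in a function that returns a promise per request.
function createIndexClient(port) {
  let nextId = 1;
  const pending = new Map(); // request id -> { resolve, reject }

  port.onmessage = (event) => {
    const { id, ok, result, error } = event.data;
    const entry = pending.get(id);
    if (!entry) return; // not a reply we are waiting for
    pending.delete(id);
    if (ok) {
      entry.resolve(result);
    } else {
      entry.reject(new Error(error));
    }
  };

  // Post a request and resolve when the reply with the same id arrives.
  return function request(type, payload) {
    const id = nextId++;
    return new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject });
      port.postMessage({ id, type, payload });
    });
  };
}
```

With something like `const request = createIndexClient(new Worker('index-worker.js'))` (the file name is hypothetical), the UI can `await request('query', 'Rioja')` while the main thread stays free to handle input, and replies can arrive in any order because they are matched by id.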

Thanks for the feedback, there are tons of interesting links to look through. 🙏 Feel free to close the issue unless you all want to keep the discussion open for a specific reason.

@eklem
Collaborator

eklem commented Jul 23, 2020

No problem, @aguynamedben. I'm curious myself, so I'll keep it open until I've tested with a web worker.

@eklem
Collaborator

eklem commented Jul 26, 2020

@aguynamedben It's working quite well with a web-worker. For Chrome I had to re-initiate the search-index in the main app after the indexing was finished in the worker. If not, searches returned zero results. Anyway, the main point is that it works. Chrome seems a bit slower at indexing than Firefox. Closing this for now 😄

@eklem closed this as completed on Jul 26, 2020

@fergiemcdowall
Owner

@eklem kudos for getting it to work, that's really interesting!

@eklem
Collaborator

eklem commented Jul 27, 2020

@fergiemcdowall Thanks =)
It was quite easy to work with web workers, so I think they could be a good fit for search-index when indexing a lot of data at once in the browser.

I'll update it with an input field for URLs, so people can play around with their own data.

@aguynamedben
Author

Very cool!

@eklem
Collaborator

eklem commented Jul 30, 2020

@aguynamedben now you can index with your own data if it's available as a URL

https://eklem.github.io/idx-tests/async-index-and-search/

3 participants