Experimental/Need feedback: Implement pluggable batching/dedup/concurrent fetch like facebook/dataloader #154
…ems. And an example to demonstrate how it can be used.
It may be a while until I experiment with it, but I like how you did it all with minimal changes, and github.com/bigdrum/godataloader is also fairly small. Thanks for improving this library. Cheers. |
This looks really interesting. Anything we're missing here? |
FYI, I've been using this in a small-scale production environment and haven't seen any bugs so far. The only reason I still consider it experimental is that, for an object with only a trivial resolve function (one without expensive IO), the dataloader still kicks off a goroutine (one per parent object, not one per item, so not that many). We haven't seen any performance issues so far, but it would be nice to provide a hint to optimize that away. |
Any status on this? I'm interested in parallel processing for my use cases. Goroutines are quite light if they're short-lived, so I wouldn't spend too much time on them. |
Have you looked at https://github.com/nicksrandall/dataloader? This library should probably be orthogonal to any dataloader |
https://github.com/nicksrandall/dataloader only dispatches a batch when the batch size reaches some limit or a pre-configured wait time has passed. This introduces unnecessary delay. Facebook's dataloader.js and my implementation, on the other hand, do not have this issue. |
@bigdrum this is because Facebook's implementation takes advantage of the JavaScript event loop. Go (to my understanding) doesn't have such an event loop, so @nicksrandall's implementation approximates it with the batch size/time elapsed mechanism (both parameters are configurable for your use case). |
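For readers unfamiliar with the batch size/time elapsed mechanism discussed above, here is a minimal sketch of the general idea. It is not code from nicksrandall/dataloader; the names `batcher`, `collectBatches`, `maxBatch`, and `wait` are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// batcher (hypothetical) flushes pending keys when either maxBatch keys
// have accumulated or wait has elapsed since the first pending key.
type batcher struct {
	keys     chan int
	maxBatch int
	wait     time.Duration
	fetch    func([]int)
}

func (b *batcher) run(done chan<- struct{}) {
	defer close(done)
	var pending []int
	var timer <-chan time.Time // nil channel: that select case never fires
	flush := func() {
		if len(pending) > 0 {
			b.fetch(pending)
			pending = nil
		}
		timer = nil
	}
	for {
		select {
		case k, ok := <-b.keys:
			if !ok { // input closed: flush what is left and stop
				flush()
				return
			}
			if len(pending) == 0 {
				timer = time.After(b.wait) // the window starts at the first key
			}
			pending = append(pending, k)
			if len(pending) >= b.maxBatch {
				flush()
			}
		case <-timer:
			flush()
		}
	}
}

// collectBatches feeds keys through a batcher and records each batch.
func collectBatches(keys []int, maxBatch int, wait time.Duration) [][]int {
	var batches [][]int
	b := &batcher{
		keys:     make(chan int),
		maxBatch: maxBatch,
		wait:     wait,
		fetch:    func(batch []int) { batches = append(batches, batch) },
	}
	done := make(chan struct{})
	go b.run(done)
	for _, k := range keys {
		b.keys <- k
	}
	close(b.keys)
	<-done
	return batches
}

func main() {
	fmt.Println(collectBatches([]int{1, 2, 3, 4, 5}, 3, time.Second))
}
```

In this sketch, a batch smaller than `maxBatch` is only dispatched once `wait` expires (or the input is closed), which is the blind delay the comments above are debating.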
Right, my data loader implementation uses a custom scheduler to
achieve a similar effect.
|
To elaborate: in essence there are two kinds of tasks, the tasks that collect what needs to be loaded and the tasks that do the fetching. If we can prioritize the collection tasks over the fetching tasks, we can batch as much as possible.
FB's dataloader.js takes advantage of the JS environment, where a closure can be scheduled at the end of the event loop, which essentially allows it to schedule the lower-priority fetch tasks after the collection tasks. So the event loop itself is not the essence here; the essence is the ability to dispatch tasks in a certain order. The event loop and its API just allow dataloader.js to implement the custom scheduling logic in a way that is perfectly hidden from the end user.
Go doesn't support custom goroutine scheduling. If it did, we could create a scheduling domain and dispatch goroutines with different strict priorities. So I ended up implementing a custom scheduling mechanism that runs closures in a custom order. It has two queues internally, which allows tasks to be executed with two tiers of priority and thus achieves the same result. But I could not figure out a way to make it as perfectly non-intrusive as dataloader.js is in node.js.
@nicksrandall's implementation achieves this by manually delaying the fetch task by a fixed amount of time, to allow the collection tasks to be executed before it. But since it doesn't know when collection is done, it pays the cost of waiting blindly.
It is a trade-off: @nicksrandall's implementation is definitely much easier to understand and mine is harder, but to me, precise scheduling without compromising latency is exactly what I want and what inspired me in FB's dataloader.js. Without it, I might just use node.js for the graphql implementation. I'm presenting a way to achieve the same effect even though the underlying runtimes of node.js and Go are so different. In the end, what I'm proposing here is just an interface that allows such different execution models to be injected.
People can choose whichever executor fits their needs. (Sorry for any grammar errors; I'm not a native English speaker.) |
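The two-queue idea described above can be sketched roughly as follows. This is a simplified, single-goroutine illustration and not the godataloader code; `scheduler`, `loader`, and `runQuery` are hypothetical names. Collection tasks go on the high-priority queue, and the fetch (dispatch) task goes on the low-priority queue, so it only runs once no collection work is left.

```go
package main

import "fmt"

// scheduler (hypothetical) runs closures with two strict priority tiers.
type scheduler struct {
	high []func() // collection tasks: record what needs to be loaded
	low  []func() // fetch tasks: dispatch the accumulated batch
}

// run drains the high-priority queue before touching the low-priority one.
func (s *scheduler) run() {
	for {
		switch {
		case len(s.high) > 0:
			t := s.high[0]
			s.high = s.high[1:]
			t()
		case len(s.low) > 0:
			t := s.low[0]
			s.low = s.low[1:]
			t()
		default:
			return
		}
	}
}

// loader (hypothetical) batches keys and records each dispatched batch.
type loader struct {
	s       *scheduler
	pending []int
	batches [][]int
}

// load records a key; the first key of a batch schedules a low-priority dispatch.
func (l *loader) load(key int) {
	if len(l.pending) == 0 {
		l.s.low = append(l.s.low, l.dispatch)
	}
	l.pending = append(l.pending, key)
}

// dispatch stands in for one batched fetch of everything collected so far.
func (l *loader) dispatch() {
	l.batches = append(l.batches, l.pending)
	l.pending = nil
}

// runQuery enqueues one collection task per key and runs the scheduler.
func runQuery(keys []int) [][]int {
	s := &scheduler{}
	l := &loader{s: s}
	for _, k := range keys {
		k := k // capture per iteration (needed before Go 1.22)
		s.high = append(s.high, func() { l.load(k) })
	}
	s.run()
	return l.batches
}

func main() {
	fmt.Println(runQuery([]int{1, 2, 3})) // prints [[1 2 3]]
}
```

Because the dispatch only runs when the high-priority queue is empty, all three keys land in one batch without any timer. The real implementation additionally has to coordinate goroutines, which this sketch leaves out.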
I'm not sure where to look, but this scenario sounds like it needs a sync.WaitGroup that sees all the .Add() calls before the collector tasks start, with something calling .Wait() before starting the fetchers. The collector tasks can call .Done(). That's the scatter-gather pattern in Go. |
@bigdrum I like your approach of creating a custom scheduler. I know this is a first draft, but a few features that I think are critical for a dataloader are a max batch size and a max timeout. I worry that your implementation could grow unbounded under load. That feature should be pretty easy to add. |
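A minimal sketch of the scatter-gather pattern suggested above, using `sync.WaitGroup`. The function names and the `row-N` payload are hypothetical. Note the assumption baked in: the number of collectors is known before they start, which the thread does not establish for nested GraphQL resolvers.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchBatch is a hypothetical stand-in for one batched backend call.
func fetchBatch(keys []int) map[int]string {
	out := make(map[int]string, len(keys))
	for _, k := range keys {
		out[k] = fmt.Sprintf("row-%d", k)
	}
	return out
}

// gatherThenFetch runs one collector per parent, waits for all of them,
// and only then issues a single batched fetch.
func gatherThenFetch(parents []int) map[int]string {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		keys []int
	)
	wg.Add(len(parents)) // every Add happens before any collector starts
	for _, p := range parents {
		go func(p int) {
			defer wg.Done()
			mu.Lock()
			keys = append(keys, p) // collector: record the key it needs
			mu.Unlock()
		}(p)
	}
	wg.Wait() // gather: all collectors are done
	return fetchBatch(keys)
}

func main() {
	fmt.Println(gatherThenFetch([]int{1, 2, 3}))
}
```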
@nicksrandall Right, that's a very good suggestion. My approach does have a larger footprint than yours, since it creates new goroutines when it sees the need to yield to another graphql branch to collect more tasks (but note that at any given moment only one goroutine is active; all the others are waiting to be scheduled). The interesting thing is that this logic could be put into the custom scheduler. For example, we could give the low-priority queue some fairness: if the high-priority queue size reaches some limit, dispatch the tasks in the low-priority queue. This allows the pending batch to be flushed. The same goes for a time-based criterion. Since the dataloader/scheduler is per graphql query, we could also introduce shared semaphore state so that the total number of pending tasks on a server cannot exceed a given limit. |
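One possible reading of the fairness idea above, as a single-goroutine sketch. It is an assumption, not the proposed implementation: instead of inspecting the queue size, it counts consecutive high-priority tasks and gives the low-priority queue a turn once that count reaches `limit`. All names (`fairScheduler`, `cappedLoader`, `runCapped`) are hypothetical.

```go
package main

import "fmt"

// fairScheduler (hypothetical) prefers high-priority tasks, but lets one
// low-priority task run after `limit` consecutive high-priority tasks.
type fairScheduler struct {
	high, low []func()
	limit     int
}

func (s *fairScheduler) run() {
	streak := 0 // consecutive high-priority tasks since the last low one
	for len(s.high) > 0 || len(s.low) > 0 {
		runLow := len(s.high) == 0 || (streak >= s.limit && len(s.low) > 0)
		if runLow {
			t := s.low[0]
			s.low = s.low[1:]
			streak = 0
			t()
			continue
		}
		t := s.high[0]
		s.high = s.high[1:]
		streak++
		t()
	}
}

// cappedLoader (hypothetical) batches keys and records dispatched batches.
type cappedLoader struct {
	s       *fairScheduler
	pending []int
	batches [][]int
}

func (l *cappedLoader) load(key int) {
	if len(l.pending) == 0 {
		l.s.low = append(l.s.low, l.dispatch) // schedule one flush per batch
	}
	l.pending = append(l.pending, key)
}

func (l *cappedLoader) dispatch() {
	l.batches = append(l.batches, l.pending)
	l.pending = nil
}

// runCapped enqueues one collection task per key and runs the scheduler.
func runCapped(keys []int, limit int) [][]int {
	s := &fairScheduler{limit: limit}
	l := &cappedLoader{s: s}
	for _, k := range keys {
		k := k // capture per iteration (needed before Go 1.22)
		s.high = append(s.high, func() { l.load(k) })
	}
	s.run()
	return l.batches
}

func main() {
	fmt.Println(runCapped([]int{1, 2, 3, 4, 5}, 2)) // prints [[1 2] [3 4] [5]]
}
```

With `limit` set to 2, the pending batch is flushed after every two collection tasks, so batch size stays bounded under load. A time-based criterion or a server-wide semaphore, as mentioned above, would be separate additions.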
I'm interested in seeing this merged too. What's the status? |
Any Update? 👍 |
Any updates on this? |
Any updates on this vs parallel resolution? |
Thanks a lot, guys! 👍 Closing this one in favor of #388; you might want to take a look at a real use-case working example that shows the support for concurrent resolvers. |
I have been considering how to implement the idea of facebook/dataloader with graphql in Go (which addresses issues like #106 and #132), and this PR shows the progress so far.
This PR is mainly a demonstration of how it could work, and so far I'm happy with the idea.
This requires only minimal changes to the graphql package (just enough to inject an executor with a simple interface). The real concurrency feature is implemented in the godataloader library, which I wrote separately and which can be plugged into graphql.
examples/dataloader contains an actual example, which demonstrates the following features:
The test might best demonstrate the behavior.
This PR is not meant to be merged yet (at least the example probably doesn't fit as part of this project because of its external dependency, unless the godataloader library is moved).
The godataloader library is probably a little more complicated than one would expect, and I'm happy to explain it if anyone finds this idea interesting.