Proposal: Introduce DataLoader for Ballerina GraphQL #4569
Comments
Not sure
We need to map the exact remote/resource method to the corresponding loader function. Two resources can have different accessors and the same path inside a service. There is no way to provide two loader functions in that scenario.
I agree. But my point is that these functions are internal, so the mapping can be done internally. Also, these functions are not part of the original GraphQL schema. Defining them as resource functions breaks tools such as schema generation, doesn't it?
Yes, this is a valid point. Shall we have a meeting to explore the alternatives? In the meantime, we will go ahead with this approach. We are hoping to release this as an experimental feature first. Will that be okay?
Had a discussion with @sameerajayasoma @shafreenAnfar @ThisaruGuruge regarding this issue:
The following changes will be made to the API, as agreed in the meeting with the team (@sameerajayasoma @shafreenAnfar @ThisaruGuruge):
    isolated distinct service class Author {
        //...
        isolated function prefetchBooks(graphql:Context ctx) {
            // ...
        }
        @graphql:ResourceConfig {
            // ... other fields
            prefetch: self.prefetchBooks
        }
        isolated resource function get books(graphql:Context ctx) returns Book[]|error {
            // ...
        }
        remote function books(BookInput[] input) returns Book[]|error {
            // ...
        }
    }

    public isolated class Context {
        // ... omitted for brevity
        public isolated function registerDataLoader(string key, dataloader:DataLoader dataLoader) {
            // ...
        }
        public isolated function getDataLoader(string key) returns dataloader:DataLoader {
            // ...
            // panic if no key found
        }
    }

    public isolated function add(anydata key);

Putting it all together, the following example demonstrates the usage of the new API:
Example

    @graphql:ServiceConfig {
        contextInit: isolated function (http:RequestContext requestContext, http:Request request) returns graphql:Context {
            graphql:Context context = new;
            context.registerDataLoader("bookLoader", new DefaultDataLoader(batchBooks));
            return context;
        }
    }
    service on new graphql:Listener(9090) {
        // ... omitted for brevity
    }
    isolated distinct service class Author {
        //...
        isolated function preBooks(graphql:Context ctx) {
            dataloader:DataLoader bookLoader = ctx.getDataLoader("bookLoader");
            bookLoader.add(self.author.id);
        }
        isolated resource function get books(graphql:Context ctx) returns Book[]|error {
            dataloader:DataLoader bookLoader = ctx.getDataLoader("bookLoader");
            return bookLoader.get(self.author.id);
        }
    }

As for @MaryamZi's comment, it is currently not possible to pass an instance method reference to the annotation. As an alternative approach, we have considered passing the prefetch method name to the @graphql:ResourceConfig annotation.
Example:

    isolated distinct service class Author {
        //...
        isolated function prefetchBooks(graphql:Context ctx) {
            // ...
        }
        @graphql:ResourceConfig {
            // ... other fields
            prefetchMethodName: "prefetchBooks"
        }
        isolated resource function get books(graphql:Context ctx) returns Book[]|error {
            // ...
        }
        remote function books(BookInput[] input) returns Book[]|error {
            // ...
        }
    }

We could validate the existence and signature of the "prefetchBooks" method at compile time using a compiler plugin. What do you think, @sameerajayasoma?
Summary
DataLoader is a versatile tool used for accessing various remote data sources in GraphQL. Within the realm of GraphQL, DataLoader is extensively employed to address the N+1 problem. The aim of this proposal is to incorporate a DataLoader functionality into the Ballerina GraphQL package.
Goals
Motivation
The N+1 problem
The N+1 problem can be illustrated with a scenario involving authors and their books. Imagine a book catalog application that displays a list of authors and their respective books. When the N+1 problem occurs, retrieving the data requires one initial query to fetch the list of authors (the "1"), followed by a separate query for each of the N authors to retrieve their books (the "N").
This results in N+1 queries being executed, where N represents the number of authors, leading to increased overhead and potential performance issues. The following is a GraphQL book catalog application written in Ballerina that is susceptible to the N+1 problem:
Executing the query
on the above service will print the following SQL queries in the terminal
The first query returns 10 authors; a separate query is then executed for each author to obtain the book details. This results in a total of 11 queries, which is an inefficient way to query the database. The DataLoader allows us to overcome this problem.
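The query pattern described above can be reproduced without a database. The following is a minimal sketch, not the service from this proposal: the `bookTable` array, the `booksByAuthor` function, and the `queryCount` counter are hypothetical stand-ins that only count how many simulated queries are issued.

```ballerina
import ballerina/io;

type Book record {|
    int id;
    string title;
    int author;
|};

// Hypothetical in-memory stand-in for the books table.
Book[] bookTable = [
    {id: 1, title: "Book A", author: 1},
    {id: 2, title: "Book B", author: 1},
    {id: 3, title: "Book C", author: 2}
];

// Number of simulated queries issued so far.
int queryCount = 0;

// Simulates: SELECT * FROM books WHERE author = ${authorId}
function booksByAuthor(int authorId) returns Book[] {
    queryCount += 1;
    Book[] result = [];
    foreach Book book in bookTable {
        if book.author == authorId {
            result.push(book);
        }
    }
    return result;
}

public function main() {
    // Simulates the single query that fetches the authors.
    int[] authorIds = [1, 2, 3];
    queryCount += 1;

    // One additional query per author: this is the "N" in N+1.
    foreach int authorId in authorIds {
        Book[] books = booksByAuthor(authorId);
        io:println("author ", authorId, " has ", books.length(), " book(s)");
    }
    // With N = 3 authors, 1 + 3 = 4 simulated queries are counted.
    io:println("total queries: ", queryCount);
}
```

The per-author call inside the loop is what a field resolver does when each author object fetches its own books independently.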
DataLoader
The DataLoader is the solution found by the original developers of the GraphQL spec. The primary purpose of DataLoader is to optimize data fetching and mitigate performance issues, especially the N+1 problem commonly encountered in GraphQL APIs. It achieves this by batching and caching data requests, reducing the number of queries sent to the underlying data sources. DataLoader helps minimize unnecessary overhead and improves the overall efficiency and response time of data retrieval operations.
Success Metrics
In almost all GraphQL implementations, the DataLoader is a major requirement. Since the Ballerina GraphQL package is now spec-compliant, we are looking for ways to improve the user experience in the Ballerina GraphQL package. Implementing a DataLoader in Ballerina will improve the user experience drastically.
Description
The DataLoader batches and caches operations for data fetchers from different data sources. The DataLoader requires users to provide a batch function that accepts an array of keys as input and returns the corresponding array of values for those keys.
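To make the batching and caching idea concrete, here is a minimal, non-concurrent sketch. It is not the proposed DefaultDataLoader: the `SimpleLoader` class, the `BatchFunction` type, and the `batchBooks` function are hypothetical names chosen for illustration, and results are cached under the string form of each key as a simplification.

```ballerina
import ballerina/io;

// Hypothetical batch function type: returns one value per key, in key order.
type BatchFunction function (anydata[] keys) returns anydata[]|error;

// Illustrative loader: collects keys, resolves them in one batch, caches results.
class SimpleLoader {
    private final BatchFunction batchFunction;
    private anydata[] pendingKeys = [];
    // Simplification: results are cached under the string form of the key.
    private map<anydata> results = {};

    function init(BatchFunction batchFunction) {
        self.batchFunction = batchFunction;
    }

    // Queues a key unless it is already queued or already cached.
    function load(anydata key) {
        if self.results.hasKey(key.toString()) || self.pendingKeys.indexOf(key) is int {
            return;
        }
        self.pendingKeys.push(key);
    }

    // Calls the batch function once for all queued keys and caches the values.
    function dispatch() returns error? {
        if self.pendingKeys.length() == 0 {
            return;
        }
        BatchFunction batch = self.batchFunction;
        anydata[] values = check batch(self.pendingKeys);
        if values.length() != self.pendingKeys.length() {
            return error("batch function must return one value per key");
        }
        foreach int i in 0 ..< values.length() {
            self.results[self.pendingKeys[i].toString()] = values[i];
        }
        self.pendingKeys = [];
    }

    // Returns the cached value for a key, or an error if it was never resolved.
    function get(anydata key) returns anydata|error {
        string cacheKey = key.toString();
        if !self.results.hasKey(cacheKey) {
            return error("no result found for key: " + cacheKey);
        }
        return self.results.get(cacheKey);
    }
}

// Hypothetical batch function standing in for a single batched database query.
function batchBooks(anydata[] keys) returns anydata[]|error {
    io:println("one batched lookup for keys: ", keys);
    anydata[] values = [];
    foreach anydata key in keys {
        values.push(string `books of author ${key.toString()}`);
    }
    return values;
}

public function main() returns error? {
    SimpleLoader loader = new (batchBooks);
    loader.load(1);
    loader.load(2);
    loader.load(1); // Duplicate key: not queued a second time.
    check loader.dispatch(); // The batch function runs once for both keys.
    anydata books = check loader.get(2);
    io:println(books);
}
```

Separating `load` from `get` is what makes batching possible: every key is registered first, a single `dispatch` call resolves them together, and only then are individual values read back.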
API
DataLoader object
This object defines the public APIs accessible to users.
DefaultDataLoader class
This class provides a default implementation of the DataLoader.
The DefaultDataLoader class is an implementation of the DataLoader with the following characteristics:
It requires a batchLoadFunction to be provided during initialization.
init method
The init method instantiates the DefaultDataLoader and accepts a batchLoadFunction function pointer as a parameter. The batchLoadFunction function pointer has the following type:
Users are expected to define the logic for the batchLoadFunction, which handles the batching of operations. On successful execution, the batchLoadFunction should return an array of anydata in which each element corresponds to a key in the input keys array.
load method
The load method takes an anydata key parameter and adds it to the key table for batch execution. If a result is already cached for the given key in the result table, the key will not be added to the key table again.
get method
The get method takes an anydata key as a parameter and retrieves the associated value by looking up the result in the result table. If a result is found for the given key, this method attempts to perform data binding and returns the result. If a result cannot be found or data binding fails, an error is returned.
dispatch method
The dispatch method invokes the user-defined batchLoadFunction. It passes the collected keys as an input array to the batchLoadFunction, retrieves the result array, and stores the key-to-value mapping in the result table.
Requirements for Engaging DataLoader in GraphQL Module
To integrate the DataLoader with the GraphQL module, users need to follow these three steps:
1. Add a map<dataloader:DataLoader> parameter to the parameter list of the resource method (GraphQL field) that needs the DataLoader.
2. Define a loadXXX method, where XXX is the field name, that takes a map<dataloader:DataLoader> parameter. This function is executed as a prefetch step before the corresponding resource method of the GraphQL field is executed. (Note that the loadXXX method and the XXX method must either have the same resource accessor or both be remote methods.)
3. Annotate the loadXXX method with the @dataloader:Loader annotation and pass the required configuration. This annotation prevents loadXXX from being added as a field in the GraphQL schema and also provides the DataLoader configuration.
Loader annotation
The following section demonstrates the usage of DataLoader in Ballerina GraphQL.
Modifying the Book Catalog Application to Use DataLoader
In the previous Book Catalog Application example, SELECT * FROM books WHERE author = ${authorId} was executed once for each of the N = 10 authors. To batch these database calls into a single request, we need to use a DataLoader at the books field. The following code block demonstrates the changes made to the books field and the Author service class.
Executing the following query
after incorporating DataLoader will now result in only two database queries.
Engaging DataLoader with GraphQL Engine
At a high level the GraphQL Engine breaks the query into subproblems and then constructs the value for the query by solving the subproblems as shown in the below diagram.
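The following algorithm demonstrates how the GraphQL engine engages the DataLoader at a high level.
1. If a loadXXX method (where XXX is the field name) is found, the engine:
   - creates the DataLoaders from the configuration in the @dataloader:Loader annotation.
   - makes these DataLoaders available to the XXX and loadXXX functions.
   - executes the loadXXX resource method and generates a placeholder value for that field.
2. If no loadXXX function is found, the engine executes the corresponding XXX resource function for that field.
3. The engine then executes the dispatch() function of all the created DataLoaders.
4. Finally, for each field that holds a placeholder, the engine executes the corresponding resource function (XXX).
Future Plans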
The DataLoader object will be enhanced with the following public methods:
These methods enhance the functionality of the DataLoader, providing more flexibility and control over data loading, caching, and result management.