GraphQL::Dataloader::Source#sync complexity #4512
Replies: 5 comments 1 reply
-
Hey, thanks for the great write-up and interesting suggestion. I can see how this would improve the precision of dataloading in theory, but I'm wondering, how does this affect its efficiency in practice? My guess would have been that a given |
Beta Was this translation helpful? Give feedback.
-
Thank you for your response. I use StackProf found it takes much time in ActiveRecord::Core#hash which is due to Here is the result for query {
allSameRecords(size: 5000){
dataloaderUser {
id
}
}
} (I put the source code at the end) While I discovered that I can avoid using ActiveRecord as a key, I am currently considering that there might be a bottleneck in the I published a repository to demo this behavior.
The main code is as follows: Sources::Association of GraphQL::Dataloader::SourceThis is a Source which use ActiceRecord as key# frozen_string_literal: true
class Sources::Association < GraphQL::Dataloader::Source
def initialize(model, association_name)
@model = model
@association_name = association_name
validate
end
def load(record)
fail TypeError, "#{@model} loader can't load association for #{record.class}" if !record.is_a?(@model)
super(record)
end
def fetch(records)
preload_association(records)
records.map{|s| read_association(s) }
end
private
def validate
return if @model.reflect_on_association(@association_name)
fail ArgumentError, "No association #{@association_name} on #{@model}"
end
def preload_association(records)
::ActiveRecord::Associations::Preloader.new.preload(records, @association_name)
end
def read_association(record)
record.public_send(@association_name)
end
def association_loaded?(record)
record.association(@association_name).loaded?
end
end Sources::AssociationV2 of GraphQL::Dataloader::SourceThis is a Source which use ActiceRecord's object_id as key# frozen_string_literal: true
class Sources::AssociationV2 < GraphQL::Dataloader::Source
def initialize(model, association_name)
@model = model
@association_name = association_name
validate
@cache = {}
end
def load(record)
fail TypeError, "#{@model} loader can't load association for #{record.class}" if !record.is_a?(@model)
cache_key = cache_key(record)
@cache[cache_key] = record
super(cache_key)
end
def fetch(cache_keys)
records = cache_keys.map{|k| @cache[k] ||= @model.find(k) }
preload_association(records)
records.map{|s| read_association(s) }
end
private
def validate
return if @model.reflect_on_association(@association_name)
fail ArgumentError, "No association #{@association_name} on #{@model}"
end
def preload_association(records)
::ActiveRecord::Associations::Preloader.new.preload(records, @association_name)
end
def read_association(record)
record.public_send(@association_name)
end
def association_loaded?(record)
record.association(@association_name).loaded?
end
def cache_key(record)
record.object_id
end
end AssociationLoader of GraphQL::Batch::Loader# frozen_string_literal: true
class AssociationLoader < GraphQL::Batch::Loader
def self.validate(model, association_name)
new(model, association_name)
nil
end
def initialize(model, association_name)
super()
@model = model
@association_name = association_name
validate
end
def load(record)
raise TypeError, "#{@model} loader can't load association for #{record.class}" unless record.is_a?(@model)
return Promise.resolve(read_association(record)) if association_loaded?(record)
super
end
# We want to load the associations on all records, even if they have the same id
def cache_key(record)
record.object_id
end
def perform(records)
preload_association(records)
records.each { |record| fulfill(record, read_association(record)) }
end
private
def validate
unless @model.reflect_on_association(@association_name)
raise ArgumentError, "No association #{@association_name} on #{@model}"
end
end
def preload_association(records)
::ActiveRecord::Associations::Preloader.new.preload(records, @association_name)
end
def read_association(record)
record.public_send(@association_name)
end
def association_loaded?(record)
record.association(@association_name).loaded?
end
end Query Typemodule Types
class QueryType < Types::BaseObject
# Add `node(id: ID!) and `nodes(ids: [ID!]!)`
include GraphQL::Types::Relay::HasNodeField
include GraphQL::Types::Relay::HasNodesField
field :all_same_records, [Types::PostType] do
argument :size, Integer, required: true
end
field :all_different_records, [Types::PostType] do
argument :size, Integer, required: true
end
def all_same_records(size:)
post = Post.joins(:user).last
Array.new(size){ post }
end
def all_different_records(size:)
Post.joins(:user).last(size)
end
end
end User type# frozen_string_literal: true
module Types
class UserType < BaseObject
field :id, Integer
end
end Post Type# frozen_string_literal: true
module Types
class PostType < BaseObject
field :dataloader_user, Types::UserType, null: false
field :dataloader_user_v2, Types::UserType, null: false
field :graphql_batch_user, Types::UserType, null: false
def graphql_batch_user
AssociationLoader.for(object.class, :user).load(object)
end
def dataloader_user
dataloader.with(::Sources::Association, object.class, :user).load(object)
end
def dataloader_user_v2
dataloader.with(::Sources::AssociationV2, object.class, :user).load(object)
end
end
end Test Resultquery {
allSameRecords(size: 5000){
dataloaderUser {
id
}
}
} Before(original )After
query {
allSameRecords(size: 5000){
dataloaderUserV2 {
id
}
}
} BeforeAfterquery {
allSameRecords(size: 5000){
graphqlBatchUser {
id
}
}
} query {
allDifferentRecords(size: 5000){
dataloaderUser {
id
}
}
} BeforeAfterquery {
allDifferentRecords(size: 5000){
dataloaderUserV2 {
id
}
}
} BeforeAfterquery {
allDifferentRecords(size: 5000){
graphqlBatchUser {
id
}
}
} |
Beta Was this translation helpful? Give feedback.
-
Oh, I see! So, in this case, it isn't about the dataloading efficiency, but rather about the Ruby/CPU efficiency. Thanks for sharing what you found in this area -- it's really interesting. I'd like to adopt the changes you proposed here -- could you open a PR with them? |
Beta Was this translation helpful? Give feedback.
-
The PR #4519 has been completed. Therefore, I am closing the discussion. Thank you! |
Beta Was this translation helpful? Give feedback.
-
I am uncertain whether I should close the discussion. If I do incorrectly, please remind me. Thank you. |
Beta Was this translation helpful? Give feedback.
-
I noticed that in
GraphQL::Dataloader::Source#sync
, it is necessary to check all thepending_keys
instead of just thekey
whereload
was assigned.To elaborate, the length of
pending_keys
is 1 whenload
is initially called in the firstsync
operation.graphql-ruby/lib/graphql/dataloader/source.rb
Lines 37 to 42 in 250a724
graphql-ruby/lib/graphql/dataloader/source.rb
Lines 70 to 74 in 250a724
In subsequent
sync
operations, whenload
is called again, the length ofpending_keys
increases to 2.Similarly, for the Nth time
load
is called, the length ofpending_keys
becomes N insync
.As the number of
load
calls increases, the complexity of the checkpending_keys.any? { |k| [email protected]?(k) }
also increases.Is it possible to modify
sync
so thatpending_keys
corresponds to the key where load was assigned?For example:
Beta Was this translation helpful? Give feedback.
All reactions