chore: Use lazy rather than joined eager loading #1959
Closed
Description
#1361 introduced Joined Eager Loading for one-to-many and many-to-many model relationships. However, per the SQLAlchemy example, this results in a slew of `LEFT OUTER JOIN`s, which rapidly produce an expansive interim result set (multiplicative in the combinatorial sense) when a model has multiple high-cardinality relationships. Furthermore, the result set, which is repetitive in nature, requires additional overhead for SQLAlchemy to de-duplicate the rows against the actual model record.

Given that the FAB response is an addition of relationships rather than a multiplication of relationships, Lazy Loading seems more desirable. Granted, this results in one query per relationship, but the interim result sets are typically much smaller and non-repetitive (especially for high-cardinality relationships) and thus more performant.
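To illustrate the multiplicative versus additive row counts, here is a minimal sketch using the standard-library `sqlite3` module rather than SQLAlchemy or FAB. The `dataset`, `col`, and `metric` tables and the `count_rows` helper are hypothetical and exist only for this illustration; they are not the Superset schema.

```python
import sqlite3


def count_rows(n_columns, n_metrics):
    """Return (rows from one joined query, rows from one query per relationship).

    Hypothetical schema: a single dataset with two one-to-many relationships.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (id INTEGER PRIMARY KEY);
        CREATE TABLE col (id INTEGER PRIMARY KEY, dataset_id INTEGER);
        CREATE TABLE metric (id INTEGER PRIMARY KEY, dataset_id INTEGER);
        INSERT INTO dataset (id) VALUES (1);
        """
    )
    conn.executemany("INSERT INTO col (dataset_id) VALUES (?)", [(1,)] * n_columns)
    conn.executemany("INSERT INTO metric (dataset_id) VALUES (?)", [(1,)] * n_metrics)

    # Joined approach: both relationships in one statement, so every column row
    # is paired with every metric row for the same dataset.
    joined = conn.execute(
        """
        SELECT COUNT(*)
        FROM dataset d
        LEFT OUTER JOIN col c ON c.dataset_id = d.id
        LEFT OUTER JOIN metric m ON m.dataset_id = d.id
        WHERE d.id = 1
        """
    ).fetchone()[0]

    # Per-relationship approach: one query for the dataset row plus one query
    # per relationship, so the row counts add instead of multiply.
    n_col = conn.execute("SELECT COUNT(*) FROM col WHERE dataset_id = 1").fetchone()[0]
    n_met = conn.execute("SELECT COUNT(*) FROM metric WHERE dataset_id = 1").fetchone()[0]
    separate = 1 + n_col + n_met

    conn.close()
    return joined, separate


joined, separate = count_rows(30, 20)
print(joined, separate)  # 600 51
```

With 30 columns and 20 metrics the joined query yields 30 × 20 = 600 rows, while the separate queries return 1 + 30 + 20 = 51 rows in total; the gap widens quickly as the cardinalities grow.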
For example, in Superset at Airbnb we have a virtual dataset comprising thousands of metrics and columns. Previously, a query to obtain the dataset with the associated metric and column IDs would time out (the join produced an interim result set of 100M+ rows), whereas with lazy loading the query takes well under 5 seconds.
ADDITIONAL INFORMATION