Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Role model/dataset #3068

Merged

Conversation

philippthun
Copy link
Member

@philippthun philippthun commented Nov 18, 2022

The Role model is not backed by a dedicated table, but a dataset that is a UNION of eight individual roles tables. When filtering for roles, this was done by adding WHERE conditions to the overall dataset, which resulted in slow query executions. A more efficient approach is to move the WHERE conditions into the individual datasets per role.

This commit customizes the Role model/dataset. The creation of the UNIONed dataset is moved into a dedicated class (RoleDataset) that supports building a dataset with filters (i.e. WHERE conditions) per role (i.e. individual table). The where method of the Role dataset is overridden and uses Dataset.from ([1]) to change the dataset's source. It replaces the source with a new dataset built with filters per role.

As where calls can be chained, the filters need to be kept in the dataset instance (cache) and the source dataset needs to be built again on each where invocation with all the filters, i.e. previous and new WHERE conditions.

Furthermore fully qualified identifiers (table name and column name) have to be adapted; as the WHERE conditions are applied on the individual roles tables, no table name prefix is required and thus removed from the identifiers.

Some filters can now be handled in a more efficient way. When filtering for a specific role type, the corresponding dataset gets a WHERE 1 = 1, whereas all the other datasets get a WHERE 1 = 0, meaning they never return any rows. Something similar is done when filtering for organizations or spaces; the opposite role types get a WHERE 1 = 0 as they don't have the requested id field.

As there might be filters that still should be applied to the overall dataset, where invocations with a block (aka. virtual row block) are treated as before, i.e. applied at the end.

Drawbacks:

  • The current implementation of the custom Role model/dataset is rather strict with regards to allowed filter expressions. This means that if some functionality is added (e.g. new or different filter), the custom Role model/dataset needs to be adapted. This was done by intention to ensure that every code path is tested.
  • To limit the number of different filter expressions, invocations of first(cond) and find(cond) on the custom Role model/dataset have been replaced with where(cond).first which is semantically identical.

[1] https://sequel.jeremyevans.net/rdoc/classes/Sequel/Dataset.html#method-i-from

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@philippthun philippthun force-pushed the more-specific-role-queries branch 2 times, most recently from 54cdc4f to 13bd96b Compare November 18, 2022 10:55
@philippthun philippthun changed the title wip Optimize Role model/dataset Nov 18, 2022
The Role model is not backed by a dedicated table, but a dataset that is
a UNION of eight individual roles tables. When filtering for roles, this
was done by adding WHERE conditions to the overall dataset, which
resulted in slow query executions. A more efficient approach is to move
the WHERE conditions into the individual datasets per role.

This commit customizes the Role model/dataset. The creation of the
UNIONed dataset is moved into a dedicated class (RoleDataset) that
supports building a dataset with filters (i.e. WHERE conditions) per
role (i.e. individual table). The 'where' method of the Role dataset is
overridden and uses Dataset.from ([1]) to change the dataset's source.
It replaces the source with a new dataset built with filters per role.

As 'where' calls can be chained, the filters need to be kept in the
dataset instance (cache) and the source dataset needs to be built again
on each 'where' invocation with all the filters, i.e. previous and new
WHERE conditions.

Furthermore fully qualified identifiers (table name and column name)
have to be adapted; as the WHERE conditions are applied on the
individual roles tables, no table name prefix is required and thus
removed from the identifiers.

Some filters can now be handled in a more efficient way. When filtering
for a specific role type, the corresponding dataset gets a
'WHERE 1 = 1', whereas all the other datasets get a 'WHERE 1 = 0',
meaning they never return any rows. Something similar is done when
filtering for organizations or spaces; the opposite role types get a
'WHERE 1 = 0' as they don't have the requested 'id' field.

As there might be filters that still should be applied to the overall
dataset, 'where' invocations with a block (aka. virtual row block) are
treated as before, i.e. applied at the end.

Drawbacks:
- The current implementation of the custom Role model/dataset is rather
  strict with regards to allowed filter expressions. This means that if
  some functionality is added (e.g. new or different filter), the custom
  Role model/dataset needs to be adapted. This was done by intention to
  ensure that every code path is tested.
- To limit the number of different filter expressions, invocations of
  'first(cond)' and 'find(cond)' on the custom Role model/dataset have
  been replaced with 'where(cond).first' which is semantically
  identical.

[1] https://sequel.jeremyevans.net/rdoc/classes/Sequel/Dataset.html#method-i-from
@philippthun
Copy link
Member Author

An SQL statement taken from the unit tests (three filters are applied on each individual roles table) and the complex "visibility" filter is applied at the end:

SELECT *, count(*) OVER () AS `pagination_total_results`
FROM (SELECT 'organization_auditor' AS `type`,
             `role_guid`            AS `guid`,
             `organization_id`,
             -1                     AS `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `organizations_auditors`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'organization_manager' AS `type`,
             `role_guid`            AS `guid`,
             `organization_id`,
             -1                     AS `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `organizations_managers`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'organization_billing_manager' AS `type`,
             `role_guid`                    AS `guid`,
             `organization_id`,
             -1                             AS `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `organizations_billing_managers`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'organization_user' AS `type`,
             `role_guid`         AS `guid`,
             `organization_id`,
             -1                  AS `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `organizations_users`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'space_auditor' AS `type`,
             `role_guid`     AS `guid`,
             -1              AS `organization_id`,
             `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `spaces_auditors`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'space_manager' AS `type`,
             `role_guid`     AS `guid`,
             -1              AS `organization_id`,
             `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `spaces_managers`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'space_developer' AS `type`,
             `role_guid`       AS `guid`,
             -1                AS `organization_id`,
             `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `spaces_developers`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))
      UNION ALL
      SELECT 'space_supporter' AS `type`,
             `role_guid`       AS `guid`,
             -1                AS `organization_id`,
             `space_id`,
             `user_id`,
             `created_at`,
             `updated_at`
      FROM `spaces_supporters`
      WHERE ((`user_id` IN (SELECT `id` FROM `users` WHERE (`guid` IN ('other-user-guid')))) AND
             (`created_at` < '2028-05-26 18:47:01') AND
             (`role_guid` IN ('f1faa0ec-69d4-4d95-9403-7da9ef3984b8', '5e094f74-5027-4d6b-8fbc-86e29449cccf')))) AS `t1`
WHERE ((`space_id` IN (SELECT `space_id`
                       FROM `spaces_developers`
                       WHERE (`user_id` = 4)
                       UNION
                       SELECT `space_id`
                       FROM `spaces_managers`
                       WHERE (`user_id` = 4)
                       UNION
                       SELECT `space_id`
                       FROM `spaces_auditors`
                       WHERE (`user_id` = 4)
                       UNION
                       SELECT `space_id`
                       FROM `spaces_supporters`
                       WHERE (`user_id` = 4)
                       UNION
                       SELECT `spaces`.`id`
                       FROM `spaces`
                                INNER JOIN (SELECT `organization_id`
                                            FROM `organizations_managers`
                                            WHERE (`user_id` = 4)) AS `t1`
                                           ON (`t1`.`organization_id` = `spaces`.`organization_id`))) OR
       (`organization_id` IN (2)))
ORDER BY `t1`.`created_at` DESC, `t1`.`guid` ASC LIMIT 50
OFFSET 0

@evanfarrar
Copy link
Member

evanfarrar commented Jan 20, 2023

This looks good to me and it is two orders of magnitude faster in an environment we have that is modeling modest user scale. Is there any reason it is still a draft?

@philippthun philippthun marked this pull request as ready for review January 24, 2023 08:21
@philippthun
Copy link
Member Author

The main reason for keeping this as a draft was that I wanted to perform more performance testing. As you run such tests in your environment, it should be good to be merged.

Copy link
Member

@moleske moleske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looked good to @evanfarrar and I. In our scenario, we saw a query for roles drop to about 0.002 seconds from over 2 seconds. And a query for /v3/organizations/:guid/users drop to 0.0009 seconds from 0.01 seconds.

The drawbacks seem like reasonable compromises to me, as the first will force us to think about this if we want a new filter. The later seemed reasonable since otherwise you'd have to write your own first method

@moleske moleske merged commit 2258522 into cloudfoundry:main Jan 25, 2023
philippthun added a commit to sap-contributions/cloud_controller_ng that referenced this pull request Sep 14, 2023
The optimizations done in PR cloudfoundry#3068 [1] make the dedicated org and space
role classes superfluous.

[1] cloudfoundry#3068
geofffranks pushed a commit to geofffranks/cloud_controller_ng that referenced this pull request Oct 19, 2023
The optimizations done in PR cloudfoundry#3068 [1] make the dedicated org and space
role classes superfluous.

[1] cloudfoundry#3068

Co-authored-by: Josh Russett <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants