Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose ActiveJob ability to customize the columns used to order the cursor #859

Merged
merged 1 commit into from
Feb 8, 2024

Conversation

bbraschi
Copy link
Contributor

@bbraschi bbraschi commented Jul 27, 2023

JobIteration::ActiveRecordCursor#condition adds an order by clause to the relation returned by the collection method. It default to order on the id column, see JobIteration::ActiveRecordCursor#initialize.

Ordering on id only can degrade performance significantly depending on which indexes are defined.

job-iteration supports configuring which columns are used to order the cursor. This PR explores a way to expose this ability in the maintenance-tasks gem.

@bbraschi bbraschi force-pushed the expose-cursor_columns branch 2 times, most recently from 459edc3 to e17aba3 Compare July 27, 2023 21:33
@bbraschi bbraschi marked this pull request as ready for review July 27, 2023 21:35
@@ -21,6 +21,9 @@ class NotFoundError < NameError; end
# @api private
class_attribute :collection_builder_strategy, default: NullCollectionBuilder.new

# As defined by JobIteration::EnumeratorBuilder#build_active_record_enumerator_on_records as `columns`
class_attribute :cursor_columns, default: nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched for a way to let subclasses of MaintenanceTasks::Task customize how the enumerator_builder instantiates the underlying relation, without success.

Using a class_attribute as an alternative. Is it appropriate?

Copy link
Member

@gmcgibbon gmcgibbon Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be overridden by specific tasks, so I think the best thing to do is provide a simple method like collection that can be overridden. A class variable doesn't seem necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY!

I understand that JobIteration::ActiveRecordBatchEnumerator#initialize here will raise if the collection includes an order by clause and hence that

  1. the collection provided to JobIteration::ActiveRecordBatchEnumerator#initialize cannot be ordered
  2. an explicit set of ordering columns must be given to JobIteration::ActiveRecordBatchEnumerator#initialize
      if relation.arel.orders.present? || relation.arel.taken.present?
        raise JobIteration::ActiveRecordCursor::ConditionNotSupportedError
      end

      @base_relation = relation.reorder(@columns.join(","))

MaintenanceTasks::Task could

  1. retrieve the order columns from the relation returned by collection
  2. reorder the the relation returned by collection so it is not ordered anymore
  3. provide the retrieved order columns to the JobIteration::ActiveRecordBatchEnumerator#initialize

However it would make more sense to let JobIteration::ActiveRecordBatchEnumerator itself

  1. support that the collection is ordered and
  2. derive what are the cursor columns.

For active_record_on_records, it seems that we'd also have to make sure that the relation and columns provided to JobIteration::ActiveRecordEnumerator#initialize are consistent.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood your comment, got it after rereading more carefully: expose an instance method instead of a class attribute. I find that the class attribute conveys better the intent of customizing the class. Will push a commit to go as you suggest, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bbraschi bbraschi force-pushed the expose-cursor_columns branch from e17aba3 to b6057d8 Compare July 27, 2023 21:38
@bbraschi bbraschi self-assigned this Jul 27, 2023
enumerator_builder.active_record_on_records(
collection,
cursor: cursor,
**{ columns: @task.class.cursor_columns }.compact,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the compact? Isn't better to check if cursor_columns are set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this style because I had not checked in depth how active_record_on_records would behave if passed a nil for the columns argument. Hence I wanted to only provide the columns argument if it was set. I then used compact to avoid an if-then-else.

I have checked that active_record_on_records behaves the same way if the columns argument is absent or nil. It does because JobIteration::ActiveRecordEnumerator#initialize here defaults columns to nil.

-> updated to always provide columns: @task.class.cursor_columns in the Provide columns argument unconditionally to active_record_on_records commit.

@@ -21,6 +21,9 @@ class NotFoundError < NameError; end
# @api private
class_attribute :collection_builder_strategy, default: NullCollectionBuilder.new

# As defined by JobIteration::EnumeratorBuilder#build_active_record_enumerator_on_records as `columns`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you improve this documentation telling what this does instead of linking to method in another library? For the sake of this library, this method is only to configure which column will be used in the Active Record enumerator. Let's explain this and then, we can link to the other library, but if we are going to do that, we should really link to the library documentation, not just mention the reference of the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improvement proposal in Document the cursor_columns attribute inline commit.

@@ -5,6 +5,7 @@
require "action_dispatch/system_testing/server"

ActionDispatch::SystemTesting::Server.silence_puma = true
Webdrivers::Chromedriver.required_version = "114.0.5735.90"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this now I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY - Dropped the commit.

@gmcgibbon
Copy link
Member

@bbraschi are you still interested in working on this?

@bbraschi
Copy link
Contributor Author

@gmcgibbon If you have you have the time to bring this to a conclusion, through this PR or another, please do. Else, I can work on it at the end of Jan 2024.

@bbraschi bbraschi force-pushed the expose-cursor_columns branch 3 times, most recently from f4e0c5b to 8ec0c18 Compare January 31, 2024 14:23
enumerator_builder.active_record_on_records(
collection,
cursor: cursor,
**{ columns: @task.class.cursor_columns }.compact,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this style because I had not checked in depth how active_record_on_records would behave if passed a nil for the columns argument. Hence I wanted to only provide the columns argument if it was set. I then used compact to avoid an if-then-else.

I have checked that active_record_on_records behaves the same way if the columns argument is absent or nil. It does because JobIteration::ActiveRecordEnumerator#initialize here defaults columns to nil.

-> updated to always provide columns: @task.class.cursor_columns in the Provide columns argument unconditionally to active_record_on_records commit.

@@ -54,6 +58,7 @@ def build_enumerator(_run, cursor:)
collection.relation,
cursor: cursor,
batch_size: collection.batch_size,
**{ columns: @task.class.cursor_columns }.compact,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked that active_record_on_batch_relations behaves the same way if the columns argument is absent or nil. It does because JobIteration::ActiveRecordBatchEnumerator#initialize here defaults columns to nil.

-> updated to always provide columns: @task.class.cursor_columns in the Provide columns argument unconditionally to active_record_on_records commit.

@@ -21,6 +21,9 @@ class NotFoundError < NameError; end
# @api private
class_attribute :collection_builder_strategy, default: NullCollectionBuilder.new

# As defined by JobIteration::EnumeratorBuilder#build_active_record_enumerator_on_records as `columns`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improvement proposal in Document the cursor_columns attribute inline commit.

@bbraschi
Copy link
Contributor Author

bbraschi commented Feb 5, 2024

@rafaelfranca , @gmcgibbon sorry for the PR getting stalled. Tried to take your comments into account, please review when you can. TY

app/models/maintenance_tasks/task.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/task.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/task.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/task.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/task.rb Outdated Show resolved Hide resolved
@@ -21,6 +21,9 @@ class NotFoundError < NameError; end
# @api private
class_attribute :collection_builder_strategy, default: NullCollectionBuilder.new

# As defined by JobIteration::EnumeratorBuilder#build_active_record_enumerator_on_records as `columns`
class_attribute :cursor_columns, default: nil
Copy link
Member

@gmcgibbon gmcgibbon Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be overridden by specific tasks, so I think the best thing to do is provide a simple method like collection that can be overridden. A class variable doesn't seem necessary.

@bbraschi bbraschi requested a review from gmcgibbon February 6, 2024 03:04
`job-iteration` supports configuring which columns are used to order
the cursor. This commit provides a way to expose this ability in
the `maintenance-tasks` gem.
@bbraschi bbraschi force-pushed the expose-cursor_columns branch from 1bbfa40 to 9ff94ee Compare February 8, 2024 17:31
@bbraschi bbraschi merged commit e104de4 into main Feb 8, 2024
41 checks passed
@bbraschi bbraschi deleted the expose-cursor_columns branch February 8, 2024 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants