Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce reaper threads #576

Merged
merged 9 commits into from
Feb 17, 2021
Merged

Reduce reaper threads #576

merged 9 commits into from
Feb 17, 2021

Conversation

mhenrixon
Copy link
Owner

Unfortunately, due to a bug in concurrent-ruby, the existing implementation of the TimerTask was bleeding threads and there for memory. This PR attempts to fix that by limiting the number of threads available using RubySingleThreadExecutor

Should address the feedback from @ArturT in #571

@mhenrixon mhenrixon self-assigned this Feb 15, 2021
@mhenrixon mhenrixon mentioned this pull request Feb 15, 2021
@ArturT
Copy link
Contributor

ArturT commented Feb 15, 2021

Hi @mhenrixon

I've tested this PR in production:

# Gemfile
gem 'sidekiq-unique-jobs', github: 'mhenrixon/sidekiq-unique-jobs', branch: 'v7/reduce-reaper-threads'

I've used this config to see if threads will be leaking.

  config.reaper_interval = 10  # Reap every X seconds
  config.reaper_timeout  = 5  # Give the reaper X seconds to finish

I see in the sidekiq output:

2021-02-15T20:57:04.146889+00:00 app[worker.1]: pid=4 tid=7mk uniquejobs=reaper INFO: Starting Reaper
2021-02-15T20:57:25.859359+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 58 digests
2021-02-15T20:57:40.704981+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 1 digests
2021-02-15T20:57:53.401270+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T20:58:06.017724+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 28 digests
2021-02-15T20:58:16.393242+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 5 digests
2021-02-15T20:58:26.929716+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 4 digests
2021-02-15T20:58:37.475966+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T20:58:48.553926+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 13 digests
2021-02-15T20:58:58.947070+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T20:59:09.711465+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 10 digests
2021-02-15T20:59:19.868316+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 1 digests
2021-02-15T20:59:30.019265+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 2 digests
2021-02-15T20:59:40.866759+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 9 digests
2021-02-15T20:59:51.014401+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 2 digests
2021-02-15T21:00:01.341271+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 4 digests
2021-02-15T21:00:11.979236+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 9 digests
2021-02-15T21:00:22.137291+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:00:32.388851+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:00:43.014032+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 6 digests
2021-02-15T21:00:53.273144+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:01:04.245938+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:01:15.566350+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:01:31.134723+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 4 digests
2021-02-15T21:01:41.974335+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:01:54.644633+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:02:07.859006+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 1 digests
2021-02-15T21:02:20.042666+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:02:31.960472+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 6 digests
2021-02-15T21:02:46.192225+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 14 digests
2021-02-15T21:02:56.980521+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:03:08.478825+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 18 digests
2021-02-15T21:03:22.918316+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:04:10.693211+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 75 digests

Good news: Reaper is started every 10s and it removes digests.

Previously when I used:

 config.reaper_interval = 600  # Reap every X seconds
 config.reaper_timeout  = 595 

then the reaper timeouted, sometimes it took a few minutes to delete digests.

Did you improve something about how fast reaper is?

Maybe it accidental behavior because after a while I see longer than 10s intervals in logs:

2021-02-15T21:10:18.992944+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 48 digests
2021-02-15T21:10:47.429090+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 59 digests
2021-02-15T21:11:10.294648+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 63 digests
2021-02-15T21:11:27.818268+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 36 digests
2021-02-15T21:11:45.827980+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 1 digests
2021-02-15T21:11:56.719331+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:12:07.829986+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 10 digests
2021-02-15T21:12:18.673384+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 5 digests
2021-02-15T21:12:28.910946+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 2 digests
2021-02-15T21:12:39.293548+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 5 digests
2021-02-15T21:12:49.422754+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 1 digests
2021-02-15T21:13:00.268398+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Nothing to delete; exiting.
2021-02-15T21:13:48.850870+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 23 digests
2021-02-15T21:15:43.956349+00:00 app[worker.1]: pid=4 tid=2i0g uniquejobs=reaper INFO: Deleting batch with 49 digests

Probably for safety reasons I will keep using 10 minutes intervals for reaper instead of 10s.

BTW Regarding the root problem with leaking threads. For the previous config, I had 12-13 threads on average.
Now I see 14-15 threads. I will keep it running for some time and report back if the number of threads is leaking more or not.

@ArturT
Copy link
Contributor

ArturT commented Feb 15, 2021

I believe a number of threads is more or less stable after an hour of running your PR fix in production.

I noticed also lower RAM usage and higher GC frequency. I guess that's good because I was about to start working on the issue with too much RAM consumption but it seems like your PR also helps with that. I even tried to run on purpose some heavy workers to see if the RAM usage will increase but it seems to be stable.

threads, RAM usage


EDITED:

I guess more GC frequency is because I run more workers. I'll keep the PR fix for the night to see what's tomorrow morning.

@ArturT
Copy link
Contributor

ArturT commented Feb 16, 2021

After running this PR on production since yesterday it looks like the number of threads is stable ~14 threads.

You can ignore my comment about memory usage. It's increasing over a few hours as it used to be. Not related to this PR. It's more about my worker's code.
threads stable

@mhenrixon
Copy link
Owner Author

@ArturT while I am generally pleased with things being slightly better, the increased GC worries me a little. I will dig in and see if I can spot something. In general, concurrent ruby violates some performance problems.

For example, using &method(:method_name) is about 400 times slower than calling the block as I do in this PR. Using a lower timeout and a higher frequency that would make a huge impact over time.

That said, in this PR I also nullify the executor and create a new one, that could be the reason for more GC.

I'll do some more digging first. Glad I'm on the right path though

@ArturT
Copy link
Contributor

ArturT commented Feb 17, 2021

Please note I'm note sure what's root reason of more GC frequency. The previous week I used 10 minutes reaper interval.

Now while I was testing this PR I used 10 seconds reaper interval. It could be just smaller interval a main reason of more GC frequency. We don't compare apple to apple.

My thinking is if this PR will be merged then I will stick with 10 minutes reaper interval just in case. It seems like I don't need more reaper calls. I did not see in output removed 1000 keys. It was always below 1000 when reaper 10 minutes interval. So I guess 10 seconds reaper interval would not add value. Am I missing something?

@mhenrixon
Copy link
Owner Author

10 minutes was the intended interval, I typed it up wrong in the readme.

Ok, I will fix the documentation, the examples and release a new version today.

@mhenrixon mhenrixon merged commit 1e13e2c into master Feb 17, 2021
@delete-merged-branch delete-merged-branch bot deleted the v7/reduce-reaper-threads branch February 17, 2021 07:51
@mhenrixon
Copy link
Owner Author

Alright @ArturT, released as v7.0.3

config.reaper = :ruby # :ruby, :lua or :none/nil
config.reaper_count = 1000 # Stop reaping after this many keys
config.reaper_interval = 600 # Reap orphans every 10 minutes
config.reaper_timeout = 150 # Timeout reaper after 1,5 minutes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

150/60 = 2.5 minutes

I keep in my own config 595 seconds as timeout. It's close to interval 600s.


def schedule_next_task(interval = execution_interval)
exec_task = ->(completion) { execute_task(completion) }
ScheduledTask.execute(interval, args: [Concurrent::Event.new], &exec_task)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be prefix Concurrent::ScheduledTask?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added fix in this PR #577

@ArturT
Copy link
Contributor

ArturT commented Feb 17, 2021

Thanks @mhenrixon

I noticed a random problem on sidekiq start. When I start sidekiq in development with bin/sidekiq then it fails with the below error NameError: uninitialized constant SidekiqUniqueJobs::TimerTask::ScheduledTask

2021-02-17T11:21:01.124Z pid=76841 tid=1fup INFO: Booted Rails 6.1.2.1 application in development environment
2021-02-17T11:21:01.125Z pid=76841 tid=1fup INFO: Running in ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin18]
2021-02-17T11:21:01.125Z pid=76841 tid=1fup INFO: See LICENSE and the LGPL-3.0 for licensing details.
2021-02-17T11:21:01.125Z pid=76841 tid=1fup INFO: Upgrade to Sidekiq Pro for more features and support: https://sidekiq.org
Passing 'info' command to redis as is; administrative commands cannot be effectively namespaced and should be called on the redis connection directly; passthrough has been deprecated and will be removed in redis-namespace 2.0 (at /Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/sidekiq_unique_jobs.rb:191:in `block in fetch_redis_version')
Passing 'script' command to redis as is; administrative commands cannot be effectively namespaced and should be called on the redis connection directly; passthrough has been deprecated and will be removed in redis-namespace 2.0 (at /Users/artur/.rvm/gems/ruby-2.7.1/gems/brpoplpush-redis_script-0.1.1/lib/brpoplpush/redis_script/script.rb:69:in `load')
2021-02-17T11:21:01.134Z pid=76841 tid=1fup DEBUG: Executed update_version.lua in 2ms
2021-02-17T11:21:01.135Z pid=76841 tid=1fup uniquejobs=upgrade_locks INFO: Skipping upgrade because uniquejobs:dead has been set
2021-02-17T11:21:01.136Z pid=76841 tid=1fup uniquejobs=reaper INFO: Starting Reaper
2021-02-17T11:21:01.136Z pid=76841 tid=1fup WARN: {"context":"Exception during Sidekiq lifecycle event.","event":"startup"}
2021-02-17T11:21:01.145Z pid=76841 tid=1fup WARN: NameError: uninitialized constant SidekiqUniqueJobs::TimerTask::ScheduledTask
Did you mean?  SidekiqUniqueJobs::SCHEDULE
2021-02-17T11:21:01.145Z pid=76841 tid=1fup WARN: /Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/timer_task.rb:25:in `schedule_next_task'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/timer_task.rb:220:in `block in execute'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `block in synchronize'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `synchronize'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `synchronize'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/timer_task.rb:217:in `execute'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/orphans/manager.rb:35:in `block in start'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/logging.rb:103:in `block in with_logging_context'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/logger.rb:11:in `with'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/logging.rb:118:in `call'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/logging.rb:118:in `with_configured_loggers_context'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/logging.rb:102:in `with_logging_context'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/orphans/manager.rb:31:in `start'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/server.rb:31:in `start'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/server.rb:20:in `block in configure'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/util.rb:59:in `block in fire_event'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/util.rb:58:in `each'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/util.rb:58:in `fire_event'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/cli.rb:92:in `run'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/bin/sidekiq:31:in `<top (required)>'
/Users/artur/.rvm/gems/ruby-2.7.1/bin/sidekiq:23:in `load'
/Users/artur/.rvm/gems/ruby-2.7.1/bin/sidekiq:23:in `<main>'
/Users/artur/.rvm/gems/ruby-2.7.1/bin/ruby_executable_hooks:24:in `eval'
/Users/artur/.rvm/gems/ruby-2.7.1/bin/ruby_executable_hooks:24:in `<main>'
2021-02-17T11:21:01.166Z pid=76841 tid=1fup WARN: NameError: uninitialized constant SidekiqUniqueJobs::TimerTask::ScheduledTask
Did you mean?  SidekiqUniqueJobs::SCHEDULE
2021-02-17T11:21:01.166Z pid=76841 tid=1fup WARN: /Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/timer_task.rb:25:in `schedule_next_task'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/timer_task.rb:220:in `block in execute'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `block in synchronize'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `synchronize'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `synchronize'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/timer_task.rb:217:in `execute'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/orphans/manager.rb:35:in `block in start'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/logging.rb:103:in `block in with_logging_context'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/logger.rb:11:in `with'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/logging.rb:118:in `call'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/logging.rb:118:in `with_configured_loggers_context'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/logging.rb:102:in `with_logging_context'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/orphans/manager.rb:31:in `start'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/server.rb:31:in `start'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-unique-jobs-7.0.3/lib/sidekiq_unique_jobs/server.rb:20:in `block in configure'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/util.rb:59:in `block in fire_event'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/util.rb:58:in `each'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/util.rb:58:in `fire_event'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/lib/sidekiq/cli.rb:92:in `run'
/Users/artur/.rvm/gems/ruby-2.7.1/gems/sidekiq-6.1.3/bin/sidekiq:31:in `<top (required)>'
/Users/artur/.rvm/gems/ruby-2.7.1/bin/sidekiq:23:in `load'
/Users/artur/.rvm/gems/ruby-2.7.1/bin/sidekiq:23:in `<main>'
/Users/artur/.rvm/gems/ruby-2.7.1/bin/ruby_executable_hooks:24:in `eval'
/Users/artur/.rvm/gems/ruby-2.7.1/bin/ruby_executable_hooks:24:in `<main>'

I guess the problem is missing Concurrent prefix in Concurrent::ScheduledTask in lib/sidekiq_unique_jobs/timer_task.rb.


EDITED:

I added fix in PR #577

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants