From: opendistro-for-elasticsearch/job-scheduler#65
Found from this forum discussion in ISM: https://discuss.opendistrocommunity.dev/t/ism-attempting-to-interact-with-an-obsolete-index/3224
Job scheduler keeps an in-memory map of the jobs that are scheduled to run. When a job document is created, updated, or deleted, this map is updated accordingly. In this specific case the delete somehow failed, which left a job executing every 2 hours even though its document no longer existed. Ideally the sweeper would catch this and resolve the failure, but the sweeper has a bug: it doesn't remove jobs whose documents were deleted.
For reference:
The sweeper is a background process that sweeps the job indices for job documents in order to schedule, re-schedule, and de-schedule jobs. It runs on an interval defined by the sweep period. On every execution it sweeps all indices registered by plugins that extend job scheduler, and for each index it sweeps the shards. The sweepShard function is the one with the bug: it does not handle job documents that were deleted from the index.
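To illustrate the expected behavior, here is a minimal sketch of a shard sweep that also de-schedules jobs whose documents are gone. It is not the plugin's actual code: the class `ShardSweepSketch`, the method signatures, and the use of a plain job id to document version map are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; these are not the job-scheduler plugin's real classes. */
class ShardSweepSketch {
    // In-memory map of scheduled jobs for one shard: job id -> document version.
    private final Map<String, Long> scheduledJobs = new ConcurrentHashMap<>();

    /**
     * Reconciles the in-memory map with the job documents currently in the shard.
     * Returns the ids of jobs that were de-scheduled because their document is gone.
     */
    Set<String> sweepShard(Map<String, Long> documentsInShard) {
        // Schedule new jobs and re-schedule jobs whose document version advanced.
        for (Map.Entry<String, Long> doc : documentsInShard.entrySet()) {
            scheduledJobs.merge(doc.getKey(), doc.getValue(), Long::max);
        }
        // De-schedule every job that is still in memory but no longer in the index.
        Set<String> removed = new HashSet<>(scheduledJobs.keySet());
        removed.removeAll(documentsInShard.keySet());
        scheduledJobs.keySet().removeAll(removed);
        return removed;
    }

    /** Snapshot of the job ids currently scheduled in memory. */
    Set<String> scheduledJobIds() {
        return new HashSet<>(scheduledJobs.keySet());
    }
}
```

With a step like the second half of `sweepShard`, a job whose delete event was missed would be cleaned up on the next sweep instead of running indefinitely.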
Comments:
From: @ftianli-amzn
Thanks @dbbaughe for the detailed and well-organized explanation.
From your description, I think the problem is caused here: when de-scheduling fails, there is no fallback to retry or otherwise deal with the failure.