Nomad Batch Job Inaccurate Job Summary #13519
Labels
stage/accepted
Confirmed, and intend to work on. No timeline commitment though.
theme/batch
Issues related to batch jobs and scheduling
theme/job-summary
type/bug
Nomad version
Nomad Version 1.3.1
Issue
One of our Nomad batch jobs has ended up in a strange state with regard to the Dead and Running counts in its job summary. Below is a screenshot of this state from our cluster:
Running
nomad operator api /v1/job/<job_id>/summary
produces this output:
{"JobID":"<job_id>","Namespace":"default","Summary":{"<job>":{"Queued":0,"Complete":0,"Failed":0,"Running":0,"Starting":0,"Lost":0,"Unknown":0}},"Children":{"Pending":0,"Running":-182185,"Dead":891696},"CreateIndex":1514,"ModifyIndex":15192955}
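The negative Running count under Children in the first summary output is what marks it as inconsistent. As a minimal sketch, assuming the JSON shape shown in this issue, a check for negative counters could look like the following. The function name `negative_counts` is hypothetical and not part of Nomad, and the `or {}` fallbacks assume that either section may be missing or null.

```python
import json

def negative_counts(summary):
    """Return (section, field, value) for every negative counter in a job summary.

    Hypothetical helper: assumes the "Summary"/"Children" layout shown in this issue.
    """
    problems = []
    # Per-task-group counters live under "Summary".
    for group, counters in (summary.get("Summary") or {}).items():
        for field, value in counters.items():
            if value < 0:
                problems.append((group, field, value))
    # Child-job counters live under "Children".
    for field, value in (summary.get("Children") or {}).items():
        if value < 0:
            problems.append(("Children", field, value))
    return problems

# Summary output copied from this issue (before reconciling).
raw = (
    '{"JobID":"<job_id>","Namespace":"default",'
    '"Summary":{"<job>":{"Queued":0,"Complete":0,"Failed":0,"Running":0,'
    '"Starting":0,"Lost":0,"Unknown":0}},'
    '"Children":{"Pending":0,"Running":-182185,"Dead":891696},'
    '"CreateIndex":1514,"ModifyIndex":15192955}'
)

print(negative_counts(json.loads(raw)))
# prints [('Children', 'Running', -182185)]
```

A check like this only detects the symptom. It says nothing about how the counters drifted in the first place.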
After running
nomad system reconcile summaries
the summary looks much healthier. Running
nomad operator api /v1/job/<job_id>/summary
again produces this new output:
{"JobID":"<job_id>","Namespace":"default","Summary":{},"Children":{"Pending":0,"Running":0,"Dead":132},"CreateIndex":1514,"ModifyIndex":15193265}
How does Nomad get into this state, and does this mean we will need to run nomad system reconcile summaries again at some point in the future?